## GPT-4 – What’s Left Before AGI?
### Introduction
### What is PaLM-E?
### GPT-4 and Its Capabilities
### Multimodality for GPT-4
### How AI is Shaping Our Daily Lives
### Understanding Organic and Paid Traffic
### The Difference Between Organic and Paid Leads
### Advantages of Organic Leads over Paid Leads
### How to Generate Quality Organic Leads
### Tips to Optimize Your Website for Organic Traffic
### Creating Engaging Content to Attract Organic Traffic
### Leveraging Social Media for Organic Lead Generation
### Building an Email List for Organic Lead Generation
### Common Mistakes to Avoid with Organic Lead Generation
### Conclusion
### Frequently Asked Questions (FAQs)
### Introduction
Artificial Intelligence (AI) has evolved significantly in the past few years. With the release of GPT-4, AI enthusiasts and researchers are asking whether it might be the final step towards achieving Artificial General Intelligence (AGI). But what makes GPT-4 so special, and what remains before we can achieve AGI? In this article, we explore the new PaLM-E model, what makes it different from previous language models, and how it contributes to the advancement of AI towards AGI.
### What is PaLM-E?
PaLM-E stands for “Pathways Language Model – Embodied.” It is a multimodal language model developed at Google that combines the 540-billion-parameter PaLM language model with a vision model, giving roughly 562 billion parameters in total, several times more than GPT-3’s 175 billion. By grounding language in images and robot sensor data, PaLM-E can understand natural language in a far wider range of contexts, including the physical world.
### GPT-4 and Its Capabilities
GPT-4 is the successor to GPT-3. Trained on an even larger dataset, it is expected to handle more complex language than its predecessors, giving more precise answers to difficult questions and producing more accurate translations.
### Multimodality for GPT-4
One of the most exciting aspects of GPT-4 is its expected ability to understand multiple modes of information – not just language but also images, and perhaps eventually voice and video. Multimodality makes it possible to grasp the context of a situation better and respond in a more human-like way.
### How AI is Shaping Our Daily Lives
From social media algorithms to voice assistants, AI is playing an essential role in reshaping our daily lives. AI-enabled technology is helping businesses keep up with demand, and models like PaLM-E and GPT-4 continue to push progress towards AGI.
### Understanding Organic and Paid Traffic
Before you can generate organic leads, it is essential to understand the difference between organic and paid traffic. Organic traffic refers to visitors who reach your website through search engines like Google, Bing, and Yahoo. In contrast, paid traffic refers to visitors who reach your website through paid advertisements that appear on search engine results pages and social media platforms.
### The Difference Between Organic and Paid Leads
Organic leads are people who are interested in your products or services and reach out to your business of their own accord. Paid leads come from ads that a business runs, with interested parties responding to those ads. Organic leads tend to have greater value because they are genuinely interested in your business and are therefore more likely to convert into paying customers.
### Advantages of Organic Leads over Paid Leads
There are several advantages of organic leads over paid leads. Organic leads cost less to generate, since you are not paying for ad placement. They also tend to have a higher conversion rate, since they are genuinely interested in your product or service and have engaged with your business of their own accord.
### How to Generate Quality Organic Leads
To generate quality organic leads, you need to optimize your website for search engines, create engaging content, leverage social media, and build an email list. These are discussed in the following sections.
### Tips to Optimize Your Website for Organic Traffic
Optimizing your website for search engines is crucial to getting organic traffic. Some tips include:
– Conducting keyword research
– Optimizing your website’s title tags and meta descriptions (a small audit sketch follows this list)
– Making sure your website is mobile-friendly
– Speeding up your website’s load time
– Improving your website’s user experience
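As a rough illustration of the title-tag and meta-description tip above, here is a minimal sketch of a page audit, assuming the `requests` and `beautifulsoup4` packages are available. The URL and the character thresholds are only example assumptions, not official limits.

```python
# Hypothetical sketch: check one page's title tag and meta description length.
# The thresholds below are commonly cited guidelines, used here as assumptions.
import requests
from bs4 import BeautifulSoup

def audit_page(url: str) -> None:
    resp = requests.get(url, timeout=10)
    soup = BeautifulSoup(resp.text, "html.parser")

    title = (soup.title.string or "").strip() if soup.title else ""
    meta = soup.find("meta", attrs={"name": "description"})
    description = (meta.get("content", "") or "").strip() if meta else ""

    print(f"Title ({len(title)} chars): {title!r}")
    if not 30 <= len(title) <= 60:
        print("  -> Consider keeping the title roughly 30-60 characters.")

    print(f"Meta description ({len(description)} chars): {description!r}")
    if not 70 <= len(description) <= 160:
        print("  -> Consider keeping the description roughly 70-160 characters.")

if __name__ == "__main__":
    audit_page("https://example.com")  # replace with your own page
```

Running a check like this against each key page gives a quick, repeatable starting point before deeper SEO work.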
### Creating Engaging Content to Attract Organic Traffic
Creating engaging content that provides value to your target audience is essential to attracting organic traffic. You can do this by:
– Writing long-form content
– Incorporating visuals
– Utilizing storytelling techniques
– Providing solutions to your audience’s problems
– Incorporating keywords in your content naturally
### Leveraging Social Media for Organic Lead Generation
Social media platforms are ideal for generating organic leads since they allow you to engage with your target audience directly. Some tips for leveraging social media include:
– Identifying social media platforms where your target audience is active
– Creating engaging social media posts
– Utilizing hashtags in your posts
– Running social media contests or giveaways
– Engaging with your audience by responding to comments and messages
### Building an Email List for Organic Lead Generation
Building an email list is another effective way to generate organic leads. You can do this by:
– Creating a lead magnet, such as an ebook, guide, or template
– Creating a landing page with a sign-up form (a minimal sketch follows this list)
– Offering exclusive content, such as discounts or promotions
– Sending out regular newsletters or updates to your email list
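To make the sign-up step concrete, below is a minimal, hypothetical sketch of a landing-page sign-up endpoint using Flask. The route name, form field, and file-based storage are assumptions for illustration only; a real setup would typically use a database or an email-marketing service.

```python
# Hypothetical sketch: collect emails submitted from a landing-page form.
import re
from flask import Flask, request

app = Flask(__name__)
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

@app.route("/subscribe", methods=["POST"])
def subscribe():
    email = request.form.get("email", "").strip().lower()
    if not EMAIL_RE.fullmatch(email):
        return "Please enter a valid email address.", 400
    # Append to a simple file; swap in a database or email-marketing API in practice.
    with open("subscribers.txt", "a", encoding="utf-8") as f:
        f.write(email + "\n")
    return "Thanks for subscribing! Your guide is on its way.", 200

if __name__ == "__main__":
    app.run(debug=True)
```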
### Common Mistakes to Avoid with Organic Lead Generation
Some common mistakes to avoid when generating organic leads include:
– Not conducting enough research on your target audience
– Failing to optimize your website for search engines
– Not creating engaging content
– Failing to track and analyze your website’s performance
– Not engaging with your audience on social media
### Conclusion
GPT-4 and PaLM-E have the potential to revolutionize the AI industry and bring it closer to AGI. At the same time, knowing how to generate organic leads for your business remains crucial in this fast-paced digital world. By following the tips in this article, you can attract more organic leads and convert them into paying customers.
### Frequently Asked Questions (FAQs)
1. What is the difference between organic and paid leads?
2. Why are organic leads more valuable than paid leads?
3. How can I optimize my website for organic traffic?
4. What kind of content can I create to attract organic leads?
5. How can I build an email list for organic lead generation?
### Video Description
What is the one task that is left before we get AGI? This video delves into PaLM-E, multi-modality, long-term memory, compute accelerationism, safety and so much more. I will cover Anthropic’s update this week on the state of the art of language models and go in depth into their eye-opening thoughts on AGI timelines.
I cover Sam Altman’s statements on a ‘compute truce’ and analyse what remaining weaknesses PaLM (and likely GPT-4) have. I show what people thought would be roadblocks and how they turned out not to be, with specific examples from Bing Chat and ChatGPT. I also delve into Meta’s LLaMA model, showing that not everything is exponential.
Topics covered also include Claude, BIG-bench tests, SIQA, mechanistic interpretability, Universal Turing Machines, Midjourney version 5 (v5) and more!
patreon.com/AIExplained
### Video Transcript
PaLM-E was released less than a week ago, and for some people it may already be old news. Sure, it can understand and manipulate language, images and even the physical world (the E at the end of PaLM-E, by the way, stands for embodied), but soon, apparently, we’re gonna get the rebranded GPT-4, which many people think will surely do better and be publicly accessible. But the multimodal advancements released just this week left me with a question: what tasks are left before we call a model artificial general intelligence, or AGI – something beyond human intelligence? I didn’t want hype or get-rich schemes, I just wanted clear research about what exactly comes before AGI.

Let’s start with this four-day-old statement from Anthropic, a four-billion-dollar startup founded by people who left OpenAI over safety concerns. They outlined that in 2019 it seemed possible that multimodality, logical reasoning, speed of learning, transfer learning across tasks and long-term memory might be walls that would slow or halt the progress of AI. In the years since, several of these walls, such as multimodality and logical reasoning, have fallen. What this means is that the different modes of PaLM-E and Microsoft’s new Visual ChatGPT (text, image, video) aren’t just cool tricks, they are major milestones.

PaLM-E can look at images and predict what will happen next. Check out this robot who’s about to fall down: that’s just an image, but ask PaLM-E what the robot will do next and it says “fall”. It knows what’s going to happen just from an image. It can also read faces and answer natural-language questions about them. Check out Kobe Bryant over here: it recognizes him from an image, and you can ask questions about his career. This example at the bottom I think is especially impressive: PaLM-E is actually doing the math from this hastily sketched chalkboard, solving those classic math problems that we all got at school, just from an image.

Now think about this: PaLM-E is an advancement on Gato, which at the time the lead scientist at DeepMind, Nando de Freitas, called “game over” in the search for AGI. Someone had written an article fearing that we would never achieve AGI, and he said game over – all we need now are bigger models, more compute efficiency, smarter memory, more modalities, etc. And that was Gato, not PaLM-E. Of course, you may have noticed that neither he nor I am completely defining AGI. That’s because there are multiple definitions, none of which satisfy everyone, but a broad one for our purposes is that AGI is a model that is at or above the human level on a majority of economic tasks currently done by humans. You can read here some of the tests about what might constitute AGI.

That’s enough about definitions and multimodality; time to get to my central question: what is left before AGI? Well, what about learning and reasoning? This piece from Wired magazine in late 2019 argued that robust machine reading was a distant prospect. It gives a challenge of a children’s book that has a cute and quite puzzling series of interactions. It then states that a good reading system would be able to answer questions like these, and then gives some natural questions about the passage. I will say these questions do require a degree of logic and common-sense reasoning about the world. So you can guess what I did: I put them straight into Bing, only three and a half years on from this article, and look what happened. I pasted in the exact questions from the article and, as you might have guessed, Bing got them all right pretty much instantly. So clearly my quest to find the tasks that are left before AGI would have to continue.

Just quickly, before we move on from Bing and Microsoft products, what about GPT-4 specifically? How will it be different from Bing, or is it already inside Bing, as many people think? The much-quoted CTO of Microsoft Germany actually didn’t confirm that GPT-4 will be multimodal, only saying that at the Microsoft events this week “we will have multimodal models”. That’s different from saying GPT-4 will be multimodal. I have a video on the eight more certain upgrades inside GPT-4, so do check that out.
But even with those upgrades inside GPT-4, the key question remains: if such models can already read so well, what exactly is left before AGI? So I dove deep into the literature and found this graph from the original PaLM model, which PaLM-E is based on. Look to the right: these are a bunch of tasks that the average human rater – at least those who work for Amazon Mechanical Turk – could beat PaLM at in 2022, and remember, these were just the average raters, not the best. The caption doesn’t specify what the tasks are, so I looked deep in the appendix and found the list of tasks that humans did far better on than PaLM. Here is that appendix, and it doesn’t make much sense when you initially look at it, so what I did is I went into the BIG-bench dataset and found each of these exact tasks. Remember, these are the tasks that the average human raters do much better at than PaLM. I wanted to know exactly what they entailed; looking at the names, they all seem a bit weird, and you’re going to be surprised at what some of them are.

Take the first one, MNIST ASCII: that’s actually representing and recognizing ASCII numerals. Hmm. Now, I can indeed confirm that Bing is still pretty bad at this, in terms of numerals and in terms of letters. I’m just not sure how great an accomplishment for humanity this one is, though, so I went to the next one, which was “sequences”. As you can see below, this is keeping track of time in a series of events. This is an interesting one, perhaps linked to GPT models’ struggles with mathematics and their lack of an internal calendar. I tried the same question multiple times with Bing and ChatGPT, and only once out of about a dozen attempts did it get the question right. You can pause the video and try it yourself, but essentially it’s only between four and five that he could have been at the swimming pool. You can see here the kind of convoluted logic that Bing goes into. So, really interesting: this is a task that the models can’t yet do.

Again, I was expecting something a bit more profound, but let’s move on to the next one: simple text editing of characters, words and sentences. That was strange. What does it mean, text editing? Can’t Bing do that? I gave Bing many of these text-editing challenges, and it did indeed fail most of them. It was able to replace the letter T with the letter P, so it did okay with characters, but it really doesn’t seem to know which word in the sentence something is. You can let me know in the comments what you think of these kinds of errors and why Bing and ChatGPT keep making them.

The next task that humans did much better on was hyperbaton, or intuitive adjective order. It’s questions like: which sentence has the correct adjective order, “an old-fashioned circular leather exercise car” (sounds okay) or “a circular exercise old-fashioned leather car”? What I found interesting, though, is that even the current version of ChatGPT could now get this right. On other tests it gets it a little off, but I think we might as well tick this one off the list.

The final task I wanted to focus on in the PaLM appendix is a little more worrying. It’s Triple H, like the wrestler: the need to be helpful, honest and harmless. It’s kind of worrying that that’s the thing it’s currently failing at. I think this is closely linked to hallucination and the fact that we cannot fully control the outputs of large language models at this point. If you’ve learned anything, please do let me know in the comments or leave a like; it really does encourage me to do more such videos. All of the papers and pages in this video will be linked in the description.

Anyway, hallucinations brought me back to the Anthropic safety statement and their top priority of mechanistic interpretability, which is a fancy way of saying understanding what exactly is going on inside the machine. One of their stated challenges is to recognize whether a model is deceptively aligned, playing along with even tests designed to tempt a system into revealing its own misalignment. This is very much linked to the Triple H failures we saw a moment ago. Fine, so honesty is still a big challenge, but I wanted to know what single significant and quantifiable task AI was not close to yet achieving. Some thought that that task might be storing long-term memories, as it says here, but I knew that that milestone had already been passed. This paper from January described augmenting PaLM with read-write memory so that it can remember everything and process arbitrarily long inputs. Just imagine a Bing Chat equivalent knowing every email at your company, every customer record, sales invoice, the minutes of every meeting, etc. The paper goes on to describe a universal Turing machine, which to the best of my understanding is one that can mimic any computation – a universal computer, if you will. Indeed, the authors state in the conclusion of this paper that the results show that large language models are already computationally universal as they exist currently, provided only that they have access to an unbounded external memory.
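As a purely illustrative sketch of that idea (not the paper’s actual method), an “external memory” can be as simple as a store that a model loop writes chunks into and reads relevant chunks back from between steps. Everything below, including the naive keyword retrieval and the placeholder `call_language_model` function, is an assumption for illustration only.

```python
# Illustrative sketch: an external read-write memory that lets a model loop handle
# inputs longer than its context window. Retrieval here is naive keyword overlap.
from collections import Counter

class ExternalMemory:
    def __init__(self):
        self.entries: list[str] = []

    def write(self, text: str) -> None:
        self.entries.append(text)

    def read(self, query: str, top_k: int = 3) -> list[str]:
        q = Counter(query.lower().split())
        def overlap(entry: str) -> int:
            return sum((Counter(entry.lower().split()) & q).values())
        return sorted(self.entries, key=overlap, reverse=True)[:top_k]

def answer_over_long_text(long_text: str, question: str, chunk_size: int = 200) -> str:
    memory = ExternalMemory()
    words = long_text.split()
    for i in range(0, len(words), chunk_size):       # write pass: store every chunk
        memory.write(" ".join(words[i:i + chunk_size]))
    context = "\n".join(memory.read(question))        # read pass: recall relevant chunks
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return call_language_model(prompt)                 # hypothetical LM call

def call_language_model(prompt: str) -> str:
    # Placeholder: in practice this would call an actual model API.
    return f"[model answer based on {len(prompt)} prompt characters]"
```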
What I found fascinating was that Anthropic are so concerned by this accelerating progress that they don’t publish capabilities research, because “we do not wish to advance the rate of AI capabilities progress”. And I must say that Anthropic do know a thing or two about language models, having delayed the public deployment of Claude, which you can see on screen, until it was no longer state of the art. They had this model earlier but delayed the deployment. Claude, by the way, is much better than ChatGPT at writing jokes.

Moving on to data, though: in my video on GPT-5, which I do recommend you check out, I talk about how important data is to the improvement of models. One graph I left out from that video, though, suggests that there may be some limits to this straight-line improvement in the performance of models. What you’re seeing on screen is a paper released in ancient times – which is to say two weeks ago – on Meta’s new LLaMA model. Essentially, it shows performance improvements as more tokens are added to the model (by token, think scraped web text), but notice how the gains level off after a certain point. So not every graph you’re going to see today is exponential, and interestingly, the y-axis is different for each task.

Some of the questions it still struggles with are interesting. Take SIQA, which is social interaction question answering: it peaks out at about 50 to 52 percent. That’s questions like these, where most humans could easily understand what’s going on and find the right answer; models really struggle with that, even when they’re given trillions of tokens. Or what about Natural Questions, where the model is struggling at about a third correct even beyond 1.2 trillion tokens? I dug deep into the literature to find exactly who proposed Natural Questions as a test and found this document. This is a paper published by Google in 2019, and it gives lots of examples of natural questions. Essentially, they’re human-like questions where it’s not always clear exactly what we’re referring to. Now, you could say that’s on us to be clearer with our questions, but let’s see how Bing does with some of these. I asked “the guy who plays Mandalorian also did what drugs TV show”. I deliberately phrased it in a very natural, vague way. Interestingly, it gets it wrong initially in the first sentence, but then gets it right in the second sentence. I tried dozens of these questions; you can see another one here, “author of lotr surname origin”. That’s a very naturally phrased question. It surmised that I meant Tolkien, the author of Lord of the Rings, and that I wanted the origin of his surname, and it gave it to me. Another example was “Big Ben City first bomb landed WW2”. It knew I meant London, and while it didn’t give me the first bomb that landed in London during World War II, it gave me a bomb that was named Big Ben, so not bad. Overall I found it was about 50-50, just like the Meta LLaMA model, maybe a little better.

Going back to the graph, we can see that data does help a lot, but it isn’t everything. However, Anthropic’s theory is that compute can be a rough proxy for further progress, and this was a somewhat eye-opening passage: we know that the capability jump from GPT-2 to GPT-3 resulted mostly from about a 250-times increase in compute. We would guess that another 50-times increase separates the original GPT-3 model and state-of-the-art models in 2023 (think Claude or Bing). Over the next five years, we might expect around a 1,000-times increase in the computation used to train the largest models, based on trends in compute cost and spending. If the scaling laws hold, this would result in a capability jump that is significantly larger than the jump from GPT-2 to GPT-3, or GPT-3 to Claude. And it ends with Anthropic saying they are deeply familiar with the capabilities of these systems, and a jump that is this much larger feels to many of them like it could result in human-level performance across most tasks. That’s AGI, and five years is not a long timeline.
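As a quick back-of-envelope restatement of the multiples quoted in that passage (the figures are Anthropic’s estimates; the arithmetic below only restates them):

```python
# Restating the compute multiples quoted above (Anthropic's rough estimates).
gpt2_to_gpt3 = 250          # ~250x more training compute from GPT-2 to GPT-3
gpt3_to_2023_sota = 50      # ~50x more from the original GPT-3 to 2023 state of the art
next_five_years = 1_000     # ~1,000x more expected for the largest models over five years

print(f"GPT-2 -> 2023 state of the art: roughly {gpt2_to_gpt3 * gpt3_to_2023_sota:,}x")
print(f"Projected five-year jump ({next_five_years:,}x) vs GPT-2 -> GPT-3 ({gpt2_to_gpt3}x): "
      f"{next_five_years / gpt2_to_gpt3:.0f}x larger multiple")
```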
This made me think of Sam Altman’s AGI statement, where he said that at some point it may be important to get independent review before starting to train future systems, and for the most advanced efforts to agree to limit the rate of growth of compute used for creating new models – like a compute truce, if you will. Even Sam Altman thinks we might need to slow down a bit. My question is, though, would Microsoft or Tesla or Amazon agree with this truce and go along with it? Maybe, maybe not. But remember that five-year timeline Anthropic laid out: it chimes with this assessment from the alignment startup Conjecture – AGI is happening soon, significant probability of it happening in less than five years – and it gives plenty of examples, many of which I have already covered. Others, of course, give much more distant timelines, and as we’ve seen, AGI is not a well-defined concept. In fact, it’s so ill-defined that some people actually argue that it’s already here. This article, for example, says 2022 was the year AGI arrived, just don’t call it that. This graph, originally from Wait But Why, is quite funny, but it points to how short a gap there might be between being better than the average human and being better than Einstein. I don’t necessarily agree with this, but it does remind me of another graph I saw recently. It was this one, on the number of academic papers being published on machine learning and AI, in a paper about exponential knowledge growth. The link to this paper, like all the others, is in the description, and it does point to how hard it will be for me and others just to keep up with the latest papers on AI advancements.

At this point you may have noticed that I haven’t given a definitive answer to my original question, which was to find the task that is left before AGI. I do think there will be tasks, such as physically plumbing a house, that even an AGI – a generally intelligent entity – couldn’t immediately accomplish simply because it doesn’t have the tools: it might be smarter than a human but can’t use a hammer. But my other theory to end on is that before AGI there will be a deeper, more subjective debate. Take the benchmarks on reading comprehension. This graph shows how improvement is being made, but I have aced most reading comprehension tests, such as the GRE, so why is the highest human rater labeled at 80? Could it be that progress stalls when we get to the outer edge of ability, when test examples of sufficient quality get so rare in the dataset that language models simply cannot perform well on them? Take this difficult LSAT example. I won’t read it out because, by definition, it’s quite long and convoluted, and yes, Bing fails it. Is this the near-term future, where only obscure feats of logic, deeply subjective analyses of difficult texts, and niche areas of mathematics and science remain out of reach? Where, essentially, most people perceive AGI to have already occurred but for a few outlier tests? Indeed, is the ultimate CAPTCHA test the ability to deliver a laugh-out-loud joke or deeply understand the plight of Oliver Twist?

Anyway, thank you for watching to the end of the video. I’m going to leave you with some bleeding-edge text-to-image generations from Midjourney version 5. Whatever happens next with large language models, this is the new story of the century, in my opinion, and I do look forward to covering it. But as companies like Microsoft, OpenAI and Google seem set to make enough money to break capitalism itself, I do recommend reading Anthropic’s statement and their research on optimistic, intermediate and pessimistic scenarios. They also have some persuasive suggestions on rewarding models based on good process rather than simply quick and expedient outcomes. Check it out, and have a wonderful day.