What you're doing is excellent; interesting and useful interrogation of the 'stochastic parrot'. All I've done so far is to ensure the LLM replies with UK English spelling and does not, in its responses to me, mimic human conversation. Your exploration of questioning LLMs reminds me of Asimov's prescient 1950s sci-fi short story 'Jokester'. Be alert for an abrupt loss of sense of humour (:)). It also reminds me of a much older conundrum. When people in classical Greece travelled to question the oracle at Delphi (or its priestly doorkeepers) they didn't get straight answers - instead gnomic aphorisms were delivered.
Anthropic did pretty well. I wouldn't give it a ten though, but that's me, I'm a hard marker 😉 Even Anthropic is missing the pop-culture humour, colloquialisms, and sarky attitude that Jess weaves into her work, even within her formal, grown-up academic writing. 😉
That's a key, in the interim, for peeps separating LLM work being passed off as original: currently AI sucks big time at humour, sarcasm, and the nuances of parody. It just doesn't get it. It can mimic slapstick humour, even though it doesn't "understand" it. But witty repartee and banter are currently still beyond it. Eventually, as more people pour endless hours into training and refining it, it will get there. 🤔😐😉
Personally, I think the efficiency and productivity have a cost that few truly comprehend, and eventually, when they do and look back with 20/20 hindsight, I think they will realise it wasn't worth the price over the long run, for themselves or humanity.
But the genie's out now, and the winner in the new work paradigm is he/she who makes AI his/her biaach best.😉
Actually, there was a version that captured all of that but Jessica found it too "spooky", so I dialled it down a bit.
Really? Well, I guess I stand corrected 😉 I'd love to see if other punters could distinguish the difference. Personally, I find the LLMs fairly obvious to pick out. I think that will become a field in and of itself: a new business service industry specialising in recognising AI vs human. And, in parallel, another industry currently spawning at light speed: personalised or custom-trained AIs. 😁
I wish you wouldn’t say that a computer or a computer program (that’s what an LLM is) can “understand” things.
They understand the subject matter in the same way your checkbook understands money or your spreadsheet understands math, which is to say not at all.
There is no “understanding” going on. The naïve may think it looks like it, but the clue is in the name: large language MODEL.
This anthropomorphization of computer programs is part of the marketing strategy behind these things: the vendors want consumers to believe they are infallible oracles.
And I’m no luddite: I use ChatGPT and Gemini at work daily. They have saved me lots of time. But they have also gone badly off-track. In those cases, were I not experienced and able to apply what little wisdom the Creator may have generously bestowed on me, the outcome would have been unhelpful.
Nikola Tesla has 100 Questions for ChatGPT, Deepseek, Google & Grok #FAIL https://teslaleaks.com/f/nikola-tesla-100-questions-chatgpt-deepseek-google-grok-fail
Hi Joel - thanks for another interesting post.
I'll guess that the medical content is something we all instinctively know, but somehow forget while getting on with life. All three models captured the essence of that advice.
In my humble opinion, there were significant differences expressed by the models when describing the goals of the medical system. I can't hazard a guess as to how that came about, I'm just mentioning it as an opinion.
I'd be interested to know whether the same results would be delivered if the same questions were asked again.
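On that reproducibility question: a likely reason answers vary between runs is that LLMs typically *sample* each next token from a probability distribution rather than always picking the most likely one. Below is a minimal toy sketch (not any vendor's actual API; the logits and function names are invented for illustration) of how temperature-0 "greedy" decoding is repeatable while sampling at a higher temperature is not:

```python
import math
import random

def sample_next_token(logits, temperature, rng):
    """Pick a token index from raw scores; temperature 0 means greedy argmax."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Softmax over temperature-scaled logits.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw from the resulting distribution.
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

logits = [2.0, 1.5, 0.5]  # toy scores for three candidate tokens

# Greedy decoding: the same "prompt" yields the same token every run.
greedy = [sample_next_token(logits, 0, random.Random(i)) for i in range(5)]

# Sampling at temperature 1.0: repeated runs with different seeds can differ.
sampled = [sample_next_token(logits, 1.0, random.Random(i)) for i in range(5)]
```

So whether the three models would repeat themselves depends largely on the temperature setting (and any seed control) used when the questions were posed.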
I wonder how Grok would compare to the others.
So do I! I'm going to get a key. But I'm so in love with Anthropic right now, it is going to have to be very impressive.
Jessica’s post “The effects of turning off AI safety protocols” has a pretty interesting conversation she had with Grok. I have more trust in Grok after reading it.
The only LLM or related lettering I recall is LSMFT...Lucky Strike Means Fine Tobacco. I know, let's ask Gunk if it is true.