Advanced AI models suffer a near-total collapse on classic psychology test as cognitive demands increase

sanitation@lemmy.today · 3 days ago

zbyte64@awful.systems · 1 day ago

If you are not interested in how it completes the task then you are not an authority on how it works.

Communist@lemmy.frozeninferno.xyz · 1 day ago

I’m academically interested. What I mean when I say I’m not interested is that I just don’t see the significance when we’re talking about whether it’s capable of the task.

zbyte64@awful.systems · 1 day ago

How are you able to understand its capability without understanding what tools it is capable of manipulating to effect?

Communist@lemmy.frozeninferno.xyz · 1 day ago

You aren’t, and that’s exactly what I’m saying: it’s capable of doing these things with tools, therefore it’s capable of doing these things.

zbyte64@awful.systems · 24 hours ago

So why are you allergic to people talking about the quality of the tools with regard to capability?

Communist@lemmy.frozeninferno.xyz · 23 hours ago

I don’t know what you mean. I wasn’t the one who claimed they couldn’t do something they clearly can.

zbyte64@awful.systems · 23 hours ago

You are the one collapsing tool use into a binary when there are varying degrees of competency and hand-holding.

Communist@lemmy.frozeninferno.xyz · 22 hours ago

I am not. You inaccurately said that the math olympiad was not bested by LLMs because they had a tool that told them if they were close but incorrect and could just try an infinite number of times. This is incorrect; they had a number of tries with Python. This just isn’t a true statement. I think them besting it with the use of Python is equally significant and still counts as them besting it, and saying they can’t do math work is absurd.

zbyte64@awful.systems · edit-2 12 hours ago

It’s not “bested” by the LLM though; a mathematician used the LLM as a tool to disprove a conjecture. Subtract the mathematicians from the process and the LLM would not have successfully completed the task. It would be more accurate to say a mathematician with an LLM was able to best a mathematician who did not have an LLM. Which is cool, but we don’t need to pretend the LLM is not a tool but something that “understands” math like a mathematician.