Stop telling me AI is the future [Still Vreni]

cannedtuna@lemmy.world · 7 days ago

SystemDisc@feddit.org · 6 days ago

Doing with it, sure, but the creation of LLMs, and the algorithms behind them, especially the training, are what I’m talking about. It’s a lot of very impressive, complicated math.

I think it’s pretty pathetic that “fuck AI” has become the trendy, cool thing. It really misses the mark. It should be fuck capitalism and the sociopathic CEOs abusing AI and shoving it down our throats. AI is not the problem.

AnyOldName3@lemmy.world · 6 days ago

It’s actually just a lot of pretty simple maths from decades ago, but it’s a lot of it. The big changes in those decades have been the feasibility of doing enough of that simple maths to achieve anything useful, and domain-specific network architecture stuff that’s rarely transferable. For example, LLMs are possible because of the invention of the transformer architecture in 2017, and that’s also turned out to be useful for a few other things like image generation and protein folding simulation, but not for all neural-network-based techniques. Most of the things that have made successive LLMs better haven’t been useful for those few other transformer-based neural networks either. Most non-LLM AI isn’t going to be meaningfully easier to create than it would have been had the world got bored after GPT-2 and only focussed on image and video generation.

pfried@reddthat.com · 6 days ago

The transformer is useful for damn near anything. At the end of the day, what we consider intelligence is the ability to predict what comes next, whether that is what our senses will tell us next or what the next hypothesis to test should be based on the data we have seen so far.

AnyOldName3@lemmy.world · 6 days ago

It’s not damn near anything. There’s loads of stuff that computers can do much more quickly and more accurately without it, just by virtue of computers already being fast and effective at maths and obeying logic. With or without the transformer architecture, a neural network is never going to be as fast or reliable at, for example, summing a collection of numbers as just adding them would be. Loads of real-world tasks are like this, which is why we’d built billions of computers even before the transformer architecture was invented.

Also, in particular, I didn’t say that the transformer architecture wasn’t useful for things that aren’t LLMs, I said that most of the work done specifically to improve LLMs has no applications outside LLMs, so the next big leap towards making computers intelligent isn’t helped more by working on LLMs than it would be by working on any other kind of AI.
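To make the summing example concrete, here’s a minimal sketch in Python of what “just adding them” looks like. The function name `add_all` is made up for illustration. Python integers are arbitrary precision, so the result is exact no matter how long the list is.

```python
def add_all(numbers):
    """Sum a collection of integers directly; exact and deterministic."""
    total = 0
    for n in numbers:
        total += n  # one integer addition per element, no model involved
    return total


if __name__ == "__main__":
    # Sum of 1..1,000,000 via the loop, compared against the closed form n(n+1)/2.
    n = 1_000_000
    print(add_all(range(1, n + 1)) == n * (n + 1) // 2)
```

The loop always gives the same answer for the same input, which is the reliability half of the argument.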

pfried@reddthat.com · 6 hours ago

I’m saying there is no “big leap” necessary. As the paper that introduced the transformer said, “attention is all you need”.

AnyOldName3@lemmy.world · 6 hours ago

If we’re going to pull up other people’s pithy phrases that aren’t intended to be taken entirely literally, then the relevant one here is “machine learning is the second best solution to any problem”. In the century or so (depending on how you define it) that people have been thinking about computers, we’ve already found better solutions to lots of problems. If a transformer-based neural network can get 99% accuracy in sixty seconds on 92 billion transistors of GPU and billions more for its VRAM, that’s pretty useless if we can also do it with 100% accuracy in sixty microseconds (a million times faster) on a $1 microcontroller, or even faster on a less constrained device.

The phrase “attention is all you need” is specifically in the context of sequence transduction models, and specifically refers to the discovery that they don’t need a combination of attention, recurrence and convolution, but actually only need attention if it’s used in the novel way introduced by the paper. If you don’t need to transduce any sequences, then this isn’t necessarily relevant, and it’s critically not a claim that you can do everything by transducing sequences. It was a surprise that applying it to generating new text instead of just converting it worked as well as it did, a surprise that it kept getting better with larger models instead of plateauing around the GPT-1 and GPT-2 era, and a surprise that the text generation could be used to do other things, even ones as basic as addition. None of these things were predicted by the Attention Is All You Need paper.
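For anyone who hasn’t seen it, here’s a rough sketch in plain Python of the scaled dot-product attention operation at the core of the paper, softmax(QKᵀ / √d_k) · V. The function names and toy inputs are made up for illustration. Real implementations use batched matrix operations, multiple heads and learned projections, all of which are left out here.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    Each output row is a weighted average of the value vectors, weighted
    by how well that row's query matches each key.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Blend the value vectors using the attention weights.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


if __name__ == "__main__":
    # Toy inputs: both keys match the query equally, so the output
    # is the plain average of the two value vectors.
    q = [[1.0, 0.0]]
    k = [[1.0, 0.0], [1.0, 0.0]]
    v = [[0.0, 2.0], [4.0, 6.0]]
    print(attention(q, k, v))
```

Note there’s no recurrence or convolution anywhere in this: every position looks at every other position in one step, and that is all the title is claiming.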