I don’t understand how FP4 is useful for anything. Unless I misunderstand it, it only uses 4 bits, and with 4 bits you can only have 16 different values?! With FP4 this is even reduced to 15 values, since the first bit signifies the sign, so 0000 and 1000 are both zero.
Because the values scale logarithmically/exponentially, and that works just fine. Dynamic range matters more here than fine-grained precision.
You get a handful of values around zero, a handful of medium values, and a handful of increasingly large values.
Like with an unsigned 4-bit encoding (for AI), you’d likely have something binary/exponential like… 0, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384…
Instead of 0 to 15 (linear). This is also how posits work. John Gustafson designed posits with AI in mind and explains better than I could how these tiny 4/8-bit types can fill in for much bigger types with minimal cons and massive pros (reduced memory and reduced compute). 16,384 is what 14 typical bits gets you (2^14), but by scaling exponentially you can cover a similar range while sacrificing the fine-grained precision that AI doesn’t really benefit from. That’s similar to how floats work, with a sign bit, an exponent, and a mantissa. And most of the time, people do want binary floats for AI.
So you might get something like:
0, 1/8, 1/4, 1/2, 1, 2, 4, 8 (also negatives)
Or even:
0, 1/32, 1/16, 1/4, 1, 4, 16, 32 (also negatives)
Or even:
0, 1/1000, 1/100, 1/10, 1, 10, 100, 1000 (also negatives)
Honestly, AI doesn’t really care as long as you stick with the same scheme.
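A quick sketch of the kind of value ladders listed above: one zero code plus magnitudes spaced by a constant ratio, mirrored for the sign bit. This is purely illustrative (the ratio and exponent range are arbitrary choices, and real FP4 formats use an exponent+mantissa layout rather than a pure ladder):

```python
# Hypothetical 4-bit value ladder: zero plus seven magnitudes
# spaced by a constant ratio, mirrored for sign -> 15 values.
def ladder(ratio: float, exponents=range(-3, 4)):
    mags = [float(ratio) ** k for k in exponents]  # e.g. 1/8 ... 8 for ratio=2
    return sorted([-m for m in mags] + [0.0] + mags)

print(ladder(2))   # the 1/8 ... 8 scheme, with negatives
print(ladder(10))  # the 1/1000 ... 1000 scheme, with negatives
```

Swapping the ratio changes how much range you buy per step, which is the "same scheme" knob the paragraph above refers to.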
AI work is usually very low precision. FP8 was the lowest you could go for a while, so it became the standard.
AI work is usually very low precision.
Apparently 😋