Intel Architecture Manual Updates: bfloat16 for Cooper Lake Xeon Scalable Only?

Intel has recently published a new version of its software developer documentation describing more details about the upcoming Xeon Scalable "Cooper Lake-SP" processors. It turns out that the new CPUs support the AVX512_BF16 instructions and therefore the bfloat16 format. The main catch here is that AVX512_BF16 is at this point supported only by the Cooper Lake-SP microarchitecture, and not by its direct successor, the Ice Lake-SP microarchitecture.

bfloat16 is a truncated 16-bit version of the 32-bit IEEE 754 single-precision floating-point format that retains the 8 exponent bits, but reduces the precision of the significand from 24 bits to 8 bits to save memory, bandwidth, and processing resources while maintaining the same range. The bfloat16 format is designed primarily for machine learning and near-sensor computing applications, where precision near zero is needed, but not so much in the maximum range. The number format is supported by Intel's upcoming FPGAs, as well as by Nervana neural network processors and Google's TPUs. Given that Intel supports the bfloat16 format across two of its product lines, it makes sense to support it elsewhere as well, which is what the company is going to do by supporting the AVX512_BF16 instructions on its upcoming Xeon Scalable "Cooper Lake-SP" platform.

AVX-512 support by various Intel CPUs
(a newer microarchitecture also supports the extensions of the older ones)

Skylake-SP (Xeon): AVX512BW, AVX512DQ, AVX512VL
Common to Skylake-SP and Knights Landing: AVX512F, AVX512CD
Knights Landing (Xeon Phi): AVX512ER, AVX512PF

Cannon Lake: AVX512VBMI, AVX512IFMA
Knights Mill (Xeon Phi): AVX512_4FMAPS, AVX512_4VNNIW

Cascade Lake-SP (Xeon): AVX512_VNNI

Cooper Lake (Xeon): AVX512_BF16

Ice Lake: AVX512_VNNI, AVX512_VBMI2, AVX512_BITALG, AVX512+VAES, AVX512+GFNI, AVX512+VPCLMULQDQ, AVX512_VPOPCNTDQ (but not AVX512_BF16)

Source: "Intel Architecture Instruction Set Extensions and Future Features Programming Reference" (page 16)

The list of Intel's AVX512_BF16 Vector Neural Network Instructions includes VCVTNE2PS2BF16, VCVTNEPS2BF16, and VDPBF16PS. All of them can operate in 128-bit, 256-bit, or 512-bit mode, allowing software developers to pick one of nine versions as needed.

Intel AVX512_BF16 Instructions
and their Intel C/C++ compiler intrinsic equivalents

Instruction
Description

VCVTNE2PS2BF16
Convert two packed single-precision data vectors into one packed BF16 data vector

Intel C/C++ compiler intrinsic equivalents:
VCVTNE2PS2BF16 __m128bh _mm_cvtne2ps_pbh (__m128, __m128);
VCVTNE2PS2BF16 __m128bh _mm_mask_cvtne2ps_pbh (__m128bh, __mmask8, __m128, __m128);
VCVTNE2PS2BF16 __m128bh _mm_maskz_cvtne2ps_pbh (__mmask8, __m128, __m128);
VCVTNE2PS2BF16 __m256bh _mm256_cvtne2ps_pbh (__m256, __m256);
VCVTNE2PS2BF16 __m256bh _mm256_mask_cvtne2ps_pbh (__m256bh, __mmask16, __m256, __m256);
VCVTNE2PS2BF16 __m256bh _mm256_maskz_cvtne2ps_pbh (__mmask16, __m256, __m256);
VCVTNE2PS2BF16 __m512bh _mm512_cvtne2ps_pbh (__m512, __m512);
VCVTNE2PS2BF16 __m512bh _mm512_mask_cvtne2ps_pbh (__m512bh, __mmask32, __m512, __m512);
VCVTNE2PS2BF16 __m512bh _mm512_maskz_cvtne2ps_pbh (__mmask32, __m512, __m512);

VCVTNEPS2BF16
Convert packed single-precision data into packed BF16 data

Intel C/C++ compiler intrinsic equivalents:
VCVTNEPS2BF16 __m128bh _mm_cvtneps_pbh (__m128);
VCVTNEPS2BF16 __m128bh _mm_mask_cvtneps_pbh (__m128bh, __mmask8, __m128);
VCVTNEPS2BF16 __m128bh _mm_maskz_cvtneps_pbh (__mmask8, __m128);
VCVTNEPS2BF16 __m128bh _mm256_cvtneps_pbh (__m256);
VCVTNEPS2BF16 __m128bh _mm256_mask_cvtneps_pbh (__m128bh, __mmask8, __m256);
VCVTNEPS2BF16 __m128bh _mm256_maskz_cvtneps_pbh (__mmask8, __m256);
VCVTNEPS2BF16 __m256bh _mm512_cvtneps_pbh (__m512);
VCVTNEPS2BF16 __m256bh _mm512_mask_cvtneps_pbh (__m256bh, __mmask16, __m512);
VCVTNEPS2BF16 __m256bh _mm512_maskz_cvtneps_pbh (__mmask16, __m512);

VDPBF16PS
Dot product of BF16 pairs accumulated into packed single precision

Intel C/C++ compiler intrinsic equivalents:
VDPBF16PS __m128 _mm_dpbf16_ps (__m128, __m128bh, __m128bh);
VDPBF16PS __m128 _mm_mask_dpbf16_ps (__m128, __mmask8, __m128bh, __m128bh);
VDPBF16PS __m128 _mm_maskz_dpbf16_ps (__mmask8, __m128, __m128bh, __m128bh);
VDPBF16PS __m256 _mm256_dpbf16_ps (__m256, __m256bh, __m256bh);
VDPBF16PS __m256 _mm256_mask_dpbf16_ps (__m256, __mmask8, __m256bh, __m256bh);
VDPBF16PS __m256 _mm256_maskz_dpbf16_ps (__mmask8, __m256, __m256bh, __m256bh);
VDPBF16PS __m512 _mm512_dpbf16_ps (__m512, __m512bh, __m512bh);
VDPBF16PS __m512 _mm512_mask_dpbf16_ps (__m512, __mmask16, __m512bh, __m512bh);
VDPBF16PS __m512 _mm512_maskz_dpbf16_ps (__mmask16, __m512, __m512bh, __m512bh);

Cooper Lake only?

When Intel introduces an instruction in its Intel Architecture Instruction Set Extensions and Future Features Programming Reference, the company usually cites the first microarchitecture that supports it and indicates that its successors also support it (or are set to support it) by appending the word "later" to the microarchitecture's name. For example, Intel's original AVX is listed as supported by "Sandy Bridge and later" microarchitectures.


This is not the case with AVX512_BF16, which is listed as supported by "Future Cooper Lake" only. After the Cooper Lake-SP platform comes the long-awaited 10 nm Ice Lake-SP server platform, and it would be a little strange if it did not support something its predecessor does. However, this is not an entirely implausible scenario. Intel has lately planned to offer differentiated features, so it may be the case that Cooper Lake-SP is tuned for specific workloads while Ice Lake-SP is focused on others.

We have asked Intel for more information and will update the story as we learn more about this matter.

Related Reading

Source: Intel Architecture Instruction Set Extensions and Future Features Programming Reference (via InstLatX64/Twitter)
