
core_arch::x86::avx512f intrinsics with rounding do not compile #140352


Open
TechPizzaDev opened this issue Apr 26, 2025 · 1 comment
Labels
C-bug Category: This is a bug. F-stdarch_x86_avx512 `#![feature(stdarch_x86_avx512)]` needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. O-x86_64 Target: x86-64 processors (like x86_64-*) (also known as amd64 and x64) T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@TechPizzaDev

TechPizzaDev commented Apr 26, 2025

I tried this code:

core_arch::x86::avx512f::_mm512_cvtps_ph::<_MM_FROUND_TO_NEAREST_INT>(value)

I expected this to compile.
Instead, this happened:

error[E0080]: evaluation of `std::arch::x86_64::_mm512_cvtps_ph::<0>::{constant#0}` failed

                static_assert_sae!(SAE);
      |         ^^^^^^^^^^^^^^^^^^^^^^^ evaluation panicked: Invalid IMM value

It seems that the assertion macros static_assert_sae and static_assert_mantissas_sae do not check the flags properly and reject most valid values.
https://github.com/rust-lang/stdarch/blob/e907456b2e10622ccd854a3bba8d02ce170b5dbb/crates/core_arch/src/x86/macros.rs#L18-L22
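
To make the failure mode concrete, here is a minimal stable-Rust sketch of a compile-time check on a const generic immediate. It assumes, without reproducing the linked macro verbatim, that the check accepts only the values 4 (_MM_FROUND_CUR_DIRECTION) and 8 (_MM_FROUND_NO_EXC). The names AssertSae, cvt_stub, sae_check_accepts and rejected_rounding_values are hypothetical, and the constants are redefined locally with their immintrin.h values so the example builds on any target.

```rust
// Rounding-control constants, redefined locally (same values as immintrin.h)
// so this sketch does not depend on core::arch or a nightly toolchain.
const _MM_FROUND_TO_NEAREST_INT: i32 = 0x00;
const _MM_FROUND_TO_NEG_INF: i32 = 0x01;
const _MM_FROUND_TO_POS_INF: i32 = 0x02;
const _MM_FROUND_TO_ZERO: i32 = 0x03;
const _MM_FROUND_CUR_DIRECTION: i32 = 0x04;
const _MM_FROUND_NO_EXC: i32 = 0x08;

/// ASSUMPTION: models the linked check as "4 or 8 only".
const fn sae_check_accepts(imm: i32) -> bool {
    imm == _MM_FROUND_CUR_DIRECTION || imm == _MM_FROUND_NO_EXC
}

/// Hypothetical helper: evaluating `OK` panics at compile time
/// when the immediate is rejected.
struct AssertSae<const IMM: i32>;

impl<const IMM: i32> AssertSae<IMM> {
    const OK: () = assert!(sae_check_accepts(IMM), "Invalid IMM value");
}

/// Hypothetical stand-in for an intrinsic with a const generic immediate.
fn cvt_stub<const SAE: i32>(x: f32) -> f32 {
    // Using the associated constant forces the check for this instantiation.
    let () = AssertSae::<SAE>::OK;
    x
}

/// Every rounding selector, with and without _MM_FROUND_NO_EXC,
/// that the assumed check would reject.
fn rejected_rounding_values() -> Vec<i32> {
    let modes = [
        _MM_FROUND_TO_NEAREST_INT,
        _MM_FROUND_TO_NEG_INF,
        _MM_FROUND_TO_POS_INF,
        _MM_FROUND_TO_ZERO,
        _MM_FROUND_CUR_DIRECTION,
    ];
    let mut rejected = Vec::new();
    for no_exc in [0, _MM_FROUND_NO_EXC] {
        for mode in modes {
            let imm = mode | no_exc;
            if !sae_check_accepts(imm) {
                rejected.push(imm);
            }
        }
    }
    rejected
}

fn main() {
    // Accepted by the assumed check, so this instantiation compiles.
    println!("{}", cvt_stub::<_MM_FROUND_NO_EXC>(1.5));
    // Uncommenting the next line fails the build with "Invalid IMM value":
    // cvt_stub::<_MM_FROUND_TO_NEAREST_INT>(1.5);
    println!("{:?}", rejected_rounding_values());
}
```

Under that assumption, eight of the ten combinations are rejected, including _MM_FROUND_TO_NEAREST_INT (0), which matches the `::<0>` in the error above.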

As an added note, some docs seem to be inconsistent with the Intel® Intrinsics Guide:
The rustdoc for _mm512_cvtps_ph only mentions _MM_FROUND_NO_EXC, while the Guide lists many flags.
The rustdoc for _mm_maskz_cvt_roundps_ph mentions flags combined with _MM_FROUND_NO_EXC, while the Guide does not.

Meta

rustc --version --verbose:

rustc 1.88.0-nightly (b4c8b0c3f 2025-04-25)
binary: rustc
commit-hash: b4c8b0c3f0533bb342a4873ff59bdad3883ab8e3
commit-date: 2025-04-25
host: x86_64-unknown-linux-gnu
release: 1.88.0-nightly
LLVM version: 20.1.2

@TechPizzaDev TechPizzaDev added the C-bug Category: This is a bug. label Apr 26, 2025
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Apr 26, 2025
@usamoi
Contributor

usamoi commented Apr 28, 2025

Intel® Intrinsics Guide 3.3 documentation for `_mm512_cvt_roundps_ph`
__m256i _mm512_cvt_roundps_ph (__m512 a, int sae)
Synopsis
__m256i _mm512_cvt_roundps_ph (__m512 a, int sae)
#include <immintrin.h>
Instruction: vcvtps2ph ymm, zmm {sae}
CPUID Flags: AVX512F
Description
Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
Operation
FOR j := 0 to 15
	i := 16*j
	l := 32*j
	dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
ENDFOR
dst[MAX:256] := 0
Intel® Intrinsics Guide 3.4 documentation for `_mm512_cvt_roundps_ph`
__m256i _mm512_cvt_roundps_ph (__m512 a, int rounding)
Synopsis
__m256i _mm512_cvt_roundps_ph (__m512 a, int rounding)
#include <immintrin.h>
Instruction: vcvtps2ph ymm, zmm {sae}, imm8
CPUID Flags: AVX512F
Description
Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
Rounding is done according to the rounding[3:0] parameter, which can be one of:
    (_MM_FROUND_TO_NEAREST_INT) // round to nearest
    (_MM_FROUND_TO_NEG_INF)     // round down
    (_MM_FROUND_TO_POS_INF)     // round up
    (_MM_FROUND_TO_ZERO)        // truncate
    _MM_FROUND_CUR_DIRECTION    // use MXCSR.RC; see _MM_SET_ROUNDING_MODE
    (_MM_FROUND_TO_NEAREST_INT |_MM_FROUND_NO_EXC) // round to nearest, and suppress exceptions
    (_MM_FROUND_TO_NEG_INF |_MM_FROUND_NO_EXC)     // round down, and suppress exceptions
    (_MM_FROUND_TO_POS_INF |_MM_FROUND_NO_EXC)     // round up, and suppress exceptions
    (_MM_FROUND_TO_ZERO |_MM_FROUND_NO_EXC)        // truncate, and suppress exceptions
    (_MM_FROUND_CUR_DIRECTION |_MM_FROUND_NO_EXC)  // use MXCSR.RC and suppress exceptions; see _MM_SET_ROUNDING_MODE
Operation
FOR j := 0 to 15
	i := 16*j
	l := 32*j
	dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
ENDFOR
dst[MAX:256] := 0

I think it's because these intrinsics were written before the release of Intel® Intrinsics Guide 3.4.
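
As a rough illustration of the 3.4 listing above, the sketch below decodes a rounding immediate into a rounding selector plus a "suppress exceptions" bit. The `Rounding` enum and `decode_rounding` function are hypothetical names for this example only, not part of stdarch, and the constant is redefined locally with its immintrin.h value.

```rust
// Local copy of the exception-suppression bit (same value as immintrin.h).
const _MM_FROUND_NO_EXC: i32 = 0x08;

/// Hypothetical enum naming the five selectors from the Guide 3.4 listing.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Rounding {
    Nearest,          // _MM_FROUND_TO_NEAREST_INT (0)
    Down,             // _MM_FROUND_TO_NEG_INF (1)
    Up,               // _MM_FROUND_TO_POS_INF (2)
    Truncate,         // _MM_FROUND_TO_ZERO (3)
    CurrentDirection, // _MM_FROUND_CUR_DIRECTION (4), i.e. use MXCSR.RC
}

/// Splits `imm` into (selector, suppress_exceptions).
/// Returns None for values outside the ten listed combinations.
fn decode_rounding(imm: i32) -> Option<(Rounding, bool)> {
    // Only bits [3:0] are meaningful here.
    if (imm & !0x0F) != 0 {
        return None;
    }
    let no_exc = (imm & _MM_FROUND_NO_EXC) != 0;
    let mode = match imm & 0x07 {
        0 => Rounding::Nearest,
        1 => Rounding::Down,
        2 => Rounding::Up,
        3 => Rounding::Truncate,
        4 => Rounding::CurrentDirection,
        _ => return None, // 5..=7 are not in the listing
    };
    Some((mode, no_exc))
}

fn main() {
    for imm in 0..16 {
        println!("{:>2} -> {:?}", imm, decode_rounding(imm));
    }
}
```

Under this reading, a check in the 3.3 style that only looks for _MM_FROUND_NO_EXC or _MM_FROUND_CUR_DIRECTION on their own would cover just two of the ten decodable values.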

cc #111137

@lolbinarycat lolbinarycat added F-stdarch_x86_avx512 `#![feature(stdarch_x86_avx512)]` T-libs Relevant to the library team, which will review and decide on the PR/issue. O-x86_64 Target: x86-64 processors (like x86_64-*) (also known as amd64 and x64) labels Apr 28, 2025