Commit Graph

53243 Commits

Author SHA1 Message Date
Andreas Rheinhardt 3144652588 avcodec/x86/lossless_videoencdsp_init: Don't read too often
sub_median_pred_mmxext() calculates a predictor from the left, top
and topleft pixel values. The topleft values need to be initialized
differently for the first loop initialization than for the others
in order to avoid reading ptr[-1]. So it has been initialized before
the loop and then read again at the end of the loop, so that the last
value read was never used. Yet this can lead to reads beyond the end
of the buffer, e.g. with
ffmpeg -cpuflags mmx+mmxext -f lavfi -i "color=size=64x4,format=yuv420p" \
-vf vflip -c:v ffvhuff -pred median -frames 1 -f null -

Fix this by not reading the value at the end of the loop.

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-14 10:16:29 +01:00
Andreas Rheinhardt 2b9aea7756 avcodec/x86/lossless_videoencdsp_init: Don't read from before the buffer
sub_median_pred_mmxext() calculates a predictor from the left, top
and topleft pixel values. The left value is simply read via
ptr[-1], although this is not guaranteed to be inside the buffer
in case of negative strides. This happens e.g. with

ffmpeg -i fate-suite/mpeg2/dvd_single_frame.vob -vf vflip \
       -c:v magicyuv -pred median -f null -

Fix this by reading the first value like the topleft value.
Also change the documentation of sub_median_pred to reflect this
change (and the one from 791b5954bc).

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-14 10:16:25 +01:00
Rémi Denis-Courmont 71db4f3cc1 lavc/llvidencdsp: R-V V sub_median_pred
SpacemiT X60:
sub_median_pred_c:                                  297862.8 ( 1.00x)
sub_median_pred_rvb_b:                              101992.2 ( 2.92x)
sub_median_pred_rvv_i32:                              4820.0 (61.80x)
2025-12-14 10:33:40 +02:00
Rémi Denis-Courmont 87190fff6e lavc/llvidencdsp: R-V B sub_median_pred
SiFive U74:
sub_median_pred_c:                                  238947.3 ( 1.00x)
sub_median_pred_rvb_b:                              106686.9 ( 2.24x)

SpacemiT X60:
sub_median_pred_c:                                  297862.8 ( 1.00x)
sub_median_pred_rvb_b:                              101992.2 ( 2.92x)
2025-12-14 10:33:40 +02:00
Tomasz Szumski 08db850159 avcodec: add JPEG-XS decoder and encoder using libsvtjpegxs
Co-Authored-by: James Almer <jamrial@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-13 19:00:35 -03:00
James Almer 52c097065c avcodec: add a JPEG-XS parser
Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-13 18:45:17 -03:00
Tomasz Szumski 4243e6c870 avcodec/codec_id: add JPEG-XS
Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-13 18:45:17 -03:00
Lynne 9e8e34d475 vulkan_ffv1: remove unused RCT shader files
The 2 files were made redundant when the RCT was merged into encode/decode.
2025-12-13 22:12:26 +01:00
Lynne 5bb9cd23b7 vulkan_dpx: fix GRAY16BE and big-endian marked 8-bit samples 2025-12-13 21:35:56 +01:00
Lynne c3291993eb vulkan_ffv1: use proper rounded divisions for plane width and height
Fixes #20314
2025-12-13 19:12:24 +01:00
Lynne 91deb96d3c vulkan_decode: don't set unnecessary function pointers for FFHWAccel
Invalidate is not used for SDR decoders, since they don't use session
parameters.
2025-12-13 19:12:24 +01:00
Lynne 72e83b42d1 vulkan_decode: clean up decoder initialization
Now that we don't reset on every seek, we can simplify it.
2025-12-13 19:12:24 +01:00
Lynne 018ba6b612 vulkan_decode: do not reset the decoder when flushing
The issue is that .flush gets called asynchronously, and modifies the
video session state while its being used for decoding. This did not
result in issues since all known vendors do not keep important state
there, but its not compliant with the specs.

Its not necessary to flush the decoder at all when seeking,
so simply don't.

Fixes #20487
2025-12-13 19:12:20 +01:00
Andreas Rheinhardt 3da2a21710 avcodec/hq_hqadata: Avoid relocations for HQProfiles
Reviewed-by: Marton Balint <cus@passwd.hu>
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-13 05:57:47 +01:00
Andreas Rheinhardt 2718874724 avcodec/hq_hqadata: Remove padding from tables
Each table needs only tab_w*tab_h*2 entries.

Reviewed-by: Marton Balint <cus@passwd.hu>
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-13 05:55:44 +01:00
Andreas Rheinhardt 0cf187471f avcodec/hq_hqa: Don't rederive value
perm gets incremented in the loop in such a manner that
it already has the value it is set to here except for
the first loop iteration.

Reviewed-by: Marton Balint <cus@passwd.hu>
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-13 05:55:20 +01:00
Ruikai Peng c48b8ebbbb avcodec/vulkan: fix DPX unpack offset
The DPX Vulkan unpack shader computes a word offset as

    uint off = (line_off + pix_off >> 5);

Due to GLSL operator precedence this is evaluated as
line_off + (pix_off >> 5) rather than (line_off + pix_off) >> 5.
Since line_off is in bits while off is a 32-bit word index,
scanlines beyond y=0 use an inflated offset and the shader reads
past the end of the DPX slice buffer.

Parenthesize the expression so that the sum is shifted as intended:

    uint off = (line_off + pix_off) >> 5;

This corrects the unpacked data and removes the CRC mismatch
observed between the software and Vulkan DPX decoders for
mispacked 12-bit DPX samples. The GPU OOB read itself is only
observable indirectly via this corruption since it occurs inside
the shader.

Repro on x86_64 with Vulkan/llvmpipe (531ce713a0):

    ./configure --cc=clang --disable-optimizations --disable-stripping \
        --enable-debug=3 --disable-doc --disable-ffplay \
        --enable-vulkan --enable-libshaderc \
        --enable-hwaccel=dpx_vulkan \
        --extra-cflags='-fsanitize=address -fno-omit-frame-pointer' \
        --extra-ldflags='-fsanitize=address' && make

    VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/lvp_icd.json

PoC: packed 12-bit DPX with the packing flag cleared so the unpack
shader runs (4x64 gbrp12le), e.g. poc12_packed0.dpx.

Software decode:

    ./ffmpeg -v error -i poc12_packed0.dpx -f framecrc -
    -> 0, ..., 1536, 0x26cf81c2

Vulkan hwaccel decode:

    VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/lvp_icd.json \
    ./ffmpeg -v error -init_hw_device vulkan \
        -hwaccel vulkan -hwaccel_output_format vulkan \
        -i poc12_packed0.dpx \
        -vf hwdownload,format=gbrp12le -f framecrc -
    -> 0, ..., 1536, 0x71e10a51

The only difference between the two runs is the Vulkan unpack
shader, and the stable CRC mismatch indicates that it is reading
past the intended DPX slice region.

Regression since: 531ce713a0
Found-by: Pwno
2025-12-12 20:13:16 +00:00
James Almer 9c14527f1a avcodec/vvc/refs: export in-band LCEVC side data in frames
Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-12 15:21:49 -03:00
James Almer 94c491287c avcodec/vvc/sei: parse Registered and Unregistered SEI messages
Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-12 15:21:48 -03:00
James Almer 6dad70507f avcodec/cbs_sei: store a pointer to the start of Registered and Unregistered SEI messages
Required for the following commit, where a parsing function expects the buffer
to include the country code bytes.

Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-12 15:21:48 -03:00
James Almer b6655e9594 avcodec/dpx: make the lack of break in a switch case explicit
Should fix CID 1676036

Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-12 18:18:46 +00:00
Cameron Gutman 0637a28dc0 lavc/vulkan_video: fix leak on CreateVideoSessionKHR failure
Signed-off-by: Cameron Gutman <aicommander@gmail.com>
2025-12-12 12:43:00 +00:00
Cameron Gutman 4e4677bf58 lavc/vulkan_video: fix double-free if ff_vk_decode_init() fails
ff_vk_video_common_init() calls ff_vk_video_common_uninit() on failure
which leaves dangling object handles. Those get freed again when the
destructor of FFVulkanDecodeShared calls ff_vk_video_common_uninit()
again.

Signed-off-by: Cameron Gutman <aicommander@gmail.com>
2025-12-12 12:43:00 +00:00
Andreas Rheinhardt a72e01b4ec avcodec/ppc/vc1dsp_altivec: Don't read too much data
vc1_inv_trans_8x4_altivec() is supposed to process a block
of 8x4 words, yet it read and processed eight lines. This led
to ASAN failures (see [1]) that this commit intends to fix.
It should also lead to performance improvements, but I don't have
real hardware to bench it.

[1]: https://fate.ffmpeg.org/report.cgi?time=20251207214004&slot=ppc64-linux-gcc-14.3-asan

Reviewed-by: Sean McGovern <gseanmcg@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-12 09:44:01 +01:00
James Almer 04df80f973 avcodec/cavs_parser: check return value of init_get_bits8
Fixes Coverity issue CID 1676035

Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-11 20:01:01 -03:00
Rémi Denis-Courmont a4cb6c724b lavc/llvidencdsp: R-V V sub_left_predict
SpacemiT X60:
sub_left_predict_c:                                  51836.0 ( 1.00x)
sub_left_predict_rvv_i32:                             5843.1 ( 8.87x)
2025-12-11 17:24:38 +02:00
Leo Izen 37858dc6bd avcodec/libjxlenc: add EXIF box to output
We already parse the EXIF side data to extract the orientation, so we
should add it to the output file as an EXIF box.

Signed-off-by: Leo Izen <leo.izen@gmail.com>
2025-12-11 05:38:36 -05:00
Leo Izen e349118b4c avcodec/libjxlenc: avoid calling functions inside if statements
It leads to messier, less readable code, and can also lead to bugs.
I prefer this code style.

Signed-off-by: Leo Izen <leo.izen@gmail.com>
2025-12-11 05:38:35 -05:00
Leo Izen 6ec4b3a9cb avcodec/libjxlenc: give display matrix sidedata priority
Before this commit, we ignore the display matrix side data if any EXIF
side data is present, even if that side data contains no orientation
tag. This allows us to calculate the orientation from the display
matrix sidedata first, if present. Ideally the decoder will have
removed the orientation tag upon decoding and attached the data as
display matrix side data instead, so this makes our orientation code
respect this behavior.

Signed-off-by: Leo Izen <leo.izen@gmail.com>
2025-12-11 05:38:33 -05:00
Hyunjun Ko 6726359326 vulkan_vp9: fix subsampling source and show_frame flag 2025-12-10 18:41:20 +00:00
Kacper Michajłow 04a46a2ae4 avcodec/d3d12va_encode_av1: don't ignore return value
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
2025-12-08 21:31:13 +00:00
Kacper Michajłow f4fc14fb38 avcodec/d3d12va_encode_av1: fix size_t format specifier 2025-12-08 21:31:13 +00:00
Kacper Michajłow 5b2bd6f88d avcodec/d3d12va_encode_av1: remove unused variables
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
2025-12-08 21:31:13 +00:00
Kacper Michajłow 1f7182a991 avcodec/libx265: add explicit enum cast to suppress compiler warnings
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
2025-12-08 21:31:13 +00:00
Kacper Michajłow eaa2b3d4be avcodec/libsvtav1: add explicit enum cast to suppress compiler warnings
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
2025-12-08 21:31:13 +00:00
Kacper Michajłow 490af2d4cf avcodec/libaomdec: add explicit enum cast to suppress compiler warnings
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
2025-12-08 21:31:13 +00:00
Andreas Rheinhardt dc843cdd9a avcodec/x86/vp9mc: Reindent after the previous commit
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-08 19:35:07 +01:00
Andreas Rheinhardt 65e71b0837 avcodec/x86/vp9mc: Deduplicate coefficient tables
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-08 19:35:01 +01:00
Andreas Rheinhardt 38e2174ce4 avcodec/x86/vp9mc: Avoid MMX regs in width 4 hor 8tap funcs
Using wider registers (and pshufb) allows to halve the number of
pmaddubsw used. It is also ABI compliant (no more missing emms).

Old benchmarks:
vp9_avg_8tap_smooth_4h_8bpp_c:                          97.6 ( 1.00x)
vp9_avg_8tap_smooth_4h_8bpp_ssse3:                      15.0 ( 6.52x)
vp9_avg_8tap_smooth_4hv_8bpp_c:                        342.9 ( 1.00x)
vp9_avg_8tap_smooth_4hv_8bpp_ssse3:                     54.0 ( 6.35x)
vp9_put_8tap_smooth_4h_8bpp_c:                          94.9 ( 1.00x)
vp9_put_8tap_smooth_4h_8bpp_ssse3:                      14.2 ( 6.67x)
vp9_put_8tap_smooth_4hv_8bpp_c:                        325.9 ( 1.00x)
vp9_put_8tap_smooth_4hv_8bpp_ssse3:                     52.5 ( 6.20x)

New benchmarks:
vp9_avg_8tap_smooth_4h_8bpp_c:                          97.6 ( 1.00x)
vp9_avg_8tap_smooth_4h_8bpp_ssse3:                      10.8 ( 9.08x)
vp9_avg_8tap_smooth_4hv_8bpp_c:                        342.4 ( 1.00x)
vp9_avg_8tap_smooth_4hv_8bpp_ssse3:                     38.8 ( 8.82x)
vp9_put_8tap_smooth_4h_8bpp_c:                          94.7 ( 1.00x)
vp9_put_8tap_smooth_4h_8bpp_ssse3:                       9.7 ( 9.75x)
vp9_put_8tap_smooth_4hv_8bpp_c:                        321.7 ( 1.00x)
vp9_put_8tap_smooth_4hv_8bpp_ssse3:                     37.0 ( 8.69x)

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-08 19:34:35 +01:00
Andreas Rheinhardt dd5dc254ff avcodec/x86/vp9mc: Avoid reloads, MMX regs in width 4 vert 8tap func
Four rows of four bytes fit into one xmm register; therefore
one can arrange the rows as follows (A,B,C: first, second, third etc.
row)

xmm0: ABABABAB BCBCBCBC
xmm1: CDCDCDCD DEDEDEDE
xmm2: EFEFEFEF FGFGFGFG
xmm3: GHGHGHGH HIHIHIHI

and use four pmaddubsw to calculate two rows in parallel. The history
fits into four registers, making this possible even on 32bit systems.

Old benchmarks (Unix 64):
vp9_avg_8tap_smooth_4v_8bpp_c:                         105.5 ( 1.00x)
vp9_avg_8tap_smooth_4v_8bpp_ssse3:                      16.4 ( 6.44x)
vp9_put_8tap_smooth_4v_8bpp_c:                          99.3 ( 1.00x)
vp9_put_8tap_smooth_4v_8bpp_ssse3:                      15.4 ( 6.44x)

New benchmarks (Unix 64):
vp9_avg_8tap_smooth_4v_8bpp_c:                         105.0 ( 1.00x)
vp9_avg_8tap_smooth_4v_8bpp_ssse3:                      11.8 ( 8.90x)
vp9_put_8tap_smooth_4v_8bpp_c:                          99.7 ( 1.00x)
vp9_put_8tap_smooth_4v_8bpp_ssse3:                      10.7 ( 9.30x)

Old benchmarks (x86-32):
vp9_avg_8tap_smooth_4v_8bpp_c:                         138.2 ( 1.00x)
vp9_avg_8tap_smooth_4v_8bpp_ssse3:                      28.0 ( 4.93x)
vp9_put_8tap_smooth_4v_8bpp_c:                         123.6 ( 1.00x)
vp9_put_8tap_smooth_4v_8bpp_ssse3:                      28.0 ( 4.41x)

New benchmarks (x86-32):
vp9_avg_8tap_smooth_4v_8bpp_c:                         139.0 ( 1.00x)
vp9_avg_8tap_smooth_4v_8bpp_ssse3:                      20.1 ( 6.92x)
vp9_put_8tap_smooth_4v_8bpp_c:                         124.5 ( 1.00x)
vp9_put_8tap_smooth_4v_8bpp_ssse3:                      19.9 ( 6.26x)

Loading the constants into registers did not turn out to be advantageous
here (not to mention Win64, where this would necessitate saving
and restoring ever more register); probably because there are only two
loop iterations.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-08 19:31:59 +01:00
Andreas Rheinhardt 36204fbc3c avcodec/vp9itxfm{,_16bpp}: Remove MMXEXT functions overridden by SSSE3
SSSE3 is already quite old (introduced 2006 for Intel, 2011 for AMD),
so that the overwhelming majority of our users (particularly those
that actually update their FFmpeg) will be using the SSSE3 versions.
This commit therefore removes the MMXEXT functions overridden
by them (which don't abide by the ABI) to get closer to a removal
of emms_c.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-08 19:27:51 +01:00
Andreas Rheinhardt ea37f49aed avcodec/vp9intrapred: Remove MMXEXT functions overridden by SSSE3
SSSE3 is already quite old (introduced 2006 for Intel, 2011 for AMD),
so that the overwhelming majority of our users (particularly those
that actually update their FFmpeg) will be using the SSSE3 versions.
This commit therefore removes the MMXEXT functions overridden
by them (which don't abide by the ABI) to get closer to a removal
of emms_c.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-08 19:27:44 +01:00
Andreas Rheinhardt 6e418af810 avcodec/vp9mc: Remove MMXEXT functions overridden by SSSE3
SSSE3 is already quite old (introduced 2006 for Intel, 2011 for AMD),
so that the overwhelming majority of our users (particularly those
that actually update their FFmpeg) will be using the SSSE3 versions.
This commit therefore removes the MMXEXT functions overridden
by them (which don't abide by the ABI) to get closer to a removal
of emms_c.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-12-08 19:27:05 +01:00
Kacper Michajłow 5b5d51cbc1 avcodec/x86/h264_idct: fix version check for NASM 3 and newer
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
2025-12-08 17:43:29 +00:00
Oliver Chang 9849a274df avcodec/dpx: Fix heap-buffer-overflow in 16-bit decoding
Fixes a heap-buffer-overflow in `libavcodec/dpx.c` triggered by a stale
`unpadded_10bit` flag in the `DPXDecContext`. This flag, set for 10-bit
unpadded frames, persisted across `decode_frame` calls. If a subsequent
frame was 16-bit, the stale flag caused incorrect buffer size
validation, allowing truncated buffers to pass checks designed for
smaller 10-bit packed data. This led to an out-of-bounds read in
`av_image_copy_plane` during 16-bit decoding.

The fix explicitly resets `dpx->unpadded_10bit = 0` at the start of
`decode_frame` to ensure correct validation for each frame.

Fixes: https://issues.oss-fuzz.com/issues/464471792
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: out of array read
Fixes: 464471792/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DPX_DEC_fuzzer-5275522210004992
2025-12-07 19:41:02 +00:00
Rémi Denis-Courmont 10ea5f8b99 lavc/h264idct: R-V V 9-bit h264_luma_dc_dequant_idct
Note that, like the C reference, the same function can be used for
larger bit depths.
2025-12-07 20:27:35 +02:00
Rémi Denis-Courmont d69a36a8d1 lavc/h264idct: R-V V 8-bit h264_luma_dc_dequant_idct
This does not improve performance with current hardware due to the poor
performance of segmented accesses. Performance should be slightly better
with expensive or near-future hardware that I don't have, however it is
still limited by two other factors:
- There are only 4 elements.
- The final stores are necessarily indexed and hit multiple cache lines,
  thus as slow as scalar.
2025-12-07 20:27:35 +02:00
Rémi Denis-Courmont f222eb2b08 lavc/mpv_unquantize: R-V V H.263 DCT unquantize
SpacemiT X60:
dct_unquantize_h263_inter_c:                           417.8 ( 1.00x)
dct_unquantize_h263_inter_rvv_i32:                      66.0 ( 6.33x)
dct_unquantize_h263_intra_c:                           140.2 ( 1.00x)
dct_unquantize_h263_intra_rvv_i32:                      67.7 ( 2.07x)

Note that the C benchmarks are not stable, depending heavily on the
number of coefficients picked by the RNG. The R-V V benchmarks are
however very stable and generally better than C's.
2025-12-07 20:20:38 +02:00
averne c384b1e803 vulkan/prores: use vkCmdClearColorImage
The VK spec forbids using clear commands on YUV images,
so we need to allocate separate per-plane images.
This removes the need for a separate reset shader.
2025-12-07 18:17:36 +00:00
James Almer 00caeba050 avcodec: rename avcodec_receive_frame2 to avcodec_receive_frame_flags
It's a name that communicates its functionality in a better way.
Since the function was introduced very recently, we can safely rename it.

Signed-off-by: James Almer <jamrial@gmail.com>
2025-12-07 12:47:46 -03:00