Fixes: ada-2-poc.mkv
Found-by: Claude and Ada Logics. This issue was found by Anthropic from using agents to study security of open source projects, and I am from Ada Logics helping validate the found issues and report to maintainers.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
And set it also for non-variable frame size encoders.
FATE changes are the result of passing a frame_size to flac and wavenc
encoders, instead of letting them choose one.
Signed-off-by: James Almer <jamrial@gmail.com>
Both works around an issue the following commit reveals (encoding with 4096
frame_size fails on aarch64 for unknown reasons), and tests setting
frame_size now that it's allowed (and ensuring the CLI doesn't overwrite it).
Signed-off-by: James Almer <jamrial@gmail.com>
This is for an upcoming change where the field will become user settable.
Unless a proper check for frame_size is introduced, it's better to just not
allow arbitrary values to be used.
Signed-off-by: James Almer <jamrial@gmail.com>
Fixes: ada-1-poc.mkv
Found-by: Claude and Ada Logics. This issue was found by Anthropic from using agents to study security of open source projects, and I am from Ada Logics helping validate the found issues and report to maintainers.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: ffmpeg_ANT-2026-02842_dirac-mctmp-heap-overflow
Discovered by Claude (Anthropic). Confirmed and reported by Thai Duong (Calif.io).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
You may look and think that the `AVFMT_FLAG_CUSTOM_IO` check is enough, but this
is not what it seems. This flag means that the user provided a custom
AVIOContext before creating the AVFormatContext, and that it should not be
closed. However, nested sub-demuxers may still open a temporary io, and
those have to be closed using the correct io_close2 function.
You can see 0dcac9c3f0 and
ef01061225 where this flag is cleared for
nested opens to avoid leaking those.
lavf micro version bumped so API users can know if it is safe to use
custom io.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
According to Chapter 3, Paragraph 2 of the "SI Brochure - 9th ed./version 3.02":
> Prefix symbols are printed in upright typeface, as are unit symbols,
> regardless of the typeface used in the surrounding text and are
> attached to unit symbols without a space between the prefix symbol
> and the unit symbol.
https://www.bipm.org/documents/20126/41483022/SI-Brochure-9-EN.pdf
Assuming there is no padding between the last packet and the index, this
prevents the index from being parsed as a normal packet with nonsensical
data.
This lets us seek in the video properly, based on the table at the end
of the files, and has been tested with Suikoden Tierkreis videos.
While at it I’ve also set the duration of the stream, this makes the
progress bar work correctly in mpv.
Unfortunately a bit slower than the MMX version due to
the impossibility to use memory operands in paddw.
The situation would reverse if ff_dctB_mmx() had
to issue emms.
dctB_c: 3.7 ( 1.00x)
dctB_mmx: 3.3 ( 1.13x)
dctB_sse2: 3.6 ( 1.03x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Allows the uops macro generation code to not actually compile any passes.
More generally, this could be used to e.g. test if an op list is supported by
a backend without actually creating the passes.
The `bool first` change is needed because the `input == prev` check no longer
works if we don't actually compile any passes.
Signed-off-by: Niklas Haas <git@haasn.dev>
This will be used eventually when I rewrite checkasm/sw_ops to re-use the
code in ops_dispatch.c instead of hand-rolling the execution layer.
Signed-off-by: Niklas Haas <git@haasn.dev>
This function actually lives in ops_dispatch.c, and doesn't really make
sense in ops.h anymore. We should also move some stuff out of ops_internal.h,
which doesn't depend on any external ops stuff, here.
This allows the backend/compilation-related stuff to co-exist more nicely.
Signed-off-by: Niklas Haas <git@haasn.dev>
Using the configured scaler from the SwsContext implicitly. This does affect
the output of libswscale/tests/sws_ops.c, which now prints about 4x as much
data (taking roughly 4x as long, but still within a second on my machine).
We can make this process a lot faster by forcing SWS_SCALE_POINT as the
scaler, which skips calculating any actual filter weights in favor of
generating a trivial 1-tap filter.
Signed-off-by: Niklas Haas <git@haasn.dev>
The only difference here is an extra ff_sws_add_filters() call, which is
a no-op because src w/h = dst w/h = 16.
Signed-off-by: Niklas Haas <git@haasn.dev>
This no longer accesses prev/next as a result of the `unused` removal, so
the signature can be simplified to just take the op directly.
Signed-off-by: Niklas Haas <git@haasn.dev>
We have other op types that skip checking the data even in non-flexible mode,
so there is a precedent for just leaving away `flexible` for such kernels.
Signed-off-by: Niklas Haas <git@haasn.dev>
Mirroring the precedent established by the other SwsOp-generating functions.
This allows us to re-use it for the uops macro generator.
Signed-off-by: Niklas Haas <git@haasn.dev>
The fix from 5fa2a65c11 introduced a regression for non-native-endian
formats (such as rgb565be on a little-endian system).
Reproducible with:
$ ./libswscale/tests/swscale -unscaled 1 -src rgb565be -dst rgb24
Also:
$ ./ffmpeg_g -i /opt/samples/jpegls/128.jls -vf "scale=size=512x512,format=rgb24,scale=flags=neighbor,format=rgb565be" -f rawvideo -vframes 1 -y rgb565be.raw
$ magick -size 512x512 -endian MSB RGB565:rgb565be.raw output.png
$ ./ffplay_g output.png
(note: don't use ffmpeg to convert from rgb565be.raw to output for the
test above since it will perform the same bug and cancel out the error)
When running with "-v 0", the test parameters were not being printed,
which made it hard to track down which conversion the error referred
to.
Now the test parameters are logged with av_log() when a loss error
happens.
The -p, -flags, and -unscaled options all affected the decision to
select a subsample of the tests to run. When specifying -p 0.1, about
57% of the tests would run instead of the expected 10%.
This commit fixes this by separating -p from -flags and -unscaled.
When audio and video streams have different time bases (e.g. video at
1/90000 and audio at 1/48000), vs->start_pts was stored as a raw PTS
from whichever stream's packet arrived first. The segment split
comparison then subtracted this value from the current packet's PTS
without accounting for the time base difference, producing incorrect
elapsed time calculations.
This caused segments to be split at wrong points — either too
frequently (on every keyframe) or not at all, depending on the
relative magnitudes of the time bases.
Fix by normalizing vs->start_pts to AV_TIME_BASE_Q at the point of
assignment and converting pkt->pts to the same base before comparison.
This ensures the segment split decision is always unit-consistent
regardless of which stream's packet is being evaluated.
The bug is most easily triggered by HLS muxing with video passthrough
and audio transcode, where the video retains its container time base
while the audio encoder outputs in its native time base.
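
A minimal sketch of the unit problem, in plain C without libavutil. `TimeBase`, `pts_to_us()` and `segment_due()` are hypothetical names for illustration and not the hlsenc code; microseconds are used because AV_TIME_BASE_Q is 1/1000000.

```c
#include <stdint.h>

/* Hypothetical stand-in for a rational time base (seconds per tick = num/den). */
typedef struct TimeBase { int num, den; } TimeBase;

/* Convert a timestamp to microseconds. Naive: truncates and does not guard
 * against int64 overflow for very large timestamps. */
static int64_t pts_to_us(int64_t pts, TimeBase tb)
{
    return pts * tb.num * 1000000 / tb.den;
}

/* Return 1 once at least target_us have elapsed since start_us.
 * Both sides of the subtraction are in microseconds. */
static int segment_due(int64_t start_us, int64_t pkt_pts, TimeBase pkt_tb,
                       int64_t target_us)
{
    return pts_to_us(pkt_pts, pkt_tb) - start_us >= target_us;
}
```

With a video start at PTS 0 (1/90000) and an audio packet at PTS 96000 (1/48000), the normalized difference is 2 seconds. Subtracting the raw values and reading the result in the video time base would give about 1.07 seconds.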
Signed-off-by: Ben Kepner <u6bkep@gmail.com>
Unlike other decoders or encoders, prores_raw only has a single
Vulkan format to worry about.
This is a 20% speedup on AMD, since AMD apparently has optimizations
for this.
When ff_filter_alloc fails after the name has been allocated (via
av_strdup), the error handling code frees inputs and input_pads but
misses freeing ret->name, causing a memory leak.
Add av_freep(&ret->name) in the error path before freeing inputs.
When s is NULL in av_dynamic_hdr_smpte2094_app5_from_t35, that's not an
allocation error but just invalid API usage. If there is any allocation
failure beforehand that would lead to this, the caller has to check it,
like is already done in all usages of this function in FFmpeg itself.
Only set initial_padding when vorbis_analysis_blockout succeeds,
this avoids passing uninitialized data/garbage pointer to
av_vorbis_parse_frame.
Fix Coverity Issue 1681345
The pktpos denotes the position in the output packet buffer, while
buf_ptr is the position in the input buffer. As this payload is ignored,
nothing is written to the output packet so increasing the pktpos does
not make sense here, instead the buf_ptr has to be increased to advance
the input buffer to the correct position after this OBU.
This incorrect increment here could result in pktpos exceeding the whole
size of the output packet and the later call to memcpy to write to that
buffer would start its write way past the end of the packet buffer.
Fix #22812
Reported-By: fre3dm4n
Fix using enc_caps.supportedEncodeFeedbackFlags before
calling vkGetPhysicalDeviceVideoCapabilitiesKHR().
Otherwise the check will never pass and will fail with ENOTSUP.
Fixes 3f9e04b
Signed-off-by: nyanmisaka <nst799610810@gmail.com>
In the call to mkv_write_blockadditional, use the correct
buffer for smpte2094_app5.
Commit 38df985fba updated the
buffer usage to prevent incorrect buffer reuse, but left this line
unchanged inadvertently.
Signed-off-by: Vignesh Venkat <vigneshv@google.com>
If incoming packets contain Smpte2094App5 metadata, retain them
so that they are passed through to the output.
Signed-off-by: Vignesh Venkat <vigneshv@google.com>
box() and triangle() have well-defined, trivially verifiable numerical
inverses.
We could actually pre-compute and hard-code the numerical inverse of all
non-parametric kernels, but I'm a bit reluctant to do this as I have plans to
adjust the value of SWS_MAX_REDUCE_CUTOFF based on the desired bit depth of the
output, which makes a hard-coding approach unfeasible.
(It would also be a brittle solution that may break whenever we extend the
scaler configuration API, as well as making it harder to add new filters)
Signed-off-by: Niklas Haas <git@haasn.dev>
Some WebRTC servers such as Pion send STUN packets concurrently during
the DTLS handshake. Unlike OpenSSL and GnuTLS which filter non-DTLS
packets internally, mbedtls passes all received UDP packets directly to
its DTLS state machine, causing the handshake to fail.
Fix this by using ff_is_dtls_packet() in mbedtls_recv to discard
non-DTLS packets such as STUN by returning WANT_READ, as specified
by RFC 5764 Section 5.1.2.
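
For reference, the first-byte rule of RFC 5764 Section 5.1.2 can be sketched like this. `looks_like_dtls()` is a hypothetical helper, not ff_is_dtls_packet(), which may check more than the first byte.

```c
#include <stddef.h>
#include <stdint.h>

/* First-byte demultiplexing rule from RFC 5764 section 5.1.2:
 * 0..1 is STUN, 20..63 is DTLS, 128..191 is RTP/RTCP. */
static int looks_like_dtls(const uint8_t *buf, size_t size)
{
    return size > 0 && buf[0] >= 20 && buf[0] <= 63;
}
```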
Signed-off-by: Nariman-Sayed <narimansayed28@gmail.com>
Move the DTLS packet detection logic from whip.c into a shared
ff_is_dtls_packet() function in tls.c, with its declaration and
related macros in tls.h. Update whip.c to use the new shared function.
Signed-off-by: Nariman-Sayed <narimansayed28@gmail.com>
Both rely on the AVIOContext being backed by the builtin URLContext.
When the API user overrides io_open, the keepalive path asserts on the
missing URLContext and the http_multiple auto-detect probe fails on
every read. http_multiple=1 still works even with custom IO.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
In mov_write_iacb_tag(), the dynamic buffer dyn_bc was leaked when
ff_iamf_write_descriptors() failed.
In mov_write_track_udta_tag(), the dynamic buffer pb_buf was leaked
when mov_write_track_kinds() failed, as the error path returned
directly instead of going through cleanup.
Fix both by ensuring ffio_free_dyn_buf() is called on all error paths.
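
The pattern is roughly the following, shown with malloc/free instead of the dynamic buffer API. `buf_open()`, `buf_free()`, `write_tag()` and the `live_buffers` counter are hypothetical and exist only for this demonstration.

```c
#include <stdlib.h>

static int live_buffers; /* allocations not yet freed, for the demo only */

static void *buf_open(size_t size)
{
    void *p = malloc(size);
    if (p)
        live_buffers++;
    return p;
}

static void buf_free(void **p)
{
    if (*p) {
        free(*p);
        *p = NULL;
        live_buffers--;
    }
}

/* step_result simulates the return code of a helper that may fail
 * after the buffer has been opened. */
static int write_tag(int step_result)
{
    void *buf = buf_open(64);
    int ret;

    if (!buf)
        return -1;
    ret = step_result;
    if (ret < 0)
        goto fail; /* a direct `return ret;` here would leak buf */
    /* ... the buffer contents would be written out here ... */
    ret = 0;
fail:
    buf_free(&buf);
    return ret;
}
```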
Only used as AVBitStreamFilter.codec_ids. This avoids duplicating
it into lavf.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Right now, the cbs_type_table (the table of all CodedBitstreamTypes
supported by CBS) is empty unless cbs_apv or cbs_av1 is enabled.
The latter are only enabled in configure if they are needed in lavc.
This means that the mov muxers (the only users of cbs-in-lavf)
don't work as they should depending upon the availability of
e.g. the av1_metadata BSF. The table being empty is also illegal C
and according to PR #23038 MSVC warns about this (as does GCC
with -pedantic) and it may even lead to an internal compiler error.
This could be fixed by simply adding a mov_muxer->cbs_av1,cbs_apv
dependency in configure, yet this would have the downside that
it would force cbs_av1 and cbs_apv to be built for lavc, too,
even though it may not be needed there. So add new configure
variables cbs_{apv,av1}_lavf and cbs_lavf to track this correctly.
Reported-by: xiaozhuai <798047000@qq.com>
Reviewed-by: James Almer <jamrial@gmail.com>
Reviewed-by: xiaozhuai <798047000@qq.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The c37_mv table is 3x 510-entry tables combined. Extend each
with a coordinate pair for index 0xff, which allows eliminating
the index check in the code37/48 block handlers.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
This undoes 556cef27d9, which I added to fix a fuzzer-crash,
but there's no reason to expect the decoder can only be invoked
via the smush demuxer. Instead also accept a range of dimensions
from 2x2 up to 640x480.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
For the diff-buffer codecs, return error for frames that are larger
than the currently configured canvas. This mimics the behaviour
of the DOS smush engines.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Codec>=37 with smaller dimensions can be embedded onto larger canvasses;
it makes no sense to trust their dimensions explicitly.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
generate_missing_ref walked frame->f->data[] until a NULL slot, which
on alpha-video frames extended to data[3] and read
sps->hshift[3]/vshift[3] out of bounds.
The alpha plane is produced by the alpha layer via
replace_alpha_plane; the base decoder path never reads or writes it.
Bound the fill loop by the SPS coded plane count. This both removes
the out-of-bounds shift access and avoids an unnecessary full-frame
memset of the alpha plane.
Fixes: out of array read
Fixes: 500770604/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HEVC_fuzzer-6157374833623040
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
In the picture header parser, the chroma branch incorrectly tested
sps_max_mtt_hierarchy_depth_intra_slice_chroma to decide whether to
parse ph_log2_diff_max_{bt,tt}_min_qt_intra_slice_chroma.
Per ITU-T H.266 (V4, 01/2026) section 7.3.2.8 "Picture header
structure syntax", the condition is on the just-parsed
ph_max_mtt_hierarchy_depth_intra_slice_chroma, exactly mirroring the
luma branch a few lines above and the inter-slice branch below.
sps_partition_constraints_override_enabled_flag allows the picture
header to override the SPS values, so testing the SPS field is
incorrect and desynchronises the parser whenever the PH override
changes the chroma MTT depth from/to zero.
Signed-off-by: Mou Fan <moufan17@126.com>
This fixes overreads with libdav1d, because it provides
non-padded data in violation of the requirements of
the GetBits API.
Furthermore, using the GetBits API here is wasteful,
as the offsets here are known and the actual data to be copied
is even byte-aligned, allowing the use of memcpy.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The potentially negative return value of av_index_search_timestamp()
wasn't being handled before passing it to can_seek_to_key_sample().
Found by Wongi Lee (@_qwerty_po) of Theori with Xint Code,
Jungwoo Lee (@physicube).
Signed-off-by: Dale Curtis <dalecurtis@chromium.org>
Fixes a -Wimplicit-fallthrough warning from Clang;
GCC does not warn about this.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
av_program_add_stream_index() added in 526efa1053
may fail to carry out its purpose but the lack of
a return value stops callers from catching any error.
Fixed in new function.
When options is NULL, ffurl_connect() creates a temporary dictionary
(tmp_opts). If the protocol_blacklist av_dict_set() fails after the
whitelist entry was inserted, the function returns without freeing
this dictionary.
Ensure tmp_opts is freed on this error path.
Signed-off-by: Huihui_Huang <hhhuang@smu.edu.sg>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The w variable counts pixels, not bytes. The non-RLE path correctly
uses w-- (one pixel = 4 bytes), but the RLE path uses w -= 4, causing
the loop to terminate after roughly 1/4 of the expected pixels.
The w -= 4 was introduced in 14e99cb472 which moved the decrement
inside the loop to fix an OOB write (clusterfuzz-5423041009549312).
The move was correct, but the decrement value should have been 1 to
match the non-RLE path.
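
A simplified sketch of the unit distinction. `fill_run()` is a hypothetical helper and not the decoder's actual loop.

```c
#include <stdint.h>
#include <string.h>

/* Write up to `run` copies of a 4-byte pixel into dst, never exceeding the
 * `w` pixels left in the row. Returns the number of pixels written. */
static int fill_run(uint8_t *dst, int w, int run, const uint8_t px[4])
{
    int written = 0;

    while (run > 0 && w > 0) {
        memcpy(dst, px, 4);
        dst += 4; /* bytes */
        w--;      /* pixels: one per iteration, not four */
        run--;
        written++;
    }
    return written;
}
```

Decrementing `w` by 4 in this loop would stop after about a quarter of the row, which matches the symptom described above.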
Signed-off-by: João Neves <joaocns0@protonmail.com>
rle_uncompress() silently discards the return value of rle(). When the
compressed data is malformed and rle() returns AVERROR_INVALIDDATA,
processing continues on a partially filled buffer. Propagate the error
to the caller, which already handles it at line 1420.
Signed-off-by: João Neves <joaocns0@protonmail.com>
Given that no standalone decoder will be present, use a parser to get stream
information that's not reported by the container.
Signed-off-by: James Almer <jamrial@gmail.com>
Note that the spec actually disallows the 0 case too, but we are
a little lenient here so the full 24-bit two's-complement range can be handled.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Obviously no one should pass negative values, as they make no sense, but it is
better to check explicitly.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Reject INFO list chunks that are too small to contain the expected
4-byte list type field before calling ff_read_riff_info().
The parser subtracts 4 from the list size when handing the remaining
payload to ff_read_riff_info(). If the chunk is smaller than 4 bytes,
that underflows the expected structure and should be treated as invalid
input.
Fixes: DFVULN-607
*Vulnerability reported by Zhenpeng (Leo) Lin at depthfirst*
*Patch validated by Zheng Yu at depthfirst*
Fixes: out of array access
Fixes: poc_matroska.mkv
This issue requires manually increasing the malloc limit
(-max_alloc 4294967296)
Found-by: Guanni Qu <qguanni@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
When an SPS uses the multi-layer extension (nuh_layer_id > 0 with
sps_max_sub_layers_minus1 == 7), width and height are taken from the
VPS rep_format without the av_image_check_size() validation that the
direct path performs. HEVC F.7.4.3.1.1 requires rep_format pic
dimensions to satisfy the constraints in 7.4.3.2.1, including
"pic_width_in_luma_samples shall not be equal to 0".
Run the same av_image_check_size() check in the multi-layer-extension
path so the SPS is rejected before it reaches setup_pps().
Fixes: VS-FF-2026-0003/poc.flv
Fixes: out of array access
Found-by: Vuln Seeker Cyber Security Team
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Add a per-block bounds check at the start of each XOR block so the
read is rejected before src crosses decomp_len, and propagate the
error from decode_frame().
Fixes: out of array read
Found-by: Seung Min Shin
Use a separate scratch buffer (s->mv_scratch) for the type-0 pixel
copy so s->delta and mc are not disturbed for the lifetime of
decode_move(). The new buffer is freed in decode_close().
Found-by: Seung Min Shin
Patch based on suggested fix by Seung Min Shin
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Reject AU header sections with a signaled length of zero in
rtp_parse_mp4_au().
The AU-headers-length field specifies the length in bits of the AU header
section that immediately follows. A zero-length section is not useful input
for this parser and can lead to invalid downstream state, so reject it
up front together with oversized values.
*Vulnerability reported by Zhenpeng (Leo) Lin at depthfirst*
*Patch validated by Zheng Yu at depthfirst*
Fixes: OOB read
Negative -map processing iterates previously parsed stream map entries
and dereferences input_files[m->file_index]->ctx->streams[m->stream_index]
without validating that stream_index is in range.
A malformed earlier map can leave m->stream_index negative, which causes
an out-of-bounds read when a later negative map walks existing entries.
Check that stream_index is non-negative and below nb_streams before
calling stream_specifier_match().
*Vulnerability reported by Zhenpeng (Leo) Lin at depthfirst*
*Patch validated by Zheng Yu at depthfirst*
Fixes: DFVULN-695
Reject truncated compressed SWF input before attempting to read the
8-byte header in rtmp_calc_swfhash().
Compressed SWF data identified by the "CWS" signature must be at least
8 bytes long to contain the fixed header. Bail out early when the input
is shorter to avoid operating on malformed data.
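
A minimal sketch of such a check. `check_cws()` and `SWF_HEADER_SIZE` are hypothetical names; the 8 bytes are the 3-byte signature, 1-byte version and 4-byte length of the SWF header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SWF_HEADER_SIZE 8 /* signature (3) + version (1) + length (4) */

/* Return 1 if buf is a compressed SWF whose fixed header is fully present,
 * 0 if it does not carry the "CWS" signature, -1 if it does but is truncated. */
static int check_cws(const uint8_t *buf, size_t size)
{
    if (size < 3 || memcmp(buf, "CWS", 3))
        return 0;
    if (size < SWF_HEADER_SIZE)
        return -1;
    return 1;
}
```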
*Vulnerability reported by Zhenpeng (Leo) Lin at depthfirst*
*Patch validated by Zheng Yu at depthfirst*
Fixes: DFVULN-612
Guard against empty string before reading the last byte in control_url.
When parsing relative a=control: paths, if no base control URL was set,
the code would access control_url[strlen(control_url)-1] which on an
empty string causes a size_t underflow and out-of-bounds read.
Now compute the length first and check for len == 0 before array access.
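
The guard amounts to something like this (`ends_with_slash()` is a hypothetical helper, not the code in the RTSP parser):

```c
#include <string.h>

/* Return 1 if s is non-empty and ends with '/'. */
static int ends_with_slash(const char *s)
{
    size_t len = strlen(s);

    /* Checking len first avoids s[(size_t)0 - 1], an out-of-bounds read. */
    return len > 0 && s[len - 1] == '/';
}
```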
*Vulnerability reported by Zhenpeng (Leo) Lin at depthfirst*
*Patch validated by Zheng Yu at depthfirst*
Fixes: DFVULN-611
latm_parse_packet() accumulated attacker-controlled AU length bytes in
a signed int and later checked data->pos + cur_len against data->len.
That addition could overflow, allowing malformed packets to bypass the
bounds check and drive memcpy() far past the end of the LATM buffer.
Reject length-byte accumulation that would exceed the remaining packet
size, and compare cur_len against the remaining buffer space using
subtraction so the bounds check cannot overflow.
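
The subtraction form of the check can be sketched as below. `au_fits()` is a hypothetical helper whose parameters mirror data->pos, data->len and cur_len; it is not the patched function.

```c
/* Return 1 if cur_len bytes starting at pos fit into a buffer of len bytes. */
static int au_fits(int pos, int len, int cur_len)
{
    if (pos < 0 || pos > len || cur_len < 0)
        return 0;
    /* len - pos cannot overflow here, unlike pos + cur_len. */
    return cur_len <= len - pos;
}
```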
Fixes: DFVULN-610
*Vulnerability reported by Zhenpeng (Leo) Lin at depthfirst*
*Patch validated by Zheng Yu at depthfirst*
This rejects packets whose claimed decompressed frame would require a deflate ratio beyond the format's theoretical 1032:1 limit.
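
A sketch of such a plausibility check, assuming the sizes are compared directly. `deflate_size_plausible()` is a hypothetical helper and the decoder's actual condition may differ.

```c
#include <stdint.h>

#define DEFLATE_MAX_RATIO 1032 /* limit quoted above */

/* Return 1 if `claimed` output bytes could come from `compressed` input
 * bytes without exceeding the maximum deflate expansion ratio. */
static int deflate_size_plausible(uint64_t compressed, uint64_t claimed)
{
    if (compressed > UINT64_MAX / DEFLATE_MAX_RATIO)
        return 1; /* the limit exceeds UINT64_MAX, so any claim is within it */
    return claimed <= compressed * DEFLATE_MAX_RATIO;
}
```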
Fixes: Timeout
Fixes: 474457186/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_PDV_fuzzer-5366108782919680
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: out of array access
Found-by: Seung Min Shin
Patch based on suggested fix by Seung Min Shin
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
av_index_search_timestamp() returns a negative value when a seek target
cannot be resolved from the stream index. Bail out before using that
result as an index into sti->index_entries to avoid out-of-bounds reads.
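
In outline, with a hypothetical `IndexEntry` type and `entry_at()` helper standing in for the real index structures:

```c
#include <stddef.h>

typedef struct IndexEntry { long long timestamp; } IndexEntry;

/* Return the entry for idx, or NULL when idx is a negative error code
 * or lies beyond the end of the index. */
static const IndexEntry *entry_at(const IndexEntry *entries, int nb_entries,
                                  int idx)
{
    if (idx < 0 || idx >= nb_entries)
        return NULL;
    return &entries[idx];
}
```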
Fixes: Buffer underflow
Fixes: DFVULN-608
*Vulnerability reported by Zhenpeng (Leo) Lin at depthfirst*
*Patch validated by Zheng Yu at depthfirst*
Technically it is UB to call a function through a pointer of a different type.
Fixes:
src/libavformat/utils.c:531:9: runtime error: call to function handle_variant_args through pointer to incorrect function type 'void (*)(void *, const char *, int, char **, int *)'
src/libavformat/hls.c:379: note: handle_variant_args defined here
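
One way to keep the types matching is to give the callback the exact pointer type and convert the context inside it. The sketch below is loosely modeled on the names in the log above; `parse_key_cb` and `lookup_key()` are hypothetical and this is not the hls.c code.

```c
#include <string.h>

/* Callback type expected by the (hypothetical) generic parser. */
typedef void (*parse_key_cb)(void *ctx, const char *key, int key_len,
                             char **dest, int *dest_len);

struct variant_info { char bandwidth[20]; };

/* The callback has exactly the type parse_key_cb and converts ctx inside,
 * instead of taking `struct variant_info *` and casting the function. */
static void handle_variant_args(void *ctx, const char *key, int key_len,
                                char **dest, int *dest_len)
{
    struct variant_info *info = ctx;

    if (key_len == 10 && !strncmp(key, "BANDWIDTH=", key_len)) {
        *dest     = info->bandwidth;
        *dest_len = sizeof(info->bandwidth);
    }
}

/* Minimal caller: invokes the callback through the typed pointer and
 * reports whether the key was recognized. */
static int lookup_key(parse_key_cb cb, void *ctx, const char *key,
                      char **dest, int *dest_len)
{
    *dest     = NULL;
    *dest_len = 0;
    cb(ctx, key, (int)strlen(key), dest, dest_len);
    return *dest != NULL;
}
```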
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
$(FFMPEG) expands to "ffmpeg.exe" on Windows/MSYS2, and the bare
$(FFMPEG) call falls through to PATH lookup, picking up an externally
installed ffmpeg instead of the freshly built binary in $target_path.
That stale binary lacked the rejection added in a45fe72c9d, causing
msys2-clang64/clangarm64/ucrt64 slots to silently produce 250x2
instead of failing at 500x0.
Wrap with fate-run.sh's run() so $target_exec and $target_path are
resolved correctly on all platforms, matching the convention used by
e.g. fate-id3v2-invalid-tags, and avoiding the ffmpeg() helper's
unrelated default flags.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
These should say frame->linesize[0] as it does everywhere else this
variable is referenced. Fixes a typo bug.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
We need to check that entry->count is nonzero and that entry->type is
AV_TIFF_SHORT before reading from the buffer, in case a maliciously
constructed IFD uses a zero-count or an unusual type (e.g. IFD) for it.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
If the displaymatrix is present, we should synthesize EXIF data from
the values there even if there is no EXIF attached to the frame.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
If we free ifd->entries then we need to set ifd->size to 0 so another
call to av_fast_realloc doesn't get confused.
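
Why the stale size matters can be seen with a simplified stand-in. `fast_realloc()` below only imitates the early-return behaviour of av_fast_realloc (no growth padding, no size limits), and `Ifd` and `ifd_clear()` are hypothetical.

```c
#include <stdlib.h>

/* Simplified stand-in: the buffer is only reallocated when the recorded
 * capacity *size is too small. */
static void *fast_realloc(void *ptr, unsigned *size, size_t min_size)
{
    if (min_size <= *size)
        return ptr; /* trusts *size to describe ptr */
    ptr = realloc(ptr, min_size);
    *size = ptr ? (unsigned)min_size : 0;
    return ptr;
}

typedef struct Ifd { int *entries; unsigned size; } Ifd;

static void ifd_clear(Ifd *ifd, int reset_size)
{
    free(ifd->entries);
    ifd->entries = NULL;
    if (reset_size)
        ifd->size = 0; /* keeps capacity and pointer consistent */
}
```

If the size is left untouched after the free, the next fast_realloc() request that fits the old capacity returns the NULL pointer unchanged, even though no allocation failed.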
Signed-off-by: Leo Izen <leo.izen@gmail.com>
If av_exif_matrix_to_orientation returns 0, then the display matrix
is singular. In this case we should treat it as 1 and print a warning.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
When determining if we need to rewrite the exif buffer or can pass
through as-is, account for a difference in header_mode requested from
the one that is used internally.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
The max width and height for PIXEL_X_TAG and PIXEL_Y_TAG is 0xFFFFu
because these are unsigned shorts, but we used < instead of <=
erroneously. Fix that.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
In av_exif_set_entry, if cloning the entry fails because an allocation
failed, then we remove the entry from the IFD. If that entry is
in the middle of ifd->entries, we need to shift everything after it to the
left, which this commit implements.
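
The shift itself is the usual memmove idiom, sketched here on a plain int array. `remove_entry()` is a hypothetical helper, not the EXIF code.

```c
#include <stddef.h>
#include <string.h>

/* Remove entries[idx] from an array of *count ints, shifting the tail left. */
static void remove_entry(int *entries, size_t *count, size_t idx)
{
    if (idx >= *count)
        return;
    /* memmove handles the overlapping source and destination ranges. */
    memmove(&entries[idx], &entries[idx + 1],
            (*count - idx - 1) * sizeof(*entries));
    (*count)--;
}
```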
Signed-off-by: Leo Izen <leo.izen@gmail.com>
Add additional documentation to av_exif_get_entry and also to
av_exif_set_entry that was already part of the existing ABI but was
insufficiently documented before this commit. Also clarifies that
av_fast_realloc is used, instead of av_realloc on av_exif_set_entry.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
This is aligned forward by an extra space, because it inherited the
incorrect alignment from the EXIF declaration above it (fixed in the
previous commit).
Signed-off-by: Leo Izen <leo.izen@gmail.com>
This commit re-aligns the declaration by removing extra whitespace
and fixes the comment above to have the correct acronym. It also
documents what the four magic bytes indicate.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
The 16-bit kernel is dispatched for every non-8-bit pixel format
(9/10/12/16-bit content, all stored in uint16_t). It's supposed to
undo the Q16 scaling that set_filter_param() applies to `amount`:
fp->amount = amount * 65536.0;
but the shift written in the kernel is `>> (8+nbits)`, which for the
nbits=16 instantiation of the macro comes out to `>> 24` instead of
`>> 16`. Because of this, on any non-8-bit input, unsharp applies ~1/256
of the user's requested strength and is effectively a no-op. The
8-bit kernel (nbits=8) happens to be correct because 8+8 == 16.
This commit also widens the intermediate product to int64 before the
shift, to avoid a potential overflow. Take a 16-bit pixel at the
edge of a sharp white/black region, with the user-facing `amount`
set to its declared maximum of 5.0.
*srx = 65535
blur = 32768
diff = *srx - blur = 32767
amount_q16 = 5.0 * 65536 = 327680
Then the kernel computes:
product = diff * amount_q16
= 32767 * 327680 = 10,737,090,560 (~1.07e10)
which overflows INT32_MAX. Widening to int64 keeps the
multiplication in range; the subsequent `>> 16` brings it back to
sample range and the final cast to int32 is then safe. The widening
is a semantic no-op for 8/9/10/12-bit content where the product
always fits in int32 (worst case at 12-bit: 4095 * 327680 ~ 1.34e9).
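
The arithmetic above can be checked with a small standalone function. `sharpen_term()` is a hypothetical reduction of the kernel to the multiply-and-shift step and is not the filter's code.

```c
#include <stdint.h>

/* Sharpening term for one sample: (src - blur) * amount_q16, scaled back
 * from Q16. The product is formed in 64 bits so it cannot overflow.
 * Note: >> on a negative value is implementation-defined in C; common
 * compilers perform an arithmetic shift. */
static int32_t sharpen_term(int32_t src, int32_t blur, int32_t amount_q16)
{
    int64_t product = (int64_t)(src - blur) * amount_q16;

    return (int32_t)(product >> 16);
}
```

For the worked example, sharpen_term(65535, 32768, 327680) yields 163835, i.e. 32767 * 5. Shifting by 24 instead of 16 would leave only 639.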
Introduced by ee792ebe08 (2019-11-08, "avfilter/vf_unsharp: add 10bit
support"). The fate-filter-unsharp-yuv420p10 reference added in the
same series was generated from the broken kernel and is regenerated
here. fate-filter-unsharp (8-bit) is unaffected.
Repro:
python3 -c "import numpy as np; y=np.tile(np.where(np.arange(128)//8 & 1, 512, 256).astype('<u2'), (128,1)); c=np.full((64,64), 512, '<u2'); open('in.yuv','wb').write(y.tobytes()+c.tobytes()*2)"
ffmpeg -f rawvideo -pix_fmt yuv420p10le -s 128x128 -i in.yuv \
-lavfi "split=2[a][b];[b]unsharp=la=1[bs];[a][bs]psnr" \
-f null - 2>&1 | grep PSNR
Before: `PSNR y:66.50 ...` -- the filter is effectively a no-op,
so the sharpened output matches the input almost exactly.
After: `PSNR y:28.27 ...` -- the filter actually sharpens, so
output and input differ as expected.
Signed-off-by: Nil Fons Miret <nilf@netflix.com>
Made-with: Cursor
Reject dimg entries with a zero reference count in mov_read_iref_dimg().
This is the earliest point where the parser learns how many input images
a derived HEIF item references, so it is the right place to enforce the
invariant.
If entries == 0 is accepted here, the value is stored in HEIFGrid.nb_tiles,
later propagated by read_image_iovl() into AVStreamGroupTileGrid.nb_tiles,
and finally consumed in istg_parse_tile_grid(), which assumes at least one
tile and reads tg->offsets[tg->nb_tiles - 1]. With zero tiles, that
assumption breaks and leads to the out-of-bounds access seen in ASan.
Fixing the problem at the parser boundary is preferable to adding a later
workaround because it prevents creation of an invalid derived-image state
and stops that malformed state from reaching downstream consumers.
This is also consistent with the HEIF specification. Both iovl and grid
derived images are formed from one or more input images, and for grid the
dimg reference count must equal rows * columns; since rows and columns are
encoded as *_minus_one + 1, that count cannot be zero. A zero dimg entry
count is therefore invalid input and should be rejected when parsed.
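
Reduced to its core, the parser-side check looks like this. `validate_dimg_entries()` is a hypothetical helper, not mov_read_iref_dimg() itself.

```c
/* A derived image must reference at least one input image. On failure
 * nothing is stored, so no zero-tile state can reach later stages. */
static int validate_dimg_entries(unsigned entries, unsigned *nb_tiles)
{
    if (entries == 0)
        return -1; /* invalid data */
    *nb_tiles = entries;
    return 0;
}
```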
E.g. rename ff_put_mpeg4_qpel8_h_lowpass_ssse3 to
ff_mpeg4_put_qpel8_h_lowpass_ssse3.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It allows pic 32bit code to call the underlying
assembly functions directly, without loading
the GOT first; this saves 1245B of .text here
(for 32bit pic code).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
If the subpel part of the horizontal component of
the motion vector is 1/4 or 3/4, the MPEG-4 qpel motion compensation
first computes the mc for the corresponding motion vector
with 1/2 horizontal subpel part and then averages this
with the left (for 1/4) or the right (for 3/4) source pixel.
These two stages are currently performed in two different functions,
involving a stack buffer as intermediate.
This means that horizontal prediction for every function with
a 1/4 or 3/4 horizontal subpel mv is more expensive code-size wise
(and also performance-wise) as it involves two calls. Given that
the horizontal lowpass functions are not that long, adding combinations
of h_lowpass+l2 actually reduces binary size: An increase of 1136B
in the asm files is more than offset by size reductions in
the wrappers: 1968B here when not using stack protection,
2256B when using stack protection.
Of course it also improves performance. Old benchmarks:
avg_qpel_pixels_tab[0][1]_ssse3: 106.9 ( 8.69x)
avg_qpel_pixels_tab[0][3]_ssse3: 105.5 ( 8.84x)
avg_qpel_pixels_tab[0][5]_ssse3: 226.9 ( 8.57x)
avg_qpel_pixels_tab[0][7]_ssse3: 231.1 ( 8.38x)
avg_qpel_pixels_tab[0][9]_ssse3: 217.8 ( 9.04x)
avg_qpel_pixels_tab[0][11]_ssse3: 214.9 ( 9.32x)
avg_qpel_pixels_tab[0][13]_ssse3: 227.1 ( 8.48x)
avg_qpel_pixels_tab[0][15]_ssse3: 236.1 ( 8.02x)
New benchmarks:
avg_qpel_pixels_tab[0][1]_ssse3: 96.7 ( 9.65x)
avg_qpel_pixels_tab[0][3]_ssse3: 96.6 ( 9.73x)
avg_qpel_pixels_tab[0][5]_ssse3: 225.8 ( 8.61x)
avg_qpel_pixels_tab[0][7]_ssse3: 228.4 ( 8.51x)
avg_qpel_pixels_tab[0][9]_ssse3: 217.1 ( 9.05x)
avg_qpel_pixels_tab[0][11]_ssse3: 217.8 ( 9.32x)
avg_qpel_pixels_tab[0][13]_ssse3: 227.2 ( 8.54x)
avg_qpel_pixels_tab[0][15]_ssse3: 220.5 ( 8.72x)
Note: The l2 functions are also used for vertical lowpass
functions, yet given that they are much bigger, duplicating
them would lead to massive code size increase.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Also split the big macro into smaller ones for the pure horizontal vs
the pure vertical and the mixed directions.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
An installation of frei0r-plugins is required to run the tests,
which is usually separate from the build headers. Some systems
have it packaged (e.g. apt install frei0r-plugins). An upstream
release extracted to FREI0R_PATH also works.
Signed-off-by: Stefan Breunig <stefan-ffmpeg-devel@breunig.xyz>
Functions like calc_cur_seg_no, calc_min_seg_no, and calc_max_seg_no calculated
the segment number taking into account the first_seq_no.
However, functions like get_segment_start_time_based_on_timeline and
calc_cur_seg_no didn't take first_seq_no into account.
This made dashdec believe that the cur_seq_no was always less than min_seq_no,
logging 'old fragment' and calling calc_cur_seq_no.
In live dash streams with some startNumber, that call to calc_cur_seq_no after
the 'old fragment' log made ffmpeg reposition itself 60 seconds before the
current time whenever the manifest reloaded.
This made ffmpeg skip segments, especially when the manifest reloaded slower
than the segment duration, resulting in a new manifest with more than one new
segment.
Signed-off-by: Nicolas Dato <nicolas.dato@gmail.com>
Some Dash manifests contain Representations within an Adaptation Set that
reference an underlying mp4 context that contains more than the stream it
describes, as is the case of LCEVC enhancements.
Despite the fact open_demux_for_component() loops through all streams in the
underlying context, the rest of the demuxer is written assuming only the
stream described by the corresponding representation will be present, which
results in completely wrong stream index assignments.
Signed-off-by: James Almer <jamrial@gmail.com>
Some of these files aligned instructions to 4/24 columns, while
we commonly indent arm/aarch64 assembly to 8/24 columns.
Some of these files also used a different alignment for the
operands.
When we try to lowercase register names (e.g. Q0 -> q0) we avoid
doing that for parts of the code that are comments, as comments
occasionally contain pseudocode that contains such mentions that
aren't register names, but pseudocode/reference code variables.
See 7ebb6c54eb for more details
about that.
In addition to recognizing comments starting with //, also
recognize /* and @ (which is a comment char in arm assembly, but
not in aarch64).
We do want to limit an attacker's ability to change structures once they are parsed.
So once extradata (or another array) is finished and possibly has been used, we do not
want to allow an attacker to change it.
This reduces the attack surface.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
In mov_read_dops, `size` bytes are allocated for
`st->codecpar->extradata`, but ff_alloc_extradata doesn't memset, so the
contents of that buffer are just old heap data. If `avio_read` reads
fewer bytes than were requested, uninitialized data can still be left in
the extradata buffer, which is operated on by AV_WL16A and AV_WL32A.
I think the best solution here is to just check the read size and ensure
it's filling the extradata buffer in its entirety, or erroring out if
there isn't enough data left.
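A minimal sketch of the proposed check, using a hypothetical in-memory
reader (MemReader, mem_read) as a stand-in for avio_read. None of these
names are FFmpeg API; the point is only that a short read is treated as
an error instead of leaving the tail of the buffer uninitialized.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory reader standing in for an I/O context. */
typedef struct MemReader {
    const unsigned char *data;
    size_t size;
    size_t pos;
} MemReader;

/* Reads up to len bytes; like many read calls, it may return fewer. */
static size_t mem_read(MemReader *r, unsigned char *buf, size_t len)
{
    size_t left = r->size - r->pos;
    size_t n    = len < left ? len : left;

    memcpy(buf, r->data + r->pos, n);
    r->pos += n;
    return n;
}

/* Returns 0 only if the whole buffer was filled, -1 on a short read. */
static int read_exact(MemReader *r, unsigned char *buf, size_t len)
{
    return mem_read(r, buf, len) == len ? 0 : -1;
}
```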
Test av_ts_make_string with NOPTS, zero, positive, negative, and
INT64 boundary values, av_ts2str macro, av_ts_make_time_string2
with various timebases, and av_ts_make_time_string pointer
variant.
Coverage for libavutil/timestamp.c: 0.00% -> 100.00%
Test av_tdrdi_alloc with 1 and 3 displays, and the inline
av_tdrdi_get_display accessor. Verifies that the returned
pointer matches entries_offset + idx * entry_size, tests
write/read-back of display width exponent/mantissa and view ID
fields, and OOM paths via av_max_alloc.
Coverage for libavutil/tdrdi.c: 0.00% -> 100.00%
Test av_dynamic_hdr_vivid_alloc and
av_dynamic_hdr_vivid_create_side_data. Verifies zero defaults,
write/read-back of system_start_code, num_windows, and
color transform params (min/avg/var/max RGB), frame side
data attachment, and OOM paths via av_max_alloc.
Coverage for libavutil/hdr_dynamic_vivid_metadata.c: 0.00% -> 100.00%
Test av_buffer_alloc, av_buffer_allocz, av_buffer_create with
custom free callback, AV_BUFFER_FLAG_READONLY, av_buffer_ref,
av_buffer_is_writable, av_buffer_get_ref_count,
av_buffer_make_writable, av_buffer_realloc (including from NULL),
av_buffer_replace (including with NULL), av_buffer_pool
init/get/uninit cycle, av_buffer_pool_init2 with custom alloc
and pool_free callbacks, av_buffer_pool_buffer_get_opaque, and
OOM paths via av_max_alloc.
Coverage for libavutil/buffer.c: 0.00% -> 90.19%
Remaining uncovered lines are mutex init failures and
secondary allocation failure paths.
rtsp_read_announce() treated any non-zero Content-Length as valid,
including negative values parsed via strtol(). This could send invalid
sizes into allocation, body reads and trailing NUL writes.
Accept only strictly positive SDP body lengths and reject invalid
Content-Length values with AVERROR_INVALIDDATA.
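The validation can be sketched as follows. This is an illustration under
assumptions, not the actual patch: parse_content_length is a hypothetical
helper, the INT_MAX upper bound is an added assumption, and -1 stands in
for AVERROR_INVALIDDATA.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parses a Content-Length value.
 * Returns 0 and stores the length in *len if it is strictly positive
 * and fits in an int; returns -1 otherwise.
 */
static int parse_content_length(const char *value, int *len)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(value, &end, 10);
    if (end == value || errno == ERANGE)
        return -1;              /* no digits, or out of range for long */
    if (v <= 0 || v > INT_MAX)
        return -1;              /* zero, negative, or too large */
    *len = (int)v;
    return 0;
}
```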
Found-by: Seung Min Shin (was reported to us on 10th April)
CC: 신승민 <guncraft2000@naver.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
SVT-AV1 < 3.0.0 requires input dimensions of at least 64x64.
Older versions may otherwise silently accept smaller inputs without
producing output and cause the caller to hang. Reject such inputs
explicitly in config_enc_params() to produce a clear error.
v3.0.0+ supports sub-64px dimensions and validates the
input itself, so the check is gated with SVT_AV1_CHECK_VERSION.
Fix #22817
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Add a regression test covering issue #22817: cascaded scale=...:-2
filters on extreme aspect ratios previously produced zero output
dimensions silently. The test expects ffmpeg to fail fast.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
ff_scale_adjust_dimensions() can now return a negative error code when
the evaluated output dimensions are non-positive. Check the return
value and fail fast instead of continuing with the unadjusted result.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
When scale filter expressions evaluate to zero or negative output
dimensions (e.g. cascaded scale=...:-2 on extreme aspect ratios),
ff_scale_adjust_dimensions() only checked for int32 overflow and
passed them through, potentially hanging downstream components.
Reject them explicitly so the pipeline fails fast.
Callers that currently ignore the return value will be updated in
the following patches to propagate the error.
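A minimal sketch of the added rejection, assuming the dimensions are
evaluated in 64 bits before being stored. check_output_dimensions is a
hypothetical helper and -1 stands in for a negative error code; the real
function has a different signature.

```c
#include <stdint.h>

/* Returns 0 if both dimensions are usable, -1 otherwise. */
static int check_output_dimensions(int64_t w, int64_t h)
{
    if (w > INT32_MAX || h > INT32_MAX)
        return -1;  /* would overflow a 32-bit dimension */
    if (w <= 0 || h <= 0)
        return -1;  /* zero or negative output size */
    return 0;
}
```

A caller would check the return value and propagate the error instead of
configuring its output link with the unadjusted values.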
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Port lfe_fir0_float and lfe_fir1_float to AArch64 NEON. These polyphase
FIR interpolation filters have an x86 SSE/AVX path but no AArch64
equivalent, falling back to scalar C.
The inner loop computes two dot products per output pair. Precomputing a
reversed LFE sample vector before the inner loop avoids per-iteration
shuffle overhead.
Benchmarks on AWS Graviton3 (Neoverse V1, c7g.xlarge):
lfe_fir0_float: C 5902.0 cycles -> NEON 2135.0 cycles (2.77x)
lfe_fir1_float: C 2836.3 cycles -> NEON 1527.8 cycles (1.86x)
Measured with: taskset -c 0 ./tests/checkasm/checkasm --test=dcadsp --bench,
3-run average, Ubuntu 22.04 (kernel 6.8.0-1052-aws), perf_event_paranoid=0.
Signed-off-by: Jeongkeun Kim <variety0724@gmail.com>
Assign to `exec_base.in_offset_x` before the error handling,
to ensure the error cleanup path properly frees the already
allocated memory.
Fixes Coverity issue #1691725
Commit 554c2bc708
ported the yuv2planeX functions that are set iff
use_mmx_vfilter is set to external assembly
and did it in a way that resulted in linking failures
when inline assembly is enabled, but external assembly
is disabled. This was later fixed in commit
c00567647e, but in such
a manner that use_mmx_vfilter can be set without any
of the accompanying yuv2planeX functions being set;
and in case inline assembly was unavailable,
these external assembly functions would never be selected.
This makes the filter-fps and filter-fps-cfr tests fail
with inline assembly but with --disable-x86asm, as
reported in issue #21113. Fix this by moving sws_init_swscale_mmxext
directly into ff_sws_init_swscale_x86() and setting
use_mmx_vfilter directly besides the yuv2planeX function pointer.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
NV_ENC_H264_PROFILE_HIGH_10 and NV_ENC_HEVC_PROFILE_MULTIVIEW_MAIN
both equal 3 when their respective NVENC_HAVE_* flags are defined.
The MVHEVC check in nvenc_check_capabilities() matches against
ctx->profile alone, so an H.264 encode with profile=high10 is
rejected as if it were an HEVC multiview request on hardware
without MVHEVC support.
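The collision can be illustrated with a small sketch. The constants and
function names below are hypothetical stand-ins that only mirror the
shared value 3; gating the check on the codec is one plausible way to
avoid the false match, not necessarily the exact fix applied.

```c
enum Codec { CODEC_H264, CODEC_HEVC };

/* Hypothetical constants that share a value, mirroring the collision. */
#define H264_PROFILE_HIGH_10        3
#define HEVC_PROFILE_MULTIVIEW_MAIN 3

/* Buggy form: looks at the profile value alone. */
static int is_mvhevc_request_buggy(enum Codec codec, int profile)
{
    (void)codec;
    return profile == HEVC_PROFILE_MULTIVIEW_MAIN;
}

/* Safer form: the profile value is only meaningful for its own codec. */
static int is_mvhevc_request(enum Codec codec, int profile)
{
    return codec == CODEC_HEVC && profile == HEVC_PROFILE_MULTIVIEW_MAIN;
}
```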
Signed-off-by: Semih Baskan <strst.gs@gmail.com>
When duplicate frames are forced to be kept, forward the input frame
without cloning instead of creating an unnecessary extra reference.
This removes the leak path introduced when clone allocation fails.
For frames that become the new reference, keep using a clone for
forwarding.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
6f811ad never set VK_DEVICE_QUEUE_CREATE_INTERNALLY_SYNCHRONIZED_BIT_KHR
on the queues at vkCreateDevice. Without the flag, the driver doesn't
actually synchronize queues.
Fixes: 6f811ad751 ("hwcontext_vulkan: implement internal queue synchronization")
vkGetDeviceQueue2 with flags = 0 is equivalent to vkGetDeviceQueue and
is available since Vulkan 1.1. Needed to support queues created with
non-zero VkDeviceQueueCreateFlags.
Fixes VUID-vkGetDeviceQueue-flags-01841 VVL error.
Fixes ticket #22890.
The ff_hls_senc_parse_audio_setup_info function passes setup_data to
parsers like avpriv_ac3_parse_header and init_get_bits8 which require
the buffer to be padded with AV_INPUT_BUFFER_PADDING_SIZE bytes at the end.
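One way to satisfy such a padding requirement is to copy the data into a
larger, zero-padded buffer before handing it to the parsers. The sketch
below is an assumption about the general technique, not the actual
change: dup_with_padding is a hypothetical helper and PADDING_SIZE is an
illustrative value used in place of AV_INPUT_BUFFER_PADDING_SIZE.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative value; real code would use AV_INPUT_BUFFER_PADDING_SIZE. */
#define PADDING_SIZE 64

/* Returns a copy of data followed by PADDING_SIZE zero bytes, or NULL. */
static uint8_t *dup_with_padding(const uint8_t *data, size_t size)
{
    uint8_t *buf;

    if (size > SIZE_MAX - PADDING_SIZE)
        return NULL;                    /* size + padding would overflow */
    buf = malloc(size + PADDING_SIZE);
    if (!buf)
        return NULL;
    if (size)
        memcpy(buf, data, size);
    memset(buf + size, 0, PADDING_SIZE); /* zeroed padding after the data */
    return buf;
}
```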
This allows JPEG XL images to be recognized as valid attachments.
Since JPEG is already widely used for cover art, JXL's support for
lossless JPEG transcodes can decrease the total size of music collections.
This fixes JXL cover art rendering in applications like mpv which rely
on FFmpeg for demuxing.
Signed-off-by: jade <heartstopp1ng@proton.me>
mf_encv_input_adjust() currently only validates the pixel format and
otherwise leaves the input IMFMediaType unchanged. The Microsoft
H.264, H.265 and AV1 encoder MFTs tolerate this and internally infer
the missing attributes from the previously-set output type. Other
MediaFoundation encoder MFTs that follow the specification more
strictly reject the input type with MF_E_INVALIDMEDIATYPE (due to
MF_E_ATTRIBUTENOTFOUND on MF_MT_FRAME_SIZE / MF_MT_FRAME_RATE) when
those attributes are absent, which causes IMFTransform::SetInputType
to fail and aborts encoding.
Set MF_MT_FRAME_SIZE, MF_MT_FRAME_RATE and MF_MT_INTERLACE_MODE on
the input media type, mirroring what mf_encv_output_adjust() already
writes to the output type. Behaviour on the Microsoft MFTs is
unchanged (they were already using these values) and encoding now
works with stricter third-party MFTs.
The MF_MT_FRAME_SIZE assignment has been present but commented out
since the original MediaFoundation wrapper was added in 050b72ab5e.
Signed-off-by: Ashrit Shetty <ashritshetty@microsoft.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Monochrome formats (gray, gray10le) have log2_chroma_w == 0 and
log2_chroma_h == 0 because they have no chroma planes — the same
values as YUV444. This caused them to be misclassified as YUV444 by
the is_yuv444 detection introduced in bcea693f75.
After fed6612415 changed cuvid_test_capabilities to use is_yuv444
instead of hardcoding cudaVideoChromaFormat_420, monochrome AV1
streams were rejected with "Codec av1_cuvid is not supported with
this chroma format".
Add an nb_components > 1 guard to exclude single-component formats
from the YUV444 path.
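The guard can be sketched with a hypothetical subset of a pixel format
descriptor. FmtDesc and this is_yuv444 helper are illustrative only; the
real detection may involve further conditions.

```c
/* Hypothetical subset of a pixel format descriptor. */
typedef struct FmtDesc {
    int nb_components;
    int log2_chroma_w;
    int log2_chroma_h;
} FmtDesc;

/*
 * No chroma subsampling alone is not enough: single-component (gray)
 * formats also report zero shifts, so require more than one component.
 */
static int is_yuv444(const FmtDesc *d)
{
    return d->nb_components > 1 &&
           d->log2_chroma_w == 0 && d->log2_chroma_h == 0;
}
```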
Patch by: Aniket Dhok <adhok@nvidia.com>
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
AV1CodecConfigurationRecord may contain only the 4-byte header and no
configOBUs. Still skip the header in that case so only configOBUs are
passed to cuvidParseVideoData().
Otherwise the av1C header itself is treated as sequence header data
and AV1 decoding can fail with an unknown error.
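A minimal sketch of the header skip, assuming the extradata is already
known to be an AV1CodecConfigurationRecord. av1c_config_obus is a
hypothetical helper; a header-only record yields an empty configOBUs
range rather than being passed on unchanged.

```c
#include <stddef.h>
#include <stdint.h>

#define AV1C_HEADER_SIZE 4

/*
 * Points *obus / *obus_size at the configOBUs that follow the 4-byte
 * header. Returns 0 on success (possibly with zero OBU bytes) and -1
 * if the record is shorter than the header.
 */
static int av1c_config_obus(const uint8_t *extradata, size_t size,
                            const uint8_t **obus, size_t *obus_size)
{
    if (size < AV1C_HEADER_SIZE)
        return -1;
    *obus      = extradata + AV1C_HEADER_SIZE;
    *obus_size = size - AV1C_HEADER_SIZE;
    return 0;
}
```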
Suggested-by: Aniket Dhok <adhok@nvidia.com>
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
3-tap [1,2,1]>>2: shared implementation body across size-specialized
entry points (8x8/16x16/32x32) to reduce code size. Fold the 3-tap
kernel into uhadd + urhadd: uhadd gives floor((prev+next)/2), then
urhadd rounds with curr to produce (prev + 2*curr + next + 2) >> 2
on 16 bytes in-place (no widen/narrow needed). Overlap-last technique
for tail avoids partial stores. Caller pads input arrays by 16 bytes
to guarantee safe over-read.
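The uhadd + urhadd folding can be checked with a scalar model. The
helpers below are hypothetical C stand-ins for the per-byte behaviour of
the two instructions; filter3_forms_agree compares the folded form with
the reference formula over every 8-bit input triple.

```c
#include <stdint.h>

/* Scalar model of uhadd: truncating halving add, floor((a + b) / 2). */
static uint8_t uhadd8(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b) >> 1);
}

/* Scalar model of urhadd: rounding halving add, (a + b + 1) >> 1. */
static uint8_t urhadd8(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* Folded 3-tap filter: two 8-bit operations, no widening needed. */
static uint8_t filter3_folded(uint8_t prev, uint8_t curr, uint8_t next)
{
    return urhadd8(uhadd8(prev, next), curr);
}

/* Reference formula computed in wider arithmetic. */
static uint8_t filter3_ref(uint8_t prev, uint8_t curr, uint8_t next)
{
    return (uint8_t)((prev + 2 * curr + next + 2) >> 2);
}

/* Returns 1 if both forms agree for every 8-bit input triple. */
static int filter3_forms_agree(void)
{
    for (int p = 0; p < 256; p++)
        for (int c = 0; c < 256; c++)
            for (int n = 0; n < 256; n++)
                if (filter3_folded((uint8_t)p, (uint8_t)c, (uint8_t)n) !=
                    filter3_ref((uint8_t)p, (uint8_t)c, (uint8_t)n))
                    return 0;
    return 1;
}
```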
Strong smoothing (32x32): preloaded weight tables, interleaved
umull/umlal pairs (two 16-byte blocks at a time) to hide
rshrn-to-store latency, with paired st1 for 32-byte writes.
checkasm --bench --runs=15 (Apple M4, average of 3 trials):
ref_filter_3tap_8x8_8_neon: 4.1x
ref_filter_3tap_16x16_8_neon: 3.3x
ref_filter_3tap_32x32_8_neon: 2.5x
ref_filter_strong_8_neon: 1.9x
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Test 3-tap for 8x8/16x16/32x32 (both filtered_left and
filtered_top outputs). Test strong smoothing for filtered_top
and in-place left modification.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Extract 3-tap [1,2,1]>>2 and strong intra smoothing from
intra_pred() into HEVCPredContext function pointers, preparing
for arch-specific overrides.
ref_filter_3tap[3] indexed by log2_size - 3 (sizes 8/16/32).
ref_filter_strong for 32x32 luma only.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Deprecating MMX w/o performance regression; nearly identical performance
numbers on my Zen 4 (1.99x vs c)
Signed-off-by: Zuxy Meng <zuxy.meng@gmail.com>
This patch adds the transpose_cuda video filter.
It's similar to the existing transpose filter but accelerated by CUDA.
It supports the same pixel formats as the scale_cuda filter.
This also supersedes the deprecated transpose_npp filter.
Example usage:
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i <INPUT> -vf "transpose_cuda=dir=clock" <OUTPUT>
Signed-off-by: nyanmisaka <nst799610810@gmail.com>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
Add a top-level title and demote former section headings (MD041-style hierarchy).
Add blank lines around headings and fenced code blocks where appropriate (MD022 and MD031-style). Some Markdown parsers, including kramdown, only recognize headings that are preceded by a blank line.
Use a top-level heading on the first line (MD041-style) and adjust section levels for clearer document structure. Improves navigation for assistive technologies that rely on heading outlines.
This muxer seems to intend to support output that does
not begin at zero (instead of e.g. just hardcoding
nb_frames_pos to 16). But then it is possible
that avio_seek() returns values > INT_MAX even
though the part of the file written by us can not
exceed this value. So the return value of avio_seek()
needs to be checked as 64bit integer and not silently
truncated to int.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The number of streams is always one (namely one video stream
with codec id AV_CODEC_ID_PDV) due to the MAX_ONE_OF_EACH,
ONLY_DEFAULT_CODECS flags. Also, the generic code (init_muxer()
in mux.c) checks that video streams have proper dimensions set.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This adds a NEON-optimized function for computing 32x32 Sum of Absolute
Differences (SAD) on AArch64, addressing a gap where x86 had SSE2/AVX2
implementations but AArch64 lacked equivalent coverage.
The implementation mirrors the existing sad8 and sad16 NEON functions,
employing a 4-row unrolled loop with UABAL and UABAL2 instructions for
efficient load-compute interleaving, and four 8x16-bit accumulators to
handle the wider 32-byte rows.
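For reference, the quantity being computed can be written as a plain
scalar loop. sad_32x32_ref is a hypothetical name and its signature is
an assumption; it is not the NEON routine or the C fallback from the
tree, only a statement of what a 32x32 SAD is.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences over a 32x32 block of 8-bit pixels. */
static unsigned sad_32x32_ref(const uint8_t *a, ptrdiff_t a_stride,
                              const uint8_t *b, ptrdiff_t b_stride)
{
    unsigned sum = 0;

    for (int y = 0; y < 32; y++) {
        for (int x = 0; x < 32; x++)
            sum += (unsigned)abs(a[x] - b[x]);
        a += a_stride;  /* advance both blocks by one row */
        b += b_stride;
    }
    return sum;
}
```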
Benchmarks on AWS Graviton3 (Neoverse V1, c7g.xlarge) using checkasm:
sad_32x32_0: C 146.4 cycles -> NEON 98.1 cycles (1.49x speedup)
sad_32x32_1: C 141.4 cycles -> NEON 98.9 cycles (1.43x speedup)
sad_32x32_2: C 140.7 cycles -> NEON 95.0 cycles (1.48x speedup)
Signed-off-by: Jeongkeun Kim <variety0724@gmail.com>
This incorrectly lists the libavcodec major version as 60 instead of
62. Also fix the date and commit hash while at it
Fixes: 7faa6ee2aa ("libavformat/matroska: Support smpte 2094-50 metadata")
Signed-off-by: llyyr <llyyr.public@gmail.com>
This change should improve performance on Skylake and later
Intel CPUs (which have only half the ports for saturated adds/subs
for mmx register compared to xmm register): llvm-mca predicts
a 25% performance improvement on Skylake.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Due to a discrepancy between SSE2 and the C version, coefficients
for idct_put and idct_add are restricted to a range not causing
overflows.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This allows using pavgb to reduce the number of instructions used
to calculate the average; processing two rows via movhps allows
reducing the number of pxor and pavgb even further and turned
out to be beneficial.
This patch also avoids a load as the constant used here can be easily
generated at runtime.
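The commit does not spell out the instruction sequence, but a well-known
way to get a truncating (no-rounding) average out of pavgb is to
complement the inputs and the result. The scalar model below assumes
that identity; pavgb8 and avg_no_rnd are hypothetical names, and the XOR
with 0xFF models a pxor with an all-ones register.

```c
#include <stdint.h>

/* Scalar model of pavgb: rounding average, (a + b + 1) >> 1. */
static uint8_t pavgb8(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* Truncating average built from the rounding one via complements. */
static uint8_t avg_no_rnd(uint8_t a, uint8_t b)
{
    return (uint8_t)(pavgb8((uint8_t)(a ^ 0xFF), (uint8_t)(b ^ 0xFF)) ^ 0xFF);
}

/* Returns 1 if avg_no_rnd equals (a + b) >> 1 for all 8-bit inputs. */
static int avg_no_rnd_is_exact(void)
{
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            if (avg_no_rnd((uint8_t)a, (uint8_t)b) != ((a + b) >> 1))
                return 0;
    return 1;
}
```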
Old benchmarks:
put_no_rnd_pixels_l2_c: 13.3 ( 1.00x)
put_no_rnd_pixels_l2_mmx: 11.6 ( 1.15x)
New benchmarks:
put_no_rnd_pixels_l2_c: 13.4 ( 1.00x)
put_no_rnd_pixels_l2_sse2: 7.5 ( 1.77x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This allows avoiding the stack for the 8 bit simple IDCT;
for the other IDCTs, it avoids storing and restoring two
xmm registers on Win64.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This function is exported, so has to abide by the ABI
and therefore issues emms since commit
5b85ca5317. Yet this is
expensive and using SSE2 instead improves performance.
Also avoid the initial zeroing and the last pointer
increment while just at it.
This removes the last usage of mmx from libavutil*.
Old benchmarks:
sad_8x8_0_c: 13.2 ( 1.00x)
sad_8x8_0_mmxext: 27.8 ( 0.48x)
sad_8x8_1_c: 13.2 ( 1.00x)
sad_8x8_1_mmxext: 27.6 ( 0.48x)
sad_8x8_2_c: 13.3 ( 1.00x)
sad_8x8_2_mmxext: 27.6 ( 0.48x)
New benchmarks:
sad_8x8_0_c: 13.3 ( 1.00x)
sad_8x8_0_sse2: 11.7 ( 1.13x)
sad_8x8_1_c: 13.8 ( 1.00x)
sad_8x8_1_sse2: 11.6 ( 1.20x)
sad_8x8_2_c: 13.2 ( 1.00x)
sad_8x8_2_sse2: 11.8 ( 1.12x)
Hint: Using two psadbw or one psadbw and movhps made no difference
in the benchmarks, so I chose the latter due to smaller codesize.
*: except if lavu provides avpriv_emms for other libraries
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
We reject inputs that are significantly smaller than the smallest frame.
This check raises the minimum input needed before time-consuming computations are performed;
it thus improves the computation per input byte and reduces the potential DoS impact.
Fixes: Timeout
Fixes: 472769364/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_SVQ1_DEC_fuzzer-5519737145851904
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Otherwise the buffer for the hdr10+ blockadditional would
be clobbered if both are present (the buffers can only be
reused after the ebml_writer_write() call).
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
7faa6ee2aa added support
for writing AV_PKT_DATA_DYNAMIC_HDR_SMPTE_2094_APP5,
yet forgot to update the size of the EBML element buffer.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Precompute the SILK NLSF residual weights from the stage-1 codebooks and use the table during LPC decode. This removes the per-coefficient mandated fixed-point weight calculation in silk_decode_lpc() while preserving the same decoded values.
Instead of implicitly relying on SwsComps.unused, which contains the exact
same information. (cf. ff_sws_op_list_update_comps)
Signed-off-by: Niklas Haas <git@haasn.dev>
The implementation of AARCH64_SWS_OP_LINEAR loops over elements of this mask
to determine which *output* rows to compute. However, it is being set by this
loop to `op->comps.unused`, which is a mask of unused *input* rows. As such,
it should be looking at `next->comps.unused` instead.
This did not result in problems in practice, because none of the linear
matrices happened to trigger this case (more input columns than output rows).
Signed-off-by: Niklas Haas <git@haasn.dev>
Needed to allow us to phase out SwsComps.unused altogether.
It's worth pointing out the change in semantics; while unused tracks the
unused *input* components, the mask is defined as representing the
computed *output* components.
This is 90% the same, except for read/write, pack/unpack, and clear; which
are the only operations that can be used to change the number of components.
Signed-off-by: Niklas Haas <git@haasn.dev>
Makes this logic a lot simpler and less brittle. We can trivially adjust the
list of linear masks that are required, whenever it changes as a result of any
future modifications.
Signed-off-by: Niklas Haas <git@haasn.dev>
Using the power of libswscale/tests/sws_ops -summarize lets us see which
kernels are actually needed by real op lists.
Note: I'm working on a separate series which will obsolete this implementation
whack-a-mole game altogether, by generating a list of all possible op kernels
at compile time.
Signed-off-by: Niklas Haas <git@haasn.dev>
This is far more commonly used without an offset than with; so having it there
prevents these special cases from actually doing much good.
Signed-off-by: Niklas Haas <git@haasn.dev>
First vector is %2, not %3. This was never triggered before because all of
the existing masks never hit this exact case.
Signed-off-by: Niklas Haas <git@haasn.dev>
Since this now has an explicit mask, we can just check that directly, instead
of relying on the unused comps hack/trick.
Additionally, this also allows us to distinguish between fixed value and
arbitrary value clears by just having the SwsOpEntry contain NAN values iff
they support any clear value.
Signed-off-by: Niklas Haas <git@haasn.dev>
This does come with a slight change in behavior, as we now don't print the
range information in the case that the range is only known for *unused*
components. However, in practice, that's already guaranteed by update_comps()
stripping the range info explicitly in this case.
Signed-off-by: Niklas Haas <git@haasn.dev>
Instead of implicitly excluding NAN values if ignore_den0 is set. This
gives callers more explicit control over which values to print, and in
doing so, makes sure "unintended" NaN values are properly printed as such.
Signed-off-by: Niklas Haas <git@haasn.dev>
Instead of implicitly testing for NaN values. This is mostly a straightforward
translation, but we need some slight extra boilerplate to ensure the mask
is correctly updated when e.g. commuting past a swizzle.
Signed-off-by: Niklas Haas <git@haasn.dev>
This accidentally unconditionally overwrote the entire clear mask, since
Q(n) always set the denominator to 1, resulting in all channels being
cleared instead of just the ones with nonzero denominators.
Signed-off-by: Niklas Haas <git@haasn.dev>
This currently completely fails for images smaller than 12x12; and even in that
case, the limited resolution makes these tests a bit useless.
At the risk of triggering a lot of spurious SSIM regressions for very
small sizes (due to insufficiently modelling the effects of low resolution on
the expected noise), this patch allows us to at least *run* such tests.
Incidentally, 8x8 is the smallest size that passes the SSIM check.
Not only does this take into account extreme edge cases where the plane
padding can significantly exceed the actual width/stride, but it also
correctly takes into account the filter offsets when scaling; which the
previous code completely ignored.
Simpler, more robust, and more correct. Now valgrind passes for 100% of format
conversions for me, with and without scaling.
Signed-off-by: Niklas Haas <git@haasn.dev>
This is a mostly straightforward internal mechanical change that I wanted
to isolate from the following commit to make bisection easier in the case of
regressions.
While the number of tail blocks could theoretically be different for input
vs output memcpy, the extra complexity of handling that mismatch (and
adjusting all of the tail offsets, strides etc.) seems not worth it.
I tested this commit by manually setting `p->tail_blocks` to higher values
and seeing if that still passed the self-check under valgrind.
Signed-off-by: Niklas Haas <git@haasn.dev>
The x86 kernel e.g. assumes that at least one block is processed; so avoid
calling this with an empty width. This is currently only possible if e.g.
operating on an unpadded, very small image whose total linesize is less than
a single block.
Signed-off-by: Niklas Haas <git@haasn.dev>
This code had two issues:
1. It was over-allocating bytes for the input offset map case, and
2. It was hard-coding the assumption that there is only a single tail block
We can fix both of these issues by rewriting the way the tail size is derived.
In the non-offset case, and assuming only 1 tail block:
aligned_w - safe_width
= num_blocks * block_size - (num_blocks - 1) * block_size
= block_size
Additionally, the FFMAX(tail_size_in/out) is unnecessary, because:
tail_size = pass->width - safe_width <= aligned_w - safe_width
In the input offset case, we instead realize that the input kernel already
never over-reads the input due to the filter size adjustment/clamping, so
the only thing we need to ensure is that we allocate extra bytes for the
input over-read.
Signed-off-by: Niklas Haas <git@haasn.dev>
The over_read/write fields are not documented as depending on the subsampling
factor. Actually, they are not documented as depending on the plane at all.
If and when we do actually add support for horizontal subsampling to this
code, it will most likely be by turning all of these key variables into
arrays, which will be an upgrade we get basically for free.
Signed-off-by: Niklas Haas <git@haasn.dev>
This makes it far less likely to accidentally add or remove a +7 bias when
repeating this often-used expression.
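The expression in question is presumably the usual round-up conversion
from a bit count to a byte count. As a sketch under that assumption
(bits_to_bytes is a hypothetical name, not the helper added here):

```c
#include <stddef.h>

/* Number of whole bytes needed to hold the given number of bits. */
static size_t bits_to_bytes(size_t bits)
{
    return (bits + 7) >> 3;
}
```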
Signed-off-by: Niklas Haas <git@haasn.dev>
This could trigger if e.g. a backend tries to operate on monow formats with
a block size that is not a multiple of 8. In this case, `block_size_in`
would previously be miscomputed (to e.g. 0), which is obviously wrong.
Signed-off-by: Niklas Haas <git@haasn.dev>
As well as weird edge cases like trying to filter `monow` and pixels landing
in the middle of a byte. Realistically, this will never happen - we'd instead
pre-process it into something byte-aligned, and then dispatch a byte-aligned
filter on it.
However, I need to add a check for overflow in any case, so we might as well
add the alignment check at the same time. It's basically free.
Signed-off-by: Niklas Haas <git@haasn.dev>
Prevents valgrind from complaining about operating on uninitialized bytes.
This should be cheap as it's only done once during setup().
Signed-off-by: Niklas Haas <git@haasn.dev>
This code made the input read conditional on the byte count, but not the
output, leading to a lot of over-write for cases like 15, 5.
Signed-off-by: Niklas Haas <git@haasn.dev>
These align the filter size to a multiple of the internal tap grouping
(either 1/2/4 for vpgatherdd, or the XMM size for the 4x4 transposed kernel).
This may over-read past the natural end of the input buffer, if the aligned
size exceeds the true size.
Signed-off-by: Niklas Haas <git@haasn.dev>
The V-Nova LCEVC pipeline processes frames on internal background
worker threads. LCEVC_ReceiveDecoderPicture returns LCEVC_Again (-1)
when the worker has not yet completed the frame, which is the
documented "not ready, try again" response. The original code treated
any non-zero return as a fatal error (AVERROR_EXTERNAL), causing decode
to abort mid-stream.
Poll until LCEVC_Success or a genuine error is returned.
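The polling pattern can be sketched with a stubbed decoder. Everything
here is hypothetical except the value -1 for the "again" result, which
is taken from the text above; the real code may also wait or yield
between attempts, which this sketch omits.

```c
/* Hypothetical return codes; only DEC_AGAIN == -1 comes from the text. */
enum { DEC_SUCCESS = 0, DEC_AGAIN = -1, DEC_ERROR = -2 };

/* Hypothetical receive callback standing in for the decoder call. */
typedef int (*receive_fn)(void *opaque);

/*
 * Calls receive until it stops reporting "not ready, try again".
 * Returns 0 on success and -1 on any genuine error.
 */
static int receive_blocking(receive_fn receive, void *opaque)
{
    int ret;

    do {
        ret = receive(opaque);
    } while (ret == DEC_AGAIN);

    return ret == DEC_SUCCESS ? 0 : -1;
}

/* Fake decoder that needs *opaque more polls before it succeeds. */
static int fake_receive(void *opaque)
{
    int *remaining = opaque;

    if (*remaining > 0) {
        (*remaining)--;
        return DEC_AGAIN;
    }
    return DEC_SUCCESS;
}

/* Fake decoder that always fails with a genuine error. */
static int fake_fail(void *opaque)
{
    (void)opaque;
    return DEC_ERROR;
}
```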
Signed-off-by: Peter von Kaenel <Peter.vonKaenel@harmonicinc.com>
Signed-off-by: James Almer <jamrial@gmail.com>
Avoids the post_process_opaque_free callback; the only user of
this is already a RefStruct reference and presumably other users
would want to use a pool for this, too, so they would use
RefStruct-objects, too.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
When use_loop == true and idx < 0, we would incorrectly check
in_stride[idx], which is OOB read. Reorder conditions to avoid that.
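The reordering relies on short-circuit evaluation of &&. A sketch with a
hypothetical helper (plane_needs_loop is not the real function, and the
actual condition may differ) shows the safe order:

```c
#include <stddef.h>

/*
 * idx may be negative to mean "no plane". Because idx >= 0 is tested
 * before the array access, in_stride[idx] is never evaluated for a
 * negative index.
 */
static int plane_needs_loop(const ptrdiff_t *in_stride, int idx, int use_loop)
{
    return use_loop && idx >= 0 && in_stride[idx] != 0;
}
```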
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
When the WAV muxer's `-rf64 auto` option is used, the output is intended
to be a normal WAV file if possible, only extended to RF64 format when
the file size grows too large. This was accomplished by reserving space
for the extra RF64-specific data using a standard JUNK chunk (ignored by
readers), then overwriting the reserved space later with a ds64 chunk if
needed.
In the original rf64 auto implementation, the JUNK chunk was placed
right after the RIFF/WAVE file header, before the fmt chunk; this is the
design suggested by the "Achieving compatibility between BWF and RF64"
section of the RF64 spec:
RIFF 'WAVE' <JUNK chunk> <fmt-ck> ...
However, this approach means that the fmt chunk is no longer in its
conventional location at the beginning of the file, and some WAV-reading
tools are confused by this layout. For example, the `file` tool is not
able to show the format information for a file with the extra JUNK chunk
before fmt.
This change shuffles the order of the chunks for `-rf64 auto` mode so
that the reserved space follows fmt instead of preceding it:
RIFF 'WAVE' <fmt-ck> <JUNK chunk> ...
With this small modification, tools expecting the fmt chunk to be the
first chunk in the file work with files produced by `-rf64 auto`.
This means the fmt chunk won't be in the location required by RF64, so
if the automatic RF64 conversion is triggered, the fmt chunk needs to be
relocated by rewriting it following the ds64 chunk during the conversion:
RF64 'WAVE' <ds64 chunk> <fmt-ck> ...
H.264 only uses these functions with height 2 or 4 and
the aarch64, arm and mips versions of them optimize based
on this. Yet this is not true when these functions are used
by the lowres code in mpegvideo_dec.c. So revert back to
the C versions of these functions for mpegvideo_dec so that
the H.264 decoder can still use fully optimized functions.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Frame side data unfortunately lacks padding, which CBS needs, so we can't reuse
the existing AVBufferRef.
Signed-off-by: James Almer <jamrial@gmail.com>
These can randomly trigger the alpha/zero fast paths, resulting in spurious
tests or randomly diverging performance if the backend happens to implement
that particular fast path.
Signed-off-by: Niklas Haas <git@haasn.dev>
This was not actually testing the integer path. Additionally, for integer
scales, there is a special fast path for expansion from bits to full range,
which we should separate from the random value test.
The overhead of the loop and memcpy call is less than the overhead of
possibly spilling into one extra unnecessary cache line. 64 is still a
good rule of thumb for L1 cache line size in 2026.
I leave it to future code archeologists to find and tweak this constant if
it ever becomes unnecessary.
Signed-off-by: Niklas Haas <git@haasn.dev>
Most of these filters don't test anything meaningfully different relative to
each other; the only filters that really have special significance are POINT
(for now) and maybe BILINEAR down the line.
Apart from that, SINC, combined with the src size loop, already tests both
extreme cases (large and small filters), with large, oscillating unwindowed
weights.
The other filters are not adding anything of substance to this, while massively
slowing down the runtime of this test. We can, of course, change this if the
backends ever get more nuanced handling.
checkasm: all 855 tests passed (down from 1575)
Signed-off-by: Niklas Haas <git@haasn.dev>
The current code was a bit clumsy in that it always picked the first
available backend when choosing the new function. This meant that some x86
paths were not being tested at all, whenever the memcpy backend (which has
higher priority) could serve the request.
This change makes it so that each backend is explicitly tested against only
implementations provided by that same backend.
checkasm: all 1575 tests passed (up from 1305)
As an aside, it also lets us benchmark the memcpy backend directly against
the C reference backend.
Signed-off-by: Niklas Haas <git@haasn.dev>
These don't actually exist at runtime, and will soon be removed from the
backends as well.
This commit is intentionally a bit incomplete, as I will rewrite this
based on the auto-generated macros in the upcoming ops_micro series.
Signed-off-by: Niklas Haas <git@haasn.dev>
Check that the driver supports both BUFFER_OFFSET and BYTES_WRITTEN
encode feedback flags before creating the query pool, failing with
EINVAL if either is missing.
Set these flags explicitly instead of masking off HAS_OVERRIDES with a
bitwise NOT, which could pass unrecognized bits from newer drivers to
vkCreateQueryPool causing validation errors and
crashes.
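A rough sketch of the difference between masking and setting flags explicitly. The bit values and names below are placeholders, not the real Vulkan constants, and the negative errno return stands in for AVERROR(EINVAL):

```c
#include <errno.h>
#include <stdint.h>

/* Placeholder bits; the real values come from the Vulkan headers. */
enum {
    FEEDBACK_BUFFER_OFFSET = 1 << 0,
    FEEDBACK_BYTES_WRITTEN = 1 << 1,
    FEEDBACK_HAS_OVERRIDES = 1 << 2,
};

/* Old approach: forwards every bit the driver reports except HAS_OVERRIDES,
 * including bits this code does not know about. */
static uint32_t flags_by_masking(uint32_t supported)
{
    return supported & ~(uint32_t)FEEDBACK_HAS_OVERRIDES;
}

/* New approach: require the two needed bits and request exactly those. */
static int flags_explicit(uint32_t supported, uint32_t *out)
{
    const uint32_t required = FEEDBACK_BUFFER_OFFSET | FEEDBACK_BYTES_WRITTEN;

    if ((supported & required) != required)
        return -EINVAL;
    *out = required;
    return 0;
}
```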
Forward-declaring an enum is not legal C (the underlying type of
the enum may depend upon the enum constants, so this may cause
ABI issues with -fshort-enums); compilers warn about this
with -pedantic.
This essentially reverts 7e84865cff.
Notice that almost* all files that include codec_internal.h also
need to include avcodec.h, so this does not lead to unnecessary
rebuilds.
This addresses part of #22684.
*: The only file I am aware of that defines an FFCodec and does not
need AVCodecContext as complete type is null.c (but even it already
includes it implicitly); the avcodec.c test tool seems to be the only
file where this commit actually leads to an unnecessary avcodec.h
inclusion.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
If any of the dstStrides is not aligned mod 16, the warning
above this one will be triggered, setting stride_unaligned_warned,
so that the following check for stride_unaligned_warned will
be always false.
Reviewed-by: Niklas Haas <ffmpeg@haasn.dev>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The glue code doesn't care about types, so long as the functions are
chained correctly. Let's not pretend there is any type safety there, as
the function pointers were cast anyway from unrelated types.
Particularly some f32 and u32 are shared.
This fixes errors like so:
src/libswscale/ops_tmpl_int.c:471:1: runtime error: call to function linear_diagoff3_f32 through pointer to incorrect function type 'void (*)(struct SwsOpIter *, const struct SwsOpImpl *, unsigned int *, unsigned int *, unsigned int *, unsigned int *)'
libswscale/ops_tmpl_float.c:208: note: linear_diagoff3_f32 defined here
Fixes: #22332
It was added to force auto vectorization on GCC builds. Since then auto
vectorization has been enabled for whole code base, 1464930696.
According to GCC documentation, the optimize attribute should be used
for debugging purposes only. It is not suitable in production code.
In particular it's unclear whether the attribute is applied, as it is
actually lost when the function is inlined, so usage of it is quite fragile.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Fixes: out of array access
no testcase
Found-by: Joshua Rogers <joshua@joshua.hu> with ZeroPath
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The makeinfo_html variable wasn't being disabled when the makeinfo test
failed, which prevented texi2html from being probed.
Fixes 589da160b2.
Found-by: Luke Jolliffe <luke.jolliffe@bbc.co.uk>
tape_length * 8 overflows 32-bit int for large input widths. Then
av_malloc_array() allocates a tiny buffer while the subsequent
loop writes tape_length*8 BilinearMap entries, causing
heap-buffer-overflow.
Validate the value in float before converting to int and left
shifting, to avoid both float-to-int and signed left shift
overflow UB. Also split av_malloc_array() arguments to avoid
the multiplication overflow.
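A minimal sketch of the validation order, assuming a stand-in MapEntry type for BilinearMap and using calloc() in place of av_malloc_array(); the function name and the exact limit are illustrative:

```c
#include <limits.h>
#include <stdlib.h>

typedef struct MapEntry { int v[4]; } MapEntry; /* stand-in for BilinearMap */

/* Returns a zeroed buffer of tape_length * 8 entries,
 * or NULL if the size is invalid or allocation fails. */
static MapEntry *alloc_tape(float tape_length_f, int *tape_length)
{
    /* Validate while still in float: converting an out-of-range float to
     * int is undefined, and so is overflowing a signed multiply or shift.
     * The negated comparison also rejects NaN. */
    if (!(tape_length_f >= 1.0f) || tape_length_f >= (float)(INT_MAX / 8))
        return NULL;

    *tape_length = (int)tape_length_f;

    /* Pass the factors separately so the allocator checks the product. */
    return calloc((size_t)*tape_length, 8 * sizeof(MapEntry));
}
```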
Fixes: #21511
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
This fixes the following failure:
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
-i The_Beauty_of_Earth-1.mp4 \
-vf scale_cuda=2880:1440 \
-c:v hevc_nvenc \
-pix_fmt cuda \
-b:v 8M -c:a copy \
-y test_scale.mp4
> Reconfiguring filter graph because hwaccel changed
> Impossible to convert between the formats supported by the filter
> 'Parsed_scale_cuda_0' and the filter 'auto_scale_0'.
> Error reinitializing filters!
Signed-off-by: Zhao Zhili <quinkblack@foxmail.com>
When check_cflags -mvsx fails, the && short-circuit prevents
check_cc from running. Since check_cc is responsible for
disabling vsx on failure, skipping it leaves vsx incorrectly
enabled.
Fix by removing the && so check_cc always executes.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
They have been superseded by SSSE3; the SSE2 version was even disabled
(and segfaults if enabled).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Compared to the MMX version, this version benefits from wider
registers and pmaddubsw. It also has fewer unnecessary loads
and stores: On x64, the MMX version has 12 unnecessary GPR loads
and 6 stores in each line when width is eight; for width 16,
there are 17 unnecessary GPR loads and six stores per line.
Even the 32bit SSSE3 version only has six loads and zero stores
per line more than the x64 version. Furthermore, in contrast
to the MMX version, the SSSE3 version also does not clobber
the array of block pointers given to it.
Benchmarks:
inner_add_yblock_2_c: 29.2 ( 1.00x)
inner_add_yblock_2_mmx: 32.5 ( 0.90x)
inner_add_yblock_2_ssse3: 28.6 ( 1.02x)
inner_add_yblock_4_c: 85.2 ( 1.00x)
inner_add_yblock_4_mmx: 89.2 ( 0.96x)
inner_add_yblock_4_ssse3: 84.5 ( 1.01x)
inner_add_yblock_8_c: 302.0 ( 1.00x)
inner_add_yblock_8_mmx: 77.0 ( 3.92x)
inner_add_yblock_8_ssse3: 30.6 ( 9.85x)
inner_add_yblock_16_c: 1164.7 ( 1.00x)
inner_add_yblock_16_mmx: 260.4 ( 4.47x)
inner_add_yblock_16_ssse3: 82.3 (14.15x)
Both the MMX and SSSE3 versions leave the size 2 and 4 cases
to ff_snow_inner_add_yblock_c() (but the MMX version has
a prologue at the beginning that it needs to undo before
the call, leading to the higher overhead for these sizes).
I don't know why the SSSE3 version is marginally faster than
the C version in these cases.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The first loop was never entered due to a precedence problem;
the second loop initialized everything, although it was not intended
that way.
This has been added in 56b8769a1c.
Sorry for this.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Only inner_add_yblock for now.
Hint: Said function uses a pointer to an array of pointers as parameter.
The MMX version clobbers the array in such a way that calling the
function repeatedly with the same arguments (as happens inside bench_new())
leads to buffer overflows and segfaults. Therefore CALL4 had to be
overridden to restore the original pointers. This workaround will be
removed soon when the MMX version is removed.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is unnecessary and avoids the src_y parameter;
it also makes this function more ASM-friendly.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The input lines used in ff_snow_inner_add_yblock()
must always be set (because their values are used).
The MMX assembly always relied on this.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This has been done in 561a18d3ba
in order to avoid shifts, yet this rationale no longer applies
since d593e32983. So shift them back;
this is in preparation for using these coefficients together with
pmaddubsw.
Hint: 561a18d3ba also added a block
guarded by "if(LOG2_OBMC_MAX == 8". I changed the condition to remove
this check (i.e. kept the block) which should not change the output
at all. Yet all FATE tests pass if the block is completely
removed. I don't know if this block is necessary at all.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Possible now that the SSE2 function is available
even when the stack is not aligned.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
x86-32 lacks one GPR, so it needs to be read from the stack.
If the stack needs to be realigned, we can no longer access
the original location of one argument, so just request a bit
more stack size and copy said argument at a fixed offset from
the new stack.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Only the lower quadword needs to be rotated, because
the register is zero-extended immediately afterwards anyway.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Write the 24-bit vpcC flags field at the current cursor position after
the version byte. The previous code wrote to p+1 instead of p, leaving
one byte uninitialized between version and flags and shifting all
subsequent fields (profile, level, bitdepth, etc.) by one byte.
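The off-by-one can be sketched as follows. The helper is hypothetical and only shows the cursor handling for a one-byte version followed by 24-bit flags:

```c
#include <stdint.h>

/* Hypothetical helper: writes a version byte and 24-bit big-endian flags,
 * returning the cursor position for the next field. */
static uint8_t *write_version_and_flags(uint8_t *p, uint8_t version, uint32_t flags)
{
    *p++ = version;

    /* Write at p, the byte right after the version. Writing at p + 1
     * would skip one byte and shift every following field. */
    p[0] = (uint8_t)(flags >> 16);
    p[1] = (uint8_t)(flags >> 8);
    p[2] = (uint8_t)flags;

    return p + 3;
}
```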
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Return the actual find_sei_end() error when SEI appending fails instead of
reusing the previous status code. This preserves the real parse failure for
callers instead of reporting malformed SEI handling as success.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
This was originally introduced by commit 05d6cc116e. During the FFmpeg-libav
split, this function was refactored by commit 7e350379f8 into
av_buffersrc_add_frame(), replacing av_buffersrc_add_ref(). The new function
did not include the overflow warning, despite the same being done for
buffersink.
Then, when commit a05a44e205 merged the two functions back together, the
libav implementation was favored over the FFmpeg implementation, silently
removing the overflow warning in the process.
This commit re-adds that missing warning.
Signed-off-by: Niklas Haas <git@haasn.dev>
The multiplanar image with storage_bit enabled fails to be exported
to DMA-BUF on the QCOM turnip driver, thus triggering this double-free issue.
```
[Parsed_hwmap_2 @ 0xffff5c002a70] Configure hwmap vulkan -> drm_prime.
[hwmap @ 0xffff5c001180] Filter input: vulkan, 1920x1080 (0).
[AVHWFramesContext @ 0xffff5c004e00] Unable to export the image as a FD!
free(): double free detected in tcache 2
Aborted
```
Additionally, add back an av_unused attribute. Otherwise, the compiler
will complain about unused variables when CUDA is not enabled.
Signed-off-by: nyanmisaka <nst799610810@gmail.com>
More about deprecating MMX than any performance gain; nearly identical
performance numbers on my Zen 4 (1.36x vs c), but llvm-mca predicts
>60% perf gain on Intel CPUs newer than Skylake.
Signed-off-by: Zuxy Meng <zuxy.meng@gmail.com>
The original intent here was probably to make the ops code agnostic to
which operation is actually last in the list, but the existence of a
divergence between CONTINUE and FINISH already implies that we hard-code
the assumption that the final operation is a write op.
So we can just massively simplify this with a call/ret pair instead of
awkwardly exporting and then jumping back to the return label. This actually
collapses FINISH down into just a plain RET, since the op kernels already
don't set up any extra stack frame.
Signed-off-by: Niklas Haas <git@haasn.dev>
ff_vk_find_struct returns const void *, so storing it in const void *drm_create_pnext
fixes the initialization warning but then dpb_hwfc->create_pnext = drm_create_pnext
assigns const void * to void *, triggering the same warning at that line. The right
fix is a (void *) cast at the call site, same as done for buf_pnext.
Also restrict the GetPhysicalDeviceImageFormatProperties2 verbose log in
try_export_flags to the DRM modifier path only: when has_mods is false the log
always printed mod[0]=0x0, which is misleading since no DRM modifier is involved.
Signed-off-by: Tymur Boiko <tboiko@nvidia.com>
HLS EVENT playlists (e.g. Twitch VODs) are seekable but not finished,
so live_start_index causes playback to begin near the end. The first
packet's DTS then becomes first_timestamp, creating a wrong mapping
between timestamps and segments.
Fix this by subtracting the cumulative duration of skipped segments from
first_timestamp so it reflects the true start of the playlist.
Also set per-stream start_time from first_timestamp so the correct time is
reported, and reset pts_wrap_reference on seek to prevent bogus
wraparounds.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Fixes a memory leak caused by AV_MEDIA_TYPE_VIDEO == 0 being excluded by
the !pool->type check. We can just remove the entire check because
av_buffer_pool_uninit() is already safe on NULL.
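A small sketch of why the check misfires. The types are simplified stand-ins, and free() plays the role of av_buffer_pool_uninit():

```c
#include <stdlib.h>

/* Mirrors the fact that AV_MEDIA_TYPE_VIDEO has the value 0. */
enum MediaType { MEDIA_TYPE_VIDEO = 0, MEDIA_TYPE_AUDIO = 1 };

typedef struct Pool {
    enum MediaType type;
    void *buffers;
} Pool;

/* Buggy: reads type == 0 as "nothing to free", so video pools leak. */
static void pool_uninit_buggy(Pool *pool)
{
    if (!pool->type)
        return;
    free(pool->buffers);
    pool->buffers = NULL;
}

/* Fixed: no type check at all; freeing NULL is harmless. */
static void pool_uninit_fixed(Pool *pool)
{
    free(pool->buffers);
    pool->buffers = NULL;
}
```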
Fixes: fe2691b3bb
Reported-by: Kacper Michajłow <kasper93@gmail.com>
Signed-off-by: Niklas Haas <git@haasn.dev>
This reduces the number of malloc() & free() calls, and structures the
data for the buffers a bit more neatly.
In case more per-buffer data needs to be added, having a separate struct
is useful.
Signed-off-by: Alexandru Ardelean <aardelean@deviqon.com>
In the loop which allocates the buffers for a V4L2 device, if failure
occurs for a certain buffer (e.g. 3rd of 4 buffers), then the previously
allocated buffers (and the buffer array) would not be freed in
mmap_init(). This would cause a leak.
This change handles the error cases of that loop to free all allocated
resources, so that when mmap_init() fails nothing is leaked.
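The unwind pattern can be sketched like this, with malloc() standing in for the per-buffer mmap() and a hypothetical fail_at parameter to simulate a failure partway through the loop:

```c
#include <stdlib.h>

typedef struct Buf { void *data; } Buf;

/* Allocates n buffers (n > 0). If buffer number fail_at cannot be set up,
 * everything allocated so far is released and NULL is returned.
 * Pass fail_at = -1 for no simulated failure. */
static Buf *alloc_buffers(int n, int fail_at)
{
    Buf *bufs = calloc((size_t)n, sizeof(*bufs));
    if (!bufs)
        return NULL;

    for (int i = 0; i < n; i++) {
        bufs[i].data = (i == fail_at) ? NULL : malloc(64);
        if (!bufs[i].data) {
            /* Free the buffers that were already set up, then the array. */
            while (i--)
                free(bufs[i].data);
            free(bufs);
            return NULL;
        }
    }
    return bufs;
}
```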
Signed-off-by: Alexandru Ardelean <aardelean@deviqon.com>
As a consequence of the fact that the frame pool API doesn't let us directly
access the linesize, we have to "un-translate" the over_read/write back to
the nearest multiple of the pixel size.
Signed-off-by: Niklas Haas <git@haasn.dev>
Allows the pass buffer allocator to make smarter decisions based on the actual
alignment requirements of the specific pass.
Signed-off-by: Niklas Haas <git@haasn.dev>
Matches the semantics of sws_frame_begin(), which also cleans up any
allocated buffers on error.
This is an issue introduced by the commit that allowed ff_sws_graph_run()
to fail in the first place.
Fixes: 563cc8216b
The major consequence of this is that we start allocating buffers per plane,
instead of allocating one contiguous buffer. This makes the no-op/refcopy
case slightly slower, but doesn't meaningfully affect the rest:
yuva444p -> yuva444p, time=157/1000 us (ref=78/1000 us), speedup=0.497x slower
Overall speedup=1.016x faster, min=0.983x max=1.092x
However, this is a necessary consequence of the desire to allow partial plane
allocations / single plane refcopies. This slowdown also does not affect
vf_scale, which already uses avfilter/framepool.c (via ff_get_video_buffer).
Signed-off-by: Niklas Haas <git@haasn.dev>
Saves a pointless free/alloc cycle on reinit. For the vast majority of filter
links, this is going to be allocated anyway; and on the occasions that it's not,
the waste is marginal.
Signed-off-by: Niklas Haas <git@haasn.dev>
As per the FFmpeg coding style guidelines, braces should be avoided on
isolated single-line statement bodies.
Signed-off-by: Niklas Haas <git@haasn.dev>
FFALIGN(..., pool->align) = (...) & ~(pool->align - 1), so this condition
equates to: ((...) & ~(align - 1) & (align - 1)), which is trivially 0.
(Note that all expressions are of type `int`)
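A quick check of the identity, using a local macro with the same shape as FFALIGN and assuming a power-of-two alignment:

```c
/* Same shape as FFALIGN; a must be a power of two. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* The removed condition: the low bits of an already aligned value. */
static int low_bits_after_align(int x, int align)
{
    /* ALIGN_UP clears every bit below align, so ANDing with
     * (align - 1) leaves nothing set. */
    return ALIGN_UP(x, align) & (align - 1);
}
```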
Signed-off-by: Niklas Haas <git@haasn.dev>
This struct is overall pretty trivial and there is little to no internal
state or invariants that need to be protected.
Making it public allows e.g. libswscale to allocate buffers for individual
planes directly.
Signed-off-by: Niklas Haas <git@haasn.dev>
Replacing the generic `int format` field. This aids in debugging, as
e.g. gdb will tend to translate the strongly typed enums back into human
readable names automatically.
Signed-off-by: Niklas Haas <git@haasn.dev>
This helper is of dubious utility - it was only used to reinitialize the
frame pools, which is better handled by `ff_frame_pool_reinit()`, and at
present only serves to make extending the API harder.
Users who really need to randomly query the state of the frame pool can
already keep track of the values they set.
Signed-off-by: Niklas Haas <git@haasn.dev>
The previous logic (ported from libavfilter/video.c) would leave the frame
pool intact if the linesize did not change as a result of changing the frame
dimensions. However, this caused ff_default_get_video_buffer2() to return
frames with the old width/height.
I think this bug was avoided in practice because the only filters to actually
support changing the resolution at runtime already always explicitly overrode
the width/height of allocated output buffers by the link properties.
Signed-off-by: Niklas Haas <git@haasn.dev>
This moves the check-uninit-reinit logic out of audio.c/video.c and into
framepool.c, where it can be more conveniently re-used by future users.
Signed-off-by: Niklas Haas <git@haasn.dev>
When mapping Vulkan Video frames to DMA-BUF, synchronize using an exportable
binary semaphore and sync_fd where supported. Submit a lightweight exec that
waits on each plane's timeline semaphore at the current value, signals a
SYNC_FD-exportable binary semaphore, then export with vkGetSemaphoreFdKHR.
Store that binary semaphore in AVVkFrameInternal and reuse it across maps
instead of creating and destroying each time: for
VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_SYNC_FD_BIT, copy transference means a
successful vkGetSemaphoreFdKHR unsignals the semaphore like a wait, so it can
be signaled again on the next map submit. If export is unavailable, fall back
to vkWaitSemaphores.
Move the drm_sync_sem destruction to vulkan_free_internal.
Export dma-buf fds with GetMemoryFdKHR for each populated f->mem[i], iterating
up to the sw_format plane count instead of stopping at the image count, so
multi-memory bindings are not skipped. Describe DRM layers using
max(sw planes, image count) and query subresource layout with the correct
aspect and image index when one VkImage backs multiple planes. Reference the
source hw_frames_ctx on the mapped frame and close dma-buf fds on failure paths.
For DMA-BUF-capable pools, honor VK_EXTERNAL_MEMORY_FEATURE_DEDICATED_ONLY_BIT
from format export queries when binding memory. With DRM modifiers and a
video profile in create_pnext, preserve caller usage and image flags instead of
overwriting them from generic supported_usage probing; use the modifier list
create info when probing export flags for modifier tiling.
Include VK_IMAGE_USAGE_VIDEO_DECODE_DPB_BIT_KHR from the output frames
context's usage together with DST (fixes
VUID-VkVideoBeginCodingInfoKHR-slotIndex-07245) instead of adding DPB usage
only when !is_current.
In ff_vk_decode_add_slice, pass VkVideoProfileListInfoKHR (from the output
frames context's create_pnext) as the pNext argument to
ff_vk_get_pooled_buffer instead of the full create_pnext chain. In
ff_vk_frame_params, set tiling to OPTIMAL only when it is not already
DRM_FORMAT_MODIFIER_EXT. In ff_vk_decode_init, when the output pool's
create_pnext includes VkImageDrmFormatModifierListCreateInfoEXT, initialize the
DPB pool with that modifier-list pNext and DRM_FORMAT_MODIFIER_EXT tiling;
otherwise use VkVideoProfileListInfoKHR and OPTIMAL as before. When
VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_DISTINCT_BIT_KHR is unset, the output
and DPB pools cannot use different layouts or tiling, so the DPB pool must
match the output pool.
Also fix av_hwframe_map ioctl sync_fd export, multi-planar semaphore handling,
and related failure-path cleanup.
Signed-off-by: Tymur Boiko <tboiko@nvidia.com>
SMPTE-2094-50 is an upcoming standard that is close to being
finalized.
Define a side data type for carrying this metadata. And add
functions for parsing and writing it. This is very similar to
the handling of HDR10+ metadata.
The spec is available here: https://github.com/SMPTE/st2094-50
Signed-off-by: Vignesh Venkatasubramanian <vigneshv@google.com>
This just adds a Vulkan compute-based 360-degree video conversion.
It implements a sufficient subset of the most popular 360-degree video formats.
Options such as rotation are dynamic and can be adjusted during runtime.
Some of the work was based on Paul B. Mahol's patch from 2020. There
were spots where the arithmetic conversion was incorrect.
Add FATE coverage for PDV encoding and decoding via lavf, including
intra and inter frame cases, skip-nokey decoding, and container-level
seek coverage.
Use -strict experimental in the encode commands because the encoder
is marked experimental.
Add a muxer for the Playdate PDV container format.
The muxer writes the frame table and packet layout required by the
Playdate runtime. It requires seekable output and a predeclared
maximum number of frames (-max_frames).
Includes validation for single video stream input, dimension and
framerate checks, and bounded payload/table offset checks. The frame
entry table is allocated once in write_header() using max_frames + 1.
Document the muxer in doc/muxers.texi and add a Changelog entry.
Add a native encoder for the Playdate PDV format.
Supports monob (1-bit) video, producing zlib-compressed intra frames
and XOR-based delta frames.
Includes bounds checking, overflow guards, correct linesize handling
using ptrdiff_t, and proper buffer allocation ordering.
Mark the encoder as experimental by setting AV_CODEC_CAP_EXPERIMENTAL,
since it has not been validated against Panic's official Playdate
player or SDK.
Commit 5b98cea4 added a size < 2 guard to sctp_write() to prevent
out-of-bounds access when max_streams is enabled, but the identical
pattern in sctp_read() was not addressed.
When max_streams is non-zero, sctp_read() passes (buf + 2, size - 2)
to ff_sctp_recvmsg(). If size < 2, size - 2 wraps to a large value
on the implicit cast to size_t in the callee.
Add the same guard.
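A minimal sketch of the wraparound and the guard. The function names are illustrative, recv_len() merely stands in for a callee that takes a size_t length, and the error code is an assumption:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the callee, which takes the length as size_t. */
static size_t recv_len(size_t len)
{
    return len;
}

/* Without a guard: for size < 2, size - 2 is negative and wraps to a
 * huge value when implicitly converted to size_t. */
static size_t len_unguarded(int size)
{
    return recv_len(size - 2);
}

/* With the guard: reject sizes too small for the two-byte stream prefix. */
static int len_guarded(int size, size_t *len)
{
    if (size < 2)
        return -EINVAL; /* error code chosen for illustration */
    *len = recv_len(size - 2);
    return 0;
}
```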
Signed-off-by: bird <6666242+bird@users.noreply.github.com>
It always returns zero which none of the callers check,
so just return nothing instead.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
With this commit, the RV30 and RV40 decoders no longer clobber
the fpu state for normal decoding (only error resilience can
still do so).
rv34_idct_add_c: 58.1 ( 1.00x)
rv34_idct_add_mmxext: 16.5 ( 3.52x)
rv34_idct_add_ssse3: 12.2 ( 4.76x)
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Without setting this flag, apparently gnutls will only send the client
certificate according to some logic based on what it thinks the server
accepts.
This is not the case a lot of times.
Just force it to send the client cert the user supplied, if one was
supplied, no matter what.
Fixes #22707
A failure while preparing a dither buffer leaves the newly allocated
buffer outside the cleanup range, leaking Vulkan resources. Make the
failure path cover the current buffer as well.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Fixes a pre-existing latent bug in the subpass splitting, that was
made worse / exposed by 048ca3b367.
Fixes: cba54e9e3b
Signed-off-by: Niklas Haas <git@haasn.dev>
Furthermore, the flags for H264/HEVC have been updated to those renamed in AMF 1.5.0+,
instead of using the old ones that were already marked as deprecated:
AMF_VIDEO_ENCODER_FULL_RANGE_COLOR -> AMF_VIDEO_ENCODER_OUTPUT_FULL_RANGE_COLOR
AMF_VIDEO_ENCODER_HEVC_NOMINAL_RANGE -> AMF_VIDEO_ENCODER_HEVC_OUTPUT_FULL_RANGE_COLOR
The macro content remains the same, therefore it will not cause regressions.
Signed-off-by: nyanmisaka <nst799610810@gmail.com>
Regression since dc4c798970
Handle the case where mp4 is disabled, since mp4 is
an optional dependency of hls_muxer.
Signed-off-by: Jack Lau <jacklau1222gm@gmail.com>
The US country_code path in parse_itut_t35_metadata() reads the
provider_code with bytestream2_get_be16u(), which is an
unchecked version that does not validate the remaining
length before reading. When an AV1 stream contains ITU-T T.35
metadata with country_code set to 0xB5 (which is US) and a
payload shorter than 2 bytes, this results in a heap overflow
reading 2 bytes past the allocation.
The UK country code path already guards against this issue by
checking the remaining length before the unchecked read. Apply the same
pattern to the US country code path.
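The guard pattern can be sketched with a simplified reader. The Reader type and helper names below are stand-ins for GetByteContext and the bytestream2 functions, not the real API:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct Reader { const uint8_t *p, *end; } Reader; /* simplified stand-in */

static size_t bytes_left(const Reader *r)
{
    return (size_t)(r->end - r->p);
}

/* Unchecked read: the caller must guarantee that two bytes remain. */
static unsigned get_be16u(Reader *r)
{
    unsigned v = (unsigned)r->p[0] << 8 | r->p[1];
    r->p += 2;
    return v;
}

/* Returns 0 and stores the provider code, or -1 if the payload is too short. */
static int read_provider_code(Reader *r, unsigned *code)
{
    if (bytes_left(r) < 2)
        return -1;
    *code = get_be16u(r);
    return 0;
}
```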
Pwno crafted an AV1 IVF with a metadata OBU containing ITU-T T.35
with country_code=0xB5 and a 1-byte payload. Decoding with libdav1d
triggers the overflow. ASan says:
ERROR: AddressSanitizer: heap-buffer-overflow
READ of size 2 at 0x5020000003f0 thread T0
#0 bytestream_get_be16 src/libavcodec/bytestream.h:98
#1 bytestream2_get_be16u src/libavcodec/bytestream.h:98
#2 parse_itut_t35_metadata src/libavcodec/libdav1d.c:376
0x5020000003f1 is located 0 bytes after 1-byte region
Found-by: Pwno
These args will be read directly from the stack into xmm register,
so loading them into GPRs is unnecessary.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Allows the compiler to optimize the aliasing checks away
and saves 5376B here (GCC 15, -O3).
Also, avoid converting the stride to uint16_t for >8bpp:
stride /= sizeof(pixel) will use an unsigned division
(i.e. a logical right shift)*, which is not what is intended here.
*: If size_t is the corresponding unsigned type to ptrdiff_t
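A small illustration of the pitfall, assuming size_t is the unsigned counterpart of ptrdiff_t as in the footnote. The commit avoids the conversion entirely; the signed cast below is only shown for contrast:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t pixel;

/* stride is converted to size_t before dividing, so a negative
 * stride becomes a large positive value. */
static ptrdiff_t stride_in_pixels_unsigned(ptrdiff_t stride)
{
    stride /= sizeof(pixel);
    return stride;
}

/* Dividing by a signed divisor keeps the sign intact. */
static ptrdiff_t stride_in_pixels_signed(ptrdiff_t stride)
{
    return stride / (ptrdiff_t)sizeof(pixel);
}
```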
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
There are no implementations of yuv2nv12cX clobbering the fpu state,
so make the test stricter to ensure that it stays that way.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is not needed (there are no MMX functions here) and
given that there is no emms_c() cleaning up after convert_unscaled,
convert_unscaled must not clobber the fpu state.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
There is an x86 implementation using MMX registers, but it actually
issues emms on its own (since 57a29f2e7d).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The last MMX(EXT) convert_unscaled functions have been removed
in 61e851381f. And anyway, there
is no emms_c cleaning up after these functions, so they must not
clobber the fpu state; that they did it at the time this checkasm
test has been added was a bug introduced by
e934194b6a and fixed by the removal
of said MMX(EXT) functions.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Fix the following issues with the keep option:
- Add similarity check during keep period. Previously, the code
returned early during the keep period without checking if the
frame is actually similar to the reference.
- Reset keep_count on different frames. Previously, the counter
could accumulate across non-consecutive similar frames, causing
frames to be dropped earlier than expected.
- Keep the same frame reference if appropriate. Previously, the
code made similar frames the new reference, causing reference
drift and gradual scene changes.
Signed-off-by: Dana Feng <danaf@twosigma.com>
Test the integer math utility functions: av_gcd, av_rescale,
av_rescale_rnd (all rounding modes including PASS_MINMAX),
av_rescale_q, av_compare_ts, av_compare_mod, av_rescale_delta,
and av_add_stable. Includes large-value tests that exercise the
128-bit multiply path in av_rescale_rnd.
av_bessel_i0 is not tested since it uses floating point math
that is not bitexact across platforms.
Coverage for libavutil/mathematics.c: 0.00% -> 82.03%
Remaining uncovered lines are av_bessel_i0 (float, 23 lines)
and one edge case fallback in av_rescale_delta.
Test all public API functions: name/format round-trip lookups,
bytes_per_sample, is_planar, packed/planar conversions,
alt_sample_fmt, get_sample_fmt_string, samples_get_buffer_size,
samples_alloc, samples_alloc_array_and_samples, samples_copy,
and samples_set_silence. OOM error paths are exercised via
av_max_alloc().
Coverage for libavutil/samplefmt.c: 0.00% -> 95.28%
Remaining uncovered lines are the fill_arrays failure path
and the overlapping memmove branch in samples_copy.
Test the three public API functions: av_rc4_alloc, av_rc4_init,
and av_rc4_crypt. Verifies keystream output against RFC 6229
test vectors for 40, 56, 64, and 128-bit keys, encrypt/decrypt
round-trip, inplace operation, and the invalid key_bits error path.
Coverage for libavutil/rc4.c: 0.00% -> 100.00%
Prior to this, the results were not saturated into the uchar/ushort range before
being written. The characteristics of the Lanczos filter exposed this issue.
In addition, the results were truncated rather than rounded, which resulted
in checkerboard artifacts in solid color areas and were noticeable when
using Lanczos with 8-bit input.
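As a rough illustration in plain C rather than the CUDA kernel code, saturating and rounding a filter result into the 8-bit range could look like the following; the function name is hypothetical:

```c
#include <stdint.h>

/* Clamp to [0, 255] first, then round to nearest instead of truncating. */
static uint8_t saturate_round_u8(float v)
{
    if (!(v > 0.0f))        /* also catches NaN */
        return 0;
    if (v > 255.0f)
        return 255;
    return (uint8_t)(v + 0.5f);
}
```

Filters with negative lobes such as Lanczos can overshoot below 0 or above 255, which is why the clamp matters there.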
Example:
ffmpeg -init_hw_device cuda -f lavfi -i testsrc2=s=960x540,format=yuv420p \
-vf hwupload,scale_cuda=format=yuv420p:w=-2:h=720:interp_algo=lanczos \
-c:v h264_nvenc -qp:v 20 -t 1 <OUTPUT>
Fix #20784
Signed-off-by: nyanmisaka <nst799610810@gmail.com>
The swscale internals currently have a quirk which causes the memcpy
backend to be called when the pixfmts match. Obviously, this doesn't do
what is expected, as hardware frames cannot just be copied.
Check for this.
Sponsored-by: Sovereign Tech Fund
swscale gets runtime-defined assembly once again!
This commit splits the Vulkan backend into two, SPIR-V and GLSL,
enabling falling back onto the GLSL implementation if an instruction
is unavailable, or simply for testing.
Sponsored-by: Sovereign Tech Fund
This commit adds a SPIR-V assembler header file. It was partially generated
from the SPIR-V header file JSON definition, then edited by hand to template
and reduce its size as much as possible.
It only implements the essentials required for SPIR-V assembly that swscale
requires.
Sponsored-by: Sovereign Tech Fund
Uniform buffers are much simpler to index, and require no work from
the driver compiler to optimize.
In SPIR-V, large 2D shader constants can be spilled into scratch memory,
since you need to create a function variable to index them during runtime.
Sponsored-by: Sovereign Tech Fund
The issue is that very often, hardware has limited support for BGRA
formats.
As this is a limitation of Vulkan itself, we cannot work around this
in a compatible way.
Sponsored-by: Sovereign Tech Fund
FFmpeg has had an issue with GLSL compilation libraries since they
were first merged 6 years ago. The libraries don't have a stable ABI,
are very difficult for packagers to compile and integrate, are slow,
not threadsafe, and uncomfortable to use. The decision to switch all
Vulkan code to either compile-time GLSL or SPIR-V assembly was taken
in January. Since then, and including the release of FFmpeg 8.1,
work has been steadily eliminating the remaining runtime GLSL
compilation.
Sponsored-by: Sovereign Tech Fund
The main issue is that BGR formats only semi-exist in Vulkan. Unlike all
other formats, they require the user to manually remap the pixel order, and
are also forbidden from being written to without a format in shaders. The main
reason for this was conservative - Vulkan is supposed to work everywhere, including
platforms where there is no write-time remapping/swizzling support.
Sponsored-by: Sovereign Tech Fund
The issue is that with multiplane images, or packed images,
there may be some mismatching between what .elems has, and what
we need.
Descriptors are cheap, so just always reserve 4.
Sponsored-by: Sovereign Tech Fund
The issue is that the main Vulkan context is shared between possibly
multiple shaders, and registering a new shader requires allocating
descriptors.
Sponsored-by: Sovereign Tech Fund
Multiple demuxers call avio_read() without checking its return
value. When input is truncated, destination buffers remain
uninitialized but are still used for offset calculations, memcmp,
and metadata handling. This results in undefined behavior
(detectable with Valgrind/MSan).
Fix this by checking the return value of avio_read() in:
- dss.c: dss_read_seek() — check before using header buffer
- dtshddec.c: FILEINFO chunk — check before using value buffer
- mlvdec.c: check_file_header() — check before memcmp on version
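The pattern behind these fixes can be sketched without any FFmpeg
dependency. MemReader, mem_read and check_magic below are hypothetical
stand-ins (mem_read only mimics the short-read behaviour of a read
call, it is not avio_read()), and the -EIO error code is an arbitrary
choice for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory input standing in for an I/O context. */
typedef struct {
    const unsigned char *data;
    size_t size, pos;
} MemReader;

/* Returns the number of bytes actually copied, which may be less than
 * requested when the input is truncated. */
int mem_read(MemReader *r, unsigned char *buf, int n)
{
    assert(n >= 0);
    size_t left = r->size - r->pos;
    size_t got  = (size_t)n < left ? (size_t)n : left;
    memcpy(buf, r->data + r->pos, got);
    r->pos += got;
    return (int)got;
}

/* Returns 1 on match, 0 on mismatch, negative on truncated input.
 * The buffer is only inspected once it is known to be fully filled. */
int check_magic(MemReader *r, const char *magic, int len)
{
    unsigned char buf[16];
    assert(len > 0 && len <= (int)sizeof(buf));
    int ret = mem_read(r, buf, len);
    if (ret != len)
        return -EIO; /* short read: buf is only partly initialized */
    return memcmp(buf, magic, (size_t)len) == 0;
}
```

Without the ret != len check, the memcmp would read bytes of buf that
were never written, which is the undefined behaviour described above.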
Fixes: #21520
This fixes spurious warnings when link/lld-link is invoked by clang:
lld-link: warning: ignoring unknown argument '--as-needed'
lld-link: warning: ignoring unknown argument '-rpath-link=:libswresample:libswscale:libavfilter:libavdevice:libavformat:libavcodec:libavutil'
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Fixes host binaries compilation on platforms without math lib.
Fixes clang host compilation, which replaces `-lm` with `m.lib` that
does not exist:
LINK : fatal error LNK1181: cannot open input file 'm.lib'
clang: error: linker command failed with exit code 1181 (use -v to see invocation)
Fixes MSVC (cl) host warning:
cl : Command line warning D9002 : ignoring unknown option '-lm'
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
This uses LLVM tools. The `clang-*` toolchain is kept mostly for backward
compatibility, although it doesn't use LLVM tools, only clang. It also
exists to enable sanitizers, whereas the `llvm` toolchain can be used
without a sanitizer suffix.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
handle_rtx_packet() constructs an RTX packet by shifting the payload
of a history entry to insert the original sequence number. It uses
memmove with length (ori_size - 12), but never checks that ori_size
is at least 12 bytes (the minimum RTP header size).
Zero-initialized history slots have seq == 0 and size == 0.
rtp_history_find() only compares sequence numbers, so an RTCP NACK
requesting seq 0 early in a session matches such a slot. The
subtraction then wraps to a huge value when converted to size_t,
causing a stack buffer overflow in memmove().
Add a small size check to reject history entries smaller than a
valid RTP header before any arithmetic on their size.
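A minimal sketch of the guarded construction, assuming the usual RTX
layout where the 2-byte original sequence number follows the 12-byte
RTP header. build_rtx_sketch and its parameters are hypothetical and
do not mirror the actual handle_rtx_packet() code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RTP_HEADER_SIZE 12 /* minimum RTP header size */

/* Hypothetical helper: copies the header, inserts the original sequence
 * number, then appends the payload. Returns the new size or -1. */
int build_rtx_sketch(uint8_t *dst, size_t dst_size,
                     const uint8_t *ori, int ori_size, uint16_t ori_seq)
{
    assert(dst && ori);
    /* Without this check, ori_size - 12 is negative for an empty history
     * slot and wraps to a huge value once converted to size_t. */
    if (ori_size < RTP_HEADER_SIZE)
        return -1;
    if ((size_t)ori_size + 2 > dst_size)
        return -1;
    memcpy(dst, ori, RTP_HEADER_SIZE);
    dst[RTP_HEADER_SIZE]     = (uint8_t)(ori_seq >> 8);
    dst[RTP_HEADER_SIZE + 1] = (uint8_t)(ori_seq & 0xff);
    memcpy(dst + RTP_HEADER_SIZE + 2, ori + RTP_HEADER_SIZE,
           (size_t)(ori_size - RTP_HEADER_SIZE));
    return ori_size + 2;
}
```

With ori_size == 0, the unchecked length would be (size_t)(0 - 12),
i.e. SIZE_MAX - 11, which is what made the original copy overflow.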
Found-by: Pwno
It was a bit clunky, lacked semantic contextual information, and made it
harder to reason about the effects of extending this struct. There should be
zero runtime overhead, since this is already a big
union.
I made the changes in this commit by hand, but due to the length and noise
level of the commit, I used Opus 4.6 to verify that I did not accidentally
introduce any bugs or typos.
Signed-off-by: Niklas Haas <git@haasn.dev>
This has the side benefit of not relying on the q2pixel macro to avoid division
by zero, since we can now explicitly avoid operating on undefined clear values.
Signed-off-by: Niklas Haas <git@haasn.dev>
Apple VideoToolbox is the dominant producer of hevc-alpha videos, but
early versions generate non-standard VPS extensions that fail to
parse and return AVERROR_INVALIDDATA. Fix this by returning
AVERROR_PATCHWELCOME instead of AVERROR_INVALIDDATA for unsupported
VPS extension configurations. Also set poc_lsb_not_present for the
alpha layer in the fallback path when it has no direct dependency
on the base layer, so that IDR slices on the alpha layer won't
incorrectly read pic_order_cnt_lsb.
Fix #22384
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
ff_frame_new_side_data() may set sd to NULL and return 0 when
side_data_pref() determines that existing side data should be
preferred.
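The caller-side consequence can be sketched with a mock that has the
same contract. SideDataSketch, new_side_data_sketch and
fill_side_data_sketch are hypothetical names for illustration; only
the "returns 0 but leaves the pointer NULL" behaviour is taken from
the description above.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int payload;
} SideDataSketch; /* hypothetical stand-in for side data */

/* Mock with the contract described above: success is reported with 0,
 * but *psd is NULL when existing side data is preferred. */
int new_side_data_sketch(int prefer_existing, SideDataSketch *storage,
                         SideDataSketch **psd)
{
    assert(storage && psd);
    *psd = prefer_existing ? NULL : storage;
    return 0;
}

/* Caller pattern: check the return code and the pointer separately.
 * Returns 1 if new data was written, 0 if skipped, negative on error. */
int fill_side_data_sketch(int prefer_existing, SideDataSketch *storage)
{
    SideDataSketch *sd;
    int ret = new_side_data_sketch(prefer_existing, storage, &sd);
    if (ret < 0)
        return ret;
    if (!sd) /* success, but there is nothing to fill in */
        return 0;
    sd->payload = 42;
    return 1;
}
```

Checking only ret and then dereferencing sd would crash in the
"existing side data preferred" case.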
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
ff_frame_new_side_data() may set sd to NULL and return 0 when
side_data_pref() determines that existing side data should be
preferred.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
libvidstab's vsTransformPrepare() takes different internal code paths
for in-place (src == dest) vs. separate-buffer operation. The
separate-buffer path stores a shallow copy of the source frame pointer
in td->src without allocating internal memory (srcMalloced stays 0).
When a subsequent frame takes the in-place path, vsFrameIsNull(&td->src)
is false so vsFrameAllocate() is skipped, and vsFrameCopy() writes into
the stale pointer left over from the previous frame, corrupting memory
that the caller no longer owns.
Whether a given frame is writable depends on pipeline scheduling and
frame reference management, which can change between FFmpeg versions.
Since FFmpeg 8.1, changes in the scheduler caused some frames to arrive
as non-writable, leading to alternation between in-place and
separate-buffer paths that triggered the bug.
Fix this by marking the input pad with AVFILTERPAD_FLAG_NEEDS_WRITABLE.
Fix #22595
We currently don't have any cases where this is needed, but include
it for completeness and clarity.
These macros for BTI were added in
08b4716a9e.
A later comment in this file, added in
248986a0db, referenced the macro
AARCH64_VALID_JUMP_CALL_TARGET, which was never added here.
Unit test covering av_video_enc_params_alloc,
av_video_enc_params_block, and
av_video_enc_params_create_side_data.
Tests allocation for all three codec types (VP9, H264, MPEG2) and
the NONE type, with 0 and 4 blocks, with and without size output.
Verifies block getter indexing by writing and reading back
coordinates, dimensions, and delta_qp values. Tests frame-level qp
and delta_qp fields, and side data creation with frame attachment.
Coverage for libavutil/video_enc_params.c: 0.00% -> 86.21%
(remaining uncovered lines are OOM error paths)
Signed-off-by: marcos ashton <marcosashiglesias@gmail.com>
Unit test covering av_detection_bbox_alloc, av_get_detection_bbox,
and av_detection_bbox_create_side_data.
Tests allocation with 0, 1, and 4 bounding boxes, with and without
size output. Verifies bbox getter indexing by writing and reading
back coordinates, labels, and confidence values. Tests classify
fields (labels and confidences), the header source field, and
side data creation with frame attachment.
Coverage for libavutil/detection_bbox.c: 0.00% -> 86.67%
(remaining uncovered lines are OOM error paths)
Signed-off-by: marcos ashton <marcosashiglesias@gmail.com>
Unit test covering all 4 public API functions in libavutil/spherical.c:
av_spherical_alloc, av_spherical_projection_name, av_spherical_from_name,
and av_spherical_tile_bounds.
Tests allocation with and without size output, all 7 projection type
name lookups, projection name round-trip verification, out-of-range
handling, and tile bounds computation for full-frame, quarter-tile,
and centered-tile configurations.
Coverage for libavutil/spherical.c: 0.00% -> 100.00%
Signed-off-by: marcos ashton <marcosashiglesias@gmail.com>
It is only needed in the unlikely codepath. The ordinary one
only uses six xmm registers.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Only the process functions are entered via an indirect _call_ from C.
The kernel functions and process_return are dispatched to by indirect
_branches_ instead (continuation-passing style design).
Make use of the recently added "jumpable" parameter to the function
macro in libavutil/aarch64/asm.S to fix these functions when BTI is
enabled.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
The function macro emits AARCH64_VALID_CALL_TARGET for exported symbols,
marking them as valid destinations for indirect _calls_. Functions that
are reached by indirect _branches_ (i.e. tail-call dispatch chains
where the link register is not set) require AARCH64_VALID_JUMP_TARGET
instead.
This commit adds a "jumpable" parameter to the function macro that, when
set, emits AARCH64_VALID_JUMP_TARGET instead of AARCH64_VALID_CALL_TARGET.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
Using AMF interfaces in C can be cumbersome and visually difficult to process in some cases, e.g. object->function(object, args). To improve code readability, a new macro is added. This commit is instrumental for future AMF integration refactoring.
-vf_vpp_amf.c: Remove unused variables.
-vf_amf_common.c: Fix hdrmeta_buffer memory leak.
-hwcontext_amf.c: Fix av_amf_extract_hdr_metadata not picking up light metadata if display mastering metadata is not set.
-doc/filters.texi: Remove irrelevant example with HDR metadata for vpp_amf.
The use of the code section (.text) was forced by the unreleased NASM
3.02rc3, which made the issue worse by preventing assembling anything
without a code section, including when only data was present.
This works fine for the most part, but using the code (.text) section with
IMAGE_COMDAT_SELECT_ANY causes issues with lib.exe after stripping such
an object:
fatal error LNK1143: invalid or corrupt file: no symbol for COMDAT section 0x2
Essentially, it makes our workaround not work in all cases, and while
stripping could be disabled like it already is for MSVC/ICL builds, it used
to work, so let's preserve that state.
This makes it incompatible with NASM 3.02rc3 when CV debug info is
generated, but hopefully the upstream fix will be merged before release,
to avoid this regression:
https://github.com/netwide-assembler/nasm/pull/221
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Add NEON-optimized implementation for HEVC intra Planar prediction at
8-bit depth, supporting all block sizes (4x4 to 32x32).
Planar prediction implements bilinear interpolation using an incremental
base update: base_{y+1}[x] = base_y[x] - (top[x] - left[N]), reducing
per-row computation from 4 multiply-adds to 1 subtract + 1 multiply.
Uses rshrn for rounded narrowing shifts, eliminating manual rounding
bias. All left[y] values are broadcast in the NEON domain, avoiding
GP-to-NEON transfers.
4x4 interleaves row computations across 4 rows to break dependencies.
16x16 uses v19-v22 for persistent base/decrement vectors, avoiding
callee-saved register spills. 32x32 processes 8 rows per loop iteration
(4 iterations total) to reduce code size while maintaining full NEON
utilization.
Speedup over C on Apple M4 (checkasm --bench):
4x4: 2.25x 8x8: 6.40x 16x16: 9.72x 32x32: 3.21x
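The incremental base update can be checked against the direct formula
in scalar C. This is only a sketch of the arithmetic, not the NEON
code; the function names are hypothetical, and top[]/left[] are
assumed to hold N + 1 reference samples each, with index N being the
top-right and bottom-left sample respectively. Folding the constant
(x + 1) * top[N] term and the rounding bias into the base is an
assumption of this sketch that leaves one multiply and one subtract
per output sample.

```c
#include <assert.h>
#include <stdint.h>

/* Direct form of 8-bit planar prediction for an N x N block. */
void planar_direct(uint8_t *dst, int stride, const uint8_t *top,
                   const uint8_t *left, int log2_size)
{
    int n = 1 << log2_size;
    for (int y = 0; y < n; y++)
        for (int x = 0; x < n; x++)
            dst[y * stride + x] =
                (uint8_t)(((n - 1 - x) * left[y] + (x + 1) * top[n] +
                           (n - 1 - y) * top[x]  + (y + 1) * left[n] + n)
                          >> (log2_size + 1));
}

/* Same result with base_{y+1}[x] = base_y[x] - (top[x] - left[N]). */
void planar_incremental(uint8_t *dst, int stride, const uint8_t *top,
                        const uint8_t *left, int log2_size)
{
    int n = 1 << log2_size;
    int base[32];

    assert(log2_size >= 2 && log2_size <= 5);
    /* Row 0 vertical term, plus the per-column constant and rounding. */
    for (int x = 0; x < n; x++)
        base[x] = (n - 1) * top[x] + left[n] + (x + 1) * top[n] + n;
    for (int y = 0; y < n; y++) {
        for (int x = 0; x < n; x++) {
            dst[y * stride + x] =
                (uint8_t)((base[x] + (n - 1 - x) * left[y])
                          >> (log2_size + 1));
            base[x] -= top[x] - left[n];
        }
    }
}
```

Both functions produce identical blocks for all four sizes, since the
vertical term (N-1-y)*top[x] + (y+1)*left[N] changes by exactly
left[N] - top[x] from one row to the next.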
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Add NEON-optimized implementation for HEVC intra DC prediction at 8-bit
depth, supporting all block sizes (4x4 to 32x32).
DC prediction computes the average of top and left reference samples
using uaddlv, with urshr for rounded division. For luma blocks smaller
than 32x32, edge smoothing is applied: the first row and column are
blended toward the reference using (ref[i] + 3*dc + 2) >> 2 computed
entirely in the NEON domain. Fill stores use pre-computed address
patterns to break dependency chains.
Also adds the aarch64 initialization framework (Makefile, pred.c/pred.h
hooks, hevcpred_init_aarch64.c).
Speedup over C on Apple M4 (checkasm --bench):
4x4: 2.28x 8x8: 3.14x 16x16: 3.29x 32x32: 3.02x
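As a scalar reference for the arithmetic described above, a sketch in
C follows. dc_pred_sketch is a hypothetical name and this is not the
NEON code. The corner sample formula is not stated in this message; it
is taken from my reading of the HEVC specification and should be
treated as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* 8-bit DC prediction for an N x N block, N = 1 << log2_size. */
void dc_pred_sketch(uint8_t *dst, int stride, const uint8_t *top,
                    const uint8_t *left, int log2_size, int is_luma)
{
    int n   = 1 << log2_size;
    int sum = n; /* rounding term for the division by 2N */

    assert(log2_size >= 2 && log2_size <= 5);
    for (int i = 0; i < n; i++)
        sum += top[i] + left[i];
    int dc = sum >> (log2_size + 1);

    for (int y = 0; y < n; y++)
        for (int x = 0; x < n; x++)
            dst[y * stride + x] = (uint8_t)dc;

    /* Edge smoothing for luma blocks smaller than 32x32. */
    if (is_luma && n < 32) {
        dst[0] = (uint8_t)((left[0] + 2 * dc + top[0] + 2) >> 2);
        for (int x = 1; x < n; x++)
            dst[x] = (uint8_t)((top[x] + 3 * dc + 2) >> 2);
        for (int y = 1; y < n; y++)
            dst[y * stride] = (uint8_t)((left[y] + 3 * dc + 2) >> 2);
    }
}
```

The first row and column are pulled a quarter of the way toward the
neighbouring reference samples; the rest of the block is a flat fill.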
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Add checkasm test for HEVC intra prediction covering DC, planar, and
angular modes at all block sizes (4x4 to 32x32) for 8-bit and 10-bit
depth.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
In case of >8bpp, there is already a zero register available
(for clipping); in case of Unix64, one can simply use an
unused register. Doing so reduces codesize.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Avoids push+pop on Win64; in any case, using registers m0-m7
more often saves codesize.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Avoids push+pop on Win64; in any case, using registers m0-m7
more often saves codesize.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Also use a register in the 0-7 range as clobber reg,
as this reduces codesize (by 51B).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The height 8 and 16 cases differ from the second BDOF mini block onwards,
but even the beginning of said mini block is the same and can therefore
be deduplicated. This saves 821B here.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
m8 here (corresponding to a mix of sgx2 and sgy2 in derive_bdof_vx_vy
in the C version) is always nonnegative, so the psignd boils down to
a check for m8 being zero. But if an entry of m8 is zero, then
the corresponding entry of m9 is automatically zero, too, as sgx2
being zero implies sgxdi being zero and sgy2 being zero implies sgxgy and sgydi
being zero.* So just remove these redundant instructions.
*: In other words, one could remove the sgx2,sgy2>0 checks from
the end of derive_bdof_vx_vy() as long as av_log2(0) is defined.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This commit pieces together the previous few commits to implement the
NEON backend for sws_ops.
In essence, a tool which runs on the target (sws_ops_aarch64) is used
to enumerate all the functions that the backend needs to implement. The
list it generates is stored in the repository (ops_entries.c).
The list from above is used at build time by a code generator tool
(ops_asmgen) to implement all the sws_ops functions the NEON backend
supports, and generate a lookup function in C to retrieve the assembly
function pointers.
At runtime, the NEON backend fetches the function pointers to the
assembly functions and chains them together in a continuation-passing
style design, similar to the x86 backend.
The following speedup is observed from legacy swscale to NEON:
A520: Overall speedup=3.780x faster, min=0.137x max=91.928x
A720: Overall speedup=4.129x faster, min=0.234x max=92.424x
And the following from the C sws_ops implementation to NEON:
A520: Overall speedup=5.513x faster, min=0.927x max=14.169x
A720: Overall speedup=4.786x faster, min=0.585x max=20.157x
The slowdowns from legacy to NEON are the same for C/x86. Mostly low
bit-depth conversions that did not perform dithering in legacy.
The 0.585x outlier from C to NEON is gbrpf32le -> gbrapf32le, which is
mostly memcpy with the C implementation. All other conversions are
better.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
The NEON sws_ops backend follows the same continuation-passing style
design as the x86 backend.
Unlike the C and x86 backends, which implement the various operation
functions through the use of templates and preprocessor macros, the
NEON backend uses a build-time code generator, which is introduced by
this commit.
This code generator has two modes of operation:
-ops:
Generates an assembly file in GNU assembler syntax targeting AArch64,
which implements all the sws_ops functions the NEON backend supports.
-lookup:
Generates a C function with a hierarchical condition chain that
returns the pointer to one of the functions generated above, based on
a given set of parameters derived from SwsOp.
This is the core of the NEON sws_ops backend.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
The runtime assembler interface provides an instruction-level IR and
builder API tailored to the needs of the swscale dynamic pipeline.
It is not meant to be a general purpose assembler interface.
Currently only a static file backend, which emits GNU assembler text,
has been implemented. In the future, this interface will be used to
write functions dynamically at runtime.
This code will be compiled both for runtime usage to generate optimized
functions and for build-time usage to generate static assembly files.
Therefore, it must not depend on internal FFmpeg libraries.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
The NEON sws_ops backend will use a build-time code generator for the
various operation functions it needs to implement. This build time code
generator (ops_asmgen) will need a list of the operations that must be
implemented. This commit adds a tool (sws_ops_aarch64) that generates
such a list (ops_entries.c).
The list is generated by iterating over all possible conversion
combinations and collecting the parameters for each NEON assembly
function that has to be implemented, defined by a unique set of
parameters derived from SwsOp. Whenever swscale evolves, with improved
optimization passes, new pixel formats, or improvements to the backend
itself, this file (ops_entries.c) should be regenerated by running:
$ make sws_ops_entries_aarch64
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
This is needed to cover the case when the assembled source doesn't have
a .text section. The NASM documentation suggests adding a $ suffix to the
section name for COMDAT in .text, but this actually requires the main .text
section to exist as well. Also use a less generic suffix for our dummy
sub-section.
Third time's the charm.
Fixes: 80cd067715
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
The existing fate-lavf-yuv420p.y4m covers only the default format.
Add four entries that pass -pix_fmt explicitly to the lavf_video
macro: yuv422p, yuv444p, yuv411p, and gray.
These exercise the branches in yuv4mpegpipe_write_header() that write
the "C422", "C444", "C411", and "Cmono" chroma descriptor strings in
the stream header. All four are gated on ENCDEC(RAWVIDEO,YUV4MPEGPIPE)
and added to FATE_LAVF_VIDEO_SCALE so they inherit the requirement for
CONFIG_SCALE_FILTER that lavf_video's -auto_conversion_filters needs.
Reference files were generated from the actual encoder output and
follow the md5+size+CRC format used by the other lavf references.
Signed-off-by: Soham Kute <officialsohamkute@gmail.com>
Add tests/api/api-enc-parser-test.c, a generic encoder+parser round-trip
test that takes codec_name, width, and height on the command line
(defaults: h261 176 144).
Three cases are tested:
garbage - a single av_parser_parse2() call on 8 bytes with no Picture
Start Code; verifies out_size == 0 so the parser emits no spurious data.
bulk - encodes 2 frames, concatenates the raw packets, feeds the whole
buffer to a fresh parser in one call, then flushes. Verifies that
exactly 2 non-empty frames come out and that the parser found the PSC
boundary between them.
split - the same buffer fed in two halves (chunk boundary falls inside
frame 0). Verifies the parser still emits exactly 2 frames when input
arrives incrementally, and that the collected bytes are identical to
the bulk output (checked with memcmp).
Implementation notes: avcodec_get_supported_config() selects the pixel
format; chroma height uses AV_CEIL_RSHIFT with log2_chroma_h from
AVPixFmtDescriptor; data[1] and data[2] are checked independently so
semi-planar formats work; the encoded buffer is given
AV_INPUT_BUFFER_PADDING_SIZE zero bytes at the end; parse_stream()
skips the fed chunk if consumed==0 to prevent an infinite loop.
Two FATE entries in tests/fate/api.mak: QCIF (176x144) and CIF
(352x288), both standard H.261 resolutions.
Signed-off-by: Soham Kute <officialsohamkute@gmail.com>
The original test only mapped the source file and printed its content,
exercising none of the error branches in av_file_map().
Replace it with a test that maps a real file (path via argv[1] for
out-of-tree builds) and verifies it is non-empty, then calls
av_file_map() on a nonexistent file twice: once with log_offset=0 to
confirm the error is logged at AV_LOG_ERROR, and once with log_offset=1
to confirm the level is raised by one, covering the
log_level_offset_offset path in av_vlog(). A custom av_log callback
captures the emitted level independently of the global log level.
The two error cases share a single for() loop to avoid duplication.
Add a FATE entry in tests/fate/libavutil.mak with CMP=null since
there is no fixed stdout to compare.
Signed-off-by: Soham Kute <officialsohamkute@gmail.com>
This is consistent with the pattern used in other files. It is also needed
for the next commit to always include x86util.asm.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Fix the default value of mpegts_original_network_id from 0x0001 to
0xff01 to match the actual code (DVB_PRIVATE_NETWORK_START).
Add the missing hevc_digital_hdtv service type to the
mpegts_service_type option list.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
While "cc + 1 & 0xf" is technically correct because addition has
higher precedence than bitwise AND in C, the intent of "(cc + 1) & 0xf"
is not immediately obvious without recalling the precedence table.
Add explicit parentheses to make the intended evaluation order clear
and improve readability.
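A small illustration of the two groupings, assuming cc is a 4-bit
counter. The function names are hypothetical. next_cc_misread shows
what the expression would compute if '&' bound tighter than '+', which
is the misreading the explicit parentheses rule out.

```c
#include <assert.h>

/* Advance a 4-bit counter with explicit grouping; wraps 15 -> 0. */
int next_cc(int cc)
{
    assert(cc >= 0 && cc <= 0xf);
    return (cc + 1) & 0xf;
}

/* The grouping a reader might wrongly assume; this does not wrap. */
int next_cc_misread(int cc)
{
    assert(cc >= 0 && cc <= 0xf);
    return cc + (1 & 0xf);
}
```

For cc == 15 the first form yields 0 and the second yields 16, so the
difference only shows up at the wrap-around point.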
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Instead of this needlessly complicated dance of allocating on-stack copies
of SwsOpList only to iterate with AVERROR(EAGAIN).
This was originally thought to be useful for compiling multiple ops at once,
but even that can be solved in easier ways.
Signed-off-by: Niklas Haas <git@haasn.dev>
This is now fully redundant with the previous op's output, because unused
components are always marked as garbage on the input side.
Signed-off-by: Niklas Haas <git@haasn.dev>
Needed for the upcoming removal of op->comps.unused[]. This keeps the
dependency array entirely within the ff_sws_op_list_update_comps() function,
apart from being arguably simpler and easier to follow.
Signed-off-by: Niklas Haas <git@haasn.dev>
Just define these directly as integer arrays; there's really no point in
having them re-use SwsSwizzleOp; the only place this was ever even remotely
relevant was in the no-op check, which any decent compiler should already
be capable of optimizing into a single 32-bit comparison.
Signed-off-by: Niklas Haas <git@haasn.dev>
"Reconfiguring filter graph because video parameters changed to yuv420p10le(pc, bt709), 1920x1080, unspecified alph"
Fixup f07573f
Adding a missing space fixed this.
For pre-AVX2, vpbroadcastw is emulated via a load, followed
by two shuffles. Yet given that one always wants to splat
multiple pairs of coefficients which are adjacent in memory,
one can do better than that: Load all of them at once, perform
a punpcklwd with itself and use one pshufd per register.
In case one has to sign-extend the coefficients, too,
one can replace the punpcklwd with one pmovsxbw (instead of one
per register) and use pshufd directly afterwards.
This saved 4816B of .text here.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
8 tap motion compensation functions with both vertical and horizontal
components are under severe register pressure, so that the filter
coefficients have to be put on the stack. Before this commit,
this meant that coefficients for use with pmaddubsw and pmaddwd
were always created. Yet this is completely unnecessary, as
every such register is only used for exactly one purpose and
it is known at compile time which one it is (only 8bit horizontal
filters are used with pmaddubsw), so only prepare that one.
This also allows halving the amount of stack used.
This saves 2432B of .text here.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It has already been checked before that we are only dealing
with high bitdepth here.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Since ba793127c4,
the x86 mpeg4videodsp code uses ff_emulated_edge_mc_sse2()
instead of ff_emulated_edge_mc_8. This leads to linker errors
when x86asm is disabled. Fix this by also falling back to ff_gmc_c()
in case edge emulation is needed with external SSE2 being unavailable.
An alternative is to go back to ff_emulated_edge_mc_8(), but this
would re-add the ugliness to videodsp for a niche case.
Reported-by: James Almer <jamrial@gmail.com>
Reviewed-by: Hendrik Leppkes <h.leppkes@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Some faulty files have an LCEVC descriptor with a single stream, resulting in
a group being created but never fully populated with the current
implementation.
Signed-off-by: James Almer <jamrial@gmail.com>
Leave the existing one for non decoder-specific, post processing usage.
With this, scenarios like nvdec decoding can work alongside lcevc enhancement application.
Signed-off-by: James Almer <jamrial@gmail.com>
> packed = load all components from a single plane (the index given by order_src[0])
> planar = load one component each from separate planes (the index given by order_src[i])
Sponsored-by: Sovereign Tech Fund
The question of whether to do vertical or horizontal scaling first is a tricky
one. There are several valid philosophies:
1. Prefer horizontal scaling on the smaller pixel size, since this lowers the
cost of gather-based kernels.
2. Prefer minimizing the number of total filter taps, i.e. minimizing the size
of the intermediate image.
3. Prefer minimizing the number of rows horizontal scaling is applied to.
Empirically, I'm still not sure which approach is best overall, and it probably
depends at least a bit on the exact filter kernels in use. But for now, I
opted to implement approach 3, which seems to work well. I will re-evaluate
this once the filter kernels are actually finalized.
The 'scale' in 'libswscale' can now stand for 'scaling'.
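One way to read approach 3 is as a comparison of row counts: a
horizontal-first order runs the horizontal pass over all source rows,
while a vertical-first order runs it over the destination rows only.
The sketch below encodes just that reading. The enum, the function
names and the tie-breaking rule are assumptions for illustration, not
the actual swscale logic.

```c
#include <assert.h>

typedef enum {
    ORDER_H_THEN_V, /* horizontal pass sees src_h rows */
    ORDER_V_THEN_H  /* horizontal pass sees dst_h rows */
} PassOrderSketch;

/* Pick the order that applies horizontal scaling to fewer rows.
 * Ties go to horizontal-first (arbitrary choice for this sketch). */
PassOrderSketch choose_order_sketch(int src_h, int dst_h)
{
    assert(src_h > 0 && dst_h > 0);
    return dst_h < src_h ? ORDER_V_THEN_H : ORDER_H_THEN_V;
}

/* Number of rows the horizontal pass processes for a given order. */
int hscale_rows_sketch(PassOrderSketch order, int src_h, int dst_h)
{
    return order == ORDER_H_THEN_V ? src_h : dst_h;
}
```

Under this reading, vertical downscaling runs the vertical pass first
and vertical upscaling runs the horizontal pass first.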
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Above a certain filter size, we can load the offsets as scalars and loop
over filter taps instead. To avoid having to assemble the output register
in memory (or use some horrific sequence of blends and insertions), we process
4 adjacent pixels at a time and do a 4x4 transpose before accumulating the
weights.
Significantly faster than the existing kernels after 2-3 iterations.
Signed-off-by: Niklas Haas <git@haasn.dev>
This uses a naive gather-based loop, similar to the existing legacy hscale
SIMD. This has provably correct semantics (and avoids overflow as long as
the filter scale is 1 << 14 or so), though it's not particularly fast for
larger filter sizes.
We can specialize this to more efficient implementations in a subset of cases,
but for now, this guarantees a match to the C code.
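For reference, the shape of a gather-style horizontal filter loop in
scalar C. This is a sketch only: the function name, the 8-bit pixel
type, the int16_t weights and the int32_t accumulator are assumptions,
and the real kernels work on swscale's internal types. Weights for
each output pixel are assumed to sum to roughly 1 << 14.

```c
#include <assert.h>
#include <stdint.h>

#define FILTER_SHIFT 14 /* assumed filter scale of 1 << 14 */

/* For each output pixel, gather `taps` input pixels starting at
 * offsets[x] and accumulate them with that pixel's weights. */
void hscale_gather_sketch(uint8_t *dst, int dst_w,
                          const uint8_t *src, int src_w,
                          const int32_t *offsets, const int16_t *weights,
                          int taps)
{
    for (int x = 0; x < dst_w; x++) {
        int32_t acc = 1 << (FILTER_SHIFT - 1); /* rounding bias */
        for (int i = 0; i < taps; i++) {
            int pos = offsets[x] + i;
            assert(pos >= 0 && pos < src_w);
            acc += src[pos] * weights[x * taps + i];
        }
        if (acc < 0) /* negative lobes can undershoot */
            acc = 0;
        acc >>= FILTER_SHIFT;
        dst[x] = (uint8_t)(acc > 255 ? 255 : acc);
    }
}
```

With 8-bit input and weights bounded by the 1 << 14 scale, each
product fits in 22 bits plus sign, so a 32-bit accumulator has ample
headroom for reasonable tap counts. The cost grows linearly with the
number of taps, which is why this form is slow for large filters.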
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Ideally, we would like to be able to specialize these to fixed kernel
sizes as well (e.g. 2 taps), but that only saves a tiny bit of loop overhead
and at the moment I have more pressing things to focus on.
I found that using FMA instead of straight mulps/addps gains about 15%, so
I defined a separate FMA path that can be used when BITEXACT is not specified
(or when we can statically guarantee that the final sum fits into the floating
point range).
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
A singular `imul` per line here is completely irrelevant in terms of
overhead, and definitely not worth whatever precomputation would be
required to avoid it.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Missing ayuv64le -> gray and vyu444 -> gray; these conversions can arise
transiently during scaling.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Sometimes, bxd/yd need to be passed directly to a 64-bit memory operand,
which requires the use of the 64-bit variants. Since we can't guarantee that
the high bits are correctly zero'd on function entry, add an explicit
movsxd instruction to cover the first loop iteration.
Signed-off-by: Niklas Haas <git@haasn.dev>
Slightly more convenient for the calculations inside the filter kernel, and
ultimately not significant due to the fact that the extra register only needs
to be saved on the loop entrypoint.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Allows implementations to implement more advanced logic to determine if an
operation is compatible or not.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This is an expected consequence of the fact that the new ops code does not
yet do error diffusion, which only really affects formats like rgb4 and monow.
Specifically, this avoids erroring out with the following error:
loss 0.214988 is WORSE by 0.0111071, ref loss 0.203881
SSIM {Y=0.745148 U=1.000000 V=1.000000 A=1.000000}
When scaling monow -> monow from 96x96 to 128x96.
We can remove this hack again in the future when error diffusion is implemented,
but for now, this check prevents me from easily testing the scaling code.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
May allow more efficient implementations that rely on the value range being
constrained.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
These could be implemented as a special case of DECL_READ(), but the
amount of extra noise that entails is not worth it; especially due to the
extra setup/free code that needs to be used here.
I've decided that, for now, the canonical implementation shall convert the
weights to floating point before doing the actual scaling. This is not a huge
efficiency loss (since the result will be 32-bit anyways, and mulps/addps are
1-cycle ops); so the main downside comes from the single extra float conversion
on the input pixels.
In theory, we may revisit this later if it turns out that using e.g. pmaddwd
is a win even for vertical scaling, but for now, this works and is a simple
starting point. Vertical scaling also tends to happen after horizontal scaling,
at which point the input will be F32 already to begin with.
For smaller types/kernels (e.g. U8 input with a reasonably sized kernel),
the result here is exact either way, since the resulting 8+14 bit sum fits
exactly into float.
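The exactness claim follows from float having a 24-bit significand: an
8-bit pixel times a 14-bit weight needs at most 22 bits, so products
and their running sums stay exactly representable as long as the
magnitude stays below 1 << 24. The sketch below compares a float
accumulation against an integer one; the function name and the
int16_t weight type are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if accumulating in float gives exactly the integer sum. */
int float_sum_is_exact(const uint8_t *px, const int16_t *w, int taps)
{
    float   facc = 0.0f;
    int32_t iacc = 0;

    assert(taps > 0);
    for (int i = 0; i < taps; i++) {
        facc += (float)px[i] * (float)w[i];
        iacc += px[i] * w[i];
    }
    return facc == (float)iacc;
}
```

Once intermediate sums can exceed 1 << 24 (wider input types or very
large kernels), this guarantee no longer holds and rounding error can
appear.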
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
PIXEL_MIN is either useless (int) or wrong (float); should be -FLT_MAX
rather than FLT_MIN, if the intent is to capture the most negative possible
value.
Just remove it since we don't actually need it for anything.
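The FLT_MIN pitfall is easy to demonstrate in isolation. The functions
below are hypothetical and unrelated to the removed macro; they only
show that FLT_MIN is the smallest positive normalized float, so it is
the wrong seed when "most negative possible value" is intended.

```c
#include <assert.h>
#include <float.h>

/* Running maximum seeded with the most negative finite float. */
float max_of(const float *v, int n)
{
    assert(n > 0);
    float m = -FLT_MAX;
    for (int i = 0; i < n; i++)
        if (v[i] > m)
            m = v[i];
    return m;
}

/* Same loop seeded with FLT_MIN: wrong for all-negative input, because
 * FLT_MIN is a tiny positive number. */
float max_of_wrong(const float *v, int n)
{
    assert(n > 0);
    float m = FLT_MIN;
    for (int i = 0; i < n; i++)
        if (v[i] > m)
            m = v[i];
    return m;
}
```

For the input { -3, -1, -2 }, max_of returns -1 while max_of_wrong
returns FLT_MIN, a value that is not in the input at all.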
Signed-off-by: Niklas Haas <git@haasn.dev>
First, we try compiling the filter pass as-is; in case any backends decide to
handle the filter as a single pass. (e.g. Vulkan, which will want to compile
such using internal temporary buffers and barriers)
If that fails, retry with a chained list of split passes.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This is cheap to precompute and can be used as-is for gather-style horizontal
filter implementations.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Rather than dispatching the compiled function for each line of the tail
individually, with a memcpy to a shared buffer in between, this instead copies
the entire tail region into a temporary intermediate buffer, processes it with
a single dispatch call, and then copies the entire result back to the
destination.
The main benefit of this is that it enables scaling, subsampling or other
quirky layouts to continue working, which may require accessing lines adjacent
to the main input.
It also arguably makes the code a bit simpler and easier to follow, but YMMV.
One minor consequence of the change in logic is that we also no longer handle
the last row of an unpadded input buffer separately - instead, if *any* row
needs to be padded, *all* rows in the current slice will be padded. This is
a bit less efficient but much more predictable, and as discussed, basically
required for scaling/filtering anyways.
While we could implement some sort of hybrid regime where we only use the new
logic when scaling is needed, I really don't think this would gain us anything
concrete enough to be worth the effort, especially since the performance is
roughly the same across the board:
16 threads:
yuv444p 1920x1080 -> ayuv 1920x1080: speedup=1.000x slower (input memcpy)
rgb24 1920x1080 -> argb 1920x1080: speedup=1.012x faster (output memcpy)
1 thread:
yuv444p 1920x1080 -> ayuv 1920x1080: speedup=1.062x faster (input memcpy)
rgb24 1920x1080 -> argb 1920x1080: speedup=0.959x slower (output memcpy)
Overall speedup is +/- 1% across the board, well within margin of error.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This is more useful for tight loops inside CPU backends, which can implement
this by having a shared path for incrementing to the next line (as normal),
and then a separate path for adding an extra position-dependent, stride
multiplied line offset after each completed line.
As a free upside, this encoding does not require any separate/special handling
for the exec tail.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
And use it to look up the correct source plane line for each destination
line. Needed for vertical scaling, in which case multiple output lines can
reference the same input line.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
We have to move the filters out of the way very early to avoid blocking
SWS_OP_LINEAR fusion, since filters tend to be nested in between all the
decode and encode linear ops.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
An operation list containing multiple filter passes, or containing nontrivial
operations before a filter pass, needs to be split up into multiple execution
steps with temporary buffers in between; at least for CPU backends.
This helper function introduces the necessary subpass splitting logic.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This allows reads to directly embed filter kernels. This is because, in
practice, a filter needs to be combined with a read anyways. To accomplish
this, we define filter ops as their semantic high-level operation types, and
then have the optimizer fuse them with the corresponding read/write ops
(where possible).
Ultimately, something like this will be needed anyways for subsampled formats,
and doing it here is just incredibly clean and beneficial compared to each
of the several alternative designs I explored.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This commit merely adds the definitions. The implementations will follow.
It may seem a bit impractical to have these filter ops given that they
break the usual 1:1 association between operation inputs and outputs, but
the design path I chose will have these filter "pseudo-ops" end up migrating
towards the read/write for CPU implementations. (Which don't benefit from
any ability to hide the intermediate memory internally the way e.g. a fused
Vulkan compute shader might).
What we gain from this design, on the other hand, is considerably cleaner
high-level code, which doesn't need to concern itself with low-level
execution details at all, and can just freely insert these ops wherever
it needs to. The dispatch layer will take care of actually executing these
by implicitly splitting apart subpasses.
To handle out-of-range values and so on, the filters by necessity have to
also convert the pixel range. I have settled on using floating point types
as the canonical intermediate format - not only does this save us from having
to define e.g. I32 as a new intermediate format, but it also allows these
operations to chain naturally into SWS_OP_DITHER, which will basically
always be needed after a filter pass anyways.
The one exception here is for point sampling, which would rather preserve
the input type. I'll worry about this optimization at a later point in time.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
When an op list needs to be decomposed into a more complicated sequence
of passes, the compile() code may need to roll back passes that have already
been partially compiled, if a later pass fails to compile.
This matters for subpass splitting (e.g. for filtering), as well as for
plane splitting.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This is a complete rewrite of the math in swscale/utils.c initFilter(), using
floating point math and with a bit more polished UI and internals. I have
also included a substantial number of improvements, including a method to
numerically compute the true filter support size from the parameters, and a
more robust logic for the edge conditions. The upshot of these changes is
that the filter weight computation is now much simpler and faster, and with
fewer edge cases.
I copy/pasted the actual underlying kernel functions from libplacebo, so this
math is already quite battle-tested. I made some adjustments to the defaults
to align with the existing defaults in libswscale, for backwards compatibility.
Note that this commit introduces a lot more filter kernels than what we
actually expose; but they are cheap to carry around, don't take up binary
space, and will probably save some poor soul from incorrectly reimplementing
them in the future. Plus, I have plans to expand the list of functions down
the line, so it makes sense to just define them all, even if we don't
necessarily use them yet.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This can be used to either manually verify, or perhaps programmatically
generate, the list of operation patterns that need to be supported by a
backend to be feature-complete.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This moves the logic from tests/sws_ops into the library itself, where it
can be reused by e.g. the aarch64 asmgen backend to iterate over all possible
operation types it can expect to see.
Signed-off-by: Niklas Haas <git@haasn.dev>
Annoyingly, access to order_src/dst requires access to the SwsOpList, so
we have to append that data after the fact.
Maybe this is another incremental tick in favor of `SwsReadWriteOp` in the
ever-present question in my head of whether the plane order should go there
or into SwsOpList.
Signed-off-by: Niklas Haas <git@haasn.dev>
Instead of allocating a billion tiny temporary buffers, these helpers now
directly append to an AVBPrint. I decided to explicitly control whether or not
a value with denom 0 should be printed as "inf/nan" or as "_", because a lot
of ops have the implicit semantic of "den == 0 -> ignored". At the same time,
we don't want to obscure legitimate NAN/INF values when they do occur
unintentionally.
Signed-off-by: Niklas Haas <git@haasn.dev>
This commit does not yet touch the PRINTQ macro, but it gets rid of at least
one unnecessary hand-managed buffer.
Signed-off-by: Niklas Haas <git@haasn.dev>
The return value of read_diff_float_data() was previously ignored,
allowing decode to continue silently with partially transformed samples
on malformed floating ALS input. Check and propagate the error.
All failure paths in read_diff_float_data() already return
AVERROR_INVALIDDATA, so the caller fix is sufficient without
any normalization inside the function.
Signed-off-by: Priyanshu Thapliyal <priyanshuthapliyal2005@gmail.com>
Halves the number of pmaddwd and improves performance a lot:
sbc_analyze_4_c: 55.7 ( 1.00x)
sbc_analyze_4_mmx: 7.0 ( 7.94x)
sbc_analyze_4_sse2: 4.3 (12.93x)
sbc_analyze_8_c: 131.1 ( 1.00x)
sbc_analyze_8_mmx: 22.4 ( 5.84x)
sbc_analyze_8_sse2: 10.7 (12.25x)
It also saves 224B of .text and allows removing the emms_c()
from sbcenc.c (notice that ff_sbc_calc_scalefactors_mmx()
issues emms on its own, so it already abides by the ABI).
Hint: A pshufd could be avoided per function if the constants
were reordered.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Check in init whether the parameters are valid.
This can be triggered with
ffmpeg -i tests/data/asynth-44100-2.wav -c sbc -sbc_delay 0.001 \
-b:a 100k -f null -
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
sbc_calc_scalefactors uses an int32_t [16/*max blocks*/][2/*max
channels*/][8/*max subbands*/] array. The MMX version of this code
treats the two inner arrays as one [2*8] array to process
and it processes subbands*channels of them. But when subbands
is < 8 and channels is two, the entries to process are not
contiguous: One has to process 0..subbands-1 and 8..7+subbands,
yet the code processed 0..2*subbands-1.
This commit fixes this by processing entries 0..7+subbands
if there are two channels.
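The index arithmetic can be sketched in plain C as follows. This only models the flat layout of one [2][8] block; the function names are hypothetical and the real fix lives in the assembly.

```c
#define MAX_SUBBANDS 8

/* Flat index of (channel, subband) inside one [2][8] block. */
static int flat_index(int ch, int sb)
{
    return ch * MAX_SUBBANDS + sb;
}

/* Old count: treats the used entries as if they were contiguous. */
static int entries_old(int channels, int subbands)
{
    return channels * subbands;
}

/* New count: walk up to the last used entry of the second channel. */
static int entries_new(int channels, int subbands)
{
    return channels == 2 ? MAX_SUBBANDS + subbands : subbands;
}
```

With two channels and four subbands the second channel starts at flat index 8, which the old count of 8 entries never reaches.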
Before this commit, the following command line triggered an
av_assert2() in put_bits():
ffmpeg_g -i tests/data/asynth-44100-2.wav -c sbc -b:a 200k \
-sbc_delay 0.003 -f null -
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
One buffer is encoder-only, the other decoder-only.
Also move crc_ctx before the buffers (into padding).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Choose the first non-hwaccel format rather than the last one. This
matches the logic in ffmpeg CLI and selects YUVA rather than YUV for
HEVC with alpha.
This filter uses AVFrame.opaque to build a linked list of AVFrames. This
is very wrong, as AVFrame.opaque is intended to store caller's private
data and may not be touched by filters. What's worse, the filter leaks
the opaque values to the outside.
Use an AVFifo instead of a linked list to implement the same logic.
It only tests MMX (me_cmp does not have pure MMX functions any more)
and MMXEXT and is therefore x86-only. Furthermore, checkasm is superior
in every regard.
Removing it also fixes a build failure (there is no dependency of this
tool on me_cmp).
Reviewed-by: Kacper Michajłow <kasper93@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
GET_UTF8 advances the pointer past the newline byte before the
newline check, so shape_text_hb receives text that includes the
newline character. Since HarfBuzz does not treat U+000A as
default-ignorable, it gets shaped into a .notdef glyph.
Fixes #21565
Reported-by: scriptituk <info@scriptit.uk>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
The default NASM selection of debug information formats should cover all
cases nicely. See `nasm -h -F` for the default and supported formats.
This commit allows emitting debug information for macho{32,64} (DWARF)
and win{32,64} (CodeView), where previously only ELF targets would
get debug information.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Otherwise, this will indefinitely persist the last couple of mapped frames
(including any extra decoded frames) in memory, even though they will never be
used again, causing a gradual memory leak until filter uninit.
Signed-off-by: Niklas Haas <git@haasn.dev>
Sponsored-by: nxtedition AB
The expression (exif_len & ~SIZE_MAX) is always 0 for size_t,
making the overflow guard permanently dead code.
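A minimal sketch of the problem, plus one conventional way to write a guard that can actually fire. The second helper and its `extra` parameter are assumptions for illustration, not necessarily what this commit does.

```c
#include <stddef.h>
#include <stdint.h>

/* The dead guard: ~SIZE_MAX is 0 for size_t, so the mask can never be non-zero. */
static int dead_guard(size_t len)
{
    return (len & ~SIZE_MAX) != 0;
}

/* A live guard: check the remaining headroom before adding `extra` bytes. */
static int would_overflow(size_t len, size_t extra)
{
    return len > SIZE_MAX - extra;
}
```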
Reported-by: Guanni Qu <qguanni@gmail.com>
Signed-off-by: Priyanshu Thapliyal <priyanshuthapliyal2005@gmail.com>
The cosmetic-changes policy in developer.texi was written during the SVN
era, when reviewing indentation changes mixed with functional changes
was genuinely difficult.
Since FFmpeg has moved to Git, reviewers now have simple built-in tools
to ignore whitespace changes:
git diff -w
git log -p --ignore-all-space
Forgejo's pull request UI also offers a 'Hide whitespace changes'
toggle, making it trivial to focus on the functional diff.
For those who prefer reviewing patches in their mail client, the same
result can be achieved by saving the patch and running:
git apply --ignore-whitespace <patch> && git diff -w
Relax the policy so that indentation changes which are invisible to
git diff --ignore-all-space may accompany functional changes, while
still requiring non-whitespace cosmetic changes to be in separate
commits.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
The three *_from_name() functions used av_strstart() for prefix matching,
which returns incorrect results when one name is a prefix of another.
av_stereo3d_from_name("side by side (quincunx subsampling)") matched
"side by side" at index 1 and returned AV_STEREO3D_SIDEBYSIDE instead of
AV_STEREO3D_SIDEBYSIDE_QUINCUNX. Similarly,
av_stereo3d_primary_eye_from_name("nonexistent") matched "none" and
returned AV_PRIMARY_EYE_NONE instead of -1.
Switch all three functions from av_strstart() to strcmp() for exact
matching. No in-tree callers rely on prefix matching.
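The difference between the two lookups can be reproduced with a shortened, illustrative name table. The table and both function names below are assumptions; strncmp() stands in for the prefix test.

```c
#include <string.h>

static const char *const names[] = {
    "2D",
    "side by side",
    "side by side (quincunx subsampling)",
};
#define NB_NAMES (sizeof(names) / sizeof(names[0]))

/* Prefix lookup: succeeds as soon as a table entry is a prefix of `name`. */
static int lookup_prefix(const char *name)
{
    for (size_t i = 0; i < NB_NAMES; i++)
        if (!strncmp(name, names[i], strlen(names[i])))
            return (int)i;
    return -1;
}

/* Exact lookup: the whole string has to match. */
static int lookup_exact(const char *name)
{
    for (size_t i = 0; i < NB_NAMES; i++)
        if (!strcmp(name, names[i]))
            return (int)i;
    return -1;
}
```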
Signed-off-by: marcos ashton <marcosashiglesias@gmail.com>
The loop condition in the DEFINE_REMAP macro:
stereo < 1 + s->out_stereo > STEREO_2D
is parsed by C as:
(stereo < (1 + s->out_stereo)) > STEREO_2D
Since STEREO_2D is 0 and relational operators return 0 or 1, the
outer comparison against 0 is a no-op for STEREO_2D and STEREO_SBS.
But for STEREO_TB (value 2) the loop runs 3 iterations instead of 2,
producing an out-of-bounds stereo pass.
Add parentheses so the comparison is evaluated first:
stereo < 1 + (s->out_stereo > STEREO_2D)
This gives 1 iteration for 2D and 2 for any stereo format (SBS or TB),
matching the actual number of stereo views.
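The iteration counts can be verified with a standalone sketch. The enum values follow the description above (STEREO_2D is 0, STEREO_TB is 2); the function names are made up, and the first loop spells out the parse explicitly.

```c
enum { STEREO_2D, STEREO_SBS, STEREO_TB };

/* Iterations of the loop as C parses the original condition. */
static int passes_as_parsed(int out_stereo)
{
    int n = 0;
    for (int stereo = 0; (stereo < (1 + out_stereo)) > STEREO_2D; stereo++)
        n++;
    return n;
}

/* Iterations with the comparison parenthesized as intended. */
static int passes_fixed(int out_stereo)
{
    int n = 0;
    for (int stereo = 0; stereo < 1 + (out_stereo > STEREO_2D); stereo++)
        n++;
    return n;
}
```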
Signed-off-by: marcos ashton <marcosashiglesias@gmail.com>
Replace abs() with FFABSU() to avoid undefined behavior when
raw_samples[c][i] == INT_MIN. Per libavutil/common.h, FFABS()
has the same INT_MIN UB as abs(); FFABSU() is the correct
helper as it casts to unsigned before negation.
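A minimal sketch of the cast-before-negate idea, using a local stand-in rather than the real macro. It assumes a two's complement int, where the magnitude of INT_MIN is INT_MAX + 1.

```c
#include <limits.h>

/* Illustrative stand-in for an unsigned absolute value: convert to unsigned
 * first, so the negation wraps modulo 2^N instead of overflowing. */
static unsigned abs_unsigned(int a)
{
    return a <= 0 ? 0u - (unsigned)a : (unsigned)a;
}
```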
Reported-by: Guanni Qu <qguanni@gmail.com>
Signed-off-by: Priyanshu Thapliyal <priyanshuthapliyal2005@gmail.com>
And move the big buffers to the end. This reduces codesize
as offset+displacement addressing modes are either unavailable
or require more bytes if the displacement is too large. E.g. this
saves 5952B on x64 here and 3008B on AArch64. This change should
also improve data locality.
Reviewed-by: Frank Plowman <post@frankplowman.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
6 is an undefined value for payload_size_type. For those, 7 is used to signal
a custom_byte_size syntax element.
Signed-off-by: James Almer <jamrial@gmail.com>
Just use uintptr_t, it accomplishes the exact same thing while being defined
behavior.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
but just ignore invalid PAT entries so subsequent valid
entries are parsed.
ISO/IEC 13818-1:2021 specifies a valid range of [0x0010, 0x1FFE] in
§ 2.4.4.6 Semantic definition of fields in program association section
and Table 2-3 – PID table
ts->current_pid is always 0 since that is the PID for the PAT.
Per RFC 3550 Appendix A.3, the cumulative number of packets lost is a
signed 24-bit field. Clamp to signed 24-bit range using av_clip_intp2
and av_zero_extend to handle duplicate packets correctly.
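Roughly, the clamp-then-mask step looks like the sketch below. Both helpers are local stand-ins written for illustration; they are not the libavutil functions named above.

```c
#include <stdint.h>

/* Clamp to the signed 24-bit range [-2^23, 2^23 - 1]. */
static int32_t clip_s24(int64_t v)
{
    if (v < -(1 << 23))     return -(1 << 23);
    if (v >  (1 << 23) - 1) return  (1 << 23) - 1;
    return (int32_t)v;
}

/* Keep the low 24 bits: the two's complement encoding of the clamped value. */
static uint32_t pack_lost(int64_t lost)
{
    return (uint32_t)clip_s24(lost) & 0xFFFFFF;
}
```

A negative count, which duplicates can produce, therefore ends up as a proper 24-bit two's complement value instead of spilling into neighbouring bits.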
In this scenario, as is the case with DASH segments, the lcevc track will be
alone but potentially have a sbas tref entry referencing itself, which will
make avformat_stream_group_add_stream() fail.
Signed-off-by: James Almer <jamrial@gmail.com>
Demuxers like mov will export packets not meant for presentation (e.g. because
an edit list doesn't include them) by flagging them as discard, but the mov
muxer completely ignored this, resulting in output edit lists considering every
packet.
Fixes issue #22552
Signed-off-by: James Almer <jamrial@gmail.com>
Fixes: signed integer overflow
Fixes: out of array access
Fixes: dvdsub_int_overflow_mixed_ps.mpg
Found-by: Quang Luong of Calif.io in collaboration with OpenAI Codex
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
It doesn't hurt to keep track of filtered_size:
The end result will be ignored if extradata is not removed
from the bitstream.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Changes compared to the current version include:
1. We no longer use a dummy PutByteContext on the first pass
for checking whether there is extradata in the NALU. Instead
the first pass no longer writes anything to any PutByteContext
at all; the size information is passed via additional int*
parameters. (This no longer discards const when initializing
the dummy PutByteContext, fixing a compiler warning.)
2. We actually error out on invalid data in the first pass,
ensuring that the second pass never fails.
3. The first pass is used to get the exact sizes of both
the extradata and the filtered data. This obviates the need
for reallocating the buffers later on. (It also means
that the extradata side data will have been allocated with
av_malloc (ensuring proper alignment) instead of av_realloc().)
4. The second pass now writes both extradata and (if written)
the filtered data instead of parsing the NALUs twice.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Add support for CUDA and ROCm (AMD GPU) devices in the LibTorch DNN
backend.
This works for both NVIDIA CUDA and AMD ROCm, as PyTorch exposes ROCm
through the CUDA-compatible API.
Usage:
./ffmpeg -i input.mp4 -vf scale=224:224,format=rgb24,dnn_processing=dnn_backend=torch:model=sr_model_torch.pt:device=cuda output.mp4
Reviewed-by: Guo Yejun <yejun.guo@intel.com>
Signed-off-by: younengxiao <steven.xiao@amd.com>
Add enc_dec_pcm roundtrip tests for the pcm_bluray codec covering
mono, stereo, 5.1, 7.0, and 7.1 channel layouts in s16. The 5.1
and 7.0 tests use an explicit pan filter for channel layout
conversion so the PAN_FILTER dependency is declared only where
needed. An additional s32 test uses a FATE sample file with real
>16-bit content (divertimenti_2ch_96kHz_s24.wav) and decodes to
s32le to verify the full 32-bit round-trip.
enc_dec_pcm is used instead of transcode because the MPEGTS muxer
produces different binary output on 32-bit and 64-bit platforms,
causing the intermediate file checksum to fail on 32-bit CI.
Coverage for libavcodec/pcm-bluray.c: 0.00% -> 93.75%
Coverage for libavcodec/pcm-blurayenc.c: 0.00% -> 91.71%
Signed-off-by: marcos ashton <marcosashiglesias@gmail.com>
Add a unit test covering av_stereo3d_alloc, av_stereo3d_alloc_size,
av_stereo3d_create_side_data, av_stereo3d_type_name,
av_stereo3d_from_name, av_stereo3d_view_name,
av_stereo3d_view_from_name, and av_stereo3d_primary_eye_name.
The from_name calls are driven by a static name table so each
string appears exactly once. Round-trip inverse checks verify
that type_name/from_name and view_name/view_from_name are
consistent with each other.
Coverage for libavutil/stereo3d.c: 0.00% -> 100.00%
Signed-off-by: marcos ashton <marcosashiglesias@gmail.com>
Add a unit test covering alloc, create_side_data, and select
for AV1 and H.274 film grain parameter types (22 cases).
Coverage for libavutil/film_grain_params.c: 0.00% -> 97.73%
Signed-off-by: marcos ashton <marcosashiglesias@gmail.com>
sscanf() returns EOF (-1) on input failure, which is non-zero and
passes a bare truthy check. When this happens, the %n directive is
never processed, so len stays uninitialized. Using that value to
advance the arg pointer causes an out-of-bounds read and crash.
Check for >= 1 instead, matching the fix applied to the other
sscanf() call in init() by commit b5b6391d64.
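The pattern, reduced to a standalone sketch. The helper name and the "%d%n" format are assumptions chosen for illustration; the filter's real format string differs.

```c
#include <stdio.h>

/* Parses one integer from `arg`. On success stores the number of consumed
 * characters in *len and returns 0; returns -1 otherwise. */
static int parse_int(const char *arg, int *value, int *len)
{
    /* sscanf() returns EOF (a negative value) on input failure, so test for
     * "at least one conversion" rather than for a non-zero result. */
    if (sscanf(arg, "%d%n", value, len) >= 1)
        return 0;
    return -1;
}
```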
Fixes: https://code.ffmpeg.org/FFmpeg/FFmpeg/issues/22451
Signed-off-by: marcos ashton <marcosashiglesias@gmail.com>
M4V is a standard extension for MPEG-4 video files, commonly used by
Apple devices and software. While it is functionally similar to MP4,
it was missing from the list of recognized extensions for the
MOV/MP4 demuxer.
Use ffio_read_size() to enforce exact-length reads of the per-channel
ADPCM_THP coefficient tables. Previously the return value of
avio_read() was unchecked, silently accepting truncated extradata.
Specifically output side data from tile groups with -show_stream_groups
which includes rotation information in HEIC images.
Signed-off-by: Jeremy James <jeremy.james@gmail.com>
Whenever the link register is stored on the stack, sign it
before storing it and validate at a symmetrical point (with the
stack at the same level as when it was signed).
These macros only have an effect if built with PAC enabled (e.g.
through -mbranch-protection=standard), otherwise they don't
generate any extra instructions.
None of these cases were present when PAC support was added
in 248986a0db in 2022.
Without these changes, PAC still had an effect in the compiler
generated code and in the existing cases where these macros were
used; these changes make it apply to the remaining cases of the link
register on the stack.
The sme_entry/sme_exit macros already take care of backing up/restoring
these registers. Additionally, as long as no function calls are
made within the function, x30 doesn't need to be backed up at all.
ff_lcevc_parse_config_record() returns success before this patch
when no IDR or NON_IDR NAL units are found.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
ctx->options.async does not exist on DnnContext; the correct
field is ctx->async directly on the context struct.
Signed-off-by: younengxiao <steven.xiao@amd.com>
This exposes parsing already being done to write lvcC boxes, for the purpose
of having these values available elsewhere.
Will be useful for the following change.
Signed-off-by: James Almer <jamrial@gmail.com>
Possible now that this function is no longer MMX.
Old benchmarks:
gmc_edge_emulation_c: 782.3 ( 1.00x)
gmc_edge_emulation_ssse3: 220.3 ( 3.55x)
New benchmarks:
gmc_edge_emulation_c: 770.9 ( 1.00x)
gmc_edge_emulation_ssse3: 111.0 ( 6.94x)
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It beats MMX by a lot, because it has to process eight words.
Also notice that the MMX code expects registers to be preserved
between separate inline assembly blocks which is not guaranteed;
the new code meanwhile does not presume this.
Benchmarks:
gmc_c: 817.8 ( 1.00x)
gmc_mmx: 210.7 ( 3.88x)
gmc_ssse3: 80.7 (10.14x)
The MMX version has been removed.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
edge_emu_mc allows using different src and dst strides,
so one can replace the outsized edge emu buffer with
one that is much smaller and nevertheless big enough
for all our needs; it also avoids having to check
whether the buffer is actually big enough.
This also improves performance (if the compiler uses
stack probing). Old benchmarks:
gmc_c: 814.5 ( 1.00x)
gmc_mmx: 243.7 ( 3.34x)
New benchmarks:
gmc_c: 813.8 ( 1.00x)
gmc_mmx: 213.5 ( 3.81x)
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
MPEG-4 GMC uses the following motion prediction scheme:
For output pixel (x,y), the reference pixel at fractional
coordinates (ox+dxx*x+dxy*y,oy+dyx*x+dyy*y) is used as prediction;
the latter is calculated via bilinear interpolation. The coefficients
here are fixed-point values with 16+shift fractional bits
where shift is sprite_warping_accuracy+1. For the weights,
only the shift most significant fractional bits are used.
shift can be at most four*.
The x86 MMX gmc implementation performs these calculations
using 16-bit words. To do so, it restricts itself to the case
in which the four least significant bits of dxx,dxy,dyx,dyy
are zero and shifts these bits away. Yet in case shift is
less than four, the 16 bits retained also contain at least
one bit that actually belongs to the fpel component
(which is already taken into account by using the correct
pixels for interpolation).
(This has been uncovered by a to-be-added checkasm test.
I don't know whether there are actual files in the wild
using sprite_warping_accuracy 0-2.)
*: It is always four when encoding with xvid and GMC.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
We allow both readd and re-add, so it makes sense to allow both reuse and
re-use. They are both listed in my dictionary.
Signed-off-by: Niklas Haas <git@haasn.dev>
The expression ((8*(MAX_ARGS - 8) + 15) & ~15 + 16)
evaluates to zero on Apple platforms due to assembler operator
precedence differences. LLVM's integrated assembler uses different
precedence rules depending on the target:
unsigned AsmParser::getBinOpPrecedence(AsmToken::TokenKind K,
MCBinaryExpr::Opcode &Kind) {
bool ShouldUseLogicalShr = MAI.shouldUseLogicalShr();
return IsDarwin ? getDarwinBinOpPrecedence(K, Kind, ShouldUseLogicalShr)
: getGNUBinOpPrecedence(MAI, K, Kind, ShouldUseLogicalShr);
}
In Darwin mode (Apple targets), arithmetic operators (+, -) have
higher precedence than bitwise operators (&, |, ^), similar to C.
In GNU mode (ELF targets), bitwise operators have higher precedence
than arithmetic operators.
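The two groupings can be compared in C by writing the parentheses out explicitly. MAX_ARGS is given a hypothetical value of 15 here; the actual value is not stated above.

```c
/* MAX_ARGS is a hypothetical value chosen purely for illustration. */
#define MAX_ARGS 15

/* Grouping under C-like (Darwin) rules: '+' binds tighter than '&'. */
static int size_darwin_rules(void)
{
    return (8 * (MAX_ARGS - 8) + 15) & (~15 + 16);
}

/* Grouping under GNU rules: '&' binds tighter than '+'. */
static int size_gnu_rules(void)
{
    return ((8 * (MAX_ARGS - 8) + 15) & ~15) + 16;
}
```

Since ~15 + 16 is 0, the Darwin grouping masks everything away regardless of MAX_ARGS.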
And plumb it all the way through to the SwsCompiledOp. This is cleaner than
setting up this metadata up-front in x86/ops.c; and more importantly, it
allows us to determine the amount of over-read programmatically during ops
setup.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Mainly so that implementations can consult sws->flags, to e.g. decide
whether the kernel needs to be bit-exact.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Mainly so setup functions can look at table->block_size, and perhaps
the table flags, as well as anything else we may add in the future.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This is basically a cosmetic commit that groups all of the parameters to
setup() into a single struct, as well as the return type. This gives the
immediate benefit of freeing up 8 bytes per op table entry, though the
main motivation will come in the following commits.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Will make more sense in light of the fact that this may not correspond
to the op list actually sent to the backends, due to subpass splitting.
Signed-off-by: Niklas Haas <git@haasn.dev>
If the block size is somehow less than 8, this may round down, leading to
one byte too few being copied (e.g. for monow/rgb4).
This was never an issue for current backends because they all have block sizes
of 8 or larger, but a future platform may have different requirements.
Signed-off-by: Niklas Haas <git@haasn.dev>
The `memcpy_in` condition is reversed for negative strides, which require a
memcpy() on the *first* line, not the last line. Additionally, the check
just completely didn't work for negative linesizes, due to comparing against
a negative stride.
Signed-off-by: Niklas Haas <git@haasn.dev>
Added in commit 00907e1244 to hack around a problem that was caused by
the Vulkan backend's incorrect use of the ops dispatch code, which was fixed
properly in commit 143cb56501.
This logic never made sense to begin with, it was only meant to disable the
memcpy logic for Vulkan specifically.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
There's no reason to immediately allocate all of these; we can do it at the
end when we know for sure which passes we have.
This will matter especially if we ever add a way to remove passes again after
adding them (spoiler: we will).
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
The output file fopen() result is not checked. If it fails (e.g.
permission denied or invalid path), output_file is NULL and the
subsequent fwrite() call will crash.
Add a NULL check with an error message, consistent with the
existing error handling pattern in this example.
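A minimal sketch of the check. The helper name and the message text are assumptions; the example itself may word the error differently.

```c
#include <stdio.h>

/* Opens `path` for binary writing; reports the failure and returns NULL
 * if the file cannot be opened. */
static FILE *open_output(const char *path)
{
    FILE *f = fopen(path, "w+b");
    if (!f)
        fprintf(stderr, "Failed to open output file '%s'\n", path);
    return f;
}
```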
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
avcodec_get_supported_config() is called with dec_ctx (the decoder
context) to query supported pixel formats and sample formats, but
the intent is to configure the encoder. The decoder supported
format list may differ from the encoder, leading to format
negotiation failures or incorrect output.
Pass enc_ctx instead so the actual encoder capabilities are
queried.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
encode_write() mapped all return values from avcodec_receive_packet()
into 0 or -1, which destroyed the AVERROR_EOF signal needed by the
caller. The flush call in main() could never see AVERROR_EOF, so a
successful encode always exited with a non-zero status.
Let encode_write() return the original error code and have each
call site handle the expected status:
- Encoding loop: ignore AVERROR(EAGAIN) (need more input)
- Flush path: ignore AVERROR_EOF (normal end-of-stream)
This makes the control flow explicit and easier to follow for
anyone reading the example.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
The cleanup path uses `ofmt->flags` to check AVFMT_NOFILE, but
`ofmt` is only assigned after avformat_alloc_output_context2
succeeds. If a failure occurs between output context allocation
and the `ofmt` assignment (e.g. stream_mapping allocation fails),
ofmt_ctx is non-NULL while ofmt is still NULL, causing a crash.
Use ofmt_ctx->oformat->flags instead, which is always valid when
ofmt_ctx is non-NULL.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
The sample generation loop hardcodes a stride of 2 (stereo) with
samples[2*j], but the channel count is dynamically selected by
select_channel_layout() which picks the layout with the highest
channel count. If the encoder supports more than 2 channels,
samples will be written at wrong offsets.
Use c->ch_layout.nb_channels as the stride instead.
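For illustration, an interleaved fill with a variable stride might look like the sketch below. The function name and the placeholder signal are assumptions; the example generates a sine tone instead.

```c
#include <stdint.h>

/* Fills an interleaved buffer: sample j of channel k lives at
 * samples[nb_channels * j + k]. */
static void fill_interleaved(int16_t *samples, int nb_samples, int nb_channels)
{
    for (int j = 0; j < nb_samples; j++) {
        int16_t v = (int16_t)(j * 100);          /* placeholder signal */
        for (int k = 0; k < nb_channels; k++)
            samples[nb_channels * j + k] = v;
    }
}
```

With a hardcoded stride of 2, the second sample of a 6-channel frame would land at offset 2 rather than 6.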
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
The output loop used sw_frame->width as the write size for all
planes. This is only correct for NV12 where the interleaved UV
plane happens to have the same byte width as the Y plane. For
other pixel formats (e.g. YUV420P where U/V planes are half
width, or P010 where samples are 2 bytes), the output would be
corrupted.
Use av_image_get_linesize() to compute the correct byte width
for each plane based on the actual pixel format.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
fwrite() returns size_t (unsigned), so comparing its return value
with < 0 is always false and write errors are silently ignored.
Check against the expected byte count instead.
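A sketch of the corrected check, with a hypothetical helper name:

```c
#include <stdio.h>

/* Writes `size` bytes; returns 0 on success and -1 on a short write. */
static int write_all(FILE *f, const void *buf, size_t size)
{
    /* fwrite() reports the number of items written and is never negative,
     * so compare against the requested count. */
    return fwrite(buf, 1, size, f) == size ? 0 : -1;
}
```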
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
pgm_save() passes the FILE pointer from fopen() directly to
fprintf() and fwrite() without a NULL check. If fopen() fails
(e.g. permission denied or disk full), this causes a NULL pointer
dereference and crash.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
fopen() with "r" opens the file in text mode, which on Windows
translates \r\n to \n, corrupting raw NV12 pixel data. Use "rb"
to open in binary mode, matching the output file which already
uses "w+b".
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
avformat_close_input() is designed for input format contexts only.
Using it on output contexts is API misuse — it accesses iformat
(which is NULL for output contexts) and does not follow the correct
output cleanup path.
Replace with the proper pattern already used in remux.c and
transcode.c: avio_closep() to close the IO handle, followed by
avformat_free_context() to free the format context.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
When encoding VP9 with a YUV pixel format (e.g. yuv420p) and
AVCOL_SPC_RGB colorspace metadata, libvpxenc unconditionally set
VPX_CS_SRGB. This produced a spec-violating bitstream: Profile 0
(4:2:0) with sRGB colorspace, which is only valid for Profile 1/3
(4:4:4). The resulting file is undecodable.
Fix this by setting ctx->vpx_cs to VPX_CS_SRGB in set_pix_fmt()
for 4:4:4 YUV formats when AVCOL_SPC_RGB is set, matching the
existing GBRP path. This covers the legitimate case of RGB data in
YUV444 containers (e.g. H.264 High 4:4:4 with identity matrix).
With this change, any AVCOL_SPC_RGB that reaches the switch in
set_colorspace() is guaranteed to be a subsampled format where
sRGB is invalid. Return an error so the user can fix their
pipeline rather than silently producing incorrect output.
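The resulting decision can be sketched as follows. The enums and pick_vpx_colorspace() are hypothetical simplifications, not the libvpx or libavutil definitions:

```c
#include <assert.h>

/* Hypothetical, simplified enums; not the libvpx or libavutil definitions. */
enum ColorSpace { CS_UNKNOWN, CS_BT709, CS_SRGB };
enum Matrix     { SPC_BT709, SPC_RGB };

/* Returns 0 and sets *cs on success, -1 when RGB (identity) matrix
 * metadata is combined with chroma subsampling. */
static int pick_vpx_colorspace(enum Matrix spc, int is_444, enum ColorSpace *cs)
{
    assert(cs);
    if (spc == SPC_RGB) {
        if (!is_444)
            return -1;          /* sRGB is only valid for 4:4:4 profiles */
        *cs = CS_SRGB;
        return 0;
    }
    *cs = CS_BT709;
    return 0;
}
```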
To reproduce:
ffmpeg -f lavfi -i testsrc=s=64x64:d=1:r=1 \
-c:v libvpx-vp9 -pix_fmt yuv420p -colorspace rgb bad.webm
ffprobe bad.webm
# -> "vp9 (Profile 0), none(pc, gbr/...), 64x64"
ffmpeg -i bad.webm -f null -
# -> 0 frames decoded, error
See also:
https://issues.webmproject.org/487307225
Signed-off-by: Guangyu Sun <gsun@roblox.com>
Signed-off-by: James Zern <jzern@google.com>
Signal that our assembly is compliant with the GCS feature, if
the GCS feature is enabled in the compiler (available since Clang
18 and GCC 15) - this is enabled by -mbranch-protection=standard
with a new enough compiler.
GCS doesn't require any specific modifications to the assembly
code, but requires that all functions return to the expected call
address (checked through a shadow stack).
For cases when returning early without updating any pixels, we
previously returned to the return address in the caller's scope,
bypassing one function entirely. While this may seem like a neat
optimization, it makes the return stack predictor mispredict
the returns - which potentially can cost more performance than
it gains.
Secondly, if the armv9.3 feature GCS (Guarded Control Stack) is
enabled, then returns _must_ match the expected value; this feature
is being enabled across Linux distributions, and by fixing the
hevc assembly, we can enable the security feature on ffmpeg as well.
adapt_colors() allocates a SwsLut3D before calling add_convert_pass(). If add_convert_pass() fails, the function returns without freeing the previously allocated lut. Free lut on that error path.
Signed-off-by: Huihui_Huang <hhhuang@smu.edu.sg>
Passing a struct/union by value can generally be inefficient.
Additionally, when the struct/union is declared to be aligned,
whether it really stays aligned when passed as a parameter by
value is unclear.
This fixes build errors like this, with MSVC targeting 32 bit ARM:
libswscale/ops_chain.h(91): error C2719: 'unnamed-parameter': formal parameter with requested alignment of 16 won't be aligned
Cap ulNumDecodeSurfaces to 32 and ulNumOutputSurfaces to 64 to prevent
cuvidCreateDecoder from failing with CUDA_ERROR_INVALID_VALUE when
initial_pool_size exceeds the hardware limits.
Also cap the decoder index pool (dpb_size) to 32 so that indices
handed out via av_refstruct_pool_get stay within the valid range
for cuvidDecodePicture's CurrPicIdx.
When unsafe_output is enabled, stop holding idx_ref in the unmap
callback. Since cuvidMapVideoFrame copies decoded data into an
independent output mapping slot, the decode surface index can safely
be reused as soon as the DPB releases it, without waiting for the
downstream consumer to release the mapped frame. This decouples the
decode surface index lifetime (max 32) from the output mapping slot
lifetime (max 64), eliminating the "No decoder surfaces left" error
that occurred when downstream components like nvenc held too many
frames.
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
Attributes with the language-supported [[attr]] style are only supported
since C++11 and C23 respectively, so this needs to be accounted for in
these checks.
This solves a huge amount of warning spam of:
warning: [[]] attributes are a C23 extension [-Wc23-extensions]
when using --enable-extra-warnings.
The VVC probe only checked forbidden_zero_bit but not
nuh_layer_id range in the NAL unit header. This allowed
certain MP3 files to be misdetected as VVC streams because
their frame data coincidentally contained 00 00 01 start
code patterns that looked like valid NAL units.
Add a check for nuh_layer_id (must be <= 55). The existing
check_temporal_id() already validates nuh_temporal_id_plus1
is in [1, 7]. Together these two checks reject the bogus
NAL units produced by MP3 frame data.
Note: nuh_reserved_zero_bit is intentionally not checked
here, as it is reserved for future use by the spec and may
become non-zero in a later revision.
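A sketch of the combined header checks, assuming the two-byte VVC NAL unit header layout. vvc_nal_header_plausible() is a hypothetical name, and the temporal id check is reduced to the range test described above:

```c
#include <assert.h>
#include <stdint.h>

/* Two-byte VVC NAL unit header layout:
 *   byte 0: forbidden_zero_bit(1) nuh_reserved_zero_bit(1) nuh_layer_id(6)
 *   byte 1: nal_unit_type(5) nuh_temporal_id_plus1(3) */
static int vvc_nal_header_plausible(const uint8_t hdr[2])
{
    assert(hdr);
    int forbidden_zero_bit = hdr[0] >> 7;
    int nuh_layer_id       = hdr[0] & 0x3F;
    int temporal_id_plus1  = hdr[1] & 0x07;

    if (forbidden_zero_bit)
        return 0;
    if (nuh_layer_id > 55)
        return 0;
    /* A 3-bit field cannot exceed 7, so only the lower bound is tested.
     * nuh_reserved_zero_bit is deliberately ignored. */
    if (temporal_id_plus1 < 1)
        return 0;
    return 1;
}
```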
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
OpenSSL DTLS can't retransmit lost packets in blocking mode.
Switch to non-blocking mode and use DTLSv1_handle_timeout()
to properly handle DTLS handshake retransmissions.
pmt_cb() passes mp4_descr + mp4_descr_count as the output base but
MAX_MP4_DESCR_COUNT (16) as the capacity, not the remaining capacity.
init_MP4DescrParseContext() resets d->descr_count to 0 on every call,
so the bounds check at parse_MP4ESDescrTag compares a fresh 0 against
16 regardless of the shifted base.
A PMT with two IOD descriptors of 16 ESDescrs each will crash. The first
fills the buffer mp4_descr[0..15], and then the second writes
mp4_descr[16..31] -- 1152 bytes past the end of the stack.
This change passes the remaining capacity instead of always passing 16.
The writeback in mp4_read_iods is incremented so the caller's running
count is preserved.
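The pattern can be sketched with simplified, hypothetical types. Descr, parse_descrs() and append_descrs() are illustrative and do not mirror the mpegts code:

```c
#include <assert.h>

#define MAX_DESCR 16   /* mirrors MAX_MP4_DESCR_COUNT from the message */

typedef struct Descr { int es_id; } Descr;   /* hypothetical, simplified */

/* Hypothetical parser: appends up to `n_in` descriptors to `out`, never
 * more than `capacity`. Its internal count restarts at 0 on every call,
 * like init_MP4DescrParseContext(). Returns the number appended. */
static int parse_descrs(Descr *out, int capacity, const int *in, int n_in)
{
    int count = 0;
    for (int i = 0; i < n_in && count < capacity; i++)
        out[count++].es_id = in[i];
    return count;
}

/* Caller keeps a running total and passes only the remaining capacity. */
static int append_descrs(Descr *table, int *total, const int *in, int n_in)
{
    assert(*total >= 0 && *total <= MAX_DESCR);
    int added = parse_descrs(table + *total, MAX_DESCR - *total, in, n_in);
    *total += added;
    return added;
}
```

With this shape, a second batch of 16 descriptors is rejected instead of being written past the end of the table.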
Fixes: stack-buffer-overflow
Found-by: Nicholas Carlini <nicholas@carlini.com>
Fixes ticket #22420.
When the first decoded frame is type 1, xan_decode_frame_type1() reads y_buffer as prior-frame state before any data has been written to it.
Since y_buffer is allocated with av_malloc(), this may propagate uninitialized heap data into the decoded luma output.
Allocate y_buffer with av_mallocz() instead.
Fixes UB in the form of adding a 0 offset to a NULL pointer, and subtracting a
NULL pointer from another.
Signed-off-by: James Almer <jamrial@gmail.com>
The buffers are allocated using the worst case scenario of the entire NALU
being written, when in many cases this does not happen.
Signed-off-by: James Almer <jamrial@gmail.com>
The specification for LCEVC states that start codes may be three or four bytes
long except for the first NALU in an AU, which must be four bytes long.
Signed-off-by: James Almer <jamrial@gmail.com>
The specification for H.26{4,5,6} states that start codes may be three or four
bytes long except for the first NALU in an AU, and for NALUs of parameter
set types, which must be four bytes long.
This is checked by ff_cbs_h2645_unit_requires_zero_byte(), which is made
available outside of CBS for this change.
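A minimal sketch of choosing the start code length. write_start_code() and its needs_zero_byte flag are hypothetical; the flag stands in for the result of a check such as ff_cbs_h2645_unit_requires_zero_byte():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes a 3- or 4-byte start code into dst and returns its length.
 * `needs_zero_byte` is a hypothetical flag: non-zero for the first NALU in
 * an AU and for parameter set NALUs. */
static size_t write_start_code(uint8_t *dst, int needs_zero_byte)
{
    static const uint8_t code[4] = { 0x00, 0x00, 0x00, 0x01 };
    size_t len = needs_zero_byte ? 4 : 3;
    assert(dst);
    /* The 3-byte form is the last three bytes of the 4-byte form. */
    memcpy(dst, code + (4 - len), len);
    return len;
}
```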
Signed-off-by: James Almer <jamrial@gmail.com>
Based on the behaviour from cbs_h2645, which removes actual
trailing_zero_8bits bytes and possibly also works around issues in
ff_h2645_extract_rbsp(). In this case, the same issue could be
present in ff_nal_find_startcode().
Signed-off-by: James Almer <jamrial@gmail.com>
The correct syntax after country_code is:
t35_uk_country_code_second_octet b(8)
t35_uk_manufacturer_code_first_octet b(8)
t35_uk_manufacturer_code_second_octet b(8)
Signed-off-by: James Almer <jamrial@gmail.com>
new_pes_packet() moves a buffer with pkt->buf = pes->buffer before
JPEG-XS validation. If header_size > pkt->size, an early return leaves
pes->buffer as a stale alias of pkt->buf with refcount 1. Later,
mpegts_read_packet() calls av_packet_unref(), freeing the buffer
through pkt->buf. The flush loop then re-enters new_pes_packet() and
dereferences the dangling pes->buffer; a second path hits it via
av_buffer_unref() in handle_packets() after a seek.
Drop the early return. The packet is delivered with AV_PKT_FLAG_CORRUPT
set, matching the PES-size-mismatch case above, and the function falls
through to the normal cleanup path. The else guards the header trim so
pkt->data/pkt->size stay valid for the memset.
Fixes: use after free
Fixes regression since 16f89d342e.
Found-by: Nicholas Carlini <nicholas@carlini.com>
This also reverts: c2364e9222
Fixes: out of array access (testcase exists but did not replicate for me)
Found-by: Gil Portnoy <dddhkts1@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
An H.264 picture with 65536 slices makes slice_num collide with the
slice_table sentinel. slice_table is uint16_t, initialized via
memset(..., -1, ...) so spare entries (one per row, mb_stride =
mb_width + 1) stay 0xFFFF. slice_num is an uncapped ++h->current_slice.
At slice 65535 the collision makes slice_table[spare] == slice_num
pass, defeating the deblock_topleft check in xchg_mb_border and the
top_type zeroing in fill_decode_caches.
With both guards bypassed at mb_x = 0, top_borders[top_idx][-1]
underflows 96 bytes and XCHG writes at -88 below the allocation
(plus -72 and -56 for chroma in the non-444 path).
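The collision itself can be reproduced in isolation. same_slice() below is a hypothetical reduction of the comparison, not the decoder's code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reduction of the check: a neighbour belongs to the current
 * slice when its slice_table entry equals slice_num. */
static int same_slice(const uint16_t *slice_table, int idx, int slice_num)
{
    assert(slice_table && idx >= 0);
    return slice_table[idx] == slice_num;
}
```

After memset(table, -1, ...) every entry is 0xFFFF, so same_slice() is false for slice_num 65534 but true for 65535, which is how a spare entry comes to look like part of the current slice.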
Fixes: heap-buffer-overflow
Found-by: Nicholas Carlini <nicholas@carlini.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The legacy scaler is no longer implicitly used to generate a reference
to perform comparisons for every conversion. It is now up to the user
to generate a reference file and use it as input for a separate run to
perform comparisons.
It is now possible to compare against previous runs of the graph-based
scaler, for example to test for newer optimizations.
This reduces the overall time necessary to obtain speedup numbers from
the legacy scaler to the graph-based scaler (or any other comparison,
for that matter) since the reference must only be run once.
For example, to check the speedup between the legacy scaler and the
graph-based scaler:
./libswscale/tests/swscale [...] -bench 50 -legacy 1 > legacy_ref.txt
./libswscale/tests/swscale [...] -bench 50 -ref legacy_ref.txt
If no -ref file is specified, we are assuming that we are generating a
reference file, and therefore all information is printed (including
ssim/loss, and benchmarks if -bench is used).
If a -ref file is specified, the output printed depends on whether we
are testing for correctness (ssim/loss only) or benchmarking (time/
speedup only, along with overall speedup).
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
This emphasizes the order of magnitude of the loss, which is what is
important for us.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
The format of the reference file is the output which is printed to
stdout from this tool itself.
Malformed reference files cause an error, with a more descriptive error
message. Running a subset of the reference conversions is still
supported through -src and/or -dst.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
The test results (along with SSIM) are printed to stdout again so that
the output can be parsed by -ref.
Benchmark results have also been added to the output.
We still need to re-run the reference tests to perform benchmarks, but
this will be simplified in the next few commits.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
The conversion parameters, ssim/loss, and benchmark results will
eventually be merged into the same output line.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
The low bit depth workaround code is duplicated in this commit, but the
other occurrence will be removed in a few commits, so I see no reason
to factor it out.
The legacy scaler still has some conversions that give results much
worse than the expected loss, but we still want them as reference, so
we don't trigger expected loss errors on conversions with the legacy
scaler.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
We will eventually be able to select between running the new graph-based
scaler or the legacy scaler.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
Support for input and output formats is already checked in run_self_tests().
This reverts commit a22faeb992.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
Fixes: out of array read with --disable-safe-bitstream-reader
Fixes: poc_wmv2.avi
Note, this requires the safe bitstream reader to be turned off by the user and the user disregarding the security warning
Change suggested by: Guanni Qu <qguanni@gmail.com>
Found-by: Guanni Qu <qguanni@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
`spectrum_decode` currently executes Frequency Domain (FD) decoding steps
for all channels, regardless of their `core_mode`. When a channel is in
Linear Prediction Domain (LPD) mode (`core_mode == 1`), FD-specific
parameters such as scalefactor offsets (`sfo`) and individual channel
stream (`ics`) information are not parsed.
This causes a global-buffer-overflow in `dequant_scalefactors`. Because
`spectrum_scale` is called on LPD channels, it uses stale or
uninitialized `sfo` values to index `ff_aac_pow2sf_tab`. In the reported
crash, a stale `sfo` value of 240 resulted in an index of 440
(240 + POW_SF2_ZERO), exceeding the table's size of 428.
Fix this by ensuring `spectrum_scale` and `imdct_and_windowing` are only
called for channels where `core_mode == 0` (FD).
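A sketch of the guard and of the index arithmetic from the crash. The Channel struct and both helpers are hypothetical; the constants are the values quoted or implied above (440 - 240 = 200 for POW_SF2_ZERO):

```c
#include <assert.h>

#define POW_SF2_ZERO    200   /* implied by the message: 440 - 240 */
#define POW2SF_TAB_SIZE 428   /* table size quoted in the message */

typedef struct Channel { int core_mode; int sfo; } Channel; /* hypothetical */

/* Returns 1 if an sfo value would index the table in bounds. */
static int sfo_index_ok(int sfo)
{
    int idx = sfo + POW_SF2_ZERO;
    return idx >= 0 && idx < POW2SF_TAB_SIZE;
}

/* Counts the channels that FD-only steps would touch: LPD channels
 * (core_mode == 1) are skipped, so their stale sfo is never used. */
static int count_fd_channels(const Channel *ch, int nb)
{
    int n = 0;
    for (int i = 0; i < nb; i++) {
        if (ch[i].core_mode != 0)
            continue;
        assert(sfo_index_ok(ch[i].sfo));
        n++;
    }
    return n;
}
```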
Co-authored-by: CodeMender <codemender-patching@google.com>
Fixes: https://issues.oss-fuzz.com/486160985
Fixes: integer overflow
Fixes: testcase that calls av_timecode_init_from_components() with hh set explicitly to INT_MAX
Found-by: Youngjae Choi, Mingyoung Ban, Seunghoon Woo
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Group assignments by filter family (qpel, epel), variant
(base, uni, bi, uni_w, bi_w) and direction (pixels, h, v, hv).
Add NEON8_FNASSIGN_QPEL_H macro to replace repeated manual
qpel horizontal assignments.
No functional change.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Add NEON-optimized implementations for HEVC QPEL uni-directional
weighted HV interpolation (put_hevc_qpel_uni_w_hv) at 8-bit depth,
for block widths 6, 12, 24, and 48.
These functions perform horizontal then vertical 8-tap QPEL filtering
with weighting (wx, ox, denom) and output to uint8_t. Previously
only widths 4, 8, 16, 32, 64 were implemented; this completes
coverage for all standard HEVC block widths.
Performance results on Apple M4:
./tests/checkasm/checkasm --test=hevc_pel --bench
put_hevc_qpel_uni_w_hv6_8_neon: 3.11x
put_hevc_qpel_uni_w_hv12_8_neon: 3.19x
put_hevc_qpel_uni_w_hv24_8_neon: 2.26x
put_hevc_qpel_uni_w_hv48_8_neon: 1.80x
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Move the subs instruction before the store macro in the 8x-unrolled
loops of qpel_uni_w_v4/v8/v16/v64 and qpel_uni_w_hv4/hv8/hv16, so
that many NEON instructions from the store macro separate it from the
conditional branch. This gives the CPU pipeline time to resolve the
condition flags before the branch decision.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
When get_bits_left() returns a negative value (bitstream reader already past the end of the buffer), the while condition while (get_bits_left(gb)) evaluates to true since any non-zero int is truthy.
With the safe bitstream reader enabled, get_bits1() returns 0 past the buffer end, so the break never triggers and leading_zeros increments toward INT_MAX.
Change the condition to > 0, consistent with skip_1stop_8data_bits() which already uses <= 0 for the same pattern.
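The difference can be shown with a miniature bit reader. BitReader and its helpers are hypothetical and much simpler than get_bits.h; in particular, this reader always advances its position, which is an assumption of the sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical miniature bit reader; `index` may run past `size_in_bits`. */
typedef struct BitReader {
    const uint8_t *buf;
    int size_in_bits;
    int index;
} BitReader;

static int bits_left(const BitReader *br)
{
    return br->size_in_bits - br->index;   /* negative once past the end */
}

/* "Safe" read: returns 0 past the end, but still advances the position. */
static int read_bit(BitReader *br)
{
    int bit = 0;
    if (br->index < br->size_in_bits)
        bit = (br->buf[br->index >> 3] >> (7 - (br->index & 7))) & 1;
    br->index++;
    return bit;
}

static int count_leading_zeros(BitReader *br)
{
    int leading_zeros = 0;
    assert(br && br->buf);
    while (bits_left(br) > 0) {            /* not: while (bits_left(br)) */
        if (read_bit(br))
            break;
        leading_zeros++;
    }
    return leading_zeros;
}
```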
Signed-off-by: Linke <1102336121@qq.com>
There's a possibility here with a well-crafted MP4 file containing only
the nested boxes in order: MOOV.TRAK.MDIA.MINF.STBL.SDTP where the
header size uses the 64 bit large size, and the ending sdtp box has some
size value >= 0x100000014.
On a 32 bit build of ffmpeg, av_malloc's size parameter drops the high
order bits of `entries`, and the allocation is now a controlled size
that is significantly smaller than `entries`. The following loop will
then write off the end of the allocated memory with data that follows the
box fourcc.
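The truncation can be modelled on any platform by using uint32_t in place of a 32 bit size_t. Both helpers below are hypothetical, and the guard only illustrates one way to reject such counts, not necessarily the fix applied here:

```c
#include <assert.h>
#include <stdint.h>

/* Models a 32-bit size_t: the conversion keeps only the low 32 bits. */
static uint32_t truncate_to_size32(uint64_t entries)
{
    return (uint32_t)entries;
}

/* Hypothetical guard: fail instead of truncating. Returns 0 on success. */
static int size32_for_entries(uint64_t entries, uint32_t *size)
{
    assert(size);
    if (entries > UINT32_MAX)
        return -1;
    *size = (uint32_t)entries;
    return 0;
}
```

A count of 0x100000010 truncates to 0x10, so the allocation would hold 16 bytes while the loop bound stays above 4 GiB.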
Why: the change is done to comply with lilv expectations of hosts.
Added a call to lilv_instance_activate in the config_output function to abide by lilv documentation that states it must be called before lilv_instance_run:
"This MUST be called before calling lilv_instance_run()" - documentation source (https://github.com/lv2/lilv/blob/main/include/lilv/lilv.h)
Added a call to lilv_instance_deactivate in the uninit function to abide by lv2 documentation:
"If a host calls activate(), it MUST call deactivate() at some point in the future" - documentation source (https://gitlab.com/lv2/lv2/-/blob/main/include/lv2/core/lv2.h)
Added an instance_activated integer to the LV2Context struct to track whether the instance was activated, and only call lilv_instance_deactivate if it was, to abide by lv2 documentation:
"Hosts MUST NOT call deactivate() unless activate() was previously called." - documentation source (https://gitlab.com/lv2/lv2/-/blob/main/include/lv2/core/lv2.h)
Regarding the patcheck warning (possibly constant :instance_activated):
This is a false positive since the struct member is zero-initialized.
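The bookkeeping can be sketched without lilv. FakeInstance, fake_activate() and fake_deactivate() are hypothetical stand-ins for the plugin instance and the lilv calls; only the flag handling follows the description above:

```c
#include <assert.h>

/* Hypothetical stand-ins for the plugin instance and the lilv calls. */
typedef struct FakeInstance { int active; int deactivate_calls; } FakeInstance;

typedef struct Ctx {
    FakeInstance *instance;
    int instance_activated;   /* zero-initialized: not yet activated */
} Ctx;

static void fake_activate(FakeInstance *i)   { i->active = 1; }
static void fake_deactivate(FakeInstance *i) { i->active = 0; i->deactivate_calls++; }

static void config_output(Ctx *s)
{
    assert(s && s->instance);
    fake_activate(s->instance);   /* must happen before any run() call */
    s->instance_activated = 1;
}

static void uninit(Ctx *s)
{
    /* Deactivate only if activate was called, and only once. */
    if (s->instance && s->instance_activated) {
        fake_deactivate(s->instance);
        s->instance_activated = 0;
    }
}
```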
Fixes: trac issue #11661 (https://trac.ffmpeg.org/ticket/11661)
Reported-by: Dave Flater
Signed-off-by: Karl Mogensen <karlmogensen0@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Not only do some sources not provide an aspect ratio, as is the case of
MPEG-TS, but also some enhanced streams have no change in dimensions, and this
heuristic would generate bogus values.
Instead, we need to parse the LCEVC bitstream for a Global Config process block
in order to get the actual dimensions. This adds a little overhead, but it can't
be avoided.
Signed-off-by: James Almer <jamrial@gmail.com>
rtmpproto converts listen_timeout to milliseconds by multiplying it
by 1000 before passing it to TCP. However, negative values are special
sentinels (e.g., -1 for infinite wait) and should not be multiplied.
This worked prior to commit 49c6e6cc44 because there was no range
validation. Since that commit, ff_parse_opts_from_query_string
validates option values against their declared ranges, causing these
multiplied negative values to fail.
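A minimal sketch of the conversion, with a hypothetical listen_timeout_ms() helper:

```c
#include <assert.h>
#include <limits.h>

/* Convert a listen timeout from seconds to milliseconds, passing negative
 * sentinel values (such as -1 for "wait forever") through unchanged. */
static int listen_timeout_ms(int timeout_s)
{
    if (timeout_s < 0)
        return timeout_s;
    assert(timeout_s <= INT_MAX / 1000);  /* keep the product in range */
    return timeout_s * 1000;
}
```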
Fixes ticket #22469.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Another step towards a cleaner API, with a cleaner separation of purposes.
Also avoids wasting a whopping one third of the flag space on what really
shouldn't have been a flag to begin with.
I pre-emptively decided to separate the scaler selection between "scaler"
and "scaler_sub", the latter defining what's used for things like 4:2:0
subsampling.
This allows us to get rid of the awkwardly defined SWS_BICUBLIN flag, in favor
of that just being the natural consequence of using a different scaler_sub.
Lastly, I also decided to pre-emptively axe the poorly defined and
questionable SWS_X scaler, which I doubt ever saw much use. The old flag
is still available as a deprecated flag, anyhow.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
In case we ever need to increase this number in the future.
I won't bother bumping the ABI version for this new #define, since it doesn't
affect ABI, and I'm about to bump the ABI version in a following commit.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This was incorrectly inferred to be a Keys spline when the documentation
was first added; but it's actually an "unwindowed" (in theory) natural
cubic spline with C2 continuity everywhere, which is a completely different
thing.
(SWS_BICUBIC is closer to being a Keys spline)
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Or else this might false-positive when we retry compilation after subpass
splitting.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
More useful than just allowing it to "modify" the ops; in practice this means
the contents will be undefined anyways - might as well have this function
take care of freeing it afterwards as well.
Will make things simpler with regards to subpass splitting.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Useful for a handful of reasons, including Vulkan (which depends on external
device resources), but also a change I want to make to the tail handling.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Instead of recomputing the input/output address on each iteration, we
can use the in_bump/out_bump arrays the way the x86 backend does.
I initially avoided this in order to ensure the reference backend always does
the correct thing, even if some future bug causes the bump values to be
computed incorrectly, but doing it this way makes an upcoming change easier.
(And besides, it would be easier to just add an av_assert2() to catch those
cases)
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
ff_h[yc]scale_fast_mmxext() call other functions from inline assembly;
these functions look like leaf functions to GCC, so it may use the
red zone to avoid modifying the stack. But this makes the call
instructions in the inline asm corrupt the stack.
In order to fix this 424bcc46b5
made libswscale/x86/swscale_mmx.o be compiled with -mno-red-zone.
Later Libav fixed it in their version in commit
b14fa5572c by saving and restoring
the memory clobbered by the call (as is still done now). This was
merged into FFmpeg in 0e7fc3cafe,
without touching the -mno-red-zone hack.
Libav later renamed swscale_mmx.c to just swscale.c in
16d2a1a51c which was merged into FFmpeg
in commit 2cb4d51654, without
removing the -mno-red-zone hack, although the file it applies to
no longer existed.
This commit removes the special red-zone handling given that it is
inactive anyway.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
When a decoder buffer is flushed, parts of the private context are reset,
which may affect show_streams().
Example:
ffprobe -of flat fate-suite/ac3/mp3ac325-4864-small.ts \
-analyze_frames -show_entries stream=ltrt_cmixlev
Before: ltrt_cmixlev="0.000000"
After: ltrt_cmixlev="0.707107"
Currently, it seems that only ac3 downmix info is concerned.
(ac3 downmix options are exported since 376bb8481a).
Fix regression since 045a8b15b1.
Signed-off-by: Nicolas Gaullier <nicolas.gaullier@cji.paris>
This is to reapply 18217bb0f5.
Its commit msg is still meaningful:
"Using the max instead of the min avoids the progress stopping
with gaps in sparse streams (subtitles)."
Also on a very similar issue: currently, a single stream with
no data makes ffmpeg report N/A for both time and speed.
Fix this by ignoring missing dtses.
Fix regressions since d119ae2fd8.
Signed-off-by: Nicolas Gaullier <nicolas.gaullier@cji.paris>
Fixes compilation errors on newer Clang/GCC versions that error out on
incompatible pointer types.
error: incompatible pointer types passing 'unsigned long long *' to
parameter of type 'amf_uint64 *' (aka 'unsigned long *')
[-Wincompatible-pointer-types]
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
6972b127de requires at least version
1.5.0, as earlier versions are not compatible with C due to unguarded
`extern "C"`.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
The ref->src conversion only needs to be performed once per source
pixel format.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
This prevents the propagation of dither_error across frames, and should
also improve reproducibility across platforms.
Also remove setting of flags for sws_src_dst early on, since it will
inevitably be overwritten during the tests.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
Remove dimension checks originally added to please static analysis
tools. There is little reason to have arbitrary limits in this
developer test tool. The reference files are under control by the user.
This reverts f70a651b3f and c0f0bec2f2.
Legacy swscale may overwrite the pixel formats in the context (see
handle_formats() in libswscale/utils.c). This may lead to an issue
where, when sws_frame_start() allocates a new frame, it uses the wrong
pixel format.
Instead of fixing the issue in swscale, just make sure dst is always
allocated prior to calling the legacy scaler.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
Otherwise, we always pass frames that already have buffers allocated, which
breaks the no-op refcopy optimizations.
Testing with -p 0.1 -threads 16 -bench 10, on an AMD Ryzen 9 9950X3D:
Before:
Overall speedup=2.776x faster, min=0.133x max=629.496x
yuv444p 1920x1080 -> yuv444p 1920x1080, flags=0x100000 dither=1
time=9 us, ref=9 us, speedup=1.043x faster
After:
Overall speedup=2.721x faster, min=0.140x max=574.034x
yuv444p 1920x1080 -> yuv444p 1920x1080, flags=0x100000 dither=1
time=0 us, ref=28 us, speedup=516.504x faster
(The slowdown in the legacy swscale case is from swscale's lack of a no-op
refcopy optimization, plus the fact that it's now actually doing memory
work instead of a no-op / redundant memset)
Signed-off-by: Niklas Haas <git@haasn.dev>
This was originally intended to also include performance gains/losses
due to complicated setup logic, but in practice it just means that changing
the number of iterations dramatically affects the measured speedup; which
makes it harder to do quick bench runs during development.
The NVENC H.264 high profile provides up to 16% bitrate savings
(BD-Rate measured with VMAF) compared to the main profile.
Since most users do not explicitly set a profile, changing the
default benefits the common case. Users requiring the main profile
for legacy decoder compatibility can still set it explicitly.
The change is gated behind a versioned define so it only takes
effect on the next major version bump (libavcodec 63).
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
If s->stop is set, the return value would be overwritten
before being checked. This bug was introduced in the switch
to AV_TX in 014ace8f98.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Added in e995cf1bcc,
yet this filter does not have any dsp function using MMX:
it only has generic x86 assembly, no SIMD at all,
so this emms_c() was always unnecessary.
Reviewed-by: Kacper Michajłow <kasper93@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
fixup 08d327e
When a uchar is set to -1, it becomes 255 when read as an int.
Duplicate variables for two terminal types can also avoid unused variable warnings.
Signed-off-by: nyanmisaka <nst799610810@gmail.com>
This reverts commit 32554fc107.
Accidentally pushed this commit twice, with the wrong location.
Correct version is 97682155e6.
Signed-off-by: Niklas Haas <git@haasn.dev>
These were abstraction-violating in the first place. Good riddance.
This partially reverts commit c911295f09.
Signed-off-by: Niklas Haas <git@haasn.dev>
Avoids some unnecessary round-trips through the execution harness, as well
as removing one unnecessary layer of abstraction (SwsOpExec).
It's a bit unfortunate that we have to cast away the const on the AVFrame,
since the Vulkan functions take non-const everywhere, even though all they're
doing is modifying frame internal metadata, but alas.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Allows compiled functions to opt out of the ops_dispatch execution harness
altogether and just get dispatched directly as the pass run() function.
Useful in particular for Vulkan.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Now that this function returns a status code and takes care of cleanup on
failure, many call-sites can just return the function directly.
Signed-off-by: Niklas Haas <git@haasn.dev>
This is arguably more convenient for most downstream users, as will be
more prominently seen in the next commit.
Also allows this code to re-use a pass_free() helper with the graph uninit.
Signed-off-by: Niklas Haas <git@haasn.dev>
This is just slightly common enough a pattern that it IMO makes sense to do
so. This will also make more sense after the following commits.
Signed-off-by: Niklas Haas <git@haasn.dev>
This condition was weaker than necessary.
In particular, graph->num_thread == 1 guarantees pass->num_slices == 1.
Signed-off-by: Niklas Haas <git@haasn.dev>
And call it on the read/write ops directly, rather than this awkward loop.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Results in IMHO slightly more readable code flow, and will be useful in an
upcoming commit (that adds logic to ref individual planes).
Signed-off-by: Niklas Haas <git@haasn.dev>
This should match the number of lines. As an aside, align these declarations.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Instead of once at the start of add_convert_pass(). This makes much
more sense in light of the fact that we want to start e.g. splitting
passes apart.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This is already called by compile_backend(), and nothing else in this file
depends on accurate values.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
And call it on the read/write ops directly, rather than this awkward loop.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This improves the debugging experience. These are all internal structs so
there is no need to worry about ABI stability as a result of adding flags.
Signed-off-by: Niklas Haas <git@haasn.dev>
Adapted from the corresponding me_cmp code. Only the width 16 function
has been adapted, because it seems that the width 8 function actually
reads 16 bytes per line.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
lavu/x86/pixelutils.h only declares exactly one function,
namely the arch-specific init function. Such declarations
are usually contained in the ordinary header providing
the generic init function, yet the latter is public in this case.
Given that said function is called from exactly one callsite,
the header can be made more useful by moving the actual x86-init
function to it (as a static inline function) and removing
pixelutils_init.c.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Memory operands of VEX encoded instructions generally have
no alignment requirement and so can be used in the case where
both inputs are unaligned, too. Furthermore, unaligned load
instructions are as fast as aligned loads (from aligned addresses)
for modern cpus, in particular those with AVX2.
Therefore it makes no sense to have three different AVX2 sad32x32
functions. So remove two of them (the remaining one is the same
as the old one where src1 was aligned and src2 was not).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The automatic zero-extensions when assigning a 32bit register
make using 64bits unnecessary.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This change has no measurable impact on performance here;
it is intended to avoid unpredictable behavior with floating
point operation like the one that led to commit
57a29f2e7d.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The earlier code may encounter a handshake failure when publishing
WHIP to some servers.
See RFC 8827 section 6.5:
All implementations MUST support DTLS 1.2 with the
TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 cipher suite
and the P-256 curve.
So this patch uses the specific curve to avoid incompatibility.
Signed-off-by: Aditya Banavi <adityabanavi@gmail.com>
Patches should be submitted to [Forgejo](https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls) or the [ffmpeg-devel mailing list](https://ffmpeg.org/mailman/listinfo/ffmpeg-devel) using `git format-patch` or `git send-email`. GitHub pull requests should be avoided because they are not part of our review process and **will be ignored**.
See [https://ffmpeg.org/developer.html#Contributing](https://ffmpeg.org/developer.html#Contributing) for more information.
0. If you would like to include source plugins, merge them before running configure,
for example by running tools/merge-all-source-plugins
@@ -14,15 +14,11 @@ path when launching `configure`, e.g. `/ffmpegdir/ffmpeg/configure`.
3. Type `make install` to install all binaries and libraries you built.
-NOTICE
-------
+## NOTICE
-- Non system dependencies (e.g. libx264, libvpx) are disabled by default.
+- Non system dependencies (e.g. libx264, libvpx) are disabled by default.
-NOTICE for Package Maintainers
-------------------------------
+## NOTICE for Package Maintainers
-- It is recommended to build FFmpeg twice, first with minimal external dependencies so
-that 3rd party packages, which depend on FFmpegs libavutil/libavfilter/libavcodec/libavformat
-can then be built. And last build FFmpeg with full dependencies (which may in turn depend on
-some of these 3rd party packages). This avoids circular dependencies during build.
+- It is recommended to build FFmpeg twice, first with minimal external dependencies so that 3rd party packages, which depend on FFmpegs libavutil/libavfilter/libavcodec/libavformat
+can then be built. And last build FFmpeg with full dependencies (which may in turn depend on some of these 3rd party packages). This avoids circular dependencies during build.
The Common Vulnerability Scoring System (CVSS) is an open, industry-standard framework used to measure and communicate the severity of software vulnerabilities, with scores ranging from 0.0 to 10.0.
*Why we need this Document*
It is important that FFmpeg CVEs have consistent and correct CVSS scores, not only so that one can recognize the severity of an issue at first glance,
but also because these numbers form the basis of rewards paid in bug bounty systems. Inconsistent CVSS scores could lead to unfair payouts.
*What is this Document*
Prior to 2026, FFmpeg had no guideline about CVSS.
This document describes how to select the CVSS score for an FFmpeg-related CVE. It currently covers only the Base Score.
*What is the CVSS Base Score*
AV Attack Vector (Network, Adjacent, Local, Physical)
AC Attack Complexity (Low, High)
PR Privileges Required (None, Low, High)
UI User Interaction (None, Required)
S Scope (Unchanged, Changed)
C Confidentiality (None, Low, High)
I Integrity (None, Low, High)
A Availability (None, Low, High)
*Things people have set incorrectly*
The following are general guidelines; in specific cases, other considerations may apply.
Attack Vector.
Quote from https://www.first.org/cvss/v3.1/user-guide:
"Specifically, analysts should only score for Network or Adjacent when a vulnerability is bound to the network stack.
Vulnerabilities which require user interaction to download or receive malicious content (which could also be delivered locally, e.g., via USB drives) should be scored as Local."
Availability.
FFmpeg Crashes -> AVAILABILITY IMPACT: Low
FFmpeg is frequently used as a short-lived, single-run process instead of a continuously running service that handles ongoing streams of user input. In that usage model, a crash usually causes only limited disruption.
User Interaction.
Please consider whether an attacker can actually set the parameters required for an attack.
In general, arbitrary filter parameters cannot be set by an attacker; they must be set by the user, account owner, or admin.
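To see how much the guidelines above change a result, the Base Score can be computed directly from a vector string. The following is a minimal sketch, assuming the CVSS v3.1 Base Score formula and metric weights published by FIRST (this document quotes the v3.1 user guide, but does not itself state a version). The names `WEIGHTS`, `roundup` and `base_score` are illustrative only and are not part of FFmpeg.

```python
import math

# Metric weights as published in the CVSS v3.1 specification (assumed version).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required depends on whether Scope is changed.
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value):
    """Round up to one decimal place, following the v3.1 Roundup definition."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the Base Score for a vector such as 'AV:L/AC:L/PR:N/UI:R/S:U/C:N/I:N/A:L'."""
    m = dict(part.split(":") for part in vector.split("/"))
    changed = m["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]

    # Impact sub-score
    iss = 1 - ((1 - WEIGHTS["C"][m["C"]])
               * (1 - WEIGHTS["I"][m["I"]])
               * (1 - WEIGHTS["A"][m["A"]]))
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    # Exploitability sub-score
    exploitability = (8.22 * WEIGHTS["AV"][m["AV"]] * WEIGHTS["AC"][m["AC"]]
                      * pr * WEIGHTS["UI"][m["UI"]])

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


# A crash on a malicious file, scored as Network / Availability High:
print(base_score("AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:N/A:H"))  # 6.5
# The same crash scored per the guidelines (Local, Availability Low):
print(base_score("AV:L/AC:L/PR:N/UI:R/S:U/C:N/I:N/A:L"))  # 3.3
```

With all other metrics held equal, choosing Attack Vector Local and Availability Low roughly halves the score in this example, which is why consistent choices matter for severity ratings and bounty payouts.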