Dimensions are not available at the container level for LCEVC in mpegts, so derive them
from the parsed values taken from the enhancement stream.
Signed-off-by: James Almer <jamrial@gmail.com>
As defined in ITU-T H.222.0 v9, LCEVC streams use the "Byte stream format"
defined in Annex B of ISO/IEC 23094-2:2021.
Signed-off-by: James Almer <jamrial@gmail.com>
Every container signals it as such, and the data media stream type is too
undefined and unsupported across the codebase that even if no standalone
decoder can be present for it, it's preferable to it.
This is technically an API break, but LCEVC support has been minimal until
now so it should be safe.
Signed-off-by: James Almer <jamrial@gmail.com>
Some video codecs are not meant to output frames on their own but to be applied
on top of frames generated by other codecs, as is the case of LCEVC, Dolby Vision,
etc. Add a codec prop to signal this kind of codec, so that library users may know
to not expect a standalone decoder for them to be present.
Signed-off-by: James Almer <jamrial@gmail.com>
It is only used by the MPEG-2 encoder, so replace it
by a private option instead. Use a more elaborate term
for it: intra_dc_precision ("dc" could be anything).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This option is only allowed to be set for MPEG-2, so ignore it
except for MPEG-2 and move handling of it to mpeg12enc.c.
This is in preparation for deprecating the AVCodecContext option.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Ensure that the RTP payload size is at least one TS packet
(188 bytes) when packetizing MPEG-TS.
Previously, small payload sizes were silently rounded up,
which could lead to incorrect behavior. Return EINVAL
with a clear error message instead.
Signed-off-by: Devraj Ajmera <devrajajmera077@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Without the specification, limiting the index is the best that can be done.
Fixes: out of array access
Fixes: 487591441/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_AAC_LATM_fuzzer-6205915698364416
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: path traversal with -dump_attachment:t
Fixes: malicious.mkv
Based on code from libavformat/concatdec.c
This will be factored out possibly into libavutil once there is agreement on the API
Found-by: Shangzhi Xu <mxu490469@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
A new Sequence Header or a Temporal Delimiter OBU invalidate any previous frame
if not yet complete (As is the case of missing Tile Groups).
Similarly, a new Frame Header invalidates any onging Tile Group parsing.
Fixes: out of array access
Fixes: av1dec_tile_desync.mp4
Fixes: av1dec_tile_desync_bypass.mp4
Signed-off-by: James Almer <jamrial@gmail.com>
Section 6.8.1 of the AV1 specification states:
"If obu_type is equal to OBU_REDUNDANT_FRAME_HEADER, it is a requirement of
bitstream conformance that SeenFrameHeader is equal to 1."
Leave the existing behavior for reading scenarios as such a file may still
be readable.
Signed-off-by: James Almer <jamrial@gmail.com>
Fixes: signed integer overflow: -1548257 * 2048 cannot be represented in type 'int'
Fixes: #21592
Found-by: HAORAN FANG
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: -2147483648 - 65536 cannot be represented in type 'int'
Fixes: #21588
Found-by: HAORAN FANG
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 536870944 * 16 cannot be represented in type 'int'
Fixes: #21587
Found-by: HAORAN FANG
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: integer overflow (does not replicate, but looks like it should overflow with some craftet parameters)
Fixes: #21584
Found-by: HAORAN FANG
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
When luma init switched to cascade the chroma init was skiped
Fixes: NULL pointer dereference
Fixes: #21583
Found-by: HAORAN FANG
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The value of vb_pos at vb_bottom, vb_above is known
at compile-time, so one can avoid the modifications
to vb_pos and just compare against immediates.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
They are completely unnecessary for the 8bit case (which only
handles 8bit) and overtly complicated for the 10 and 12bit cases:
All one needs to do is set up the (1<<bpp)-1 vector register
and jmp from (say) the 12bpp function stub inside the 10bpp
function. The way it is done here even allows to share the
prologue between the two functions.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It can be combined with an earlier lea for the loop
processing 16 pixels at a time; it is unnecessary
for the tail, because the new values will be overwritten
immediately afterwards anyway.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The vvc_alf_filter functions don't use x86inc's stack managment
feature at all; they merely push and pop some regs themselves.
So don't tell x86inc to provide stack (which in this case
entails aligning the stack).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Each luma alf block has 2*12 auxiliary coefficients associated
with it that the alf_filter functions consume; the C version
simply increments the pointers.
The x64 dsp function meanwhile does things differenty:
The vvc_alf_filter functions have three levels of loops.
The middle layer uses two counters, one of which is
just the horizontal offset xd in the current line. It is only
used for addressing these auxiliary coefficients and
yet one needs to perform work translate from it to
the coefficient offset, namely a *3 via lea and a *2 scale.
Furthermore, the base pointers of the coefficients are incremented
in the outer loop; the stride used for this is calculated
in the C wrapper functions. Furthermore, due to GPR pressure xd
is reused as loop counter for the innermost loop; the
xd from the middle loop is pushed to the stack.
Apart from the translation from horizontal offset to coefficient
offset all of the above has been done for chroma, too, although
the coefficient pointers don't get modified for them at all.
This commit changes this to just increment the pointers
after reading the relevant coefficients.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The permutation that was applied before the write macro
is actually only beneficial when one has 16 entries to write,
so move it into the macro to write 16 entries and optimize
the other macro.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
One always has eight samples when processing the luma remainder,
so xmm registers are sufficient for everything. In fact, this
actually simplifies loading the luma parameters.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
When width is known to be 8 (i.e. for luma that is not width 16),
the upper lane is unused, so use an xmm-sized packuswb and avoid
the vpermq altogether. For chroma not known to be 16 (i.e. 4,8 or
12) defer extracting from the high lane until it is known to be needed.
Also do so via vextracti128 instead of vpermq (also do this for
bpp>8).
Also use vextracti128 and an xmm-sized packuswb in case of width 16
instead of an ymm-sized packuswb followed by vextracti128.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
At the end of the height==8 codepath, a jump to RET at the end
of the height==16 codepath is performed. Yet the epilogue
is so cheap on Unix64 that this jump is not worthwhile.
For Win64 meanwhile, one can still avoid jumps, because
for width 16 >8bpp and width 8 8bpp content a jump is performed
to the end of the height==8 position, immediately followed
by a jump to RET. These two jumps can be combined into one.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Write them in assembly instead; this exchanges a call+ret
with a jmp and also avoids the stack for (1<<bpp)-1.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Both the 8bpp width 16 and >8bpp width 8 cases write
16 contiguous bytes; deduplicate writing them. In fact,
by putting this block of code at the end of the SAVE macro,
one can even save a jmp for the width 16 8bpp case
(without adversely affecting the other cases).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
For 8bpp width 8 content, an unnecessary jump was performed
for every write: First to the end of the SAVE_8BPC macro,
then to the end of the SAVE macro. This commit changes this.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
When processing the last block, we no longer need to preserve
some registers for the next block, allowing simplifications.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Output directly to the desired destination registers instead
of zeroing them, followed by adding the desired values.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
BDOF_PROF_GRAD just adds some values to m12,m13,
so one can avoid two pxor, paddw by deferring
saving these registers prematurely.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The legacy API is defined by sws_init_context(), sws_scale() etc., whereas
the "modern" API is defined by just using sws_scale_frame() without prior
init call.
This int allows us to cleanly distinguish the type of context, paving the
way for some minor refactoring.
As an immediate benefit, we now gain a bunch of explict error checks to
ensure the API is used correctly (i.e. sws_scale() not called before
sws_init_context()).
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
There are two options which use non-zero default value: async and
batch_size of openvino. init_model_ov checks and set batch_size to
one when batch_size is equal to zero, so the only option affected
by missing default value is async. Now async works as expected.
This commit update deinterlace_d3d12 filter options name.
Currently it follows the options name with "deinterlace_vaapi",
In this commit, it will follow filters such as "yadif" and "w3fdif".
Sample command lines:
1. Software decode with hwupload:
ffmpeg -init_hw_device d3d12va=d3d12 -i interlaced.ts \
-vf "format=nv12,hwupload,deinterlace_d3d12=method=default,hwdownload,format=nv12" \
-c:v libx264 output.mp4
2. Full hardware pipeline:
ffmpeg -hwaccel d3d12va -hwaccel_output_format d3d12 -i interlaced.ts \
-vf "deinterlace_d3d12=method=custom:mode=field" \
-c:v h264_d3d12va output.mp4
Signed-off-by: younengxiao <steven.xiao@amd.com>
Fixes: out of array access
Fixes: crash_vvc_heap_oob_read.bin
Found-by: akshay jain <akshaythe@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This allows distinguishing between different types of failure, e.g.
AVERROR(EINVAL) on invalid pass dimensions.
Signed-off-by: Niklas Haas <git@haasn.dev>
Makes various pieces of code that expect to get a SWS_OP_READ more robust,
and also allows us to generalize to introduce more input op types in the
future (in particular, I am looking ahead towards filter ops).
Signed-off-by: Niklas Haas <git@haasn.dev>
This code is self-contained and logically distinct from the ops-related
helpers in ops.c, so it belongs in its own file.
Purely cosmetic; no functional change.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Use a 120 KiB per-function threshold. This roughly matches the default
thread stack size on Musl, which is 128 KiB, leaving some headroom for
caller frames.
This warning draws attention to functions in the codebase that may have
excessively large stack frames. A few functions could benefit from
reducing their frame size.
120 KiB is a reasonable limit. If a function requires more stack, it
likely would benefit from restructuring the code.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Inside the amf_device_derive() function, there seem to be some
unreachable 'break' statements, which can be removed.
And also the final 'return 0' statement will also be unreachable
especially since there is a default case in the switch block.
Signed-off-by: Alexandru Ardelean <aardelean@deviqon.com>
refs does not belong to AVCodecParameters, so require a decoder:
it should only be showed when frames are actually processed by ffprobe.
Signed-off-by: Nicolas Gaullier <nicolas.gaullier@cji.paris>
This fixes compiling with MSVC for aarch64 after
510999f6b0.
While MSVC does do dead code elimintation for function references
within e.g. "if (0)", it doesn't do that for functions referenced
within a static function, even if that static function itself ends
up not used.
A reproduction example:
void missing(void);
void (*func_ptr)(void);
static void wrapper(void) {
missing();
}
void init(int cpu_flags) {
if (0) {
func_ptr = wrapper;
}
}
If "wrapper" is entirely unreferenced, then MSVC doesn't produce
any reference to the symbol "missing". Also, if we do
"func_ptr = missing;" then the reference to missing also is
eliminated. But for the case of referencing the function in a
static function, even if the reference to the static function can
be eliminated, then MSVC does keep the reference to the symbol.
Fixes: applying non-zero offset 16 to null pointer
Fixes: 471614378/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_SNOW_fuzzer-5967030642868224
Note: FF_PTR_ADD() does not work as this code has NULL + 123 cases where the pointer is unsused afterwards
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Accept up to 15 ULP difference.
This fixes running "checkasm --test=ac3dsp <seed>" for the seeds
2043066705, 24168 and 111972 on ARM, and the seeds 40552 and
209754 on aarch64.
This is the same change as 8e4c904c8e,
increasing the tolerance further.
With this change, checkasm passes for over 500 000 seeds on both
ARM and aarch64.
Newer revisions of WinSDK 10.0.26100.0 have exposed more flags for
IsProcessorFeaturePresent; now there is a separate one for
detecting specifically I8MM and not just SVE-I8MM. Switch to using
this flag instead.
Clarify the behavior of seek keyboard shortcuts in both the
documentation and command-line help text. Specifically:
- left/right: mention custom interval option support
- page down/up: improve wording for chapter seeking fallback
This is a bit more forward-facing than a bare allocation, and importantly,
allows the `swscale/utils.c` code to remain agnostic about how to correctly
uninit this struct.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
By excluding the Vulkan makefile entirely when --disable-unstable is passed.
This also correctly avoids compiling e.g. unused GLSL compilers.
Fixes: #22295
See-Also: #22366
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Use PRIu32/PRIX32 format specifiers instead of %d/%u/%X for uint32_t
variables in av_log calls. On some platforms (e.g. NuttX), uint32_t is
typedef'd as unsigned long rather than unsigned int, which triggers
-Wformat warnings despite both types being 4 bytes. Using PRI macros
is the portable way to match the actual underlying type of uint32_t.
Signed-off-by: zengshuang <zengshuang@xiaomi.com>
Add NEON alpha drop/insert using ldp+tbl+stp instead of ld4/st3 and
ld3/st4 structure operations. Both use a 2-register sliding-window
tbl with post-indexed addressing. Instruction scheduling targets
narrow in-order cores (A55) while remaining neutral on wide OoO.
Scalar tails use coalesced loads/stores (ldr+strh+lsr+strb for alpha
drop, ldrh+ldrb+orr+str for alpha insert) to reduce per-pixel
instruction count. Independent instructions placed between loads and
dependent operations to fill load-use latency on in-order cores.
checkasm --bench on Apple M3 Max (decicycles, 1920px):
rgb32tobgr24_c: 114.4 ( 1.00x)
rgb32tobgr24_neon: 64.3 ( 1.78x)
rgb24tobgr32_c: 128.9 ( 1.00x)
rgb24tobgr32_neon: 80.9 ( 1.59x)
C baseline is clang auto-vectorized; speedup is over compiler NEON.
Signed-off-by: David Christle <dev@christle.is>
Add checkasm coverage for rgb32tobgr24 (alpha drop) and rgb24tobgr32
(alpha insert) with test widths exercising all code tiers and
overwrite detection via sentinel bytes.
Signed-off-by: David Christle <dev@christle.is>
Add a NEON rgb24tobgr24 using ld3/st3 to swap R and B channels in
packed 24bpp RGB buffers. Handles all input sizes with a 16-pixel
NEON fast path, 8-pixel NEON cleanup, and scalar tail.
checkasm --bench on Apple M3 Max (1920*3 = 5760 bytes):
rgb24tobgr24_c: 722.0 ( 1.00x)
rgb24tobgr24_neon: 94.9 ( 7.61x)
Signed-off-by: David Christle <dev@christle.is>
Add checkasm coverage for rgb24tobgr24 with test widths exercising
all code tiers (scalar, 8-pixel NEON, 16-pixel NEON).
Signed-off-by: David Christle <dev@christle.is>
This was calling atoi() on `p + offset`, which is nonsense (p may point to
the start of the cache-control string, which does not necessarilly coincide
with the location of the max-age value). Based on the code, the intent
was clearly to parse the value *after* the matched substring.
The retry path restores this offset, but the failure path does not. This
is especially important for the case of the continuation handler in
http_read_stream(), which may result in subsequent loop iterations (after
repeated failures to read additional data) seeking to the wrong offset.
This (arbitrarily) returns -1, which happens to be AVERROR(EPERM) on my
machine. Return the more descriptive AVERORR(EIO) instead.
Also add a log message to explain what's going on.
The code was evidently designed at one point in time to support "direct"
execution (not via a thread pool) for num_threads == 1, but this was never
implemented.
As a side benefit, reduces context creation overhead in single threaded
mode (relevant e.g. inside the libswscale self test), due to not needing to
spawn and destroy several thousand worker threads.
Co-authored-by: Ramiro Polla <ramiro.polla@gmail.com>
Signed-off-by: Niklas Haas <git@haasn.dev>
Ensures samples where a missing Frame Header is handled by a subsequent
Redundant one are parsed correctly.
Signed-off-by: James Almer <jamrial@gmail.com>
(This also stops zero-allocating er_temp_buffer for H.264,
reverting back to the behavior from before commit
0a1dc817237b8546189960efd523257b4f62e1ab.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Given we rewrite these NALUs to remove the encoded data blocks to export as extradata,
we need to do the inverse to remove SC, GC and AI blocks to export as filtered data in
packes.
Signed-off-by: James Almer <jamrial@gmail.com>
write_lcevc_nalu() is meant only for IDR and NON_IDR NALUs. For everything else, just
copy it unchanged.
Signed-off-by: James Almer <jamrial@gmail.com>
Rewrite ff_hevc_put_hevc_qpel_h16_8_neon and h32 to use byte-domain
widening multiply (umull/umlal/umlsl via calc_qpelb/calc_qpelb2 macros)
instead of the previous int16-domain approach (uxtl + mul/mla).
The byte-domain approach eliminates the uxtl expansion step and halves
the ext stride (1 byte vs 2 bytes per tap), reducing per-row instruction
count from ~32 to ~23. The functions are also inlined, removing bl/ret
call overhead.
This benefits all HV-path callers (hv/uni_hv/bi_hv/uni_w_hv/bi_w_hv)
at widths 16/32/48/64.
checkasm benchmarks on Apple M4 (5-run average):
H-pass standalone (NEON):
h16: 34.0 -> 24.4 cycles (1.39x speedup)
h32: 132.0 -> 95.0 cycles (1.39x speedup)
h64: 521.8 -> 373.9 cycles (1.40x speedup)
HV compound paths geometric mean speedup (NEON, width >= 16):
qpel_hv: 1.144x (4 functions)
qpel_bi_hv: 1.158x (4 functions)
qpel_uni_hv: 1.188x (4 functions)
qpel_uni_w_hv: 1.158x (3 functions)
Overall: 1.162x (15 functions)
VVC qpel h16/h32 are separated into self-contained functions retaining
the int16-domain approach, as VVC filters have arbitrary coefficients
incompatible with the hardcoded sign pattern in calc_qpelb.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
The VVC qpel h16 and h32 functions had a redundant 'mov mx, x30'
instruction. The first one was placed before vvc_load_filter had
finished using mx (the filter pointer argument), making it a dead
store immediately overwritten by the second 'mov mx, x30'.
Remove the first instance and reorder so that 'sub src, src, #3'
comes before 'mov mx, x30', ensuring the filter pointer in mx is
fully consumed by vvc_load_filter before being overwritten with the
link register.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
AFter this, depth and the root element flag is stored in space
needed for alignment and thus does not increase the AVExpr size
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
we already check the depth of the parser but the AVExpr tree differs
Fixes: stack exhaustion
Fixes: YWH-PGM40646-39
Found-by: jpraveenrao
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: -8659510451449931520 - 2205846422852077376 cannot be represented in type 'int64_t' (aka 'long')
Fixes: 486358507/clusterfuzz-testcase-minimized-ffmpeg_dem_WEBM_DASH_MANIFEST_fuzzer-4896911086911488
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: out of array access
Fixes: 471509958/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_ADPCM_IMA_MAGIX_DEC_fuzzer-4847227777646592
We ask for a mono sample because the implementation for mono is incomplete
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
scratch[20] doesnt exist in version 0
Fixes: use of uninitialized memory
Fixes: 471664627/clusterfuzz-testcase-minimized-ffmpeg_dem_SEGAFILM_fuzzer-4738726971637760
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Fixes: signed integer overflow: 2147483640 + 32 cannot be represented in type 'int'
Fixes: 473569764/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_JPEG2000_DEC_fuzzer-5377306970619904
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: negation of -9223372036854775808 cannot be represented in type 'int64_t' (aka 'long long'); cast to an unsigned type to negate this value to itself
Fixes: 473334102/clusterfuzz-testcase-minimized-ffmpeg_dem_MATROSKA_fuzzer-5109540931829760
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes non repeatable checksums
This also avoids allocating the mc only buffer when its not used
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
If there is no escape case then reaching that branch is an error
Fixes: shift exponent 32 is too large for 32-bit type 'uint32_t' (aka 'unsigned int')
Fixes: 472335543/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DST_fuzzer-6682453243920384
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: runtime error: shift exponent -1 is negative
Fixes: runtime error: shift exponent 32 is too large for 32-bit type 'int'
Fixes: 471846062/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_JPEG2000_DEC_fuzzer-5835290976780288
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
resample_linear can produce overflows with craftet input,
The added casts should have no effect on the binary output or the operations they
just change things to a defined regime
Fixes: signed integer overflow: 2069416960 + 78151680 cannot be represented in type 'int'
Fixes: 472047214/clusterfuzz-testcase-minimized-ffmpeg_SWR_fuzzer-6374046976770048
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 9223372036854775807 + 3546086691638400 cannot be represented in type 'int64_t' (aka 'long')
Fixes: 471723681/clusterfuzz-testcase-minimized-ffmpeg_dem_MXF_fuzzer-4841032488648704
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 1077919680 + 1077936128 cannot be represented in type 'int'
Fixes: 471686763/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_ADPCM_N64_DEC_fuzzer-6493712281829376
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This patch implements the DNNAsyncExecModule for the LibTorch backend,
enabling non-blocking inference using the common infrastructure instead
of custom threading (th_async_module_submit) to align with the
TensorFlow and OpenVINO backends.
The implementation uses ff_dnn_start_inference_async which provides
unified async logic across all DNN backends, eliminating the need for
backend-specific threading code.
Verified with:
ffmpeg -f lavfi -i testsrc=duration=5:size=320x240:rate=30 -vf dnn_processing=dnn_backend=torch:model=model.pt -y output.mp4
Signed-off-by: Raja Rathour <imraja729@gmail.com>
Increase test height from 2 to 4 rows and add 32 bytes of source stride
padding. This exercises chroma row sharing across multiple luma row pairs
in YUV420P and the stride increment arithmetic in SIMD implementations,
both of which were previously untested.
Signed-off-by: David Christle <dev@christle.is>
Add ARM64 NEON-accelerated unscaled YUV-to-RGB conversion for planar
YUV input formats. This extends the existing NV12/NV21 NEON paths with
YUV420P, YUV422P, and YUVA420P support for all packed RGB output
formats (ARGB, RGBA, ABGR, BGRA, RGB24, BGR24) and planar GBRP.
Register with ff_yuv2rgb_init_aarch64() to also cover the scaled path.
checkasm: all 42 sw_yuv2rgb tests pass.
Speedup vs C at 1920px width (Apple M3 Max, avg of 20 runs):
yuv420p->rgb24: 4.3x yuv420p->argb: 3.1x
yuv422p->rgb24: 5.5x yuv422p->argb: 4.1x
yuva420p->argb: 3.5x yuva420p->rgba: 3.5x
Signed-off-by: David Christle <dev@christle.is>
The res variable (pixel residual count for widths not divisible by 16)
is computed once before the row loop, but DEALYUV2RGBLINERES and
DEALYUV2RGBLINERES32 destructively subtract 8 from it inside the loop
body. When srcSliceH > 2, subsequent row pairs get an incorrect
residual count, producing wrong output for the tail pixels.
Fix by recomputing res from the constant c->opts.dst_w at the top of
each row-pair iteration.
Signed-off-by: David Christle <dev@christle.is>
It is no longer necessary: Most encoders don't use any MMX code
at all any more and the ones that do issue emms on their own.
Instead add an ff_asser1_fpu() to check that it stays that way.
To add some more information, only the halfpel (and fullpel),
qpel and motion estimation APIs use MMX in addition to the
SBC and Snow specific dsp code. halfpel is used by the
mpegvideo encoders, SVQ1 and Snow encoders. The same
encoders in addition to the AC-3 ones and dvvideo use me_cmp.
qpel is only used by the MPEG4 encoder which is part of
mpegvideo. None of these codecs need the generic emms_c (even on
errors):
a) The AC-3 encoders only use a width 16 me_cmp function which
can no longer use MMX since d91b1559e0.
b) dvvideo and SBC emit emms on their own and have no error paths
after the start of the part that can use MMX.
c) SVQ1 calls emms_c() on its own, even on all error paths
that need it.
d) Snow calls emms_c() on its ordinary (success) return path;
it has only one error path in the part of the code that uses MMX,
but even it is fine as ratecontrol_1pass() always calls emms_c()
itself.
e) For mpegvideo, the MMX code is almost confined to the part
of the code reachable from the worker threads (if slice threading
is in use). The exception to this is in skip_check() which always
calls emms_c() itself. Because encode_picture() always calls
emms_c() itself after executing the worker threads and before any
error condition, the floating point state is clean upon exit from
encode_picture().
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
These will be used to mark and check parts of the code
where the floating point state is supposed to be not
clobbered. They are an improvement on the (deprecated)
av_assert0_fpu() because of proper file and line number
information; they also reset the floating point state
if the assert is actually triggered.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Needed for the allocations in ff_snow_common_init_after_header()
(as well as for calculate_visual_weight() if
spatial_decomposition_count could change).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is unnecessary since d91b1559e0
(before that the sad_cmp[0] call in get_intra_count() may
have clobbered the floating point state without cleaning it up
itself).
Also remove some commented out emms_c from places where
the floating point state is guaranteed not to have been clobbered
by us.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
yuy2toyv12, vu9_to_vu12 and yvu9_to_yuy2 are unused. Removing
them saved 7808B of .text and 102B of .text.unlikely
as well as 24B of .bss here.
Thanks to James Almer for pointing out that yuy2toyv12 is unused.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
AVFrame just really doesn't have the semantics we want. However, there a
tangible benefit to having SwsFrame act as a carbon copy of a (subset of)
AVFrame.
This partially reverts commit 67f3627267.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Contrary to the MMXEXT version this version does not overread at all
(the MMXEXT version processes the input of 2*w bytes in eight byte
chunks and overreads by a further six bytes, because it loads
the next left and left top values at the end of the loop,
i.e. it reads FFALIGN(2*w,8)+6 bytes instead of 2*w).
Benchmarks:
sub_hfyu_median_pred_int16_9bpp_c: 12673.6 ( 1.00x)
sub_hfyu_median_pred_int16_9bpp_mmxext: 1947.7 ( 6.51x)
sub_hfyu_median_pred_int16_9bpp_sse2: 993.9 (12.75x)
sub_hfyu_median_pred_int16_9bpp_aligned_c: 12596.1 ( 1.00x)
sub_hfyu_median_pred_int16_9bpp_aligned_mmxext: 1956.1 ( 6.44x)
sub_hfyu_median_pred_int16_9bpp_aligned_sse2: 989.4 (12.73x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This allows to only use certain functions using wide registers
if there is enough work to do and if one can even read a whole
register wide without overreading.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Now WHIP can directly pass SSRC as UINT32 instead of
converting it as INT for match wrong rtpenc option
range Since 69568479a9.
And Converting SSRC from UINT32 to INT64 is safe to
pass by av_dict_set_int().
Signed-off-by: Jack Lau <jacklau1222gm@gmail.com>
Fixes:
warning: initializing 'char *' with an expression of type 'const char *'
discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Up until now, several altivec-specific fields are directly
put into SwsInternal #if HAVE_ALTIVEC is true. These fields
are of altivec-specific vector types and therefore
require altivec specific headers to be included.
Unfortunately, said altivec specific headers redefine
bool in a manner that is incompatible with stdbool.
swscale/ops.h uses bool and this led graph.c and ops.c
to disagree about the layout of structures from ops.h,
leading to heap corruption [1], [2] in the sws-unscaled
FATE test.
Fix this by moving the altivec-specific parts out of SwsInternal
and into a structure that extends SwsInternal and is allocated
jointly with it. Said structure is local to yuv2rgb_altivec.c,
because this is the only file accessing said fields. Should
more files need them, an altivec-specific swscale header would
need to be added.
Thanks to jfiusdq <jfiusdq@proton.me> for analyzing the issue.
[1]: https://fate.ffmpeg.org/report.cgi?slot=ppc64-linux-gcc-14.3-asan&time=20260224065643
[2]: https://fate.ffmpeg.org/report.cgi?slot=ppc64-linux-gcc-14.3&time=20260224051615
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This is in preparation for removing the util_altivec.h inclusion
in swscale_internal.h, which causes problems (on PPC) because
it redefines bool to something different from stdbool.h.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This commit introduces new options to support more than one decryption
keys:
* add a decryption_keys option to MOV, that supports a dictionary of
KID=>key (in hex), using AV_OPT_TYPE_DICT
* add the corresponding cenc_decryption_keys option to DASH
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
The ssrc option used AV_OPT_TYPE_INT with range [INT_MIN, INT_MAX],
but the underlying struct field is uint32_t and RFC 3550 defines SSRC
as a 32-bit identifier covering the full [0, UINT32_MAX] range. This
caused ffmpeg to reject any SSRC value above INT_MAX (~2.1 billion),
which is roughly half of all valid values.
The auto-generated fallback path (av_get_random_seed()) already
returns uint32_t and can produce values above INT_MAX, creating an
inconsistency where the automatic path succeeds but the explicit
-ssrc option rejects the same value.
Change the option to AV_OPT_TYPE_UINT with range [0, UINT32_MAX]
to match the field type and allow the full identifier space.
Fixes ticket #9080.
Signed-off-by: Neko Asakura <neko.asakura@outlook.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
For modern cpus (like those supporting AVX2) loads and stores
using the unaligned versions of instructions are as fast
as aligned ones if the address is aligned, so remove
the aligned AVX2 version (and the alignment check) and just
use the unaligned one.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This avoids a check and a shift if >=8 elements are processed;
it adds a check if < 8 elements are processed (which should
be rare).
No change in benchmarks here.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The pixel format for the process loops have already been checked at
this point to be valid.
The switch added in e4abfb8e51 returns AVERROR(EINVAL) in the default
case without calling ff_sws_op_chain_free(chain), but there's no need
to free it since we mark this branch as unreachable.
This has now become fully redundant with AVFrame, especially since the
existence of SwsPassBuffer. Delete it, simplifying a lot of things and
avoiding reinventing the wheel everywhere.
Also generally reduces overhead, since there is less redundant copying
going on.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
And have ff_sws_graph_run() just take a bare AVFrame. This will help with
an upcoming change, aside from being a bit friendlier towards API users
in general.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This commit replaces the AVBufferRef inside SwsPassBuffer by an AVFrame, in
anticipation of the SwsImg removal.
Incidentally, we could also now just use av_frame_get_buffer() here, but
at the cost of breaking the 1:1 relationship between planes and buffers,
which is required for per-plane refcopies.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This function was originally written to support the use case of e.g.
partially allocated planes that implicitly reference the original input
image, but I've decided that this is stupid and doesn't currently work
anyways.
Plus, I have plans to kill SwsImg, so we need to simplify this mess.
Signed-off-by: Niklas Haas <git@haasn.dev>
The current logic didn't take into account the possible plane shift. Just
re-compute the correctly shifted pointers using the row position.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Instead, precompute the correctly swizzled data and stride in setup()
and just reference the SwsOpExec fields directly.
To avoid the stack copies in handle_tail() we can introduce a temporary
array to hold just the pointers.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Given the entire iamf struct is inside a single Track, if the first iamf stream
(which is the one sharing the index and id from the Track) was to be disabled,
then packets from every iamf stream would be discarded.
Fix this by actually going through the entire iamf Sample and discarding those
from the disabled streams only.
Signed-off-by: James Almer <jamrial@gmail.com>
If we were to add the highest id of a non iamf stream as offset to iamf stream
ids, and one of the latter was 0, then an id overlap would ocurr.
Signed-off-by: James Almer <jamrial@gmail.com>
It has come to my attention that a way to limit the request range size
would be useful in general, for reasons beyond just speeding up initial
header parsing.
This patch generalizez -initial_request_size to -request_size. I decided
to continue allowing both options to be used simultaneously, so users can
e.g. set -request_size to something large like 10 MiB, but still use a smaller
size for initial header parsing (e.g. 256 KiB).
Fixes: https://github.com/mpv-player/mpv/issues/8655
src/libswscale/x86/ops.c(534): error C2099: initializer is not a constant
src/libswscale/x86/ops.c(535): error C2099: initializer is not a constant
src/libswscale/x86/ops.c(536): error C2099: initializer is not a constant
src/libswscale/x86/ops.c(537): error C2099: initializer is not a constant
src/libswscale/x86/ops.c(539): error C2099: initializer is not a constant
src/libswscale/x86/ops.c(540): error C2099: initializer is not a constant
Fixes: ec959e20c5
Signed-off-by: Niklas Haas <git@haasn.dev>
SHA-1 is deprecated and considered cryptographically weak.
Replace EVP_sha1() with EVP_sha256() when signing self-generated
certificates to comply with modern security standards.
This would otherwise generate illegal ops and undefined behavior, for
pixel formats without any supported corresponding pixel type.
This didn't cause any issues previously because the following
`ff_sws_encode_pixfmt` would normally fail for such formats, and the UB
was ignored in practice.
Signed-off-by: Niklas Haas <git@haasn.dev>
The issue was that FFv1 concat'd a lot of separate files, each with
a license header, inflating the total shader size to over 3000 lines.
As all codec shaders were rewritten for compile-time SPIR-V, this became
unnencessarily restrictive.
Sponsored-by: Sovereign Tech Fund
Allows users to test if a given hardware pixel format is supported by swscale.
This is only a rough heuristic anyways, because the actual support may
depend on the specific *combination* of frame attributes, and ultimately
is better served by the `sws_test_frame` and `sws_frame_setup()` APIs anyways.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Instead of defining multiple patterns for the dither ops, just define a
single generic function that branches internally. The branch is well-predicted
and ridiculously cheap. At least on my end, within margin of error.
Signed-off-by: Niklas Haas <git@haasn.dev>
This doesn't actually gain any performance but makes the code needlessly
complicated. Just directly add the indirect address as needed.
Signed-off-by: Niklas Haas <git@haasn.dev>
If you place the branch inside the loop, gcc at least reverts back to scalar
code, so better to just split up and guard the entire loop.
Signed-off-by: Niklas Haas <git@haasn.dev>
We often need to dither only a subset of the components. Previously this
was not possible, but we can just use the special value -1 for this.
The main motivating factor is actually the fact that "unnecessary" dither ops
would otherwise frequently prevent plane splitting, since e.g. a copied
alpha plane has to come along for the ride through the whole F32/dither
pipeline.
Additionally, it somewhat simplifies implementations.
Signed-off-by: Niklas Haas <git@haasn.dev>
I tried it, and it broke horribly. We should definitively rule out this.
Unfortunately, it's not as easy as just setting `disable asm` here, because
asm having been enabled will already have affected the build process further
upstream - conversely, we don't know whether or not memalign exists until
fairly late in the build process, and I'm not about to go destroying
the current organization of the file just to work around this.
So an error message it is.
This code calculates the pixel step using c->dstFormatBpp, which is wrong;
should use the pixdesc step instead. (dstFormatBpp is 3 for YUVX, but
the actual pixel step is 4 bytes)
This bug was mostly hidden by the fact that the same bug actually disabled
the fast path for the (common) case of strStride == dstStride. Uncovered while
making changes to the swscale.c allocation code.
Reproduced by modifying libswscale in such a way that the dst buffer stride
happened to be exactly 4/3 larger than the src buffer stride, e.g. by adding
extra padding to each dst buffer line, but reproducible in principle by any
API user.
Signed-off-by: Niklas Haas <git@haasn.dev>
The assumption that w < stride is not held for such formats. Rather than
this brittle logic we can just always copy the smaller of the two strides.
This is unlikely to affect anything in practice, since usually AVFrames with
the same size and format have matching linesizes, and so the fast path should
be taken. However, in the unlikely case that they don't agree, this fixes an
assertion failure when copying monow to monow.
Reproduced by modifying libswscale to pick a larger-than-normal dst buffer
alignment (and corresponding linesize), but reproducible in principle by any
API user.
Signed-off-by: Niklas Haas <git@haasn.dev>
Annoying C-ism; we can't overload the function type even though they will
always be pointers. We can't even get away with using (void *) in the
function signature, despite casts to void * being technically valid.
Avoid the issue altogether by just moving the process loop into the
type-specific template altogether, and just referring to the correct
compiled process function at runtime. Hopefully, the compiler should be
able to optimize these into a single implementation. GCC, at least, compiles
these down into a single implementation plus three stubs that just jmp
to the correct one.
Signed-off-by: Niklas Haas <git@haasn.dev>
This is useful especially for the special case of scaling by common
not-quite-power-of-two constants like 255 or 1023.
Signed-off-by: Niklas Haas <git@haasn.dev>
By breaking to the `av_unreachable` below. This branch is unreachable because
of the `if (entry->flexible)` branch further above.
Signed-off-by: Niklas Haas <git@haasn.dev>
AVOption with AV_OPT_TYPE_INT assumes the field is int (4 bytes),
but enum size is implementation-defined and may be smaller.
This can cause memory corruption when AVOption writes 4 bytes
to a field that is only 1-2 bytes, potentially overwriting
adjacent struct members.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Add parameter `coder` for users to select entropy coding in D3D12 H264
encoding.
Named constants `cabac` (1) and `cavlc` (0) are supported.
Default is CABAC (1). If the driver does not support CABAC, a warning is
logged and encoding falls back to CAVLC.
Usage:
CABAC (default): `-coder cabac` or `-coder 1`
CAVLC: `-coder cavlc` or `-coder 0`
Sample command line:
```
ffmpeg -hwaccel d3d12va -hwaccel_output_format d3d12 -i input.mp4 -c:v h264_d3d12va -coder cavlc -y output.mp4
```
This patch enables two-pass encoding for libsvtav1 by implementing
support for AV_CODEC_FLAG_PASS1 and AV_CODEC_FLAG_PASS2.
Previously, users requiring two-pass encoding with SVT-AV1 had to use
the standalone SvtAv1EncApp tool. This patch allows 2-pass encoding
directly through FFmpeg.
Based on patch by Fredrik Lundkvist, with review feedback from James
Almer and Andreas Rheinhardt.
See: https://ffmpeg.org/pipermail/ffmpeg-devel/2024-May/327452.html
Changes:
- Use AV_BASE64_DECODE_SIZE macro for buffer size calculation
- Allocate own buffer for rc_stats_buffer (non-ownership pointer)
- Error handling with buffer cleanup
Signed-off-by: Werner Robitza <werner.robitza@gmail.com>
The input only needs to have natural alignment
(i.e. 1 in the 8bpp case, 2 otherwise), the output
only needs to have 16 (namely in the width 8 >8bpp case).
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This commit fixes four related bugs:
1. The >8bpp codepath only initializes half the buffer.
The remaining half used leftover samples from the 8bpp codepath
which initialized the complete buffer.
2. The 8bpp codepath tests only 2 bit inputs (&3). This means
that the second half of the buffer only uses 10 bits (in fact,
only values of the form 000000xx000000xxb) when treated as uint16_t
in the >8bpp test. Due to 1., using more bits in the 8bpp test
would make the >8bpp tests fail (the intermediates would no longer
fit into 16bits).
3. For the >8bpp tests, the first half of the buffer would
only be tested with 8bpp input.
4. The 8bpp codepath initializes the whole buffer, but only
uses half of it.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
For bi-predicted weighted averages, only the sum
of the two offsets is ever used, so add the two early.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Improves error diagnostics for D3D12 video encoders check feature support by adding
detailed ValidationFlags reporting when driver validation fails.
This made it easy for users to identify which specific feature was
unsupported without manually decoding the flags.
Signed-off-by: younengxiao <steven.xiao@amd.com>
These muxers already set hard dependencies in
configure so they cannot be enabled unless the
dependencies are enabled.
So these error handling is unreachable.
Directly Using ff_*_muxer is simpler than calling
av_guess_format()
Refer to 289cb3beff
Signed-off-by: Jack Lau <jacklau1222gm@gmail.com>
This gives more information about each operation and helps catch issues
earlier on.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
This file is becoming too bloated and hard to read, so split it into separate
files, each having codec specific methods.
This will also speed up compilation when using several concurrent jobs.
Signed-off-by: James Almer <jamrial@gmail.com>
Allows multiple passes to share a single output buffer reference. We always
allocate an output buffer so that subpasses can share the same output buffer
reference while still allowing that reference to implicitly point to the
final output image.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
The global args to ff_sws_graph_run() really shouldn't matter inside thread
workers. If they ever do, it indicates a leaky abstraction. The only reason
it was needed in the first place was because of the way the input/output
buffers implicitly defaulted to the global args.
However, we can solve this much more elegantly by just calculating it in
ff_sws_graph_run() directly and storing the computed SwsImg inside the
execution state.
Signed-off-by: Niklas Haas <git@haasn.dev>
This allows already referenced planes to be skipped, in the case of e.g.
only some of the output planes being sucessfully referenced. Also avoids
what is technically UB, if the user happens to call ff_sws_graph_run() after
already having ref'd an image.
Signed-off-by: Niklas Haas <git@haasn.dev>
Using the original input image here is completely wrong - the format/palette
could have been set to anything else in the meantime. At best, we would want to
use the original input to add_legacy_sws_pass(), but it's impossible for this
to differ from the per-pass input. The only time legacy subpasses are added
is when using cascaded contexts, but in this case, the only context actually
reading from the palette format would be the first one.
I'm not entirely sure why this code was originally written this way, but
I'm reasonably confident that it's not at all necessary. Tested extensively
on both FATE, the self-test, and real-world files.
Signed-off-by: Niklas Haas <git@haasn.dev>
This annoyingly requires recreating some of the logic inside av_img_alloc(),
because there's no good existing current helper accessible from libswscale
that gives per-plane allocations like this.
The new code is based off the calculations inside libavframe/bufferpool.c.
Signed-off-by: Niklas Haas <git@haasn.dev>
When multiple passes share a buffer reference, the true buffer dimensions
may be different for each pass, depending on slice alignment. So we can't
rely on the pass dimensions being representative.
Instead, store this information in the SwsPassBuffer itself.
Signed-off-by: Niklas Haas <git@haasn.dev>
I want to add more metadata to this and also turn it into a refstruct,
but get the cosmetic diff out of the way first.
Signed-off-by: Niklas Haas <git@haasn.dev>
We don't actually have an SwsContext yet at this point, so just use
AV_OPT_SEARCH_FAKE_OBJ. For the actual evaluation, the signature only
requires that we pass a "pointer to a struct that contains an AVClass as
its first member", so passing a double pointer to the class itself is
sufficient.
The check for item presence was insufficient as it would result in the last
item in the array being overwritten if it existed even if the id didn't match.
Fixes: Assertion ref failed at src/libavformat/mov.c:10649
Fixes: clusterfuzz-testcase-minimized-ffmpeg_dem_MOV_fuzzer-5312542695292928
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: James Almer <jamrial@gmail.com>
Each SubBand currently contains an array of 519 uint8_t[32],
yet most of these are unused: For both the decoder and the
encoder, at most 34 contexts are actually used: The only
variable index is context+2, where context is the result
of av_log2() and therefore in the 0..31 range.
There are also several accesses using compile-time indices,
the highest of which is 30. FATE passes with 31 contexts
and maybe these are enough, but I don't know.
Reducing the number to 34 reduces sizeof(SnowContext)
from 2141664B to 155104B here (on x64).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
One can use the same shuffles for the width 8 and width 16
case if one also changes the permutation in vpermd (that always
follows pshufb for width 16).
This also allows to load it before checking width.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The C versions of the average and weighted average functions
contains "FFMAX(3, 15 - BIT_DEPTH)" and the code here followed
this; yet it is only instantiated for bit depths 8, 10 and 12,
for which the above is just 15-BIT_DEPTH. So the comparisons
are unnecessary.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
derive_weight() is only called when pred_flag is PF_BI,
which only happens in B slices.
Reviewed-by: Frank Plowman <post@frankplowman.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Restructure the loop slightly to avoid UB in the first loop iteration if
src is 4 bytes, which otherwise computes (0 << 32) | 1. Instead, make
1 the default base case and only shift+add if src < dst.
Add an explicit check to preserve the behavior of returnin 0 if src > dst.
Also deduplicate printing json and summary output.
Reviewed-by: Kyle Swanson <k@ylo.ph>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Instead of unescaping the entire image data buffer in advance, and then
having to perform heuristics to skip over where the restart markers
would have been, unescape the image data for each restart marker
individually.
The initialization code was only being run when mb_y was 0, so it could
just as well be moved out of the loop.
I haven't been able to find a bayer sample that has restart markers to
check whether vpred should be reinitialized at every restart. It would
seem logical that it should, but I have left this out until we find a
sample that does have restart markers.
Use naming for SOS header fields from ISO/IEC 10918-1's non-lossless
mode of operation in ff_mjpeg_decode_sos() instead of mixing JPEG-LS
and lossless names. Each decode function still keeps its correct name
for each field.
For hwaccel, find_marker() was being used to skip over the image data,
which could include multiple restart markers.
For MJPEG-B and THP, the field size was already correct since the image
data was already unescaped.
For the rest (mjpeg and jpegls), the buffer was being incremented by
the unescaped_buf_size, which could be smaller than the actual buffer
size.
Now the buffer is correctly incremented in all cases.
For non-jpegls:
Changes the behaviour to be more in line with IJG's reference implementation:
- optional 0xFF fill bytes in a stuffed zero byte sequence (which is an
invalid pattern according to the standard) are now discarded:
"FF (FF)? 00" => "FF" instead of "FF 00"
- sequences with optional 0xFF fill bytes and a marker are no longer copied:
"FF (FF)? XX" => "" instead of "FF XX"
- a trailing 0xFF byte is no longer issued when a valid "0xFF 0xXX" marker
is found:
"FF XX" => "" instead of "FF"
For jpegls:
Changes the behaviour to be more in line with IJG's (non-jpegls) reference
implementation, similar to the changes above:
- optional 0xFF fill bytes in a stuffed zero bit sequence (which is an
invalid pattern according to the standard) are now discarded:
"FF (FF)? 0b0xxxxxxx" => "FF 0bxxxxxxx" instead of "FF 7F XX"
- sequences with optional 0xFF fill bytes and a marker are no longer copied:
"FF (FF)? 0b1xxxxxxx" => "" instead of "FF 7F"
Unescaping for jpegls is now done in one pass instead of two. The first
pass used to detect the length of the buffer, while the second pass would
copy up to the detected length.
Note that jpegls restart markers are still not supported.
There is also a speed up with the new implementations, mostly due to the
usage of memchr() as suggested by Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
When an array of pointers to strings is converted into
an array of constant-length strings, one has to find out
the maximum size of the strings. Currently no warning will
be emitted If one forgets to account for the trailing zero,
making this optimization dangerous (think of the scenario
where the array will be modified without adjusting the maximum
size of the strings).
The -Wunterminated-string-initialization warning catches these cases.
It is supported by GCC 15.1 and Clang 21. It only requires to mark
the few instances of arrays initialized via string literals that
are not supposed to be strings with the nonstring attribute; this
has already been done in the previous commit.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This attribute is used to signal to the compiler
that an array object initialized via a string literal
is not a real string and may lack the space for
the trailing zero, as in char fourcc[4]="FOUR".
This is in preparation for enabling
the -Wunterminated-string-initialization warning.
Reviewed-by: J. Dekker <jdek@itanimul.li>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This stringtable is a concatenation of strings, with their
own trailing \0. Without an explicit size said stringtable
ends with two \0 (one from the last string and one more
because it is initialized via a string literal); the explicit
size exists to discard the last \0.
Yet this makes GCC (but not Clang) think that this is an
unterminated string, leading to warnings when
-Wunterminated-string-initialization is enabled.
So just remove the explicit size to avoid this.
(An alternative would be to mark this as a non-string
for GCC, but not for Clang, but this has been deemed ugly;
another alternative would be to omit the internal \0 from
the stringtable and use length fields instead, but
the necessary copying code more than outweighs the size benefits
of omitting the \0.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Clang by default allows unknown warnings and merely emits
a Wunknown-warning-option warning which is non-fatal by default,
making it appear as if Clang supported any warning. This patch
rewrites the tests to always add the arguments to error out
in case an unrecognized option is encountered.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
E.g. when CC is clang, the unknown warnings flags
are -Werror=unused-command-line-argument and
-Werror=unknown-warning-option. These flags are currently
also used for testing the C++ and ObjC compilers, yet
g++ does not recognize these flags and errors out because
of them, so that the tests fail and the parentheses,
switch etc. warnings are not disabled for C++.
Fix this by using per-language flags.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Reorder to avoid allocation if there is nothing to read and also solve
a memory leak in that case.
Also make sure to free the RTSPMessageHeader, which is not passed
to the caller.
Fixes: signed integer overflow: 1878131215 + 2013265920 cannot be represented in type 'int'
Fixes: 472729732/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_RV60_fuzzer-4893818005815296
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This was just a minor/pointless optimization in the first place. We keep
the skip on the last component because we can never commute that past the
end of the list.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
When splitting planes, some planes can end up without a read operation
altogether, e.g. when just clearing the alpha plane.
Just return ENOTSUP for such lists instead of EINVAL.
Also fixes the !ops->num_ops check to avoid UB.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
The way this code was written relied on the implicit assumption that no other
row was reading from the same column, which was true in practice so far but
not necessarily true in general. Fix it by precomputing the nonzero component
mask and then adding an explicit check.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This requires a bit of a manual check in the 32-bit integer case to
make sure we don't exceed the value range of AVRational; but it still allows
quite a number of optimizations despite that restriction.
e.g.
rgb24 -> yuva444p9be:
- [u16 ...X -> ++++] SWS_OP_CLEAR : {_ _ _ 511}
- [u16 .... -> zzzz] SWS_OP_SWAP_BYTES
- [u16 .... -> zzzz] SWS_OP_WRITE : 4 elem(s) planar >> 0
+ [u16 ...X -> zzzX] SWS_OP_SWAP_BYTES
+ [u16 ...X -> zzz+] SWS_OP_CLEAR : {_ _ _ 65281}
+ [u16 .... -> zzz+] SWS_OP_WRITE : 4 elem(s) planar >> 0
gray -> yuv444p12be:
- [u16 .XXX -> +++X] SWS_OP_CLEAR : {_ 2048 2048 _}
- [u16 ...X -> zzzX] SWS_OP_SWAP_BYTES
- [u16 ...X -> zzzX] SWS_OP_WRITE : 3 elem(s) planar >> 0
+ [u16 .XXX -> zzXX] SWS_OP_SWAP_BYTES
+ [u16 .XXX -> z++X] SWS_OP_CLEAR : {_ 8 8 _}
+ [u16 ...X -> z++X] SWS_OP_WRITE : 3 elem(s) planar >> 0
Ultimately, the benefit of this will only become relevant once we start
splitting apart planes, since then we can have planes with only CLEAR
operations.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Instead of in each read() function. Not only is this slightly faster, due
to promoting more tail calls, but it also allows us to have operation chains
that don't start with a read.
Also simplifies the implementations.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Allows this backend to process op lists without a read, e.g. for pure clear
operations. I decided to change `write` to a pointer as well for symmetry.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Instead of awkwardly preserving these from the `SwsOp` itself. This
interpretation lessens the risk of bugs as a result of changing the plane
swizzle mask without updating the corresponding components.
After this commit, the plane swizzle mask is automatically taken into
account; i.e. the src_comps mask is always interpreted as if the read op
was in-order (unswizzled).
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This helper function now also takes into account the plane order, and only
returns true if the SwsOpList is a true no-op (i.e. the input image may be
exactly ref'd to the output, with no change in plane order, etc.)
As pointed out in the code, this is unlikely to actually matter, but still
technically correct.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This can be used to have the execution code directly swizzle the plane
pointers, instead of swizzling the data via SWS_OP_SWIZZLE. This can be used
to, for example, extract a subset of the input/output planes for partial
processing of split graphs (e.g. subsampled chroma, or independent alpha),
or just to skip an SWS_OP_SWIZZLE operation.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This optimization is lossy, since it removes important information about the
number of planes to be copied. Subsumed by the more correct
Instead, move this code to the new ff_sws_op_list_is_noop().
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
And use it in ff_sws_compile_pass() instead of hard-coding the check there.
This check will become more sophisticated in the following commits.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Descriptor buffers were a neat attempt at organizing descriptors.
Simple, robust, reliable.
Unfortunately, driver support never caught on, and neither did validation
layer support.
Now they're being replaced by descriptor heaps, which promises to be
the future. We'll see how it goes.
The layer settings extension was enabled since the list was blank,
and some compilers complained about that, and it should have been
always supported on all platforms everywhere.
Unfortunately, some platforms LIE, claim they support it and yet
they error out that the extension is missing.
Juse zero pad the array.
fixup 80229c1
[scale_vulkan @ 0000028b1c2c1300] scale:31: error: 'texture' : no matching overloaded function found
scale:31: error: 'return' : cannot convert return value to function return type
Signed-off-by: nyanmisaka <nst799610810@gmail.com>
This adds the ability to have a stored reply, needed for the
SET_PARAMETERS command feature to reliably report the reply even
while intermediate packets are read.
Fixes the behavior when calling ff_rtsp_read_reply again after it
was called at another place with return_on_interleaved_data set to
true. Before, it would result in completely corrupting the internal
state as the $ interleaved packet marker would have already been read
which the next run of ff_rtsp_read_reply would look for but never find
it, trying to read packet data as RTSP message.
This new API adds the ability to send commands to the demuxer and also
receive replies to these commands.
It is useful in cases like the RTSP demuxer, where it might be desirable
for the API user to send commands like SET_PARAMETER or PLAY and PAUSE.
Sending and receiving are decoupled, as to not require waiting for a
reply when sending, as with some demuxers and depending on the command
it could be necessary to process further packets in the meantime
before requesting a reply.
Based on the Makefile variables, "make fate-audio"
is supposed to include the tests for several audio
codecs with their own Makefile. Yet because they are
included after audio.mak*, this doesn't work atm.
Fix this by adding explicit prerequisites.
*: Expansion of variables is not deferred in the context
of prerequisites.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
loudnorm provides stats output that's meant to be used for two-pass
normalization. These stats are often interleaved with ffmpeg's stream
descriptions and other output, making them difficult to parse and pass
to the second pass. The new stats_file option enables writing the stats
to a separate file, or to standard output, for simple parsing and other
programmatic usage.
Signed-off-by: Adam Jensen <adam@acj.sh>
Fixes: null pointer dereference
Fixes: 471768165/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_CFHD_DEC_fuzzer-6187504467509248
The first frame allocates buffers with one transform type
the second frame sets up another transform type but the code to reallocate buffers is never triggered
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This removes the internal sample format convert
Fixes: nan is outside the range of representable values of type 'int'
Fixes: 471946097/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_LIBVORBIS_DEC_fuzzer-4843605174059008
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Given we now align both dimensions when allocating buffers, don't reinitialize
the pool when dealing with dimension changes that will not affect the existing
pool size.
Signed-off-by: James Almer <jamrial@gmail.com>
For AV1, NV_ENC_BFRAME_REF_MODE_MIDDLE does not use a single middle
B-frame. Per the NVENC Programming Guide, it sets every other B-frame
as an Altref2 reference except the last B-frame in the Altref interval.
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
Remove the improperly named check functions that were added for the
opencolorio detection in 677cf95ea4.
Those are not needed anymore and there are already require_cxx and
check_lib_cxx for this purpose.
When checking for C++ classes in check_class_headers_cxx, the test code
required the class to be default-constructible, which is not always the
case. For the sake of the test, it is enough to check that referencing
the class succeeds.
Loading CA certificate is supported.
Remove unrelated comments.
The underlying socket can be tcp or udp.
Signed-off-by: Jack Lau <jacklau1222gm@gmail.com>
This function just calculate and copy the fingerprint
to the provided buf, will not allocate memory.
It fails when the input (such as cert) is invalid.
Signed-off-by: Jack Lau <jacklau1222gm@gmail.com>
Fixes: negation of -9223372036854775808 cannot be represented in type 'int64_t' (aka 'long'); cast to an unsigned type to negate this value to itself
Fixes: 471604230/clusterfuzz-testcase-minimized-ffmpeg_dem_LRC_fuzzer-5474264750030848
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: stack overflow
This should have limited security impact as it requires access to arbitrary
options.
Found-by: Zhenpeng (Leo) Lin from depthfirst
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
I could not find any documentation which disallows the use of transparency for
pixel formats other than YUV 4:2:0, so this patch adds support for transparency
using 4:2:2, 4:4:4, RGB, and their high bit depth variants.
It is not quite clear if the alpha channel should be encoded using the
same pixel format as the normal channels, or it should be always YUV 4:2:0. I
sticked to 4:2:0.
Unfortunately the browsers I tested (chrome, firefox) only support 4:2:0
properly, so let's require an experimental flag to generate files with the new,
more exotic pixel formats.
Signed-off-by: Marton Balint <cus@passwd.hu>
AlphaMode should indicate whether the BlockAdditional Element with BlockAddID
of "1" contains Alpha data, as defined by to the Codec Mapping for the `CodecID`.
Only VP8 and VP9 Codec Mappings define this, so writing it for all codecs with
a pixel format of YUVA420P, or for codecs other than VP8/VP9 (if the alpha_mode
metadata entry was set) was wrong.
Signed-off-by: Marton Balint <cus@passwd.hu>
All callers in FFmpeg check this already, but it is a public
function that can plausibly be given more channels.
In which case out of array writes would occur
This is likely a regression from when channel layouts where extended
to support more than 64 channels
Found-by: 이동준 <ldj6192@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: out of array access
Fixes: 418335931/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_EXR_DEC_fuzzer-6718455383654400
Fixes: 471611870/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_EXR_DEC_fuzzer-6645447302381568
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
In this case an av_assert2 in lavu/softfloat.h pulls in
a dependency on av_log(), which is not available at all,
because the tablegen tools are created for and run on
the host, not the target. This leads to linking failures.
Fix this by undefining ASSERT_LEVEL before the inclusion
of avassert.h.
Reviewed-by: Timo Rothenpieler <timo@rothenpieler.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Priming samples don't depend on the contents of the first frame passed to the
encoder but rather on encoder configuration, so use dummy vorbis_dsp_state and
vorbis_block with the main encoder vorbis_info to generate a packet we can then
parse to derive the initial padding.
Muxers will now be aware of the initial padding before receiving packets, which
will allow them to write proper container headers during init without any extra
considerations like having to do a second pass or rewrite during write_trailer.
Signed-off-by: James Almer <jamrial@gmail.com>
ff_rate_control_init/uninit() are only called during
codec init or close and no DSP calls happen in these,
so there is no need to reset the floating point state.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The MMX code has been removed in 5ef613bcb0
and MMX was not used for most cpus even before then.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
There are no MMX functions left in all of libswresample, so checking
for it makes no sense. Notice that configure currently disables x86asm
if mmx is disabled, so the check is also redundant.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Said function has presumably been added in order to check
that we have successfully reset the floating point state
after having violated the ABI/calling convention by not
issuing emms in our MMX DSP routines.
Yet the ability to check this should not have been made public,
because there is no external need for it and because the
function does not even always achieve what its documentation
claims to do: It only works when inline assembly is available.
It should have been implemented in emms.h like emms_c()
(which is where a replacement should be put if there is still
ABI-violating MMX code when av_assert0_fpu() is removed).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Starting with PyTorch 2.6, the hooks API was redesigned so it no longer depends on the device type.
As part of this change, the XPU initialization function was renamed from initXPU() to init().
Add version check to support both old and new LibTorch versions.
Signed-off-by: younengxiao <steven.xiao@amd.com>
When using `dnn_processing` filter with torch backend, FFmpeg hangs indefinitely because no inference is actually performed.
Resolve this problem by add "else" branch for synchronous execution path.
Usage:
ffmpeg -i input.mp4 -vf scale=224:224,format=rgb24,dnn_processing=dnn_backend=torch:model=sr_model_torch.pt:device=cpu output.mp4
Don't just export initial_padding, but also take it into account for timestamps
by adjusting the stream's start_time. Also, export the padding samples in a way
that's actually supported, as currently trailing_padding is unused so no user
will look at it.
For the sake of not reducing coverage and because the lossy tests have reference
files that contain the priming samples, add the skip_manual flags2 to the tests
so we still output them. A new ffprobe call will ensure the demuxer behavior is
tested too.
Signed-off-by: James Almer <jamrial@gmail.com>
Unlike md5, the adler32 hash shows the frames with priming samples, which will
be useful in the following commits.
Signed-off-by: James Almer <jamrial@gmail.com>
This also advertises the supported sample rates/channel layouts.
For ADPCM_IMA_AMV, it actually fixes the advertised channel layouts.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
We know that this is Dolby Vision Enhancement Layer and while it's not
handled, we can just reduce log spam for this, as it's if fact
recognized.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Sockets of type SOCK_DGRAM don't support listen(), so it was
impossible to read from "unixgram" sockets in ffmpeg.
Signed-off-by: Rost Kurylo <rost.kurylo@netint.ca>
Add support for decoding 42ni and 23ni in mov/mp4.
Note that in24/in32 can be either LE or BE thanks to enda atom,
so this patch leaves the muxer unchanged.
Fixes#21445.
Signed-off-by: Nicolas Gaullier <nicolas.gaullier@cji.paris>
Photoshop documents can contain additional "auxiliary" channels that
don't take part in layer compositing, therefore, using the channel count
to determine the presence of an alpha channel in the merged image is
incorrect. Instead, as per the PSD specification, use the sign of the
layer count (present in the layers and masks section) to determine if
there is an alpha channel, then determine the number of primary channels
using both the presence of the alpha channel and the pixel format.
It is pointless for them given that these values will
be overridden by the decoder lateron anyway.
The only exceptions to this are scenarios where the decoder
actually checks request_sample_fmt or where there are both
fixed- and floating point decoders.
(IMO something like get_format() for audio would be better for
both cases.)
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
ch_layouts, sample_fmts and supported_samplerates
are of no use for a decoder, so don't set them.
In this case this fixes an inconsistency:
ch_layouts contained mono and stereo, yet lateron
the AVCodecContext's ch_layout is always populated
with an AV_CHANNEL_ORDER_UNSPEC layout.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is unnecessary for a decoder (the sample fmt is
set lateron during init and the user has no choice
over this (even for the decoders where sample_fmts
contains two entries).
(This also fixes the ADPCM_DECODER macro: It's second
and third parameters were called "name" and "sample_fmts",
yet the second argument always contained the sample fmts.)
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
SwsContext.{src,dst}_format is int (but uses enum AVPixelFormat)
values, so the accesses need to be performed using an int
for -fshort-enums support.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
In particular, make this code -fshort-enums compatible.
The branch can be optimized away by compilers when
sizeof(enum AVPixelFormat) == sizeof(enum AVSampleFormat).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Return the data in an AVFrame instead. This is what several users
({find,cover}_rect*) want anyway. This also avoids accessing
AVFrame.format (an int) via an enum AVPixelFormat*.
*: This commit actually avoids two frame copies For find_rect:
av_frame_clone() contained an implicit alloc+copy.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Also rename it to ff_pixfmt_is_in(). This is more type-safe;
in particular, it is required to support -fshort-enum.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Assemble it already in caf_write_packet(). This has the advantage
of reducing the amount of buffers used; it also allows to avoid
a seek when writing the trailer and avoids function call overhead
(for the avio_w8(); it also reduces codesize).
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Fixes: out of array access on resolution change with slices threads
Fixes: VULN-10/poc.ivf
Found-by: Zhenpeng (Leo) Lin from depthfirst
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: out of array read
Fixes: VULN-6/poc.raw
Found-by: Zhenpeng (Leo) Lin from depthfirst
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: out of array access if a filter-graph is used the injects changing dimensions
Found-by: Zhenpeng (Leo) Lin from depthfirst
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This fixes running the test under Valgrind, where more, unrelated lines are
printed, making the comparison fail.
Signed-off-by: James Almer <jamrial@gmail.com>
Reset `sc->stsd_count` before parsing entries. This number doesn't get
reset, which means that multiple parse passes can increment it past the
`sc->extradata` array end and cause OOB writes.
Abort earlier if subblock durations are inconsistent with their containing block,
and ensure each subblock duration is at least 1.
Signed-off-by: James Almer <jamrial@gmail.com>
Abort earlier if subblock durations are inconsistent with their containing block,
and ensure each subblock duration is at least 1.
Signed-off-by: James Almer <jamrial@gmail.com>
It was done for the sake of having subblock_duration not be zero as the spec
forbids that value, but harcoding it to any arbitrary value is no better
considering the user is meant to fill the entire structure.
This helps speeding up the function when trying to allocate a struct with a
huge amount of blocks.
Signed-off-by: James Almer <jamrial@gmail.com>
EXIF IFD entries with TIFF field type 0 are invalid per the specification.
Without a check, exif_read_values() fails to allocate entry->value,
causing an out of memory error.
This patch skips such entries early during parsing, allowing decoding
to continue normally.
Fixes: https://code.ffmpeg.org/FFmpeg/FFmpeg/issues/21623
Sometimes, HTTP sources require a lot of seeking during probing / header
parsing (especially for formats like MXF). Currently, we need to completely
tear down and re-establish the connection most times this happens, which puts
a lot of stress on the network stack and also results in transmission of
possibly many unnecessary bytes.
This patch adds an option to allow FFmpeg to request partial ranges during
the initialization stage. This is done until the initial request size is fully
read, after which we fall back to the normal behavior (i.e. infinite streaming
via an unbounded request).
The usefulness of this is limited without also specifying -multiple_requests 1,
since otherwise there is little point to requesting partial ranges to begin
with. (However, it is semantically independent, so we keep it that way.)
When the previous reply was a partial response (e.g. due to a seek to the
end of the file), and the remaining data from that partial response is
below the short seek size threshold, we can serve this seek by just draining
that data and re-using the existing connection.
This can currently only happen when using keep-alive connections
(-multiple_requests 1) and seeking from the end of the file to somewhere
else, in which case the file's tail can be drained and the connection re-used.
Under other circumstances, however, we still need to force a reconnection,
because we do not yet send partial range requests. (This will be changed in the
following commit)
We need to take special care not to regress the existing fallback logic
for when `http_open_cnx` fails, so here is a quick case analysis:
non-drain path:
- identical to the current
soft drain fails: (ffurl_read error path)
- s->hd = old_hd = NULL
- http_open_cnx() always opens a new connection
- on failure, old buffer is restored and s->hd remains NULL
soft drain succeeds, http_open_cnx() fails:
- s->hd is set to NULL by http_open_cnx() failure path
- old_hd was never set, so remains NULL
- old buffer is restored, s->hd remains NULL
In either case, the outcome that any (previously valid) buffer is left as-is,
the offset is unchanged, and the connection ends up closed (s->hd == NULL).
This is okay to do after the previous change to http_buf_read, which allows
it to internally re-open the connection if needed.
If the Content-Range indicates a smaller range than what we expected,
we should send a new request for the remainder before attempting to read
more.
Again, this commit is theoretically non-functional on its own, since any
conforming HTTP server should give us the entire range we asked for in the
first place, but it is semantically independent from and prepares us for the
following changes.
This could conceivably happen currently if the user tries reading more
bytes after the last chunk has already been received. In this case,
we currently segfault - but simply returning AVERROR(EIO) seems more
reasonable and lets the higher end retry the connection in this case.
In the event that the range returned is smaller than the true filesize, we
should only expect to receive that many bytes - not the entire rest of the
file.
This commit is theoretically non-functional on its own, since any conforming
HTTP server will always return us the full file range, but I wanted to split
it off from the subsequent changes in order to make review easier.
This fails to consider the case of whence == SEEK_END and the resulting
offset happening to exactly match the current position.
Reorder the check to compute the target position first, then compare.
This reverts commit aa20d7b3e8.
Adding these avpriv functions is absolutely overblown: Muxers
can get the desired duration in a few lines themselves.
In particular, using the parse functions from this file
necessitated parsing the extradata (and entailed exporting
the parsing function), although it was only used to know
whether the frames are self-delimiting, but everything of
interest to a muxer does not depend on this at all.
The commit to be reverted also made several structures
part of the ABI, which should be avoided in general.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This avoids avpriv functions from lavc/opus/parse.c
(which parse way more than we need, necessitating
parsing the extradata).
It furthermore makes the output of the muxer consistent,
i.e. no longer depending upon whether the Opus parser
or decoder are enabled (the avpriv functions would just
return AVERROR(ENOSYS)).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Before this patch, the last packet in the affected fate test would be written
without a BlockDuration element despite the packet's duration being shorter
than the Opus frame size.
Signed-off-by: James Almer <jamrial@gmail.com>
Following up on b613eebe78, if a demuxer that exports complete frames sets a
duration, don't overwrite it from the output of the parser.
Signed-off-by: James Almer <jamrial@gmail.com>
We did check video codecs but not audio
Fixes: Assertion failure (on codec_id) in parser.c
Fixes: 472097507/clusterfuzz-testcase-minimized-ffmpeg_dem_LIVE_FLV_fuzzer-6016386662203392
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
dxv assumes that newly reallocated memory in tex_data is not uninitialized
thus we have to do that too in case of reallocation in ff_lzf_uncompress()
Fixes: 475000819/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DXV_DEC_fuzzer-5571269310611456
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
size represents the output size
randomly changing it but not reseting it on errors leaks uninitialized memory.
Fixes: 475000819/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DXV_DEC_fuzzer-5571269310611456
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
In the case where the last sorted value was -1u and we where on the first
pass of run1 we failed to fill the last few values of bitmap
No real world testcase is known
Fixes: use of uninitialized memory
Fixes: 460333808/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_FFV1_fuzzer-6370167888347136
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The ldrsw instruction requires immediate offset with # prefix.
This fixes the syntax error introduced in commit 26752368f0
(aarch64/h26x: Add put_hevc_pel_bi_w_pixels) where the
load_bi_w_pixels_param macro was added.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
The 'sws_dither' and 'alphablend' options access 'SwsDither' and
'SwsAlphaBlend' enum fields as integers. This is unsafe when the
code is compiled with -fshort-enums, as the enum size might be
smaller than an int.
Since the 'dither' and 'alpha_blend' struct members are part of the
public API, their types cannot be easily changed.
To ensure safe integer access and maintain ABI compatibility across
different compiler settings, a MAX_ENUM value is added to force the
enums to a 32-bit underlying type.
Fix#20712
When print_sdp() fails during rtp streaming, sch_start()
returns error code without calling wirte_trailer, causing
a memory leak.
The avformat_free_context() will call the mux->deinit, so
we should add a rtp_deinit() to free all internal allocations.
Signed-off-by: Jack Lau <jacklau1222gm@gmail.com>
The -sources and -sinks options were defined with OPT_FUNC_ARG flag,
which requires an argument. This caused "Missing argument for option
'sources'" error when running ffprobe -sources without arguments.
Removing OPT_FUNC_ARG flag allows these options to work without
arguments, listing all available devices, while still supporting
optional device name argument for filtering specific devices.
The `sub_packet` index in `QDM2Context` was not reset to 0 when
`qdm2_decode_frame` started processing a new packet. If an error
occurred during the decoding of a previous packet, `sub_packet` would
retain a non-zero value.
In subsequent calls to `qdm2_decode_frame` with a new packet, this
non-zero `sub_packet` value caused `qdm2_decode` to skip
`qdm2_decode_super_block`. This function is responsible for initializing
packet lists with pointers to the current packet's data. Skipping it led
to the use of stale pointers from the previous (freed) packet, resulting
in a heap-use-after-free vulnerability.
This patch explicitly resets `s->sub_packet = 0` at the beginning of
`qdm2_decode_frame`, ensuring correct initialization for each new
packet.
Fixes: OSS-Fuzz issue 476179569
(https://issues.oss-fuzz.com/issues/476179569).
The variable 'size' is used as a loop index for the 'sizes' array.
This naming similarity is error-prone and recently led to a typo where
'size[sizes]' was written instead of 'sizes[size]'.
Rename the loop index variable from 'size' to 'idx' across all 10 test
functions to make the code more readable and prevent similar typos.
Additionally, replace the hardcoded loop upper bound '10' with
'FF_ARRAY_ELEMS(sizes)' for better maintainability.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Commit 4d4b301e4a introduced a typo where `size[sizes]` was used
instead of `sizes[size]` in 10 places within checkasm_check_pixel_padded
calls.
Since `sizes` is an array and `size` is the loop index, `size[sizes]`
interprets the array pointer as an index, resulting in undefined behavior
and causing AddressSanitizer to detect buffer overflows during testing.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Fixes: tickets/4663/levels30.jp2
The file decodes without error messages and no integer overflows
The file before the broader M_b check did decode with error messages and integer overflows but also no visual artifacts
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This logic was previously added to the scheduler. That commit added locking
(and rescheduling) when updating the corresponding `receive_finished` flag,
but missed doing the same for `send_finished`.
Fixes: fd1fd5850d
(This also fixes a symbol name collision with libssh,
which has a nonstatic function called md5.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
We're not writing a kuki chunk because its contents for Opus are currently
unknown, so it's best if we don't allow the creation of non spec compliant
files.
Signed-off-by: James Almer <jamrial@gmail.com>
Given the contents of the Opus kuki are not fully clear, generate it
using know values from the desc and pakt chunks.
Signed-off-by: James Almer <jamrial@gmail.com>
caf_write_deinit() would segfault if the CAFStreamContext
couldn't be allocated. Fix this by moving everything from
CAFStreamContext to the ordinary CAFContext; the separation
doesn't make sense for a format with only one stream anyway
and removing it also avoids an indirection.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Match more SVE/SME specific details.
Also lowercase all register names. As this matches many cases
of code comments that refer to variables elsewhere, not specific
registers, we only apply this tranformation on the part of lines
before a potential comment.
This file is excempt from the indent checker script, as there
are a few other bits in it that the script wants to reformat
into slightly worse form, or which might not warrant being
reformatted.
But these instructions should indeed be indented this way.
If a codec has fixed block_align and frame_size but a given sample has either
priming or remainder frames, a pakt chunk can be written declaring zero packets
and no table, reporting only the samples to be discarded.
Signed-off-by: James Almer <jamrial@gmail.com>
st->duration is not guaranteed to be set, so store the sum of packet durations instead.
Also, set mPrimingFrames and mRemainderFrames to correct values.
Based on a patch by Jun Zhao.
Signed-off-by: James Almer <jamrial@gmail.com>
Take into account priming frames, exported as start time, and remainder frames,
substracted from the stream duration as well as exported as discard padding
side data in the last packet.
Signed-off-by: James Almer <jamrial@gmail.com>
The hardcoded extra 120 samples results in the side data reporting the need to
discard the entire packet rather than the padding samples.
This is in line with the behavior of the libopus encoder.
Signed-off-by: James Almer <jamrial@gmail.com>
Sample size is always sizeof(float), is planar is a simple if
given that these codecs only support float and planar float.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
(It would even be possible to avoid a clobbering m10 in
MASKED_COPY and the mask register (%3) in MASKED_COPY2
when VEX encoding is in use.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The result of using pmaddwd with the coefficients 1,-1,...,1,-1
is just the negative of using pmaddwd with the coefficients
-1,1,...,-1,1, so avoid one pmaddwd.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Avoid incrementing lowq and highq inside the loop by using
complex addressing modes, avoiding to undo said modification
at the end of the horizontal loop.
For inputq, modify istrideq outside of the loop so that
it is only modified once at the end of the horizontal loop.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Sign extend the integer arguments directly from the stack
instead of loading qwords, followed by sign-extending the
lower half.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Up until now, cfhdencdsp used constants consisting
of -1, 1, ...,-1,1 words and 1, -1,...,1,-1 words
for use as constants in pmaddwd. But one can use
the same constants if one shuffles the words in
a dword the opposite order. Similarly for some other
constants. This also allowed to avoid a register in
chfdenc_vert_filter.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
When s->external_sock is enabled, the underlying sock
will be set after tls_open(), so it should not open a
sock and handshake.
Signed-off-by: Jack Lau <jacklau1222gm@gmail.com>
Implement ff_ssl_*_key_cert()
Generate self-signed cert and key in server
mode if there're no key and cert input.
Implement ff_tls_set_external_socket() and
ff_dtls_export_materials()
Add gnutls as dtls protocol deps.
Signed-off-by: Jack Lau <jacklau1222gm@gmail.com>
A constant named 8 has value 8 is redundant. These constants cannot
prevent user from setting other vaues. Just describe valid values
in the help message.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
The b_adapt option allows users to control adaptive B-frame decision
when lookahead is enabled in HEVC encoding. This feature was already
available for H.264 and AV1 encoders, but was missing from HEVC.
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
before this commit ffmpeg get Heap Buffer Overflow in DASH Demuxer
via Negative Start Number.
Check the value from mpd xml, set the value to 0 if get negative value.
Fixes: heap buffer overflow
Found-by: Zhenpeng (Leo) Lin from depthfirst
This would make this code compatible with forcing VEX encodings
for the SSE4 codepath and is also more correct, because avx_enabled
would be enabled for AVX, although the instructions used in these
codepaths require AVX2.
Reviewed-by: Niklas Haas <ffmpeg@haasn.dev>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
These names are a remnant of dsputil when all the DSP functions
from all codecs were part of DSPcontext.
Reviewed-by: Rémi Denis-Courmont <remi@remlab.net>
Reviewed-by: Sean McGovern <gseanmcg@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Now that the SSE2/AVX functions are no longer restricted
to those systems having an aligned stack, the MMXEXT functions
are always overridden (except for ancient systems without
SSE2), so remove them.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Allocate it via cglobal as usual. This makes the SSE2/AVX functions
available when HAVE_ALIGNED_STACK is false; it also avoids
modifying rsp unnecessarily in the deblock_h_luma_intra_10 functions
on Win64.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
They are a remnant of the MMX functions (which processed
only eight pixels at a time, so that it was called twice
via a wrapper; the actual MMX function had "v8" in its name
instead of simply v) which have been removed in commit
4618f36a24.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
These functions are not used with these parameters;
see the filter_mb_* functions in h264_loopfilter.c.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Forgotten in 4618f36a24.
Also remove a PASS8ROWS wrapper that seems to have been always
unused.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Implement NEON optimization for HEVC dequant at 8-bit depth.
The NEON implementation uses srshr (Signed Rounding Shift Right) which
does both the add with offset and right shift in a single instruction.
Optimization details:
- 4x4 (16 coeffs): Single load-process-store sequence
- 8x8 (64 coeffs): Fully unrolled, no loop overhead
- 16x16 (256 coeffs): Pipelined load/compute/store to hide memory latency
- 32x32 (1024 coeffs): Pipelined with all available NEON registers
Performance benchmark on Apple M4:
./tests/checkasm/checkasm --test=hevc_dequant --bench
hevc_dequant_4x4_8_c: 11.3 ( 1.00x)
hevc_dequant_4x4_8_neon: 6.3 ( 1.78x)
hevc_dequant_8x8_8_c: 33.9 ( 1.00x)
hevc_dequant_8x8_8_neon: 6.6 ( 5.11x)
hevc_dequant_16x16_8_c: 153.8 ( 1.00x)
hevc_dequant_16x16_8_neon: 9.0 (17.02x)
hevc_dequant_32x32_8_c: 78.1 ( 1.00x)
hevc_dequant_32x32_8_neon: 31.9 ( 2.45x)
Note on Performance Anomaly:
The observation that hevc_dequant_32x32_8_c is faster than 16x16 (78.1 vs 153.8)
is due to Clang auto-vectorizing only for sizes >= 32x32.
Compiler: Apple clang version 17.0.0 (clang-1700.6.3.2)
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Previously x265 encoder used its default log level regardless of
FFmpeg's log level setting. Note the log level can be overwritten
by x265-params.
Fix#21462
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
In config_output() for direction=DU/RL, the position
is initialized to s->sono_size, which equals h or w
when bar=0. That position is later used as an
in-bounds pixel coordinate without clamping in draw(),
causing writes past the end of the output planes.
Repro:
ffmpeg -f lavfi -i sine=frequency=1000:sample_rate=44100 \
-filter_complex "[0:a]showcwt=s=640x512:bar=0:direction=du[v]" \
-map "[v]" -frames:v 1 -f null -
AddressSanitizer: heap-buffer-overflow ... WRITE of size 1
Initialize and wrap the DU/RL position to sono_size - 1
(or 0 when empty), preventing out-of-bounds row/column
writes when bar=0 while preserving existing slide behavior.
Add chroma_location option so that, in the subsampled-to-subsampled case, the destination's chroma siting can be changed from the source's without having to use other filters. Useful, for example, when converting BT.2020 to BT.709, where the former customarily uses "top left" and the latter "left."
Update documentation.
Closes: #21185
Fixes: multiple integer overflows
Fixes: out of array access
The PoC modifies filter parameters generally inaccessable to an attacker
Found-by: Zhenpeng (Leo) Lin from depthfirst
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: multiple integer overflows
Fixes: out of array access
Regression since: a408d03ee6
The PoC modifies filter parameters generally inaccessable to an attacker
Found-by: Zhenpeng (Leo) Lin from depthfirst
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This issue hid under the radar since the codebooks between coupling
modes very often result in identical bit counts regardless of the encoded
data, leading to no frame-level bitstream desyncs except in rare cases.
AAC Mps212 data is parsed immediately after the SBR data, where a loss
of sync in SBR will result in Mps212 being wildly different.
Move motion estimation precision check from standalone
`d3d12va_encode_init_motion_estimation_precision()` function into each
codec's init_sequence_params() to reuse existing feature support
queries.
- fixes AV1 using wrong support structure (SUPPORT instead of SUPPORT1)
- eliminates duplicate setup code
- removes redundant CheckFeatureSupport API call
- no intended functional changes other than bug fix
The spec says:
pMiColStarts is a pointer to an array of TileCols number
of unsigned integers that corresponds to MiColStarts
defined in section 6.8.14 of the [AV1 Specification]
And the unit of MiColStarts is MI(mode info).
So is pMiRowStarts.
In early code, the BUNDLE always has two stream
id "a=group:BUNDLE 0 1" even though there's only
one stream.
This patch aims to dynamically set BUNDLE value.
Signed-off-by: Jack Lau <jacklau1222@qq.com>
The ICE handshake actually finished after recieve
and handle binding request from peer when the peer
is not ice lite mode.
Log the ice status when it really done.
Signed-off-by: Jack Lau <jacklau1222@qq.com>
Files with multiple consecutive ID3v2 tags were getting low probe
scores (1) because av_probe_input_format3 only skipped the first tag,
leaving subsequent tags to be treated as non-audio data.
Fix by looping to skip all consecutive ID3v2 tags before probing.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Exporting unknown layouts as unspec type is pointless in a format that expects
the user to remix the channels in location specific ways.
This simplifies assumptions and reduces the chances of heap buffer overflows.
Fixes: heap-buffer-overflow
Fixes: clusterfuzz-testcase-minimized-ffmpeg_dem_IAMF_fuzzer-6363647720095744
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: James Almer <jamrial@gmail.com>
The limiter_buf was being allocated with buf_size (3000ms worth of
samples) instead of limiter_buf_size (210ms worth of samples).
This resulted in allocating approximately 14x more memory than needed
for the limiter buffer. At 192kHz stereo:
- buf_size: 1,152,000 samples = 9.2 MB
- limiter_buf_size: 80,640 samples = 0.6 MB
- Wasted: ~8.2 MB per filter instance
The bug was introduced when the limiter buffer was added but the
wrong size variable was used in av_malloc_array().
Add support for standard -pass and -passlogfile options, matching the behavior
of libx264.
Add the -x265-stats option to specify the stats filename.
Update documentation.
Signed-off-by: Werner Robitza <werner.robitza@gmail.com>
Previously we used the codec or at the time of decoding fb_priv for this, but
fb_priv can be nonzero even if an external frame buffer is not set, so it's
cleaner to use the capability flag directly.
Also check the result of vpx_codec_set_frame_buffer_functions.
Signed-off-by: Marton Balint <cus@passwd.hu>
It is possible that the error happens with the alpha encoder, not the normal
one, so let's always pass the affected encoder to the logging function.
Signed-off-by: Marton Balint <cus@passwd.hu>
The main reason this was written was due to Nvidia. Nvidia always
has a fickle upload path, and seemed to have a shortcut for the
host image upload path. This seems to have been patched out of
recent driver versions.
This upload path relies on the driver keeping the same layout,
down to the stride for the images. Which is an assumption that's
not portable.
Rather than relying on this fickle upload path, what we'd like when
we want pure bandwidth is to decouple uploads to a separate queue,
and let the GPU pull the data from RAM via uploads.
It'll be slower with a single-threaded decoder, but currently all
of our compute-based decoders and the decoders that sit underneath
them support frame threading.
The buffer is prepared for avio, so we need to free
the buffer at the end when avio_alloc_context() fails,
otherwise, it will cause memleak.
Signed-off-by: Jack Lau <jacklau1222gm@gmail.com>
When prescale is enabled, time fields are converted to the output
timebase before expression evaluation. This allows option specification
even if the input timebase is unknown.
The setts bsf has an option to change TB. However the filter only
changed the TB and did not rescale the ts and duration, so it
effectively and silently stretched or squeezed the stream.
The pts, dts and duration are now rescaled to maintain temporal fidelity.
This patch adds screen capture support via AMF vsrc_amf:
ffmpeg -y \
-filter_complex "vsrc_amf=framerate=120:capture_mode=keep_framerate" \
-c:v hevc_amf \
-frames:v 300 \
output_grab.mp4
If the HW frames pool is insufficient,
increase extra_hw_frames in filter_complex.
This avoids undefined behavior when -fshort-enums is enabled, where accessing enum arrays through int pointers causes incorrect memory reads.
Signed-off-by: niyinghao <niyinghao@xiaomi.com>
ff_mlp_restart_checksum() used the (undocumented) layout
of the CRC tables and therefore broke on x86 when the
clmul implementation added in dc03cffe9c
is used. This commit fixes this (and issue #21506).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The fd_set need <sys/select.h> on linux, but it's
not included in os_support.h, it's included in os_support.c
So this patch use poll() to replace select() to avoid
this issue and simplify the code.
Signed-off-by: Jack Lau <jacklau1222gm@gmail.com>
The fd_set need <sys/select.h> on linux, but it's
not included in os_support.h, it's included in os_support.c
So this patch use poll() to replace select() to avoid
this issue and simplify the code.
Signed-off-by: Jack Lau <jacklau1222gm@gmail.com>
fix a simple index bug in ff_aac_usac_reset_state()
that writes past the end of ChannelElement.ch[2] for CPE
ff_aac_usac_reset_state() loops over channels with j < ch, but
incorrectly takes &che->ch[ch]. For CPE (ch == 2) this becomes
che->ch[2], which is one past the end of ChannelElement.ch[2], and the
subsequent memset() causes an intra-object out-of-bounds write.
index the channel element with the loop variable (j).
The extradata allocated in mov_write_single_packet() for AV1 was missing
the required AV_INPUT_BUFFER_PADDING_SIZE padding bytes. This could lead
to out-of-bounds reads when the extradata is parsed by bitstream readers.
Replace av_memdup() with av_malloc() + memset() + memcpy() to ensure
proper padding is present and zeroed.
Reproduced with:
./ffmpeg -y -f lavfi -i "testsrc=duration=1:size=320x240:rate=30" -c:v libaom-av1 -cpu-used 8 -crf 50 test-av1.mp4
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Extract the Sample Aspect Ratio (SAR) from render_width_minus_1 and
render_height_minus_1 in the sequence header.
The AV1 specification defines the render dimensions, which can be used
in conjunction with the coded dimensions to determine the pixel aspect
ratio. This ensures consistent aspect ratio handling for AV1 streams
encapsulated in containers like MP4 or MKV, as observed in the updated
FATE tests where SAR changes from 0/1 to 1/1.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Its rarely respected by implementations, its fairly new (1 year old),
and it has a scuffed define (neither glslc nor glslang enable the
"GL_EXT_nontemporal_keyword" define if its enabled, unlike all other extensions).
We use INT32_MIN/MAX so as to ensure we dont have to deal with 64bit width or height
on a int is int64 system. int64 width would overflow in a system where we assume the product of 2
values fit in int64
Fixes: #YWH-PGM40646-14
Found-by: An0n99X
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Removed in f2affdfafb by mistake.
Also, fix the logic in probe_glslc() so it doesn't return false when debug is disabled.
Signed-off-by: James Almer <jamrial@gmail.com>
This commit marks unused parameters in get_vlc_multi() to fix
-Wunused-parameter warnings when building with -Wall/-Wextra.
No functional changes intended.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This commit implements ROI (Region of Interest) encoding support for D3D12VA hardware encoders, enabling spatially-adaptive quality control for H.264, HEVC, and AV1 encoders.
Query for `D3D12_VIDEO_ENCODER_RATE_CONTROL_FLAG_ENABLE_DELTA_QP` support during initialization to check whether the hardware support delta QP. If delta QP is supported, then process `AV_FRAME_DATA_REGIONS_OF_INTEREST` side data and generate delta QP maps for each frame.
Sample command line:
ffmpeg.exe -hwaccel d3d12va -hwaccel_output_format d3d12 -i input.mp4 -vf addroi=x=480:y=270:w=960:h=540:qoffset=-1/5 -c:v hevc_d3d12va output.mp4
Update execute_model_th to support asynchronous execution by pushing tasks to the pending queue and notifying the worker thread.
Signed-off-by: Raja Rathour <imraja729@gmail.com>
Reviewed-by: Wenbin Chen <wenbin.chen@intel.com>
Reviewed-by: Guo Yejun <yejun.guo@intel.com>
While the inclusion of the bsf is forced by a configure check, it's better to not
just have the code here assume filter will never be NULL.
Fixes Coverity issue CID 1680305.
Signed-off-by: James Almer <jamrial@gmail.com>
Previous code truncated chapter numbers more than 999 and and time offsets more
than 99 hours.
The Vorbis comment format only allows 1000 chapters, so cap the number of
written chapters there.
This also fixes the GCC warnings for truncated strings.
Signed-off-by: Marton Balint <cus@passwd.hu>
And free them once they are guaranteed to be no longer needed, instead of freeing them
when returned with an enhanced output.
Signed-off-by: James Almer <jamrial@gmail.com>
This way we can ensure a frame reference will always exists for as long as the
external library needs the base picture.
Signed-off-by: James Almer <jamrial@gmail.com>
The faiure handling code is C and requires correct stack, global and
thread pointers. This restores them before returning to C. At the same
time, we no longer need to abort() afterwards.
Get and set remote addr when dtls server mode.
(Refer to url_bio_bread in tls_openssl.c)
add tls_handshake function to handle the dtls
or tls handshake
TODO:
add gnutls_pull_timeout function to make dtls
handshake really work.
Signed-off-by: Jack Lau <jacklau1222@qq.com>
Ensure the allocated AVCodecContext is properly freed if avcodec_parameters_to_context fails.
Signed-off-by: 0xBat <monsterbat02@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Validate return value of av_malloc for dynamic_setting to avoid null pointer dereference.
Signed-off-by: 0xBat <monsterbat02@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Properly free decoder_ctx on failure to prevent a memory leak during initialization.
Signed-off-by: 0xBat <monsterbat02@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Implement NEON optimization for compute_weights_line.
Also update the function signature to use ptrdiff_t for stack arguments
(max_meaningful_diff, startx, endx). This is done to unify the stack
layout between Apple platforms (which pack 32-bit stack arguments tightly)
and the generic AAPCS64 ABI (which requires 8-byte stack slots for 32-bit
arguments). Using ptrdiff_t ensures 8-byte slots are used on all AArch64
platforms, avoiding ABI mismatches with the assembly implementation.
The x86 AVX2 prototype is updated to match the new signature.
Performance benchmark (AArch64) in MacOS M4:
./tests/checkasm/checkasm --test=vf_nlmeans --bench
compute_weights_line_c: 151.1 ( 1.00x)
compute_weights_line_neon: 62.6 ( 2.42x)
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Calculation of max_pts and limit_pts may overflow because adding
(int64_t + int64_t + float) results in a float that easily
overflows. This can trigger a very long av_usleep().
If an input packet results in several output packets, the side data will be
exported only by the first output packet, and be missing from the rest.
Signed-off-by: Nicolas Gaullier <nicolas.gaullier@cji.paris>
Signed-off-by: James Almer <jamrial@gmail.com>
Many parsers will request data until they find what will be the start of the
next assembled packet in order to decide where to cut the current one. If this
happens, the loop in demux.c will, in case the demuxer exports already fully
assembled packets as is sometimes the case for MPEG-TS, discard the already
handled first input packet before it tries to move its side data to the output.
The affected FATE tests reflect this change by no longer dropping the side data
from the first input packet, nor exporting every other side data in the wrong
output packet.
Signed-off-by: James Almer <jamrial@gmail.com>
The APP parser can read a fixed number of bytes without checking len,
making len negative and passing it to bytestream2_skipu(), which takes
an unsigned size. This can advance the buffer by a huge amount and
results in undefined behavior.
Add small len guards in the fixed-size AVI1/LJIF paths and only skip
the tail if len > 0.
Signed-off-by: Ruikai Peng <ruikai@pwno.io>
Found via an unused-variable warning from GCC.
Also just forward the error from amf_init_frames_context().
This bug has been introduced in
4479d28103.
Reviewed-by: Jan Ekström <jeebjp@gmail.com>
Reviewed-by: Mikhail Mironov <mikhail.mironov@amd.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This check makes no sense, as the pointer arithmetic involved
in &fg->graph_print_buf would be UB if fg were NULL.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This is its only user and because this function is so specialised
it is very likely to stay that way. So move it back to ffprobe.c
(where it already was before d7a3f68fea).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Only two fields of AVTextFormatSection are ever modified:
entries_to_show and show_all_entries (they are only used
by ffprobe; the graph printing code always prints everything).
These fields do not belong into AVTextFormatSection, they are
more ffprobe-internal (and if the graph printing code ever
made use of them, these fields could very well be
per GraphPrintContext).
This commit therefore moves them out of AVTextFormatSection
and adds a callback to AVTextFormatContext to decide which
elements to discard. This also allows to make the AVTextFormatSections
const.
This also fixes a race when initializing the sections
for graphprint.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
sharpness, noise_reduction, mpeg_noise_reduction,
frame_rate_conversion, brightness and color are not actual fields in
L11 metadata, this appears to be based on perhaps an out of date
patent specification. Instead the two bytes are considered
reserved.
Drop reading them and write out 0s, the fields will be deprecated from
the struct in a future commit and scheduled for removal.
If extra IFDs occur outside of the top level, we don't allocate enough
buffer space for them. This commit changes our calculation to include
their size always, and then we shrink the allocated buffer once every
IFD is written (by at most 192 bytes).
Signed-off-by: Leo Izen <leo.izen@gmail.com>
blk_mv_type is only set for block 0-3, not for chroma blocks,
so it only needs four elements per MB.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
JPEG is mostly byte-aligned. We only really need GetBitContext while
decoding the image data.
Suggested-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Name the feature "arm_crc" rather than plain "crc", to make it
clear that this is about a CPU feature extension, not CRC
implementations in general.
This requires dealing with the extension slightly differently
than other extensions, as the name of the feature and the
".arch_extension" extension name differ.
Naming it with an "arm" prefix rather than "aarch64", as the
CPU extension also is available in 32 bit ARM form, even though
we don't intend to use it there.
This allows naming the ffmpeg wide feature with a different (more
elaborate) name than the raw cpu extension as it is spelled in
the ".arch_extension" directives.
We use a dummy aarch64 feature to work around an issue in older
Clang, where an .arch line such as ".arch armv8.2-a" doesn't take
effect immediately, while one like ".arch armv8.2-a+feature" works.
Previously, we used "crc" for this dummy feature to add (as an
old feature that would be supported widely by old toolchains).
But as we may want to actually use crc features and detect whether
they are supported, we may want to switch to another feature.
Use the "fp" feature instead, for the purposes of this extra
feature in the .arch lines. (The "fp" feature indicates floating
point support, which is implicitly part of the baseline feature
set anyway.)
Implemented the algorithm described in the paper titled
"Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction"
by Intel.
It is not used yet; the integration will be added in a separate commit.
Observed near 10x speedup on AMD Zen4 7950x:
av_crc_c: 22057.0 ( 1.00x)
av_crc_clmul: 2202.8 (10.01x)
This is in preparation for adding checkasm support for av_crc(),
which will always call the same function, but uses different CRC
tables to distinguish different implementations.
This reuses checkasm_check_func() for this; one could also add
a new function or use unions. This would allow to avoid casting
const away in the crc test to be added. It would also allow
to avoid converting function pointers to void* (which ISO C
does not allow).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The compiler was emitting a warning on every Cairo function replaced by
the `MOCK_FN_n` macros:
warning: 'cairo_...': redeclared without dllimport attribute after
being referenced with dll linkage
The macro `CAIRO_WIN32_STATIC_BUILD` prevents the attribute `dllimport` on
the declarations for these functions.
Signed-off-by: Ayose <ayosec@gmail.com>
Colors expressions (like `#RRGGBB`) can now be used as arguments for `setvar`
and `call`.
The trick of setting a variable with a `0xRRGGBBAA` value is not valid anymore.
Signed-off-by: Ayose <ayosec@gmail.com>
To be able to reuse colors from the original frame, the last value returned by
`p()` is tracked in the eval state, and if it is assigned to a variable, the
original color components are copied to `color_vars`. Thus, commands like
`setcolor` and `colorstop` can use those variables:
setvar pixel (p(0, 0))
...
setcolor pixel
`fate-filter-drawvg-video` now also verifies the `p()` function.
Signed-off-by: Ayose <ayosec@gmail.com>
The arguments for `setvar` and `call` commands can be colors (like `#rrggbb`).
This replaces the previous trick of using `0xRRGGBBAA` values to use colors as
procedure arguments.
The parser stores colors as the value expected by Cairo (a `double[4]`). This
array is allocated on the heap so the size of the union in `VGSArgument` is
not increased (i.e. it is still 8 bytes, instead of 32).
Signed-off-by: Ayose <ayosec@gmail.com>
In libcairo, colors are defined as 4 separate components, and each one is double
between 0 and 1. Before this commit, colors stored in variables (like `defhsla`)
were converted to a `0xRRGGBBAA` value, which introduced some issues due to
rounding errors.
Now, when a color is assigned to a variable, the original values (a `double[4]`)
are stored in a dedicated array (`color_vars`), so no conversion is needed.
This change also reduces the cost of reading a color from a variable (no need
for `av_be2ne32`, or the `color[i] / 255` operations).
Signed-off-by: Ayose <ayosec@gmail.com>
It only contains a single field, so add this directly to MPVContext
and remove private_ctx. This avoids an indirection in
ff_h261_loop_filter().
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
They are only used by the decoder (which has them twice, because
the IntraX8Context contains a WMV2DSPContext whose put_mspel_pixels
functions were unused), so move them there.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
WMV1 and WMV2 use other scantables and therefore
ff_msmpeg4_common_init() reinitializes them. Yet WMV2
also uses a different IDCT overwriting the ordinary one,
so that the IDCT permutation changes and therefore
ff_wmv2_common_init() (called after ff_msmpeg4_common_init())
needs to reinitialize the scantables again.
Avoid this by calling ff_wmv2_common_init() in
ff_msmpeg4_common_init().
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
When a frame is exported to DRM, it may be written to to read to in an asyncronous fashion. Make sure, on unmap of a Vulkan frame that was mapped to DRM, to import any fences that were put on the dmabuf
Not entirely sure if it should instead use some entirely different
approach here, given that images exceeding 2GB don't seem that crazy
to me, but so far processing such images results in a heap overflow,
since the size addition overflows and a much too small packet is
allocated and its size never checked again when writing into it.
Fixes #YWH-PGM40646-32
Both FF_VK_EXT_VIDEO_ENCODE_QUEUE and FF_VK_EXT_VIDEO_MAINTENANCE_1 are
required, not only one of them.
Found by VVL.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Both FF_VK_EXT_VIDEO_ENCODE_QUEUE and FF_VK_EXT_VIDEO_MAINTENANCE_1 are
required, not only one of them.
Found by VVL.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Both FF_VK_EXT_VIDEO_ENCODE_QUEUE and FF_VK_EXT_VIDEO_MAINTENANCE_1 are
required, not only one of them.
Found by VVL.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
The temporary hw_frames_ctx created for header parsing was not being
fully released, leading to a leak. Explicitly unreference the temporary
context after use to resolve the leak.
Signed-off-by: hajin-chung <hajinchung1@gmail.com>
Maliciously constructed input EXIF blocks could use the extra IFDs we
set aside in noncontinugous ways, which will cause them to be written
as subdirectories of IFD0 rather than existing IFDs. The base tag size
is (correctly) excluded from the size calculation, but if they are
being written as additional tags, the allocation will be too small and
a write may overflow.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
If AVOptionArrayDef.def is NULL, av_opt_is_set_to_default() should return true
when the field in the object is NULL.
Signed-off-by: James Almer <jamrial@gmail.com>
With the limiting of video usages on the image views, we no longer need
a yuv sampler, since no multi-plane image will be created with the
SAMPLED usage bit.
Lone DPBs will only be requested when the implementation requires
a separate DPB. These images are never given out, and are never used for
anything other than a DPB. Also most implementations requiring these
won't support any other usages on DPBs, so limiting the usages would fix
some validation errors.
Prior to the fix, in the case of a tile containing multiple slices
(pps_num_exp_slices_in_tile != 0) the number of slices was temporarily
allowed to exceed pps_num_slices_in_pic_minus1+1 and therefore
VVC_MAX_SLICES. The number of slices was later verified, but while the
current slice index was higher than expected it was used to write to a
array of size VVC_MAX_SLICES, leading to an OOB write.
To rectify this, the patch adds some checks at an earlier stage, to
ensure that the slice index i + j at no point exceeds
pps_num_slices_in_pic_minus1.
Fixes #YWH-PGM40646-30
Firstly, mathops.h is a private header of libavcodec and should not be used
in fftools. It may reference libavcodec internal symbols, causing link
error with dynamic library, e.g.,
fftools/ffmpeg_filter.c:2687: undefined reference to `ff_rv_zbb_supported'
Secondly, mid_pred operates on int types, while current use case is
int64_t.
Previously, the pkt_size option only affected write buffering in the file
protocol. This change extends its effect to read mode as well.
On embedded systems with limited RAM, users can now reduce I/O buffer
memory by setting a smaller pkt_size.
Signed-off-by: caifan3 <caifan3@xiaomi.com>
The fields are defined as 8 bit long unsigned ints. Fortunately, writing most sane values
as leb is equivalent, which is why no tests are affected.
Signed-off-by: James Almer <jamrial@gmail.com>
The bug was caused by incorrect reading of the step_index field: the original
code read a 16-bit value (including a padding byte) instead of the correct
8-bit step_index as defined by the ADPCM format. This led to distorted audio
or incorrect step calculations when decoding specific ADPCM-encoded files.
Fix by:
1. Reading step_index as an 8-bit unsigned byte via bytestream2_get_byteu()
2. Skipping the subsequent 8-bit padding byte with bytestream2_skip()
The original fix was intended only for sequential mjpeg, but it was also used for progressive
which broke. This commit fixes this regression
Fixes: issue21225
The testcase 6381/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_JPEGLS_fuzzer-5665032743419904 still exits within 240ms
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The doxygen comments were missed as these functions were updated
during review; they don't take separate pointer/length parameters
but use an AVBPrint struct now instead.
Also clarify that ff_make_codec_str doesn't log if the logctx
parameter is NULL.
Several files use values for primaries and transfer characteristics other than
BT709 and IEC_61966 respectively, while still being GBR.
Also always set color_range while at it. It's always coded in the bistream.
Fixes part of #21304.
Signed-off-by: James Almer <jamrial@gmail.com>
MSVC build was failing due to preprocessor directives being used within function-like macro argument lists, which is undefined behavior according to the C11 spec. Fixed by replacing the directive blocks with macro definitions outside of the macro invocations.
A layer could define a loudspeaker_layout that corresponds to a certain channel layout,
but then report an amount of substream_count and coupled_substream_count that might not
match the expected channel count.
Fixes: Invalid write of size 4
Fixes: clusterfuzz-testcase-minimized-ffmpeg_dem_IAMF_fuzzer-6687125346254848
Signed-off-by: James Almer <jamrial@gmail.com>
The semantics of sh_num_ref_idx_active_minus1[ i ] state that
When the current slice is a P slice, the value of NumRefIdxActive[ 0 ] shall be greater than 0.
When the current slice is a B slice, both NumRefIdxActive[ 0 ] and NumRefIdxActive[ 1 ] shall be greater than 0.
Fixes: use of uninitialized memory
Fixes: 449549597/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_VVC_fuzzer-5600497089445888
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reported-by: Michael Niedermayer michael@niedermayer.cc
The current logic implicitly pulled the new value range out of SwsComps using
ff_sws_apply_op_q(), but this was quite ill-formed and not very robust. In
particular, it only worked because of the implicit assumption that the value
range was always set to 0b1111...111.
This actually poses a serious problem for 32-bit packed formats, whose
value range actually does not fit into AVRational. In the past, it only
worked because the value would implicitly overflow to -1, which SWS_OP_UNPACK
would then correctly extract the bits out from again.
In general, it's cleaner (and sufficient) to just explicitly reset the value
range on SWS_OP_UNPACK again.
The current behavior of assuming the value range implicitly on SWS_OP_READ
has a number of serious drawbacks and shortcomings:
- It ignored the effects of SWS_OP_RSHIFT, such as for p010 and related
MSB-aligned formats. (This is actually a bug)
- It adds a needless dependency on the "purely informative" src/dst fields
inside SwsOpList.
- It is difficult to reason about when acted upon by SWS_OP_SWAP_BYTES, and
the existing hack of simply ignoring SWAP_BYTES on the value range is not
a very good solution here.
Instead, we need a more principled way for the op list generating code
to communicate extra metadata about the operations read to the optimizer.
I think the simplest way of doing this is to allow the SwsComps field attached
to SWS_OP_READ to carry additional, user-provided information about the values
read.
This requires changing ff_sws_op_list_update_comps() slightly to not completely
overwrite SwsComps on SWS_OP_READ, but instead merge the implicit information
with the explictly provided one.
I think this is ultimately a better home, since the semantics of this are
not really tied to optimization itself; and because I want to make it an
explicitly suported part of the user-facing API (rather than just an
internal-use field).
The secondary motivating reason here is that I intend to use internal
helpers of `ops.c` inside the next commit. (Though this is a weak reason
on its own, and not sufficient to justify this move by itself.)
Fixes: call to function handle_rendition_args through pointer to incorrect function type 'void (*)(void *, const char *, int, char **, int *)'
Fixes: 464965411/clusterfuzz-testcase-minimized-ffmpeg_dem_HLS_fuzzer-4790164406992896
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fix#21260
The early code didn't reset the JxlDecoder. If the
caller re-send the same packet to JxlDecoder after
avcodec_flush_buffers() (free ctx->frame). It will
directly hit SUCCESS event (skip all EVENTs).
In this case, the ctx->frame already flushed by
avcodec_flush_buffers, so the received frame
will be invalid (has no private_ref) that causes
assertion in decode_receive_frame_internal().
MPV meet this issue because it seek the jxl image
(second read the same pkt) when it does crop
through config `C vf toggle crop=in_w:in_w/2.4`.
This patch make sure hit the entire SUCCESS event
(contains reset the JxlDecoder) when frame_complete
is true.
Signed-off-by: Jack Lau <jacklau1222gm@gmail.com>
They require no special video decode extension. The push descriptor
extension was a placeholder for when the check was performed inside
the init function.
Fixes: signed integer overflow: -1159988356 + -1082982400 cannot be represented in type 'int'
Fixes: 461519938/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-4777382021234688
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Some tools like v4l2-ctl dump data without skip padding. If the
padding size is not an integer multiple of the number of pixel
bytes, we cannot handle it by using a larger width, e.g.,, for
RGB24 image with 32 bytes padding in each line.
This function was assuming that the bits are MSB-aligned, but they are
LSB-aligned in both practice (and in the actual backend).
Also update the documentation of SwsPackOp to make this clearer.
Fixes an incorrect omission of a clamp after decoding e.g. rgb4, since
the max value range was incorrectly determined as 0 as a result of unpacking
the MSB bits instead of the LSB bits:
bgr4 -> gray:
[ u8 XXXX -> +XXX] SWS_OP_READ : 1 elem(s) packed >> 1
[ u8 .XXX -> +++X] SWS_OP_UNPACK : {1 2 1 0}
[ u8 ...X -> +++X] SWS_OP_SWIZZLE : 2103
[ u8 ...X -> +++X] SWS_OP_CONVERT : u8 -> f32
[f32 ...X -> .++X] SWS_OP_LINEAR : dot3 [...]
[f32 .XXX -> .++X] SWS_OP_DITHER : 16x16 matrix + {0 3 2 5}
+ [f32 .XXX -> .++X] SWS_OP_MIN : x <= {255 _ _ _}
[f32 .XXX -> +++X] SWS_OP_CONVERT : f32 -> u8
[ u8 .XXX -> +++X] SWS_OP_WRITE : 1 elem(s) planar >> 0
(X = unused, + = exact, 0 = zero)
The function violated the ABI requirement not to write below SP
(this breaks asynchronous signal handling). On RV32, it also broke
did not align SP to 16 bytes and did not restore it correctly.
No changes to benchmarks as this patch only changes a few immediate
offsets.
Fix several AMF-related issues.
Check the return value of amf_init_frames_context() correctly in amfdec,
as it returns int rather than AMF_RESULT.
Handle possible NULL surfaces returned from QueryInterface() in
vf_amf_common to avoid passing invalid data to amf_amfsurface_to_avframe().
Remove FILTER_SINGLE_PIXFMT from vf_sr_amf since it must not be used
together with a query formats function.
aom_codec_control() takes control id as int. It could be AV1E_ or common
AV1_ enum in encoder, and AV1D_ for decoder.
While upstream provides AOM_CODEC_CONTROL_TYPECHECKED() macro to check
the provided enum value, we wrap those calls in codecctl_ functions,
which makes it not feasible to use.
To avoid complicating this needlessly, just use int.
Fixes: warning: implicit conversion from enumeration type 'enum aom_com_control_id' to different enumeration type 'enum aome_enc_control_id'
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
By default, the D3D12 video encoder uses MAXIMUM, which means no restriction—it uses the highest precision supported by the driver.
Applications may want to reduce precision to improve speed or reduce power consumption. This requires the encoder to support user-defined motion estimation precision modes.
D3D12_VIDEO_ENCODER_MOTION_ESTIMATION_PRECISION_MODE defines several precision modes:
maximum: No restriction, uses the maximum precision supported by the driver.
full_pixel: Allows only full-pixel precision.
half_pixel: Allows half-pixel precision.
quarter-pixel: Allows quarter-pixel precision.
eighth-pixel: Allows eighth-pixel precision (introduced in Windows 11).
Sample Command Line:
ffmpeg -hwaccel d3d12va -hwaccel_output_format d3d12 -extra_hw_frames 20 -i input.mp4 -an -c:v h264_d3d12va -me_precision half_pixel out.mp4
Most EXIF metadata is in IFD0 and most EXIF payloads only contain
one IFD, but it is possible for there to be more IFDs after the
existing trailing one. exiftool and similar software report these IFDs
as IFD1, IFD2, etc. This commit reads those additional IFDs and attaches
them as dummy entries in the top-level IFD ranging from 0xFFFC down to
0xFFED, which are unused by the EXIF spec. The EXIF API is only able to
return and work with a single IFD, so by attaching it as a subdirectory
this metadata can be preserved.
This is done transparently through the read/write process. Upon parsing
an additional IFD1, it will be attached, but it will be written with
av_exif_write after IFD0 rather than as a subdirectory, as intended.
Existing files without more than one IFD, i.e. most files, will be unaffected
by this change, as well as API clients looking to parse specific fields, but
now more metadata is parsed and written, rather than simply being discarded
as trailing data.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
Before this commit, exif_parse_ifd_list didn't free *ifd upon failure,
relying on the caller to do so instead. We only guarded some of the
calls against this function, not all of them, so sometimes it leaked.
This commit fixes this, so exif_parse_ifd_list freeds *ifd upon failure
so callers do not have to guard its invocation with a free wrapper.
Fixes: ossfuzz 440747118: Integer-overflow in av_strerror
Signed-off-by: Leo Izen <leo.izen@gmail.com>
Instead of blindly interleaving re-ordering and minimizing optimizations,
separate this loop into several passes - the first pass will minimize the
operation list in-place as much as possible, and the second pass will apply any
desired re-orderings. (We also want to try pushing clear back before any other
re-orderings, as this can trigger more phase 1 optimizations)
This restructuring leads to significantly more predictable and stable behavior,
especially when introducing more operation types going forwards. Does not
actually affect the current results, but matters with some upcoming changes
I have planned.
For AVX2, movdqu is as fast as movdqa when used on aligned addresses,
so don't instantiate aligned/unaligned versions.
(The check was btw overtly strict: The AVX2 code only uses 16 byte
stores, so it would be enough for dst to be 16-byte aligned.)
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
These functions are only used on Conroe (they are overwritten
by SSSE3 functions using xmm registers if the SSSE3SLOW is not set)
which is very old (introduced in 2006), so remove them.
Btw: The checkasm test (which uses declare_func and not
declare_func_emms since cd8a33bcce)
would fail on a Conroe, yet no one ever reported any such failure.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This avoids needing to use the extra_conf variable. That variable
is problematic for setting a value that contains spaces.
This adds options for another tool in the same fashion as other
tools were added in 523d688c2b.
Busybox-w32 uses regular Windows style paths with drive letters,
but with forward slashes; thus an absolute path starts with "c:/".
Make the target_path() function in fate-run.sh (which converts a
potentially relative path to an absolute one, under the target_path
prefix) handle this case.
With this in place, running fate tests almost works in
busybox-w32 - only one issue remains. A patch [1] has been sent to
upstream busybox for fixing that issue (which also is present if
running fate tests on busybox on Linux), but it hasn't been
responded to yet.
[1] https://lists.busybox.net/pipermail/busybox/2025-December/091851.html
Busybox-w32 [1] works for building ffmpeg on Windows (as an
alternative to msys2, cygwin or WSL).
On busybox-w32, "uname" returns "Windows_NT"; recognize this
in exesuf() as having an .exe suffix.
If building in this environment with a mingw toolchain, one has
to explicitly set --target-os=mingw32. (We probably don't
want to imply that this uname, set as target_os_default, would
default to mingw?) But despite what is set with --target-os,
one can't override the configure variable "host_os", which
exesuf() has to recognize.
[1] https://github.com/rmyorston/busybox-w32
If Zbb is enabled at compilation (e.g. Ubuntu), the compiler should
compile the new C mid_pred() function correctly. But if Zbb is *not*
enabled (e.g. Debian), then we can at least fallback at run-time.
On SiFive-U74, before:
sub_median_pred_c: 1331.9 ( 1.00x)
sub_median_pred_rvb_b: 881.8 ( 1.51x)
After:
sub_median_pred_c: 1133.1 ( 1.00x)
sub_median_pred_rvb_b: 875.7 ( 1.29x)
This reduces the minimum instruction emission for mid_pred()
(i.e. median of 3) down to:
- 3 comparisons and 4 conditional moves, or
- 4 min/max.
With that the compiler can eliminate any branch. This optimal
situation is attainable with Clang 21 on Arm64, RVA22 and x86,
with GCC 15 on Arm64 and x86 (RVA22 goes from 2 to 1 branch).
These optimisations also work on Arm32 and LoongArch.
The same algorithm is already implemented via inline assembler for some
architectures such as x86 and Arm32, but notably not Arm64 and RVA22.
Besides, using C code allows the compiler to schedule instruction
properly.
Even on architectures with neither conditional moves nor min/max, this
leads to a visible performance improvement for C code, as seen here for
RVA20 code running on SiFive-U74:
Before:
sub_median_pred_c: 1657.5 ( 1.00x)
sub_median_pred_rvb_b: 875.9 ( 1.89x)
After:
sub_median_pred_c: 1331.9 ( 1.00x)
sub_median_pred_rvb_b: 881.8 ( 1.51x)
Note that this commit leaves the x86 and Arm32 code intact so it has
no effects on those ISA's.
When building DLLs on ARM64EC, the default use of `dumpbin
-linkermember:1` fails because ARM64EC static libraries use a
different linker member format. Use `-linkermember:32` for ARM64EC
to correctly extract symbols.
Additionally, MSVC inserts $exit_thunk and $entry_thunk symbols
for ARM64EC to handle x64 ↔ ARM64 transitions. These are internal
thunks and must not be exported. Filter them out when generating
the .def file to avoid unresolved symbols or invalid exports.
Trim out the leading '#' on ARM64EC function symbols. This is only
relevant on ARM64EC, but it is benign to do that filtering on
all architectures (such symbols aren't expected on other
architectures).
Simplify the sed command by removing the symbol address with a
sed expression instead of a later "cut" command.
This ensures correct symbol extraction and stable DLL generation
on ARM64EC targets, while keeping behavior unchanged for other
Windows architectures.
Explicitly spell it out that we are not going to modify the
individual libraries for the purposes of improving conformance
to ARM64EC.
We may (or may not) accept build system patches for making such
a build succeed, provided that it does not require changes to
the actual library code.
Since af97c9865f,
the return value of avio_read() has been compared against
an uint32_t, so that the int is promoted to uint32_t for
the comparison (on common systems with 32bit ints). The upshot was
that errors returned from avio_read() were ignored, so that
the buffer could be uninitialized on success.
Fix this by using ffio_read_size() instead.
Fixes: MemorySanitizer: use-of-uninitialized-value
Fixes: 443923343/clusterfuzz-testcase-minimized-ffmpeg_dem_FLAC_fuzzer-5458132865449984
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
read_frame_internal() may result in a stream being modified without also
returning a packet from it. Given said function only bothered to update the
internal stream context for the returned packet, the result would be a desync
between the stream's AVCodecParameters and the internal AVCodecContext.
This change makes sure all streams are updated within the
avformat_find_stream_info() loop.
Fixes #YWH-PGM40646-20
Signed-off-by: James Almer <jamrial@gmail.com>
vp8 encoder can be configured to drop frames, when e.g. bitrate
overshoot is detected. At present the code responsible for
managing an internal fifo assumes that we will get an output frame per
each frame fed into encoder. That is not the case if the encoder can
decide to drop frames.
Running:
ffmpeg -stream_loop 100 -i dash_video3.webm -c:v libvpx -b:v 50k
-drop-threshold 20 -screen-content-mode 2 output.webm
results in lots of warnings like:
[libvpx @ 0x563fd8aba100] Mismatching timestamps: libvpx 2187 queued
2185; this is a bug, please report it
[libvpx @ 0x563fd8aba100] Mismatching timestamps: libvpx 2189 queued
2186; this is a bug, please report it
followed by:
[vost#0:0/libvpx @ 0x563fd8ab9b40] [enc:libvpx @ 0x563fd8aba080] Error
submitting video frame to the encoder
[vost#0:0/libvpx @ 0x563fd8ab9b40] [enc:libvpx @ 0x563fd8aba080] Error
encoding a frame: No space left on device
[vost#0:0/libvpx @ 0x563fd8ab9b40] Task finished with error code: -28
(No space left on device)
[vost#0:0/libvpx @ 0x563fd8ab9b40] Terminating thread with return code
-28 (No space left on device)
The reason for the above error is that each dropped frame leaves an
extra item in the fifo, which eventually overflows.
The proposed fix is to keep popping elements from the fifo until the
one with the matching pts is found. A side effect of this change is that
the code no longer considers pts mismatch to be a bug.
This has likely regressed around 5bda4ec6c3
when fifo started to be universally used.
Signed-off-by: Dariusz Marcinkiewicz <darekm@google.com>
Use the concat protocol, to test the parser's capabilities to differentiate between
EOC maker before SOC marker, on top of false EOC marker positives and EOC maker on EOF.
Signed-off-by: James Almer <jamrial@gmail.com>
JPEG-XS streams can have the bytes corresponding to certain markers as part of
slice data, and no considerations were made for it, so we need to add checks
for false positives.
This fixes assembling several samples.
Signed-off-by: James Almer <jamrial@gmail.com>
The code assumed that the destination buffer was zeroed, a misbehaviour
with which checkasm is bug-compatible as it zeroes the destination
buffer. The fixed code is even faster:
SpacemiT X60:
sub_left_predict_c: 51792.5 ( 1.00x)
sub_left_predict_rvv_i32: 3504.4 (14.78x)
This requires updating the order of the dither matrix offsets as well. In
particular, this can only be done cleanly if the dither matrix offsets are
compatible between duplicated channels; but this should be guaranteed after
the previous commits.
As a side note, this also fixes a bug where we pushed SWS_OP_SWIZZLE past
SWS_OP_DITHER even for very low bit depth output (e.g. rgb4), which led to
a big loss in accuracy due to loss of per-channel dither noise.
Improves loss of e.g. gray -> rgb444 from 0.00358659 to 0.00261414,
now beating legacy swscale in these cases as well.
Otherwise, this is invalid; as the result of applying the next operation
after channel duplication may not commute with the result of duplicating
the result of applying the next operation on only one channel.
In practice, this was using the last seen channel's coefficients.
Note that this did not actually affect anything in practice, because the only
relevant ops (MIN/MAX) were always generated with identical coefficients for
identical channel ranges.
However, it will matter moving forwards, as e.g. dither ops may not be
commuted freely if their matrix offsets differ per channel.
The current usage is ambiguous between "affects each component equally" and
"affects each component independently" - and arguably, the current behavior
was a bug (since SWS_OP_DITHER should not commute with a SWIZZLE, at least
from a bit-exactness PoV).
However, when trying to define cleaner replacements for these concepts, I
realized there are too many special cases; and given that we only have two
use sites, I decided to just split them directly into "commute" functions
for those particular usage cases.
As an added benefit, this moves the commutation logic out of the already-long
ff_sws_ops_list_optimize().
On the surface, this trades a tiny bit of PSNR for not introducing chroma
noise into grayscale images. However, the main reason for this change is
actually motivated by a desire to avoid regressing the status quo of
duplicating swizzles being able to be commuted past dither ops.
Instead of computing y + N with a hard-coded index offset, calculate the
relative offset as a 16-bit integer in C and add that to the pointer directly.
Since we no longer mask the resulting combined address, this may result in
overread, but that's fine since we over-provisioned the array in the previous
commit.
I want to change the way offsets are calculated inside the dither asm;
and the cleanest way to solve that problem is to just over-allocate the
entire dither matrix based on the maximum offset range we expected.
This is an exceptionally unlikely (in fact, currently impossible) case to
actually hit, and not worth micro-optimizing for. More specifically, having
this special case prevents me from easily adding per-row offsets.
This check was wrong; 1 << 0 = 1. The intent was to skip allocating a 1x1
matrix by assuming it is a constant 0.5. But as written, the check never
actually executed.
This did not affect the runtime performance, nor did it leak memory; but it
did mean we didn't hit the intended `assert`.
Since we only need 8 bytes to store the dither matrix pointer, we actually
still have 8 bytes left-over. That means we could either store the 8-bit
row offset directly, or alternatively compute a 16-bit pointer offsets.
I have chosen to do the former for the C backend, in the interest of
simplicity.
The one downside of this approach is that it would fail on hypothetical
128-bit platforms; although I seriously hope that this code does not live
long enough to see the need for 128-bit addressable memory.
To improve decorrelation between components, we offset the dither matrix
slightly for each component. This is currently done by adding a hard-coded
offset of {0, 3, 2, 5} to each of the four components, respectively.
However, this represents a serious challenge when re-ordering SwsDitherOp
past a swizzle, or when splitting an SwsOpList into multiple sub-operations
(e.g. for decoupling luma from subsampled chroma when they are independent).
To fix this on a fundamental level, we have to keep track of the offset per
channel as part of the SwsDitherOp metadata, and respect those values at
runtime.
This commit merely adds the metadata; the update to the underlying backends
will come in a follow-up commit. The FATE change is merely due to the
added offsets in the op list print-out.
The shader needs ~3 loads per DCT coeff.
This data was not observed to get efficiently stored
in the upper cached levels, loading it explicitely in
shared memory fixes that.
Also reduce code size by moving the bitstream
initialization outside of the switch/case.
In commit 6e0034ab7e, the image crop adjustment
was moved after the fitting logic. However, this moved the adjustment inside the
`if (src == ref)` branch, thus missing applying the same un-rotation to input
frames that are *not* the reference frame.
Fix this by pulling the logic back out of the branch again. While we could just
move it after the fitting logic, I think it's more clear to the intent of the
code to just preserve the (rotated) crop rect as a separate variable
`crop_orig`.
Fixes: 6e0034ab7e
(Like the old code, the new code limits the number of threads to 64,
even when the user explicitly set a higher thread count. I don't know
whether it is intentional to apply this limit even when the user
explicitly supplied the number of threads.)
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The pixel format has already been checked generically.
This also fixes the bug that the earlier code ignored
the return value of set_pix_fmt().
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Regression since: 536475ea05.
The JPEG-XS PES path trusted header_size from the payload and advanced
pkt->data/pkt->size without validation, so the trailing memset could
write out of bounds when header_size > pkt->size. Reject such packets,
marking them corrupt and returning an error to avoid the OOB write.
Repro (ASan):
ASAN_OPTIONS=halt_on_error=1:detect_leaks=0 ./ffmpeg -v debug -nostdin -i poc-jpegxs.ts -copy_unknown -map 0 -c copy -f null /dev/null
Crash in new_pes_packet memset on crafted TS with stream_id 0xbd,
stream_type 0x32, header_size 0xFFFFFF00, payload starting with jxes.
Found-by: Pwno
The format does not contain audio timestamps and the calculated audio pts
values were only correct for compressed audio. It is better to remove PTS
calculation entirely and let the generic code handle it.
Fixes ticket #8595.
Fixes#20983.
Fate test changes are because of the different audio time base which is now
always 1/sample_rate.
Signed-off-by: Marton Balint <cus@passwd.hu>
sub_median_pred_mmxext() calculates a predictor from the left, top
and topleft pixel values. The topleft values need to be initialized
differently for the first loop initialization than for the others
in order to avoid reading ptr[-1]. So it has been initialized before
the loop and then read again at the end of the loop, so that the last
value read was never used. Yet this can lead to reads beyond the end
of the buffer, e.g. with
ffmpeg -cpuflags mmx+mmxext -f lavfi -i "color=size=64x4,format=yuv420p" \
-vf vflip -c:v ffvhuff -pred median -frames 1 -f null -
Fix this by not reading the value at the end of the loop.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
sub_median_pred_mmxext() calculates a predictor from the left, top
and topleft pixel values. The left value is simply read via
ptr[-1], although this is not guaranteed to be inside the buffer
in case of negative strides. This happens e.g. with
ffmpeg -i fate-suite/mpeg2/dvd_single_frame.vob -vf vflip \
-c:v magicyuv -pred median -f null -
Fix this by reading the first value like the topleft value.
Also change the documentation of sub_median_pred to reflect this
change (and the one from 791b5954bc).
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The issue is that .flush gets called asynchronously, and modifies the
video session state while its being used for decoding. This did not
result in issues since all known vendors do not keep important state
there, but its not compliant with the specs.
Its not necessary to flush the decoder at all when seeking,
so simply don't.
Fixes#20487
This reverts commit 8f48a62, 9af8782, and bd3e71b.
Commit 8f48a62 extends tx to 2M, resulting in the tx_float bss
section reaching a size of 4M.
This isn't a issue on devices with normal memory sizes and OS
supporting virtual memory. But it's a real issue for embedded devices
with realtime OS, which may not support virtual memory, e.g., Nuttx.
This 4M of bss section map to physical memory directly, which is a
scarce resource on embedded devices.
perm gets incremented in the loop in such a manner that
it already has the value it is set to here except for
the first loop iteration.
Reviewed-by: Marton Balint <cus@passwd.hu>
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Fix#21056
Refer to RFC 3267 Section 4.4.2:
A ToC entry takes the following format in octet-aligned mode:
0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|F| FT |Q|P|P|
+-+-+-+-+-+-+-+-+
P bits: padding bits, MUST be set to zero.
Signed-off-by: Jack Lau <jacklau1222gm@gmail.com>
The DPX Vulkan unpack shader computes a word offset as
uint off = (line_off + pix_off >> 5);
Due to GLSL operator precedence this is evaluated as
line_off + (pix_off >> 5) rather than (line_off + pix_off) >> 5.
Since line_off is in bits while off is a 32-bit word index,
scanlines beyond y=0 use an inflated offset and the shader reads
past the end of the DPX slice buffer.
Parenthesize the expression so that the sum is shifted as intended:
uint off = (line_off + pix_off) >> 5;
This corrects the unpacked data and removes the CRC mismatch
observed between the software and Vulkan DPX decoders for
mispacked 12-bit DPX samples. The GPU OOB read itself is only
observable indirectly via this corruption since it occurs inside
the shader.
Repro on x86_64 with Vulkan/llvmpipe (531ce713a0):
./configure --cc=clang --disable-optimizations --disable-stripping \
--enable-debug=3 --disable-doc --disable-ffplay \
--enable-vulkan --enable-libshaderc \
--enable-hwaccel=dpx_vulkan \
--extra-cflags='-fsanitize=address -fno-omit-frame-pointer' \
--extra-ldflags='-fsanitize=address' && make
VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/lvp_icd.json
PoC: packed 12-bit DPX with the packing flag cleared so the unpack
shader runs (4x64 gbrp12le), e.g. poc12_packed0.dpx.
Software decode:
./ffmpeg -v error -i poc12_packed0.dpx -f framecrc -
-> 0, ..., 1536, 0x26cf81c2
Vulkan hwaccel decode:
VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/lvp_icd.json \
./ffmpeg -v error -init_hw_device vulkan \
-hwaccel vulkan -hwaccel_output_format vulkan \
-i poc12_packed0.dpx \
-vf hwdownload,format=gbrp12le -f framecrc -
-> 0, ..., 1536, 0x71e10a51
The only difference between the two runs is the Vulkan unpack
shader, and the stable CRC mismatch indicates that it is reading
past the intended DPX slice region.
Regression since: 531ce713a0
Found-by: Pwno
Regression since: 3ba570de8b (port from MMX to SSE2).
The SSE2 inline asm in libavfilter/x86/vf_noise.c (line_noise_sse2 and
line_noise_avg_sse2) uses aligned loads/stores (movdqa, movntdq) but never
checks pointer alignment. When the filter reuses an input frame (common
path when av_frame_is_writable() is true), it may receive misaligned data
from upstream filters that adjust frame->data[i] in place, notably vf_crop:
- vf_crop adjusts plane pointers by arbitrary byte offsets
(frame->data[plane] += ...), so an x offset of 1 on 8-bit formats produces
a 1‑byte misalignment.
- The noise filter then calls the SSE2 path directly on those pointers
without realigning or falling back.
Repro on x86_64/SSE2 (current HEAD at that commit):
./ffmpeg -v error -f lavfi -i testsrc=s=320x240:rate=1 \
-vf "format=yuv420p,crop=w=319:x=1:h=240:exact=1,noise=alls=50" \
-frames:v 1 -f null -
This crashes with SIGSEGV at the aligned load in line_noise_sse2 (movdqa
(%r9,%rax),%xmm0; effective address misaligned by 1 byte).
Impact: denial of service via crafted filtergraphs (e.g., crop + noise).
Applies to planar 8-bit formats where upstream filters can shift data
pointers without reallocating.
Found-by: Pwno OSS Team
Required for the following commit, where a parsing function expects the buffer
to include the country code bytes.
Signed-off-by: James Almer <jamrial@gmail.com>
Add a public API for producing RFC 4281/6381 codecs trings for
MIME types.
This can be required for providing alternative video files to
a web browser, letting the browser pick the best file it supports.
Such strings also allow querying a browser whether it supports
a certain codec combination.
Finally, if implementing a DASH/HLS segmenter outside of libavformat,
one also has to generate such strings.
Generating such strings for H264/AAC is very simple, but for
more modern codecs, it can require a lot of nontrivial codec
specific parsing of extradata.
As libavformat already implements this, expose it for users as well.
The old, internal function ff_make_codec_str is kept and used by
the HLS and DASH muxers; the old function takes a logging context
which can be used for logging auxillary info about how the string
generation worked out.
ff_vk_video_common_init() calls ff_vk_video_common_uninit() on failure
which leaves dangling object handles. Those get freed again when the
destructor of FFVulkanDecodeShared calls ff_vk_video_common_uninit()
again.
Signed-off-by: Cameron Gutman <aicommander@gmail.com>
Older versions of Clang (Xcode 14, llvm.org Clang 13 and 14)
do support and recognize SME, but when enabled through
".arch_extension sme" it fails to transitively enable support
for Streaming SVE; this was fixed in [1].
This issue results in those versions currently detecting support
for SME, but later failing to build cpu_sme.s with errors like
"error: instruction requires: sve or sme" or "error: instruction
requires: streaming-sve or sve", on the "cntb x0" instruction.
Extend the check for this instruction set extension, to test
with two instructions, both specifically a SME instruction
(smstart) and an instruction that is available in Streaming SVE
mode (cntb).
For the configure check, add an extra parameter to
check_archext_insn for an optional second instruction to check.
It would be tempting to just pass both instructions through
the same parameter, as "smstart; cntb x0". However, Darwin
targets use a different token (%%) for starting a new
instruction on the same line - those targets interpret ";"
as the start of a comment. Due to that, such a check would
entirely ignore the second instruction on Darwin targets.
To avoid dealing with the variability in passing multiple
instructions on one line, just pass the optional second
instruction on a separate line.
[1] https://github.com/llvm/llvm-project/commit/ff3f3a54e2d1b05c36943bf88ae0be7475d622ed
After the full ffmpeg CLI multithreading changes went in, this
test started depending on how far the input side read and decoded
the input compared to how quickly the output encoded things, causing
spurious failures on the CI.
To my knowledge all of the failures have so far been valid correct
results, but unfortunately FATE's built in checks mostly consist of
whether there is a difference against an exact result.
This way we still get the CI and valgrind running of the code,
but stop its comparison. Reference file is left around so that
the previous reference is still available.
The header read check stored the comparison result into ret, so read
failures became ret=1 and were treated as success, leaving the VMD header
uninitialized and letting parsing continue with bogus state.
Regression since: ee623a43e3.
Found-by: Pwno
We already parse the EXIF side data to extract the orientation, so we
should add it to the output file as an EXIF box.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
Before this commit, we ignore the display matrix side data if any EXIF
side data is present, even if that side data contains no orientation
tag. This allows us to calculate the orientation from the display
matrix sidedata first, if present. Ideally the decoder will have
removed the orientation tag upon decoding and attached the data as
display matrix side data instead, so this makes our orientation code
respect this behavior.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
_cflags_noopt was previously set to optimize for size as a workaround
for issues in MSVC builds without any optimizations. Specifically the
lack of DCE and constant merging. This is no longer needed, as those
issues have been fixed in our codebase.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
There is an edge case not covered by the current logic: If there is only
a single auto-filter inserted, but the auto-inserted filter is incompatible
with a *different* format attribute (after settling the previous formats),
we may need a second auto-filter (e.g. `scale`) to settle the newly introduced
incompatibility.
A regression test demonstrating the issue is added.
An ASAN heap-buffer-overflow in scalable_channel_layout_config was caused by an
unchecked assumption that the channel layout of a scalable audio layer is a
superset of the previous layer's channel layout.
scalable_channel_layout_config constructs a channel layout map by copying
channels from the previous layer and adding new ones. The memory allocation is
based on the target loudspeaker_layout. However, if the target layout doesn't
encompass all previous channels (e.g., Mono to Stereo), copying previous
channels followed by adding current ones could exceed the allocated size,
causing a heap buffer overflow.
This commit adds an exception for the know case of Mono -> Stereo, and a check
to ensure the previous layer's channel layout is a subset of the current
layer's layout by comparing their masks. If the condition isn't met,
an error is returned.
Fixes: https://issues.oss-fuzz.com/issues/464965414
Co-authored-by: Oliver Chang <ochang@google.com>
Signed-off-by: James Almer <jamrial@gmail.com>
These provide a way for apps that initialize Vulkan themselves to know
which extensions we may be able to use without having to hardcode it.
Signed-off-by: Cameron Gutman <aicommander@gmail.com>
Unfortunately, this is exceptionally difficult to handle in the general case,
when packed/bitstream formats come into play - the actual interpretation of
the offset, shift etc. are so difficult to deal with in a general case that
I think it's simpler to continue falling back to a static table of variants
for these exceptions. They are fortunately small in number.
This is the only function that actually has the ability to return an
error, so just move the pixel type assignment here and add a check to
ensure a valid pixel type is found.
The last 3dnow functions have been removed in commit
5ef613bcb0, so don't test
it in checkasm.
(This will affect only one test, namely scalarproduct_and_madd_int16
from lossless_audiodsp: It does not use an SSSE3 function when
the 3dnow flag is set. So for old AMDs (which advertise support for
3dnow), said SSSE3 function is never tested. Now it will.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
These provide a way for apps that initialize Vulkan themselves to know
which extensions we may be able to use without having to hardcode it.
Signed-off-by: Cameron Gutman <aicommander@gmail.com>
Fixes: warning: 'sscanf' may overflow; destination buffer in argument 7 has size 32, but the corresponding specifier may require size 33 [-Wfortify-source]
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
This cannot really happen, but to suppress compiler warnings, we can
just return AVERROR_BUG here.
Fixes: warning: variable 'kernel_name' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Turns out these are not, in fact, purely informative - but the optimizer
can take them into account. This should be documented properly.
I tried to think of a way to avoid needing this in the optimizer, but any
way I could think of would require shoving this to SwsReadWriteOp, which I
am particularly unwilling to do.
Four rows of four bytes fit into one xmm register; therefore
one can arrange the rows as follows (A,B,C: first, second, third etc.
row)
xmm0: ABABABAB BCBCBCBC
xmm1: CDCDCDCD DEDEDEDE
xmm2: EFEFEFEF FGFGFGFG
xmm3: GHGHGHGH HIHIHIHI
and use four pmaddubsw to calculate two rows in parallel. The history
fits into four registers, making this possible even on 32bit systems.
Old benchmarks (Unix 64):
vp9_avg_8tap_smooth_4v_8bpp_c: 105.5 ( 1.00x)
vp9_avg_8tap_smooth_4v_8bpp_ssse3: 16.4 ( 6.44x)
vp9_put_8tap_smooth_4v_8bpp_c: 99.3 ( 1.00x)
vp9_put_8tap_smooth_4v_8bpp_ssse3: 15.4 ( 6.44x)
New benchmarks (Unix 64):
vp9_avg_8tap_smooth_4v_8bpp_c: 105.0 ( 1.00x)
vp9_avg_8tap_smooth_4v_8bpp_ssse3: 11.8 ( 8.90x)
vp9_put_8tap_smooth_4v_8bpp_c: 99.7 ( 1.00x)
vp9_put_8tap_smooth_4v_8bpp_ssse3: 10.7 ( 9.30x)
Old benchmarks (x86-32):
vp9_avg_8tap_smooth_4v_8bpp_c: 138.2 ( 1.00x)
vp9_avg_8tap_smooth_4v_8bpp_ssse3: 28.0 ( 4.93x)
vp9_put_8tap_smooth_4v_8bpp_c: 123.6 ( 1.00x)
vp9_put_8tap_smooth_4v_8bpp_ssse3: 28.0 ( 4.41x)
New benchmarks (x86-32):
vp9_avg_8tap_smooth_4v_8bpp_c: 139.0 ( 1.00x)
vp9_avg_8tap_smooth_4v_8bpp_ssse3: 20.1 ( 6.92x)
vp9_put_8tap_smooth_4v_8bpp_c: 124.5 ( 1.00x)
vp9_put_8tap_smooth_4v_8bpp_ssse3: 19.9 ( 6.26x)
Loading the constants into registers did not turn out to be advantageous
here (not to mention Win64, where this would necessitate saving
and restoring ever more register); probably because there are only two
loop iterations.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Make it possible to run only a subset of the VP9 tests
in addition to all of them (via the vp9dsp test). This
reduces noise and speeds up testing.
FATE continues to use vp9dsp.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
SSSE3 is already quite old (introduced 2006 for Intel, 2011 for AMD),
so that the overwhelming majority of our users (particularly those
that actually update their FFmpeg) will be using the SSSE3 versions.
This commit therefore removes the MMXEXT functions overridden
by them (which don't abide by the ABI) to get closer to a removal
of emms_c.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
SSSE3 is already quite old (introduced 2006 for Intel, 2011 for AMD),
so that the overwhelming majority of our users (particularly those
that actually update their FFmpeg) will be using the SSSE3 versions.
This commit therefore removes the MMXEXT functions overridden
by them (which don't abide by the ABI) to get closer to a removal
of emms_c.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
SSSE3 is already quite old (introduced 2006 for Intel, 2011 for AMD),
so that the overwhelming majority of our users (particularly those
that actually update their FFmpeg) will be using the SSSE3 versions.
This commit therefore removes the MMXEXT functions overridden
by them (which don't abide by the ABI) to get closer to a removal
of emms_c.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This one uses the test framework added in the previous commit to add a
light-weight regression test to ensure the generated SwsOpsList is identical.
Only compare the md5sum, to make the reference file significantly smaller
(down from ~10 MB).
This is similar to swscale/tests/swscale.c, but significantly cheaper - it
merely prints the generated (optimized) operation list for every format
conversion.
Mostly useful for my own purposes as a regression test when making changes
to the ops optimizer. Note the distinction between this and tests/swscale.c,
the latter of which tests the result of *applying* an operation list for
equality.
There is an argument to be made that the two tests could be merged, but
I think the amount of overlap is small enough to not be worth the amount
of differences.
This code was previously broken; since YAF32BE/LE were not included as
part of the format enumeration. However, since we *always* know the correct
swizzle for YA formats, we can just special-case this by the number of
components instead.
Fixes a heap-buffer-overflow in `libavcodec/dpx.c` triggered by a stale
`unpadded_10bit` flag in the `DPXDecContext`. This flag, set for 10-bit
unpadded frames, persisted across `decode_frame` calls. If a subsequent
frame was 16-bit, the stale flag caused incorrect buffer size
validation, allowing truncated buffers to pass checks designed for
smaller 10-bit packed data. This led to an out-of-bounds read in
`av_image_copy_plane` during 16-bit decoding.
The fix explicitly resets `dpx->unpadded_10bit = 0` at the start of
`decode_frame` to ensure correct validation for each frame.
Fixes: https://issues.oss-fuzz.com/issues/464471792
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: out of array read
Fixes: 464471792/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DPX_DEC_fuzzer-5275522210004992
The check if a native layout can be created from the sources was incomplete and
casued a crash with custom layouts if the layout contained a native channel
multiple times, as in this example command line:
ffmpeg -lavfi "sine[a0];sine,pan=FL+FL[a1];[a0][a1]amerge[aout]" -map "[aout]" -t 1 -f framecrc -
Signed-off-by: Marton Balint <cus@passwd.hu>
Add a "pkt_size" AVOption to the Unix domain socket protocol. This option
allows the user to specify the maximum packet size for packet-oriented
sockets (SOCK_DGRAM and SOCK_SEQPACKET).
Signed-off-by: caifan3 <caifan3@xiaomi.com>
This does not improve performance with current hardware due to the poor
performance of segmented accesses. Performance should be slightly better
with expensive or near-future hardware that I don't have, however it is
still limited by two other factors:
- There are only 4 elements.
- The final stores are necessarily indexed and hit multiple cache lines,
thus as slow as scalar.
SpacemiT X60:
dct_unquantize_h263_inter_c: 417.8 ( 1.00x)
dct_unquantize_h263_inter_rvv_i32: 66.0 ( 6.33x)
dct_unquantize_h263_intra_c: 140.2 ( 1.00x)
dct_unquantize_h263_intra_rvv_i32: 67.7 ( 2.07x)
Note that the C benchmarks are not stable, depending heavily on the
number of coefficients picked by the RNG. The R-V V benchmarks are
however very stable and generally better than C's.
The VK spec forbids using clear commands on YUV images,
so we need to allocate separate per-plane images.
This removes the need for a separate reset shader.
It's a name that communicates its functionality in a better way.
Since the function was introduced very recently, we can safely rename it.
Signed-off-by: James Almer <jamrial@gmail.com>
In an ideal world, a filtergraph like `-i A -i B ... -filter_complex concat`
would only keep resources in memory related to the file currently being output
by the concat filter. So ideally, we'd open and fully decode A, then open and
fully decode B, and so on.
Practically, however, fftools wants to get one frame from each input file in
order to initialize the filter graph (buffersrc parameters). So what happens
currently is that fftools will request a single frame from each input A, B, etc
that is plugged into the filtergraph.
When using frame threading, the design of the decoder (ff_thread_receive_frame)
is that it will not output any frames until we have received enough packets to
saturate all threads. This, however, forces the decoder to buffer at least as
many frames for each input file as we have threads, before outputting anything.
By decoding the first frame synchronously, we avoid this issue and allow
configuring the filter graph more quickly and without wasting excess resources
on frames that will not (yet) be used.
Normally, this function tries to make sure all threads are saturated with
work to do before returning any frames; and will continue requesting packets
until that is the case.
However, this significantly slows down initial decoding latency when only
requesting a single frame (to e.g. configure the filter graph), and also
wastes a lot of unnecessary memory in the event that the user does not intend
to decode more frames until later.
By introducing a new `flags` paramater and a new flag
`AV_CODEC_RECEIVE_FRAME_FLAG_SYNCHRONOUS` to go along with it, we can allow
users to temporarily bypass this logic.
h->context_initialized is zero after flush, which triggers call to
ff_get_format unconditionally. ff_get_format can be heavy with
ff_hwaccel_uninit and hwaccel_init. For example, it takes 20 ms on
macOS with videotoolbox. ff_get_format should not be called if
nothing changed. ff_get_format is guarantee to be called at the
first time and when video information changed with
(must_reinit || needs_reinit).
Fix#20760.
Add checkasm coverage for the XYZ12LE to RGB48LE path via the
ctx->xyz12Torgb48 hook. Integrate the test into the build and runner,
exercise a variety of widths/heights, compare against the C reference,
and benchmark when width is multiple of 4.
This improves test coverage for the new function pointer in preparation
for architecture-specific implementations in subsequent commits.
Signed-off-by: Arpad Panyik <Arpad.Panyik@arm.com>
Prepare for xyz12Torgb48 architecture-specific optimizations in
subsequent patches by:
- Grouping XYZ+RGB gamma LUTs and 3x3 matrices into SwsColorXform
(ctx->xyz2rgb and ctx->rgb2xyz), replacing scattered fields.
- Dropping the unused last matrix column giving the same or smaller
SwsInternal size.
- Renaming ff_xyz12Torgb48 and ff_rgb48Toxyz12 and routing calls via
the new per-context function pointer (ctx->xyz12Torgb48 and
ctx->rgb48Toxyz12) in graph.c and swscale.c.
- Adding ff_sws_init_xyzdsp and invoking it in swscale init paths
(normal and unscaled).
- Making fill_xyztables public to ease its setup later in checkasm.
These modifications do not introduce any functional changes.
Signed-off-by: Arpad Panyik <Arpad.Panyik@arm.com>
The width 16 epel functions never use four taps in any direction*,
so don't build said functions. Saves 4352B of .text and 89B of
.text.unlikely here.
*: mx and my in vp8_mc_luma() are always even.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
For the epel functions, there can be no overflow as long as the sum
contains only one of the two large central coefficients; for bilinear
functions, there can be no overflow whatsoever.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
By changing the permutations used in the epel8_h{4,6} case
we can simply reuse the coefficient tables from the vertical epel
filters.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Doubling the register width allows to use only one pshufb and pmaddubsw.
Old benchmarks:
vp8_put_epel4_h4_c: 82.8 ( 1.00x)
vp8_put_epel4_h4_ssse3: 13.9 ( 5.96x)
New benchmarks:
vp8_put_epel4_h4_c: 82.7 ( 1.00x)
vp8_put_epel4_h4_ssse3: 11.7 ( 7.08x)
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Switching to xmm registers allows to process two rows in parallel,
leading to speedups. It is also ABI compliant (no more missing emms).
Old benchmarks:
vp8_put_epel4_v4_c: 96.8 ( 1.00x)
vp8_put_epel4_v4_ssse3: 28.2 ( 3.43x)
New benchmarks:
vp8_put_epel4_v4_c: 95.1 ( 1.00x)
vp8_put_epel4_v4_ssse3: 22.8 ( 4.17x)
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Switching to xmm registers allows to process two rows in parallel,
leading to speedups. It is also ABI compliant (no more missing emms).
Old benchmarks:
vp8_put_epel4_v6_c: 132.8 ( 1.00x)
vp8_put_epel4_v6_ssse3: 34.3 ( 3.87x)
New benchmarks:
vp8_put_epel4_v6_c: 131.5 ( 1.00x)
vp8_put_epel4_v6_ssse3: 27.1 ( 4.86x)
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Use GPRs on x64 and xmm registers else (using GPRs reduces codesize).
This avoids clobbering the floating point state and therefore no longer
breaks the ABI.
No change in benchmarks here.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
SSSE3 is already quite old (introduced 2006 for Intel, 2011 for AMD),
so that the overwhelming majority of our users (particularly those
that actually update their FFmpeg) will be using the SSSE3 versions.
This commit therefore removes the MMX(EXT) functions overridden
by them (which don't abide by the ABI) to get closer to a removal
of emms_c.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
A heap-use-after-free vulnerability was identified in
`libavcodec/aac/aacdec.c`. When `che_configure` frees a
`ChannelElement` (`ac->che[type][id]`), it failed to clear all
references to it in `ac->tag_che_map`. `ac->tag_che_map` caches
pointers to `ChannelElement`s and can contain cross-type mappings (e.g.,
a `TYPE_SCE` tag mapping to a `TYPE_LFE` element).
In a USAC stream reconfiguration scenario, an LFE element was freed, but
a stale pointer remained in `ac->tag_che_map`. Subsequent calls to
`ff_aac_get_che` returned this dangling pointer, leading to a crash in
`decode_usac_core_coder`.
This commit fixes the issue by iterating over the entire
`ac->tag_che_map` in `che_configure` and clearing any entries that point
to the `ChannelElement` about to be freed, ensuring no dangling pointers
remain.
Fixes: https://issues.oss-fuzz.com/issues/440220467
Intra refresh is a technique that gradually refreshes the video by encoding rows or regions as intra macroblocks/CTUs spread over multiple frames, rather than using periodic I-frames.
This provides better error resilience for video streaming while maintaining more consistent bitrate.
Disable Intra Refresh (This is the default)
ffmpeg -init_hw_device d3d12va -hwaccel d3d12va -hwaccel_output_format d3d12 \
-i input.mp4 \
-c:v h264_d3d12va \
-intra_refresh_mode none \
-intra_refresh_duration 30 \
-g 60 \
output.h264
Enable Intra Refresh
ffmpeg -init_hw_device d3d12va -hwaccel d3d12va -hwaccel_output_format d3d12 \
-i input.mp4 \
-c:v h264_d3d12va \
-intra_refresh_mode row_based \
-intra_refresh_duration 30 \
-g 60 \
output.h264
Parameters
- `-intra_refresh_mode`: Set to `row_based` to enable row-based intra refresh, or `NONE` to disable
- `-intra_refresh_duration`: Number of frames over which to spread the intra refresh (default: 0 = use GOP size)
- `-g`: GOP size (should typically be larger than intra refresh duration)
Fixes single image videos
this works and creates our single image video
./ffmpeg -i lena.pnm /tmp/file.m2v
this fails after 3d96d83a0a:
./ffmpeg -i /tmp/file.m2v /tmp/file.jpg -y
This reverts commit 3d96d83a0a.
size_t cannot fit VK_WHOLE_SIZE on 32-bit builds.
Fixes: warning: conversion from 'long long unsigned int' to 'size_t' {aka 'unsigned int'} changes value from '18446744073709551615' to '4294967295'
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
There is no need to scan for NULL, if we inject it ourselves.
Fixes: warning: 'strncat' specified bound 10 equals source length [-Wstringop-overflow=]
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Some XMVs introduce a blank packet at the end of the stream. Previously, we
didn't account for this and returned AVERROR_INVALIDDATA, indicating an issue
with the file. Instead, let's check for this and close out with AVERROR_EOF.
Fix#20940
The feedback and its sub-filter both request frame
from each other, casuing block since 4440e499ba
The feedback should only request inputs[1] once
rather than continuously request frame cause blocking.
This patch add check whether feedback already request
inputs[1] via ff_outlink_frame_wanted(ctx->outputs[1]),
if true, then exit and waiting inputs[0] because it means
we need more frames input to proceed.
Signed-off-by: Jack Lau <jacklau1222gm@gmail.com>
Fixes a heap-buffer-overflow in `decode_frame` where `header_len` read
from the bitstream was not validated against the remaining bytes in the
input buffer (`gb`). This allowed `gb_hdr` to be initialized with a size
exceeding the actual packet data, leading to an out-of-bounds read.
The fix adds a check to ensure `bytestream2_get_bytes_left(&gb)` is
greater than or equal to `header_len - 2` before initializing `gb_hdr`.
Fixes: https://issues.oss-fuzz.com/issues/439711053
Fixes this test under UBSan:
runtime error: call to function dct_unquantize_mpeg1_intra_c through pointer to incorrect function type 'void (*)(struct MpegEncContext *, short *, int, int)'
I don't know how I could forget this.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This was accidentally removed in
357fc5243c.
This fixes test failures when built with Clang and MSVC;
surprisingly, the checkasm test did seem to pass when built with
GCC. Clang and MSVC also warn about the use of the uninitialized
variable, while GCC didn't.
Currently several inline ASM blocks used a value as
an input and rax as clobber register. The input value
was just moved into the register which then served as loop
counter. This is wasteful, as one can just use the value's
register directly as loop counter.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Unquantizing involves calculating
(block[j] * qscale * quant_matrix[j]) / 16
where / rounds towards zero. Arithmetic right shifts
naturally round towards -inf, so the earlier code
calculated the absolute value first, then used a right-shift
and then negated the result if necessary.
This commit uses a different procedure: It biases the product
for negative values of block[j] by 0xf. The combination of
this and the arithmetic right shift is the same as rounding
towards zero.
Furthermore, a write-only store to mm7 has been removed.
Benchmarks:
dct_unquantize_mpeg2_intra_c: 214.3 ( 1.00x)
dct_unquantize_mpeg2_intra_mmx (old): 43.0 ( 4.98x)
dct_unquantize_mpeg2_intra_mmx (new): 28.4 ( 7.56x)
(The bitexact flag and the test for correctness have beem removed
from checkasm for the benchmarks.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The H.263 unquantize functions modified an input parameter.
(And they did so since this code was added in
7f3f5ec87b. I am surprised
that this didn't cause issues, particularly with the intra function.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This is necessary so that the mpegvideo_unquantize checkasm test
does not pull mpegvideo.o and then all of libavcodec into checkasm.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
These functions currently operate on the assumption that the number
of coefficients to process is always of the form 16k+m with m<=4 or >8.
Yet this is not true when the IDCT permutation is of type FF_IDCT_PERM_LIBMPEG2
(i.e. when FF_IDCT_INT is in use).
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This adds a test for the mpegvideo unquantize functions.
It has been written in order to be able to easily bench
these functions. It should be noted that the random input
fed to the tested functions is not necessarily representative
of the stuff actually occuring in the wild. So benchmarks should
be taken with a grain of salt; but comparisons between two functions
that do not depend on branch predictions are valid (the usecase
for this is to port the x86 mmx functions to use xmm registers).
During testing I have found a bug in the arm/aarch64 neon optimizations
when using the LIBMPEG2 permutation (used by FF_IDCT_INT): The code
seems to be based on the presumption that the remainder of the number
of coefficients to process is always <= 4 mod 16. The test therefore
sometimes fails for these arches.
Hint: I am not certain that 16 bits are enough for the intermediate
values of all the computations involved; e.g. both FLV and MPEG-4
escape values can go beyond that after the corresponding
multiplications. The input in this test is nevertheless designed
to fit into 16 bits.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Forgotten in 70a7df049c.
Using the wrong scantable matters for codecs for which both scantables
can differ, namely the MPEG-4 decoder and the WMV1/2 codecs.
For WMV1 it can lead to wrong output in case the IDCT permutation
is FF_IDCT_PERM_PARTTRANS, because in this case the entries of
of the intra scantable's raster end are not always <= the corresponding
entries of the inter scantable's raster end when the former is
initialized via ff_wmv1_scantable[1] and the latter via ff_wmv1_scantable[0].
FF_IDCT_PERM_PARTTRANS is used iff the Neon IDCT is used (for both arm
and aarch64).* Said IDCT is not used during FATE, so that this issue
went unnoticed.
WMV2 uses the same scantables, but uses a custom IDCT
which always uses FF_IDCT_PERM_NONE for which the inter_scantable,
so that the output is always correct for it.
The scantable for MPEG-4 can change mid-stream (for the decoder),
but since c41818dc5d only the intra
scantable is updated, so that both scantables can get out of sync.
In such a case the unquantize intra functions could unquantize
an incorrect number of coefficients.
Using raster_end of the wrong scantable can also lead to an
unnecessarily large amount of coefficients unquantized.
*: FF_IDCT_PERM_SIMPLE and FF_IDCT_PERM_TRANSPOSE would also not work,
but they are not used at all by arm and aarch64.
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This patch adds support for DeckLink SDK 14.3 and newer by using
the legacy interfaces in the header <DeckLinkAPI_v14_2_1.h>.
The missing QueryInterface implementations are also provided.
With --disable-asm, ARCH_X86_32 is set to 0, but we still build the
checkasm binary. Update the check so it is config.h agnostic.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Reading the spec for what this flag means, it copies the data verbatim, including any swizzling/tiling, this has two issues
1. the format may not be what ffmpeg expects elsewhere, as it is expecing normal pitch linear host memeory in `swf`
2. the size of the copied data may not match the size of buffer provided, causing heap buffer overflow
It seems like addition of this flag is an oversight as it seems to be for caching/backups of image data, just to be used with copying back to the GPU with the MEMCPY flag, which is *not* how its used in ffmpeg.
Additionally, set memoryRowLength as if it isn't set, it assumes pitch = width_in_bytes, which I don't think is necessarily the case
To do so, simply add these init files to X86ASM-OBJS instead of OBJS
in the Makefile. The former is already used for the actual assembly
files, but using them for the C init files just works, because the build
system uses file extensions to derive whether it is a C or a NASM file.
This avoids compiling unused function stubs and also reduces our
reliance on DCE: We don't add %if checks to the asm files except
for AVX, AVX2, FMA3, FMA4, XOP and AVX512, so all the MMX-SSE4
functions will be available. It also allows to remove HAVE_X86ASM checks
in these init files.
Reviewed-by: Kacper Michajłow <kasper93@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
To do so, simply add these init files to X86ASM-OBJS instead of OBJS
in the Makefile. The former is already used for the actual assembly
files, but using them for the C init files just works, because the build
system uses file extensions to derive whether it is a C or a NASM file.
This avoids compiling unused function stubs and also reduces our
reliance on DCE: We don't add %if checks to the asm files except
for AVX, AVX2, FMA3, FMA4, XOP and AVX512, so all the MMX-SSE4
functions will be available. It also allows to remove HAVE_X86ASM checks
in these init files.
(x86/ops.c has already been put in X86ASM-OBJS.)
Reviewed-by: Kacper Michajłow <kasper93@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
To do so, simply add these init files to X86ASM-OBJS instead of OBJS
in the Makefile. The former is already used for the actual assembly
files, but using them for the C init files just works, because the build
system uses file extensions to derive whether it is a C or a NASM file.
This avoids compiling unused function stubs and also reduces our
reliance on DCE: We don't add %if checks to the asm files except
for AVX, AVX2, FMA3, FMA4, XOP and AVX512, so all the MMX-SSE4
functions will be available. It also allows to remove HAVE_X86ASM checks
in these init files.
Reviewed-by: Kacper Michajłow <kasper93@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
To do so, simply add these init files to X86ASM-OBJS instead of OBJS
in the Makefile. The former is already used for the actual assembly
files, but using them for the C init files just works, because the build
system uses file extensions to derive whether it is a C or a NASM file.
This avoids compiling unused function stubs and also reduces our
reliance on DCE: We don't add %if checks to the asm files except
for AVX, AVX2, FMA3, FMA4, XOP and AVX512, so all the MMX-SSE4
functions will be available. It also allows to remove HAVE_X86ASM checks
in these init files.
Reviewed-by: Kacper Michajłow <kasper93@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This stores a small buffer in shared memory per decode thread (16 bytes),
which helps reduce the number of memory accesses.
The bitstream buffer is first aligned to a 4 byte boundary, so that the
buffer can be filled with a single memory request.
It's not guaranteed that the conversion filter name string will be
deduplicated to the same memory location. While this is common
optimization to do, we cannot rely on it always happening.
Fixes regression since 8b375b2ffd.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
ff_h264_luma_dc_dequant_idct_sse2() does not pass checkasm for certain
seeds, because the input to packssdw no longer fits into an int16_t,
leading to saturation, where the C code just truncates. I don't know
whether the spec contains provisions that ensure that valid input
must not exceed 16 bit or whether the such inputs (even if invalid)
can be triggered by the actual code and not only the test.
This commit adapts the behavior of the function to the C reference code
to fix the test. packssdw is avoided, instead the lower words are
directly transfered to GPRs to be written out. This has unfortunately
led to a slight performance regression here (14.5 vs 15.1 cycles).
Fixes issue #20835.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
pw_1 is currently loaded in both codepaths. Generate it earlier instead.
Gives tiny speedups (15 vs 14.5 cycles) and reduces codesize.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is ABI compliant and gives a tiny speedup here (and is 16B smaller).
Old benchmarks:
h264_luma_dc_dequant_idct_8_c: 33.2 ( 1.00x)
h264_luma_dc_dequant_idct_8_sse2: 16.0 ( 2.07x)
New benchmarks:
h264_luma_dc_dequant_idct_8_c: 33.0 ( 1.00x)
h264_luma_dc_dequant_idct_8_sse2: 15.0 ( 2.20x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Only exported (i.e. cglobal) functions need it; stride is already
sign-extended when it reaches any of the internal functions used here,
so don't sign-extend again.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This allows increased internal precision.
In addition, we can introduce an offset to the DC coefficient
during the second IDCT step, to remove a per-element addition
in the output codepath.
Finally, by processing columns first we can remove the barrier
after loading coefficients.
Signed-off-by: averne <averne381@gmail.com>
When combining rotation with a FIT_ mode other than FIT_FILL, the fitting
logic was operating on the un-rotated rects, when it should have been
operating on the rotated (output) rects.
If the SRC_PATH variable contains certain characters (like a `:`, which may
happen when FATE is executed on Windows), the value for the `file` option is
broken, so `make fate-filter-drawvg-video` always fails.
The solution in this commit is to copy the `drawvg.lines` to the `tests/data`
directory (which already has temp files), so the value for `file` is a fixed
string with no problematic characters.
Signed-off-by: Ayose <ayosec@gmail.com>
Commit ba4b73c977 caused a regression in
the usage of avg_frame_rate to detect the frame rate of raw h264/hevc
bitstreams: after the commit, avg_frame_rate is always the value of the
-framerate option (which is set to 25 by default) instead of the actual
frame rate derived from the bitstream SPS/VPS NALUs.
This commit fixes the regression by setting the framerate codec
parameter to the value of the framerate option instead. After this
change, bitstreams without timing information will derive avg_frame_rate
from the -framerate option, while bitstreams with timing information
will derive avg_frame_rate from the bitstream itself.
The h264-bsf-dts2pts test now returns the correct frame durations for a
bitstream with a mix of single-field and double-field frames.
Signed-off-by: Gavin Li <git@thegavinli.com>
Signed-off-by: James Almer <jamrial@gmail.com>
The supported YUV pixel formats were separated between planar
and semiplanar. This approach reduces the number of CUDA kernels
for all pixel formats.
This patch:
1. Adds support for YUV 4:2:2 planar and semi-planar formats:
yuv422p, yuv422p10, nv16, p210, p216
2. Implements new conversion structures and kernel definitions
for planar and semi-planar formats
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
Add support for additional pixel formats in CUDA hardware context:
- Planar formats (yuv420p10, yuv422p, yuv422p10, yuv444p10)
- Semiplanar formats (nv16, p210, p216)
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
Playback to a decklink device with a newer version of the
DeckLink SDK (14.3) stalls because the driver code calls
IDeckLinkVideoFrame::QueryInterface, which is not
implemented by ffmpeg.
This patch implements decklink_frame::QueryInterface,
so that playback works with both older (12.x) and
newer (>= 14.3) drivers.
Note: The patch still does not allow the code to compile
with DeckLink SDK 14.3 or newer, as the API has changed.
seq_decode is used to ensure that a picture and all of its reference
pictures use the same SPS. Any time the SPS changes, seq_decode should
be incremented. Prior to this patch, seq_decode was incremented in
frame_context_setup, which is called after the SPS is potentially
changed in decode_sps. Should the decoder encounter an error between
changing the SPS and incrementing seq_decode, the SPS could be modified
while seq_decode was not incremented, which could lead to invalid
reference pictures and various downstream issues. By instead updating
seq_decode within the picture set manager, we ensure seq_decode and the
SPS are always updated in tandem.
When PARSER_FLAG_COMPLETE_FRAMES is set, opus_parse() calls
set_frame_duration even on flush (buf_size==0), which triggers
a spurious "Error parsing Opus packet header" at EOF.
Match streaming-path behavior by skipping duration parsing on empty buffers.
Fixes#20954
Store the intermediate values as words, clipped to the 0..255 range
instead.
Old benchmarks:
filter_diag4_c: 353.4 ( 1.00x)
filter_diag4_sse2: 57.5 ( 6.15x)
New benchmarks:
filter_diag4_c: 350.6 ( 1.00x)
filter_diag4_sse2: 55.1 ( 6.36x)
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Only the two middle coefficients are so huge that overflow can happen.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
For most systems (particularly all x64), the stack is already
guaranteed to be sufficiently aligned. So just use x86inc's
stack feature which does the right thing.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
VP3's frame managment is actually simple: It has three frame slots:
current, last and golden. After having decoded the current frame,
the old last frame will be freed and replaced by the current frame.
If the current frame is a keyframe, it also takes over the golden slot.
The VP3 decoder handled this like this: In single-threaded mode,
the above procedure was carried out (on success). Doing so with
frame-threading is impossible, as it would lead to data races.
Instead vp3_update_thread_context() created new references
to these frames and then carried out said procedure.
This means that vp3_update_thread_context() is not just a "dumb"
function that only copies certain fields from src to dst; instead
it actually processes them. E.g. trying to copy the decoding state
from A to B and then from B to C (with no decode_frame call in between)
will not be equivalent to copying from A to C, as both current and last
frames will be blank in the first case.
This commit changes this: Because last_frame won't be needed after
decoding, no reference to it will be created to it in
vp3_update_thread_context(); instead it is now always unreferenced
after decoding it (even on error). Replacing last_frame with the new
frame is now always performed when the new frame is allocated.
Replacing the golden frame is now done earlier, namely in decode_frame()
before ff_thread_finish_setup(), so that update_thread_context only
has to reference current frame and golden frame. Being dumb means
that update_thread_context also no longer checks whether the current
frame is valid, so that it can no longer error out.
This unifies the single- and multi-threaded codepaths; it can lead
to changes in output in single threaded mode: When erroring out,
the current frame would be discarded and not be put into one
of the reference slots at all in single-threaded mode. The new
code meanwhile does everything as the frame-threaded code already did
in order to reduce discrepancies between the two. It would be possible
to keep the old single-threaded behavior (one would need to postpone
replacing the golden frame to the end of vp3_decode_frame and would
need to swap the current frame and the last frame on error,
unreferencing the former).
Reviewed-by: Peter Ross <pross@xvid.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The dimensions are only set at two places: theora_decode_header()
and vp3_decode_init(). These functions are called during init
and during dimension changes, but the latter is only supported
(and attempted) when frame threading is not active. This implies that
the dimensions of the various worker threads in
vp3_update_thread_context() always coincide, so that these checks
are dead and can be removed.
(These checks would of course need to be removed when support
for dimension changes during frame threading is implemented;
and in any case, a dimension change is not an error.)
Reviewed-by: Peter Ross <pross@xvid.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
6c7a344b65 made the VLCs shared between
threads and did so in a way that was designed to support stream
reconfigurations, so that the structure containing the VLCs was
synced in update_thread_context. The idea was that the currently
active VLCs would just be passed along between threads.
Yet this was broken by 5acbdd2264:
Before this commit, submit_packet() was a no-op during flushing
for VP3, as it is a no-delay decoder, so it won't produce any output
during flushing. This meant that prev_thread in pthread_frame.c
contained the last dst thread that update_thread_context()
was called for (so that these VLCs could be passed along between
threads). Yet after said commit, submit_packet was no longer
a no-op during flushing and changed prev_thread in such a way
that it did not need to contain any VLCs at all*. When flushing,
prev_thread is used to pass the current state to the first worker
thread which is the one that is used to restart decoding.
It could therefore happen that the decoding thread did not contain
the VLCs at all any more after decoding restarts after flushing
leading to a crash (this scenario was never anticipated and
must not happen at all).
There is a simple, easily backportable fix given that we do not
support stream reconfigurations (yet) when using frame threading:
Don't sync the VLCs in update_thread_context(), instead do it once
during init.
This fixes forgejo issue #20346 and trac issue #11592.
(I don't know why 5acbdd2264
changed submit_packet() to no longer be a no-op when draining
no-delay decoders.)
*: The exact condition for the crash is nb_threads > 2*nb_frames.
Reviewed-by: Peter Ross <pross@xvid.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It allows us to easily synchronize the software and hardware
decoders, by removing the abstraction the Vulkan layer added by changing
the values written.
The Vulkan spec requires that all accesses to push data are uniform for
all invocations (e.g. can't be based on gl_WorkGroupID or gl_LocalInvocationID).
This commit optimizes the Vulkan decoder by splitting up decoding
from iDCT, and merging the few tables needed directly into the shader.
The speedup on Intel is 10x.
The decoder will reinit the hwaccel upon pixfmt/dimension changes,
so we can remove the f->use32bit and is_rgb variants of all shaders.
This speeds up init time.
Currently, the thread loop of ffmpeg_filter essentially works like this:
while (1) {
frame, idx = get_from_decoder();
err = send_to_filter_graph(frame);
if (err) { // i.e. EOF
close_input(idx);
continue;
}
while (filtered_frame = get_filtered_frame())
send_to_encoder(filtered_frame);
}
The exact details are not 100% correct since the actual control flow is a bit
more complicated as a result of the scheduler, but this is the general flow.
Notably, this leaves the possibility of leaving a no-longer-needed input
permanently open if the filter graph starts producing infinite frames (during
the second loop) *after* it finishes reading from an input, e.g. in a filter
graph like -af atrim,apad.
This patch avoids this issue by always querying the status of all filter graph
inputs and explicitly closing any that were closed downstream; after each round
of reading output frames. As a result, information about the filtergraph being
closed can now propagate back upstream, even if the filter is no longer
requesting any input frames (i.e. input_idx == fg->nb_inputs).
Fixes: https://trac.ffmpeg.org/ticket/11061
See-Also: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20457#issuecomment-6208
There is currently no way for API users to know that a buffersrc is no longer
accepting input, except by trying to feed it a frame and seeing what happens.
Of course, this is not possible if the user does not *have* a frame to feed,
but may still wish to know if the filter is still accepting input or not.
Since passing `frame == NULL` to `av_buffersrc_add_frame()` is already treated
as closing the input, we are left with no choice but to introduce a new
function for this.
We don't explicitly return the result of `ff_outlink_get_status()` to avoid
leaking internal status codes, and instead translate them all to AVERROR(EOF).
The internal functions for calculating the hadamard difference
of two 8x8 blocks have no epilogue on UNIX64, so one can avoid
the call altogether by placing the 8x8 function so that it directly
falls into the internal function.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The SSE2 and SSSE3 functions are now available everywhere,
making the MMXEXT functions irrelevant.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Use x86inc's stack alignment feature instead of allocating the stack
manually*; this means that this code now also automatically supports
unaligned stacks, so that the SSE2 and SSSE3 functions will now be
available everywhere.
*: The code for this was also buggy: It resulted in the stack pointer
to be 4 mod 8 for x64 for the mmxext version before it was disabled
in 542765ce3e, because it hardcode 4
instead of using gprsize.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The approach of this ASM routine is to process two channels at a time using
AVX instructions. Obviously, there is no point in doing this if there is only
a single channel; in which case the scalar loop would be better.
Fixes a performance regression when filtering mono audio on certain CPUs,
notably e.g. the Intel N100.
This ensures consistent color conversion between double and u8 and
guarantees that values remain consistent across different platforms,
especially when x87 math is used.
Note that libcairo also performs rounding internally when converting
doubles to integers, see _cairo_color_double_to_short().
Fixes: fate-filter-drawvg-interpreter
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Fix#20571
Avoid build errors with openssl forks (like libressl)
that lack some APIs.
This patch replace header check for OPENSSL_init_ssl
(was added in 1.1.0) with the OpenSSL 1.1.1 new API
DTLS_get_data_mtu.
Signed-off-by: Jack Lau <jacklau1222gm@gmail.com>
This allows using the tool for one-off reindentations without needing
the check_arm_indent.sh script (e.g. for use outside of ffmpeg),
without having to pipe the file through stdin/stdout.
Ensure that the fragment to close the header (`</div>\"]`) is written when the
function `mermaid_print_section_header` is called only once, which happens when
the filtergraph has no inputs.
Use a string table instead to avoid relocations and don't add
the tag string to libavutil/error.c: It is only used
during the test program, so move it there.
Reviewed-by: Rémi Denis-Courmont <remi@remlab.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Before commit e96d90eed6 lavu/internal.h
contained redefined various discouraged/forbidden functions to induce
compilation failures upon use, like e.g.
#define malloc please_use_av_malloc
In order to use these functions, some files had to undefine these
macros. This commit removes the remaining pointless undefs.
Reviewed-by: Rémi Denis-Courmont <remi@remlab.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Demuxers should not generate this error code when they encounter truncated
or otherwise invalid files. It's a code the underlying protocol should generate
when there are legitimate reading errors.
Signed-off-by: James Almer <jamrial@gmail.com>
1. A AVPixelFormat can map to multiple VA_FOURCCs, while
vaapi_format_from_pix_fmt() only returns the first item matched
before this patch.
2. vaapi_frames_init() use vaapi_format_from_pix_fmt() to get the
first item. Fourcc in this item may not be supported by the device.
This patch makes vaapi_format_from_pix_fmt return all matched items
iteratively, then use strict check in vaapi_frames_init to get the
right fourcc.
Previously, we set s->slice_initialized to 0 to prevent other slice
segments from depending on this slice segment only if hls_slice_header
failed. If decode_slice fails for some other reason, however, before
decode_slice_data is called to bring the context back into a consistent
state, then slices could depend on this slice segment while it is in an
invalid state. This can cause segmentation faults and other sorts of
nastiness. Patch fixes this by always setting s->slice_initialized to 0
while the state is inconsistent.
Resolves#11652.
Red Hat Enterprise Linux 9 is one patch version lower than what
FFmpeg currently requests. The slightly older versions still result
in a working build of FFmpeg with DVD support, so allow those
versions to be consumed to build FFmpeg.
Signed-off-by: Neal Gompa <neal@gompa.dev>
This function uses ff_sws_pixel_type_size to switch on the
size of the provided type. However, ff_sws_pixel_type_size returns
a size in bytes (from sizeof()), not a size in bits. Therefore,
this would previously never return the right thing but always
hit the av_unreachable() below.
As the function is entirely unused, just remove it.
This fixes compilation with MSVC 2026 18.0 when targeting ARM64,
which previously hit an internal compiler error [1].
[1] https://developercommunity.visualstudio.com/t/Internal-Compiler-Error-targeting-ARM64-/10962922
Fixes:
src/tests/checkasm/sw_ops.c:441:34: runtime error: shift exponent 32 is too large for 32-bit type 'int'
src/tests/checkasm/sw_ops.c:591:37: runtime error: shift exponent 32 is too large for 32-bit type 'int'
Signed-off-by: James Almer <jamrial@gmail.com>
Fixes:
src/libswscale/ops_tmpl_int.c:292:23: runtime error: left shift of 188 by 24 places cannot be represented in type 'int'
src/libswscale/ops_tmpl_int.c:290:23: runtime error: left shift of 158 by 24 places cannot be represented in type 'int'
src/libswscale/ops_tmpl_int.c:293:23: runtime error: left shift of 136 by 24 places cannot be represented in type 'int'
src/libswscale/ops_tmpl_int.c:291:23: runtime error: left shift of 160 by 24 places cannot be represented in type 'int'
Signed-off-by: James Almer <jamrial@gmail.com>
This function is extremely small, so inlining it is appropriate (and
actually beneficial size-wise here). It furthermore allows to remove
the mpeg12codecs.h header and the mpeg12-encoders->mpeg12.o dependency.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
codecctl_intp() is used to populate values via aom_codec_control(), not
set them. This change moves the output of the return pointer's value
after the aom_codec_control() to avoid a memory sanitizer warning (use
of uninitialized value).
Signed-off-by: James Zern <jzern@google.com>
For GET_UTF8(val, GET_BYTE, ERROR), val has type of uint32_t,
GET_BYTE must return an unsigned integer, otherwise signed
extension happened due to val= (GET_BYTE), and GET_UTF8 went to
the error path.
This bug incidentally cancelled the bug where hb_buffer_add_utf8
was being called with incorrect argument, allowing drawtext to
function correctly on x86 and macOS ARM, which defined char as
signed. However, on Linux and Android ARM environments, because
char is unsigned by default, GET_UTF8 now returns the correct
return, which unexpectedly revealed issue #20906.
From the doc of HarfBuzz, what hb_buffer_add_utf8 needs is the
number of bytes, not Unicode character:
hb_buffer_add_utf8(buf, text, strlen(text), 0, strlen(text));
Fix issue #20906.
An installation of frei0r-plugins is required to run the tests,
which is usually seperate from the build headers. Some systems
have it packaged (e.g. apt install frei0r-plugins). An upstream
release extracted to FREI0R_PATH also works.
The distort0r filter requires dimensions to be divisible by 8.
av_frame_copy doesn't copy the input's PTS property, which resulted
in the frei0r filter always receiving the same static time.
Example that has a static distortion without patch:
ffmpeg -filter_complex "testsrc2=s=328x240:d=5,frei0r=distort0r" out.mp4
Before FFmpeg commit 531b0a316b,
FFmpeg used REG_SP as macro for the stack pointer, yet this
clashed with a REG_SP define in Solaris system headers, so it
was changed to REG_sp and a comment was added for this.
Libav fixed it by adding an FF_ prefix to the macros in
1e9c5bf4c1. FFmpeg switched
to using these prefixes in 9eb3da2f99,
using FF_REG_sp instead of Libav's FF_REG_SP. In said commit
the comment was changed to claim that Solaris system headers
define FF_REG_SP, but this is (most likely) wrong.
This commit removes the wrong comment and renames the (actually unused)
macro to FF_REG_SP to make it consistent with FF_REG_BP.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
qmat and bias always have a constant offset, so one can use one register
to address both of them. This allows to remove the check for HAVE_6REGS
(untested on a system where HAVE_6REGS is false).
Also avoid FF_REG_a while at it.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Both GCC and Clang completely unroll the unlikely loop at -O3,
leading to codesize bloat; their code is also suboptimal, as they
don't make use of pmulhrsw (even with -mssse3). This commit
therefore ports the whole function to external assembly. The new
function occupies 176B here vs 1406B for GCC.
Benchmarks for a testcase with huge qscale (notice that the C version
is unrolled just like the unlikely loop in the SSSE3 version):
add_8x8basis_c: 43.4 ( 1.00x)
add_8x8basis_ssse3 (old): 43.6 ( 1.00x)
add_8x8basis_ssse3 (new): 11.9 ( 3.63x)
Reviewed-by: Kieran Kunhya <kieran@kunhya.com>
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is very simple to remove the MPVEncContext from it.
Notice that this also fixes a bug in x86/mpegvideoenc.c: It only
used the SSE2 version of denoise_dct when dct_algo was auto or mmx
(and it was therefore unused during FATE).
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
While cl.exe supports -guard:signret, armasm64 complains about
unknown flag. Note that -guard:ehcont is accepted by armasm64.
Fixes:
error A2029: unknown command-line argument or argument value -guard:signret
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
They are undefined behavior and UBSan warns about them
(in the checkasm test). Put the shifts in the constants
instead. This even gives a tiny speedup here:
Old benchmarks:
column_fidct_c: 3369.9 ( 1.00x)
column_fidct_sse2: 829.1 ( 4.06x)
New benchmarks:
column_fidct_c: 3304.2 ( 1.00x)
column_fidct_sse2: 827.9 ( 3.99x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Maybe uint64_t has been used as a poor man's alignment specifier?
Anyway, reading an uint64_t via an lvalue of type int16_t (as happens
in the C versions of the dsp functions) is undefined behavior.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It gains a lot because it has to operate on eight words;
it also saves 608B of .text here.
Old benchmarks:
column_fidct_c: 3365.7 ( 1.00x)
column_fidct_mmx: 1784.6 ( 1.89x)
New benchmarks:
column_fidct_c: 3361.5 ( 1.00x)
column_fidct_sse2: 801.1 ( 4.20x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This avoids some shift instructions and also gives us more headroom
in the registers. In fact, I have proven to myself that everything
that is supposed to fit into 16bits now actually does so.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It currently is not, because the shortcut mode uses different rounding
than the C code (as well as the non-shortcut code).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The x86 assembly uses the following pattern to zero all
the values with abs<threshold:
x -= threshold;
x satu+= threshold (unsigned saturated addition)
x += threshold
x satu-= threshold (unsigned saturated subtraction)
The reference C code meanwhile zeroed everything
with abs <= threshold. This commit makes the C code behave
like the x86 assembly to reduce discrepancies between the two.
An alternative would be to require SSSE3, so that
one can use pabsw, pcmpgtw for abs>threshold, followed by
a pand with the original data. Or one could modify the thresholds
to make both equal.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is possible because the requirements are fulfilled;
it is also beneficial performance and code-size wise.
For GCC 14 (with -O3), this reduced codesize by 26750B
here; for Clang 20, it was 432B.
Old benchmarks:
mul_thrmat_c: 4.3 ( 1.00x)
mul_thrmat_sse2: 4.3 ( 1.00x)
store_slice_c: 2810.8 ( 1.00x)
store_slice_sse2: 542.5 ( 5.18x)
store_slice2_c: 3817.0 ( 1.00x)
store_slice2_sse2: 410.4 ( 9.30x)
New benchmarks:
mul_thrmat_c: 4.3 ( 1.00x)
mul_thrmat_sse2: 4.3 ( 1.00x)
store_slice_c: 1510.1 ( 1.00x)
store_slice_sse2: 545.2 ( 2.77x)
store_slice2_c: 1763.5 ( 1.00x)
store_slice2_sse2: 408.3 ( 4.32x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This is obviously what is intended and what the MMX code does;
yet I cannot rule out that it changes the output for some inputs:
I have observed individual src values which would lead to temp
values just above 512 if they came in pairs (i.e. if both inputs
were simultaneously huge).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This fixes an ABI violation, as mul_thrmat did not issue emms.
It seems that this ABI violation could reach the user, namely
if ff_get_video_buffer() fails. Notice that ff_get_video_buffer()
itself could fail because of this, namely if the allocator uses
floating point registers.
On x64 (where GCC already used SSE2 in the C version)
mul_thrmat_c: 4.4 ( 1.00x)
mul_thrmat_mmx: 8.6 ( 0.52x)
mul_thrmat_sse2: 4.4 ( 1.00x)
On 32bit (where SSE2 is not known to be available):
mul_thrmat_c: 56.0 ( 1.00x)
mul_thrmat_sse2: 6.0 ( 9.40x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This is in preparation for adding checkasm tests; without it,
checkasm would pull all of libavfilter in.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
I found that the default value is not set for onfail option. I see that there is an attempt to set this value by default inside parse_slave_failure_policy_option. But look at the CONSUME_OPTION macro. If av_dict_get cannot find this option, then this function is not even called.
The code loads 32bit so we can at maximum use 32bit
the return type is also changed to uint16_t (was requested in review),
no path is known where a return value above 32767 is produced, but that was not exhaustively checked
Fixes: runtime error: shift exponent -9 is negative
Fixes: 439483046/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_PRORES_RAW_DEC_fuzzer-6649466540326912
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
the values contain 3 4 bit values, thus using hex is more natural
and shows more information
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This flag does nothing since the deactivation of
the dsp_mask field of AVCodecContext in
commits 9ae6ba2883 and
9ae6ba2883 (it has
been superseded with better ways to override the CPU flags).
So deprecate it.
Reviewed-by: Lynne <dev@lynne.ee>
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This fixes building after commit
1ce88d29d0.
That commit caused the following errors:
src/doc/fate.texi:234: @anchor expected braces.
src/doc/fate.texi:245: @item found outside of an insertion block.
src/doc/fate.texi:249: @item found outside of an insertion block.
src/doc/fate.texi:261: @item found outside of an insertion block.
src/doc/fate.texi:265: @item found outside of an insertion block.
src/doc/fate.texi:268: @item found outside of an insertion block.
src/doc/fate.texi:274: @item found outside of an insertion block.
src/doc/fate.texi:277: @item found outside of an insertion block.
src/doc/fate.texi:281: @item found outside of an insertion block.
src/doc/fate.texi:287: Unmatched `@end'.
./src/doc/fate.texi:65: Cross reference to nonexistent node `makefile variables' (perhaps incorrect sectioning?).
This is an extension to WAVEFORMATEX used for some AAC streams, defined in the
Windows SDK.
Fixes issue #20887.
Signed-off-by: James Almer <jamrial@gmail.com>
Sometimes functions for bit depth 9..16 are the same (because they
actually only depend on the underlying pixel type). The macros added
here allow to support this usecase.
Reviewed-by: Kacper Michajłow <kasper93@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
VkQueueFamilyQueryResultStatusPropertiesKHR was added with the
VK_KHR_video_queue extension, which NVK does not support yet.
Correctly classify it as part of the video extensions.
This adds support for detecting Zve32x, Zve32f, Zve64x and Zve64d on
a system that does not support (or declare) the full V feature set.
Because the code assumes that vector detected at runtime are at least
128-bit long, we have to manually check that too (in theory, vectors
can be 32-bit or 64-bit, though that makes little sense in 2025).
This reverts commit 7b18eafabd.
That commit added tests that don't work on Windows, and which
also fail in setups with cross/remote testing (with --target-exec
and --target-path).
See https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20876 for more
discussions about issues with that commit.
When the furthest-behind stream is being fed by a demuxer that is also
feeding packets to a choked filter graph, we need to unchoke that filter
graph to prevent the demuxer from getting stuck trying to write packets to
the choked filter graph.
This situation can also apply recursively - if the demuxer is also writing
to a filtergraph that is also reading from a choked demuxer, there is a
similar deadlock.
Solve all such deadlocks by just brute-force recursively unchoking all
nodes that can somehow prevent this demuxer from writing packets. This
should normally not result in any change in behavior, unless audio/video
streams are badly desynchronized, in which case it may result in extra
memory usage from the too-far-ahead stream buffering packets inside the
muxer. (But this is, of course, preferable to a deadlock)
Fixes: https://code.ffmpeg.org/FFmpeg/FFmpeg/issues/20611
For mv-hevc, the second layer of IDR frame can be a P slice.
long_term_rps wasn't been reset before the patch, which leading to
ff_hevc_frame_nb_refs return incorrect result.
This fix decoding failure for samples from Pico VR.
Fixes: out of array access
No testcase
Found-by: Joshua Rogers <joshua@joshua.hu> with ZeroPath
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 2147483565 + 128 cannot be represented in type 'int'
Fixes: 428055715/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_OSQ_fuzzer-6358069900804096
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The qScale syntax element has a maximum value of 512, which would overflow the 16-bit store from the VLD shader in extreme cases.
This fixes that edge case by forwarding the element in a storage buffer, and applying the inverse quantization fully in the IDCT shader.
The one in dashenc was added in
fe5e6e34c0, while the one in hlsenc
was added later in 0afa171f25. Both
have had various additions on top; merge both implementations
into one shared. (Notable additions in
060e74e2a9,
1cf2f040e3,
a2b1dd0ce3 and
797f0b27c175022d896e46db4ac2873e3e0a70af.)
For H264/avc1, use the implementation from hlsenc (which doesn't
use temporary allocations). For most other codecs, use the
only implementation from whichever had one.
The original dashenc implementation tried to be generic based
on RFC 6381, looking up codec tags in ff_codec_movvideo_tags
or ff_codec_movaudio_tags, and doing specific extra additions
for "mp4a" and "mp4v". In practice, only AV_CODEC_ID_AAC
and AV_CODEC_ID_MPEG4 ever mapped to these; simplify this to
a more straightforward codec id based handling, and merge
with the AAC profile based code from hlsenc.
There's a slight behaviour difference from the old one in
dashenc; if there's no code for a specific codec ID, we previously
just output what we matched from the mov tag tables, but now
we won't output anything. But most commonly used codecs in
DASH should be covered here.
The frei0r API expects the time in seconds, but was given it in
milliseconds. The bug might exist since 41f1d3a (~14 years ago),
but plugins depending on the time are unwatchable without this
patch. For example:
ffmpeg -filter_complex "testsrc2=d=5,frei0r=distort0r" out.mp4
Signed-off-by: Stefan Breunig <stefan-ffmpeg-devel@breunig.xyz>
Using unsigned for fps is more natural since the corresponding
AVTimecode field is unsigned. It also fixes a -Wformat-truncation
warning from GCC 16: in case fps were negative, hours, minutes
and seconds would be negative, leading to additional '-' characters
which are not accounted for in AV_TIMECODE_STR_SIZE.
Reviewed-by: Kacper Michajłow <kasper93@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is no longer necessary now that the IDCTDSP is always ABI-compliant
(and free of MMX).
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Before this commit, the (32-bit only) simple idct came in three
versions: A pure MMX IDCT and idct-put and idct-add versions
which use SSE2 at the put and add stage, but still use pure MMX
for the actual IDCT.
This commit ports said IDCT to SSE2; this was entirely trivial
for the IDCT1-5 and IDCT7 parts (where one can directly use
the full register width) and was easy for IDCT6 and IDCT8
(involving a few movhps and pshufds). Unfortunately, DC_COND_INIT
and Z_COND_INIT still use only the lower half of the registers.
This saved 4658B here; the benchmarking option of the dct test tool
showed a 15% speedup.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Unnecessary since the Xvid IDCT no longer uses MMX registers at all.
(Notice that the simple MMX IDCT issues emms and is therefore ABI
compliant.)
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is higly surprising and would necessitate emms in order to be ABI
compliant; but it is better just not to use them in the first place.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The test has been removed in bfb28b5ce8
when MMX idctdsp functions overridden by SSE2 were removed;
ff_simple_idct_mmx() has been completely disabled in this patch
for x64 and so the test should have been disabled on x64 instead
of removing it.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
bfb28b5ce8 removed the MMX idct_put
and idct_add functions, because they were overridden by SSE2 versions
(which use SSE2 only for the put/add part, not the actual IDCT).
This meant that for MMX, the idct functions are not set in unison,
so that the permutation which is meant to apply to all three
is incorrect on 32bit systems if SSE2 is unavailable/disabled.
Fix this by setting the MMX version only if SSE2 is enabled.
(No one complained, so apparently no one uses a new FFmpeg
with non-SSE2 capable systems.)
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Fixes: out of array access
Fixes: zeropath/off-by-one-one-byte
Found-by: Joshua Rogers <joshua@joshua.hu>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The quantization parameter (qp) can exceed 63 when the base value
from frame header (0-63) is combined with the offset from slice data
(up to +2), resulting in qp=65. This causes out-of-bounds access to
the rv60_qp_to_idx[64] array in decode_cbp8(), decode_cbp16(), and
get_c4x4_set().
Fixes: Out-of-bounds read
Signed-off-by: oblivionsage <cookieandcream560@gmail.com>
No testsample is available
This is related to 61cbcaf93f and clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_RV60_fuzzer-5160167345291264
which fixed rv60_qp_to_idx[qp + 32] out of array access
These 2 checks are not redundant and neither covers the cases of the other
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
update the switch statement for EXR_UINT in pxr24_uncompress to
correctly use the tile width td->xsize instead of using the full window
width s->xdelta. s->delta is larger than td->xsize which lead to two
buffer overflows when interacting with the ptr variable in the same
switch statement.
Fixes: out of bounds read and write
Found-by: veygax's insomnia network (INSOMNIA-1)
Signed-off-by: veygax <veyga@veygax.dev>
The existing is_luma check is fragile as depending on the order
of channels it can be set or reset
No testcase
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
There is no need to check the whole AVChannelLayout; checking
its nb_channels is enough.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
E.g. check that the ordinary entries are valid (e.g. no negative sample
rates, must have pix fmt descriptor) and also that the sentinels
are valid where they contain redundant information.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
For decoders, the pixel format is negotiated via the get_format callback
(if there is a choice at all), so decoders should not set pix_fmts.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is not supposed to be set by decoders (where format negotiation
happens via the get_format callback).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Right now, FFCodec contains a dedicated pointer for
alpha_mode; this is wasteful, as there is only a very limited
range of options for this value, namely four.
So store it as a two-bit bitfield like color_ranges.
This reduces sizeof(FFCodec) by 16 here (the placement
of alpha_mode entailed unnecessary padding) and
saves 11328B of .data.rel.ro here (on a standard build with
no external libraries).
(If it were not for effective-type violations, one could
share the code and the table with AV_CODEC_CONFIG_COLOR_RANGE.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
avcodec_get_supported_config() always sets a dummy for
the pointer to the number of entries in case it is not set.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
And rename it to ff_encode_add_stats_side_data() and move its
declaration to encode.h. Also constify the error pointee.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
pict_type is always used since 5e0eac3d4566839598f6d6fe5d77770023d6713a;
ctx in set_pix_fmt() seems to have been always unused.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The optimised version does not work if the stride is not a multiple 8,
which can occur as reproduce by vsynth3-asv1 and vsynth3-asv2 tests.
This reverts commit 02594c8c01.
Conflicts:
libavcodec/riscv/pixblockdsp_init.c
libavcodec/riscv/pixblockdsp_rvv.S
Combine P-frame with following B-frame into a single packet with the
latter's order. Emit a tail packet with a show_existing_frame header
to show it at the correct PTS.
384fe39623 introduced a regression in the
range conversion offset calculation, resulting in a slight green tint
in full-range RGB to YUV conversions of grayscale values.
The offset being calculated was not taking into consideration a bias
needed for correctly rounding the result from the multiplication stage,
leading to a truncated value.
Fixes issue #11646.
This one takes about 2.93s on my machine, but ensures that every pixel
format conversion roundtrips correctly. Note that due to existing bugs in
libswscale, this one only passes when using the new format conversion code.
Restrict the test to -v 16 (AV_LOG_ERROR) to avoid excess amounts of output.
We can't use ANSI color codes inside av_log(), so fall back to printf()
for these; but match the INFO verbosity level.
Also change the format slightly to drop SSIM numbers down to just below
VERBOSE level, since VERBOSE tends to generate a lot of swscale related
spam.
This is more informative than the current behavior, because when the first
MERGE() succeeds but the second fails, the original link already has
merged formats and thus the error message is confusing.
Doubling the register size allowed to avoid two pmaddubsw.
It is also ABI compliant (the old version lacked an emms)
and the average versions no longer rely on padding (the old versions
used pavgb with a memory operand reading eight bytes,
although only four are needed).
Old benchmarks (the latter four refer to RV40):
avg_h264_chroma_mc4_8_c: 145.7 ( 1.00x)
avg_h264_chroma_mc4_8_ssse3: 32.3 ( 4.51x)
put_h264_chroma_mc4_8_c: 136.1 ( 1.00x)
put_h264_chroma_mc4_8_ssse3: 29.0 ( 4.70x)
avg_chroma_mc4_c: 162.1 ( 1.00x)
avg_chroma_mc4_ssse3: 31.1 ( 5.22x)
put_chroma_mc4_c: 137.5 ( 1.00x)
put_chroma_mc4_ssse3: 28.6 ( 4.81x)
New benchmarks:
avg_h264_chroma_mc4_8_c: 146.7 ( 1.00x)
avg_h264_chroma_mc4_8_ssse3: 26.5 ( 5.53x)
put_h264_chroma_mc4_8_c: 136.8 ( 1.00x)
put_h264_chroma_mc4_8_ssse3: 22.5 ( 6.09x)
avg_chroma_mc4_c: 165.5 ( 1.00x)
avg_chroma_mc4_ssse3: 27.2 ( 6.08x)
put_chroma_mc4_c: 138.1 ( 1.00x)
put_chroma_mc4_ssse3: 23.2 ( 5.96x)
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Vulkan headers define *FlagBits enum with individual bit values, and
coresponding *Flags typedef to be used to store the bitmask of
coresponding bits.
In practice those two types map to the same type, but for consistency
*Flags should be used.
Fixes MSVC warnings about type mismatch.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Wrong enum value was used to check unit_elems. While
AV_FRAME_DATA_MASTERING_DISPLAY_METADATA (11) would trigger when
UNIT_MASTERING_DISPLAY (2) was set, it also would match
UNIT_CONTENT_LIGHT_LEVEL (1) which is not expected.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Pico VR adds a '\0' after projection_type (a real C string than
a fourcc). It's not strictly correct, but doesn't affect parsing.
[prji: Projection Information Box]
position = 149574743
size = 17
version = 0
flags = 0x000000
projection_type = rect
Co-Authored-by: Keven Ma
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
In MSVC builds, object files built for shared or static libraries are
technically not compatible with each other. That's why building both
shared and static libraries simultaneously is not allowed in configure.
This may change in the future once dllimport/dllexport attributes are no
longer used. Which will be possible on next major bump. The only
remaining use of dllexport was changed in c6c8063186.
However, for test programs, we still build internal static libraries
that allow the test programs to access internal symbols.
In commit 8eca3fa619, I assumed that when
CONFIG_STATIC=0, we would never build a static library. We actually do
build one for internal purposes, for the test executables. In that case,
we only link the tested library statically (using the same object files
as built for the shared library), the rest of the libraries are still
linked dynamically.
Such libraries are never installed and are used only for test programs.
This change adds a -static suffix to these internal libraries to avoid
name conflicts. In the MSVC world, static libraries and import libraries
are generally the same thing and share the same naming conventions.
Fixes: 8eca3fa619
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
This is a regression introduced by the addition of the rotation option,
which overrode the existing rotation attribute that may have been set to
the image.
To fix it, add the rotation istead of setting it - however we have to do this
directly when mapping, so as to not add it multiple times.
Fixes: 4f623b4c59
They are overridden by SSE2 and no longer needed by the no longer
existing nsse MMX functions. Saves 240B here.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Even nsse8 has to operate on eight words and therefore gains
a lot from xmm registers (and pabsw).
Old benchmarks:
nsse_0_c: 359.2 ( 1.00x)
nsse_0_mmx: 151.8 ( 2.37x)
nsse_1_c: 151.2 ( 1.00x)
nsse_1_mmx: 77.5 ( 1.95x)
New benchmarks:
nsse_0_c: 358.8 ( 1.00x)
nsse_0_ssse3: 62.2 ( 5.77x)
nsse_1_c: 151.2 ( 1.00x)
nsse_1_ssse3: 33.6 ( 4.50x)
The MMX nsse functions have been removed.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This will avoid using xmm registers that are volatile for Win64
in the next commit.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This is a post-processing codec: given delta-x/y coordinates and a run length,
the r/g/b components of the 4 surrounding pixels are summed up, and the resulting
15bit value is used as index into a color quantization table to derive the
resulting pixel at the center.
It is only used in 10-20 frames of the Rebel Assault 2 LxxRETRY.SAN files
to slightly blur the outline of the "opening aperture" effect.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
This makes the final file truly hybrid: Externally the file
is a regular, non-fragmented file, but internally, the fragmented
form also exists un-overwritten.
To make any use of that, first, the fragments need to be muxed in
a position independent form, i.e. with empty_moov+default_base_moof
(or the dash or cmaf meta-flags).
Making use of the fragmented form when the file is finalized is
not entirely obvious though. One can dump the contents of the
single mdat box, and get the fragmented form. (This is a neat
trick, but not something that anybody really is expected to
want to do.)
The main expected use case is accessing fragments in the form of
byte range segments, for e.g. HLS.
Previously, the start of the file would look like this:
- ftyp
- free
- moov
- (moov contents)
After finalizing the file, it would look like this:
- ftyp
- free
- mdat (previously moov)
- (moov contents)
In this form, the size and type of the original moov box were
overwritten, and the original moov contents is just leftover
as unused data in the mdat box.
To avoid this issue, the start of the file now looks like this:
- ftyp
- free
- free
- ftyp
- moov
- (moov contents)
The second, hidden ftyp box inside mdat, would normally never be
seen.
After finalizing, the difference is that the mdat box now is
extended to cover the ftyp and the whole moov including its header
(and all the following fragments).
I.e., the start of the file looks like this:
- ftyp
- free
- mdat
- ftyp
- moov
- (moov contents)
This allows accessing the "ftyp+moov" pair sequentially as such,
with a byte range - this range is untouched when finalizing,
producing the same ftyp+moov pair both while writing, when the
file is fragmented, and after finalizing, when the file is
transformed to non-fragmented externally.
Note; the sequential two "free+free" boxes may look slightly
silly; it could be tempting to make the second one an mdat
from the get-go. However, some players of fragmented mp4 (in
particular, Apple's HLS player) bail out if the initialization
segment contains an mdat box - therefore, use a free box.
It could also be possible to use just one single free box with
8 bytes of padding at the start - but that would require more
changes to the finalization logic.
For a segmenting user of the muxer, the only unclarity is how
to determine the right byte range for the internal ftyp+moov
pair. Currently, this requires parsing the muxer output and skip
past anything up to the start of the non-empty free box.
Fix several indexing errors in attack detection logic and refine transient handling in the AAC psychoacoustic model.
- Change PSY_LAME_NUM_SUBBLOCKS from 3 to 2 to ensure full coverage of all 1024 MDCT samples, with each subblock containing exactly 1024 / (8 * 2) = 64 samples—matching LAME’s empirical design.
- Introduce next_attack0_zero state flag to stabilize attack[0] prediction across frames.
- Adjust attack threshold presets.
These changes improve the handling of periodic signals such as trumpet, especially under low bitrate conditions.
Makes it easier to see that width and height in DecodeContext is
actually a lcevc field.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is not necessary to do it more than once, as none of the fields
set change after init.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It's used by other parts of the module that will fail to build otherwise after
the aforementioned removal.
Signed-off-by: James Almer <jamrial@gmail.com>
wingdi.h defines its own PASSTHROUGH and it is included implicitly
by the VC-1 parser (which is mpegvideo-based and therefore includes
a lot of stuff).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
AVCodecParser has several fields which are not really meant
to be accessed by users, but it has no public-private
demarkation line, so these fields are technically public
and can therefore not simply be made private like
20f9727018 did for AVCodec.*
This commit therefore deprecates these fields and
schedules them to become private. All parsers have already
been switched to FFCodecParser, which (for now) is a union
of AVCodecParser and an unnamed clone of AVCodecParser
(new fields can be added at the end of this clone).
*: This is also the reason why split has never been removed despite
not being set for several years now.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The current code relies on AV_CODEC_ID_NONE being zero, so that
unused codec ids are set to their proper value. This commit adds
a macro to set unset ids to AV_CODEC_ID_NONE.
(The actual rationale for this macro is to simplify
the transition to making the private fields that are
currently public in avcodec.h really private.)
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It only contains declarations for some auxiliary functions for parsing
that parsers that only work with complete packets don't need.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The graph string is either freed or attached to the filtergraph, so it's best to
not leave a dangling pointer with the caller.
Signed-off-by: James Almer <jamrial@gmail.com>
The decode API can handle outputting delayed frames without relying on the
parser splitting off the ENDOFSEQ marker.
Signed-off-by: James Almer <jamrial@gmail.com>
These functions are currently always called with height either
being equal to the block size or block size+1. height is
a compile-time constant at every callsite. This makes it possible
to split this function into two to avoid the check inside
the function for whether height is odd or even.
The corresponding avg function is only used with height == block size,
so that it does not have a height parameter at all. Removing the
parameter from the put_l2 functions as well therefore simplifies
the C code.
The new functions increase the size of .text from qpel{dsp}.o
by 32B here, yet they save 464B of C code here.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
SSSE3 is already quite old (introduced 2006 for Intel, 2011 for AMD),
so that the overwhelming majority of our users (particularly those
that actually update their FFmpeg) will be using the SSSE3 versions.
This commit therefore removes the MMX(EXT) functions overridden
by them (which don't abide by the ABI) to get closer to a removal
of emms_c.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The only difference between it and the H.264/VC-1 versions is the
bias constant which depends on the shift parameters for RV40.
This value ends up in a register and therefore one can reuse
the H.264 code by setting the registers for RV40 and then jumping
into the relevant H.264 function, making the four new functions
cheap (just 256 bytes in total). This approach uses one jump more
for the no-filter case and one jump less in the one-dimensional
case than an approach using separate functions.
avg_chroma_mc4_c: 167.5 ( 1.00x)
avg_chroma_mc4_mmxext: 48.1 ( 3.48x)
avg_chroma_mc4_ssse3: 31.1 ( 5.39x)
avg_chroma_mc8_c: 325.5 ( 1.00x)
avg_chroma_mc8_mmxext: 103.2 ( 3.15x)
avg_chroma_mc8_ssse3: 33.5 ( 9.71x)
put_chroma_mc4_c: 137.4 ( 1.00x)
put_chroma_mc4_mmx: 44.5 ( 3.09x)
put_chroma_mc4_ssse3: 28.4 ( 4.83x)
put_chroma_mc8_c: 271.4 ( 1.00x)
put_chroma_mc8_mmx: 99.9 ( 2.72x)
put_chroma_mc8_ssse3: 30.6 ( 8.86x)
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is only used by mpegvideo decoders (for lowres). It is also only used
for bitdepth == 8, so don't build the bitdepth == 16 function at all any
more.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
In the QTFF documentation for udta/"user data atoms", "name" is a
valid entry for "name of object" however it is currently skipped
leading to some missing metadata when copying for some files. This
isn't classified the same as "(C)nam" which is "Title of content"
so they aren't necessarily the same.
For the purpose of merging streams in a stream group, a filtergraph can't be
created once we know it will be used. Therefore, allow filtergraphs to be
removed from the scheduler after being added.
Signed-off-by: James Almer <jamrial@gmail.com>
Several formats are being designed where more than one independent video or
audio stream within a container are part of what should be a single combined
output. This is the case for HEIF (Defining a grid where the decoded output
of several separate video streams are to be placed to generate a single output
image) and IAMF (Defining audio streams where channels are present in separate
coded stream).
AVStreamGroup was designed and implemented in libavformat to convey this
information, but the actual merging is left to the caller.
This change allows the FFmpeg CLI to take said information, parse it, and
create filtergraphs to merge the streams, making the combined output be usable
automatically further in the process.
Signed-off-by: James Almer <jamrial@gmail.com>
This keeps global and per frame side data clearly separated, and actually
propagates the former as it comes out from the buffersink filter.
Signed-off-by: James Almer <jamrial@gmail.com>
Certain parameters, like calculated framerate, are unavailable when connecting
the output of a filtergraph to the input of another.
This fixes command lines like
ffmpeg -lavfi "testsrc=rate=1:duration=1[out0]" -filter_complex "[out0]null[out1]" -map [out1] -y out.png
Signed-off-by: James Almer <jamrial@gmail.com>
Fixes: use of uninitialized memory
Fixes: 427814450/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_MAGICYUV_DEC_fuzzer-646512196065689
Fixes: 445961558/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_UTVIDEO_DEC_fuzzer-5515158672965632
the multi vlc code will otherwise return uninitialized data. Now one can argue that this data should
not be used, but on errors this data can remain ...
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This would happen if one of the extended transfer characteristics is in
use (currently only AVCOL_TRC_V_LOG).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Panasonic cameras record ProRes RAW in V-Log/V-Gamut internally.
Atomos is a brand of recorders which can record uncompressed
non-debayered RAW over HDMI.
All known cameras output linear RAW over HDMI, so mark the transfer function
as linear in that case.
The parser API doesn't work with packets, only raw data, so in order for it to
be made aware of new extradata propagated through packet side data we need to
pass it in some other form, namely, replacing the main extradata and ensuring
it will be parsed by restarting the parser.
Signed-off-by: James Almer <jamrial@gmail.com>
If avctx->ch_layout is unset (as it's allowed and even expeced by the
AV_CODEC_CAP_CHANNEL_CONF flag), the code setting substream masks will fail for
stereo and mono layouts unless a downmix channel was requested.
Fix this by deriving the mask with coded values only.
Fixes issue #20764.
Signed-off-by: James Almer <jamrial@gmail.com>
The function name 'file_read' is too generic and conflicts with a function
of the same name in the NuttX kernel. Since NuttX links kernel and userspace
into a single binary, this causes a symbol collision when building FFmpeg tools.
Signed-off-by: caifan3 <caifan3@xiaomi.com>
We have no use for 14-bit pixel formats for now, so remove support for gray14,
which was broken due to the LSB padding issue.
Similarly YUVA at 10/12 bit was broken for the same reason.
Fix the AV1 RTP depacketizer for instances where the OBU elements
did not have their OBU size fields removed according to the
recommendation (SHOULD) of the AV1 RTP specification.
Roger Hardiman courteously reported that the depacketizer doesn't
work correctly for fragmented OBU elements with their OBU size
fields still present and will incorrectly state that the
continuation of such a fragment was unexpected, because the
frag_obu_size field was used as a state and only incremented
when the size field needed to be updated on OBU size field
restoration. This patch solves the slip.
Change-Id: I95af8cc89862e8ecf79aabcf029dd95a20dfd7ad
There seem to be samples with no Video element on video tracks in the wild,
which even if not spec compliant, can be demuxed fine after lavf probes the
stream with a decoder/parser.
Relax the check to allow tracks with no dimensions to work, and also add a
check for strict spec compliance to maintain the current behavior if desired.
Fixes issue #20649.
Signed-off-by: James Almer <jamrial@gmail.com>
Add a shader-based Apple ProRes decoder.
It supports all codec features for profiles up to
the 4444 XQ profile, ie.:
- 4:2:2 and 4:4:4 chroma subsampling
- 10- and 12-bit component depth
- Interlacing
- Alpha
The implementation consists in two shaders: the
VLD kernel does entropy decoding for color/alpha,
and the IDCT kernel performs the inverse transform
on color components.
Benchmarks for a 4k yuv422p10 sample:
- AMD Radeon 6700XT: 178 fps
- Intel i7 Tiger Lake: 37 fps
- NVidia Orin Nano: 70 fps
In preparation for the Vulkan hwaccel.
The existing hwaccel code was designed around
videotoolbox, which ingests the whole frame
bitstream including picture headers.
This adapts the code to accomodate lower-level,
slice-based hwaccels.
The code that attempted to extract origin (4B.02) was put in the wrong
metadata handler and was also read as the wrong integer type. According
ST 377-1 Origin is of type "Position" (Int64) in the Timeline Track, not
a 8 bit unsigned integer in the Sequence.
The drawvg filter can draw vector graphics on top of a video, using libcairo. It
is enabled if FFmpeg is configured with `--enable-cairo`.
The language for drawvg scripts is documented in `doc/drawvg-reference.texi`.
There are two new tests:
- `fate-filter-drawvg-interpreter` launch a script with most commands, and
verify which libcairo functions are executed.
- `fate-filter-drawvg-video` render a very simple image, just to verify that
libcairo is working as expected.
Signed-off-by: Ayose <ayosec@gmail.com>
fate-lavf-flv is dependency of fate-api-seek, but the fate-api-seek is
setting CMP = null, which affects fate-lavf-flv. Reset CMP value to
expected by the flv test and use default diff compare.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
The last 3dnow functions have been removed in commit
5ef613bcb0. So remove 3dnow
from configure; the only thing kept is that --disable-amd3dnow
is still accepted, but no longer advertised by --help.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Supresses:
warning C4334: '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)
Also drop L, as shift will never exceed 31.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
This is how images encoded with specific transfer function should be
viewed. Image viewers that doesn't support named trc metadata, will
fallback to simple gAMA value and both of those cases should produce the
same image apperance for the viewer.
Fixes: https://github.com/mpv-player/mpv/issues/13438
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
The mismatch between neon and C functions can be reproduced
using the following bitstream and command line on aarch64 devices.
wget https://streams.videolan.org/ffmpeg/incoming/replay_intra_pred_16x16.h264
./ffmpeg -cpuflags 0 -threads 1 -i replay_intra_pred_16x16.h264 -f framemd5 -y md5_ref
./ffmpeg -threads 1 -i replay_intra_pred_16x16.h264 -f framemd5 -y md5_neon
Signed-off-by: Bin Peng <pengbin@visionular.com>
Previously, the LC3 encoder only accepted planar float (AV_SAMPLE_FMT_FLTP).
This change extends support to packed float (AV_SAMPLE_FMT_FLT) by properly
handling channel layout and sample stride.
The pcm data pointer and stride are now calculated based on the sample
format: for planar, use frame->data[ch]; for packed, use frame->data[0]
with channel offset. The stride is set to 1 for planar and number of
channels for packed layout.
This enables encoding from common packed audio sources without requiring
a prior planar conversion, improving usability and efficiency.
Signed-off-by: cenzhanquan1 <cenzhanquan1@xiaomi.com>
1. Adds support for respecting the requested sample format. Previously,
the decoder always used AV_SAMPLE_FMT_FLTP. Now it checks if the
caller requested a specific format via avctx->request_sample_fmt and
honors that request when supported.
2. Improves planar/interleaved audio buffer handling. The decoding
logic now properly handles both planar and interleaved sample
formats by calculating the correct stride and buffer pointers based
on the actual sample format.
The changes include:
- Added format mapping between AVSampleFormat and lc3_pcm_format
- Implemented format selection logic in initialization.
- Updated buffer pointer calculation for planar/interleaved data.
- Maintained backward compatibility with existing behavior.
Signed-off-by: cenzhanquan1 <cenzhanquan1@xiaomi.com>
Add dtls_active flag to specify the dtls role.
Properly set the send key and recv key depends on DTLS role:
As DTLS server, the recv key is client master key plus salt,
the send key is server master key plus salt.
As DTLS client, the recv key is server master key plus salt,
the send key is client master key plus salt.
Signed-off-by: Jack Lau <jacklau1222@qq.com>
If using the delay_moov flag in combination with hybrid_fragment
(which is a potentially problematic combination otherwise - the
ftyp box does end up hidden in the end), then we need to flush
twice to get both the moov box and the first fragment, if the
file is finished before the first fragment is completed.
If samples were available when the moov was written, chunking
for those samples has been done already, which has to be reset
here.
This is the case when not using empty_moov, when the moov box
describes the first fragment - this case was accounted for already.
But if using the delay_moov flag, then those samples also were
available when writing the moov, so chunking for them has already
been done in this case as well.
Therefore, always reset chunking here (it should be harmless to
always do it), and update the comment to clarify the cases
involved here.
When calculating the max size of an output PNG packet, we should
include the size of a possible eXIf chunk that we may write.
This fixes a regression since d3190a64c3
as well as a bug that existed prior in the apng encoder since commit
4a580975d4.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
When splitting a 5 lines image in 2 slices one will be 3 lines and thus need more space
Fixes: Assertion sc->slice_coding_mode == 0 failed at libavcodec/ffv1enc.c:1668
Fixes: 422811239/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_FFV1_fuzzer-4933405139861504
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
We do not support larger tiles as we use signed int
Alternatively we can check this in apv_decode_tile_component() or init_get_bits*()
or support bitstreams above 2gb length
Fixes: init_get_bits() failure later
Fixes: 421817631/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_APV_fuzzer-4957386534354944
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This has the advantage of not violating the ABI by using
MMX registers without issuing emms; it e.g. allows
to remove an emms_c from bink.c.
This commit uses GP registers on Unix64 (there are not
enough volatile registers to do likewise on Win64) which
reduces codesize and is faster on some CPUs.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Snow calls some of the me_cmp_funcs with insufficient alignment
for the first pointer (see get_block_rd() in snowenc.c);
therefore SSE2 functions which really need this alignment
don't get set for Snow and 542765ce3e
consequently didn't remove MMXEXT functions which are overridden
by these SSE2 functions for normal codecs.
For reference, here is a command line which would segfault
if one simply used the ordinary SSE2 functions for Snow:
./ffmpeg -i mm-short.mpg -an -vcodec snow -t 0.2 -pix_fmt yuv444p \
-vstrict -2 -qscale 2 -flags +qpel -motion_est iter 444iter.avi
This commit adds unaligned SSE2 versions of these functions
and removes the MMXEXT ones. This in particular implies that
sad 16x16 now never uses MMX which allows to remove an emms_c
from ac3enc.c.
Benchmarks (u means unaligned version):
sad_0_c: 8.2 ( 1.00x)
sad_0_mmxext: 10.8 ( 0.76x)
sad_0_sse2: 6.2 ( 1.33x)
sad_0_sse2u: 6.7 ( 1.23x)
vsad_0_c: 44.7 ( 1.00x)
vsad_0_mmxext (approx): 12.2 ( 3.68x)
vsad_0_sse2 (approx): 7.8 ( 5.75x)
vsad_4_c: 88.4 ( 1.00x)
vsad_4_mmxext: 7.1 (12.46x)
vsad_4_sse2: 4.2 (21.15x)
vsad_4_sse2u: 5.5 (15.96x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The SSE2 function overriding them are currently only set
if the SSE2SLOW flag is not set and if the codec is not Snow.
The former affects only outdated processors (AMDs from
before Barcelona (i.e. before 2007)) and is therefore irrelevant.
Snow does not use the pix_abs function pointers at all,
so this is also no obstacle.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The new functions are faster than the existing exact
functions, yet get beaten by the nonexact functions
(they can avoid unpacking to words and back).
The exact (slow) MMX functions have therefore been
removed, which was actually beneficial size-wise
(416B of new functions, 619B of functions removed).
pix_abs_0_3_c: 216.8 ( 1.00x)
pix_abs_0_3_mmx: 71.8 ( 3.02x)
pix_abs_0_3_mmxext (approximative): 17.6 (12.34x)
pix_abs_0_3_sse2: 23.5 ( 9.23x)
pix_abs_0_3_sse2 (approximative): 9.9 (21.94x)
pix_abs_1_3_c: 98.4 ( 1.00x)
pix_abs_1_3_mmx: 36.9 ( 2.66x)
pix_abs_1_3_mmxext (approximative): 9.2 (10.73x)
pix_abs_1_3_sse2: 14.8 ( 6.63x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This is a major rewrite of the exising nlmeans vulkan code, with bug
fixes and major performance improvement.
Fix visual artifacts found in ticket #10661, #10733. Add OOB checks for
image loading and patch sized area around the border. Correct chroma
plane height, strength and buffer barrier index.
Improve parallelism with component workgroup axis and more but smaller
workgroups. Split weights pass into vertical/horizontal (integral) and
weights passes. Remove h/v order logic to always calculate sum on
vertical pass. Remove atomic float requirement, which causes high memory
locking contentions, at the cost of higher memory usage of w/s buffer.
Use cache blocking in h pass to reduce memory bandwidth usage.
Write the moov tag at the end first, before overwriting the mdat size
at the start of the file.
In case writing the final moov box fails (e.g. due to being out
of disk), we haven't broken the initial moov box yet.
Thus if writing stops between these steps, we could end up with
a file with two moov boxes - which arguably is more feasible to
recover from, than from a file with no moov boxes at all.
The documentation states that this field is for enabling "extra" usage
flags. This conflicts with the implementation, and the rest of the comment,
though.
In resolving this ambiguity, I think it's better to lean towards the first
sentence and treat this field purely as specifying *extra* usage flags to
enable. Otherwise, this may break vulkan encoding or subsequent hwdownload
if the upstream filter did not specifically advertise this.
Change the default behavior and update the documentation slightly to more
clearly document the semantics.
This avoids having to fix up ABI violations via emms_c and
also leads to a 73% speedup for the line noise average version
here.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
When using averaged noise with height > MAX_RES (i.e. 4096),
multiple threads would access the same prev_shift slot,
leading to races. Fix this by disabling slice threading
in such scenarios.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This is not only UB, but also leads to races and nondeterministic
output, because the write one last the end of the buffer actually
conflicts with accesses by the thread that actually owns it.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
"all" only exists to set options; it does not need the big arrays
contained in FilterParams.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This reverts commit 301141b576.
cluster[0].dts, pts and frag_info[0].time are already in presentation
timeline, so they shouldn't be shift by start_pts.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
The diff and var functions benefit from psadbw, comb from wider
registers which allows to avoid reloading values, reducing the number
of loads from 48 to 10. Performance increased by 117% (the loop
in compute_metric() has been timed); codesize decreased by 144B.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This allows to remove an emms_c from the filter. It also gives
25% speedup here (when timing the calls to store_slice using
START/STOP_TIMER).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Improves performance and no longer breaks the ABI (by forgetting
to call emms).
Old benchmarks:
add_8x8basis_c: 43.6 ( 1.00x)
add_8x8basis_ssse3: 12.3 ( 3.55x)
New benchmarks:
add_8x8basis_c: 43.0 ( 1.00x)
add_8x8basis_ssse3: 6.3 ( 6.79x)
Notice that the output of try_8x8basis_ssse3 changes a bit:
Before this commit, it computes certain values and adds the values
for i,i+1,i+4 and i+5 before right shifting them; now it adds
the values for i,i+1,i+8,i+9. The second pair in these lists
could be avoided (by shifting xmm0 and xmm1 before adding both together
instead of only shifting xmm0 after adding them), but the former
i,i+1 is inherent in using pmaddwd. This is the reason that this
function is not bitexact.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The only requirement of this code (and essentially the pmulhrsw
instruction) is that the scaled scale fits into an int16_t.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This loosens the coupling between CBS and the decoder by no longer using
CodedBitstreamH266Context (containing the most recently parsed PSs & PH)
to retrieve the PSs & PH in the decoder. Doing so is beneficial in two
ways:
1. It improves robustness to the case in which an AVPacket doesn't
contain precisely one PU.
2. It allows the decoder parameter set manager to properly handle the
case in which a single PU (erroneously) contains conflicting
parameter sets.
Signed-off-by: Frank Plowman <post@frankplowman.com>
Check only on arches that need said check.
(Btw: I do not see how h_loop_filter benefits from alignment
at all and why h_loop_filter_unaligned exists.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The old code operated on bytes and did lots of tricks
due to their limited range; it did not completely succeed,
which is why the old versions were not used when bitexact
output was requested.
In contrast, the new version is much simpler: It operates
on signed 16 bit words whose range is more than sufficient.
This means that these functions don't need a check for bitexactness
(and can be used in FATE).
Old benchmarks (for this, the AV_CODEC_FLAG_BITEXACT check has been
removed from checkasm):
h_loop_filter_c: 29.8 ( 1.00x)
h_loop_filter_mmxext: 32.2 ( 0.93x)
h_loop_filter_unaligned_c: 29.9 ( 1.00x)
h_loop_filter_unaligned_mmxext: 31.4 ( 0.95x)
v_loop_filter_c: 39.3 ( 1.00x)
v_loop_filter_mmxext: 14.2 ( 2.78x)
v_loop_filter_unaligned_c: 38.9 ( 1.00x)
v_loop_filter_unaligned_mmxext: 14.3 ( 2.72x)
New benchmarks:
h_loop_filter_c: 29.2 ( 1.00x)
h_loop_filter_sse2: 28.6 ( 1.02x)
h_loop_filter_unaligned_c: 29.0 ( 1.00x)
h_loop_filter_unaligned_sse2: 26.9 ( 1.08x)
v_loop_filter_c: 38.3 ( 1.00x)
v_loop_filter_sse2: 11.0 ( 3.47x)
v_loop_filter_unaligned_c: 35.5 ( 1.00x)
v_loop_filter_unaligned_sse2: 11.2 ( 3.18x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Add "license" as a long-form command line option alongside the existing
"L" short option for showing license information. This maintains
consistent option naming patterns with other commands that provide both
short and long forms (help/?/help, etc.) and improves command line
usability by providing more descriptive option names.
This SSSE3 function uses MMX registers (of course without emms
at the end) and processes eight bytes of input by unpacking
it into two MMX registers. This is very suboptimal given
that one can just use XMM registers to process eight words.
This commit switches them to using XMM registers.
Old benchmarks:
avg_pixels_tab[1][3]_c: 114.5 ( 1.00x)
avg_pixels_tab[1][3]_ssse3: 43.6 ( 2.62x)
put_pixels_tab[1][3]_c: 83.6 ( 1.00x)
put_pixels_tab[1][3]_ssse3: 34.0 ( 2.46x)
New benchmarks:
avg_pixels_tab[1][3]_c: 115.3 ( 1.00x)
avg_pixels_tab[1][3]_ssse3: 24.6 ( 4.69x)
put_pixels_tab[1][3]_c: 83.8 ( 1.00x)
put_pixels_tab[1][3]_ssse3: 19.7 ( 4.24x)
Reviewed-by: Kieran Kunhya <kieran@kunhya.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Given that one has to deal with 16 byte intermediates it is
unsurprising that SSE2 wins against MMX; the MMX version has
therefore been removed (as well as the now unused inline_asm.h).
The new function is even 32B smaller than the old MMX one.
Old benchmarks:
put_no_rnd_pixels_tab[1][3]_c: 84.1 ( 1.00x)
put_no_rnd_pixels_tab[1][3]_mmx: 41.1 ( 2.05x)
New benchmarks:
put_no_rnd_pixels_tab[1][3]_c: 84.0 ( 1.00x)
put_no_rnd_pixels_tab[1][3]_ssse3: 22.1 ( 3.80x)
Reviewed-by: Kieran Kunhya <kieran@kunhya.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Also remove the now superseded MMX versions (the new functions have the
exact same codesize as the removed ones).
Old benchmarks:
avg_no_rnd_pixels_tab[0][3]_c: 233.7 ( 1.00x)
avg_no_rnd_pixels_tab[0][3]_mmx: 121.5 ( 1.92x)
put_no_rnd_pixels_tab[0][3]_c: 171.4 ( 1.00x)
put_no_rnd_pixels_tab[0][3]_mmx: 82.6 ( 2.08x)
New benchmarks:
avg_no_rnd_pixels_tab[0][3]_c: 233.3 ( 1.00x)
avg_no_rnd_pixels_tab[0][3]_sse2: 45.0 ( 5.18x)
put_no_rnd_pixels_tab[0][3]_c: 172.1 ( 1.00x)
put_no_rnd_pixels_tab[0][3]_sse2: 40.9 ( 4.21x)
Reviewed-by: Kieran Kunhya <kieran@kunhya.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Hint: The parts of this patch in decode_block_progressive()
and decode_block_refinement() rely on the fact that GET_VLC
returns -1 on error, so that it enters the codepaths for
actually coded block coefficients.
Reviewed-by: Ramiro Polla <ramiro.polla@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The v_lowpass wrappers (which are instantiated by this macro)
are only used in the put (and not the avg) form for SSSE3
(the avg form is only used for mc02, which doesn't exist
for SSSE3). Clang warns about the unused functions.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
When we parse a MakerNote, we first try to parse it as an IFD and if
that fails, we try to re-parse it as a binary blob. This is because
MakerNote is not well-documented in its nature.
However, if we fail to parse it the first time, we should not av_log
error messages about the parse failure, so instead we log these as
AV_LOG_DEBUG.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
Reported-by: Ramiro Polla <ramiro.polla@gmail.com>
This value is only useful when dtls handshake is NONBLOCK mode,
dtls handshake just need to call ffurl_handshake once since it
force block mode.
Signed-off-by: Jack Lau <jacklau1222@qq.com>
See RFC 5245 Section 4.3
If an agent is a lite implementation, it MUST include an "a=ice-lite"
session-level attribute in its SDP. If an agent is a full
implementation, it MUST NOT include this attribute.
Signed-off-by: Jack Lau <jacklau1222@qq.com>
The udp buffer size might be too small to easily
be full temporarily and return WSAEWOULDBLOCK.
The udp code will handle the windows error code
and convert it to AVERROR(EAGAIN).
This issue just can be reproduced on windows.
If sleep a interval and retry to send pkt when hit
EAGAIN, it will increase latency, and appropriate
interval is hard to define.
So this patch just remind user increase the buffer
size via -buffer_size to avoid this issue.
Signed-off-by: Jack Lau <jacklau1222@qq.com>
Refer to RFC 9725 4.2,
"Once a session is set up, consent freshness as per
[RFC7675] SHALL be used to detect non-graceful
disconnection by full ICE implementations and DTLS
teardown for session termination by either side"
Refer to RFC 7675,
send STUN Binding request every 5s,
expire consent after 30s without response,
terminate session when the consent expired.
TODO:
Random consent check interval (4-6s) will be
added later.
Co-authored-by: Sergio Garcia Murillo <sergio.garcia.murillo@gmail.com>
Signed-off-by: Jack Lau <jacklau1222@qq.com>
The various game engines implement the following blit types, from the decoded
result to the main canvas:
- normal (opaque) blit (c37/c47/c48)
- masked blit (c37/c48)
- interpolated-frame blit (c48)
Here an artificial frame is generated by looking up the pixels
from both buffers and picking a color from the interpolation table
for the artificial frame.
This is only supported in the decoder of "Making Magic".
Implement and hook up these 3 schemes for each of the 3 compresstion types,
and switch codec20 to a call to the opaque blit function.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Making Magic makes use of codec48 flag bit 0, which, when set,
means NOT to swap both buffers on even sequence numbers.
This fixes most of the artifacts in the Making Magic videos.
It's not complete though, bits 1 and 4 still need to be handled.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
- align the incoming widths to 4(c37) / 8(c47/78) pixels. LucasArts
game videos have these aligned.
- since these codecs use their 2/3 buffers for themselves, adjust the
stride to the aligned width, keeping it even, which gets rid of
an unaligned store in c48_4to8() found by the fuzzer with an
odd stride.
- clear the whole diff buffer, not just the area described by w/h.
- adjust the RLE "decoded_size" to the product of the aligned width
and reported height.
These changes are the result of various fuzzer-found issues; all my
test videos still work fine.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Add left/top offsets and clipping to codec20 (raw images),
use it for the copying of codec37/47/48 images to main buffer.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Codec37/47/48 have their own buffers; left/top are applied after
the decoding is done when copying to the main buffer. Don't add left/top
to their width/height when doing checks against the established buffer sizes.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
This implements XPAL the same way the DOS/Windows players do, with an
additional 768-entry table holding the palette left-shifted by 7 bits,
and adding the deltapal values to this.
This results in a perfectly smooth day-to-night transition in the last
30 seconds of the Outlaws RAE.SAN (ending) video, while before there
were visible brightness "pulses" when a new palette was loaded.
It also fixes color banding in the The Dig Intro (sq1.san), in the
scene showing the shuttle launch pad and the night sky.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
the c37 mvtable has only 255 pairs, change index 255 to zero to
avoid reading outside the table boundaries.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
When checking for oversized frames, check not only for the width
and height being larger, but also the area not outgrowing the
allocated buffer.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
When decode_init() is called for ANIM content, zero the dimensions
set in avctx width/height. Only SANM files have image dimensions in
their header, while ANIM do not.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Reimplement opcodes 0xFF and 0xFD the same way the c48 decoder
in the "Mysteries of the Sith" game engine does it:
The source pixel(s) and various pixels from inside the same and above
block of the second to last image rendered to the destination buffer
are used together with the interpolation table to generate a 4x4 pattern,
which is then expanded by doubling each pixel horizontally and vertically
to produce the final 8x8 block.
This fixes visible artifacts in frames 25-50 of the S1L1OCS.SAN
video of Mysteries of the Sith.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
It was initially implemented as 4 4x1 blocks, reimplement it as 4 2x2 blocks.
Fixes a few The Dig videos, esp. black dots on the asteroid in the
intro scene.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Codec48 opcodes F9 and FC take per-subblock indices into the motion vector
table from the source stream, however the table has only 255 entries.
Luckily, index 255 is index 0 of the following table, which means no
motion vector, the same as index 0 of the current table.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Create a separate GetByteContext from the general one, to be able
to limit the size of the FOBJ to the size described in the tag size.
Otherwise each fobj could theoretically use all the remaining data
in the FRME (which also contains audio, subtitles, ...).
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
some videos have a FTCH at the start of the video, to restore the
last image produced by the previous game file. This leads to
ugly messages like these:
[sanm @ 0x7f18cc001980] [IMGUTILS @ 0x7f18d7ffe8e0] Picture size 0x0 is invalid
[sanm @ 0x7f18cc001980] video_get_buffer: image parameters invalid
[sanm @ 0x7f18cc001980] get_buffer() failed
Fix this by not setting the got_frame_ptr when there is nothing to
restore/fetch. Seen with a lot of RA1 and the RA2 Level 11/12 videos.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Rename the generic motion_vectors to c47_mv, as this vector table
was initially introduced with codec47 which predates bl16 by 1-2 years,
and bl16 is a development of codec47 (with a bit of c48 thrown in).
No functional change.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Both of these encode a quarter-sized keyframe, with missing pixels
interpolated from the immediate neighbours.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Restructe the SANM (or BL16 as LucasArts calls it) decoder to make it
look like the others, as it is basically a development of old_codec47
for rgb565 values.
No functional changes.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
codec47 carries a 4-byte small codebook in its header. Read those
4 bytes into context member instead of awkwardly redirecting the
bytestream pointer every time it needs to be accessed.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
The mv check introduced with d5bdb0b705 broke MotS videos:
- their height (300 lines) is 37,5 blocks; unfortunately the videos try to
access up to 1 block more.
Extend the mv check to the aligned_height, which fixes most artifacts.
- don't return an error when an mv is invalid; rather skip the (subblock).
Gets rid of almost all artifacts.
Some artifacts still remain, esp in space scenes where the original
encoder apparently fetched black pixels from outside of the aligned
height. An increase of the buffer size by 8 lines will fix that later.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
- don't draw outside the buffers
- don't wrap around when coordinates go over the edge
this is especially noticeable in the e.g. O1OPEN.ANM, C1C3PO.ANM
RA1 files with planets wrapping around.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
- don't draw outside the buffers
- don't wrap around when coordinates go over the edge
this is especially noticeable in the e.g. O1OPEN.ANM, C1C3PO.ANM
RA1 files with planets wrapping around.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Use __MINGW_{PRINTF,SCANF}_FORMAT which matches the format check for
implementation that is actually used.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
This commit deduplicates the wrappers around the fpel functions
for copying whole blocks (i.e. height equaling width). It does
this in a manner which avoids having push/pop function arguments
when the calling convention forces one to pass them on the stack
(as in 32bit systems).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This is not based on the MMXEXT one, because the latter is quite
suboptimal: Motion vector types mc01 and mc03 (vertical motion vectors
with remainder of one quarter or three quarter) use different neighboring
lines for interpolation: mc01 uses two lines above and two lines below,
mc03 one line above and three lines below. The MMXEXT code uses
a common macro for all of them and therefore reads six lines
before it processes them (even reading lines which are not used
at all), leading to severe register pressure.
Another difference to the old code is that the positive and negative
parts of the sum to calculate are accumulated separately and
the subtraction is performed with unsigned saturation, so
that one can avoid biasing the sum.
The fact that the mc01 and mc03 filter coefficients are mirrors
of each other has been exploited to reduce mc01 to mc03.
But of course the most important different difference between
this code and the MMXEXT one is that XMM registers allow to
process eight words at a time, ideal for 8x8 subblocks,
whereas the MMXEXT code processes them in 4x8 or 4x16 blocks.
Benchmarks:
avg_cavs_qpel_pixels_tab[0][4]_c: 917.0 ( 1.00x)
avg_cavs_qpel_pixels_tab[0][4]_mmxext: 222.0 ( 4.13x)
avg_cavs_qpel_pixels_tab[0][4]_sse2: 89.0 (10.31x)
avg_cavs_qpel_pixels_tab[0][12]_c: 885.7 ( 1.00x)
avg_cavs_qpel_pixels_tab[0][12]_mmxext: 223.2 ( 3.97x)
avg_cavs_qpel_pixels_tab[0][12]_sse2: 88.5 (10.01x)
avg_cavs_qpel_pixels_tab[1][4]_c: 222.4 ( 1.00x)
avg_cavs_qpel_pixels_tab[1][4]_mmxext: 57.2 ( 3.89x)
avg_cavs_qpel_pixels_tab[1][4]_sse2: 23.3 ( 9.55x)
avg_cavs_qpel_pixels_tab[1][12]_c: 216.0 ( 1.00x)
avg_cavs_qpel_pixels_tab[1][12]_mmxext: 57.4 ( 3.76x)
avg_cavs_qpel_pixels_tab[1][12]_sse2: 22.6 ( 9.56x)
put_cavs_qpel_pixels_tab[0][4]_c: 750.9 ( 1.00x)
put_cavs_qpel_pixels_tab[0][4]_mmxext: 210.4 ( 3.57x)
put_cavs_qpel_pixels_tab[0][4]_sse2: 84.2 ( 8.92x)
put_cavs_qpel_pixels_tab[0][12]_c: 731.6 ( 1.00x)
put_cavs_qpel_pixels_tab[0][12]_mmxext: 210.7 ( 3.47x)
put_cavs_qpel_pixels_tab[0][12]_sse2: 84.1 ( 8.70x)
put_cavs_qpel_pixels_tab[1][4]_c: 191.7 ( 1.00x)
put_cavs_qpel_pixels_tab[1][4]_mmxext: 53.8 ( 3.56x)
put_cavs_qpel_pixels_tab[1][4]_sse2: 24.5 ( 7.83x)
put_cavs_qpel_pixels_tab[1][12]_c: 179.1 ( 1.00x)
put_cavs_qpel_pixels_tab[1][12]_mmxext: 53.9 ( 3.32x)
put_cavs_qpel_pixels_tab[1][12]_sse2: 24.0 ( 7.47x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Basically a direct port of the MMXEXT one. The main difference
is of course that one can process eight pixels (unpacked to words)
at a time, leading to speedups.
avg_cavs_qpel_pixels_tab[0][2]_c: 700.1 ( 1.00x)
avg_cavs_qpel_pixels_tab[0][2]_mmxext: 158.1 ( 4.43x)
avg_cavs_qpel_pixels_tab[0][2]_sse2: 86.0 ( 8.14x)
avg_cavs_qpel_pixels_tab[1][2]_c: 171.9 ( 1.00x)
avg_cavs_qpel_pixels_tab[1][2]_mmxext: 39.4 ( 4.36x)
avg_cavs_qpel_pixels_tab[1][2]_sse2: 21.7 ( 7.92x)
put_cavs_qpel_pixels_tab[0][2]_c: 525.7 ( 1.00x)
put_cavs_qpel_pixels_tab[0][2]_mmxext: 148.5 ( 3.54x)
put_cavs_qpel_pixels_tab[0][2]_sse2: 75.2 ( 6.99x)
put_cavs_qpel_pixels_tab[1][2]_c: 129.5 ( 1.00x)
put_cavs_qpel_pixels_tab[1][2]_mmxext: 36.7 ( 3.53x)
put_cavs_qpel_pixels_tab[1][2]_sse2: 19.0 ( 6.81x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The prediction involves terms of the form
(-1 * s0 - 2 * s1 + 96 * s2 + 42 * s3 - 7 * s4 + 64) >> 7,
where the s values are in the range of 0..255.
The sum can have values in the range -2550..35190, which
does not fit into a signed 16bit integer. The code uses
an arithmetic right shift, which does not yield the correct
result for values >= 2^15; such values should be clipped
to 255, yet are clipped to 0 instead.
Fix this by shifting the values by 4096, so that the range
is positive, use a logical right shift and subtract 32.
bunny.mp4 from the FATE suite can be used to reproduce the problem.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The implementation hardcodes access to 3 channels, so we need to check that
Fixes: out of array access
Fixes: BIGSLEEP-445394503-crash.exr
Found-by: Google Big Sleep
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Without rounding them up there are too few dc coeffs for the blocks.
We do not know if this way of handling odd dimensions is correct, as we have
no such DWA sample.
thus we ask the user for a sample if she encounters such a file
Fixes: out of array access
Fixes: BIGSLEEP-445392027-crash.exr
Found-by: Google Big Sleep
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: out of array read
Fixes: dwa_uncompress.py.crash.exr
The code will read from the ac data even if ac_size is 0, thus that case
is not implemented and we ask for a sample and error out cleanly
Found-by: Google Big Sleep
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
1. Remove the OP parameter from the QPEL_H264* macros. These are
a remnant of inline assembly and were forgotten in
610e00b359.
2. Pass the instruction set extension for the shift5 function
explicitly in the macro instead of using magic #defines.
3. Likewise, avoid magic #defines for (8|16)_v_lowpass_ssse3.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Every caller calls it three times in a loop, with slightly
modified arguments. So it makes sense to move the loop
into the callee.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Blocksize 2 is Snow-only, so move all the code pertaining
to it to snow.c. Also make the put array in H264QpelContext
smaller -- it only needs three sets of 16 function pointers.
This continues 6eb8bc4217
and b0c91c2fba.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
None of the other registers need to be preserved at this time,
so six XMM registers are always enough. Forgotten in
fa9ea5113b.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The muxed subtitle is created by libavformat, and as such that's what should be
reported. This is in line with the string we write for every other muxer.
After this change, the muxer will no longer be recompiled every time a commit
is made.
Signed-off-by: James Almer <jamrial@gmail.com>
If the information is coded at the container level, then that's what should be
exported. The user will still have access to values coded at the bitstream
level by firing a decoder.
Fixes issue #20121
Signed-off-by: James Almer <jamrial@gmail.com>
Some of the blend mode functions only depend on the underlying type
and therefore need only one version for 9, 10, 12, 14, 16 bits.
This saved 35104B with GCC and 26880B with Clang.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The blend functions currently convert strides from bytes to elements
of the type by using the stride /= sizeof(pixel) idiom. Yet this has
several drawbacks:
1. It invokes undefined behavior that happens to work when stride is
negative: size_t is typically the unsigned type of ptrdiff_t and
therefore the division will be performed as size_t, i.e. use logical
right shifts, making stride very big when sizeof(pixel) is > 1. This
works, because pointer to pixel for accesses entails an implicit
factor of sizeof(pixel) so that everything is correct modulo SIZE_MAX.
Yet this is UB and UBSan complains about it.
2. It makes the compiler emit actual shifts/ands to discard the low bits
shifted away.
3. There may be systems where alignof(uint16_t) or alignof(float) is
strictly smaller than their sizeof, so that the stride (in bytes) is
not guaranteed to be multiple of these sizeofs. In this case, dividing
by sizeof(pixel) is simply wrong.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Avoids having to sign-extend the strides in the assembly
(it also is more correct given that the qpel_mc_func
already uses ptrdiff_t).
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The horizontal 10bit MC SSE2 functions are currently duplicated:
They exist both in ordinary form as well as with a "sse2_cache64"
suffix. A comment in ff_h264qpel_init_x86() indicates that this
is due to older processors not liking accesses that cross cache
lines, yet these functions are identical to the non-cache64
functions (apart from the unavoidable changes in the rip-offset).
The only difference between these functions and the ordinary ones
are that the cache64 ones are created via a special form of the
INIT_XMM macro: "INIT_XMM sse2, cache64". This affects the name
and apparently defines cpuflags_cache64, yet nothing checks for
this, so both versions are identical. So remove the cache64 ones
and treat the remaining ones like ordinary SSE2 functions.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
ff_{avg,put}_h264_qpel8or16_hv2_lowpass_ssse3()
currently is almost the disjoint union of the codepaths
for sizes 8 and 16. This size is a compile-time constant
at every callsite. So split the function and avoid
the runtime branch.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
tmpstride is unused. This also allows to remove said parameter
from lots of functions in h264_qpel.c.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
They are constant since the size 16 version is no longer emulated
via the size 8 version.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is only used by h264_qpel.c and only with height four
(which is unrolled) and uses a loop in order to handle
multiples of four as height. Remove the loop and the height
parameter and move the function to h264_qpel_8bit.asm.
This leads to a bit of code duplication, but this is simpler
than all the %if checks necessary to achieve the same outcome
in fpel.asm.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
avg_h264_qpel only supports 16x16,8x8 and 4x4 blocksizes,
so it is currently unnecessarily large.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The 2x2 put functions are only used by Snow and Snow uses
only the eight bit versions. The rest is dead code. Disabling
it saved 41277B here.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It only needs it for some x86 fpel functions; instead
add a direct dependency for that.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The former is unnecessary since
3ceffe7839. The latter is since
ff_mpeg4_workaround_bugs() (and thereby setting the "old" qpeldsp
functions) has been moved inside #if CONFIG_MPEG4_DECODER.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This can be easily achieved by moving code only used by the MPEG-4
decoder behind #if CONFIG_MPEG4_DECODER.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
ff_avg_pixels{4,8,16}_l2_mmxext() are always called with height
equal to their blocksize. And ff_{put,avg}_pixels4_l2_mmxext()
are furthermore always called with both strides being equal.
So remove these redundant function parameters.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This is more correct given that qpel_mc_func already uses ptrdiff_t;
it also allows to avoid movsxdifnidn.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The ff_avg_pixels{4,8,16}_l2_mmxext() functions are only ever
used in the last step (the one that actually writes to the dst buffer)
where the number of lines to process is always equal to the
dimensions of the block, whereas ff_put_pixels{8,16}_mmxext()
are also used in intermediate calculations where the number of
lines can be 9 or 17.
The code in qpel.asm uses common macros for both and processes
more than one line per loop iteration; it therefore checks
for whether the number of lines is odd and treats this line separately;
yet this special handling is only needed for the put functions,
not the avg functions. It has therefore been %if'ed away for these.
The check is also not needed for ff_put_pixels4_l2_mmxext() which
is only used by H.264 which always processes four lines. Because
ff_{avg,put}_pixels4_l2_mmxext() processes four lines in a single loop
iteration, not only the odd-height handling, but the whole loop
could be removed.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
liblc3 supports arbitrary strides, so one can simply use a stride
of zero to make it read the same zero value again and again.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The code optimizes throughput by letting the encoder work on frame N
until frame N+1 is ready for submission, but this hurts low-delay uses
by delaying output by one frame. Don't delay output beyond what is
necessary when AV_CODEC_FLAG_LOW_DELAY is used.
Signed-off-by: Cameron Gutman <aicommander@gmail.com>
Memory leaks can happen on normal case when break from while loop
early, and it can happen on error path with goto cleanup.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Note that this changes the code to work the same way as other protocols where
an URL parameter can override an AVOption.
Signed-off-by: Marton Balint <cus@passwd.hu>
This allows choosing whether the `fit_mode` merely controls the placement
of the image within the output resolution, or whether the output resolution
is also adjusted according to the given `fit_mode`.
The semantics of these keywords are well-defined by the CSS 'object-fit'
property. This is arguably more user-friendly and less obtuse than the
existing `normalize_sar` and `pad_crop_ratio` options. Additionally, this
comes with two new (useful) behaviors, `none` and `scale_down`, neither of
which map elegantly to the existing options.
One additional benefit of this option is that, unlike `normalize_sar`, it
does *not* also imply `reset_sar`; meaning that users can now choose to
have an anamorphic base layer and still have the overlay images scaled to fit
on top of it according to the chosen strategy.
See-Also: https://drafts.csswg.org/css-images/#the-object-fit
This allows upstream filters to observe EOF on their corresponding outputs
and terminate early, which is particularly useful for decoders and demuxers
that may then gracefully release their resources.
Prevents "leaking" memory for previously used, but now unused, filter inputs.
THis is currently done by sch_demux_send() (via demux_stream_send_to_dst()),
sch_enc_send() (via enc_send_to_dst()), and sch_dec_send() (via
dec_send_to_dst()), but not by sch_filter_send().
Implement the same queue-closing logic for them. The main benefit here is that
this will allow them to mark downstream inputs as send-done (in addition
to received-done), which is useful for a following commit.
Cut off a choked demuxer's output codec/filter queues, effectively preventing
them from processing packets while the demuxer is choked. Avoids downstream
nodes from piling up extra input that a demuxer shouldn't currently be
sending.
The main benefit of this is to avoid queuing up excess packets that don't want
to be decoded yet, reducing memory consumption for idle inputs by preventing
them from being read earlier than needed.
Currently, when a demuxer thread is choked, it will avoid queuing more
packets, but any packets already present on the thread queue will still be
processed.
This can be quite wasteful if the choke is due to e.g. decoder not being
needed yet, such as in a filter graph involving concatenation-style filters.
Adding the ability to propagate the choke status to the thread queue directly
allows downstream decoders and filter graphs to avoid unnecessary work and
buffering.
Reduces the effective latency between scheduler updates and changes in the
thread workfload.
Currently, while the filter graph is being initially created, the scheduler
continues demuxing frames on the last input that happened to be active before
the filter graph was complete.
This can lead to an excess number of decoded frames "piling" up on this input,
regardless of whether or not it will actually be requested by the configured
filter graph. Suspending the filter graph during this initialization phase
reduces the amount of wasted memory.
This field is just saving (typically) a single pointer indirection; and IMO
makes the logic and graph relations unnecessarily complicated. I am also
considering adding choking logic to decoders and encoders as well, which this
field would get in the way of.
Apart from the unchoking logic in unchoke_for_input(), the only other place
that uses this field is the (cold) function check_acyclic(), which can be
served just as well with a simple function to do the graph traversal there.
I tested this extensively under different conditions and could not come up
with any scenario where using a larger queue size was actually beneficial.
Moreover, having such a large default queue is very wasteful especially
for larger frame sizes; and can in the worst case lead to an extra ~50% memory
footprint per input (with the default 16 threads), regardless of whether that
input is currently in use or not.
My methodology was to add logging in the event of a queue underrun/overrun,
and then observe and then observe the frequency of such events in practice,
as well as the impact on performance. I came up with an example filter graph
involving decoding, filtering and encoding with several input files and
various changes to move the bottleneck around.
I found that, in all configurations I tested, with all thread counts and
bottlenecks, using a queue size of 2 frames yielded practically identical
performance to a queue size of 8 frames. I was only able to consistently
measure a slowdown when restricting the queue to a single frame, where the
underruns ended up making up almost 1.1% of frame events in the worst case.
A summary of my test log follows:
= Bottleneck in decoder =
ffmpeg -i A -i B -i C -filter_complex "concat=n=3" -f null -
== 16 threads ==
=== Queue statistics (dec -> filtergraph) ===
- 8 frames = 91355 underruns, 1 overrun
- 4 frames = 91381 underruns, 2 overruns
- 2 frames = 91326 underruns, 21 overruns
- 1 frame = 91284 underruns, 102 overruns
=== Time elapsed ===
- 8 frames = 14.37s
- 4 frames = 14.28s
- 2 frames = 14.27s
- 1 frame = 14.35s
== 1 thread ==
=== Queue statistics (dec -> filtergraph) ===
- 8 frames = 91801 underruns, 0 overruns
- 4 frames = 91929 underruns, 1 overrun
- 2 frames = 91854 underruns, 7 overruns
- 1 frame = 91745 underrons, 83 overruns
=== Time elapsed ===
- 8 frames = 39.51s
- 4 frames = 39.94s
- 2 frames = 39.91s
- 1 frame = 41.69s
= Bottleneck in filter graph: =
ffmpeg -i A -i B -i C -filter_complex "concat=n=3,scale=3840x2160" -f null -
== 16 threads ==
=== Queue statistics (dec -> filtergraph) ===
- 8 frames = 277 underruns, 84673 overruns
- 4 frames = 640 underruns, 86523 overruns
- 2 frames = 850 underruns, 88751 overruns
- 1 frame = 1028 underruns, 89957 overruns
=== Time elapsed ===
- 8 frames = 26.35s
- 4 frames = 26.31s
- 2 frames = 26.38s
- 1 frame = 26.55s
== 1 thread ==
=== Queue statistics (dec -> filtergraph) ===
- 8 frames = 29746 underruns, 57033 overruns
- 4 frames = 29940 underruns, 58948 overruns
- 2 frames = 30160 underruns, 60185 overruns
- 1 frame = 30259 underruns, 61126 overruns
=== Time elapsed ===
- 8 frames = 52.08s
- 4 frames = 52.49s
- 2 frames = 52.25s
- 1 frame = 52.69s
= Bottleneck in encoder: =
ffmpeg -i A -i B -i C -filter_complex "concat=n=3" -c:v libx264 -preset veryfast -f null -
== 1 thread ==
== Queue statistics (filtergraph -> enc) ==
- 8 frames = 26763 underruns, 63535 overruns
- 4 frames = 26863 underruns, 63810 overruns
- 2 frames = 27243 underruns, 63839 overruns
- 1 frame = 27670 underruns, 63953 overruns
== Time elapsed ==
- 8 frames = 89.45s
- 4 frames = 89.04s
- 2 frames = 89.24s
- 1 frame = 90.26s
The code in the decoder just cares about allocating enough extra hw frames
to cover the size of the queue; but there's no reason we actually *have* to
use this many. We can safely relax the assertion to a <= check.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
I have redacted the exact location of the FFmpeg server as writing
that in public seems just a bad idea
Refer to RFC 4588.
Add and set the basic param of RTX like
ssrc, payload_type, srtp.
Modify the SDP to add RTX info so that
the peer be able to parse the RTX packet.
There are more pateches to make RTX really
work.
Signed-off-by: Jack Lau <jacklau1222@qq.com>
for live RTP streams. Some external applications, such as Qt Multimedia,
depend on this flag being set correctly.
Signed-off-by: Kaarle Ritvanen <kaarle.ritvanen@datakunkku.fi>
This is required placement by standard [[maybe_unused]] attribute, works
the same for __attribute__((unused)).
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
This affects the {avg,put}_no_rnd_pixels16_{x,y}2 MMX and
(put-only) MMXEXT versions. Removing these functions saved
1184B here.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
These currently only exist as MMX versions.
The added functions occupy 320B here. So far, they are only for
the x2 and y2 (i.e. right and down, not down-right) directions.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
These currently only exist as MMX and (not bitexact) MMXEXT versions.
The added functions occupy 288B here. So far, they are only for
the x2 and y2 (i.e. right and down, not down-right) directions.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Only the blocksizes 16 and 8 are implemented, yet the motion estimation
code touches the blocksize 4 entries. But really nothing touches
the blocksize 2 entries, so that we can reduce the put_no_rnd_pixels_tab
array size to [3][4].
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
SSSE3 is already quite old (introduced 2006 for Intel, 2011 for AMD),
so that the overwhelming majority of our users (particularly those
that actually update their FFmpeg) will be using the SSSE3 versions.
This commit therefore removes the MMX(EXT) functions overridden
by them (which don't abide by the ABI) to get closer to a removal
of emms_c.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
SSSE3 is already quite old (introduced 2006 for Intel, 2011 for AMD),
so that the overwhelming majority of our users (particularly those
that actually update their FFmpeg) will be using the SSSE3 version
of filter_line.
This commit therefore removes the overridden MMXEXT version
(which didn't abide by the ABI) which allows us to remove
an emms_c() from vf_gradfun.c, so that users with SSSE3 no longer
pay a price for the mere existence of an MMXEXT version.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
SSSE3 is already quite old (introduced 2006 for Intel, 2011 for AMD),
so that the overwhelming majority of our users (particularly those
that actually update their FFmpeg) will be using the SSSE3 versions.
This commit therefore removes the MMX and 3DNOW functions overridden
by them (which don't abide by the ABI) to get closer to a removal
of emms_c.
Also merge the mpegvideoenc_qns_template.c file into the main file.
The 3DNOW functions removed in this commit were the last in the
codebase.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
SSSE3 is already quite old (introduced 2006 for Intel, 2011 for AMD),
so that the overwhelming majority of our users (particularly those
that actually update their FFmpeg) will be using the SSSE3 versions.
This commit therefore removes the MMX(EXT) functions overridden
by them (which don't abide by the ABI) to get closer to a removal
of emms_c.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
SSSE3 is already quite old (introduced 2006 for Intel, 2011 for AMD),
so that the overwhelming majority of our users (particularly those
that actually update their FFmpeg) will be using the SSSE3 versions.
This commit therefore removes the MMX(EXT) functions overridden
by them (which don't abide by the ABI) to get closer to a removal
of emms_c.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The mc00 versions (i.e. the qdsp functions with no subpixel
interpolation) are just wrappers around their fpel versions.
There are SSE2 versions of these, yet the qpel code only
uses the MMX(EXT) versions. This commit changes this and
also removes the MMX(EXT) versions.
This also allowed to remove ff_avg_pixels16_mmxext,
ff_put_pixels16_mmx.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
CPUs which support SSE2, but not in a fast way (so that
they get the additional AV_CPU_FLAG_SSE2SLOW) are ancient
nowadays (2007 and older), so ignore the distinction between
the two and remove MMX and MMXEXT functions that are now
overridden by SSE2 functions.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
CPUs which support SSE2, but not in a fast way (so that
they get the additional AV_CPU_FLAG_SSE2SLOW) are ancient
nowadays (2007 and older), so ignore the distinction between
the two and remove MMX and MMXEXT functions that are now
overridden by SSE2 functions.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The value of sizeof() is of type size_t which means that
an expression like
src1[i * src_stride1 + 4 * (int)sizeof(pixel)]
will use a very large offset if src_stride1 is sufficiently negative.
It works in practice (because it is correct modulo SIZE_MAX),
but UBSan treats it as error:
libavcodec/hpel_template.c:104:1: runtime error: addition of unsigned offset to 0x7ffdfa0391d8 overflowed to 0x7ffdfa0391cc
Fix this by casting sizeof(pixel) to int.
(This has been uncovered by a checkasm test for the hpeldsp
which will be added in a later commit.)
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This commit fixes two issues in the documentation:
a) The documentation for {put,avg}_pixels_tab only mentions
widths 16 and 8, although it explicitly mentions that there
are four horizontal blocksizes. This part of the patch
basically reverts e5771f4f37.
b) The restrictions on height don't match the reality. While
most users abide by it, some do not:
i) vp56.c copies a 16x12 block.
ii) indeo3 can copy an arbitrary multiple of four lines
for block widths 4, 8 and 16.
iii) SVQ3 can use block sizes luma block sizes 16x16, 8x16,
16x8, 8x8, 4x8, 8x4 and 4x4 and the corresponding
8x8, 4x8, 8x4, 4x4, 2x4, 4x2 and 2x2 chroma block sizes.
This implies that for widths 2 and 4 height can be two
and is guaranteed to be at least even. For all other widths,
height can be a multiple of four.
Furthermore, a comment for the SVQ3 blocksizes has been added.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It indicates a constant frame size (in samples); yet the documentation
of the MPEGH_DECODER_OUTPUT_INFO structure says that it
"gives information about the currently decoded audio data"
and makes no guarantees about it being constant.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The pointer to the decoder is always set after init has been
called successfully.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The channel count is fixed, so we don't need to allocate
space to hold samples for nonexisting channels. This also
means that the maximum channel count is no longer hardcoded
in the macro.
Also switch to decoder_buffer_size to be samples-based.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is mostly pointless for decoders (only useful for those
codecs for which a floating-point and a fixed-point decoder
exist).
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
In case some file has been updated. Generally fate samples shouldn't be
replaced to preserve compatibility with older revisions, but before
merge it may happen that files is replaced.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Suppresses implicit integer conversion narrowing warnings:
warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data
Those implicit conversions are abundant in ffmpeg's code base.
Additionally equivalent warnings are not enabled for GCC/Clang by
default, so they are mostly left unfixed.
Suppress reports about them to reduce noise in MSVC build log.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
This commit causes the libjxl decoder wrapper to hold onto the decoded
frame until the trailing metadata after the frame is possibly complete
before it submits the frame. This allows EXIF and other metadata boxes
that occur after the frame is fully rendered to be attached to it as
side data.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
Normally, the OH_AVCodec is wrapped inside an AVBuffer
to be freed in its free callback; yet when creating
the AVBuffer fails, the decoder is never destroyed.
Reviewed-by: Zhao Zhili <quinkblack@foxmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
When using a literal IPv6 address as hostname, it can contain a Zone ID
especially in the case of link-local addresses. Sending this to the
server in the Host header is not useful to the server and in some cases
servers refuse such requests.
To prevent any such issues, strip the Zone ID from the address if it's
an IPv6 address. This also removes it for the Cookies lookup.
Based on a patch by: Daniel N Pettersson <danielnp@axis.com>
The const here is unnecessary, because everything inside
SECTION_RODATA is automatically const and using it exports
these objects from the object file because const is a macro
in x86inc.asm.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is deprecated and doing so gives warnings from Clang.
Use CODEC_SAMPLEFMTS instead.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
When parsing the RTSP message reason, the whole remainder
after parsing the status code was used, which would lead to
a leading space in the parsed reason string.
Blending onto independent alpha framebuffers is not possible under the
constraints of the supported blend operators. While we could handle
blending premul-onto-premul, this would break if the base layer is YUV,
since premultiplied alpha does not survive the (nonlinear) YUV conversion.
Fortunately, blending independent-onto-premul is just as easy, and works in
all cases. So just force this mode when using a linear intermediate blend
texture, which is always RGBA.
Instead of directly mutating `opts->params`. Avoids any possible leak of
overriden params between invocations of this function, as well as the later
`pl_render_image` during the linear output pass.
This was a regression introduced in 292c1df7c1.
Since we don't know the length of the stts data until after building the
index, since we're generating it, we need to merge any ctts data after,
since otherwise tts_count is set to 0, and no packets will be output.
We can't remove the merge entirely, because uncompressed PCM with
a ctts atom is technically valid (e.g. a constant CTS offset).
This fixes old-style uncompressed PCM demuxing.
Fixes#11490.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
This reverts commits fd55c4b5f7
(for tls_openssl.c) and c74181a04b
(for tls_gnutls.c).
It is impossible for a pointer to a member of a structure
to be NULL: If the containing structure exists, the member
exists and can't have a NULL address; if the containing
structure does not exist, then getting a pointer to the
substructure via &c->tls_shared would already be undefined
behavior.
Reviewed-by: Kacper Michajłow <kasper93@gmail.com>
Reviewed-by: Steven Liu <lingjiujianke@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
- FreBSD/OpenBSD have elf_aux_info() on arm
- Wrap AT_HWCAP as the value is different for BSD vs Linux (16 vs 25)
Signed-off-by: Brad Smith <brad@comstyle.com>
Right now, the private contexts of every decoder supporting
H.274 film grain synthesis (namely H.264, HEVC and VVC)
contain a H274FilmGrainDatabase; said structure is very large
700442B before this commit) and takes up the overwhelming
majority of said contexts: Removing it reduces sizeof(H264Context)
by 92.88%, sizeof(HEVCContext) by 97.78% and sizeof(VVCContext)
by 99.86%. This is especially important for H.264 and HEVC
when using frame-threading.
The content of said film grain database does not depend on
any input parameter; it is shareable between all its users and
could be hardcoded in the binary (but isn't, because it is so huge).
This commit adds a database with static storage duration to h274.c
and uses it instead of the elements in the private contexts above.
It is still lazily initialized as-needed; a mutex is used
for the necessary synchronization. An alternative would be to use
an AV_ONCE to initialize the whole database either in the decoders'
init function (which would be wasteful given that most videos
don't use film grain synthesis) or in ff_h274_apply_film_grain().
Reviewed-by: Niklas Haas <ffmpeg@haasn.dev>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Add a separator line to show_filters() function output to maintain
consistent formatting with other show functions like show_codecs,
show_formats_devices, etc. This provides uniform formatting across
all command-line output functions for better readability and
parsing by external tools.
To avoid pulling in the entire libavfilter when using the DSP functions
from checkasm.
The rest of the struct is not needed outside vf_idet.c and was moved there.
Fixes: out of array read
Fixes: BIGSLEEP-445383648
Fixes: crash_input_1/2/3.dhav
Found-by: Google Big Sleep
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Due to the recent id3v2 refactor, FLAC was left out due to
earlier code not checking for id3v2 presence on FLAC.
Without the id3v2 data parsed, detection of FLAC and therefore
demuxing fails.
Fixes 9d037c54f2
There's very little performance difference vs SSE2/SSSE3 and most
systems will use the AVX2 implementations anyway.
This reduces code size and compilation time by a significant amount.
The length check was too strict and if the end of the string was a percent
encoded sequence it did not decode correctly.
Signed-off-by: Marton Balint <cus@passwd.hu>
Set GNUTLS_DATAGRAM flag when is_dtls is true.
Set mtu when it's specified.
Modify the read/write function could use udp socket.
There are more patches to make dtls really work.
Signed-off-by: Jack Lau <jacklau1222@qq.com>
using fast realloc leaves the entries uninitialized and frees garbage pointers on
errors
Fixes: bug_triggering_file
Found-by: *2ourc3, 5pider
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Depending on the threading backend the stdlib uses, creating a
mutex/condvar can be quite expensive.
So keep this object alive in the ctx, on which we synchronize via the
uninit mutex anyway.
Apparently, using a normal frame pool in a multithreaded environment
leads to strange resource leaks on shutdown, which vanish when using a
free threaded pool.
This allows compilers to optimize accesses like
ff_vvc_diag_scan_x[2][2][x] by baking the offset derived
from [2][2] into the relocation (so that it is performed
at link-time).
Reviewed-by: Wu Jianhua <toqsxw@outlook.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This was added in 4f78711f9c, with the
commit message claiming that it's a Microsoft restriction that array
textures with ArraySize > 2 cannot be created with
D3D11_BIND_RENDER_TARGET.
I was unable to find any documentation or other references on that, and
a quick test also found it to not be the case. So this patch removes
that restriction.
This enables frame sources, like the d3d11 capture filters, to output
frames in an array texture, which is neccesary to pass those frames as
input to some hardware encoders.
AVBR does not use VAEncMiscParameterTypeHRD, so attempting to set
rc_buffer_size and bit_rate together will cause the rc_buffer_size
to be ignored if the VAAPI driver supports AVBR. We should prefer
regular VBR for that case.
Signed-off-by: Cameron Gutman <aicommander@gmail.com>
Ensures it honors the THREADS variable if set to anything other than 1, which will be
useful to detect bugs.
Signed-off-by: James Almer <jamrial@gmail.com>
Regression since 5acbdd2264, which removed
setting both values from PerThreadContext.
Given the pthread code calls ff_decode_receive_frame_internal() on the frame,
any value set before it will be overwritten, so instead sync each thread's
DecodeContext and let ff_decode_receive_frame_internal() handle these values.
Fixes issue #20534.
Signed-off-by: James Almer <jamrial@gmail.com>
Attempts to base the fragmentation timing on other streams
as most receivers expect media fragments to be more or less
aligned.
Currently does not support fragmentation on subtitle track
only, as the subtitle packet queue timings would have to be
checked in addition to the current fragmentation timing logic.
Signed-off-by: Jan Ekström <jan.ekstrom@24i.com>
This allows for direct dumping of the packets' contents (useful for
text based formats), while getting the timestamps/sizes etc from
ffprobe.
If used via TRANSCODE, the actually utilized muxer should be added
within the last argument as an additional dependency, as that is not
done automatically.
Signed-off-by: Jan Ekström <jan.ekstrom@24i.com>
AVFormatContext.event_flags is checked against AVSTREAM_EVENT_FLAG,
which belongs to AVStream.event_flags. There is no real issue since
these two flags have the same value.
Saves 16KiB from SwrContext.
(FATE would also pass if one made the double matrix part of
the union, too, but I don't know whether this is truely correct,
because swri_rematrix() accesses the double matrix (to check whether
it is equal to 1.0) even when one of the other matrices is in use.)
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Only the mxpeg decoder uses it (and the reference/bitmask feature
of ff_mjpeg_decode_sos()). So only initialize it for mxpeg which
allows to remove the mjpeg->hpeldsp dependency.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
mjpeg_decode_dc() currently has a special return value (0xfffff)
for the case of invalid data; this value does not overlap with
the ordinary return values, yet the compiler can't prove this
(at least on x86 where an asm version of NEG_USR32 is used
inside get_xbits()), so the non-error return value has to be checked
for being the special value even if mjpeg_decode_dc() is inlined.
This commit avoids this by separating error code and ordinary return
value. It also means that the ljpeg functions return the proper
error code and not -1 in case of invalid data.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Up until now, a DC error in decode_block() or decode_dc_progressive()
would lead to a warning from mjpeg_decode_dc() and a (less verbose)
error from the caller. Upgrade the former to an error status (all
callers treat is an error) and remove the latter.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
There is no reason to add the address of an element of
MJpegDecodeContext (which makes logmessage nonreproducible).
Also avoid always printing a zero.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The behavior of the ffio_*_checksum feature is not well defined
when using avio_skip(). The code in oggdec.c relied on the skipped
data (four bytes) to be checksummed, which is mostly true
because short_seek_threshold is 32768 by default, so that
avio_seek() will normally read data instead of calling the
underlying seek function. Yet this has two problems:
a) It relies on implementation details of avio_seek().
b) There is an exception, namely if the AVIO_FLAG_DIRECT is set.
In this case the underlying seek function (if set) is always
called and the data is skipped, leading to CRC errors.
So don't skip the data.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Otherwise it might be > buf_ptr in which case ffio_get_checksum()
could segfault (s->buf_ptr - s->checksum_ptr would be negative
which would be converted to something very big when converted
to unsigned for the update_checksum callback).
Fixes ticket #11233.
Reported-by: Du4t
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Fixes an issue uncovered since 0ce413af9c
where flags were properly separated but without the objcflags_filter
being initialized, if add_objcflags is called early as part of
add_allcflags, it would lead to the append() function using an empty
filter and trying to call the flag instead, leading to errors like:
./configure: line 976: -fsanitize=address: command not found
./configure: line 976: -fno-omit-frame-pointer: command not found
Add a new flag to the vf_colorspace filter which provides the user an
option to clamp the linear and delinear transfer characteristics LUT
values to the [0, 1] represented range. This helps constrain the
potential value range when converting between colorspaces.
Certain colors when going through the conversion can result in out of
gamut colors after the rotation. The colorspace filter allows that with
the extended range. The added clamping just keeps the colors within the
[0, 1) range rather than using that extended range. I'm not enough of a
color scientist to say which is correct, but there are certain
situations where we would prefer to keep the colors in gamut.
The example I have is:
A solid color image of 8-bit YUV: Y=157, U=164, V=98.
Specify the input as:
Input range: MPEG
In color matrix: BT470BG
In color primaries: BT470M
In color transfer characteristics: Gamma 28
Output as:
Out color range: JPEG
Out color matrix: BT.709
Out color primaries: BT.709
Out color transfer characteristics: BT.709
During the calculation you get:
Input YUV: y=157, u=164, v-98
Post-yuv2rgb BT.470BG: r=0.456055, g=0.684152, b=0.928606
Post-apply gamma28 linear LUT: r=0.110979, g=0.345494, b=0.812709
Post-color rotation BT.470M to BT.709: r=-0.04161, g=0.384626, b=0.852400
Post-apply Rec.709 delinear LUT: r=-0.16382, g=0.615932, b=0.923793
Post-rgb2yuv Rec.709 matrix: y=120, u=190, v=25
Where with this change, the delinear LUT output would be clamped to 0,
so the result would be:
r=0.000000, g=0.612390, b=0.918807 and a final output of
y=129, u=185, v=46
As for the long and av_clip64, this was just because lrint returned a
long, so I left it as that and then used av_clip64 to the [0,1) range to
avoid overflow. But re-reading, it looks like av_clip_int16 would
downcast that long to int anyway so the possibility of overflow already
existed there. I've put it back to int just to match the existing
behavior.
It was mistakenly removed in 0ce413af9c.
This matches how the Makefiles actually compile ObjC code, and
unlike C++, this seems to actually be benign and various existing
setups rely on that behaviour.
This was missed in 0ce413af9c.
This fixes proper detection of Objective C APIs (that are missing)
if targeting older macOS versions, such as the check for
AVCaptureSession.
Refer to RFC 4585 4.2,
Add SDP "a=rtcp-fb" so the peer can request rtx
through sending NACKs.
Add basic parsing for NACK packet.
Decrypt into a newly allocated buffer (do not
overwrite whip->buf) so multiple NACKs in a
bundled packet can be parsed in a loop while
keeping whip->buf available for sending rtx
Broken or wrong NACk are logged and skipped.
Signed-off-by: Jack Lau <jacklau1222@qq.com>
Combining C++ with our w32pthreads does not work out.
The C++ standard lib headers might include pthread.h, which will then
define clashing types with our w32pthread header.
To solve this, switch to using C++ native std::thread.
The strategy to count free slots isn't reliable. The value of
INPUT_MAX_PACKETS is a hardcoded value, and MPP_DEC_GET_STREAM_COUNT
doesn't always work as expected. When freeslots is nonzero, the mpp
decoder still returns MPP_ERR_BUFFER_FULL. Before this patch, all
packets are dropped once went into MPP_ERR_BUFFER_FULL state.
Don't drop packet and return error in MPP_ERR_BUFFER_FULL state.
Receive frame to allow the decoder's state machine to resume.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
MinGWs ANSI stdio is quite slow, and when using UCRT, its extra features
are not needed.
The only troublesome part with ucrt printf is that is disagrees about
the size of "long double", but FFmpeg does not use that anywhere, let
alone prints it.
Right now, CFLAGS like -std=c17 leak onto the C++ compilers commandline,
leading to tons of warnings and even some errors.
Undefining __STRICT_ANSI__ causes strong warnings from the stdlib
headers. So only set it for C.
Alternatively, this could be a separate job, potentially keyed
to only run on PRs that touch files matching */aarch64/*. But
as this runs very quickly, it's probably less clutter to just
bundle it here.
The same also applies for arm assembly, but there are more known
deviations within that.
Add a script which checks all files, except for a few known files
that deviate, for various reasons.
Currently, any nonnegative return value from an init function
is just treated as success and otherwise ignored; while it is
not explicitly forbidden to return something else than 0 (there
is no documentation whatsoever), the assumption is that
they return zero on success. So change dvdsubdec.c accordingly.
Also change dvdsub_parse_extradata() to return zero on success.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Since ae448e00af the various
tableinit functions are not compiled unconditionally any more,
so that pcm_tablegen.c (which creates the hardcoded tables)
needs to be updated.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Wrongly removed in fe73b84879, it's required for
calls with a payload smaller than a full block.
Fixes issue #20474.
Signed-off-by: James Almer <jamrial@gmail.com>
It is supported only for H.264 on Intel Macs, but it can be used with
both H.264 and HEVC on Apple Silicon.
Signed-off-by: Cameron Gutman <aicommander@gmail.com>
This is a newly added field, so there's no point to try and keep backwards
compatibility with an older API - newer clients should just use the new
fields.
And add the missing goto fail. This should ensure the alpha mode is set and remove
bogus warnings printed by ffplay.
Signed-off-by: James Almer <jamrial@gmail.com>
enc_pkt->size is 0 after av_packet_unref, which makes the check invalid.
Fix regression from 3e4bfff2.
Co-Authored-by: Jin Bo <jinbo@loongson.cn>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
When PARSER_FLAG_COMPLETE_FRAMES is set (e.g. in Matroska),
unesc_index is not reset after the first frame, which causes
subsequent frames to inherit leftover header data.
This leads to incorrect keyframe marking and broken decoding.
See also this bug report: https://trac.ffmpeg.org/ticket/8515
Signed-off-by: DE-AI <max.christoph@pentaxmedical.com>
The max_frame_size in byte will be filled in
VAEncMiscParameterBufferMaxFrameSize which receives size in bit.
This requires max_frame_size * 8.
Signed-off-by: Tong Wu <wutong1208@outlook.com>
libjxl consumes no input if it returns an error, so this loops over
and over while it spits out the same error many times. If we got an
error from libjxl and consumed no input, then we instead consume the
whole packet.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
Since 9dc79241d9 we now always pop the
orientation off of the IFD and use a display matrix instead. This means
we should not produce a warning and refuse if the orientation field
indicates a default orientation (i.e. 1).
Signed-off-by: Leo Izen <leo.izen@gmail.com>
Reported-by: Ramiro Polla <ramiro.polla@gmail.com>
This version check for whip is unnecessary.
Since several rencet patches to the configure
have already added version checks for OpenSSL.
Signed-off-by: Jack Lau <jacklau1222@qq.com>
We have no way of knowing if the string was fully processed or got truncated
because of a parse error so it is better to use the original path if finding an
image sequence fails.
We do this by explicitly falling back to the PT_NONE mode if the provided
filename is not a valid pattern but the file exists or if the IO context is
already open.
This also means that filenames no longer need to be escaped even in sequence
mode if an invalid sequence (a sequence without %d) is provided, so a command
line such as ffmpeg -f image2 -i "100%.jpg" will just work, but "100%%.jpg"
will no longer work.
Signed-off-by: Marton Balint <cus@passwd.hu>
The macro USEC_PER_SEC is already defined in system headers on some platforms,such as Nuttx nuttx/clock.h.
Check if it's already defined before defining it to avoid compiler warnings.
Signed-off-by: caifan3 <caifan3@xiaomi.com>
This amends 307983b292 to fix
building with older versions of mingw-w64.
The previously checked constant, SP_PROT_DTLS1_X_CLIENT, was
added in mingw-w64 in df36f5deda23192d0ee99ffd661ea36df924e667
in 2020, and is included in released versions since v8.0.0.
The new checked constant SECPKG_ATTR_DTLS_MTU was added in
mingw-w64 in 0792283787cca8fc27dd38671107c791c87f4db3 in 2021,
and first appeared in mingw-w64 v9.0.0.
This fixes building with mingw-w64 v8, which is the version bundled
in Ubuntu 22.04.
Pure cleanup, no functional changes
Unify local pointer names to `TLSContext *c` and `TLSShared *s` to
reduce confusion from inconsistent names (e.g. p, s, c)
Signed-off-by: Jack Lau <jacklau1222@qq.com>
Reduce nested if statements to make the code structure clearer
Add logic to check if EOF is received from UDP socket.
Signed-off-by: Jack Lau <jacklau1222@qq.com>
The profile contains profile_idc and constraint_set*_flag,
throws away high 8 bit flags and then we get profile_idc.
Signed-off-by: Jack Lau <jacklau1222@qq.com>
profile_iop is an 8 bits field. Previous code copied
AVCodecParameters::profile (which can contain bits
beyond 8 bits) into profile_iop, producing overflow
and wrong values.
This patch maps the constrained flags into the proper
profile_iop bits (constraint_set1 / 3)
Signed-off-by: Jack Lau <jacklau1222@qq.com>
Use av_gettime_relative() replace av_gettime() to get
better measure
Changed the display precision from showing only integers
to showing two decimal places (in milliseconds)
Signed-off-by: Jack Lau <jacklau1222@qq.com>
fix issue #20407
Refer to RFC 5245 15.1,
the foundation may be any string up to 32 chars.
The old code could misread foundations as transport("udp").
This patch fully parse all these attr to avoid parsing error.
Signed-off-by: Jack Lau <jacklau1222@qq.com>
fix issue #20402
See RFC 8445 7.1,
The PRIORITY and ICE-CONTROLLED or ICE-CONTROLLING must be included
in a Binding request.
FFmpeg is always ICE Controlling role so we use ICE-CONTROLLING attr.
Co-author-by: Sergio Garcia Murillo <sergio.garcia.murillo@gmail.com>
Signed-off-by: Jack Lau <jacklau1222@qq.com>
Instead of just 2 files, generalize this filter to support crossfading
arbitrarily many files. This makes the filter essentially operate similar
to the `concat` filter, chaining multiple files one after another.
Aside from just adding more input pads, this requires rewriting the
activate function. Instead of a finite state machine, we keep track of the
currently active input index; and advance it only once the current input is
fully exhausted.
This results in arguably simpler logic overall.
This behavior is currently completely broken, leading to an abrupt end of the
first audio stream. I want to generalize this filter to multiple inputs, but
having too short input files will always represent a significant problem.
I considered a few approaches for how to handle this more gracefully, but
most of them come with their own problems; in particular when a short input
is sandwiched between two longer ones; or when there is a sequence of short
inputs. In the end, it's simplest to just shorten the crossfade window.
I also considered (and tested) padding the input with silence, but this also
has its own aesthetic implications and strange edge cases.
See-Also: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20388
Consider APPx fields that are too short to contain an id field (32-bit)
as stubs, and ignore them if AV_EF_EXPLODE is not set.
This has been seen in the MJPEG output from some webcams (such as the
Logitech C270 and C920) and the JPEG images embedded in DNG images
from the Pentax K-1 camera.
The loop to skip the remaining bytes was off by one for all markers
except for Adob.
This patch uses post-decrement instead of pre-decrement in the while
loop to make the len value easier to understand, and updates the len
value to reflect this change for the Adob marker.
This fixes compilation with librsvg 2.50.3: error: viewport undeclared
This was a regression since commit
86ed68420d.
Fixes#10722.
Reviewed-by: Leo Izen <leo.izen@gmail.com>
This can be done by reusing the destination array to store
a temporary LUT (with only half the amount of elements, but
double the element size). This relies on certain assumptions
about sizes, but they are always fulfilled for systems supported
by us (sizeof(double) == 8 is needed/guaranteed since commit
3383a53e7d). Furthermore,
sizeof(uint32_t) is always >= four because CHAR_BIT is eight
(in fact, sizeof(uint32_t) is four, because the exact width
types don't have any padding).
Reviewed-by: Zhao Zhili <quinkblack@foxmail.com>
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
ff_cbrt_tableinit{,_fixed}() uses a LUT of doubles of the same size
as the actual LUT to be initialized (8192 elems). Said LUT is only
used to initialize another LUT, but because it is static, the dirty
memory (64KiB) is not released. It is also too large to be put on
the stack.
This commit mitigates this: We only use a LUT for the powers of
odd values, thereby halving its size. The generated LUT stays unchanged.
Reviewed-by: Zhao Zhili <quinkblack@foxmail.com>
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
av_hwframe_map() is documented to work with a blank dst frame, but
hwcontext_drm currently fails if dst->format == AV_PIX_FMT_NONE.
Signed-off-by: Cameron Gutman <aicommander@gmail.com>
Since 9d037c54f2 id3v2_extra_meta
can only be != NULL if the input format wants ID3v2 tags to be
read generically.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This flag governs whether avformat_open_input() reads
ID3v2 tags generically; some demuxers without this flag
read these tags themselves in a non-generic way,
e.g. oma. This makes this flag an implementation detail
that should not be exposed to the user, i.e. an internal flag.
Given that 9d037c54f2
did not bump version and added no APIchanges entry
I deemded it inappropriate to bump version or add
an APIchanges entry for the removal of AVFMT_FLAG_ID3V2_AUTO.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This fixes the following compiler error, if compiling with MSVC
for ARM (32 bit):
src/libswscale/ops_chain.c(48): error C2719: 'priv': formal parameter with requested alignment of 16 won't be aligned
This change shouldn't affect the performance of this operation
(which in itself probably isn't relevant); instead of copying the
contents of the SwsOpPriv struct from the stack as parameter,
it gets copied straight from the caller function's stack frame
instead.
Separately from this issue, MSVC 17.8 and 17.9 end up in an
internal compiler error when compiling libswscale/ops.c, but
older and newer versions do compile it successfully.
ID3v2 headers are now only parsed for formats that explicitly support them,
avoiding premature data consumption that broke demuxing in other formats.
Introduces AVFMT_FLAG_ID3V2_AUTO and applies it to mp3, aac, tta, and wav.
Signed-off-by: nilfm <nil.fons@gmail.com>
It makes no sense to ever call it that way given that
avio_context_free() accepts a pointer to a pointer to an AVIOContext.
Other double-pointer-free functions like av_freep() also do it
that way (and therefore avio_context_free(NULL) still crashes
even with 870cfed231).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Broken in ae448e00af.
Notice that config_components.h is safe to include,
as it is valid for both the host and the target
(in contrast to config.h).
Reviewed-by: Zhao Zhili <quinkblack@foxmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This patch adds support for the texture array feature
used by AMD boards in the D3D12 HEVC encoder.
In texture array mode, a single texture array is shared for all
reference and reconstructed pictures using different subresources.
The implementation ensures compatibility
and has been successfully tested on AMD, Intel, and NVIDIA GPUs.
There's no reason to use a completely separate graph just to process the
alpha plane in isolation - zimg supports native alpha handling as part of the
main image.
Fixes several issues with this filter when adding or removing alpha planes,
and also adds support for scaling premultiplied alpha (which reduces artefacts
near the image borders).
Fortunately, we only care if this flag is set - otherwise, this filter is
alpha mode agnostic (since it is purely scaling, etc).
That said, there is an argument to be made that we should prefer premul
alpha when scaling, because scaling in straight alpha will leak garbage
pixels; but I think that would be a too backwards-incompatible change to
be worth thinking about at this time.
The overwhelming majority of references to ff_draw_init2() just set the
colorspace properties from a filter link. This wrapper allows us to update
all such referencen automatically.
Conceptually, these are pretty simple to handle, since they are basically
equivalent to the case of alpha being absent, since the only thing the
destination alpha plane is used for is unpremultiplyng straight alpha.
For planar formats, the total number of cases doesn't change in principle,
since before this patch we have:
- alpha present, overlay straight
- alpha present, overlay premultiplied
- alpha absent, overlay straight
- alpha absent, overlay premultiplied
And now we have:
- main straight, overlay straight
- main straight, overlay premultiplied
- main premultiplied, overlay straight
- main premultiplied, overlay premultiplied
That said, we do gain some cases now that we used to (incorrectly) ignore,
like premultiplied yuva420p10.
Notably, we can skip defining separate functions for the case of main alpha
being absent, since that can be a single cheap branch inside the function
itself, on whether or not to also process the alpha plane. Otherwise, as long
as we treat "alpha absent" as "main premultiplied", the per-plane logic will
skip touching the nonexistent alpha plane.
The only format that actually needs extra cases is packed rgb, but that's only
two additional cases in total.
Also arguably simplifies the assignment logic massively.
Chooses the desired output alpha mode. Note that this depends on
an upstream version of libplacebo new enough to respect the corresponding
AVFrame field in pl_map_avframe_ex.
While vf_scale cannot directly convert between premultiplied and straight
alpha, the effective tagging can still change as a result of a change in
the pixel format (i.e. adding or removing an alpha channel).
JPEG XL supports both premultiplied and straight alpha, and the basic info
struct contains signalling for this. Forward the correct tagging on decode
and encode.
OpenEXR always uses premultiplied alpha, as per the spec.
cf. https://openexr.com/en/latest/TechnicalIntroduction.html
> By convention, all color channels are premultiplied by alpha, so that
> `foreground + (1-alpha) x background` performs a correct “over” operation.
> (See Premultiplied vs. Un-Premultiplied Color Channels)
>
> In the visual effects industry premultiplied color channels are the norm,
> and application software packages typically use internal image
> representations that are also premultiplied.
PNG always uses straight alpha.
cf. https://www.w3.org/TR/PNG-Rationale.html
> Although each form of alpha storage has its advantages, we did not want to
> require all PNG viewers to handle both forms. We standardized on non-
> premultiplied alpha as being the lossless and more general case.
Following in the footsteps of the previous commit, this commit adds the
new fields to AVCodecContext so we can start properly setting it on codecs,
as well as limiting the list of supported options to detect a format mismatch
during encode.
This commit also sets up the necessary infrastructure to start using the
newly added field in all codecs.
We need a filter that can premultiply and unpremultiply the alpha channel
dynamically, on demand, in response to the negotiated alpha mode, analogous
to how vf_scale operates. Introduce a new filter "vf_premultiply_dynamic"
that accomplishes this.
Set the output link according to the chosen direction, and the input link
to its opposite (unless the input is not in-place, in which case there is no
alpha channel).
Also update the alpha mode on the output frame, since it may differ.
Instead of hard-coding the assumption that all video properties are handled
by vf_scale, generalize the struct slightly to allow different conversion
filters for each property being merged. The avfiltergraph code creates a list
of conversion filters needed and inserts all of them at once.
Because a conversion filter might itself e.g. not support all formats,
it's possible that we need to go through the whole process of negotiating
formats multiple times, and keep adding conversion filters until all of them
are settled. Do this by simply jumping back to the beginning of the loop
to ensure that the `convertor_count` field remains contiguous.
FFmpeg currently handles alpha in a quasi-arbitrary way. Some filters/codecs
assume alpha is premultiplied, others assume it is independent. If there is
to be any hope for order in this chaos, we need to start by defining an enum
for the possible range of values.
Use ff_adts_header_parse_buf() wrapper to simplify as GetBitContext is
no longer needed (it was introduced for USAC).
Partially reverts 64bb91fd3b.
Signed-off-by: Nicolas Gaullier <nicolas.gaullier@cji.paris>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
d119ae2fd8 removed the loop-breaking condition
received_sigterm.
Thus, signals no longer gracefully shutdown ffmpeg.
Fixes: #10834
Signed-off-by: Patrick Wang <mail6543210@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
If we're invoked with range == UINT_MAX, we end up doing
"rnd() % (UINT_MAX + 1)", which is equal to "rnd() % 0". On
arm (on all platforms) and on MSVC i386, this ends up crashing
at runtime.
This fixes the crash.
In constant FPS mode, it's possible for there no be *no* input active at
the current timestamp, even though there would be active inputs later in
the timeline (nb_active > 0). This would previously hit the AVERROR_BUG case.
With this change, we can instead output an empty frame.
Instead of having a single s->status field to track the most recently
EOF'd stream, track the number of active streams directly and only query
the most recent status at the end.
The reason this worked before is because we implicitly relied on the
assumption that `ok` would always be true until all streams are EOF, but
because it's possible to have gaps in the timeline when mixing multiple
streams, this is not always the case in principle.
In practice, this fixes a bug where the filter would early-exit (EOF)
too soon, when any input reached EOF and there is a gap in the timeline.
When using libplacebo to composite multiple streams with complex PTS
values, it's possible for there to be a "hole" in the output stream; in
particular in constant FPS mode. This commit refactors output_frame() to
allow it to handle the case of there being no active input.
A --enable-* option should not have a condition where it's enabled by some
other mean, as is the case here if swscale is enabled.
Signed-off-by: James Almer <jamrial@gmail.com>
This patch adds format handling code for the new operations. This entails
fully decoding a format to standardized RGB, and the inverse.
Handling it this way means we can always guarantee that a conversion path
exists from A to B without having to explicitly cover logic for each path;
and choosing RGB instead of YUV as the intermediate (as was done in swscale
v1) is more flexible with regards to enabling further operations such as
primaries conversions, linear scaling, etc.
In the case of YUV->YUV transform, the redundant matrix multiplication will
be canceled out anyways.
Because of the lack of an external ABI on low-level kernels, we cannot
directly test internal functions. Instead, we construct a minimal op chain
consisting of a read, the op to be tested, and a write.
The bigger complication arises from the fact that the backend may generate
arbitrary internal state that needs to be passed back to the implementation,
which means we cannot directly call `func_ref` on the generated chain. To get
around this, always compile the op chain twice - once using the backend to be
tested, and once using the reference C backend.
The actual entry point may also just be a shared wrapper, so we need to
be very careful to run checkasm_check_func() on a pseudo-pointer that will
actually be unique for each combination of backend and active CPU flags.
This covers most 8-bit and 16-bit ops, and some 32-bit ops. It also covers all
floating point operations. While this is not yet 100% coverage, it's good
enough for the vast majority of formats out there.
Of special note is the packed shuffle fast path, which uses pshufb at vector
sizes up to AVX512.
Provides a generic fast path for any operation list that can be decomposed
into a series of memcpy and memset operations.
25% faster than the x86 backend for yuv444p -> yuva444p
33% faster than the x86 backend for gray -> yuvj444p
This will serve as a reference for the SIMD backends to come. That said,
with auto-vectorization enabled, the performance of this is not atrocious.
It easily beats the old C code and sometimes even the old SIMD.
In theory, we can dramatically speed it up by using GCC vectors instead of
arrays, but the performance gains from this are too dependent on exact GCC
versions and flags, so it practice it's not a substitute for a SIMD
implementation.
This can turn any compatible sequence of operations into a single packed
shuffle, including packed swizzling, grayscale->RGB conversion, endianness
swapping, RGB bit depth conversions, rgb24->rgb0 alpha clearing and more.
This handles the low-level execution of an op list, and integration into
the SwsGraph infrastructure. To handle frames with insufficient padding in
the stride (or a width smaller than one block size), we use a fallback loop
that pads the last column of pixels using `memcpy` into an appropriately
sized buffer.
This is responsible for taking a "naive" ops list and optimizing it
as much as possible. Also includes a small analyzer that generates component
metadata for use by the optimizer.
See docs/swscale-v2.txt for an in-depth introduction to the new approach.
This commit merely introduces the ops definitions and boilerplate functions.
The subsequent commits will flesh out the underlying implementation.
Give users and developers a way to opt in to the new format conversion code,
and more code from the swscale rewrite in general, even while development is
still ongoing.
We split the standard macro into its body (implementation) and declaration,
and use a macro argument in place of the raw `memcmp` call, with the major
difference that we now take the number of pixels to compare instead of the
number of bytes (to match the signature of float_near_ulp_array).
Sometimes, when measuring very small functions, rdtsc is not accurate enough
to get a reliable measurement. This increases the number of runs inside the
inner loop from 4 to 32, which should help a lot. Less important when using
the more precise linux-perf API, but still useful.
There should be no user-visible change since the number of runs is adjusted
to keep the total time spent measuring the same.
This behavior had no real justification and was just incredibly confusing,
since the in/out pointers passet to setup() did not match those passed to
run(), all for what is arguably an exception anyways (the palette setup).
This wrapping logic still considered any nonzero return from the ASM function
to be the overall result, but this is not true since the addition of
FF_ALPHA_TRANSPARENT.
Fix it by only early returning if FF_ALPHA_STRAIGHT is detected.
Fixes: 9b8b78a815
See-Also: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20301#issuecomment-4802
Extract Orientation and export it as a display matrix if present, and set the
frame's metadata with the remaining Exif entries.
Signed-off-by: James Almer <jamrial@gmail.com>
In hybrid_fragmented mode, the first moov at start should contain
mvex tag, since it's fmp4. When writing the moov at the second time,
it's not fmp4 any more, so mvex should be skipped.
Set CODECAPI_AVLowLatencyMode when AV_CODEC_FLAG_LOW_DELAY is enabled on
the AVCodecContext. AVLowLatencyMode can acheive lower latency encoding
that may not be accessible using eAVScenarioInfo options alone.
Signed-off-by: Cameron Gutman <aicommander@gmail.com>
Currently, the aacencdsp checkasm tests fails for many seeds,
if the C code has been built with x87 math. This happens because
the excess precision of x87 math can make it end up rounding
to a different integer, and the checkasm tests checks that the
output integers match exactly between C and assembly.
One such failing case is "tests/checkasm/checkasm --test=aacencdsp
41" when compiled with GCC. When compiled with Clang, the test
seed 21 produces a failure.
To avoid the issue, we need to limit the precision of intermediates
to their nominal float range, matching the assembly implementations.
This can be achieved when compiling with GCC, by just adding a single
cast.
To observe the effect of this cast, compile the following
snippet,
int cast(float a, float b) {
return (int)
#ifdef CAST
(float)
#endif
(a + b);
}
with "gcc -m32 -std=c17 -O2", with/without -DCAST. For x86_64
cases (without the "-m32"), the cast doesn't make any difference
on the generated code.
This cast would seem to not have any effect, as a binary expression
with float inputs also would have the type float.
However, if compiling with GCC with -fexcess-precision=standard,
the cast forces limiting the precision according to the language
standard here - according to the GCC docs [1]:
> When compiling C or C++, if -fexcess-precision=standard is
> specified then excess precision follows the rules specified in
> ISO C99 or C++; in particular, both casts and assignments cause
> values to be rounded to their semantic types (whereas -ffloat-store
> only affects assignments). This option is enabled by default for
> C or C++ if a strict conformance option such as -std=c99 or
> -std=c++17 is used.
Ffmpeg's configure scripts enables -std=c17 by default.
This only helps with GCC though - the cast doesn't make any
difference for Clang. (Although, upstream Clang seems to default
to SSE math, while Ubuntu provided Clang defaults to x87 math.)
Limiting the precision with Clang would require casting to volatile
float for both intermediates here - and that does have a code
generation effect on all architectures.
[1] https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
Using the built-in token seems to only partially work.
And in the latest Forgejo release the issue forcing the use of the
built-in token was fixed with:
https://codeberg.org/forgejo/forgejo/pulls/9025
Change the PCM table init to use compile-time #if instead of a runtime
`if (CONFIG_...)`. This makes sure code for disabled encoders never gets
built, without depending on compiler dead-code elimination.
Signed-off-by: YiboFang <fangyibo@xiaomi.com>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Add the max_frame_size option to support setting max frame size in
bytes. Max frame size is the maximum cap in the bitrate algorithm per
each encoded frame.
Signed-off-by: Tong Wu <wutong1208@outlook.com>
strftime returns 0 in case of an empty output (e.g. %p format string with some
locales), there is no way to distinguish this from a buffer-too-small error
condition. So we must use some heuristics to handle this case, and not consume
INT_MAX RAM and falsely report a truncated output.
Signed-off-by: Marton Balint <cus@passwd.hu>
The counters should be incremented for each new codec.
Catch-up to be in sync with codec_id.h again.
Signed-off-by: Nicolas Gaullier <nicolas.gaullier@cji.paris>
If this function returns an error after ff_sws_graph_add_pass() has been
called, and the pass->free callback is therefore already set up to free the
context, the graph will end up freed twice: once by the pass->free callback
(during ff_sws_graph_free()), and once before that by failure path of the
caller (e.g. add_legacy_sws_pass(), or init_legacy_subpass() itself for
cascaded contexts.)
The solution is to redefine the ownership of SwsGraph to pass clearly from
the caller of add_legacy_sws_pass() to init_legacy_subpass(), which can then
deal with appropriately freeing the context conditional on whether or not the
pass was already registered in the pass list.
Reported-by: 김영민 <kunshim@naver.com>
Signed-off-by: Niklas Haas <git@haasn.dev>
'decode_spectrum' reads 5 bits from bitstream to get
number of encoded subbands – so 31 means all 32
subbands are encoded. This value also is used to
determinate the number of used band in the hybrid
filterbank.
'subband_tab' array contains 33 values of MDCT spec
line positions started from 0 line and used to map
subband number in to the range of mdct lines.
Since the subband_num returned by decode_spectrum
actually is number – 1 and subband_tab started from 0
we need to add 1 to make num_bands calculation correct.
We're missing a call to av_exif_free here. We leak the internal heap-
allocated objects when &ifd goes out of scope.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
We're missing a call to av_exif_free here. We leak the internal heap-
allocated objects when &ifd goes out of scope.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
"glob_sequence" was deprecated since 2012. This also changes the default pattern
to "sequence", because "glob_sequence" was also the default.
Signed-off-by: Marton Balint <cus@passwd.hu>
simply whip->audio_first_seq + 1 for video could be
faster than generating a new one.
this patch will make whip be able to get the first
rtp seq easily and compute something sequence-based
(e.g. NACK and RTX handling)
Signed-off-by: Jack Lau <jacklau1222@qq.com>
The project introduced API breaking changes to some of their public functions,
and given the library is relatively new, just bump the minimum supported
version instead of adding ifdeffery to the source files.
Signed-off-by: James Almer <jamrial@gmail.com>
The min_packet_size/max_packet_size were hardcoded to 256KB for
performance. The new option makes it possible to configure a
smaller packet size on embedded devices with limited memory
resources.
Signed-off-by: yangsen5 <1441923451@qq.com>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
An EXIF IFD with 0 entries is legal, but does not contain metadata. We
should not attempt to allocate a struct with size zero in this case, as
this causes an allocation failure. Instead, we just leave the struct
empty.
Fixes: #20305.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
Currently there's platform-dependent behavior where if no endianness
is requested, it writes the buffers in native-endian. This breaks FATE
tests on big-endian architecture. This commit changes the default to
little-endian buffers upon writing.
Fixes: #20291.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
This makes the functions extensible, as future behavior change flags can be
introduced.
This is strictly speaking not an API break. Only if a user was setting
recursive to anything other than 1 it would now behave differently, but given
these functions have been in the tree for only a few days, the chances for that
are practically zero.
Signed-off-by: James Almer <jamrial@gmail.com>
use audio_ssrc + 1 for video_ssrc to avoid ssrc collision
See RFC 3550 Section 8,
The SSRC identifier carried in the RTP header and in various fields
of RTCP packets is a random 32 bit number that is required to be
globally unique within an RTP session.
simply + 1 also can get an unique ssrc, and faster than generating
new unique ssrc.
Signed-off-by: Jack Lau <jacklau1222@qq.com>
av_exif_* functions are part of libavcodec's public API since
ad77345a5d, so the EXIF objects can't
be disabled.
Fixes compilation with --disable-everything.
Otherwise, the user requested priority of packet side data will be ignored.
For this, move the relevant functions to decode.c, as they need access to an
AVCodecContext.
Signed-off-by: James Almer <jamrial@gmail.com>
Just add an extra x/y offset parameter pair to process_frame_obj(),
and store the size of the data to FTCH in a separate context member.
The only valid sizes for FTCH are 6 and 12, reject any other.
Finally, if a FOBJ uses codecs37 and above, enforce it to be subversion 2,
to use the simpler STOR/FTCH method.
Fixes BIGSLEEP-440183164/process_ftch.anim
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
AFAIK the original decoder uses the frame buffers in very strange ways
our implementation seems to mimic that and that results in the
bitstream input to point into a frame buffer while code then
parses that and potentially reallocates the frame buffer
leaving pointers hanging into dealllocated space
This simply uses a temporary buffer
Fixes: Writing into freed buffers
Fixes: BIGSLEEP-440183164/old_codec21.anim
Fixes: BIGSLEEP-440183164/old_codec4.anim
Found-by: Google Big Sleep
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This commit fixes some memory and security issues due to improper
sanitizing and checks.
Fixes:
- CID 1665100..1665107
Signed-off-by: Leo Izen <leo.izen@gmail.com>
Reportedy-by: James Almer <jamrial@gmail.com>
It's like cmd 2. Fixes the Rebel Assault 1 Intro Video "C1BLOCK.ANM"
crossfade from planet view to space view and subsequent wrong colors.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
It makes sense to treat the presence of a frame duration and the presence
of frame rate metadata identically - because both convey effectively the same
amount of information.
In f121d95 and fa110c3 respectively, this information was stripped by default,
originally to work-around bugs when changing the PTS information of a stream
being fed to some encoders. (See https://trac.ffmpeg.org/ticket/10886)
Later, commit 959b799c restored the ability to preserve the frame rate
medatata via the `strip_fps` option, but this option did not extend to also
include the frame duration.
This commit resolves the scenario by making `frame_rate` and `duration`
handled in a consistent manner, so that the frame rate information is
generally preserved unless explicitly stripped by the user.
While it does regress the exact invocation presented in the trac ticket unless
using `strip_fps=yes`, I consider this an acceptable trade-off, especially in
light of the fact that the `fps` filter also exists and is arguably the better
tool for the task at hand.
Many of these additions are in separate commits in one set, so in the
interest of clarity, the API changes are all documented in one commmit
here.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
This commit takes some of the sanitize code used by ff_exif_get_buffer
and exposes it as an ff_ API, so encoders who wish to modify the
sanitized IFD before calling av_exif_write can do so.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
libjxl provides the ability to get EXIF metadata from an ISO18181-2
JPEG XL container file. This commit enables reading it and attaching
it to an AVFrame using the new EXIF API.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
We don't need to print the tags here because they're added as dict
elements to AVFrame->metadata and are printed elsewhere with ffprobe
-show_frames.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
This commit will cause TIFF files to store their tags in the EXIF
struct so tags such as orientation can be transfered to other formats
(such as PNG) in a way that doesn't corrupt the IFD.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
Add an API function to retrieve am AVExifEntry struct with a given TIFF
tag ID from the AVExifMetadata ifd struct.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
Add an API function that allows popping an exif entry out of the struct
entirely rather than requiring it be replaced with a default value.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
Takes existing code that makes display matricies work with EXIF
orientation tags and expose the conversion as a public API.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
Add support to write EXIF profiles using the new EXIF framework, namely
ff_exif_get_buffer, and writing them into eXIf chunks.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
Add support to parse eXIf chunks using the new EXIF framework, as well
as EXIF profiles in zTXt and tEXt with Raw profile type exif.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
This function is no longer called and its functionality
can be accessed using the new API in exif.h as well as the
various internal attach functions in exif_internal.h.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
Add FF_API_OLD_EXIF deprecation guard based on lavc version < 62 in
order to allow removing the avpriv_ next major version bump.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
This commit switches avidec to use the new metadata API exposed by
the previous commit instead of relying on the existing avipriv_ function
exposed by lavc.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
This commit adds a structure to contain parsed EXIF metadata, as well
as code to read and write that struct from/to binary EXIF buffers. Some
internal functions have been moved to exif_internal.h. Code to read
from this new struct and write to an AVDictionary **dict has been added
as well in order to preserve interoperability with existing callers.
The only codec changes so far as of this commit are to call these
interop functions, but in future commits there will be codec changes to
use the new parsing routines instead.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
This commit adds support for the additional side data type
AV_FRAME_DATA_EXIF, which contains a buffer of an EXIF metadata
payload.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
This makes the enum TiffTypes public by moving it to the public header
exif.h and renaming it to AVTiffDataType, as well as adding an AV_
prefix in front of each of the entry names. This allows callers to use
enum AVTiffDataType without pulling in all of tiff_common.h, as that
header is not public.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
Extensions in AAC USAC can be stored across multiple frames (mainly to keep CBR compliance).
This means that we need to reallocate a buffer when new data is received, accumulate the bitstream data,
and so on until the end of extension flag is signalled and the extension can be decoded.
This is made more complicated by the way in which the AAC channel layout switching is performed.
After decades of evolution, our AAC decoder evolved to double-buffer its entire configuration.
All changes are buffered, verified, and applied, on a per-frame basis if required, in often
random order.
Since we allocate the extension data on heap, this means that if configuration is applied,
in order to avoid double-freeing, we have to keep track of what we've allocated.
It should be noted that extensions which are spread in multiple frames are generally rare,
so an optimization to introduce av_refstruct_realloc() wouldn't generally be useful across the codebase.
Therefore, a copy is good enough for now.
Thanks to Michael Niedermayer for additional fixing.
Fixes: double free
Fixes: 393523547/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_AAC_LATM_fuzzer-6740617236905984
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
The previous change 26a2a76346 broke Vulkan decoding because the lf and
segmentation values contained within VP9RawFrameHeader can no longer be
updated.
Read the propogated values from the CBS instead.
This fixes the following warning:
libavutil/riscv/cpu.c:85:9: warning: label at end of compound statement is a C23 extension [-Wc23-extensions]
Signed-off-by: Levi Zim <rsworktech@outlook.com>
It can be useful to know if the alpha plane consists of fully opaque
pixels or not, in which case it can e.g. safely be stripped.
This only requires a very minor modification to the AVX2 routines, adding
an extra AND on the read alpha value with the reference alpha value, and a
single extra cheap test per line.
detect_alpha_8_full_c: 2849.1 ( 1.00x)
detect_alpha_8_full_avx2: 260.3 (10.95x)
detect_alpha_8_full_avx512icl: 130.2 (21.87x)
detect_alpha_8_limited_c: 8349.2 ( 1.00x)
detect_alpha_8_limited_avx2: 756.6 (11.04x)
detect_alpha_8_limited_avx512icl: 364.2 (22.93x)
detect_alpha_16_full_c: 1652.8 ( 1.00x)
detect_alpha_16_full_avx2: 236.5 ( 6.99x)
detect_alpha_16_full_avx512icl: 134.6 (12.28x)
detect_alpha_16_limited_c: 5263.1 ( 1.00x)
detect_alpha_16_limited_avx2: 797.4 ( 6.60x)
detect_alpha_16_limited_avx512icl: 400.3 (13.15x)
This safety margin was motivated by the fact that vf_premultiply sometimes
produces such illegally high values, but this has since been fixed by
603334a043, so there's no more reason to have this safety margin, at
least for our own code. (Of course, other sources may also produce such
broken files, but we shouldn't work around that - garbage in, garbage out.)
See-Also: 603334a043
Basically cosmetic.
I want to expand this to detect more than a single property about the alpha
channel at the same time; so we first need a way for this function to
return a more complex result.
Move the enum AlphaMode to the header and formally generalize the return
signature a bit to allow returning more than just one value.
instead of only AV-specific options. The previous code assumed that any
option not defining the codec in an `ffpreset` file is an AVOption. This
for example prevented the use of options defined in `OptionDef[]`, like
`-pix_fmt`, as part of preset files, requiring users to type these out
every time.
Closes: #1530
Signed-off-by: Andreas Hartmann <hartan@7x.de>
The DOS/Windows decoder precomputes a table of linear offsets of
all motion vectors given the current image width.
For larger widths (>=762), the pairs starting at indices 1 and 254
of motion_vectors[] will overflow the int16_t, changing the sign.
Playing back the 800x600 "jonesopn_8.snm" video of "Indiana Jones and
the Infernal Machine" reveals a lot of artifacts and a lot of
"Ignoring invalid motion vector (149, -41)->(136, 0), block size = 8"
messages, hinting at the wrong direction of the motion vectors.
Fix this by doing the calculation that the DOS/Windows players do,
let the value overflow and reextract the "new" mvec x/y components.
avio_seek() never supported SEEK_END and returned AVERROR(EINVAL) when
specified, so the later check "(whence != SEEK_END || force)" was always true.
This also means that AVSEEK_FORCE had no effect since 7a6fe01f99, that is 15
years ago. Rather than changing behaviour, let's document instead that the flag
has no effect.
Also fixed other commit 41ed7ab45f which confused ORing / passing AVSEEK_FORCE
and AVSEEK_SIZE in the docs.
Signed-off-by: Valerii Zapodovnikov <val.zapod.vz@gmail.com>
The setup code fow w,h,left,top is complex, the code using it also falls in
at least 2 different classes, one using left/top the other not.
To ensure no out of array access happens we add this clear check.
Fixes: out of array access
Fixes: 439261995/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_SANM_fuzzer-5383455572819968
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The array is only ever needed for streams where each sample entry may have a
different value. Given that for non animated HEIF there's a single value that
applies to the image, use the field defined for that.
Fixes: NULL pointer dereference
Fixes: 437528618/clusterfuzz-testcase-minimized-ffmpeg_dem_MOV_fuzzer-6537287645331456
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: James Almer <jamrial@gmail.com>
Add myself to JPEG XL and PNG decoders/encoders/parsers/etc. as a
suggested reviewer. I'm the maintainer of the JXL files and I know the
PNG code very well.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
This patch aims to simplify the dtls handshake process
since dtls handshake use force block mode
We can just use the return code instead of DTLSState enum
Signed-off-by: Jack Lau <jacklau1222@qq.com>
pu_info is read uninitialized on damaged input and at that point the following codepath is dependant
on the uninitialized data. In one of these pathes out of array accesses happen.
None of this is replicatable
Less uninitialized data also should result in more reproducable reports
Fixes: Use of uninitialized memory
Fixes: 418335931/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_RV60_fuzzer-5103986067963904
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
It was unnecessary convoluted, remove not needed memory allocations,
snprintf.
Also fixes posibility to call snprinft with NULL as %s input.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Commit f256487cd8 bumped requirement to
1.1.0 for OPENSSL_init_ssl.
Bump this again to 1.1.1, because it was an LTS version. Although it has
no mainline support anymore, it still has paid/premium support. 1.1.0 has
no support at all.
Motivated for use of BIO_read_ex() for next commits.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
in pack mode the array is passed into decode_plane() without being initialized or used
Fixes: use of uninitialized memory
Fixes: 438780119/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_UTVIDEO_DEC_fuzzer-5464037027807232
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
When dimensions differ from context, those were updated using
ff_set_dimensions, however this overwrote the aligned coded_width and
coded_height that were set before, leading to a buffer overflow when
writing the frame data.
Fixes: OssFuzz 438771336
Fixes: Heap-buffer-overflow
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Marvin Scholz <epirat07@gmail.com>
Reviewed-by: Marvin Scholz <epirat07@gmail.com>
One of the design goals of this macro is the ability to support static
inline predicate functions. Which is emitting warning in clang that a
function may have side-effects and the condition will be ignored. MSVC
doesn't emit warning, but also ignore predicate in such cases.
Instead of using assume builtins, implement it using unreachable. Which
solves this case for MSVC and Clang.
This reverts ea56fe60ac, but also extends
it to MSVC, which is affected in the similar way.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Fixes: signed integer overflow: 10 * 1952737655 cannot be represented in type 'int'
Fixes: PoC_avi_demux
Found-by: 2ourc3 (Salim LARGO)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
x86/vc1dsp_init.c already includes asm.h before, so this is mostly
cosmetic change and only to clean checkheaders warning.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
cbs.mak is meant to contain tests strictly for the CBS framework, not for any
bsf that happens to use it under the hood.
Signed-off-by: James Almer <jamrial@gmail.com>
On Windows, static libraries are typically named with a .lib extension.
An exception to this is MinGW targets, which are treated as a distinct
target-os.
Using Windows-style naming allows `clang` to be used as the linker
driver, instead of invoking link or lld-link directly. The latter
approach requires manually specifying standard libraries when compiling
with `clang` rather than `clang-cl`, and manually specifying standard
libraries may be error-prone or incomplete.
Using Windows-style naming allows Clang to be used as the linker driver,
instead of invoking link or lld-link directly. The latter approach
requires manually specifying standard libraries, which may be
error-prone or incomplete.
This change also improves support for LTO and sanitizer builds, where
it's significantly easier to let the compiler driver manage the
necessary linker flags.
It fixes issues where Clang is asked to link `-lavcodec`, which gets
passed to the linker as avcodec.lib, resulting in an error like:
lld-link: error: could not open 'avcodec.lib': no such file or directory
This happens because `libavcodec.a` was unexpectedly generated, not
`avcodec.lib` expected by tooling.
Additionally, using `clang` (not `clang-cl`) is simplified, as it does
not use autolinking like MSVC/clang-cl does. Now `--ld=clang` can be
used to add all the required libraries. Previously, building with
`clang` was only possible by using `--ld=lld-link` and manually
specifying system dependencies in `extra-ldflags`.
Note that those changes doesn't affect mingw build. MSVC builds will
produce .lib static libraries now, but the linking process itself is not
affected, because filenames are passed directly.
To summarize in Windows non-mingw builds:
Static builds now produce `lib/avcodec.lib` instead of `lib/libavcodec.a`.
Shared builds remain unchanged, producing `bin/avcodec.lib` together with
`bin/avcodec-62.dll`.
This also removes setting LD_LIB from Win32/64 target as there is one
type of .lib in practice. We cannot build both shared and static at the
same time as noted by the next line.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Remove redundant av_freep() to avoid double free since task will be freed in dnn_free_model_tf() after the success of ff_queue_push_back().
Fixes: af052f9066 ("lavfi/dnn: fix mem leak in TF backend error handle")
Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com>
The code writing in the buffer has a wide range of error checks
which simply leave it partly uninitialized.
Initializing it on allocation ensures no sensitive data leaks and that
bugs are more reliably reproduceable
Fixes: use of uninitialized memory
Fixes: 435225510/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DXV_DEC_fuzzer-4521918634196992
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Dithering relies on a 8 line dithering table and the code always uses it from
the beginning. So in order to make dithering independent from height of the
slices used we must enforce a 8 line alignment.
Fixes issue #20071.
Signed-off-by: Marton Balint <cus@passwd.hu>
You can technically feed the decoder a single packet with more than one AU in
it, even if a demuxer is unlikely to ever export such a thing.
Given the parser does no packetization to ensure we get a single AU, in order
to not discard any frame beyond the first, the decoupled input/output API
should be used.
Signed-off-by: James Almer <jamrial@gmail.com>
If a single fragment contains more than one frame unit, the tile information stored
in the private context will only correspond to one of them.
Fixes: crash (out of array access)
Fixes: 435489659/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_APV_fuzzer-6194885205229568
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: James Almer <jamrial@gmail.com>
Fixes: out of array access
Fixes: BIGSLEEP-436511754/testcase.exr
Found-by: Google Big Sleep
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
If we use the private BlackDetectContext for logging, the filter ID does
not get printed alongside the filter name. Using the parent AVFilterContext
makes this consistent with other filters.
When called from avformat_find_stream_info(), we are only interested in
decoding SOF fields.
This patch makes the decoder skip all other fields (including SOI, SOS,
and EOI). This also prevents the decoder from incorrectly printing the
warning "EOI missing, emulating" (which is the case since 2ae82788).
Fixes: #20119
pl_frame_mix_current() will return NULL if all frames are in the future,
but when libplacebo is using a frame mixer with a radius greater than 1,
future frames are expected to already be partially renderered. Instead, use
pl_frame_mix_nearest(), which is guaranteed to give us a valid frame for any
nonempty frame mix.
Fixes: 3091bca3ed
Previously, these tests failed when running on Windows, if the
system is configured with a time zone east of Greenwich, i.e.
with a positive GMT offset.
The muxer converts the creation_date given by the user using
av_parse_time to unix time, as a time_t. The creation_date is
interpreted as a local time, i.e. according to the current time
zone. (This time_t value is then converted back to a broken out
local time form with localtime_r.)
The given reference date/time, "1970-01-01T00:00:00", is the
origin point for unix time, corresponding to time_t zero. However
when interpreted as local time, this doesn't map to exactly zero.
Time zones east of Greenwich reached this time a number of hours
before the point of zero time_t - so the corresponding time_t
value essentially is minus the GMT offset, in seconds.
Windows mktime returns an error, returning (time_t)-1, when given
such a "struct tm", while e.g. glibc mktime happily returns a
negative time_t. av_parse_time doesn't check the return value of
mktime for potential errors.
This is observable with the following test snippet:
struct tm tm = { 0 };
tm.tm_year = 70;
tm.tm_isdst = -1;
tm.tm_mday = 1;
tm.tm_hour = 0;
time_t t = mktime(&tm);
printf("%d-%02d-%02d %02d:%02d:%02d\n", tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday, tm.tm_hour, tm.tm_min, tm.tm_sec);
printf("t %d\n", (int)t);
By varying the value of tm_hour and the system time zone, one
can observe that Windows mktime returns -1 for all time_t values
that would have been negative.
This range limit is also documented by Microsoft in detail at
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/mktime-mktime32-mktime64.
To avoid the issue, pick a different, arbitrary reference time,
which should have a nonnegative time_t for all time zones.
clr is passing into decode_run_p() its not used when not set
but this possibly triggers msan (it doesnt locally)
Fixes?: use of uninintialized memory
Fixes?: 436997807/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_SCPR_fuzzer-6253316466606080
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This code always ignored the user-provided enc_ctx->chroma_sample_location
in favor of the location tagged on the frame. This leads to a very (IMHO)
unexpected outcome where -chroma_sample_location works differently from the
related options like -colorspace and -color_range, the latter of which
override the frame properties as a consequence of being configured on the
filter graph output.
The discrepancy comes from the fact that the chroma sample location does not
itself participate in filter graph negotiation.
Solve the situation by only overriding the enc_ctx option if it was left
unspecified by the user, and otherwise printing a warning if the requested
parameter does not match the frame parameter.
heif_items are reused and to avoid leaking memory or using stale name,
clear it first.
Fixes: 432505829/clusterfuzz-testcase-minimized-ffmpeg_dem_MOV_fuzzer-6654363487764480
Found-by: OSS-Fuzz
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
When this function returns, the fragment is never reset, so
current_frag->data_ref is never unref-ed, which ends up leaking it.
We call ff_cbs_fragment_reset to release the reference on its buffer.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
opt_codec() is mostly copied over from ffplay.c.
This is useful when different decoders have different capabilities,
or to test a specific decoder.
Signed-off-by: Maryla Ustarroz-Calonge <maryla@google.com>
Accept up to 13 ULP difference.
This fixes running "checkasm --test=ac3dsp 3044836819" on ARM.
Depending on how the SIMD implementations aggregate numbers,
larger/smaller values might not end up accumulated in exactly
the same way; the current NEON implementation for ARM aggregates
into vectors of 2 elements. If it would aggregate into vectors
of 4 elements instead, like the AArch64 version does, this particular
case would end up with a smaller difference.
AV_CLASS_CATEGORY_HWDEVICE was introduced in the past but
no corresponding colors were added for non-Win32 platforms.
Fixes bf1579c ("avutil/log,hwcontext: Add AV_CLASS_CATEGORY_HWDEVICE")
Signed-off-by: nyanmisaka <nst799610810@gmail.com>
they use tests/ref/fate/sub-mcc-remux as input,
so prefix them with $(SRC_PATH) so building works
when not in the source directory.
Fixes: #20183
Reported-by: Sean McGovern <gseanmcg@gmail.com>
Signed-off-by: Jacob Lifshay <programmerjake@gmail.com>
Don't overwrite the bitstream values when updating the top-level loop
filter and segmentation state, instead do the update separately at the
end of the frame parsing.
This also reverts the change to the passthrough tests which made them
have output not matching the input.
The SRTP profile string "SRTP_AES128_CM_SHA1_80" is only used when
use_srtp is enabled. Move its declaration inside the "if (s->use_srtp)"
block to limit scope
Signed-off-by: Jack Lau <jacklau1222@qq.com>
Pure cleanup, no functional changes
Unify local pointer names to `TLSContext *c` and `TLSShared *s` to
reduce confusion from inconsistent names (e.g. p, ctx, s, c)
Signed-off-by: Jack Lau <jacklau1222@qq.com>
Patches should be submitted to the [ffmpeg-devel mailing list](https://ffmpeg.org/mailman/listinfo/ffmpeg-devel) using `git format-patch` or `git send-email`. Github pull requests should be avoided because they are not part of our review process and **will be ignored**.
Patches should be submitted to [Forgejo](https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls) or the [ffmpeg-devel mailing list](https://ffmpeg.org/mailman/listinfo/ffmpeg-devel) using `git format-patch` or `git send-email`. Github pull requests should be avoided because they are not part of our review process and **will be ignored**.
See [https://ffmpeg.org/developer.html#Contributing](https://ffmpeg.org/developer.html#Contributing) for more information.
@@ -352,6 +352,17 @@ means auto (seekable for normal files, non-seekable for named pipes).
Many demuxers handle seekable and non-seekable resources differently,
overriding this might speed up opening certain files at the cost of losing some
features (e.g. accurate seeking).
@item pkt_size
Set the maximum packet size used for file I/O. A smaller value may reduce
memory usage. A higher value may increase throughput especially with networked
filesystems.
For reading, if explicitly set, it overrides the default internal buffer size
(32 KB) and limits the maximum amount of data read per operation.
For writing, this sets the size of each write operation. The default is 256 KB
for regular files, 32 KB otherwise.
@end table
@section ftp
@@ -406,26 +417,6 @@ Gophers protocol.
The Gopher protocol with TLS encapsulation.
@section hls
Read Apple HTTP Live Streaming compliant segmented stream as
a uniform one. The M3U8 playlists describing the segments can be
remote HTTP resources or local files, accessed using the standard
file protocol.
The nested protocol is declared by specifying
"+@var{proto}" after the hls URI scheme name, where @var{proto}
is either "file" or "http".
@example
hls+http://host/path/to/remote/resource.m3u8
hls+file://path/to/local/resource.m3u8
@end example
Using this protocol is discouraged - the hls demuxer should work
just as well (if not, please report the issues) and is more complete.
To use the hls demuxer instead, simply use the direct URLs to the
m3u8 files.
@section http
HTTP (Hyper Text Transfer Protocol).
@@ -462,6 +453,28 @@ Set the Referer header. Include 'Referer: URL' header in HTTP request.
@item multiple_requests
Use persistent connections if set to 1, default is 0.
@item request_size
Limit the size of requests made. This is useful for some pathological servers
that throttle unbounded range requests, as well as when expecting to seek
frequently. Disabled (set to 0) by default.
Note that if enabling this option, it's strongly recommended to also enable
the @option{multiple_requests} option, as well as setting
@option{short_seek_size} to the same value or higher. Doing so allows FFmpeg
to reuse a single HTTP connection wherever possible.
@item initial_request_size
Limit the size of initial requests. Similar to @code{request_size}, but only
used during initial format parsing. Useful for formats like MXF or MOV that
require frequent seeks during header parsing. Lasts until the demuxer makes a
read request larger than this size (without a seek in between), after which
the implementation will continue using requests as usual. Disabled (set to 0)
by default.
Note that if enabling this option, it's strongly recommended to also enable
the @option{multiple_requests} option, as well as setting
@option{short_seek_size} to the same value or higher.
@item post_data
Set custom HTTP post data.
@@ -986,6 +999,11 @@ Set TCP_NODELAY to disable Nagle's algorithm. Default value is 0.
@emph{Remark: Writing to the socket is currently not optimized to minimize system calls and reduces the efficiency / effect of TCP_NODELAY.}
@item tcp_keepalive=@var{1|0}
Enable the TCP keepalive mechanism to detect dead peers and help maintain long-lived idle connections. Default value is 0.
Only the basic keepalive option (SO_KEEPALIVE) can be enabled or disabled. Platform-specific tuning parameters such as TCP_KEEPIDLE, TCP_KEEPINTVL, or TCP_KEEPCNT are not configurable and will use the operating system's default values.
@end table
For example to read with @command{ffplay} a multimedia resource named
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.