avformat/hlsenc: fix segment duration with mixed stream time bases

When audio and video streams have different time bases (e.g. video at
1/90000 and audio at 1/48000), vs->start_pts was stored as a raw PTS
from whichever stream's packet arrived first. The segment split
comparison then subtracted this value from the current packet's PTS
without accounting for the time base difference, producing incorrect
elapsed time calculations.

This caused segments to be split at wrong points — either too
frequently (on every keyframe) or not at all, depending on the
relative magnitudes of the time bases.

Fix by normalizing vs->start_pts to AV_TIME_BASE_Q at the point of
assignment and converting pkt->pts to the same base before each
comparison against it: the check that lets an earlier video packet
replace an audio-derived start, and the segment split test. This
ensures both decisions are unit-consistent regardless of which
stream's packet is being evaluated.

The bug is most easily triggered by HLS muxing with video passthrough
and audio transcode, where the video retains its container time base
while the audio encoder outputs in its native time base.

Signed-off-by: Ben Kepner <u6bkep@gmail.com>
Author:    Ben Kepner
Date:      2026-04-05 16:53:06 -04:00
Committer: stevenliu
Commit:    a327bc0561
Parent:    566ad7869e
Diffstat:  +9 -6
@@ -2461,13 +2461,16 @@ static int hls_write_packet(AVFormatContext *s, AVPacket *pkt)
     }
     if (vs->start_pts == AV_NOPTS_VALUE) {
-        vs->start_pts = pkt->pts;
+        vs->start_pts = av_rescale_q(pkt->pts, st->time_base, AV_TIME_BASE_Q);
         if (st->codecpar->codec_type == AVMEDIA_TYPE_AUDIO)
             vs->start_pts_from_audio = 1;
     }
-    if (vs->start_pts_from_audio && st->codecpar->codec_type == AVMEDIA_TYPE_VIDEO && vs->start_pts > pkt->pts) {
-        vs->start_pts = pkt->pts;
-        vs->start_pts_from_audio = 0;
+    if (vs->start_pts_from_audio && st->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) {
+        int64_t video_start = av_rescale_q(pkt->pts, st->time_base, AV_TIME_BASE_Q);
+        if (vs->start_pts > video_start) {
+            vs->start_pts = video_start;
+            vs->start_pts_from_audio = 0;
+        }
     }
     if (vs->has_video) {
@@ -2498,8 +2501,8 @@ static int hls_write_packet(AVFormatContext *s, AVPacket *pkt)
     }
     can_split = can_split && (pkt->pts - vs->end_pts > 0);
-    if (vs->packets_written && can_split && av_compare_ts(pkt->pts - vs->start_pts, st->time_base,
-                                                          end_pts, AV_TIME_BASE_Q) >= 0) {
+    if (vs->packets_written && can_split && (av_rescale_q(pkt->pts, st->time_base, AV_TIME_BASE_Q) - vs->start_pts
+                                             >= end_pts)) {
         int64_t new_start_pos;
         int byterange_mode = (hls->flags & HLS_SINGLE_FILE) || (hls->max_seg_size > 0);
         double cur_duration;