Crazy Audio Offsets?
jewser
joshua at b...
Mon, 27 May 2002 01:04:43 -0000
--- In ExtractStream@y..., "anddmx2001" <anddmx2001@y...> wrote:
> When I usually do an extract, I usually extract as 1 file. This time
> I decided to extract as multiple, smaller files. Now, when I Vsplit
> these files, this is what I got....
>
> Part1- perfect
> part2- 6ms early
> part3- 9ms late
> part4- 1ms early
> part5- 17ms early
> part6- 6ms late
> part7- 17ms late
> part8- 9ms early
> part9- 4ms early
> part10- 15ms early
> part11- 17ms late
>
> Is it normal for the offset to be different from .ty to .ty? I have
> never extracted this way, so please ignore my ignorance. :)
Yeah. That is normal. Roger was pretty darn close. It works like
this...
First, ignore the TiVo part for a moment. When an MPEG-2 packetized
elementary stream (PES) is made, a timestamp is attached to each
"block" of real data. For video, each block is a single PICTURE
(starting with a 00 00 01 00 header). Their timestamps increase
fairly regularly from frame to frame, which makes sense, as a TiVo
stream should be 29.97 fps. For an audio stream, the blocks are just
regularly spaced chunks. Their spacing stays consistent through the
length of a single encoding but can change from encoding to encoding.
I have seen 32ms, 36ms, and 48ms in my testing and development.
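The timestamps count ticks of a 90 kHz clock, so the regular spacing described above can be put into numbers. Below is a minimal Python sketch (the function names are hypothetical, not taken from VSplit) that turns a frame rate or an audio block length into timestamp ticks and finds the 00 00 01 00 picture headers in a raw video stream. At exactly 30000/1001 fps, consecutive pictures are 3003 ticks apart.

```python
from fractions import Fraction
from typing import List

PTS_CLOCK_HZ = 90_000                 # MPEG-2 timestamps count a 90 kHz clock
NTSC_FPS = Fraction(30000, 1001)      # "29.97" fps, exactly
PICTURE_START_CODE = b"\x00\x00\x01\x00"


def ticks_per_video_frame(fps: Fraction = NTSC_FPS) -> Fraction:
    """Timestamp increase from one picture to the next."""
    return PTS_CLOCK_HZ / fps


def ticks_per_audio_block(block_ms: int) -> Fraction:
    """Timestamp increase between fixed-length audio blocks."""
    return Fraction(PTS_CLOCK_HZ * block_ms, 1000)


def ticks_to_ms(ticks) -> float:
    """Convert a timestamp difference back to milliseconds."""
    return float(Fraction(ticks) * 1000 / PTS_CLOCK_HZ)


def find_picture_starts(es: bytes) -> List[int]:
    """Byte offsets of every picture header (00 00 01 00) in a video ES."""
    offsets = []
    pos = es.find(PICTURE_START_CODE)
    while pos != -1:
        offsets.append(pos)
        pos = es.find(PICTURE_START_CODE, pos + len(PICTURE_START_CODE))
    return offsets


if __name__ == "__main__":
    print("video frame:", ticks_per_video_frame(), "ticks")  # 3003 ticks
    for ms in (32, 36, 48):
        print(f"{ms}ms audio block:", ticks_per_audio_block(ms), "ticks")
```

Exact fractions are used so that the 29.97 fps case comes out as a whole number of ticks instead of a rounded float.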
These 2 streams, technically Elementary Streams (ES), are "smashed"
together in a multiplexed arrangement. In the simplest lay terms, "we
get a bunch of video chunks, then a bunch of audio chunks, then some
more video, and so on..." What VSplit currently does is take the
"busted" multiplexed form from the TiVo's TyStream and strip out the
multiplexing headers and information, creating 2 standard elementary
streams. These can then be used by a wide variety of tools.
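The demultiplexing step can be sketched the same way. This is not VSplit's code and it does not parse TiVo headers; it assumes the packets have already been pulled out of the multiplex as hypothetical (stream_id, payload) pairs, and it uses the standard MPEG-2 stream_id ranges (0xE0-0xEF for video, 0xC0-0xDF for MPEG audio) to decide which elementary stream each payload belongs to.

```python
from typing import Iterable, Tuple

# MPEG-2 PES stream_id ranges:
#   0xC0-0xDF  MPEG audio streams
#   0xE0-0xEF  MPEG video streams


def is_video(stream_id: int) -> bool:
    return 0xE0 <= stream_id <= 0xEF


def is_audio(stream_id: int) -> bool:
    return 0xC0 <= stream_id <= 0xDF


def demux(packets: Iterable[Tuple[int, bytes]]) -> Tuple[bytes, bytes]:
    """Split (stream_id, payload) packets into one video ES and one audio ES.

    Assumes the multiplex headers were already removed, so each payload
    is pure elementary-stream data. Anything else (padding, private
    streams, ...) is dropped.
    """
    video = bytearray()
    audio = bytearray()
    for stream_id, payload in packets:
        if is_video(stream_id):
            video += payload
        elif is_audio(stream_id):
            audio += payload
    return bytes(video), bytes(audio)
```

For example, `demux([(0xE0, b"V1"), (0xC0, b"A1"), (0xE0, b"V2"), (0xBE, b"pad")])` returns `(b"V1V2", b"A1")`: the two streams come out in their original order with the interleaving and the padding packet gone.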
Now, the TiVo TyStream file is another layer on top of the "busted"
multiplexed streams. It is almost completely independent of the actual
MPEG-2 stream. This is a broad generalization, but it is reasonably
accurate. A few of its headers will line up with the start of an audio
block or video block (again, either a picture or a 24ms segment of
audio). The rest are just... something the TiVo itself uses.
Now, all of that was leading up to an answer to your question... Each
block, whether a picture or an audio frame, is potentially a different
size. Sometimes this is due to the TiVo's internal mechanisms, but more
often it is due to the nature of encoding: a frame that changes "A LOT"
takes more data to encode than a frame that is almost identical to a
near neighbor. So a regular split into 128k chunks will sometimes get
more and sometimes fewer packets of each type. As a result, the parts
rarely, if ever, line up consistently. But if it is all put together
into 1 large file and processed cleanly, it will line up from start to
end.
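To see why each part gets its own offset, here is a toy simulation in Python. Everything in it is made up for illustration, not taken from VSplit or from a real TyStream: picture sizes are random, audio blocks are assumed to be 24ms and 576 bytes each, the two streams are interleaved simply by timestamp, and the file is cut into 128 KiB pieces with no regard for frame boundaries. For every piece it compares the timestamp of the first audio block and the first picture that start inside it.

```python
import random
from typing import Dict, List, Optional, Tuple

PTS_CLOCK_HZ = 90_000
VIDEO_TICKS = 3003   # 90 000 / (30000/1001): one picture at 29.97 fps
AUDIO_TICKS = 2160   # one 24ms audio block (assumed block length)
AUDIO_BYTES = 576    # assumed constant size of one audio block

Unit = Tuple[str, int, int]  # (kind, size in bytes, timestamp in ticks)


def build_mux(n_frames: int, seed: int = 1) -> List[Unit]:
    """Toy multiplex: random-size pictures interleaved with audio blocks
    in timestamp order. Not a real TiVo layout."""
    rng = random.Random(seed)
    units: List[Unit] = []
    for i in range(n_frames):
        # A picture that changes "A LOT" costs more bytes than a near-repeat.
        units.append(("video", rng.randint(5_000, 60_000), i * VIDEO_TICKS))
    for j in range(n_frames * VIDEO_TICKS // AUDIO_TICKS):
        units.append(("audio", AUDIO_BYTES, j * AUDIO_TICKS))
    units.sort(key=lambda u: u[2])  # interleave by timestamp
    return units


def offsets_per_chunk(units: List[Unit],
                      chunk_size: int = 128 * 1024) -> List[Optional[float]]:
    """Cut the stream into fixed-size chunks and, for each chunk, return
    (first audio timestamp - first video timestamp) in ms, looking only at
    units whose first byte lies in that chunk. None if a type is missing."""
    first: Dict[Tuple[int, str], int] = {}
    pos = 0
    for kind, size, pts in units:
        # Units arrive in stream order, so setdefault keeps the first one.
        first.setdefault((pos // chunk_size, kind), pts)
        pos += size
    n_chunks = (pos + chunk_size - 1) // chunk_size
    offsets: List[Optional[float]] = []
    for idx in range(n_chunks):
        v = first.get((idx, "video"))
        a = first.get((idx, "audio"))
        offsets.append(None if v is None or a is None
                       else (a - v) * 1000 / PTS_CLOCK_HZ)
    return offsets


if __name__ == "__main__":
    for part, off in enumerate(offsets_per_chunk(build_mux(300)), start=1):
        print(f"part{part}:", "n/a" if off is None else f"{off:+.1f} ms")
```

Run as a script, it prints one offset per part, and the values change from part to part because each 128 KiB boundary lands at a different spot in the interleave. Passing a `chunk_size` larger than the whole file gives a single offset of 0 ms, which is the "1 large file" case.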
All of the above comes from what I figured out in writing VSplit.
--jdiner