Crazy Audio Offsets?

jewser joshua at b...
Mon, 27 May 2002 01:04:43 -0000


--- In ExtractStream@y..., "anddmx2001" <anddmx2001@y...> wrote:
> When I usually do an extract, I usually extract as 1 file. This time 
> I decided to extract as multiple, smaller files. Now, when I Vsplit 
> these files, this is what I got.... 
> 
> Part1- perfect
> part2- 6ms early
> part3- 9ms late
> part4- 1ms early
> part5- 17ms early
> part6- 6ms late
> part7- 17ms late
> part8- 9ms early
> part9- 4ms early
> part10- 15ms early
> part11- 17ms late
> 
> Is it normal for the offset to be different from .ty to .ty? I have 
> never extracted this way, so please ignore my ignorance. :)

Yeah. That is normal. Roger was pretty darn close. It works like 
this...

First, ignore the tivo part for a moment. When an mpeg-2 packetized 
elementary stream (PES) is made, there is a timestamp attached to 
each "block" of real data. For video, each block is a single 
PICTURE (with a 00 00 01 00 header). The timestamps increase fairly 
regularly from frame to frame, which makes sense, as it should be 
29.97 fps for a tivo stream. For an audio stream these blocks are 
just regularly spaced chunks. The spacing remains consistent through 
the length of a single encoding but can change from encoding to 
encoding. I have seen 32ms, 36ms, and 48ms in my testing and 
development.
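
To make the timestamp idea concrete, here is a minimal sketch in Python. 
It assumes a standard MPEG-2 PES header, where the presentation timestamp 
(PTS) is a 33-bit count of a 90 kHz clock packed into 5 bytes; the tivo's 
own form may differ, and this is not necessarily how VSplit reads it. The 
function names are made up for illustration.

```python
from fractions import Fraction

PTS_CLOCK_HZ = 90_000  # standard MPEG PTS values count ticks of a 90 kHz clock


def decode_pts(field: bytes) -> int:
    """Decode the 33-bit PTS from the 5-byte field of a standard PES header."""
    if len(field) != 5:
        raise ValueError("PTS field must be exactly 5 bytes")
    return (
        ((field[0] >> 1) & 0x07) << 30  # bits 32..30
        | field[1] << 22                # bits 29..22
        | (field[2] >> 1) << 15         # bits 21..15
        | field[3] << 7                 # bits 14..7
        | field[4] >> 1                 # bits 6..0
    )


def encode_pts(pts: int) -> bytes:
    """Inverse of decode_pts (PTS-only prefix '0010', marker bits set to 1)."""
    return bytes([
        0x20 | (((pts >> 30) & 0x07) << 1) | 1,
        (pts >> 22) & 0xFF,
        (((pts >> 15) & 0x7F) << 1) | 1,
        (pts >> 7) & 0xFF,
        ((pts & 0x7F) << 1) | 1,
    ])


def video_ticks_per_frame() -> Fraction:
    """Tick spacing between pictures at 29.97 fps (exactly 30000/1001 fps)."""
    return Fraction(PTS_CLOCK_HZ * 1001, 30_000)


def ms_to_ticks(ms: int) -> Fraction:
    """Convert a block spacing in milliseconds to 90 kHz ticks."""
    return Fraction(PTS_CLOCK_HZ * ms, 1000)
```

Under these assumptions a video picture advances the clock by 3003 ticks, 
and the audio spacings above (32ms, 36ms, 48ms) come out to 2880, 3240 and 
4320 ticks.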

These 2 streams, technically Elementary Streams, are "smashed" 
together in a multiplexed arrangement. In the simplest lay terms: "we 
get a bunch of video chunks, then a bunch of audio chunks, then some 
more video, and so on..." What VSplit is currently doing is taking 
the "busted" multiplexed form from the tivo's TyStream and getting 
rid of the multiplexing headers and information, creating 2 standard 
ES streams. These can then be used by a wide variety of tools, etc.
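
As a toy illustration of that demultiplexing step (not VSplit's actual 
code, and not the real TyStream layout), assume the multiplexed data has 
already been parsed into (kind, payload) chunks. The chunk representation 
and the function name are hypothetical.

```python
def demux(chunks):
    """Split interleaved (kind, payload) chunks into two elementary streams.

    `chunks` is an assumed, already-parsed form of the multiplex: each item
    is ("video", bytes) or ("audio", bytes), in the order found in the file.
    """
    video = bytearray()
    audio = bytearray()
    for kind, payload in chunks:
        if kind == "video":
            video += payload
        elif kind == "audio":
            audio += payload
        else:
            # Anything else would be multiplexing overhead, which is dropped.
            continue
    return bytes(video), bytes(audio)
```

The point is only that the payloads keep their original order within each 
output stream while everything else is discarded.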

Now the Tivo TyStream file is another layer on top of the "busted" 
multiplexed streams. It is almost completely independent of the actual 
mpeg2 stream... This is a broad generalization but is reasonably 
accurate. A few headers will line up with the start of an audio block 
or video block. (Again, a block is either a picture or a regularly 
spaced segment of audio, 24ms for example.) The rest are just... 
something the tivo itself uses.

Now all of that was leading up to an answer for your question... Each 
block, picture or audio frame, is potentially a different size, 
sometimes due to the tivo's internal mechanisms and more often due to 
the nature of encoding. A frame that changes "A LOT" will take more 
data to encode than a frame that is almost identical to a near 
neighbor. So a regular splitting into 128k chunks will sometimes get 
more or fewer packets of each type. As a result things rarely if 
ever line up consistently at the start of each piece. But if it is 
all put together into 1 large file and processed cleanly it will line 
up from start to end.
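
A small simulation shows the effect. This is a sketch under made-up 
assumptions, not a model of the real TyStream: video pictures get random 
sizes, audio blocks are a fixed 864 bytes every 36ms, the two are 
interleaved by timestamp, and the result is cut into parts of 8 x 128k 
bytes. For each part it reports how far the first audio timestamp is from 
the first video timestamp.

```python
import random

VIDEO_MS = 1001 / 30  # picture spacing at 29.97 fps, about 33.37 ms
AUDIO_MS = 36.0       # assumed audio block spacing (one of the values seen)


def simulate_offsets(seconds=60, part_bytes=8 * 128 * 1024, seed=0):
    """Return, per part, (first audio timestamp - first video timestamp) in ms."""
    rng = random.Random(seed)
    frames = []  # (timestamp_ms, kind, size_bytes)
    for i in range(int(seconds * 1000 / VIDEO_MS)):
        # Picture size varies with how much the image changes (random here).
        frames.append((i * VIDEO_MS, "video", rng.randint(2_000, 40_000)))
    for j in range(int(seconds * 1000 / AUDIO_MS)):
        frames.append((j * AUDIO_MS, "audio", 864))  # assumed fixed size
    frames.sort()  # interleave the two streams by timestamp

    first_seen = {}  # part index -> {"video": ts, "audio": ts}
    position = 0     # byte offset of the current block in the multiplex
    for ts, kind, size in frames:
        part = position // part_bytes
        first_seen.setdefault(part, {}).setdefault(kind, ts)
        position += size

    offsets = []
    for part in sorted(first_seen):
        seen = first_seen[part]
        if "video" in seen and "audio" in seen:  # skip a tiny trailing part
            offsets.append(seen["audio"] - seen["video"])
    return offsets


if __name__ == "__main__":
    for number, offset in enumerate(simulate_offsets(), start=1):
        side = "late" if offset > 0 else "early" if offset < 0 else "perfect"
        print(f"part{number}: {abs(offset):.0f}ms {side}")
```

In this model the first part always starts lined up, and every later part 
starts at whatever point the byte count happens to land on, so the offsets 
jump around but stay within one block interval.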

All of the above comes from what I figured out in writing VSplit.

--jdiner