[ExtractStream] Random SplitStream Thoughts...

Warren Toomey wkt at t...
Sat, 9 Feb 2002 23:20:14 +1000 (EST)


[ I just wrote this reply to Roger, and then I realised that I had assumed
that Roger was working on a Unix/Linux system. If that's not the case,
if he's working on a Windoze system, then everything I've said may be
untrue - Warren ]

In article by Roger Merchberger:
> 1) I think that splitstream can be made to work faster by allocating
> a lot more memory & increasing the file input/output buffers - right

If the program is bottlenecked by I/O then, no, getting it to allocate
more memory won't make it run any faster.

Just looking through the code right now, I can see it uses fwrite()
to write out the data, and fread() to read chunks in. They're pretty
efficient, and I doubt that you would improve matters by buffering any
more. You might be able to improve things a little bit with the use
of setvbuf(), but I wouldn't expect much.

[ Just tried it, I got a 5% improvement by using a CHUNK_SIZE
buffer and doing setvbuf(in_fp, mybuf, _IOFBF, sizeof(mybuf)); ]
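As a rough sketch of the setvbuf() idea, assuming a plain copy loop
rather than splitstream's real processing: the function name
copy_with_big_buffer() and the CHUNK_SIZE value below are made up for
illustration, so substitute the program's own definitions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed value for illustration; use splitstream's own CHUNK_SIZE. */
#define CHUNK_SIZE (128 * 1024)

/* Hypothetical helper: copy in_path to out_path, with a large stdio
 * buffer on the input stream. Returns bytes copied, or -1 on error. */
long copy_with_big_buffer(const char *in_path, const char *out_path)
{
    static char mybuf[CHUNK_SIZE];  /* must outlive the open stream */
    char chunk[4096];
    size_t n;
    long total = 0;
    FILE *in_fp, *out_fp;

    assert(in_path != NULL && out_path != NULL);

    in_fp = fopen(in_path, "rb");
    if (in_fp == NULL)
        return -1;
    out_fp = fopen(out_path, "wb");
    if (out_fp == NULL) {
        fclose(in_fp);
        return -1;
    }

    /* setvbuf() has to come before any other operation on the stream. */
    if (setvbuf(in_fp, mybuf, _IOFBF, sizeof(mybuf)) != 0) {
        fclose(in_fp);
        fclose(out_fp);
        return -1;
    }

    while ((n = fread(chunk, 1, sizeof(chunk), in_fp)) > 0) {
        if (fwrite(chunk, 1, n, out_fp) != n) {
            total = -1;             /* short write: give up */
            break;
        }
        total += (long)n;
    }
    if (ferror(in_fp))
        total = -1;

    fclose(in_fp);
    if (fclose(out_fp) != 0)        /* flush errors show up here */
        total = -1;
    return total;
}
```

The buffer is static so that it is still valid when fclose() flushes
the stream; an automatic array that went out of scope first would not be.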

In fact, if you buffer more chunks than you can fit into physical
memory, the process will start thrashing the virtual memory subsystem,
and this will _really_ slow the program down.

However, there might be one way of improving I/O performance, and that
is to use memory mapping. I'm a BSD person and I would recommend mmap()
and friends here. I'm not a Linux person, but mmap() is a POSIX call and
Linux and the SysV relatives provide it too.
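A minimal sketch of what reading a file through mmap() looks like,
assuming a POSIX system. The helper count_byte_mmap() is hypothetical
and only scans for one byte value; it stands in for whatever scanning
splitstream actually does on its input.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map the whole file read-only and count the
 * bytes equal to `wanted`. Returns the count, or -1 on error. */
long count_byte_mmap(const char *path, unsigned char wanted)
{
    struct stat st;
    unsigned char *data;
    long count = 0;
    off_t i;
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }

    data = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping survives the close */
    if (data == MAP_FAILED)
        return -1;

    /* The file now looks like an ordinary array; the kernel pages it
     * in on demand, with no fread() copy into a user buffer. */
    for (i = 0; i < st.st_size; i++)
        if (data[i] == wanted)
            count++;

    munmap(data, (size_t)st.st_size);
    return count;
}
```

Whether this beats fread() for a given workload is something to
measure rather than assume.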

> 2a) The first is very minor: the 128K memory allocated for the stream
> is never freed in the program, as far as I can see. Granted, as soon
> as the program exits, it'll be returned, but if the program abends
> the memory is lost until reboot.

No, this isn't true. Any Unix process will have all of its resources
freed by the operating system, regardless of how the process terminated.
So, this 128K buffer will always be reclaimed by the system.

> 2b) This one is fairly minor: When the memory is (asked to be)
> allocated, it's never checked to see if the memory allocation request
> actually went thru.

Yes, that's true, and the result of each allocation should be checked.
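One common way to add the check is a small wrapper around malloc().
This is only a sketch: checked_malloc() is a hypothetical name, not
something in the splitstream source, and exiting on failure is an
assumed policy.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper: return a valid pointer, or report the failure
 * and exit. `what` names the buffer in the error message. */
void *checked_malloc(size_t size, const char *what)
{
    void *p;

    assert(size > 0 && what != NULL);

    p = malloc(size);
    if (p == NULL) {
        fprintf(stderr, "cannot allocate %lu bytes for %s\n",
                (unsigned long)size, what);
        exit(EXIT_FAILURE);
    }
    return p;
}
```

The 128K stream buffer would then be obtained with something like
checked_malloc(128 * 1024, "stream buffer").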

> Well, I've been plinking with allocating memory & stuff, and I've
> gotten a better handle on it now, I think...
> I'll be testing changes to the new zss code in a few days.

What about compiling splitstream with the compiler's -pg option, and then
using gprof to see which functions are chewing the most CPU time?

[ Just did this, it's spending most of its time in fread() ]

It would also be a good idea to find out if splitstream is more I/O
intensive or more CPU intensive. If you open another window and run top
before starting splitstream, you should see the CPU pretty idle. Then
start splitstream. If the CPU usage goes to 0.0% idle, then splitstream
must be CPU intensive, and you need to tune the code. If the CPU does
not go to 0.0% idle, then splitstream is not CPU intensive, and needs
to do its I/O faster.
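The same question can be asked from inside the program, as an
alternative to watching top: compare the CPU time the process used
with the wall-clock time that elapsed. This is a sketch under the
assumption of a POSIX system; the stopwatch names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical helper type: remembers when measurement started. */
struct stopwatch {
    struct timeval wall;            /* wall-clock start time */
    clock_t cpu;                    /* CPU time used at start */
};

void stopwatch_start(struct stopwatch *sw)
{
    assert(sw != NULL);
    gettimeofday(&sw->wall, NULL);
    sw->cpu = clock();
}

/* Returns CPU seconds divided by elapsed seconds since the start,
 * or -1.0 if no measurable time has passed. */
double stopwatch_cpu_fraction(const struct stopwatch *sw)
{
    struct timeval now;
    clock_t cpu_now;
    double wall, cpu;

    assert(sw != NULL);
    cpu_now = clock();
    gettimeofday(&now, NULL);

    wall = (double)(now.tv_sec - sw->wall.tv_sec)
         + (double)(now.tv_usec - sw->wall.tv_usec) / 1e6;
    cpu = (double)(cpu_now - sw->cpu) / CLOCKS_PER_SEC;
    if (wall <= 0.0)
        return -1.0;
    return cpu / wall;
}
```

Call stopwatch_start() before the main loop and
stopwatch_cpu_fraction() after it. A result near 1.0 suggests the
program is CPU bound; a result well below 1.0 suggests it spends most
of its time waiting, usually on I/O.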

[ I just tried splitstream out on my 266MHz Pentium, and yes,
it doesn't need CPU, it needs faster I/O ]

Suggestion: if it is I/O intensive, make sure the input file and the
output files are on physically different disks. Then the operating
system can schedule I/O operations on each drive concurrently, and this
will speed things up.

Hope some of these suggestions are useful. Good luck with it, Roger.

Warren
> 
> Laterz,
> Roger "Merch" Merchberger