The problem

I was recently asked to help a fellow postgrad get his network-simulator tracefile analyser to run concurrently with the simulator. The problem was that the analyser read the tracefile faster than the simulator could fill it, so it stopped its analysis when it reached the end of the file instead of waiting for more data.

The solution

The first solution was just to sit in an infinite loop waiting for more data, but then you have to kill the analyser to stop it reading, and you don't get the report at the end. After some consultation with the gurus in #schlock_mercenary on the Nightstar IRC network (thanks Rhamphoryncus (Adam Olsen in meatspace) and Vornicus), the following code was born:

import time

class logfile(file):
    """A file that keeps waiting for more data when it hits end-of-file."""

    def __init__(self, *args, **kwargs):
        file.__init__(self, *args, **kwargs)
        self.logTimeout = 5               # seconds to wait for new data before giving up
        self.logSleepTime = 0.1           # seconds to sleep between checks for new data
        self.logOnlyCompleteLines = True  # only return lines that end in "\n"

    def next(self):
        if self.logTimeout <= 0 or self.logSleepTime <= 0:
            counter = None  # no timeout: wait indefinitely for more data
        else:
            # Float division so integer settings don't truncate under Python 2.
            counter = self.logTimeout / float(self.logSleepTime)
        fpos = self.tell() # Store current position in file.
        while counter is None or counter > 0:
            line = self.readline()
            if line:
                if line[-1] == "\n" or not self.logOnlyCompleteLines:
                    return line
                self.seek(fpos) # "Unread" incomplete line.
            time.sleep(self.logSleepTime)
            if counter is not None:
                counter -= 1
        raise StopIteration

if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        fname = sys.argv[1]
        f = logfile(fname)
        for line in f:
            print line,
        f.close()

How to use it

The class is used exactly as if it were a file. The only real difference is that a loop over the file's contents (for line in file:) will wait a certain amount of time for more data before it ends. That threshold can be set with logfile.logTimeout = <number>, and the length of time it sleeps between checks for new data can be set with logfile.logSleepTime = <number> (both in seconds). If either of those parameters is zero or negative, it will wait indefinitely for more data. If logfile.logOnlyCompleteLines is True, it'll wait for a complete line (one ending in a newline) before returning it. This is useful if you don't want to (or can't) process partial lines and whatever is writing the data doesn't buffer its writes into full lines.
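Here's a short usage sketch with the class from above. The trace filename, the parameter values and the process() call are just placeholders for whatever your own analyser does:

f = logfile("sim.trace")          # hypothetical tracefile name
f.logTimeout = 10                 # give up after roughly 10 seconds with no new data
f.logSleepTime = 0.5              # check for new data every half second
f.logOnlyCompleteLines = True     # only hand back lines that end in "\n"
for line in f:                    # keeps waiting at end-of-file until the timeout expires
    process(line)                 # process() stands in for your own analysis code
f.close()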

How it happened

I started with everything in an infinite while loop running over a normal file object, but that seemed messy. Rhamphoryncus suggested subclassing file, and things snowballed from there.

I tried overriding readline(), but apparently the next() of a normal file object doesn't use it. Then I tried using the inherited next(), but its hidden read-ahead buffer kept causing problems. Since my implementation doesn't use the read-ahead, it isn't as efficient at reading the parts of the file that already exist, but sometimes you have to make a tradeoff.
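If you're curious, the read-ahead problem is easy to see for yourself under Python 2: the built-in next() fills a hidden buffer in large chunks, so after one iteration tell() points well past the line you were actually given, which breaks the seek()-based "unread" trick above. A quick sketch (demo.txt is just any existing text file with a few lines in it):

f = open("demo.txt")   # any existing multi-line text file
first = f.next()       # the built-in next() fills its hidden read-ahead buffer
print repr(first)      # the first line, as expected...
print f.tell()         # ...but the file position is far beyond the end of that line
f.close()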

This works for everything I've tested it with. If you have any problems or ideas for features, give me a shout.