The problem
I was recently asked to help a fellow postgrad fix his network simulator tracefile analyser to run concurrently with the simulator. The problem was that the analyser was reading the tracefile faster than the simulator could fill it, and thus stopping its analysis when it reached the end of the file without waiting for more data.
The solution
The first solution was just to sit in an infinite loop waiting for more data, but then you have to kill the analyser to stop reading and you don't get the report at the end. After some consultation with the gurus in #schlock_mercenary on the Nightstar IRC network (thanks Rhamphoryncus (Adam Olsen in meatspace) and Vornicus), the following code was born:
    import time

    class logfile(file):
        def next(self):
            # A non-positive timeout or sleep time means "wait forever".
            if self.logTimeout <= 0 or self.logSleepTime <= 0:
                counter = None
            else:
                counter = self.logTimeout / self.logSleepTime
            fpos = self.tell()  # Store current position in file.
            while counter is None or counter > 0:
                line = self.readline()
                if line:
                    if line[-1] == "\n" or not self.logOnlyCompleteLines:
                        return line
                    self.seek(fpos)  # "Unread" incomplete line.
                # Sleep for the configured interval (0.1 s if it is not positive).
                time.sleep(self.logSleepTime if self.logSleepTime > 0 else 0.1)
                if counter is not None:
                    counter -= 1
            raise StopIteration

        def __init__(self, *args, **kwargs):
            file.__init__(self, *args, **kwargs)
            self.logTimeout = 5               # Seconds to wait for more data.
            self.logSleepTime = 0.1           # Seconds between checks.
            self.logOnlyCompleteLines = True  # Only return newline-terminated lines.

    if __name__ == "__main__":
        import sys
        if len(sys.argv) > 1:
            fname = sys.argv[1]
            f = logfile(fname)
            for line in f:
                print line,
            f.close()
How to use it
The class is used exactly as if it were a file. The only
real difference is that a loop over the file contents (for line in
file:) will wait a certain amount of time for more data before it
ends. This threshold can be set with logfile.logTimeout =
<number>. The length of time it sleeps between looking for
new data can be set with logfile.logSleepTime =
<number>. If either of those parameters is zero or
negative, it will wait indefinitely for more data. If
logfile.logOnlyCompleteLines is True, it'll wait
for a complete line (ending in a newline) before it returns. This is
useful if you don't want to (or can't) process part lines and whatever's
writing the data doesn't buffer writes for full lines.
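
For anyone on Python 3, where the file builtin no longer exists to subclass, the same behaviour can be sketched as a generator over an ordinary file object. This is only a rough adaptation, not the class above: the names follow_lines and read_all are made up for illustration, and the three keyword arguments are assumed stand-ins for logTimeout, logSleepTime and logOnlyCompleteLines.

```python
import os
import tempfile
import time


def follow_lines(f, timeout=5.0, sleep_time=0.1, only_complete_lines=True):
    """Yield lines from f, waiting up to `timeout` seconds for each new line.

    Hypothetical Python 3 sketch of the behaviour described above.
    A non-positive timeout or sleep_time means "wait indefinitely".
    """
    wait_forever = timeout <= 0 or sleep_time <= 0
    pause = sleep_time if sleep_time > 0 else 0.1  # assumed fallback interval
    while True:
        remaining = None if wait_forever else timeout / sleep_time
        line = None
        while remaining is None or remaining > 0:
            fpos = f.tell()  # remember where this attempt started
            candidate = f.readline()
            if candidate and (candidate.endswith("\n") or not only_complete_lines):
                line = candidate
                break
            f.seek(fpos)  # "unread" an incomplete line
            time.sleep(pause)
            if remaining is not None:
                remaining -= 1
        if line is None:
            return  # timed out waiting for more data
        yield line


def read_all(text, **options):
    """Illustrative helper: write text to a temp file, then follow it."""
    with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
        tmp.write(text)
    try:
        with open(tmp.name) as f:
            return list(follow_lines(f, **options))
    finally:
        os.remove(tmp.name)


if __name__ == "__main__":
    # The trailing "partial" has no newline, so it is held back by default.
    for line in read_all("first\nsecond\npartial", timeout=0.2, sleep_time=0.05):
        print(line, end="")
```

The sketch sticks to readline() rather than iterating over the file directly, so that tell() and seek() stay usable between reads.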
How it happened
I started with everything in an infinite while loop
running over a normal file object, but that seemed messy.
Rhamphoryncus suggested subclassing file, and things
snowballed from there.
I tried overriding readline(), but apparently the
next() in a normal file doesn't use it. Then I
tried using the inherited next(), but the readahead kept
causing problems. Since my implementation doesn't use the readahead, it's
less efficient at reading the parts of the file that already exist, but
sometimes you have to make a tradeoff.
This works for everything I've tested it with. If you have any problems or ideas for features, give me a shout.