Zane H. Healy wrote:
> He would like some
> volunteers to hack the python code to be more efficient or something,
> but I don't know python.
Sounds more like it needs rewriting in a more efficient language to me (isn't
Python interpreted? Probably not a good choice for search indexing!)
I can't imagine the indexing code's *that* complicated - I expect it's the
search side and how to quickly find results in the masses of data that tend
to be the tricky part.
"Hello, World" to run on my VMS server). It sounds to me like this is a good
candidate for running either once a week, or once a month.
Hmm, isn't the dictionary (mapping words to codes) static and built up from
existing archives? 'new' words found subsequently will get longer codes
assigned to them and be less efficient, but if the initial sample data is
large it won't necessarily matter. Beats rebuilding the dictionary and
re-assigning codes all the time, assuming that's what's done at the moment.
If the dictionary is static then files can be indexed as they arrive rather
than the whole archive needing scanning every x hours to keep indexes in sync.
(This was the assumption I was basing some desktop search code on, which I was
writing - but that's another one of those half-done projects that's sitting a
long way down on the priority list to complete right now. I couldn't find
anything in my local archives, and couldn't find anything available on the
'net to do the job.)
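To make the static-dictionary idea above concrete, here's a rough Python sketch.
It's only my guess at the general shape, not what the existing indexing code
actually does: the names (build_dictionary, Index, add_message, search) are all
made up for illustration, and a plain in-memory dict of sets stands in for
whatever on-disk format the real thing uses. The dictionary is built once from
the existing archive, frequent words get the smallest codes, and any word seen
later just gets the next free (larger) code, so nothing already indexed needs
touching.

```python
import re
from collections import Counter, defaultdict

WORD = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Split text into lowercase word tokens."""
    return WORD.findall(text.lower())


def build_dictionary(archive_texts):
    """Build the word -> code mapping once, from the existing archive.

    most_common() yields words in descending frequency order, so the
    most frequent words receive the smallest integer codes.
    """
    counts = Counter()
    for text in archive_texts:
        counts.update(tokenize(text))
    return {word: code for code, (word, _) in enumerate(counts.most_common())}


class Index:
    """Hypothetical in-memory index: word code -> set of message ids."""

    def __init__(self, dictionary):
        # Assumes codes are 0..n-1, as produced by build_dictionary().
        self.codes = dict(dictionary)
        self.postings = defaultdict(set)

    def code_for(self, word):
        # A word not in the initial sample gets the next free code.
        # Existing codes are never reassigned.
        if word not in self.codes:
            self.codes[word] = len(self.codes)
        return self.codes[word]

    def add_message(self, msg_id, text):
        """Index one message as it arrives; no rescan of the archive."""
        for word in tokenize(text):
            self.postings[self.code_for(word)].add(msg_id)

    def search(self, query):
        """Return ids of messages containing every word in the query."""
        result = None
        for word in tokenize(query):
            code = self.codes.get(word)
            if code is None:
                return set()  # word never seen, so nothing can match
            ids = self.postings.get(code, set())
            result = set(ids) if result is None else result & ids
        return result if result is not None else set()
```

Usage would be something like d = build_dictionary(old_messages), idx = Index(d),
then idx.add_message(msg_id, text) for each new post as it comes in. Whether the
longer codes for late-arriving words matter in practice depends on how
representative the initial sample is.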
cheers
Jules