ezmlm-archive(1) ezmlm-archive(1)
NAME
ezmlm-archive - create thread and author index for a
mailing list archive
SYNOPSIS
ezmlm-archive [ -cCFsSTvV ][ -f msg1 ][ -t msg2 ] dir
DESCRIPTION
ezmlm-archive reads the index files from a message
archive and creates a thread index, a collection of
subject files, and a collection of author files. These
files serve as an index that ezmlm-cgi(1) uses for WWW
access to, and navigation through, a mailing list archive.
The index files it reads are created by ezmlm-idx(1) on a
per-list basis and by ezmlm-send(1) on a per-message basis
for an indexed list.
The output files created are:
dir/archive/threads/yyyymm
The thread index. It contains one line per subject,
starting with the number of the first message with
that subject within the set processed, ``:'', a
20-character subject hash, a space, ``[n]'' where ``n''
is the number of messages in the thread, a space, and
the subject. The file ``yyyymm'' contains entries
for all threads that have messages in the month
``yyyymm'' or that have messages both before and
after that month. The subject hash is a key to the
subject files; the message number is a key to the
index file. When the index is created de novo on an
existing archive, the lines are in ascending order by
message number. When messages are added one at a time,
as in normal archive operation, ``n'' is the number of
messages in the thread for that particular month, and
the lines are ordered by each thread's latest message,
i.e. the most recently extended thread is shown last.
The message number accompanying a thread is always that
of a message within the thread: the first one in
archives created on existing lists, and the last one in
incrementally created archives. Use the corresponding
subject file to get a list of all messages in the
thread in ascending order.
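A minimal C sketch of reading one thread-index line, assuming the layout described above: message number, ``:'', 20-character hash, a space, the count, a space, and the subject. The struct and function names are hypothetical, not part of ezmlm. Because the exact delimiters around the count are an assumption here, the parser accepts ``n'' with or without surrounding brackets.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HASHLEN 20 /* subject hash length, per the description above */

/* One parsed line of dir/archive/threads/yyyymm (hypothetical struct). */
struct thread_entry {
    unsigned long msgnum;   /* a message within the thread */
    char hash[HASHLEN + 1]; /* key to dir/archive/subjects/ */
    unsigned long count;    /* "n": number of messages in the thread */
    const char *subject;    /* points into the input line */
};

/* Returns 0 on success, -1 if the line does not match the assumed
   layout "msgnum:hash [n] subject" (brackets around n optional). */
static int parse_thread_line(const char *line, struct thread_entry *e)
{
    char *end;
    assert(line != NULL && e != NULL);

    e->msgnum = strtoul(line, &end, 10);
    if (end == line || *end != ':') return -1;
    end++;

    /* Fixed-width subject hash followed by a single space. */
    if (strlen(end) < HASHLEN) return -1;
    memcpy(e->hash, end, HASHLEN);
    e->hash[HASHLEN] = '\0';
    end += HASHLEN;
    if (*end++ != ' ') return -1;

    /* Message count, optionally written as [n]. */
    int bracketed = (*end == '[');
    if (bracketed) end++;
    const char *numstart = end;
    e->count = strtoul(numstart, &end, 10);
    if (end == numstart) return -1;
    if (bracketed && *end++ != ']') return -1;
    if (*end++ != ' ') return -1;

    e->subject = end; /* the rest of the line is the subject */
    return 0;
}
```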
dir/archive/subjects/xx/yyyyyyyyyyyyyyyyyy
A subject file. The first line is the subject hash,
a space, and the subject. This is followed by one
line per message with this subject, in the format:
message number, ``:'', date (yyyymm), ``:'', author
hash, a space, and the author from the From: line.
The lines are sorted by message number. The author
hash is a key to the author files; the message number
is a key to the index file. The file in the example would be for
the subject hash ``xxyyyyyyyyyyyyyyyyyy''.
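As the path pattern above suggests, a 20-character hash maps to a file by using its first two characters as a directory name and the remaining 18 as the file name. A minimal C sketch of that mapping; the function name hash_to_path is hypothetical, and it assumes the same scheme for both the subjects and authors directories.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define HASHLEN 20

/* Build "<dir>/archive/<kind>/<h0h1>/<h2..h19>" into buf, where kind is
   "subjects" or "authors". Returns 0 on success, or -1 if the hash has
   the wrong length or buf is too small. */
static int hash_to_path(char *buf, size_t bufsize, const char *dir,
                        const char *kind, const char *hash)
{
    assert(buf != NULL && dir != NULL && kind != NULL && hash != NULL);
    if (strlen(hash) != HASHLEN) return -1;
    /* %.2s prints only the first two characters of the hash. */
    int n = snprintf(buf, bufsize, "%s/archive/%s/%.2s/%s",
                     dir, kind, hash, hash + 2);
    if (n < 0 || (size_t)n >= bufsize) return -1;
    return 0;
}
```

Splitting on the first two characters keeps any single directory from holding every subject or author file of a large archive.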
dir/archive/authors/xx/yyyyyyyyyyyyyyyyyy
An author file. The first line is the author hash,
a space, and the author from the From: line. This is
followed by one line per message with this author, in
the format: message number, ``:'', date (yyyymm),
``:'', subject hash, a space, and the subject. The
lines are sorted by message number. The subject hash
is a key to the subject files; the message number is
a key to the index file. The file in the example would
be for the author hash ``xxyyyyyyyyyyyyyyyyyy''.
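Subject files and author files share the same per-message line layout, differing only in which hash and text follow the date. A minimal C sketch of parsing one such line, assuming exactly the format described above; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HASHLEN 20

/* One message line of a subject or author file (hypothetical struct). */
struct msg_entry {
    unsigned long msgnum;
    unsigned long yyyymm;   /* e.g. 200107 for July 2001 */
    char hash[HASHLEN + 1]; /* author hash in subject files,
                               subject hash in author files */
    const char *text;       /* author or subject; points into line */
};

/* Parse "msgnum:yyyymm:hash text". Returns 0 on success, -1 otherwise. */
static int parse_msg_line(const char *line, struct msg_entry *e)
{
    char *end;
    assert(line != NULL && e != NULL);

    e->msgnum = strtoul(line, &end, 10);
    if (end == line || *end != ':') return -1;

    /* The date must be exactly six digits (yyyymm). */
    const char *p = end + 1;
    e->yyyymm = strtoul(p, &end, 10);
    if (end - p != 6 || *end != ':') return -1;

    /* Fixed-width hash, then one space, then the free text. */
    p = end + 1;
    if (strlen(p) < HASHLEN + 1 || p[HASHLEN] != ' ') return -1;
    memcpy(e->hash, p, HASHLEN);
    e->hash[HASHLEN] = '\0';
    e->text = p + HASHLEN + 1;
    return 0;
}
```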
dir/archnum keeps track of the last message processed.
Normally, ezmlm-archive processes entries for messages
from the one after the number stored in this file up to
and including the message number in dir/num.
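The default message range can be sketched in C as follows. This assumes each file begins with a decimal message number and reads only that leading number, so contents such as ``1234'' and ``1234:567'' both yield 1234. The helper names are hypothetical, and the file contents are passed in as strings rather than read from disk.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: leading decimal number of a file's contents.
   A missing file (NULL) counts as 0. */
static unsigned long leading_number(const char *contents)
{
    if (contents == NULL) return 0;
    return strtoul(contents, NULL, 10);
}

/* Default range: from archnum + 1 up to and including num.
   Returns 1 if there is work to do, 0 if first > last. */
static int default_range(const char *archnum_contents,
                         const char *num_contents,
                         unsigned long *first, unsigned long *last)
{
    assert(first != NULL && last != NULL);
    *first = leading_number(archnum_contents) + 1;
    *last = leading_number(num_contents);
    return *first <= *last;
}
```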
OPTIONS
ezmlm-archive writes its output files in a crash-proof
manner when run in normal mode. When any of the options
listed below overrides the normal message range, the
normal sync(3) of the output files is suppressed for
efficiency. If the computer crashes during such a run,
the state of the indices is undefined. Use the -s option
in the (extremely rare) cases where this would be a
problem.
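This page does not say how the crash-proof writes are implemented. A common POSIX technique, shown here only as an illustrative sketch and not as ezmlm-archive's actual code, is to write to a temporary file, optionally fsync(2) it, and rename(2) it over the target. The do_sync flag mirrors the -s/-S choice; the function name write_file_safely and the ``.tmp'' suffix are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write data to path via "<path>.tmp" followed by rename(). If do_sync is
   nonzero, the temporary file is fsync()ed first, so the new contents are
   on disk before they replace the old file. Returns 0 or -1. */
static int write_file_safely(const char *path, const char *data, int do_sync)
{
    char tmp[4096];
    assert(path != NULL && data != NULL);
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp) return -1;

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) return -1;

    /* write() may be partial, so loop until everything is written. */
    size_t len = strlen(data), off = 0;
    while (off < len) {
        ssize_t w = write(fd, data + off, len - off);
        if (w == -1) { close(fd); unlink(tmp); return -1; }
        off += (size_t)w;
    }
    if (do_sync && fsync(fd) == -1) { close(fd); unlink(tmp); return -1; }
    if (close(fd) == -1) { unlink(tmp); return -1; }

    /* rename() replaces path atomically within one file system. */
    if (rename(tmp, path) == -1) { unlink(tmp); return -1; }
    return 0;
}
```

Skipping the fsync() step, as with the message-range options, saves a disk flush per file. If the machine crashes, however, the renamed file may then be incomplete.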
-c Create a new index. This overrides dir/archnum,
causing ezmlm-archive to start with the first message
in the archive. Synonym for -f0. NOTE: ezmlm-archive
does not remove files from the index. While it
overwrites or updates old files, it does not remove
files that have become obsolete for other reasons.
-C (Default.) Process entries starting with the message
after the one recorded in dir/archnum.
-f msg1
Process messages starting from the archive section
(set of 100 messages) containing message msg1. This
is useful if you have removed part of the archive, as
it shortens processing time and decreases memory use.
NOTE: ezmlm-archive does not remove files from the
index. While it overwrites or updates old files, it
does not remove files that have become obsolete for
other reasons. The number of messages per thread will
be incorrect when use of the -f and -t switches leads
to partial re-indexing of already indexed messages.
-F (Default.) Do not change the starting message from
the default (see -C).
-s Always sync files.
-S (Default.) Sync files, except when one of the
message-range options is used.
-t msg2
Process messages up to message msg2 instead of up to
the last message in the archive. As with -f, files
that are written are corrected, but obsolete files are
not explicitly removed.
-T (Default.) Process entries for messages up to the
last message in the archive.
-v Display ezmlm-archive version info.
-V Display ezmlm-archive version info.
MEMORY USAGE
ezmlm-archive stores its linked lists in memory. On a
32-bit architecture, it uses 12 bytes per message, 28
bytes per thread (plus one copy of the subject), and 20
bytes per author (plus one copy of the author From: line).
In normal list use, it processes at most a few messages at
a time, but initial processing of a large archive may use
considerable amounts of memory. Assuming 40 bytes per
subject or From: line, 5 messages per thread, 100,000
messages, and 1000 authors, this comes to about 2.5 MB.
For 1,000,000 messages it is about 25 MB.
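The estimate follows from multiplying the per-item costs above by the item counts. A small C sketch of that arithmetic; the per-item byte counts come from this page, while the function name and the averaged line length parameter are illustrative assumptions.

```c
#include <assert.h>

/* Per-item costs on a 32-bit build, as stated above. */
#define BYTES_PER_MSG    12UL
#define BYTES_PER_THREAD 28UL
#define BYTES_PER_AUTHOR 20UL

/* Estimated bytes for the in-memory lists. line_len is the assumed
   average length of one stored subject or From: line. */
static unsigned long estimate_bytes(unsigned long msgs,
                                    unsigned long msgs_per_thread,
                                    unsigned long authors,
                                    unsigned long line_len)
{
    assert(msgs_per_thread > 0);
    unsigned long threads = msgs / msgs_per_thread;
    return msgs * BYTES_PER_MSG
         + threads * (BYTES_PER_THREAD + line_len)
         + authors * (BYTES_PER_AUTHOR + line_len);
}
```

With the figures above, estimate_bytes(100000, 5, 1000, 40) is 1,200,000 + 1,360,000 + 60,000 = 2,620,000 bytes, about 2.5 MiB. For 1,000,000 messages the same formula gives 25,660,000 bytes, about 24.5 MiB.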
Thus, for large archives, it may be useful to use the -t
switch to process the archive in multiple subsets,
starting with, e.g., the first 100,000 messages, then the
next 100,000, and so on.
SEE ALSO
ezmlm-cgi(1), ezmlm-idx(1), ezmlm-send(1), ezmlm(5)