The following multiple sequence alignment program may be obtained from
the author:

    Martin Vingron
    Vingron@EMBL.Bitnet

He distributes it via bitnet send/receive as C source. It was developed
for VAX-VMS computers. See the copyright notice at the end of this file.

My opinion of MALI is that it has some promise as a good multiple
aligner (see the CABIOS article referenced below), but that it is
currently too much of a memory hog to be usable with sequences of
interesting length (>500 or 1000 bases) -- it requires several matrices
of the order NxN, where N = maximum sequence length.

-- Don Gilbert, IuBio archivist
   Archive@IuBio.Bio.Indiana.Edu

- - - - - - - - - - - - - - - - - - - - - - - - -

DESCRIPTION OF PROGRAMS

MALI
~~~~
MALI takes as input one (or more) files of sequences in NBRF format.
The sequences should be visibly similar. Output is a file with the
alignment of the sequences in a certain format. A readable alignment
is produced from this file with SHOW.

The FORMAT for the sequences is the one written by copying from the
NBRF protein-database. The sequence has to be given in one-letter code,
and the file has to start with:

    >P1;name

Then there is one line that will be ignored, followed by the sequence
in more or less any format, ending with "*". The name must not be longer
than 16 characters. There can be any number of sequences in a file.
See example.seq for an example.

PRALI
~~~~~
PRALI takes two alignment-files (not sequence files, but the ones
produced by MALI or ALIPREP; see below) and calculates an alignment
between the two. This can be applied to very similar groups of
sequences as well as to sequences where it is not clear whether there
is a relationship at all. Output is an alignment-file again.

CORR_PRALI does the same job but uses the corrected profile as
described in the paper.

In case one wishes to align single sequences using PRALI, these
sequences have to be transformed into alignment-files using ALIPREP.
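
The input format described under MALI above can be illustrated with a
small reader. This is only a sketch written from that description; it is
not taken from the MALI sources (READ_SEQS.C and READSEQUENCES.C may work
differently). The names NbrfRecord, nbrf_next, NBRF_NAME_LEN and
NBRF_SEQ_LEN are hypothetical, the sequence cap of 1024 is arbitrary, and
"more or less any format" is assumed here to mean that everything except
letters is skipped up to the closing "*".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define NBRF_NAME_LEN 16    /* name limit stated in the text */
#define NBRF_SEQ_LEN  1024  /* hypothetical cap for this sketch */

typedef struct {
    char name[NBRF_NAME_LEN + 1];
    char residues[NBRF_SEQ_LEN + 1];
} NbrfRecord;

/* Advance past the end of the current line (or stop at the final NUL). */
static const char *skip_line(const char *p)
{
    while (*p != '\0' && *p != '\n')
        p++;
    return (*p == '\n') ? p + 1 : p;
}

/*
 * Read one record from the in-memory text at *cursor.
 * Returns 1 if a record was read (and advances *cursor past its "*"),
 * 0 if no further record is present, -1 if the text is malformed.
 */
int nbrf_next(const char **cursor, NbrfRecord *rec)
{
    const char *p;
    size_t n;

    assert(cursor != NULL && *cursor != NULL && rec != NULL);
    p = *cursor;

    /* Skip blank space between records. */
    while (isspace((unsigned char)*p))
        p++;
    if (*p == '\0')
        return 0;

    /* Header line: ">P1;name". */
    if (strncmp(p, ">P1;", 4) != 0)
        return -1;
    p += 4;
    n = 0;
    while (*p != '\0' && *p != '\n' && *p != '\r') {
        if (n == NBRF_NAME_LEN)
            return -1;              /* name longer than 16 characters */
        rec->name[n++] = *p++;
    }
    rec->name[n] = '\0';
    if (n == 0)
        return -1;

    p = skip_line(p);               /* rest of the header line */
    p = skip_line(p);               /* the one line that is ignored */

    /* Sequence: keep letters only (an assumption), stop at "*". */
    n = 0;
    while (*p != '\0' && *p != '*') {
        if (isalpha((unsigned char)*p)) {
            if (n == NBRF_SEQ_LEN)
                return -1;
            rec->residues[n++] = *p;
        }
        p++;
    }
    if (*p != '*')
        return -1;                  /* missing terminator */
    rec->residues[n] = '\0';

    *cursor = p + 1;
    return 1;
}
```

A caller would read a file such as example.seq into memory, point a
cursor at the start of the text, and call nbrf_next repeatedly until it
returns 0.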

Parameters:
~~~~~~~~~~
Either copy the files

    align_defaults
    dayhoff_matrix

into the directory where you are running the programs, or, if you want
to run the programs in another directory, set them as logical names.

Hints:
~~~~~
regarding MALI:
The wordlength (see file align_defaults) should be either 2 or 3.
At the start, the program prints a "number of fragments". A wordlength
of 2 results in more fragments being found, 3 gives fewer fragments.
If the number of fragments is very high (i.e. approaching 1000 or more),
then the wordlength should be raised. If the number of fragments is low
(e.g. under 50), then either the sequences are weakly related or the
wordlength was set to 3 and should be lowered.

regarding PRALI:
Recommended choices for gapweights are:
weight2 always 1.
weight1 either 5 or 10. 5 is good for sequences of different length
where many gaps are expected; 10 when few gaps are expected. In
difficult cases maybe also try values in between.
Punishment for left and right end gaps: Usually leave both 0, which
means that end gaps are not punished. Set to 1 only where the left
and/or right ends should be aligned with each other.

These programs are based on the algorithm described in:

    Martin Vingron and Patrick Argos
    A fast and sensitive multiple sequence alignment algorithm.
    CABIOS, vol.5, no.2, 1989, pages 115-121.

========================================================

INSTALLATION

To install the programs use the command file install.com. This command
procedure will also restore the full names of files in case the last
letters were chopped off on the journey through BITNET. If you got the
files as mail messages, the subject header will contain the file name.

If you have mms available you can use MALI.MMS and PRALI.MMS, which
list the programs to compile and link. Just type

    mms/descr=mali
    mms/descr=prali
    mms/descr=corr_prali

to do this. Programs SHOW and ALIPREP just need to be compiled and
linked.

You should have gotten the following files:

dir/col=1

ALIGN.C;1
ALIGN.H;1
ALIGN_DEFAULTS.;1
ALIPREP.C;1
ASK.C;1
BUILD_GRAPH.C;1
CALC_FRAGMENTS.C;1
CLUSTERING.C;1
CODES.C;1
CODES.H;1
CORR_PRALI.MMS;1
DAYHOFF_MATRIX.;1
ENCR.C;1
ERROR.C;1
ERROR.H;1
EXAMPLE.SEQ;1
GENERAL.H;1
INSTALL.COM;1
ITERALI.C;1
MAKEPROF.C;1
MAKE_CORR_PROF.C;1
MALI.C;1
MALI.MMS;1
MATCH_CLUSTERING.C;1
MAX_VALUE.C;1
PRALI.C;1
PRALI.MMS;1
READ.ME;1
READSEQUENCES.C;1
READ_SEQS.C;1
REFINE.C;1
RTRIM.C;1
SCPROD.C;1
SEQUENCE.H;1
SHOW.C;1
SHOW2.C;1
SHOW90.C;1
SHOW_EDIT.C;1
SLC.C;1
SORT.C;1
STORAGE.C;1
TWO_WAY_ALI.C;1
VEC_TO_LIST.C;1
WRITE_SEQ.C;1

Total of 44 files.

========================================

The academic user is required to adhere to the following policies.

(1) The software will not be distributed to others beyond the
    immediate research group without the permission of the authors.
(2) All bugs, their correction, and any program extensions are to be
    reported to the author.
(3) The copyright messages must remain intact within the programs.
(4) The software package or any parts of it in modified or original
    form cannot be sold commercially.
(5) The software cannot be distributed to commercial users nor used
    for their purposes.

COPYRIGHT BY MARTIN VINGRON, 1989