[Clustalv.Readme -- Abstracted from clustalv.doc]


		Clustal V  Multiple Sequence Alignments.

		Documentation (Installation and Usage).

		Des Higgins
		European Molecular Biology Laboratory
		Postfach 10.2209
		D-6900 Heidelberg
		Germany.

		higgins@EMBL-Heidelberg.DE


This document describes how to install and use ClustalV on various 
machines.  ClustalV is a complete upgrade and rewrite of the Clustal 
package of multiple alignment programs (Higgins and Sharp, 1988 and 
1989).  The original programs were written in Fortran for 
microcomputers running MS-DOS; a complete alignment was carried out 
by running 3 programs in succession.  Later, these were merged into 
a single menu-driven program with on-line help, for VAX/VMS.  
ClustalV is written in C and has all of the features of the old 
programs plus many new ones.  It has been compiled and tested using 
VAX/VMS C, DECstation ULTRIX C, GNU C for Sun workstations, Turbo C 
for IBM PCs and Think C for Apple Macs.  The original Clustal was 
written by Des Higgins while he was a Post-Doc in the lab of Paul 
Sharp in the Genetics Department, Trinity College, Dublin 2, 
Ireland. 

The main feature of the old package was the ability to carry out 
reliable multiple alignments of many sequences.  The program is as 
sensitive as any other we have tried, with the exception of the 
programs of Vingron and Argos (1991), and it still runs in 
reasonable time on a microcomputer.  The programs of 
Vingron and Argos are specialised for finding distant similarities 
between proteins but require mainframes or workstations and are more 
difficult to use.

The main new features are: profile alignments (alignments of old 
alignments); phylogenetic trees (Neighbor Joining trees calculated 
after multiple alignment with a bootstrapping option); better 
sequence input (automatically recognise and read NBRF/PIR, Pearson 
(Fasta) or EMBL/SwissProt formats); flexible alignment output 
(choose one of: old Clustal format, NBRF/PIR, GCG msf format or 
Phylip format); full command line interface (everything that you can 
do interactively can be specified on the command line).
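
As an illustration of how automatic format recognition can work, the 
sketch below guesses the format from the first non-blank line of a 
file.  It is not taken from the ClustalV source: the names seq_format 
and guess_format are hypothetical, and the test is a simple heuristic 
based on the usual conventions that EMBL/SwissProt entries start with 
an "ID" line, NBRF/PIR entries start with '>' followed by a 
two-character type code and ';', and Pearson (Fasta) entries start 
with a plain '>' header.

```c
#include <string.h>

/* Hypothetical names for this sketch; not taken from the ClustalV source. */
typedef enum { FMT_UNKNOWN, FMT_NBRF_PIR, FMT_PEARSON, FMT_EMBL } seq_format;

/* Guess the sequence format from the first non-blank line of a file. */
seq_format guess_format(const char *first_line)
{
    if (first_line == NULL)
        return FMT_UNKNOWN;

    /* EMBL and SwissProt entries open with an "ID" line code. */
    if (strncmp(first_line, "ID   ", 5) == 0)
        return FMT_EMBL;

    if (first_line[0] == '>') {
        /* NBRF/PIR: '>' + two-character type code + ';', e.g. ">P1;". */
        if (strlen(first_line) >= 4 && first_line[3] == ';')
            return FMT_NBRF_PIR;
        /* Any other '>' header is treated as Pearson (Fasta). */
        return FMT_PEARSON;
    }

    return FMT_UNKNOWN;
}
```

For example, guess_format(">P1;HBA_HUMAN") gives FMT_NBRF_PIR, while 
guess_format(">HBA_HUMAN alpha globin") gives FMT_PEARSON.  A real 
reader would also need to skip leading blank lines and cope with 
unrecognised input.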

In version 7 of the GCG package, there is a program called PILEUP 
which uses a very similar algorithm to the one in ClustalV.  There 
are 2 main differences between the programs: 1) the metric used to 
compare the sequences for the initial "guide tree" uses a full 
global, optimal alignment in PILEUP instead of the fast, approximate 
ones in ClustalV.  This makes PILEUP much slower for the comparison 
of long sequences.  In principle, the distances calculated from 
PILEUP will be more sensitive than ours, but in practice it will not 
make much difference, except in difficult cases.  2)  During the 
multiple alignment, terminal gaps are penalised in ClustalV but not 
in PILEUP.  This will make the PILEUP alignments better when the 
sequences are of very different lengths (it has no effect if there 
are no large terminal gaps).   
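
The effect of the second difference can be illustrated with a small 
sketch.  It assumes a simple affine gap cost (an opening cost plus a 
cost per gap position), which is an assumption made for illustration 
only; the actual scoring used by ClustalV and PILEUP is not described 
here.  The function name gap_cost and its parameters are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Sum an assumed affine gap cost over one row of an alignment, where
 * '-' marks a gap position.  If penalise_terminal is zero, runs of
 * gaps that touch either end of the row cost nothing. */
int gap_cost(const char *row, int gap_open, int gap_extend,
             int penalise_terminal)
{
    size_t len = strlen(row);
    size_t i = 0;
    size_t start;
    int terminal;
    int cost = 0;

    while (i < len) {
        if (row[i] != '-') {
            i++;
            continue;
        }
        /* Find the extent of this run of gaps: [start, i). */
        start = i;
        while (i < len && row[i] == '-')
            i++;
        terminal = (start == 0 || i == len);
        if (penalise_terminal || !terminal)
            cost += gap_open + gap_extend * (int)(i - start);
    }
    return cost;
}
```

With gap_open = 10 and gap_extend = 1, the row "---ACGT-AC--" costs 
36 when terminal gaps are penalised and 11 when they are not, so a 
short sequence aligned against much longer ones is charged far less 
under the second scheme.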


This software may be distributed and used freely, provided that you 
do not modify it or this documentation in any way without the 
permission of the authors.  

If you wish to refer to ClustalV, please cite: 
Higgins,D.G., Bleasby,A.J. and Fuchs,R. (1991) ms. submitted to 
CABIOS.  

The overall multiple alignment algorithm was described in:
Higgins,D.G. and Sharp,P.M. (1989).  Fast and sensitive multiple 
sequence alignments on a microcomputer.  CABIOS, vol. 5, 151-153.


ACKNOWLEDGEMENTS.

D.H. would particularly like to thank Paul Sharp, in whose lab this 
work originated.  We also thank Manolo Gouy, Gene Myers, Peter Rice 
and Martin Vingron for suggestions, bug-fixes and help.    

Des Higgins and Rainer Fuchs, 
EMBL Data Library, Heidelberg, Germany.

Alan Bleasby,  
Daresbury, UK.

JUNE 1991
*******************************************************************

		2.  Installation.


As far as possible, we have tried to make ClustalV portable to any 
machine with a standard C compiler (proposed ANSI C standard).  The 
source code, as supplied by us, has been compiled and tested using 
the following compilers:

VAX/VMS C
Ultrix C (on a DECstation 2100)
Gnu C on a Sun 4 workstation
Think C on an Apple Macintosh SE
Turbo C on an IBM AT.

In each case, one must make 1 change to 1 line of code in 1 header 
file.  This is described below.  The exact capacity of the program 
(how many sequences of what length can be aligned) will depend of 
course on available memory but can also be set in this header file.

The package comes as 9 C source files, 3 header files, 1 file of 
on-line help, this documentation file and 3 makefiles:

Source code:	clustalv.c, amenu.c, gcgcheck.c, myers.c, sequence.c, 
			showpair.c, trees.c, upgma.c, util.c

Header files:	clustalv.h, general.h, matrices.h

On-Line help:	clustalv.hlp  (must be renamed or defined as 
			clustalv_help except on PCs)

Documentation:	clustalv.doc (this file).

Makefiles:	makefile.sun (GNU C on Sun), makefile.vms (VAX/VMS), 
			makefile.ult (Ultrix).