[Clustalv.Readme -- Abstracted from clustalv.doc]

Clustal V Multiple Sequence Alignments.

Documentation (Installation and Usage).

Des Higgins
European Molecular Biology Laboratory
Postfach 10.2209
D-6900 Heidelberg
Germany.

higgins@EMBL-Heidelberg.DE

This document describes how to install and use ClustalV on various
machines. ClustalV is a complete upgrade and rewrite of the Clustal
package of multiple alignment programs (Higgins and Sharp, 1988 and
1989). The original programs were written in Fortran for
microcomputers running MSDOS. A complete alignment was carried out by
running 3 programs in succession. Later, these were merged into a
single menu driven program with on-line help, for VAX/VMS. ClustalV
was written in C and has all of the features of the old programs plus
many new ones. It has been compiled and tested using VAX/VMS C,
Decstation ULTRIX C, Gnu C for Sun workstations, Turbo C for IBM PC's
and Think C for Apple Mac's.

The original Clustal was written by Des Higgins while he was a
Post-Doc in the lab of Paul Sharp in the Genetics Department, Trinity
College, Dublin 2, Ireland.

The main feature of the old package was the ability to carry out
reliable multiple alignments of many sequences. The sensitivity of the
program is as good as that of any other program we have tried, with
the exception of the programs of Vingron and Argos (1991), and it runs
in reasonable time on a microcomputer. The programs of Vingron and
Argos are specialised for finding distant similarities between
proteins but require mainframes or workstations and are more difficult
to use.

The main new features are:

    profile alignments (alignments of old alignments);
    phylogenetic trees (Neighbor Joining trees calculated after
        multiple alignment, with a bootstrapping option);
    better sequence input (automatically recognise and read NBRF/PIR,
        Pearson (Fasta) or EMBL/SwissProt formats);
    flexible alignment output (choose one of: old Clustal format,
        NBRF/PIR, GCG msf format or Phylip format);
    full command line interface (everything that you can do
        interactively can be specified on the command line).

In version 7 of the GCG package, there is a program called PILEUP
which uses a very similar algorithm to the one in ClustalV. There are
2 main differences between the programs:

1) The metric used to compare the sequences for the initial "guide
tree" is based on a full global, optimal alignment in PILEUP instead
of the fast, approximate alignments used in ClustalV. This makes
PILEUP much slower for the comparison of long sequences. In principle,
the distances calculated by PILEUP will be more sensitive than ours,
but in practice it will not make much difference, except in difficult
cases.

2) During the multiple alignment, terminal gaps are penalised in
ClustalV but not in PILEUP. This will make the PILEUP alignments
better when the sequences are of very different lengths (it has no
effect if there are no large terminal gaps).

This software may be distributed and used freely, provided that you do
not modify it or this documentation in any way without the permission
of the authors.

If you wish to refer to ClustalV, please cite:
Higgins,D.G. Bleasby,A.J. and Fuchs,R. (1991) ms. submitted to CABIOS.

The overall multiple alignment algorithm was described in:
Higgins,D.G. and Sharp,P.M. (1989). Fast and sensitive multiple
sequence alignments on a microcomputer. CABIOS, vol. 5, 151-153.

ACKNOWLEDGEMENTS.

D.H. would particularly like to thank Paul Sharp, in whose lab. this
work originated.
We also thank Manolo Gouy, Gene Myers, Peter Rice and Martin Vingron
for suggestions, bug-fixes and help.

Des Higgins and Rainer Fuchs,
EMBL Data Library, Heidelberg, Germany.

Alan Bleasby,
Daresbury, UK.

JUNE 1991

*******************************************************************

2. Installation.

As far as possible, we have tried to make ClustalV portable to any
machine with a standard C compiler (proposed ANSI C standard). The
source code, as supplied by us, has been compiled and tested using the
following compilers:

    VAX/VMS C
    Ultrix C (on a Decstation 2100)
    Gnu C on a Sun 4 workstation
    Think C on an Apple Macintosh SE
    Turbo C on an IBM AT.

In each case, one must make 1 change to 1 line of code in 1 header
file. This is described below. The exact capacity of the program (how
many sequences of what length can be aligned) will depend of course on
available memory but can also be set in this header file.

The package comes as 9 C source files; 3 header files; 1 file of
on-line help; this documentation file; 3 make files:

Source code:    clustalv.c, amenu.c, gcgcheck.c, myers.c, sequence.c,
                showpair.c, trees.c, upgma.c, util.c

Header files:   clustalv.h, general.h, matrices.h

On-Line help:   clustalv.hlp (must be renamed or defined as
                clustalv_help except on PC's)

Documentation:  clustalv.doc (this file).

Makefiles:      makefile.sun (gnu c on Sun), makefile.vms (vax/vms),
                makefile.ult (ultrix).