SHELXL-93
---------

SHELXL-93 is a FORTRAN-77 program for the refinement of crystal structures from diffraction data, and is primarily designed for single crystal X-ray data at atomic resolution. It is intended to be easy to install and use on a wide variety of computers, and replaces the structure-refining part of SHELX-76. SHELXL-93 is general and efficient for all space groups in all settings and there are no arbitrary limits to the size of problems which can be handled, except for the total memory available to the program. All instructions are in machine independent free format, with extensive use of default settings to minimize the amount of input required from the user. Instructions and data are taken from two standard (ASCII) text files, so that input files can easily be transferred between different computers.

SHELXL-93 is provided in source form as well as precompiled PC versions. An application form is reproduced in Appendix F. The program is available free to academics (and for a modest license fee to commercial institutions) subject to the condition that it is acknowledged in all publications which report structures refined with it.

SHELXL-93 has been designed particularly for optimum performance on computers with vector and pipelined architectures, and most vectorizing compilers achieve a high degree of vectorization for all important routines without special action on the part of the user. The rate-determining routines are provided in a separate file so that they may be compiled with full optimization; it may well prove counter-productive to optimize or vectorize the rest of the program! The distribution and installation of the program is discussed further in Appendix E.

Two auxiliary programs are provided for use with SHELXL-93. PDBINS reads a PDB-format file for a protein and interactively generates a SHELXL-93 '.ins' input file, and CIFTAB reads the '.cif' output file from SHELXL-93 and produces various tables.
Users are encouraged to adapt CIFTAB and PDBINS to local circumstances. PDBINS and CIFTAB are described in Appendices B and C respectively.

PROGRAM AND FILE ORGANIZATION

The way of running SHELXL-93 and the conventions for filenames will of course vary for different computers and operating systems, but the following general concept should be adhered to as much as possible. SHELXL-93 may be run on-line by means of the command:

   shelxl name

where 'name' defines the first component of the filename for all files which correspond to a particular crystal structure. On some systems, 'name' may not be longer than 8 characters. Batch operation will normally require the use of a short batch file containing the above command etc.

Before starting SHELXL-93, two ASCII input files must be prepared. The file 'name.ins' contains instructions, crystal and atom data etc. The reflection data file 'name.hkl' contains one line per reflection in the same fixed format as for SHELX-76; batch numbers, wavelength (for Laue data) and 'direction cosines' (for absorption corrections using SHELXA) are optional.

Although both files are essentially upwardly compatible with SHELX-76 and the Siemens SHELXTL system, there are many new facilities and some important philosophical differences. When converting '.ins' files from these programs to SHELXL-93, it is a good idea to DELETE or modify all WGHT, OMIT, BLOC, ILSF and MERG instructions (because of changes in their specifications), to review all AFIX instructions for possible differences or more appropriate new options, and to free any coordinates which have been fixed to anchor the molecule in polar space groups (the program has a better way of doing this). The Fourier grid is now larger and the asymmetric unit is found automatically, so FMAP and GRID should be replaced by e.g. 'FMAP 2'.
Free variables are no longer required for special position constraints or handling multiply occupied sites (see EXYZ and EADP) but are still legal and are interpreted in the same way. Disordered groups will probably require the addition of PART instructions and may benefit from some of the new restraints (SAME, SADI, FLAT, DELU and SIMU etc.).

A brief summary of the progress of the structure refinement appears on the console (i.e. the standard FORTRAN output), and a full listing is written to a file 'name.lst', which can be printed or examined with a text editor. After each refinement cycle a file 'name.res' is (re)written; it is similar to 'name.ins', but has updated values for all refined parameters. It may be copied or edited to name.ins for the next refinement run. Optionally further files 'name.cif' (refinement results) and 'name.fcf' (reflection data) may be created (using the ACTA instruction) in CIF format for direct publication, archiving and input to other programs (e.g. CIFTAB - see Appendices C and D).

Two mechanisms are provided for interaction with a SHELXL-93 job which is already running. The first, which it is not possible to implement on all computer systems, applies to 'on-line' runs. If the key combination is hit, the job terminates almost immediately, but without the loss of output buffers etc. which can happen with etc. Usually the key may be used as an alternative to . If the key is hit during least-squares refinement, the program completes the current cycle and then, instead of further refinement cycles, continues with the final structure-factor calculation, tables and Fourier etc. Otherwise has no effect. On computer consoles with no key, or usually have the same effect.

The second mechanism requires the user to create the file 'name.fin' (the contents of this file are irrelevant); the program tries at regular intervals to delete it, and if it succeeds it takes the same action as after .
The name.fin file is also deleted (if found) at the start of a job in case it has been accidentally left over from a previous job. This approach may be used with batch jobs under most operating systems. The UNIX version of SHELXL-93 is able to read the '.ins' and '.hkl' files in either UNIX or DOS format, and writes the '.res', '.cif' and '.fcf' files in DOS format, so that PC's can access such files via a shared disk without the need for conversion programs such as DOS2UNIX etc. The program may be compiled without this option if necessary. For reasons of efficiency the '.lst' file is always in the local format. Note that for UNIX systems all filenames associated with SHELXL-93 should be in lower case. The program uses two large arrays A and B dynamically, so the limits on the size of structure which can be handled are determined by the dimensions of these two arrays and also of the array C; A, B and C are defined as separate COMMON blocks. The standard version of the program is dimensioned for up to 1500 parameters in each full-matrix block and roughly 5000 atoms (assuming a generous number of restraints etc.), and is suitable for a typical (UNIX) workstation (or mainframe) with 8MB or more physical memory. The standard precompiled PC version is similarly dimensioned but will automatically run as a virtual memory program if less memory is available; it thus requires 8MB of free contiguous disk space (plus another 2MB or so for scratch files) and an 80586, 80486 or 80386/80387 processor. A real mode precompiled PC version PCSHELXL.EXE is also available which should run on virtually ANY PC with a coprocessor and 640K memory; however it is restricted to 300 full-matrix parameters and is somewhat slower. It may be necessary to redimension A, B and C and recompile the program for specific installations, e.g. to fit within a given job category on a mainframe. 
The highest elements of A and B actually used for the various calculations are printed out by the program (after 'Memory required ='). The program will try to use all available physical (and virtual) memory rather than performing its own disk I/O, thereby achieving longer vector 'runs', which enhances performance on vector and pipelined systems. In some cases, e.g. when a large structure is refined on a MicroVAX or PC with limited physical memory (or allocation of physical memory to a given process in the case of the VAX) this strategy may cause excessive 'paging' and disk I/O. If this happens, the maximum vector run length can be reduced by setting the 4th parameter on the L.S. instruction or by reducing the value of the variable IV in the main program and recompiling; it may also be more efficient to 'block' the refinement or use the CGLS option.

THE '.ins' INSTRUCTION FILE - GENERAL ORGANIZATION

All instructions commence with a four (or fewer) character word (which may be an atom name); numbers and other information follow in free format, separated by one or more spaces. Upper and lower case input may be freely mixed; with the exception of the text string input using TITL, the input is converted to upper case for internal use in SHELXL-93. The TITL, CELL, ZERR, LATT (if required), SYMM (if required), SFAC, DISP (if required) and UNIT instructions must be given in that order; all remaining instructions, atoms, etc. should come between UNIT and the last instruction, which is always HKLF (to read in reflection data). There is also a facility (which may not be possible under some operating systems) for reading instructions from (possibly nested) 'include files' by inserting the line '+filename' at the appropriate place in the '.ins' file.

A number of instructions allow atom names to be referenced; use of such instructions without any atom names means 'all non-hydrogen atoms' (in the current residue, if one has been defined).
A list of atom names may also be abbreviated to the first atom, the symbol '>' (separated by spaces), and then the last atom; this means 'all atoms between and including the two named atoms but excluding hydrogens'. For further details of the atom list syntax, see 'RESI' as well as the following examples.

EXAMPLES OF SHELXL-93 STRUCTURE REFINEMENTS

The two test structures supplied with the program are intended to provide a good illustration of routine structure refinement with SHELXL-93. The output discussed here should not differ significantly from that of the test jobs, except that it has been abbreviated and there may be slight differences in the last decimal place caused by rounding errors.

==============================================================================
FIRST EXAMPLE (ags4):

The first example (provided as the files 'ags4.ins' and 'ags4.hkl') is the final refinement job for the polymeric inorganic structure Ag(NCSSSSCN)2 AsF6. This structure is described by H.W. Roesky, T. Gries, J. Schimkowiak and P.G. Jones in Angew. Chem. 98 (1986) 93-94 [Int. Edn. 25 (1986) 84-85] and was also used as the cover picture for the SHELXS-86 manual. Each ligand bridges two Ag+ ions so each silver is tetrahedrally coordinated by four nitrogen atoms. The silver, arsenic and one of the fluorine atoms lie on special positions.

Normally the four unique heavy atoms (from Patterson interpretation using SHELXS) would have been refined first isotropically and the remaining atoms found in a difference synthesis, and possibly an intermediate job would have been performed with the heavy atoms anisotropic and the light atoms isotropic. For test purposes we shall simply input the atomic coordinates which assumes isotropic U's of 0.05. In this job all atoms are to be made anisotropic (ANIS).
We shall further assume that a previous job has recommended the weighting scheme used here (WGHT) and shown that one reflection is to be suppressed in the refinement because it is clearly erroneous (OMIT). The first 9 instructions (TITL...UNIT) are the same for any SHELXS and SHELXL-93 job for this structure and define the cell dimensions, symmetry and contents. The Siemens SHELXTL program XPREP can be used to generate these instructions automatically for any space group etc. SHELXL-93 knows the scattering factors for the first 94 neutral atoms in the Periodic Table. Ten least-squares cycles are to be performed, and the ACTA instruction ensures that the CIF files 'ags4.cif' and 'ags4.fcf' will be written for archiving and publication purposes. ACTA also sets up the calculation of bond lengths and angles (BOND) and a final difference electron density synthesis (FMAP 2) with peak search (PLAN 20). The HKLF 4 instruction terminates the file and initiates the reading of the 'ags4.hkl' intensity data file. Users migrating from SHELX-76 should note that it is still legal to set up special position constraints on the x,y,z-coordinates, occupation factors, and Uij components (for upwards compatibility). However it is totally unnecessary because the program will do this automatically for any special position in any space group, conventional or otherwise. Similarly the program recognizes polar space groups (P-4 is non-polar) and applies appropriate restraints (H.D. Flack and D. Schwarzenbach, Acta Cryst., A44 (1988) 499-506), so it is no longer necessary to worry about fixing one or more coordinates to prevent the structure drifting along polar axes. It is not necessary to set the overall scale factor using an FVAR instruction for this initial job, because the program will itself estimate a suitable starting value. 
Comments may be included in the '.ins' file either as REM instructions or as the rest of a line following '!'; this latter facility has been used to annotate this example.

TITL AGS4 in P-4                         ! title of up to 76 characters
CELL 0.71073 8.381 8.381 6.661 90 90 90  ! wavelength and unit-cell
ZERR 1 .002 .002 .001 0 0 0              ! Z (formula-units/cell), cell esd's
LATT -1                                  ! non-centrosymmetric primitive lattice
SYMM -X, -Y, Z
SYMM Y, -X, -Z                           ! symmetry operators (x,y,z must be left out)
SYMM -Y, X, -Z
SFAC C AG AS F N S                       ! define scattering factor numbers
UNIT 4 1 1 6 4 8                         ! unit cell contents in same order
L.S. 10                                  ! 10 cycles full-matrix least-squares
ACTA                                     ! CIF-output, bonds, Fourier, peak search
OMIT -2 3 1                              ! suppress bad reflection
ANIS                                     ! convert all (non-H) atoms to anisotropic
WGHT 0.037 0.31                          ! weighting scheme
AG 2 .000 .000 .000
AS 3 .500 .500 .000
S1 6 .368 .206 .517                      ! atom name, SFAC number, x, y, z (usually
S2 6 .614 .966 .736                      ! followed by sof and U(iso) or Uij); the
C  1 .278 .095 .337                      ! program automatically generates special
N  5 .211 .030 .214                      ! position constraints
F1 4 .596 .325 -.007
F2 4 .500 .500 .246
HKLF 4                                   ! read h,k,l,Fo^2,sigma(Fo^2) from 'ags4.hkl'

The '.lst' listing file starts with a header followed by an echo of the above '.ins' file. After reading TITL...UNIT the program calculates the cell volume, F(000), absorption coefficient, cell weight and density. If the density is unreasonable, perhaps the unit-cell contents have been given incorrectly.
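The cell volume and density check described above can be reproduced by hand. The following is a minimal Python sketch, not taken from SHELXL-93 itself: the function names cell_volume and density are hypothetical, the volume uses the standard formula for a general (triclinic) cell, and the cell weight is assumed to be supplied by the caller as the summed atomic weights of the UNIT contents.

```python
import math


def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit-cell volume in A^3 from cell edges (A) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    # General triclinic formula; reduces to a*b*c when all angles are 90.
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg
                                 + 2.0 * ca * cb * cg)


def density(cell_weight, volume):
    """Density in g/cm^3 from cell weight (g/mol per cell) and volume (A^3)."""
    avogadro = 6.02214076e23
    # 1 A^3 = 1.0e-24 cm^3
    return cell_weight / (avogadro * volume * 1.0e-24)


if __name__ == "__main__":
    # CELL values from the ags4 example above.
    v = cell_volume(8.381, 8.381, 6.661, 90, 90, 90)
    print("Cell volume (A^3):", round(v, 2))
```

A density far from what is chemically plausible for the compound would then point to an error in the UNIT instruction or in Z, as noted above.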
The next items in the '.lst' file are the connectivity table and the symmetry operations used to include a shell of symmetry equivalent atoms (so that all unique bond lengths and angles can be found):

------------------------------------------------------------------------------
 Covalent radii and connectivity table for AGS4 in P-4

 C    0.770
 AG   1.440
 AS   1.210
 F    0.640
 N    0.700
 S    1.030

 Ag - N N_$4 N_$5 N_$3
 As - F2 F2_$6 F1_$7 F1_$6 F1_$1 F1
 S1 - C S2_$1
 S2 - S2_$2 S1_$1
 C - N S1
 N - C Ag
 F1 - As
 F2 - As

 Operators for generating equivalent atoms:

 $1   -x+1, -y+1, z
 $2   -x+1, -y+2, z
 $3   -x, -y, z
 $4   y, -x, -z
 $5   -y, x, -z
 $6   y, -x+1, -z
 $7   -y+1, x, -z
------------------------------------------------------------------------------

Note that in addition to symmetry operations generated by the program, one can also define operations with the EQIV instruction and then refer to the corresponding atoms with _$n in the same way. Thus:

EQIV $1 1-x, 1-y, z
EQIV $2 x, y-1, z
EQIV $3 1-x, -y, z
CONF S1 S2_$1 S2_$2 S1_$3

could have been included in 'ags4.ins' to calculate the S-S-S-S torsion angle. Only one new operator would have been required if S2 were bonded to S1 in the original atom list. If EQIV instructions are used, the program renumbers the other symmetry operators accordingly.

The next part of the output is concerned with the data reduction:

------------------------------------------------------------------------------
 1475 Reflections read, of which 0 rejected

 0 =< h =< 10, -9 =< k =< 10, 0 =< l =< 8, Max. 2-theta = 55.00

 0 Systematic absence violations

 Inconsistent equivalents etc.

 h k l     Fo^2   Sigma(Fo^2)   Esd of mean(Fo^2)
 3 4 0   387.25          8.54               47.78

 1 Inconsistent equivalents

 904 Unique reflections, of which 1 suppressed

 R(int) = 0.0165   R(sigma) = 0.0202   Friedel opposites not merged

 Maximum memory for data reduction = 955 / 9083
------------------------------------------------------------------------------

Throughout this documentation, Sigma with a capital S means a summation, and sigma with a small s is an esd. Fo^2 means the EXPERIMENTAL measurement, and so, despite the square, may possibly be slightly negative if the background is higher than the peak as a result of statistical fluctuations etc. R(int) and R(sigma) are defined as follows:

   R(int) = Sigma | Fo^2 - Fo^2(mean) | / Sigma [ Fo^2 ]

where both summations involve all input reflections for which more than one symmetry equivalent is averaged, but not the remaining reflections, and:

   R(sigma) = Sigma [ sigma(Fo^2) ] / Sigma [ Fo^2 ]

over all reflections in the merged list. Since these R-indices are based on F^2, they will tend to be about twice as large as the corresponding indices based on F. The 'esd of the mean' (in the table of inconsistent equivalents) is the rms deviation from the mean divided by the square root of (n-1), where n equivalents are combined for a given reflection. In estimating the sigma(F^2) of a merged reflection, the program uses the value obtained by combining the sigma(F^2) values of the individual contributors, unless the esd of the mean is larger, in which case it is used instead.

The memory statistics which appear at various points in the output give the highest elements of the A and B arrays used for the given calculation. Although it is easy to adjust these dimensions, it requires recompiling the program and will rarely be required. For example there is no limit on the number of reflections in this sort/merge stage - if there is less physical memory the program makes more use of the disk, which of course is slower.
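The definitions of R(int), R(sigma) and the 'esd of the mean' given above can be illustrated with a short Python sketch. It is not the program's own code: the function name merge_equivalents is hypothetical, and two details that the text leaves open are filled in by assumption, namely an unweighted mean for the merged Fo^2 and sqrt(Sigma sigma^2)/n for the combined sigma of the contributors.

```python
import math


def merge_equivalents(groups):
    """Merge symmetry equivalents and return (merged, R(int), R(sigma)).

    groups: list of lists of (Fo2, sigma) tuples, one inner list per
    unique reflection.
    """
    num_int = 0.0
    den_int = 0.0
    merged = []
    for group in groups:
        n = len(group)
        mean = sum(f for f, _ in group) / n            # assumption: unweighted mean
        sig = math.sqrt(sum(s * s for _, s in group)) / n  # assumption: combined sigma
        if n > 1:
            # esd of the mean: rms deviation from the mean / sqrt(n - 1)
            rms = math.sqrt(sum((f - mean) ** 2 for f, _ in group) / n)
            esd_mean = rms / math.sqrt(n - 1)
            sig = max(sig, esd_mean)                   # use the larger estimate
            # R(int) sums only over reflections with more than one equivalent
            num_int += sum(abs(f - mean) for f, _ in group)
            den_int += sum(f for f, _ in group)
        merged.append((mean, sig))
    r_int = num_int / den_int if den_int else 0.0
    r_sigma = sum(s for _, s in merged) / sum(f for f, _ in merged)
    return merged, r_int, r_sigma


if __name__ == "__main__":
    data = [[(100.0, 5.0), (110.0, 5.0)], [(50.0, 2.0)]]
    merged, r_int, r_sigma = merge_equivalents(data)
    print(merged, round(r_int, 4), round(r_sigma, 4))
```

In this toy input the single-measurement reflection contributes to R(sigma) but not to R(int), in line with the definitions above.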
Special position constraints are then generated and the statistics from the first least-squares cycle are listed (the output has been compacted to fit the page). The maximum vector length refers to the number of reflections processed simultaneously in the rate-determining calculations; usually the program utilizes all available memory to make this as large as possible, subject to a maximum of 511. This maximum may be reduced (but not increased) by means of the fourth parameter on the L.S. (or CGLS) instruction; this may be required to prevent unnecessary disk transfers when large structures are refined on virtual memory systems with limited physical memory. The number of parameters refined in the current cycle is followed by the total number of refinable parameters (here both are 55). ------------------------------------------------------------------------------ Special position constraints for Ag x = 0.0000 y = 0.0000 z = 0.0000 U22 = 1.0 * U11 U23 = 0 U13 = 0 U12 = 0 sof = 0.25000 Special position constraints for As x = 0.5000 y = 0.5000 z = 0.0000 U22 = 1.0 * U11 U23 = 0 U13 = 0 U12 = 0 sof = 0.25000 Special position constraints for F2 x = 0.5000 y = 0.5000 U23 = 0 U13 = 0 sof = 0.50000 Least-squares cycle 1 Maximum vector length =511 Memory required =1095/82388 wR2 = 0.5042 before cycle 1 for 903 data and 55 / 55 parameters GooF = S = 3.480; Restrained GooF = 3.480 for 0 restraints Weight = 1/[sigma^2(Fo^2)+(0.0370*P)^2+0.31*P] where P=(Max(Fo^2,0)+2*Fc^2)/3 ** Shifts scaled down to reduce maximum shift/esd from 17.32 to 15.00 ** N value esd shift/esd parameter 1 2.38015 0.04260 32.401 OSF 2 0.08362 0.00224 14.993 U11 Ag 5 0.02864 0.00580 -3.679 U33 As 11 0.08546 0.00781 4.543 U33 S1 23 -0.01788 0.00444 -4.027 U12 S2 47 0.14422 0.01515 6.218 U33 F1 52 0.13288 0.02330 3.558 U11 F2 Mean shift/esd = 2.053 Maximum = 32.401 for OSF Max. shift = 0.055 A for C Max. 
dU = 0.049 for F2 ------------------------------------------------------------------------------ Only the largest shift/esd's are printed. More output could have been obtained using 'MORE 2' or 'MORE 3'. The largest correlation matrix elements are printed after the last cycle, in which the mean and maximum shift/esd have been reduced to 0.002 and 0.012 respectively. This is followed by the full table of refined coordinates and Uij's with esd's (too large to include here, but similar to the corresponding table in SHELX-76 except that Ueq and its esd are also printed) and by a final structure factor calculation: ------------------------------------------------------------------------------ Final Structure Factor Calculation for AGS4 in P-4 Total number of l.s. parameters = 55 Maximum vector length = 511 wR2 = 0.0779 before cycle 11 for 903 data and 2 / 55 parameters GooF = S = 1.063; Restrained GooF = 1.063 for 0 restraints Weight = 1/[sigma^2(Fo^2)+(0.0370*P)^2+0.31*P] where P=(Max(Fo^2,0)+2*Fc^2)/3 R1 = 0.0322 for 818 Fo > 4.sigma(Fo) and 0.0370 for all 904 data wR2 = 0.0834, GooF = S = 1.138, Restrained GooF = 1.138 for all data Flack x parameter = 0.0224 with esd 0.0260 (expected values are 0 (within 3 esd's) for correct and +1 for inverted absolute structure) ------------------------------------------------------------------------------ There are some important points to note here. The weighted R-index based on Fo^2 is (for compelling statistical reasons) much higher than the conventional R-index based on Fo with a threshold of say Fo > 4.sigma(Fo). For comparison with structures refined against F the latter is therefore printed as well (as R1). Despite the fact that wR2 and not R1 is the quantity minimized, R1 has the advantage that it is relatively insensitive to the weighting scheme, and so is more difficult to manipulate. 
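The weighting scheme printed in the listing above, together with the two R-indices being compared, can be sketched in Python as follows. The weight formula is copied from the output; the expressions for wR2 and R1 are not spelled out in this section, so the usual definitions (wR2 = sqrt(Sigma w(Fo^2-Fc^2)^2 / Sigma w(Fo^2)^2) and R1 = Sigma ||Fo|-|Fc|| / Sigma |Fo|) are assumed here, and all function names are hypothetical.

```python
import math


def weight(fo2, sig, fc2, a, b):
    """w = 1/[sigma^2(Fo^2) + (a*P)^2 + b*P], P = (Max(Fo^2,0) + 2*Fc^2)/3."""
    p = (max(fo2, 0.0) + 2.0 * fc2) / 3.0
    return 1.0 / (sig * sig + (a * p) ** 2 + b * p)


def wr2(data, a, b):
    """Weighted R-index on F^2 (assumed standard definition).

    data: list of (Fo2, sigma(Fo2), Fc2) tuples.
    """
    num = sum(weight(fo2, s, fc2, a, b) * (fo2 - fc2) ** 2 for fo2, s, fc2 in data)
    den = sum(weight(fo2, s, fc2, a, b) * fo2 ** 2 for fo2, s, fc2 in data)
    return math.sqrt(num / den)


def r1(data):
    """Conventional R-index on F (assumed standard definition).

    Negative Fo^2 values are treated as Fo = 0 (an assumption of this sketch).
    """
    num = sum(abs(math.sqrt(max(fo2, 0.0)) - math.sqrt(fc2)) for fo2, _, fc2 in data)
    den = sum(math.sqrt(max(fo2, 0.0)) for fo2, _, fc2 in data)
    return num / den


if __name__ == "__main__":
    # Three invented reflections, with the WGHT values from the ags4 job.
    refl = [(100.0, 2.0, 90.0), (400.0, 5.0, 420.0), (25.0, 1.5, 20.0)]
    print("wR2 =", round(wr2(refl, 0.037, 0.31), 4), " R1 =", round(r1(refl), 4))
```

Because R1 does not involve the weights at all, changing the two WGHT parameters alters wR2 but leaves R1 untouched, which is the insensitivity referred to above.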
Since the structure is non-centrosymmetric, the program has automatically estimated the Flack absolute structure parameter x in the final structure factor summation. In this example x is within one esd of zero, and its esd is also relatively small. This provides strong evidence that the absolute structure has been assigned correctly, so that no further action is required. The program would have printed a warning here if it would have been necessary to 'invert' the structure. For further details see the section on 'absolute structure' below. The two parameters 'refined' ( 2 / 55 ) but not applied in the final structure factor cycle in this case are related to the overall scale and the Flack x parameter; no parameters are 'refined' in the final structure factor cycle for a centrosymmetric structure. This is followed by a list of principal mean square displacements U for all anisotropic atoms. It will be seen that none of the smallest components (in the third column) are in danger of going negative [which would make the atom 'non positive definite' (NPD)] but that the motion of the two unique fluorine atoms is highly anisotropic (not unusual for an AsF6 anion). The program suggests that the fluorine motion is so extended in one direction that it would be possible to represent each of the two fluorine atoms as disordered over two sites, for which x, y and z coordinates are given; this may safely be ignored here (although there may well be some truth in it). The two suggested new positions for each 'split' atom are placed equidistant from the current position along the direction (and reverse direction) corresponding to the largest eigenvalue of the anisotropic displacement tensor. 
This list is followed by the analysis of variance (reproduced here in squashed form), recommended weighting scheme (to give a flat analysis of variance in terms of Fc^2), and a list of the most disagreeable reflections (which clearly shows that the one reflection suppressed by OMIT is indeed an aberration). For a discussion of the analysis of variance see the second example. ------------------------------------------------------------------------------ Principal mean square atomic displacements U 0.1067 0.1067 0.0561 Ag 0.0577 0.0577 0.0386 As 0.1038 0.0659 0.0440 S1 0.0986 0.0515 0.0391 S2 0.0779 0.0729 0.0391 C 0.1004 0.0852 0.0474 N 0.3029 0.0954 0.0473 F1 may be split into 0.5965 0.3173 0.0288 and 0.5946 0.3324 -0.0369 0.4778 0.1671 0.0457 F2 may be split into 0.5320 0.5089 0.2462 and 0.4680 0.4911 0.2462 Analysis of variance for reflections employed in refinement K = Mean[Fo^2] / Mean[Fc^2] for group Fc/Fc(max) 0.000 0.026 0.039 0.051 0.063 0.082 0.103 0.147 0.202 0.306 1.0 Number in group 94. 89. 90. 91. 89. 91. 89. 91. 88. 91. GooF 1.096 1.101 0.997 1.078 1.187 1.069 1.173 0.922 1.019 0.966 K 1.560 1.053 1.010 1.004 1.007 1.021 1.026 1.002 0.997 0.984 Resolution(A) 0.77 0.81 0.85 0.90 0.95 1.02 1.10 1.22 1.40 1.74 inf Number in group 97. 84. 92. 91. 89. 90. 89. 90. 93. 88. 
GooF 1.067 0.959 0.935 0.895 1.035 1.040 1.115 1.149 1.161 1.228 K 1.047 1.010 1.009 0.991 1.004 0.996 0.989 1.012 0.997 0.982 R1 0.166 0.100 0.069 0.059 0.051 0.036 0.033 0.027 0.020 0.020 Recommended weighting scheme: WGHT 0.0329 0.3591 Most Disagreeable Reflections (* if suppressed) h k l Fo^2 Fc^2 Delta(F^2)/esd Fc/Fc(max) Resolution(A) * -2 3 1 43.53 7.44 11.14 0.029 2.19 4 4 4 18.32 33.30 3.51 0.062 1.11 -4 1 3 15.79 4.17 3.39 0.022 1.50 0 2 2 41.60 57.32 3.16 0.082 2.61 2 5 0 124.72 100.33 3.06 0.108 1.56 2 3 0 64.43 48.46 3.03 0.075 2.32 -5 4 1 11.04 2.57 2.90 0.017 1.28 2 5 3 42.27 55.48 2.60 0.080 1.27 6 5 2 6.43 1.02 2.56 0.011 1.02 4 6 2 20.16 11.98 2.55 0.037 1.10 6 1 1 55.45 42.28 2.51 0.070 1.35 6 0 5 104.65 126.19 2.49 0.121 0.96 4 1 2 139.30 116.95 2.44 0.117 1.74 9 0 3 39.34 26.06 2.44 0.055 0.86 2 4 4 371.53 327.01 2.36 0.195 1.24 4 3 5 55.69 43.02 2.33 0.071 1.04 -3 6 0 7.51 3.10 2.25 0.019 1.25 -1 4 2 142.05 120.53 2.22 0.119 1.74 0 10 1 2.01 8.31 2.21 0.031 0.83 -2 1 2 1497.02 1361.86 2.20 0.399 2.49 ------------------------------------------------------------------------------ After the table of bond lengths and angles (BOND was implied by the ACTA instruction), the data are merged (again) for the Fourier calculation after correcting for dispersion (because the electron density is real). In contrast to the initial data reduction, Friedel's law is assumed here; the aim is to set up a unique reflection list so that the (difference) electron density can be calculated on an absolute scale. The algorithm for generating the 'asymmetric unit' for the Fourier calculations is general for all space groups, in conventional settings or otherwise. The rms electron density (averaged over all grid points) is printed as well as the maximum and minimum values so that the significance of the latter can be assessed. 
Since PLAN 20 was assumed, only a peak list is printed (and written to the .res file), followed by a list of shortest distances between peaks (not shown below); PLAN -20 would have produced a more detailed analysis with 'printer plots' of the structure. The last 40 peaks and some of the interatomic distances have been deleted here to save space. In this table, 'distances to nearest atoms' takes symmetry equivalents into account. ------------------------------------------------------------------------------ Bond lengths and angles [severely squashed to fit 80 columns!] Ag - Distance Angles N 2.279(0.006) N_$4 2.279(0.006) 113.08(0.15) N_$5 2.279(0.006) 113.08(0.15) 102.47(0.29) N_$3 2.279(0.006) 102.47(0.29) 113.08(0.16) 113.08(0.15) Ag - N N_$4 N_$5 As - Distance Angles F2 1.640(0.007) F2_$6 1.640(0.007)180.00(0.00) F1_$7 1.672(0.004) 89.08(0.41) 90.92(0.41) F1_$6 1.672(0.004) 89.08(0.41) 90.92(0.41)178.18(0.82) F1_$1 1.672(0.004) 90.92(0.41) 89.08(0.41) 90.01(0.01) 90.01(0.01) F1 1.672(0.004) 90.92(0.41) 89.08(0.41) 90.01(0.01) 90.01(0.01)178.18(0.82) As - F2 F2_$6 F1_$7 F1_$6 F1_$1 S1 - Distance Angles C 1.682(0.007) S2_$1 2.063(0.003) 98.61(0.20) S1 - C S2 - Distance Angles S2_$2 2.011(0.003) S1_$1 2.063(0.003) 105.37(0.07) S2 - S2_$2 C - Distance Angles N 1.147(0.007) S1 1.682(0.007) 175.67(0.49) C - N N - Distance Angles C 1.147(0.007) Ag 2.279(0.006) 152.38(0.45) N - C F1 - Distance Angles As 1.672(0.004) F1 - F2 - Distance Angles As 1.640(0.007) F2 - FMAP and GRID set by program FMAP 2 3 18 GRID -3.333 -2 -1 3.333 2 1 R1 = 0.0370 for 590 unique reflections after merging for Fourier Highest memory used 768 / 6109 Electron density synthesis with coefficients Fo-Fc Maximum = 0.32, Minimum = -0.35 e/A^3, Highest memory used = 768/13827 Mean = 0.00, Rms deviation from mean = 0.07 e/A^3 Fourier peaks appended to .res file x y z sof U Peak Dist to nearest atoms Q1 1 0.0000 0.0000 0.5000 0.25000 0.05 0.32 2.60 N 2.69 C 3.33 AG Q2 1 0.5691 0.3728 0.1623 1.00000 0.05 
0.27 1.20 F1 1.34 F2 1.62 AS Q3 1 0.5685 0.3851 -0.1621 1.00000 0.05 0.24 1.19 F1 1.25 F2 1.56 AS Q4 1 0.4075 0.4717 0.2378 1.00000 0.05 0.23 0.81 F2 1.78 AS 1.79 F1 Q5 1 0.5848 0.2667 0.0312 1.00000 0.05 0.23 0.55 F1 2.09 AS 2.47 F1 Q6 1 0.5495 0.3425 -0.1122 1.00000 0.05 0.21 0.83 F1 1.57 AS 1.65 F2 Q7 1 0.2617 -0.1441 0.1446 1.00000 0.05 0.20 1.59 N 2.17 F1 2.40 C Q8 1 0.7221 0.1898 0.0030 1.00000 0.05 0.20 1.55 F1 2.39 N 2.54 N Q9 1 0.1997 0.0293 0.1024 1.00000 0.05 0.19 0.75 N 1.79 C 1.82 AG Q10 1 0.5394 1.0113 0.8165 1.00000 0.05 0.19 0.91 S2 1.41 S2 2.82 S1 ============================================================================== SECOND EXAMPLE (sigi): In the second example (provided as the files 'sigi.ins' and 'sigi.hkl') a small organic structure is refined in the space group P-1. Only the features that are different from the ags4 refinement will be discussed in detail. The structure consists of a five-membered lactone [-C7-C11-C8-C4(O1)-O3-] with a -CH2-OH group [-C5-O2] attached to C7 and a =C(CH3)(NH2) unit [=C9(C10)N6] double-bonded to C8. Of particular interest here is the placing and refinement of the 11 hydrogen atoms via HFIX instructions. The two -CH2- groups (C5 and C11) and one tertiary CH (C7) can be placed geometrically by standard methods; the algorithms have been improved relative to those used in SHELX-76, and the hydrogen atoms are now idealized before each refinement cycle (and after the last). Since N6 is attached to a conjugated system, it is reasonable to assume that the -NH2 group is coplanar with the C8=C9(C10)-N6 unit, which enables these two hydrogens to be placed as ethylenic hydrogens, which requires HFIX (or AFIX) 9n; the program takes into account that they are bonded to nitrogen in setting the default bond lengths. All these hydrogens are to be refined using a 'riding model' (HFIX or AFIX m3) for x, y and z. 
The -OH and -CH3 groups are trickier, in the latter case because C9 is sp2-hybridized, so the potential barrier to rotation is low and there is no fully staggered conformation available as the obvious choice. Since the data are reasonable, the initial torsion angles for these two groups can be found by means of difference electron density syntheses calculated around the circles which represent the loci of all possible hydrogen atom positions. The torsion angles are then refined during the least-squares refinement. Note that in subsequent cycles (and jobs) these groups will be re-idealized geometrically with RETENTION of the current torsion angle; the circular Fourier calculation is performed only once. Two 'free variables' (2 and 3 - yes, they still exist!) have been assigned to refine common isotropic displacement parameters for the 'rigid' and 'rotating' hydrogens respectively. If these had not been specified, the default action would have been to hold the hydrogen U values at 1.2 times the equivalent isotropic U of the atoms to which they are attached (1.5 for the -OH and methyl groups).

The 'sigi.ins' file (which is provided as a test job) is as follows. Note that for instructions with both numerical parameters and atom names such as HFIX and MPLA, it does not matter whether numbers or atoms come first, but the order of the numerical parameters themselves (and in some cases the order of the atoms) is important.

------------------------------------------------------------------------------
TITL SIGI in P-1
CELL 0.71073 6.652 7.758 8.147 73.09 75.99 68.40
ZERR 2 .002 .002 .002 .03 .03 .03
SFAC C H N O
UNIT 14 22 2 6        ! no LATT and SYMM needed for space group P-1
L.S. 4
EXTI 0.001            ! refine an isotropic extinction parameter
WGHT .060 0.15        ! (suggested by program in last job); WGHT
OMIT 2 8 0            ! and OMIT are also based on previous output
BOND $H               ! include H in bond lengths / angles table
CONF                  ! all torsion angles except involving hydrogen
FMAP 2                ! Fo-Fc Fourier
PLAN -20              ! printer plots and full analysis of peak list
HFIX 147 31 O2        ! initial location of -OH and -CH3 hydrogens from
HFIX 137 31 C10       ! circular Fourier, then refine torsion, U(H)=fv(3)
HFIX 93 21 N6         ! -NH2 in plane, xyz ride on N, U(H)=fv(2)
HFIX 23 21 C5 C11     ! two -CH2- groups, xyz ride on C, U(H)=fv(2)
HFIX 13 21 C7         ! tertiary CH, xyz ride on C, U(H)=fv(2)
EQIV $1 X-1, Y, Z     ! define symmetry operation and tabulate H-bond
RTAB H..O H2 O1_$1    ! distance and angle to symmetry equivalent of O1
RTAB XHY O2 H2 O1_$1  ! 'H..O' and 'XHY' are table headings
RTAB H..O H6A O1      ! include intramolecular H-bond in tables
RTAB XHY N6 H6A O1
EQIV $2 X+1, Y, Z-1   ! include a further intermolecular H-bond in the
RTAB H..O H6B O2_$2   ! same tables; involves symmetry equivalent of O2
RTAB XHY N6 H6B O2_$2 ! l.s. planes through 5-ring and through
MPLA 5 C7 C11 C8 C4 O3 O1 N6 C9 C10 ! CNC=CCC moiety, then find deviations
MPLA 6 C10 N6 C9 C8 C11 C4 O1 O3 C7 ! of last 4 and 3 named atoms resp. too
FVAR 1 .06 .07        ! overall scale and free variables for U(H)
REM name sfac# x y z sof(+10 to fix it) U11 U22 U33 U23 U13 U12 follow
O1   4   0.30280  0.17175  0.68006  11.00000  0.02309  0.04802 =
         0.02540 -0.00301 -0.00597 -0.01547
O2   4  -0.56871  0.23631  0.96089  11.00000  0.02632  0.04923 =
         0.02191 -0.00958  0.00050 -0.02065
O3   4  -0.02274  0.28312  0.83591  11.00000  0.02678  0.04990 =
         0.01752 -0.00941 -0.00047 -0.02109
C4   1   0.10358  0.23458  0.68664  11.00000  0.02228  0.02952 =
         0.01954 -0.00265 -0.00173 -0.01474
C5   1  -0.33881  0.18268  0.94464  11.00000  0.02618  0.03480 =
         0.01926 -0.00311 -0.00414 -0.01624
N6   3   0.26405  0.17085  0.33925  11.00000  0.03003  0.04232 =
         0.02620 -0.01312  0.00048 -0.01086
C7   1  -0.25299  0.33872  0.82228  11.00000  0.02437  0.03111 =
         0.01918 -0.00828 -0.00051 -0.01299
C8   1  -0.03073  0.27219  0.55976  11.00000  0.02166  0.02647 =
         0.01918 -0.00365 -0.00321 -0.01184
C9   1   0.05119  0.24371  0.39501  11.00000  0.02616  0.02399 =
         0.02250 -0.00536 -0.00311 -0.01185
C10  1  -0.10011  0.29447  0.26687  11.00000  0.03877  0.04903 =
         0.02076 -0.01022 -0.00611 -0.01800
C11  1  -0.26553  0.36133  0.63125  11.00000  0.02313  0.03520 =
         0.01862 -0.00372 -0.00330 -0.01185
HKLF 4  ! read intensity data from 'sigi.hkl'; terminates '.ins' file
------------------------------------------------------------------------------

The data reduction reports 1904 reflections read with -7 <= h <= 7, -8 <= k <= 9 and -9 <= l <= 9. Note that these are the limiting index values; in fact only about 1.5 times the unique volume of reciprocal space was measured. The maximum 2-theta was 50.00, and there were no systematic absence violations, 34 (not seriously) inconsistent equivalents, and 1297 unique data, of which 1 was suppressed (by OMIT). R(int) was 0.0196 and R(sigma) 0.0151. It will be seen that the program uses different default distances to hydrogen for different bonding situations (these may be overridden by the user if desired, of course). These defaults depend on the temperature (set using TEMP) in order to allow for librational effects.
The list of default X-H distances is followed by the (squashed) circular difference electron syntheses to determine the C-OH and C-CH3 initial torsion angles: ------------------------------------------------------------------------------ Default effective X-H distances for T = 20.0 C AFIX m = 1 2 3 4 4[N] 3[N] 15[B] 8[O] 9 9[N] 16 d(X-H) = 0.98 0.97 0.96 0.93 0.86 0.89 1.10 0.82 0.93 0.86 0.93 Difference electron density (eA^-3x100) at 15 degree intervals for AFIX 147 group attached to O2. The center of the range is eclipsed (cis) to C7 and rotation is clockwise looking down C5 to O2 -2 0 1 0 0 0 -1 -5 -8 -9 -6 -2 2 5 9 16 29 42 48 39 23 9 0 -2 Difference electron density (eA^-3x100) at 15 degree intervals for AFIX 137 group attached to C10. The center of the range is eclipsed (cis) to N6 and rotation is clockwise looking down C9 to C10 34 37 39 41 38 30 20 15 19 28 39 47 50 43 29 15 12 19 29 35 33 27 25 29 After local symmetry averaging: 21 28 36 41 40 33 24 20 ------------------------------------------------------------------------------ It can be seen that the hydroxyl hydrogen is very clearly defined, but that the methyl group is rotating fairly freely (low potential barrier). After three-fold averaging, however, there is a single difference electron density maximum. The (squashed) least-squares refinement output follows: ------------------------------------------------------------------------------ Least-squares cycle 1 Maximum vector length =511 Memory required =1771/135569 wR2 = 0.1138 before cycle 1 for 1296 data and 105 / 105 parameters GooF = S = 1.134; Restrained GooF = 1.134 for 0 restraints Weight = 1/[sigma^2(Fo^2)+(0.0600*P)^2+0.15*P] where P=(Max(Fo^2,0)+2*Fc^2)/3 N value esd shift/esd parameter 1 0.97914 0.00386 -5.406 OSF 2 0.03486 0.00263 -9.959 FVAR 2 3 0.07515 0.00396 1.048 FVAR 3 4 0.02334 0.00951 2.349 EXTI Mean shift/esd = 0.911 Maximum = -9.959 for FVAR 2 Max. shift = 0.038 A for H10C Max. dU =-0.026 for H5A .......... 
etc (cycles 2 and 3 omitted) ......... Least-squares cycle 4 Maximum vector length =511 Memory required =1771/135569 wR2 = 0.1044 before cycle 4 for 1296 data and 105 / 105 parameters GooF = S = 1.025; Restrained GooF = 1.025 for 0 restraints Weight = 1/[sigma^2(Fo^2)+(0.0600*P)^2+0.15*P] where P=(Max(Fo^2,0)+2*Fc^2)/3 N value esd shift/esd parameter 1 0.97903 0.00361 -0.001 OSF 2 0.03607 0.00178 0.022 FVAR 2 3 0.07346 0.00379 -0.009 FVAR 3 4 0.02502 0.01089 -0.004 EXTI Mean shift/esd = 0.006 Maximum = -0.182 for tors H10A Max. shift = 0.003 A for H10B Max. dU = 0.000 for H5A Largest correlation matrix elements 0.509 U12 O2 / U22 O2 0.506 U12 O3 / U11 O3 0.508 U12 O2 / U11 O2 0.500 U12 O3 / U22 O3 Idealized hydrogen atom generation before cycle 5 Name x y z AFIX d(X-H) shift Bonded Conformation to determined by H2 -0.6017 0.2095 0.8833 147 0.820 0.000 O2 C5 H2 H5A -0.2721 0.0676 0.9001 23 0.970 0.000 C5 O2 C7 H5B -0.2964 0.1554 1.0576 23 0.970 0.000 C5 O2 C7 H6A 0.3572 0.1389 0.4085 93 0.860 0.000 N6 C9 C8 H6B 0.3073 0.1559 0.2347 93 0.860 0.000 N6 C9 C8 H7 -0.3331 0.4598 0.8575 13 0.980 0.000 C7 O3 C5 C11 H10A -0.2044 0.4191 0.2694 137 0.960 0.000 C10 C9 H10A H10B -0.1761 0.2034 0.2962 137 0.960 0.000 C10 C9 H10A H10C -0.0176 0.2950 0.1525 137 0.960 0.000 C10 C9 H10A H11A -0.3575 0.2948 0.6198 23 0.970 0.000 C11 C8 C7 H11B -0.3198 0.4943 0.5737 23 0.970 0.000 C11 C8 C7 ------------------------------------------------------------------------------ The final structure factor calculation, analysis of variance etc. produces the following edited output: ------------------------------------------------------------------------------ Final Structure Factor Calculation for SIGI in P-1 Total number of l.s. 
parameters = 105 Maximum vector length = 511 wR2 = 0.1044 before cycle 5 for 1296 data and 0 / 105 parameters GooF = S = 1.025; Restrained GooF = 1.025 for 0 restraints Weight = 1/[sigma^2(Fo^2)+(0.0600*P)^2+0.15*P] where P=(Max(Fo^2,0)+2*Fc^2)/3 R1 = 0.0365 for 1189 Fo > 4.sigma(Fo) and 0.0399 for all 1297 data wR2 = 0.1060, GooF = S = 1.042, Restrained GooF = 1.042 for all data Principal mean square atomic displacements U 0.0504 0.0254 0.0188 O1 0.0491 0.0229 0.0190 O2 0.0513 0.0194 0.0165 O3 0.0326 0.0208 0.0159 C4 0.0375 0.0204 0.0190 C5 0.0440 0.0320 0.0214 N6 0.0329 0.0201 0.0185 C7 0.0276 0.0190 0.0181 C8 0.0288 0.0220 0.0191 C9 0.0494 0.0353 0.0181 C10 0.0353 0.0215 0.0183 C11 Analysis of variance for reflections employed in refinement K = Mean[Fo^2] / Mean[Fc^2] for group Fc/Fc(max) 0.000 0.009 0.017 0.027 0.038 0.049 0.065 0.084 0.110 0.156 1.0 Number in group 135. 125. 130. 139. 119. 133. 130. 128. 131. 126. GooF 1.110 1.006 1.082 1.046 1.093 1.014 0.923 0.996 1.027 0.930 K 1.521 1.121 0.966 1.023 1.008 0.990 0.998 0.998 1.008 1.010 Resolution(A) 0.84 0.88 0.90 0.95 0.99 1.06 1.14 1.25 1.44 1.79 inf Number in group 136. 127. 128. 128. 136. 124. 128. 130. 130. 129. GooF 1.007 0.890 0.865 0.867 0.864 0.921 0.874 1.095 1.256 1.432 K 1.024 1.013 1.017 0.990 0.991 0.989 1.013 0.995 1.037 1.004 R1 0.062 0.049 0.051 0.046 0.034 0.034 0.031 0.039 0.039 0.037 Recommended weighting scheme: WGHT 0.0548 0.1468 ------------------------------------------------------------------------------ The analysis of variance should be examined carefully for indications of systematic errors. If the Goodness of Fit is significantly higher than unity and the scale factor K is appreciably lower than unity in the extreme right columns in terms of both Fc and resolution, then an extinction parameter should be refined (the program prints a warning in such a case). This does not show here because an extinction parameter is already being refined. 
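As an illustration of the K rows in the analysis of variance above, the following Python sketch computes K = Mean[Fo^2] / Mean[Fc^2] for groups of reflections ranked by Fc. It is not taken from the SHELXL-93 source: the function name, the plain-list inputs and the choice of roughly equal-population groups (suggested only by the 'Number in group' rows) are assumptions made for this example.

```python
def scale_factor_by_group(fo_sq, fc_sq, n_groups=10):
    """Return K = Mean[Fo^2] / Mean[Fc^2] for each group of reflections.

    Reflections are ranked by Fc^2 (equivalent to ranking by Fc) and split
    into n_groups groups of roughly equal size (an assumption of this sketch).
    Groups whose mean Fc^2 is zero are not handled here.
    """
    order = sorted(range(len(fc_sq)), key=lambda i: fc_sq[i])
    k_values = []
    for g in range(n_groups):
        start = g * len(order) // n_groups
        stop = (g + 1) * len(order) // n_groups
        members = order[start:stop]
        if not members:
            continue  # fewer reflections than groups
        mean_fo_sq = sum(fo_sq[i] for i in members) / len(members)
        mean_fc_sq = sum(fc_sq[i] for i in members) / len(members)
        k_values.append(mean_fo_sq / mean_fc_sq)
    return k_values
```

A K value well above unity in the first (weakest) group, as in the listing above, would show up as the first element of the returned list.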
The scale factor is a little high for the weakest reflections in this example; this may well be a statistical artifact and may be ignored (selecting the groups on Fc will tend to make Fo^2 greater than Fc^2 for this range). The increase in the GooF at low resolution (the 1.79 to infinity range) is caused in part by systematic errors in the model such as the use of scattering factors based on spherical atoms which ignore bonding effects, and is normal for purely light-atom structures (this interpretation is confirmed by the fact that difference electron density peaks are found in the middle of bonds). In extreme cases the lowest or highest resolution ranges can be conveniently suppressed by means of the SHEL instruction; this is normal practice in macromolecular refinements. The weighting scheme suggested by the program is designed to produce a flat analysis of variance in terms of Fc, but makes no attempt to fit the resolution dependence of the Goodness of Fit. It is also written to the end of the .res file, so that it is easy to update it before the next job. In the early stages of refinement it is better to retain the default scheme of WGHT 0.1; the updated parameters should not be incorporated in the next '.ins' file until all atoms have been found and at least the heavier atoms refined anisotropically. 
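The weighting expression printed in the listings above, Weight = 1/[sigma^2(Fo^2)+(a*P)^2+b*P] with P=(Max(Fo^2,0)+2*Fc^2)/3, can be written out as a short Python sketch. The function and argument names are hypothetical and only transcribe the printed formula; the default a = 0.1 follows the default scheme WGHT 0.1 mentioned above, while b = 0 as the second default is an assumption.

```python
def shelxl_weight(fo_sq, sigma_fo_sq, fc_sq, a=0.1, b=0.0):
    """Weight of one reflection from the formula printed in the output.

    fo_sq       : observed Fo^2 (may be negative for weak reflections)
    sigma_fo_sq : estimated standard deviation sigma(Fo^2)
    fc_sq       : calculated Fc^2
    a, b        : the two WGHT parameters (b = 0 default is an assumption)
    """
    p = (max(fo_sq, 0.0) + 2.0 * fc_sq) / 3.0
    return 1.0 / (sigma_fo_sq ** 2 + (a * p) ** 2 + b * p)
```

For example, with the sigi values WGHT .060 0.15, a reflection with Fo^2 = Fc^2 = 100 and sigma(Fo^2) = 2 has P = 100 and a weight of 1/(4 + 36 + 15) = 1/55.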
The list of most disagreeable reflections and tables of bond lengths and angles (BOND $H - omitted here) and torsion angles (CONF) are followed by the RTAB and MPLA tables: ------------------------------------------------------------------------------ Selected torsion angles -175.08 ( 0.12) C7 - O3 - C4 - O1 5.72 ( 0.15) C7 - O3 - C4 - C8 109.70 ( 0.12) C4 - O3 - C7 - C5 -11.64 ( 0.15) C4 - O3 - C7 - C11 171.12 ( 0.10) O2 - C5 - C7 - O3 -72.04 ( 0.15) O2 - C5 - C7 - C11 -1.47 ( 0.24) O1 - C4 - C8 - C9 177.61 ( 0.12) O3 - C4 - C8 - C9 -176.27 ( 0.14) O1 - C4 - C8 - C11 2.81 ( 0.16) O3 - C4 - C8 - C11 3.09 ( 0.22) C4 - C8 - C9 - N6 176.93 ( 0.13) C11 - C8 - C9 - N6 -177.23 ( 0.13) C4 - C8 - C9 - C10 -3.38 ( 0.22) C11 - C8 - C9 - C10 176.04 ( 0.13) C9 - C8 - C11 - C7 -9.39 ( 0.14) C4 - C8 - C11 - C7 12.36 ( 0.14) O3 - C7 - C11 - C8 -104.74 ( 0.13) C5 - C7 - C11 - C8 Distance H..O 2.041 (0.003) H2 - O1_$1 2.225 (0.002) H6A - O1 2.172 (0.002) H6B - O2_$2 Angle XHY 174.03 (2.37) O2 - H2 - O1_$1 129.29 (0.05) N6 - H6A - O1 155.07 (0.05) N6 - H6B - O2_$2 Least-squares planes (x,y,z in crystal coordinates) and deviations from them (* indicates atom used to define plane) 2.344 (0.004) x + 7.411 (0.004) y - 0.015 (0.005) z = 1.978 (0.004) * -0.074 (0.001) C7 * 0.068 (0.001) C11 * -0.042 (0.001) C8 * -0.006 (0.001) C4 * 0.054 (0.001) O3 -0.006 (0.002) O1 -0.098 (0.003) N6 -0.056 (0.002) C9 -0.031 (0.003) C10 Rms deviation of fitted atoms = 0.055 2.544 (0.004) x + 7.349 (0.004) y - 0.166 (0.004) z = 1.863 (0.003) Angle to previous plane (with approximate esd) = 2.45 ( 0.07 ) * 0.005 (0.001) C10 * 0.008 (0.001) N6 * -0.005 (0.001) C9 * -0.034 (0.001) C8 * 0.013 (0.001) C11 * 0.012 (0.001) C4 0.057 (0.002) O1 0.021 (0.002) O3 -0.154 (0.002) C7 Rms deviation of fitted atoms = 0.016 ------------------------------------------------------------------------------ All esd's printed by the program are calculated rigorously from the full covariance matrix, except for the angle between 
two least-squares planes, which involves some approximations. The contributions to the esds in bond lengths, angles and torsion angles also take the errors in the unit-cell parameters (as input on the ZERR instruction) rigorously into account; an approximate treatment is used to obtain the (rather small) contributions of the cell errors to the esds involving least-squares planes. The free torsional motion of H2 is virtually at right angles to the fairly linear hydrogen bond, so the O-H..O angle has a large esd. On the other hand the 'riding model' constraint applied to the N-H bonds effectively prevents the estimation of a meaningful esd in the two N-H..O angles, hence the unrealistically small values for these two esds. There follows the difference electron density synthesis and line printer 'plot' of the structure and peaks. The highest and lowest features are 0.28 and -0.17 eA^-3 respectively, and the rms difference electron density is 0.04. These values confirm that the treatment of the hydrogen atoms was adequate, and are indeed typical for routine structure analysis of small organic molecules. This output is too voluminous to give here, and indeed users of the Siemens SHELXTL molecular graphics program XP will almost always suppress it by use of the default option of a positive number on the PLAN instruction, and employ interactive graphics instead for analysis of the peak list. ============================================================================== THE REFLECTION DATA FILE 'name.hkl' The '.hkl' file consists of one line per reflection in FORMAT(3I4,2F8.2,I4) for h,k,l,Fo^2,sigma(Fo^2), and batch number. This file should be terminated by a record with all items zero; individual data sets within the file should NOT be separated from one another - the batch numbers serve to distinguish between groups of reflections for which separate scale factors are to be refined (see the BASF instruction). 
The reflection order and the batch number order are unimportant. This '.hkl' file is read each time the program is run; unlike SHELX-76, there is no facility for intermediate storage of binary data. This enhances computer independence and eliminates several possible sources of confusion. The '.hkl' file is read after the HKLF instruction (which terminates the '.ins' file) has been interpreted. The HKLF instruction specifies the format of the '.hkl' file, and allows scale factors and a reorientation matrix to be applied. For further details see the specification of the HKLF instruction. Lorentz, polarization and absorption corrections are assumed to have been applied to the data in the '.hkl' file. If SHELXA is used for the absorption corrections, it will have read a file 'name.raw' (containing direction cosines) and written 'name.hkl' (without cosines). Since SHELXA can read a SHELXL-93 '.ins' file, empirical absorption corrections (which require SHELXA to calculate Fc) may be applied more than once to the original data in the course of a structure determination simply by running SHELXA immediately before SHELXL-93 with the same '.ins' file. Note that there are special extensions to the '.hkl' format for Laue and powder data, as well as for twinned crystals which cannot be handled by a TWIN instruction alone. In general the '.hkl' file should contain all measured reflections without rejection of systematic absences or merging of equivalents. The systematic absences and R(int) for equivalents provide an excellent check on the space group assignment and consistency of the input data. Since complex scattering factors are used throughout by SHELXL-93 it is important NOT to average Friedel opposites in preparing this file.

WHY DOES SHELXL-93 REFINE AGAINST F-SQUARED ?

Traditionally most crystal structures have been refined against F.
For a well-behaved structure the geometrical parameters and their esd's are almost identical for refinement based on all Fo^2 values and for an old-fashioned refinement against F ignoring data with Fo less than (say) 3.sigma(Fo). For weakly diffracting crystals and in particular for pseudosymmetry problems the refinement against all data is demonstrably superior. The esd's are reduced because more experimental information is used, and the chance of getting stuck in a local minimum is reduced. In addition, the use of a threshold introduces a systematic error which introduces bias into the displacement parameters Uij. On the other hand, it is impossible to refine on F using ALL data, because it would involve taking the square root of a negative number for reflections with negative Fo^2 (i.e. background higher than the peak as a result of statistical fluctuations), and because the estimation of sigma(Fo) from sigma(Fo^2) for small or negative Fo^2 is a difficult statistical problem which requires the assumption of a probability distribution function for the F-values. In the case of pseudosymmetric structures - i.e. the very case where the weak reflections are most important - this distribution function is not known a priori, making it impossible to derive 'correct' sigma(Fo) values and hence correct weights. The diffraction experiment measures intensities and their standard deviations, which after the various corrections give Fo^2 and sigma(Fo^2). If your data reduction program only outputs Fo and sigma(Fo), which as explained above involves serious approximations for weak reflections, you MUST CORRECT YOUR DATA REDUCTION PROGRAM, not simply write a routine to square the Fo values or use HKLF 3 to input Fo and sigma(Fo) to SHELXL-93 (although the latter is legal). Note that if an Fo^2 value is too large to fit format F8.2, then format F8.0 may be used instead - the decimal point overrides the FORTRAN format specification. 
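The fixed '.hkl' record layout FORMAT(3I4,2F8.2,I4) and the decimal point rule just mentioned can be illustrated with a small Python reader. This is a sketch under stated assumptions, not the program's own input routine: the function names are hypothetical, a missing batch number is read as zero, and a real field without a decimal point is taken to have two implied decimal places, as a FORTRAN F8.2 descriptor would do.

```python
def parse_f8(field, implied_decimals=2):
    """Read one 8-column real field in the manner of FORTRAN F8.2 input."""
    text = field.strip()
    if not text:
        return 0.0                      # blank field reads as zero
    if "." in text:
        return float(text)              # explicit decimal point overrides F8.2
    return int(text) / 10 ** implied_decimals


def parse_hkl_line(line):
    """Split one '.hkl' record into (h, k, l, Fo^2, sigma(Fo^2), batch)."""
    line = line.rstrip("\n").ljust(32)  # pad short records (optional batch)
    h, k, l = (int(line[i:i + 4].strip() or 0) for i in (0, 4, 8))
    fo_sq = parse_f8(line[12:20])
    sigma = parse_f8(line[20:28])
    batch = int(line[28:32].strip() or 0)
    return h, k, l, fo_sq, sigma, batch


def read_hkl(lines):
    """Collect reflections up to the terminating record with all items zero."""
    reflections = []
    for line in lines:
        record = parse_hkl_line(line)
        if all(value == 0 for value in record):
            break
        reflections.append(record)
    return reflections
```

With this reader a value written as '1234567.' in the Fo^2 columns is accepted unchanged, which is the F8.0 case described above.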
The use of a threshold for ignoring weak reflections may introduce bias which primarily affects the atomic displacement parameters; it is only justified to speed up the early stages of refinement. In the final refinement ALL DATA should be used except for reflections known to suffer from systematic error (i.e. in the final refinement the OMIT instruction may be used to omit specific reflections - although not without good reason - but not ALL reflections below a given threshold). Anyone planning to ignore this advice should read F. L. Hirshfeld and D. Rabinovich, Acta Cryst., A29 (1973) 510-513 and L. Arnberg, S. Hovmoller and S. Westman, Acta Cryst., A35 (1979) 497-499 first. Refinement against F^2 also facilitates the treatment of twinned and powder data, and the determination of absolute structure.

One cosmetic disadvantage of refinement against F^2 is that R-indices based on F^2 are larger than (often about double) those based on F. For comparison with older refinements based on F and an OMIT threshold, a conventional index R1 based on observed F values larger than 4.sigma(Fo) is also printed. The deviation of the Goodness of Fit (S) from unity also tends to be magnified when calculated with F^2. Throughout the output, R indices based on F^2 are denoted R2 and those based on F are denoted R1, e.g.

   wR2 = [ Sigma[w(Fo^2-Fc^2)^2] / Sigma[w(Fo^2)^2] ]^0.5

   R1 = Sigma||Fo|-|Fc|| / Sigma|Fo|

For details of the weights w see 'WGHT' below. The Goodness of Fit (S) is always based on F^2:

   GooF = S = [ Sigma [ w(Fo^2-Fc^2)^2 ] / (n-p) ]^0.5

where n is the number of reflections and p is the total number of parameters refined. In the 'Restrained Goodness of Fit', Sigma[w(yt-y)^2] is added to the numerator and the number of restraints is added to the denominator. This corresponds to treating each restraint as an extra observational equation with weight w = 1/sigma^2. y is the quantity (e.g. a bond length) being restrained and yt is its target value.
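The definitions of wR2, R1 and the (restrained) Goodness of Fit given above translate directly into code. The following Python sketch is only a transcription of those formulas; the function names and the representation of a restraint as a (y, yt, sigma) tuple are assumptions made for the example.

```python
import math


def wr2(fo_sq, fc_sq, weights):
    """wR2 = [ Sigma[w(Fo^2-Fc^2)^2] / Sigma[w(Fo^2)^2] ]^0.5"""
    num = sum(w * (o - c) ** 2 for o, c, w in zip(fo_sq, fc_sq, weights))
    den = sum(w * o ** 2 for o, w in zip(fo_sq, weights))
    return math.sqrt(num / den)


def r1(fo, fc):
    """R1 = Sigma||Fo|-|Fc|| / Sigma|Fo|"""
    num = sum(abs(abs(o) - abs(c)) for o, c in zip(fo, fc))
    return num / sum(abs(o) for o in fo)


def goof(fo_sq, fc_sq, weights, n_params, restraints=()):
    """Goodness of Fit; with restraints given, the Restrained Goodness of Fit.

    restraints is a sequence of (y, yt, sigma) tuples: each adds
    (yt-y)^2/sigma^2 to the numerator and 1 to the denominator (n-p).
    """
    num = sum(w * (o - c) ** 2 for o, c, w in zip(fo_sq, fc_sq, weights))
    num += sum(((yt - y) / sigma) ** 2 for y, yt, sigma in restraints)
    dof = len(fo_sq) - n_params + len(restraints)
    return math.sqrt(num / dof)
```

With no restraints the two Goodness of Fit values coincide, which is why the sigi listing prints the same number twice "for 0 restraints".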
In these expressions, Sigma is written with a capital S to indicate a summation and a small s for an estimated standard deviation (corresponding to the use of capital and small Greek letters for sigma). In general most statistical quantities are defined as in the I.U.Cr. Commission's report: 'Statistical Descriptors in Crystallography', D. Schwarzenbach et al., Acta Cryst., A45 (1989) 63-75. CIF ARCHIVE FORMAT The CIF format represents a major step forward in the archiving, publication and communication of crystallographic data. At last it is possible to publish crystal structures and incorporate structural data into the crystallographic databases without the expensive and error-prone retyping of tables by hand. CIF format also provides a convenient method of transferring data from one program system to another. The ACTA instruction instructs SHELXL-93 to write two CIF-format files: 'name.fcf' contains the reflection data and 'name.cif' all other data. These files contain all the items needed for archiving the structure; those answers not known to SHELXL-93 (e.g. the color of the crystal) are left as a question mark. In general the final 'name.cif' file should be edited using any text editor to replace most of these question marks. The file is then suitable for deposition in the CSD (organic) and ICSD (inorganic crystal structure) databases. For publication via electronic mail it will normally be necessary to add the authors' names, title, text etc., which may also be done in CIF-format; this is followed by the edited contents of one or more '.cif' files each describing one structure (or possibly the same structure at different temperatures etc.). An example of a paper submitted to Acta Cryst. in this way is provided in Appendix D. At the time of writing it is necessary to send the diagram and Fo/Fc tables by post, though in principle the '.fcf' file is suitable for the direct submission of the Fo/Fc data in CIF-format. 
SHELXL-93 users are strongly recommended to familiarize themselves with the definitive paper by the I.U.Cr. Commission on Crystallographic Data: S.R. Hall, F.H. Allen and I.D. Brown, Acta Cryst., A47 (1991) 655-685. The auxiliary program CIFTAB is provided with SHELXL-93 to facilitate the transition to CIF. It enables the '.cif' output file from SHELXL-93 to be extended by adding CIF information from other (e.g. diffractometer data processing) programs, and enables a variety of tables to be produced (e.g. crystal data, coordinates, bond lengths and angles, and structure factors) for padding out Ph.D. theses and submission to Journals that have not yet seen the light. Further details of CIFTAB may be found in Appendix C. TREATMENT OF HYDROGEN ATOMS It is difficult to locate hydrogen atoms accurately using X-ray data because of their low scattering power and lack of core electrons, and because the valence electron density is asymmetrical and is not centered at the position of the nucleus (which can be determined by neutron diffraction). In addition hydrogen atoms tend to have larger vibrational and librational amplitudes than other atoms. For many purposes it is preferable to calculate the hydrogen positions according to well-established geometrical criteria and then to adopt a refinement procedure which ensures that a sensible geometry is retained. SHELXL-93 provides a bewildering selection of (AFIX and HFIX) options for positioning and refining hydrogen atoms, as detailed in the section 'atom lists and least-squares constraints'. For routine refinement, however, the riding model is a good choice for tertiary CH (HFIX 13), secondary CH2 (HFIX 23), ethylenic =CH2 (HFIX 93), acetylenic CH (HFIX 163), BH in polyhedral boranes (HFIX 153), and aromatic CH or amide NH (HFIX 43). The hydrogen coordinates are re-idealized before each cycle, and 'ride' on the atoms to which they are attached (i.e. the coordinate shifts are the same for both). 
In this riding model, the C-H vector remains constant in magnitude and direction, but its origin, i.e. the position of the carbon atom in the unit-cell, may move. Both C and H contribute to the derivative calculations which improves convergence. Alternatively AFIX (or HFIX) 14 etc. performs a similar riding refinement but allows the C-H distance to vary as well (keeping the C-H distances equal within a CH2 or CH3 group). It is possible to use SADI or DFIX to restrain chemically equivalent C-H distances involving different carbons to be equal. Methyl and hydroxyl groups are more difficult to position accurately. If good (low-temperature) data are available the method of choice is HFIX 137 for -CH3 and HFIX 147 for -OH groups; in this approach, a difference electron density synthesis is calculated around the circle which represents the loci of possible hydrogen positions (for a fixed X-H distance and Y-X-H angle). The maximum electron density (in the case of a methyl group after local threefold averaging) is then taken as the starting position for the hydrogen atom(s). In subsequent refinement cycles (and in further least-squares jobs) the hydrogens are re-idealized at the start of each cycle, but the current torsion angle is retained; the torsion angles are allowed to refine whilst keeping the X-H distance and Y-X-H angle fixed. If unusually high quality data are available, AFIX 138 would allow the refinement of a common C-H distance for a methyl group but not allow it to tilt; a variable metric rigid group refinement (AFIX 9 for the carbon followed by AFIX 135 before the first H) would allow it to tilt as well, but still retain tetrahedral H-C-H angles and equal C-H distances within the group. If the data quality is less good, then the refinement of torsion angles may not converge very well. In such cases the hydrogens can be positioned geometrically and refined using a riding model by HFIX 33 for methyl and HFIX 83 for hydroxyl groups. 
This staggers the methyl groups, and -OH groups attached to saturated carbons, as well as possible; -OH groups attached to aromatic rings are placed in one of the two positions in the plane. In either -OH case the choice of hydrogen position is then determined by the best hydrogen bond (to an N, O, Cl or F atom) which can be created. For disordered methyl groups (with two sites rotated by 60 degrees from one another) HFIX 123 is recommended, possibly with refinement of the corresponding site occupation factors via a 'free variable' so that their sum is unity (e.g. 21 and -21). The choice of a suitable (default) O-H distance is very difficult. O-H internuclear distances for isolated molecules in the gas phase are about 0.96 Angstroms (cf. 1.10 for C-H), but the appropriate distance to use for X-ray diffraction must be appreciably shorter to allow for the displacement of the center of gravity of the electron distribution towards the oxygen atom, and also for librational effects. Although the (temperature dependent) value assumed by the program fits reasonably well for O-H groups in predominantly organic molecules, appreciably longer O-H distances are appropriate for low temperature studies of strongly (cooperatively) hydrogen bonded systems - short H..O distances are always associated with long O-H distances. If there are many such O-H groups and good quality data are available, HFIX 88 (or 148) plus SADI restraints to make all the O-H distances approximately equal (with an esd of say 0.01) is a good approach. Hydrogen atoms may also 'ride' on atoms in rigid groups (unlike SHELX-76); for example HFIX 43 could reference carbon atoms in a rigid phenyl ring. In such a case further geometrical restraints (SADI, SAME, DFIX, FLAT) are not permitted on the hydrogen atoms; this is the only exception to the general rule that any number of restraints may be applied to any atom, whatever constraints are also being applied to it. This is much more general than in SHELX-76.
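The coded parameter values that appear in this manual, such as 11.00000 for an occupancy fixed at 1.0 ('+10 to fix it'), 21 and -21 for two occupancies whose sum is unity, and 31 for U(H)=fv(3), can be decoded along the following lines. This Python sketch is an illustration only: the function name, the dictionary of free variables and the exact numerical thresholds (magnitudes above 15 are treated as free variable references, values between 5 and 15 as fixed) are assumptions based on the usual SHELX convention rather than on text in this section.

```python
def decode_parameter(coded, free_variables):
    """Return (value, status) for a coded atom parameter.

    free_variables maps the free variable number m to its current value.
    Assumed convention: 10m+p means p*fv(m), -(10m+p) means p*(1-fv(m)),
    and 10+p means that the parameter is fixed at p.
    """
    magnitude = abs(coded)
    if magnitude > 15.0:
        m = int((magnitude + 5.0) // 10)    # nearest multiple of ten
        p = magnitude - 10.0 * m
        fv = free_variables[m]
        value = p * fv if coded > 0 else p * (1.0 - fv)
        return value, "fv(%d)" % m
    if 5.0 < coded <= 15.0:
        return coded - 10.0, "fixed"
    return coded, "free"
```

For the two sites of a disordered methyl group coded 21 and -21, the decoded occupancies are fv(2) and 1-fv(2), so only one variable is refined and the sum stays at unity.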
If the hydrogen atoms are generated using HFIX, the standard option is to set the isotropic U's to -1.2 (-1.5 for methyl and hydroxyl) which is interpreted as 1.2 (or 1.5) times the equivalent isotropic displacement parameter of the last atom which did not use this facility. A good alternative is to use 'free variables' to constrain the U values of chemically equivalent hydrogens to be equal. Hydrogen atoms are identified as such by their scattering factor numbers, which must correspond to a SFAC name H (or $H). Other elements which need to be specifically identified (e.g. so that HFIX 43 can use different default C-H and N-H distances) are defined similarly. However for the output of the PLAN instruction, hydrogen atoms are identified as those atoms with a radius of less than 0.4 Angstroms (this is not as illogical as it may sound; the PLAN output is concerned with potential hydrogen bonds etc., not with the scattering power of an atom, and SHELXL-93 has to handle neutron as well as X-ray data). OMIT $H (or OMIT_* $H if residues are employed) combined with L.S. 0, FMAP 2 and PLAN -100 enables an 'omit map' to be calculated, which is a convenient way of checking whether there are actually electron density peaks close to the calculated hydrogen positions. In this omit map, the hydrogen atoms are retained but do not contribute to Fc; if a non-zero electron density appears in the 'Peak' column for one of these hydrogens in the Fourier output, then there was an actual peak in the difference electron density synthesis within 0.31 Angstroms of the expected hydrogen position. There are a number of operations in SHELXL-93 in which hydrogen atoms are treated specially, for example in the connectivity array, in atom lists defined using the '>' and '<' symbols, in the atoms following the SAME, ANIS and AFIX instructions, and in the output generated by PLAN. This approach is very convenient for the vast majority of structure refinements.
However it may be useful to know how the program decides which atoms are 'hydrogens' in order to be able to treat hydrogens as normal atoms. The program scans the SFAC instructions (either format) for an element named 'H', and if one is found, treats all atoms with this scattering factor number specially. If two or more scattering factors are named 'H', only the last one gets this special treatment, which provides a way of tricking the program into allowing both 'normal' and 'special' hydrogens. Similarly for neutron data, where an SFAC instruction is needed for each element anyway, one could if desired suppress the special treatment of hydrogens by labeling their SFAC instruction 'Hyd' or even 'D'. RESTRAINTS, CONSTRAINTS AND GROUP FITTING, AND DISORDER In crystal structure refinement, there is an important distinction between a 'constraint' and a 'restraint'. A constraint is an exact mathematical condition which enables one or more least-squares variables to be expressed exactly in terms of other variables or constants, and hence eliminated. An example is the fixing of the x, y and z coordinates of an atom on an inversion center. A restraint takes the form of additional information which is not exact but is subject to a probability distribution; for example we could restrain two chemically but not crystallographically equivalent bonds to be approximately equal, with an effective standard deviation of (say) 0.01 Angstroms. A restraint is incorporated in the least-squares refinement as if it were an additional experimental observation; w(yt-y)^2 is added to the quantity Sigma[w(Fo^2-Fc^2)^2] to be minimized, where a quantity y (which is a function of the least-squares parameters) is to be restrained to a target value yt, and the weight w (for either a restraint or a reflection) is 1/sigma^2. In the case of a reflection sigma^2 is estimated using a weighting scheme; for a restraint sigma is simply the effective standard deviation. 
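The way a restraint enters the minimization as an extra observation, as described above, can be sketched in Python. The names are hypothetical and the sketch covers only the basic expression Sigma[w(Fo^2-Fc^2)^2] + w(yt-y)^2 with w = 1/sigma^2 for the restraint; any additional scaling of the restraint weights is left out.

```python
def restraint_term(y, y_target, sigma):
    """Contribution w(yt-y)^2 of one restraint, with w = 1/sigma^2."""
    weight = 1.0 / sigma ** 2
    return weight * (y_target - y) ** 2


def quantity_minimized(fo_sq, fc_sq, weights, restraints=()):
    """Sigma[w(Fo^2-Fc^2)^2] plus one term per restraint.

    restraints is a sequence of (y, yt, sigma) tuples (an assumed layout).
    """
    data_term = sum(w * (fo - fc) ** 2
                    for fo, fc, w in zip(fo_sq, fc_sq, weights))
    return data_term + sum(restraint_term(*r) for r in restraints)
```

As a usage example, restraining two chemically equivalent bonds of 1.52 and 1.50 Angstroms to be equal with an effective standard deviation of 0.01 can be pictured as a restraint on their difference, restraint_term(1.52 - 1.50, 0.0, 0.01), which contributes about 4 to the sum; treating the difference as the restrained quantity is a simplification chosen for this illustration.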
In SHELXL-93 the restraint weights are multiplied by the square of the Goodness of Fit for the reflection data, which allows for the possibility that the reflection weights may be relative rather than absolute, and also gives the restraints more influence at the early stages of refinement (when the Goodness of Fit is invariably much greater than unity), which improves convergence. Most of the constraints and restraints available in SHELXL-93 have already been widely used in other programs, especially for macromolecular refinement. In SHELXL-93 an effort has been made to make them simple to understand and use, while at the same time avoiding the bias which is introduced when specific target values etc. have to be assumed. For example it is more realistic to assume that a phenyl group is planar and has mm (C2v) symmetry (in both cases within a reasonable tolerance) rather than that it is an exactly regular hexagon with a bond length of 1.39 Angstroms; however both approaches may conveniently be applied using SHELXL-93. The following general categories of constraints and restraints are available using SHELXL-93: 1. Constraints for the coordinates and anisotropic displacement parameters for atoms on special positions: these are generated automatically by the program for ALL special positions in ALL space groups, in conventional settings or otherwise. If the user applies (correct or incorrect) special position constraints using free variables etc., the program assumes this has been done with intent and reports but does not apply the correct constraints. Thus the accidental application of a free variable to a Uij term of an atom on a special position can lead to the refinement 'blowing up' ! 2. Two or more atoms sharing the same site: the xyz and Uij parameters may be equated using the EXYZ and EADP constraints respectively (or by using 'free variables'). 
The occupation factors may be expressed in terms of a 'free variable' so that their sum is constrained to be constant (e.g. 1.0). If more than two different chemical species share a site, a linear free variable restraint (SUMP) is required to restrain the sum of occupation factors. EADP is also useful for equating the Uij of 'opposite' fluorines of disordered -CF3 groups. 3. Floating origin restraints: these are generated automatically by the program as and when required by the method of H.D. Flack and D. Schwarzenbach, Acta Cryst., A44 (1988) 499-506, so the user should not attempt to fix the origin in such cases by fixing the coordinates of a heavy atom. 4. Geometrical constraints: these include rigid-group refinements (AFIX 6), variable-metric rigid-group refinements (AFIX 9) and various riding models (AFIX/HFIX) for hydrogen atom refinement, for example torsional refinement of a methyl group about the local threefold axis. 5. Fragments of known geometry may be fitted to target atoms (e.g. from a previous Fourier peak search), and the coordinates generated for any missing atoms. Four standard groups are available: regular pentagon, regular hexagon, naphthalene and pentamethylcyclopentadienyl; any other group may be used simply by specifying orthogonal or fractional coordinates in a given cell (AFIX mn with m > 16 and FRAG...FEND). This is usually, but not always, a preliminary to rigid group refinement. 6. Geometrical restraints: a particularly useful restraint is to make chemically but not crystallographically equivalent distances equal (subject to a given or assumed esd) without having to invent a value for this distance (SADI). The SAME instruction can be used to generate such restraints automatically, e.g. when chemically identical molecules or residues are present. This has the same effect as making equivalent bond lengths and angles but not torsion angles equal. 
The FLAT instruction restrains a group of atoms to lie in a plane (but the plane is free to move and rotate). DFIX and CHIV restrain distances and chiral volumes respectively to target values. When 'free variables' are used for the target values, it is possible to restrain different distances etc. to be equal and to refine their mean value (for which an esd is thus obtained). ALL types of geometrical restraints may involve ANY atom, even if it is part of a rigid group or a symmetry equivalent generated using EQIV $n ... and referenced by _$n, except for hydrogen atoms which ride on rigid group atoms (see preceding section). 7. 'Anti-bumping' restraints may be applied individually, by means of DFIX distance restraints with the distance given as a negative number, or generated automatically by means of the BUMP instruction, which operates on all atoms which have been designated by 'CONN 0' instructions (and so are excluded from the connectivity array). DFIX restraints with negative distance d are ignored if the two atoms are further from one another than |d| in the current refinement cycle; if they are closer than |d|, a restraint is applied to increase the distance to |d| with the given (or assumed) esd. The automatic generation of anti-bumping restraints takes all possible symmetry equivalents into account, and allows a safety margin of 0.5 A so that atoms which move towards one another during the refinement are also covered. In combination with the SWAT instruction for diffuse solvent, BUMP provides a very effective way of handling solvent water in macromolecules. 8. Restraints on anisotropic displacement parameters: three different types of restraint may be applied to Uij values. DELU applies a 'rigid-bond' restraint to Uij of two bonded (or 1,3) atoms; the anisotropic displacement components of the two atoms along the line joining them are restrained to be equal. This restraint was suggested by J.S. Rollett (in Crystallographic Computing, Ed. F.R. Ahmed, S.R. 
Hall and C.P. Huber, Munksgaard, Copenhagen, (1970) pp. 167-181), and corresponds to the rigid-bond criterion for testing whether anisotropic displacement parameters are physically reasonable (F.L. Hirshfeld, Acta Cryst., A32 (1976) 239-244; K.N. Trueblood and J.D. Dunitz, Acta Cryst., B39 (1983) 120-133). J.J. Didisheim and D. Schwarzenbach (Acta Cryst., A43 (1987) 226-232) have shown that in many but not all cases, rigid-bond restraints are equivalent to the TLS description of rigid body motion in the limit of zero esd's; however this requires that (almost) all atom pairs are restrained in this way, which for molecules with conformational flexibility is unlikely to be appropriate. An extensive study (E. Irmer, Ph.D. Thesis, University of Goettingen, 1990) has shown that this condition is fulfilled within the experimental error for routine X-ray studies of bonds and 1,3-distances between two first-row elements (B to F inclusive), and so may be applied as a 'hard' restraint (low esd). A rigid bond restraint is not suitable for systems with unresolved disorder, e.g. AsF6- anions and dynamic Jahn-Teller effects, although it may be useful in detecting such effects. Isolated (e.g. solvent water) atoms may be restrained to be approximately isotropic, e.g. to prevent them going 'non-positive-definite'; this is a rough approximation and so should be applied as a 'soft' restraint with a large esd (ISOR). Similarly the assumption of 'similar' Uij values for spatially adjacent atoms (SIMU) is useful so that (for example) the thermal ellipsoids increase and change direction gradually going along a side-chain in a polypeptide, but this treatment is approximate and thus also appropriate only for a soft restraint; it is also useful for partially overlapping atoms of disordered groups. A simple way to apply SIMU to all such overlapping atoms is to give a SIMU instruction with no atoms (i.e.
all atoms implied) and the third number set to a distance less than the shortest bond, i.e. SIMU 0.02 0.04 0.8 which applies the restraint to all pairs of atoms separated by less than 0.8 Angstroms. Additional SIMU restraints may be included in the same job. SHELXL-93 does not permit DELU, SIMU and ISOR restraints to reference symmetry generated atoms, although this is allowed for all geometrical restraints. To permit such references for displacement parameter restraints as well would considerably complicate the program, and is rarely required in practice. 9. 'Shift limiting restraints' may be applied in SHELXL-93 by the Marquardt algorithm (J. Soc. Ind. Appl. Math., 11 (1963) 431-441). Terms proportional to a 'damping factor' (the first parameter on the DAMP instruction) are added to the least-squares matrix before inversion. Shift limiting restraints are particularly useful in the refinement of structures with a poor data to parameter ratio, and for pseudosymmetric problems. The 'damping factor' should be reduced towards the end of the refinement, otherwise the least-squares estimates of the esd's in the less well determined parameters will be too low (the program does however make a first order correction to the esds for this effect). The shifts are also scaled down if the maximum shift/esd exceeds the second DAMP parameter. In addition, if the actual and target values for a particular restraint differ by more than 100 times the given esd, the program will temporarily increase the esd to limit the influence of this restraint in any one cycle to that produced by a discrepancy of 100 times the esd. This helps to prevent a bad initial model and tight restraints from causing dangerously large shifts in the first cycle. 10. Further constraints may be applied to atom coordinates, occupation and displacement parameters, and to restrained distances (DFIX) and chiral volumes (CHIV), by the use of 'free variables'. 
Linear combinations of free variables may in turn be restrained (SUMP). Free variables were required for special position constraints and for refining more than one atom on the same site in SHELX-76; their use in this way is allowed (for upwards compatibility) in SHELXL-93, but it is more convenient to use the fully automatic handling of special positions in SHELXL-93, and atoms on multiply occupied sites may be constrained using EXYZ and EADP. For further details see the description of the FVAR instruction.

A major advantage of applying chemically reasonable restraints is that a subsequent difference electron density synthesis is often more revealing, because the parameters were not allowed to 'mop up' any residual effects. The refinement of pseudosymmetric structures, where the X-ray data may not be able to determine all of the parameters, is also considerably facilitated, at the cost of making it much easier to refine a structure in a space group of unnecessarily low symmetry!

By way of example, assume that the structure contains a cyclopentadienyl (Cp) ring pi-bonded to a metal atom, and that as a result of the high thermal motion of the ring only three of the atoms could be located in a difference electron density map. We wish to fit a regular pentagon (default C-C 1.42 A) in order to place the remaining two atoms, which are input as dummy atoms with zero coordinates. Since the C-C distance is uncertain (there may well be an appreciable librational shortening in such a case) we refine the C5-ring as a 'variable metric' rigid group, i.e. it remains a regular pentagon but the C-C distance is free to vary. In SHELXL-93 this may all be achieved by inserting one instruction (AFIX 59) before the five carbons and one (AFIX 0) after them:

   AFIX 59                    ! AFIX mn with m = 5 to fit pentagon (default C-C
   C1 1 .6755 .2289 .0763     ! 1.42 A) and n = 9 for v-m rigid-group refinement
   C2 1 .7004 .2544 .0161
   C3 1 0 0 0                 ! the coordinates for C3 and C4 are obtained by the
   C4 1 0 0 0                 ! fit of the other 3 atoms to a regular pentagon
   C5 1 .6788 .1610 .0766
   AFIX 0                     ! terminates rigid group

Since Uij values were not specified, the atoms would refine isotropically starting from U = 0.05. To refine with anisotropic displacement parameters in the same or a subsequent job, the instruction:

   ANIS C1 > C5

should be inserted anywhere before C1 in the '.ins' file. The SIMU and ISOR restraints on the Uij would be inappropriate for such a group, but:

   DELU C1 > C5

could be applied if the anisotropic refinement proved unstable. The five hydrogen atoms could be added and refined with the 'riding model' by means of:

   HFIX 43 C1 > C5

anywhere before C1 in the input file. For good data, in view of possible librational effects, a possible alternative would be:

   HFIX 44 C1 > C5
   SADI 0.02 C1 H1 C2 H2 C3 H3 C4 H4 C5 H5

(which retains a riding model but allows the C-H bond lengths to refine, subject to the restraint that they should be equal within about 0.02 A). In analogous manner it is possible to generate missing atoms and perform rigid group refinements for phenyl rings (AFIX 66) and Cp* groups (AFIX 109). Very often it is possible and desirable to remove the rigid group constraints (by simply deleting the AFIX instructions) in the final stages of refinement; there is good experimental evidence that the ipso-angles of phenyl rings differ systematically from 120 degrees [P.G. Jones, J. Organomet. Chem., 345 (1988) 405; T. Maetzke and D. Seebach, Helv. Chim. Acta, 72 (1989) 624-630; A. Domenicano, Accurate Molecular Structures, eds. Domenicano and Hargittai, Chapter 18, OUP 1992].

As a second example, assume that the structure contains two molecules of poorly defined THF solvent, and that we have managed to identify the oxygen atoms. A rigid pentagon would clearly be inappropriate here, except possibly for placing missing atoms, since THF molecules are not planar.
However we can RESTRAIN the 1,2- and the 1,3-distances in the two molecules to be similar by means of a 'similarity restraint' (SAME). Assume that the molecules are numbered O11 C12 ... C15 and O21 C22 ... C25, and that the atoms are given in this order in the atom list. Then we can either insert the instruction:

   SAME O21 > C25

before the first molecule, or:

   SAME O11 > C15

before the second. These SAME instructions define a group of five atoms which are considered to be the same as the five (non-hydrogen) atoms which immediately follow the SAME instruction. The entries in the connectivity table for the latter are used to define the 1,2- and 1,3-distances, so the SAME instruction should be inserted before the group with the best geometry. This one SAME instruction restrains five pairs of 1,2- and five pairs of 1,3-distances to be nearly equal, i.e.

   d(O11-C12) = d(O21-C22), d(C12-C13) = d(C22-C23), d(C13-C14) = d(C23-C24),
   d(C14-C15) = d(C24-C25), d(C15-O11) = d(C25-O21), d(O11-C13) = d(O21-C23),
   d(C12-C14) = d(C22-C24), d(C13-C15) = d(C23-C25), d(C14-O11) = d(C24-O21),
   and d(C15-C12) = d(C25-C22).

In addition, it would also be reasonable to restrain the distances on opposite sides of the same ring to be equal. This can be achieved with one further SAME instruction in which we count the other way around the ring. For example we could insert:

   SAME O11 C15 < C12

before the first ring. The symbol '<' indicates that one must count up the atom list instead of down. The above instruction is exactly equivalent to:

   SAME O11 C15 C14 C13 C12

This generates 10 further restraints, but two of them [d(C13-C14) = d(C14-C13) and d(C12-C15) = d(C15-C12)] are identities, and each of the others appears twice, so only four are independent and the rest are ignored. It is not necessary to add a similar instruction before the second ring, because the program also automatically generates all 'implied' restraints, i.e.
restraints which can be derived by combining two existing distance restraints which refer to the same atom pair. In contrast to other restraint instructions, the SAME instructions must be inserted at the correct positions in the atom list. These similarity restraints provide a very general and powerful way of exploiting non-crystallographic symmetry; in this example two instructions suffice to restrain the THF molecules so that they have (within an assumed standard deviation) twofold symmetry and are the same as each other. However we have not imposed planarity on the rings nor restricted any of the torsion angles.

To complicate matters, let us assume that the two molecules are two alternative conformations of a THF molecule disordered on a single site. We must then ensure that the site occupation factors of the two molecules add to unity, and that no spurious bonds linking them are added to the connectivity table. The former is achieved by employing site occupation factors of 21 (i.e. 1 times free-variable 2) for the first molecule and -21 [ 1*(1-fv(2)) ] for the five atoms of the second molecule. Free variable 2 is then the occupation factor of the first molecule; its starting value must be specified on the FVAR instruction. The possibility of spurious bonds is eliminated by inserting 'PART 1' before the first molecule, 'PART 2' before the second, and 'PART 0' after it. Hydrogen atoms can be inserted in the usual way using the HFIX instruction since the connectivity table is 'correct'; they will automatically be assigned the site occupation factors of the atoms to which they are bonded.
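The occupancy codes 21 and -21 used above can be decoded with a few lines of Python. This is a minimal sketch, assuming the general 10*m+p coding convention documented under the FVAR instruction (only the two cases 21 and -21 are spelled out in this section). The function name and the list layout of the free variables are hypothetical.

```python
def decode_parameter(coded, free_variables):
    """Decode a parameter given as 10*m + p (assumed FVAR convention).

    free_variables[0] is free variable 1 (the overall scale factor),
    free_variables[1] is free variable 2, and so on.
    """
    m = round(coded / 10.0)   # which free variable, with sign
    p = coded - 10.0 * m      # multiplier (or plain value when m is 0 or 1)
    if m in (0, 1):
        return p              # m = 0: refined normally from p; m = 1: fixed at p
    if m == -1:
        raise ValueError("m = -1 is not covered by this sketch")
    fv = free_variables[abs(m) - 1]
    if m > 1:
        return p * fv         # e.g. 21 -> 1 * fv(2)
    return p * (fv - 1.0)     # e.g. -21 -> (-1) * (fv(2) - 1) = 1 - fv(2)


fvar = [1.0, 0.75]            # scale factor, then free variable 2
occ_first = decode_parameter(21, fvar)
occ_second = decode_parameter(-21, fvar)
```

With a free variable 2 of 0.75 the two components receive occupancies of 0.75 and 0.25, so their sum stays at unity however the free variable refines.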
Finally we would like to refine with anisotropic displacement parameters because the thermal motion of such solvent molecules is certainly not isotropic, but the refinement will be unstable unless we restrain the anisotropic displacement parameters to behave 'reasonably' by means of rigid bond restraints (DELU) and 'similar Uij' restraints (SIMU); fortunately the program can set up these restraints automatically. The DELU restraints restrain the differences in the components of the displacement parameters of two atoms to zero along the 1,2- and 1,3-vector directions, and are derived with the help of the connectivity table. Since the SIMU restraints are much more approximate, we restrict them to atoms which, because of the disorder, are almost overlapping (i.e. are within 0.7 A of each other). Note that the SIMU restraints ignore the connectivity table and are based directly on a distance criterion specifically because this is a sensible way of handling disorder. In order to specify a non-standard distance cutoff which is the third SIMU parameter, we must also give the first two parameters which are the restraint esd's for distances involving non-terminal atoms (0.02) and at least one terminal atom (0.04) respectively. The '.ins' file now contains:

   HFIX 23 C12 > C15 C22 > C25
   ANIS O11 > C25
   DELU O11 > C25
   SIMU O11 > C25 0.02 0.04 0.7
   FVAR ..... 0.75
   ....
   PART 1
   SAME O21 > C25
   SAME O11 C15 < C12
   O11 4 ..... ..... ..... 21
   C12 1 ..... ..... ..... 21
   C13 1 ..... ..... ..... 21
   C14 1 ..... ..... ..... 21
   C15 1 ..... ..... ..... 21
   PART 2
   O21 4 ..... ..... ..... -21
   C22 1 ..... ..... ..... -21
   C23 1 ..... ..... ..... -21
   C24 1 ..... ..... ..... -21
   C25 1 ..... ..... ..... -21
   PART 0

An alternative type of disorder common for THF molecules and proline residues in proteins is when one atom (say C14) can flip between two positions (i.e. it is the flap of an envelope conformation).
If we assign C14 to PART 1, C14' to PART 2, and the remaining ring atoms to PART 0 then the program will be able to generate the correct connectivity, and so we can also generate hydrogen atoms for both disordered components (with AFIX, not HFIX):

   SIMU C14 C14'
   ANIS O11 > C14'
   FVAR ..... 0.7
   ....
   SAME O11 C12 C13 C14' C15
   O11 4 ..... ..... .....
   C12 1 ..... ..... .....
   AFIX 23
   H12A 2 ..... ..... .....
   H12B 2 ..... ..... .....
   AFIX 0
   C13 1 ..... ..... .....
   PART 1
   AFIX 23
   H13A 2 ..... ..... ..... 21
   H13B 2 ..... ..... ..... 21
   PART 2
   AFIX 23
   H13C 2 ..... ..... ..... -21
   H13D 2 ..... ..... ..... -21
   AFIX 0
   PART 1
   C14 1 ..... ..... ..... 21
   AFIX 23
   H14A 2 ..... ..... ..... 21
   H14B 2 ..... ..... ..... 21
   AFIX 0
   PART 0
   C15 1 ..... ..... .....
   PART 1
   AFIX 23
   H15A 2 ..... ..... ..... 21
   H15B 2 ..... ..... ..... 21
   PART 2
   AFIX 23
   H15C 2 ..... ..... ..... -21
   H15D 2 ..... ..... ..... -21
   AFIX 0
   C14' 1 ..... ..... ..... -21
   AFIX 23
   H14C 2 ..... ..... ..... -21
   H14D 2 ..... ..... ..... -21
   AFIX 0
   PART 0

It will be seen that six hydrogens belong to one conformation, six to the other, and two are common. The generation of the idealized hydrogen positions is based on the connectivity table but also takes the PART numbers into account. These procedures should be able to set up the correct hydrogen atoms for all cases of two overlapping disordered groups. In cases of more than two overlapping groups the program will usually still be able to generate the hydrogen atoms correctly by making reasonable assumptions when it finds that an atom is 'bonded' to atoms with different PART numbers, but it is possible that there are examples of very complex disorder which can only be handled by using dummy atoms constrained (EXYZ and EADP) to have the same positional and displacement parameters as atoms with different PART numbers (in practice it may be easier - and quite adequate - to ignore hydrogens except on the two components with the highest occupancies!).
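The effect of the PART numbers on the connectivity can be illustrated with a small Python sketch. It assumes the rule that two atoms may be bonded only if at least one of them is in PART 0 or both carry the same PART number. The distance test that the program also applies is not modelled, and the function and variable names are hypothetical.

```python
def may_bond(part_a, part_b):
    """True if the PART numbers alone allow a bond (assumed rule, see above)."""
    return part_a == 0 or part_b == 0 or part_a == part_b


# PART assignment for the 'flap' disorder example: only C14 is split.
parts = {"O11": 0, "C12": 0, "C13": 0, "C14": 1, "C14'": 2, "C15": 0}

# Atom pairs that are close enough to be candidates for bonds.
candidates = [
    ("O11", "C12"), ("C12", "C13"),
    ("C13", "C14"), ("C14", "C15"),
    ("C13", "C14'"), ("C14'", "C15"),
    ("C15", "O11"),
    ("C14", "C14'"),   # spurious contact between the two flap positions
]

bonds = [(a, b) for a, b in candidates if may_bond(parts[a], parts[b])]
```

Only the contact between C14 and C14' is rejected, so C13 and C15 each remain bonded to both alternative flap atoms. This is why they need one pair of hydrogens per PART in the listing above.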
When the site symmetry is high, it may be simpler to apply similarity restraints using SADI or DFIX rather than SAME. For example the following three instruction sets would all restrain a perchlorate ion (CL,O1,O2,O3,O4) to be a regular tetrahedron: SAME CL O2 O3 O4 O1 SADI O1 O2 O1 O3 followed immediately by the atoms CL, O1... O4; the SAME restraint makes all the Cl-O bonds equal but introduces only FOUR independent restraints involving the O..O distances, which allows the tetrahedron to distort retaining only one -4 axis, so one further restraint must be added using SADI. or: SADI CL O1 CL O2 CL O3 CL O4 SADI O1 O2 O1 O3 O1 O4 O2 O3 O2 O4 O3 O4 or: DFIX 31 CL O1 CL O2 CL O3 CL O4 DFIX 31.6330 O1 O2 O1 O3 O1 O4 O2 O3 O2 O4 O3 O4 in the case of DFIX, one extra least-squares variable (free variable 3) is needed, but it is the mean Cl-O bond length and refining it directly means that its esd is also obtained directly. If the perchlorate ion lies on a three-fold axis through CL and O1, the SADI method would require the use of symmetry equivalent atoms (EQIV $1 y, z, x and O2_$1 etc. for R3 on rhombohedral axes) so DFIX would be simpler (same DFIX instructions as above with distances involving O3 and O4 deleted) [the number 1.6330 in the above example is of course twice the sine of half the tetrahedral angle]. If you wish to test whether you have understood the full implications of these restraints, try the following problems: (a) A C-O-H group is being refined with AFIX 87 so that the torsion angle about the C-O bond is free. How can we restrain it to make the 'best' hydrogen-bond to a specific Cl- ion, so that the H..Cl distance is minimized and the O-H..Cl angle maximized, using only one restraint instruction (it may be assumed that the initial geometry is reasonably good) ? (b) Restrain a C6 ring to an ideal chair conformation using one SAME and one SADI instruction. 
Hint: all 1-2, 1-3 and 1-4 distances are respectively equal for a chair conformation, which also includes a regular planar hexagon as a special case. A non-planar boat conformation does not have equal 1-4 distances. To force the ring to be non-planar, the ratio of the 1-2 and 1-3 distances would have to be restrained using DFIX and a free variable.

MACROMOLECULES AND OTHER STRUCTURES WITH A POOR DATA/PARAMETER RATIO

Macromolecules often contain regions of disordered solvent and do not usually diffract to as high a resolution as small molecules. On the other hand they often contain repeated chemical units which we can exploit by means of similarity restraints to improve the effective data to parameter ratio and hence the precision of the structure. These provide an effective way of incorporating 'non-crystallographic symmetry' into structure refinement.

To simplify the application of restraints etc. SHELXL-93 allows a structure to be subdivided into residues, each of which is defined by a residue number and (optionally) a residue class (up to 4 characters). Different residues of the same chemical type may be assigned to the same class and also use identical atom names, but must have different residue numbers. Thus for example the beta carbon atoms in all phenylalanine residues (class PHE) in a polypeptide may all be called 'CB'. Only one instruction would then be needed to add the appropriate idealized hydrogens to all of them and refine them with a 'riding model':

   HFIX_phe 23 CB

To apply 'similarity' distance restraints to all phenylalanines, all that is required is one SAME instruction, which should be inserted before the first atom of the residue with the best geometry (so that its connectivity array may be used to define the 1,2- and 1,3-distances):

   RESI 23 phe
   SAME_phe N > CZ
   N 3 ..... ..... .....
   ...
   CZ 1 ..... ..... .....

[Note: there is of course no restriction on the order of the atoms in a residue, but it must be the same for all residues of the same class]

It would also be sensible to apply a planarity restraint to these side chains:

   FLAT_phe CB > CZ

The code '_*' is used to refer to all residues. For example it would be possible to use FLAT in this way to ensure that all peptide carbonyl carbons have planar coordination, but it is easier to do this by restraining their chiral volumes to zero (because the three bonded atoms do not then need to be named explicitly):

   CHIV_* C 0

assuming that these are the only atoms named 'C'; since the default chiral volume is zero it could be left out. In some cases it is necessary to refer to specific residues, in which case residue numbers should be used. For example the following instruction calculates the torsion angle of a disulfide bridge linking Cys_56 and Cys_124:

   CONF CB_56 SG_56 SG_124 CB_124

Protein crystallographers will have noticed that SHELXL-93 is fully compatible with the usual protein atom naming conventions, except that all atom names MUST begin with a letter, so the PDB convention of starting some hydrogen atom names with a digit is not allowed; similarly residue classes must begin with a letter and residue numbers must be pure numbers. The auxiliary program PDBINS is provided to generate a SHELXL-93 '.ins' file from a PDB file, incorporating restraints etc. taken from the dictionary file SHELXL.DIC.

The general approach to the refinement of large structures with limited reflection data is to proceed GRADUALLY, using all appropriate restraints (and possibly rigid group constraints) in the early stages of refinement, and relaxing them as far as possible only when the refinement has more or less converged. Although full-matrix refinement is normally recommended for small-molecule refinements, it is more efficient in terms of computer resources to use the Konnert-Hendrickson conjugate gradient approach (CGLS) for macromolecular refinement, with judicious insertion of large full-matrix blocks to help to resolve problem areas (e.g. solvent disorder).
A final refinement with overlapping full-matrix blocks, possibly restricted to the x, y and z coordinates only, would then be required to obtain the esds in e.g. torsion angles. For a very small protein or polynucleotide with less than 500 non-hydrogen atoms (excluding solvent) a single final xyz-block would suffice. The CGLS refinement is usually very stable; erratic behavior can usually be tracked down to one or more atoms with unreasonably large isotropic or anisotropic displacement parameters, or to refinement of more parameters than the data and restraints can support. If the second number on the L.S. or CGLS instruction is negative (-N) then every Nth reflection is ignored in the least-squares refinement, but is used instead for the calculation of independent R-values when the final structure factor cycle is performed. This enables 'R(free)' to be used to calibrate the sigmas for the various restraints and to check on possible 'over-refinement' (e.g. the refinement of noise peaks from a difference electron density map as solvent atoms). For details see A.T. Brunger, Nature 355 (1992) 472-475. Note the use of the DEFS instruction to change the default sigmas globally! A particularly effective application of R(free) is the decision as to whether the data justify (restrained) anisotropic refinement rather than isotropic. After the structure has more or less reached convergence after isotropic refinement in the usual way, two jobs are run with (for example) 'CGLS 20 -10' so that every 10th reflection is ignored in the refinement but is used instead for calculating R(free). One of the jobs should also contain ANIS (before the first atom), DELU and SIMU (without atom names), and ISOR (for the solvent water, e.g. 'ISOR O1 > LAST'). Only if R(free) is significantly lower for the ANIS job is further anisotropic refinement justified. This is more likely to be the case if the data have been collected to higher resolution (i.e. 
the data to parameter ratio is higher), but the quality of the data is also important. In general the effective resolution should be better than (very roughly) 1.5 Angstroms for proteins and polynucleotides before anisotropic refinement is justified. It is sensible to apply this R(free) test and - if justified - initiate anisotropic refinement BEFORE attempting to resolve discrete side-chain disorder unless the components of the disorder are well separated spatially, because anisotropic motion can be regarded as an alternative to isotropic motion with discrete disorder for small separations. On the other hand it is a good idea to try to locate as many solvent atoms as possible before applying the test (see below). The similarity restraints on the geometry are unbiased in the sense that no arbitrary numbers in the form of standard bond lengths and angles are required. Thus it should never be necessary to repeat a refinement because more precise values of these quantities are available. If R(free) is used to establish optimal esd's for the restraints, the weights may also be regarded as objective. The only assumption being made is that chemically equivalent bond lengths and angles (i.e. 1,3-distances) are equal in a statistical sense. Similarly the planarity restraints and the restraints on isotropic and anisotropic displacement parameters do not require the use of preconceived (and possibly erroneous) numbers (except zero!). This approach should be used whenever the type of problem (e.g. the extent of the non-crystallographic similarity) and the extent of the data permit. The geometrical similarity approach works very well for 'small-molecule' structures which have become large because there are several chemically identical molecules in the crystallographic asymmetric unit, and well for polynucleotides which may also contain several examples of each repeating unit (especially when divided up into base, furanose and phosphate units). 
A further advantage of the similarity approach for polynucleotides is that the state of protonation of the bases may be uncertain, making it difficult to know which standard bond lengths etc. to use as target values or in fitting rigid groups; it is safer to assume that the equivalent bases have the same (partial) protonation states, i.e. the 1,2- and 1,3-distances are 'similar' but unknown. On the other hand in proteins some amino-acids may be present many more times (and so will be better refined) than others, and geometric similarity does not help for an amino-acid which is only present once. Thus the recommended approach for proteins and large polypeptides is to use DFIX instructions to restrain 1,2- and 1,3-distances to standard values, with SAME/SADI (and small sigmas) to restrain the components of disordered residues to be similar. FLAT restraints are useful for aromatic residues and (with larger sigmas) for the five atoms involved in each main-chain peptide linkage. It is also very convenient to impose planarity on carbonyl and carboxyl carbons using CHIV (with a chiral volume of zero). All these restraints are set up automatically when the program PDBINS (Appendix B) is used to convert a PDB file for a protein into SHELXL-93 '.ins' format; the restraints are taken from the dictionary file SHELXL.DIC which users are encouraged to extend and adapt to local circumstances. Alternatively a text editor may be used to incorporate the appropriate parts of SHELXL.DIC into the .ins file. Standard (restraint) bond lengths based on the CSD have been tabulated by F.H.Allen, O. Kennard, D.G. Watson, L. Brammer, A.G. Orpen and R. Taylor in Sections 9.5 and 9.6 of Volume C of International Tables for Crystallography (1992), Ed. A.J.C. Wilson, Kluwer Academic Publishers, Dordrecht, pp. 685-791. Suitable parameters for proteins have been given by R.A. Engh and R. Huber, Acta Cryst., A47 (1991) 392-400. For nucleic acids the necessary parameters may be taken from R. 
Taylor and O. Kennard, J. Mol. Struct., 78 (1982) 1-28 (bases and phosphates) and S. Arnott and D.W.L. Hukins, Biochem. J., 130 (1972) 453-465 (furanose rings). Taylor and Kennard found no evidence that the bases are non-planar, so FLAT can safely be used. With poor resolution data it might be better to fit the bases to the orthogonal coordinates given by R. Taylor and O. Kennard, J. Am. Chem. Soc., 104 (1982) 3209-3212, and then refine them as rigid groups (FRAG...FEND - possibly in an 'include file' - followed by AFIX 176 etc.). It appears that the optimal restraint esds are very nearly independent of the type of structure and the resolution of the data, so normally the default values may be used. These have been established by R(free) and other tests on a variety of structures. The default values may if necessary be reset globally by a DEFS instruction before the individual restraints. The default esds are: all SAME and SADI distances, and DFIX with positive d: 0.03 A (first DEFS parameter); FLAT and CHIV: 0.2 A^3 (second DEFS parameter); DELU: 0.01 A^2 (third DEFS parameter); SIMU: 0.05 (fourth DEFS parameter) if neither atom terminal, otherwise 0.1 (or twice the fourth DEFS parameter); ISOR: 0.1 if atom not bonded to exactly one other atom, otherwise 0.2; DFIX -d (anti-bumping restraints) 0.1 A. The ISOR and DFIX -d defaults are not set by DEFS. Although the above default restraint esds give good results for small molecules and proteins which diffract to 1.2 Angstroms or better, there may be discrepancies involving the rigid bond restraints (indicating that the harmonic model is not such a good approximation, i.e. an ensemble (molecular dynamics) approach may be a better description). In such cases DELU and SIMU can be relaxed to about 0.03 and 0.10 respectively for anisotropic refinement, and this model may well give the lowest value for the free R-factor. Some care is needed, because if the restraints are relaxed too far the refinement may become unstable.
The refinement may also become unstable (e.g. oscillate rather than converge) if one or more solvent atoms have unreasonably high displacement parameters, in which case they can be deleted. Otherwise either 'DAMP 100' (with L.S.) or 'SLIM .3 .1' (with CGLS) should be tried to damp the refinement (which will then require more cycles for convergence). A further facility primarily intended for macromolecules but also useful for smaller structures is the production of tables using RTAB. When used in conjunction with residues, RTAB provides a convenient way of tabulating standard torsion angles, chiral volumes, and distances and angles involved in (for example) hydrogen bonds. Examples of the latter involving symmetry generated atoms were included in the second test structure (sigi) discussed above. The following instructions would produce sorted tables of the standard protein torsion angles and chiral volumes for the alpha-carbon atoms, assuming that the residues are numbered consecutively (CA_- means the atom CA with the residue number decreased by one):

    RTAB_* Omeg CA_- C_- N CA
    RTAB_* Phi C_- N CA C
    RTAB_* Psi N CA C N_+
    RTAB_* Chi N CA CB CG
    RTAB_* Cvol CA

If RTAB_* is not appropriate for a particular residue, e.g. some torsion angles involving the terminal residues, or chi and chiral volume for glycine, the residues in question are simply left out of the tables. The _+ and _- notation may also be used for cyclic peptides by assigning an 'alias' to the first and last residues; for example the residues in a cyclic pentapeptide could be numbered 2 to 6 inclusive, with alias 7 assigned to residue 2 and alias 1 to residue 6, so that all the torsion angles would be tabulated using the above RTAB instructions. The SWAT option introduces one variable and one fixed parameter which enable diffuse solvent to be modeled by Babinet's principle (R. Langridge, D.A. Marvin, W.E. Seeds, H.R. Wilson, C.W. Hooper, M.H.F. Wilkins and L.D. Hamilton, J. Mol. Biol. 2 (1960) 38-64; H. 
Driessen, M.I.J. Haneef, G.W. Harris, B. Howlin, G. Khan and D.S. Moss, J. Appl. Cryst. 22 (1989) 510-516). This usually produces a significant but not dramatic improvement for the very low order data in macromolecular refinements. One of the most difficult and potentially time-consuming aspects of macromolecular structure refinement is the treatment of solvent water. The relatively diffuse solvent atoms contribute primarily to the lower order reflections and so often constitute a local region in the least-squares parameter space in which there are more parameters than data, i.e. there may be many plausible sets of parameters which fit the data equally well. Thus anisotropic refinement of fully occupied atoms or isotropic refinement of a larger number of water molecules with fractional occupation factors may well fit the data equally well and involve about the same number of parameters in total. The advantage of the former approach is that chemically sensible restraints can be applied to the distances between the waters (and between the solvent and protein atoms). Even when the data only permit an isotropic refinement, it is recommended that the water be refined with full occupancies and 'anti-bumping' restraints until no more waters can be found, and then if necessary (e.g. when there are strong difference Fourier peaks closer than say 2.3 Angstrom to waters with relatively high U values) partial occupancies can be assigned. SHELXL-93 enables anti-bumping restraints to be input by hand (DFIX -d) but they will usually be generated automatically by the program (by using the BUMP instruction and flagging the (water) atoms on which it is to operate by 'CONN 0'). The anti-bumping restraints are generated between all water atoms, and between all water and all other atoms, including all possible symmetry equivalents and taking atom types into account (thus potential hydrogen bonds are allowed to be shorter than O..C distances etc.).
The following iterative procedure proves effective in practice at building up a network of fully-occupied water molecules, with an acceptable pattern of hydrogen-bonded distances, that is also consistent with the diffraction data. The SWAT and BUMP instructions should be included throughout, with CONN 0 to flag the water molecules and inhibit the generation of accidental bonds (which can for example upset the reidealization of hydrogen atoms each refinement cycle). If the waters are anisotropic 'ISOR 1.0 O1 > LAST' is advisable. After each refinement job, water molecules with (an)isotropic displacement parameters which are too high (e.g. all three principal components greater than 1.2 or 1.4 A^2) should be deleted, and (FMAP 2 / PLAN 200 2.3) difference peaks which make sensible hydrogen bonding distances to water molecules or to other electronegative atoms added; these will not necessarily be the highest peaks. The final table of distances between peaks should be checked to ensure that there are no short distances between the chosen peaks (PLAN 200 2.3 does this automatically). The list of 'disagreeable restraints' after the final refinement cycle in each job should also be checked for short contacts and if necessary one of the offending waters removed. At lower resolution it would be necessary to use a graphical display of the Fo-Fc or 2Fo-Fc electron density to locate the new trial water molecules. This procedure converges after a few jobs when no further water molecules can be eliminated or added. At this point the remaining difference electron density peaks should be inspected carefully to see if it is necessary to add partially occupied discrete solvent atoms in the vicinity of disordered side-chains (if any). An advantage of the full occupancy / antibumping approach is that it prevents water molecules from diffusing into protein regions and thus facilitates remodeling of disordered side-chains etc. 
In summary, for modeling the solvent the following instructions would be typical:

    CGLS 10
    SWAT 2 2            ! will be updated by the program in the .res file
    BUMP                ! automatic antibumping restraints generated
    CONN 0 O1 > LAST    ! flag water for antibumping and exclude from connectivity
    ISOR 0.1 O1 > LAST  ! for anisotropic waters (ignored for isotropic atoms)
    FMAP 2              ! Fo-Fc map
    PLAN 200 2.3        ! difference peaks only written to .res for potential waters

and after each job waters would be deleted on editing .res to .ins if bad contacts remain (see final restraints summary) or if U or Ueq have risen to too high a value; selected (or perhaps all) potential waters in the peak-list are then renamed and moved to before the HKLF instruction. It is also possible to monitor progress using the free R factor (CGLS 10 -10). Even if anisotropic refinement is planned, it is a good idea (and it usually makes the eventual R-free test for the anisotropic refinement more favorable) to optimize the water structure in this way first. If this extension of the water is continued after going anisotropic, then an ANIS instruction is needed before the first new water (oxygen) atom. Other useful features for macromolecules include an 'omit map' (OMIT atomnames followed by FMAP), the SHEL instruction for ignoring high and low resolution data, the use of 'include files' for accessing standard fragments or restraint libraries, and provision for synchrotron data at various wavelengths (DISP) as well as Laue data (LAUE plus HKLF 2). The amount of '.lst' file output produced may be reduced substantially by putting 'MORE 0' before the first atom in the '.ins' file, but this facility should only be used when one is sure that the '.ins' file is correct; it might be better to edit (or write a little program to extract information from) the full '.lst' file instead, so that diagnostic information is still available if required. The UNIX 'more' command is useful for browsing through '.lst' files.
In contrast to standard macromolecular refinement programs, SHELXL-93 is able to provide reliable estimates of the standard deviations of all refined parameters and of all derived quantities, subject of course to any assumptions implied by the restraints employed (in keeping with the Bayesian philosophy). For example tight geometrical 'similarity restraints' effectively determine mean bond lengths and angles and their esd's, but leave the torsion angles free to refine independently; thus the torsion angles - and their esds - retain their diagnostic value. In summary, a typical refinement of a small protein would take the following course. First the auxiliary program PDBINS would be used to convert the atom coordinates into SHELXL-93 '.ins' format and to extract the necessary restraints from a residue dictionary file (based on 'shelxl.dic' which is provided as a model). This is especially convenient if XPLOR has been used for the structure solution by molecular replacement and/or the initial refinement. Some editing of the '.ins' file may be needed if disorder or non-standard residues are present. Different components of disordered groups should be assigned different 'PART' numbers, and the occupation factors of two components may be refined as p and (1-p) by the use of a free variable (i.e. set to e.g. 21 and -21 in which case a starting value for free variable number 2 should be given as the second parameter on the 'FVAR' instruction). The first SHELXL-93 runs serve to build up a consistent network of fully occupied solvent molecules as explained above. At this point the hydrogen atoms are inserted by removing 'REM ' which precedes the HFIX instructions from PDBINS and the dictionary file. Attachment of hydrogens to more than one component of each disordered group is best performed in a subsequent job by inserting the appropriate AFIX instructions. If the resolution is very good (ca. 
1.5 A or better) the R(free) test should now be performed to see whether anisotropic refinement is justified (i.e. two 'CGLS 20 -10' jobs should be run, differing only in that one contains an 'ANIS' instruction). It is a mistake to model discrete disorder (unless the components are very clearly separated), or to include partially occupied solvent, until this test is applied, because anisotropic refinement may well provide an alternative way of modeling these effects. Subsequent anisotropic refinement (if justified) may be combined with improvement of the solvent model and possible modeling of discrete disorder; very often the better phase estimates resulting from the restrained anisotropic refinement give a much clearer difference electron density. Towards the end of this procedure partially occupied solvent may be introduced; where possible the occupation factors should be coupled (using free variables) to those of neighboring disordered side-chains, or an atom may be split into two components with occupancies fixed at 0.5 (i.e. set as 10.5), either as recommended by the program (see the list of principal displacement components) or as deduced from an Fo-Fc Fourier. This maintains the anti-bumping restraints with other solvent and side-chain atoms, but not between disordered components for which the occupancies add up to less than 1.1 (slightly greater than one to allow for hydrogen atom contributions etc.). At various stages in the refinement one of the LIST options can be used to write a phased reflection list to the .fcf file for input into a macromolecular FFT map-generating program, whose output can then be read into a graphics system. When the refinement has converged, it may be desired to run an xyz-only refinement with overlapping blocks (L.S./BLOC) to obtain esds on the torsion angles and hydrogen bonding distances (the antibumping list may be used to set up tables using RTAB and EQIV - see the sigi test example).
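The occupancy codes used above (21 and -21 for p and 1-p, 10.5 for an occupancy fixed at 0.5) follow the general SHELX convention of writing a parameter as 10*m + p. The Python sketch below shows one way of decoding such values. It assumes the usual rule (m = 0 refine freely, m = 1 fix at p, m > 1 gives p times free variable m, m < -1 gives p times [free variable |m| minus 1]); the function name and the list layout of the free variables are hypothetical.

```python
def decode_parameter(x, fvar):
    """Decode a coded parameter x = 10*m + p (with |p| < 5).

    fvar holds the values from the FVAR instruction, so fvar[0] is free
    variable 1 (the overall scale factor) and fvar[1] is free variable 2.
    Returns the numerical value the parameter takes.
    """
    m = round(x / 10)          # integer multiple of ten
    p = x - 10 * m             # remainder, assumed to satisfy |p| < 5
    if m == 0:
        return p               # refined freely, starting from p
    if m == 1:
        return p               # fixed at p
    if m > 1:
        return p * fvar[m - 1]             # tied to free variable m
    if m < -1:
        return p * (fvar[-m - 1] - 1.0)    # tied to (1 - free variable |m|)
    raise ValueError("m = -1 is not covered by this sketch")


fvar = [1.0, 0.7]                       # osf and free variable 2
print(decode_parameter(21, fvar))       # occupancy of the first component
print(decode_parameter(-21, fvar))      # occupancy of the second component
print(decode_parameter(10.5, fvar))     # occupancy fixed at one half
```

With free variable 2 equal to 0.7 the two disordered components receive occupancies that sum to one, which is the purpose of the 21 / -21 pairing.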
Torsion angles and hydrogen-bonding distances are not usually restrained in the refinement, and so their esds have some meaning. Finally 'ACTA 2' and/or WPDB may be used to archive the results. ABSOLUTE STRUCTURE Even if determination of absolute configuration is not one of the aims of the structure determination, it is important to refine ANY non-centrosymmetric structure as the correct 'absolute structure' in order to avoid introducing systematic errors into the bond lengths etc. In some cases the absolute structure will be known with certainty (e.g. proteins), but in others it has to be deduced from the X-ray data. Generally speaking, a single phosphorus or heavier atom suffices to determine an absolute structure using Cu-K(alpha) radiation, and with accurate high-resolution low-temperature data including Friedel opposites such an atom may even suffice for Mo-K(alpha). In the course of the final structure factor calculation the program calculates the Flack absolute structure parameter x and its esd (it is a bonus of the refinement against F^2 that this calculation is a 'hole in one' and doesn't require expensive iteration). A comparison of x with its esd provides an indication as to whether the refined absolute structure is correct or whether it has to be 'inverted' (the program prints a suitable warning should this be necessary). This attempt to refine x 'on the cheap' is reliable when the true value of x is close to zero, but may produce a (possibly severe) underestimate of x for structures which have to be inverted, because x is correlated with positional and other parameters which have not been allowed to vary. Effectively these parameters have adapted themselves to compensate for the wrong (zero) value of x in the course of the refinement, and need to be refined with x to eliminate the effects of correlation. These effects will tend to be greater when the correlation terms are greater, e.g. 
for pseudo-symmetric structures and for poor data to parameter ratios (say less than 8:1). x can be refined at the same time as all the other parameters using the TWIN and BASF instructions; this implies racemic twinning and so is discussed under TWIN below (see also H.D. Flack, Acta Cryst., (1983) A39, 876-881). For most space groups 'inversion' of the structure simply involves inserting an instruction 'MOVE 1 1 1 -1' before the first atom. Where the space group is one of the 11 enantiomorphous pairs [e.g. P3(1) and P3(2)] the translation parts of the symmetry operators need to be inverted as well to generate the other member of the pair. There are seven cases for which, if the standard setting of the International Tables for Crystallography has been used, inversion in the origin does NOT lead to the inverted absolute structure (in fact, in some cases it leads to a totally different structure: H.D. Flack, personal communication, 1992)! This problem was drawn to the author's attention by D. Rogers in about 1980, but was probably first discussed in print by E. Parthe and L.M. Gelato, Acta Cryst., A40 (1984) 169-183 and by G. Bernardinelli and H.D. Flack, Acta Cryst., A41 (1985) 500-511. The offending space groups and corresponding correct MOVE instructions are:

    Fdd2       MOVE .25 .25 1 -1
    I4(1)cd    MOVE 1 .5 1 -1
    I4(1)      MOVE 1 .5 1 -1
    I-42d      MOVE 1 .5 .25 -1
    I4(1)22    MOVE 1 .5 .25 -1
    F4(1)32    MOVE .25 .25 .25 -1
    I4(1)md    MOVE 1 .5 1 -1

TWINNED CRYSTALS AND REFINEMENT AGAINST POWDER DATA

Twinned crystals are refined in SHELXL-93 by the method of Pratt, Coyle and Ibers, J. Chem. Soc. 1971, 2146-2151 (see also Jameson, Acta Cryst., A38 (1982) 817-820). The sum of the Fc^2 values of the individual twin domains, each multiplied by its fractional contribution, is fitted to the observed Fo^2.
Since the n fractional contributions must sum to unity, only n-1 of them can be refined independently; the fraction of component 1 is set equal to one minus the sum of the other fractional contributions. Refinement of twinned crystals and refinement against F^2-values derived from powder data are similar in that several reflections with different indices may contribute to a single F^2 observation. For powder data this requires some small adjustments to the format of the '.hkl' file; the batch number becomes the multiplicity m, and where several reflections contribute to the same observation the multiplicity is made positive for the last reflection in the group and negative for the rest. A similar approach is possible for twinned crystals, except that the batch number is replaced by the twin component number, and the batch scale factors (BASF) may be refined to determine the fractional contributions of the components 2, 3, ...; k1, the fraction of component 1, is calculated as ( 1 - k2 - k3 - ... ). In simple cases, i.e. when the lattices of all components are always coincident, the normal format can be retained in the '.hkl' file, and the index transformation specified with a TWIN instruction. Although SHELXL-93 may be useful for some high symmetry and hence reasonably well resolved powder and fibre diffraction patterns - the various restraints and constraints should be exploited in full to make up for the poor data/parameter ratio - for normal powder data a Rietveld refinement program would be much more appropriate. For both powder (HKLF 6) and twinned data (HKLF 5 or TWIN with HKLF 4), the reflection data are reduced to the 'prime' component, by multiplying Fo^2 and Fc^2 by the ratio Fc^2(prime) / Fc^2(total), before performing the analysis of variance and the Fourier calculations. Similarly 'OMIT h k l' refers to the indices of the prime component. The prime component is the one for which the indices have not been transformed by the TWIN instruction (i.e. 
m = 1 ), or in the case of HKLF 5 or HKLF 6 the component given with positive m (i.e. the last contributor to a given intensity measurement, not necessarily with |m| = 1). For powder data the least-squares refinement fits the overall scale factor (osf^2 where osf is given on the FVAR instruction) times the multiplicity weighted sum of calculated intensities to Fo^2:

    (Fc^2)* = osf^2 [ m(1) * Fc(1)^2 + m(2) * Fc(2)^2 + m(3) * Fc(3)^2 + ... ]

where the multiplicities of the contributors are given in the place of the batch numbers in the '.hkl' file. Since it is then not possible to define batch numbers as well, 'BASF' cannot be used with powder data. For twinned data (TWIN or HKLF 5) the expression becomes:

    (Fc^2)* = osf^2 [ k(1) * Fc(1)^2 + k(2) * Fc(2)^2 + k(3) * Fc(3)^2 + ... ]

where the starting values for the k(2), k(3), ... are given on the BASF instruction, and k(1) is defined such that Sigma[k(m)] = 1. If no BASF instruction is used, all the k(m) are made equal. m is the component number given in the place of the batch number in the '.hkl' file; if TWIN is used to generate the components, m is 1 for the initial indices, 2 after applying the TWIN matrix once, 3 after applying it twice, etc. The parameter ncomp must be given on the TWIN instruction if the matrix is to be applied more than once. The following cases are relatively common:

(a) The lower symmetry trigonal, tetragonal, hexagonal or cubic Laue groups may be twinned so that they look (more) like the corresponding higher symmetry Laue groups (assuming c unique except for cubic):

    TWIN 0 1 0 1 0 0 0 0 -1

plus one BASF parameter if the twin components are not equal in scattering power.

(b) Orthorhombic with a and b approximately equal may emulate tetragonal:

    TWIN 0 1 0 1 0 0 0 0 -1

plus one BASF parameter for unequal components.

(c) Monoclinic with beta approximately 90 degrees may emulate orthorhombic:

    TWIN 1 0 0 0 -1 0 0 0 -1

plus one BASF parameter for unequal components.

(d) Monoclinic with a and c approximately equal and beta approximately 120 degrees may emulate hexagonal [P2(1)/c would give absences and possibly also intensity statistics corresponding to P6(3)]. There are three components, so ncomp must be specified and the matrix is applied once to generate the indices of the second component and twice for the third component. In German this is called a 'Drilling' as opposed to a 'Zwilling' (with two components):

    TWIN 0 0 1 0 1 0 -1 0 -1 3

plus TWO BASF parameters for unequal components. If the data were collected using an hexagonal cell, then an HKLF matrix would also be required to transform them to a setting with b unique:

    HKLF 4 1 1 0 0 0 0 1 0 -1 0

(e) Refinement of racemic twinning may be performed by adding the following two instructions to the '.ins' file (and retaining HKLF 4):

    TWIN
    BASF 0.5

since the default TWIN matrix inverts the indices. In this example, the BASF coefficient is the Flack absolute structure parameter x (H.D. Flack, Acta Cryst., (1983) A39, 876-881; G. Bernardinelli and H.D. Flack, Acta Cryst., A41 (1985) 500-511). Refinement of racemic twinning should normally only be attempted after all non-hydrogen atoms have been located AND the program suggests that it would be advisable. If racemic twinning is refined in this way, the automatic calculation of the Flack x parameter in the final structure factor cycle is suppressed, since the BASF parameter is x.

If general and racemic twinning are to be refined simultaneously, ncomp should be doubled and given a negative sign, and there should be |ncomp|-1 BASF twin component factors (or none, in the unlikely event that all are to be fixed as equal). The inverted components follow those generated using the TWIN matrix, in the same order.
In such a case a single Flack x parameter is no longer appropriate; the program will still estimate a value, which should be zero since the effect has already been taken into account, but its esd gives a guide to the reliability of the racemic refinement. The HKLF 5 and 6 instructions force MERG 0, i.e. neither a transformation of reflection indices into a standard form nor a sort-merge is performed. If twinning is specified using the TWIN instruction, any MERG instruction may be used and the default remains MERG 2. Although this is always safe for racemic twinning, there may be other forms of twinning for which it is not permissible to sort-merge first. Whether or not MERG is used, the program ignores all systematically absent contributions, with the result that a reflection is excluded from the data if it is systematically absent for all components. Twinning usually arises for good structural reasons. When the heavy atom positions correspond to a higher symmetry space group it may be difficult or impossible to distinguish between twinning and disorder (of the light atoms); see W. Hoenle and H.G. von Schnering, Z. Krist., 184 (1988) 301-305. Since refinement as a twin usually requires only two extra instructions and one extra parameter, in such cases it should be attempted first, before investing many hours in a detailed interpretation of the 'disorder'! THE '.ins' INSTRUCTION FILE - DETAILED SPECIFICATION The rest of this documentation should be regarded as a reference manual rather than light reading! Defaults are given in square brackets in this documentation; '#' indicates that the program will generate a suitable default value based on the rest of the available information. Continuation lines are flagged by '=' at the end of a line, the instruction being continued on the next line which must start with four spaces. Other lines beginning with four spaces are treated as comments, so blank lines may be added to improve readability. All characters following '!' 
or '=' in an instruction line are ignored, except after TITL, SYMM or EQIV (for which continuation lines are not allowed). The '.ins' file may include an instruction of the form: +filename (the '+' character MUST be in column 1). This causes further input to be taken from the named file until an 'END' instruction is encountered in that file, whereupon the file is closed and instructions are taken from the next line of the '.ins' file. The input instructions from such an 'include' file are not echoed to the '.lst' and '.res' file, and may NOT contain FVAR, BASF, EXTI or SWAT instructions or atoms (except inside a FRAG...FEND section) since this would prevent the '.res' file from being used unchanged for the next refinement job (after renaming as '.ins'). The '+filename' facility enables standard fragment coordinates or long lists of restraints etc. to be read from the same files for each refinement job, and for different structures to access the same fragment or restraint files. One could also for example store the LATT and SYMM instructions for different space groups, or neutron scattering factors for particular elements, or LAUE instructions followed by wavelength-dependent scattering factors, in suitably named files. Since these 'include' files are not echoed, it is a good idea to test them as part of an '.ins' file first, to check for possible syntax errors. Such 'include' files may be nested; the maximum allowed depth depends upon the operating system and compiler used. Note that on some (e.g. IBM mainframe) computers, 'filename' is a dummy name (DDNAME) which must be defined in the JCL or REXX macro used to submit the job.

TITL [ ]
Title of up to 76 characters, to appear at suitable places in the output. The characters '!' and '=', if present, are part of the title rather than having a special significance.

CELL lambda a b c alpha beta gamma
Wavelength and unit-cell dimensions in Angstroms and degrees.
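As an illustration of the free-format CELL instruction, the Python sketch below reads such a line and computes the unit-cell volume from the standard triclinic formula. The function names are hypothetical, the example values are invented, and continuation lines ('=') are not handled.

```python
import math


def parse_cell(line):
    """Parse 'CELL lambda a b c alpha beta gamma' into a dictionary."""
    fields = line.split("!")[0].split()      # drop any trailing comment
    if len(fields) != 8 or fields[0].upper() != "CELL":
        raise ValueError("expected CELL followed by seven numbers")
    names = ("wavelength", "a", "b", "c", "alpha", "beta", "gamma")
    return dict(zip(names, map(float, fields[1:])))


def cell_volume(cell):
    """Unit-cell volume in cubic Angstroms (angles given in degrees)."""
    ca, cb, cg = (math.cos(math.radians(cell[k]))
                  for k in ("alpha", "beta", "gamma"))
    root = math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)
    return cell["a"] * cell["b"] * cell["c"] * root


cell = parse_cell("CELL 0.71073 10.0 10.0 10.0 90 90 90   ! invented example")
print(round(cell_volume(cell), 3))
```

For an orthogonal cell the square-root term is one, so the volume reduces to the product a * b * c.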
ZERR Z esd(a) esd(b) esd(c) esd(alpha) esd(beta) esd(gamma)
Z value (number of formula units per cell) followed by the estimated standard deviations in the unit-cell dimensions.

LATT N [1]
Lattice type: 1=P, 2=I, 3=rhombohedral obverse on hexagonal axes, 4=F, 5=A, 6=B, 7=C. N must be made negative if the structure is non-centrosymmetric.

SYMM symmetry operation
Symmetry operators, i.e. coordinates of the general positions as given in International Tables. The operator x, y, z is always assumed, so MUST NOT be input. If the structure is centrosymmetric, the origin MUST lie on a center of symmetry. Lattice centering should be indicated by LATT, not SYMM. The symmetry operators may be specified using decimal or fractional numbers, e.g. 0.5-x, 0.5+y, -z or Y-X, -X, Z+1/6; the three components are separated by commas.

SFAC ele