Tonto input file structure

From Tonto Wiki
This page describes how a Tonto input file is structured.

It tells you how data is entered into Tonto.

It also tells you what keywords are used to enter the data.

And it tells how to find out what the allowed keywords are.

You really need to know this if you are writing input files.

Input file structure

Keywords

An input file consists of a series of keywords enclosed in curly braces.

Keywords always begin with an alphabetic character.

They consist of letters, underscores and possibly a terminating equals sign.

Keywords are insensitive to capitalisation.

Newlines and spaces between keywords are ignored.

So, here is what an input file looks like (with an inline comment added for good measure):

 { 
    <keyword-1>
       
    <keyword-2>
    
    <keyword-3> ! Some comment
    ...
 }

Curly braces and data blocks

The curly braces represent the beginning and end of a block of data.

A block of data is always associated with a particular module or object.

For example, the crystal= keyword shown below expects to receive a block of CRYSTAL data as input.

 crystal= {
    <keyword-1>
    ...
 }

To know that crystal= expects CRYSTAL data, you have to read the documentation; it may not always be as obvious as this example.

Note that a data block may itself contain further data input keywords, each with its own data, within the curly braces.

For example, here the CRYSTAL data block includes xray_data= as one of its allowed keywords:

 crystal= {
     
    <CRYSTAL-keyword-1>
    
    <CRYSTAL-keyword-2>
    
    xray_data= {
       ...
       <DIFFRACTION_DATA-keyword-1>
       ...
    }
  
 }

You can find the xray_data= keyword in the process_keyword routine in the CRYSTAL module.

The xray_data= keyword itself expects to receive a block of DIFFRACTION_DATA information.

The moral of the story is that input files are made up of nested curly braces.

In the Tonto program, the outermost block of data is always associated with the MOLECULE object.
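
The nesting described above can be illustrated with a short Python sketch that turns an input file into nested lists. It is not Tonto's parser: the function names are hypothetical, and it assumes that tokens are separated by whitespace and that `!` always starts a comment (so quoted strings containing spaces or `!` are not handled).

```python
def tokenize(text: str) -> list[str]:
    """Split input into whitespace-separated tokens, dropping '!' comments."""
    tokens = []
    for line in text.splitlines():
        code = line.split("!", 1)[0]  # assumption: '!' always starts a comment
        tokens.extend(code.split())
    return tokens


def parse_block(tokens: list[str], pos: int = 0) -> tuple[list, int]:
    """Parse the tokens that follow an opening '{' up to its matching '}'.

    Returns the block contents (inner blocks become nested lists) and the
    index of the token just after the closing brace.
    """
    items = []
    while pos < len(tokens):
        tok = tokens[pos]
        if tok == "{":
            inner, pos = parse_block(tokens, pos + 1)
            items.append(inner)
        elif tok == "}":
            return items, pos + 1
        else:
            items.append(tok)
            pos += 1
    raise ValueError("missing closing '}'")


def parse_input(text: str) -> list:
    """Parse a whole input file, which must be one outermost { ... } block."""
    tokens = tokenize(text)
    if not tokens or tokens[0] != "{":
        raise ValueError("input must start with '{'")
    block, end = parse_block(tokens, 1)
    if end != len(tokens):
        raise ValueError("unexpected text after the outermost '}'")
    return block
```

Each inner list corresponds to one data block, so the depth of the list nesting mirrors the depth of the curly braces.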

REDIRECT and REVERT : changing text file streams

Often you may have quite a lot of data to input into the stdin file.

For example, you may want to input a considerable amount of reflection data.

It would be impractical to just have one input file: what you want is to keep the data in another file.

In Tonto, in any non-list input environment (see below) you can use the REDIRECT <filename> command to transfer input processing to the start of the specified data file:

  crystal= {
  
     ...
      
     xray_data= {
 
        thermal_smearing_model= hirshfeld
  
        optimise_extinction= NO
  
        REDIRECT data.nh3
  
     }
  
  }

At the end of the specified file, in this case data.nh3, put a REVERT command to jump back into the original text file at the point where you left off.

    reflection_data= {
   
       data= {
  
          1   1   0    18.093    0.118
          1   1   1    63.470    0.446
             ...
          0   7   1     0.387    0.614
          1  -5  -5     1.158    0.205
         -1  -1   7     0.877    0.271
  
       }  ! <--- ends the "data=" block
  
     }    ! <--- ends the "reflection_data=" block
  
     REVERT


Neat!

Just make sure that you choose a <filename> which is not used by Tonto or some other program, to store temporary data!
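
One way to picture REDIRECT and REVERT is as a stack of input streams. The Python sketch below is an illustration under assumptions, not Tonto's implementation: `read_tokens` is a hypothetical helper, files are supplied as an in-memory dictionary rather than read from disk, and a redirected file that ends without REVERT is simply treated as finished.

```python
def read_tokens(files: dict[str, str], start: str = "stdin") -> list[str]:
    """Return the token sequence seen after following REDIRECT and REVERT.

    `files` maps hypothetical file names to their contents.
    """
    out = []
    # Each stack entry is [tokens, position]; the top entry is the active file.
    stack = [[files[start].split(), 0]]
    while stack:
        tokens, pos = stack[-1]
        if pos >= len(tokens):
            stack.pop()  # assumption: end of file behaves like REVERT
            continue
        tok = tokens[pos]
        stack[-1][1] = pos + 1
        if tok.upper() == "REDIRECT":
            name = tokens[pos + 1]
            stack[-1][1] = pos + 2  # resume after the file name on REVERT
            stack.append([files[name].split(), 0])
        elif tok.upper() == "REVERT":
            stack.pop()  # anything after REVERT in that file is never read
        else:
            out.append(tok)
    return out
```

With this model the calling file never sees the REDIRECT or REVERT tokens themselves; it sees the contents of the redirected file spliced in at the point of the REDIRECT.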

Data input keywords vs command keywords

Keywords can be of two types:

  • Data input keywords
    • These always end in an equals sign
    • A `data item' is always expected to follow a data input keyword
    • The data item can be simple or complex (or, equivalently, derived data).
    • Simple data is either a string, logical, integer, real number or complex number
    • Simple data does not require curly braces for input: complex data does
    • Data input can also be list-data or non-list-data
    • List style data input is used for entering lists of data e.g. lists of coordinates
  • Command keywords
    • These do not end in an equals character
    • They are never followed by data
    • They tell the data block to `do something' and perhaps produce output or results
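
The distinction between the two keyword types can be sketched in a few lines of Python. The pattern below encodes only the rules stated on this page (a leading letter, then letters or underscores, then an optional trailing equals sign) and may be narrower than what Tonto actually accepts; the function names are hypothetical.

```python
import re

# Assumption: keyword shape exactly as described in the text above.
KEYWORD_RE = re.compile(r"[A-Za-z][A-Za-z_]*=?")


def classify_keyword(token: str) -> str:
    """Return 'data' for a data input keyword, 'command' for a command keyword."""
    if not KEYWORD_RE.fullmatch(token):
        raise ValueError(f"not a keyword: {token!r}")
    return "data" if token.endswith("=") else "command"


def normalise(token: str) -> str:
    """Keywords are case-insensitive, so compare them in lower case."""
    return token.lower()
```

For instance, `classify_keyword("crystal=")` gives `'data'`, while `classify_keyword("scf")` gives `'command'`.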

Inputting simple data

Form of simple data input

The following is an example of a data input keyword, which expects a simple string:

  basis_directory= /home/dylan

This is in fact an assignment of simple string data.

Note that for simple data assignment curly braces are not used.

Note also that there is a required space between the equals and the data item.

There should also be no space between the last character and the equals sign, but Tonto will understand and forgive you if you write:

  basis_directory = /home/dylan

String data (STR)

Strings can be enclosed in single or double quotes:

  basis_directory= /home/dylan
  basis_directory= '/home/dylan'
  basis_directory= "/home/dylan"

You must use quotes if your string includes embedded blank spaces.

Why? By default, a blank space terminates a string token.

Although keywords themselves are case insensitive, string data is case sensitive.

Logical or binary-switch data (BIN)

Logical data can be entered in a variety of ways:

  refine_structure= TRUE  ! All these set the logical value TRUE
  refine_structure= YES
  refine_structure= Y
  refine_structure= yes
  refine_structure= y
  refine_structure= 1
  
  refine_structure= FALSE ! All these set the logical value FALSE
  refine_structure= NO 
  refine_structure= N     
  refine_structure= no
  refine_structure= n
  refine_structure= 0

Like keywords themselves, logical data input is case insensitive.
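
As a rough Python sketch, the accepted spellings listed above can be handled with two lookup sets. The function name `parse_bin` is hypothetical, and only the spellings shown on this page are included; whether Tonto accepts others is not assumed here.

```python
TRUE_WORDS = {"true", "yes", "y", "1"}
FALSE_WORDS = {"false", "no", "n", "0"}


def parse_bin(token: str) -> bool:
    """Convert a logical token to a bool, ignoring case."""
    word = token.lower()
    if word in TRUE_WORDS:
        return True
    if word in FALSE_WORDS:
        return False
    raise ValueError(f"not a logical value: {token!r}")
```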

Integer data (INT)

Integer data is entered as you would expect:

  IT_group_number= 121

Real number data (REAL)

There are several ways to enter real number data:

  temperature= 273
  temperature= 273.0
  temperature= 2.730e2
  temperature= 2.730d2

This is as you would use for Fortran.

For real numbers, you can also specify the units e.g.

  CH_bond_length= 1.008 angstrom

If you don't specify the units, atomic units are assumed.

That is, Tonto converts your number (with its units) into atomic units which is the default internal representation for quantities with units.

Complex-number data (COMPLEX)

Complex number data is entered as a pair of real numbers between parentheses:

  phase= (1.23e-1,0.001)

Again, this is as you would use for Fortran.

Please don't confuse complex-number data with complex data.

The former is a number, the latter involves nested curly braces.

Inputting complex (or derived) data

Complex data is not the same as complex-number data!

Complex or derived data input is comprised of a succession of simple data or other complex data.

You've already seen how more complicated data are input in the crystal= example above.

Here's another example in more detail:

  scfdata= {
     initial_density= promolecule
     kind= rhf
     convergence= 0.00001
     diis= { convergence_tolerance= 0.00001 }
  } 
    
  scf

In this example, scfdata= inputs a complex SCF_DATA block of data into Tonto.

By the way, at the end of this complex data input, an scf command keyword is issued.

This is a command because there is no equals sign after it. Get it?

Inputting simple list data

The different kinds

Sometimes you want to enter lists of data, e.g. lists of strings, integers and so on.

In Tonto, lists of simple data are called vectors and are denoted:

  • VEC{STR}, string lists
  • VEC{BIN}, logical lists
  • VEC{INT}, integer lists
  • VEC{REAL}, real number lists
  • VEC{COMPLEX}, complex number lists

Fixed-length vs variable-length lists

We can also specify if the list is fixed-length or variable-length:

  • If the list has a fixed size, this is specified in parentheses at the end of the list descriptor.
    • Thus a list of strings of length 5 is specified VEC{STR}(5).
  • On the other hand, if the list is of unknown or variable-length, we use a star (*) character at the end.
    • Thus an integer list of arbitrary length is specified as VEC{INT}*.

Entering fixed-length list data

To enter list data where the size of the list is known, simply list the data in order.

For example, an ATOM data block specifies that an atom position is entered using the pos= keyword.

The pos= keyword expects data of the form VEC{REAL}(3) i.e. a vector of length 3.

Then the following input would be used:

  atom= {
     ...
     pos = 1.10 2.23 5.53 angstrom
     ...
  }

Easy, huh?

Note:

  • You can specify the units of the whole real-number list, just like for a single real number (see above)
  • No commas are used to separate the elements

Entering variable-length list data

To enter list data where the size of the list is not known, surround the list with curly brackets.

The curly brackets are needed because Tonto needs to know where the list starts and where it ends.

So, for instance to enter a variable-length integer list use

  system_A_atom_indices= { 3 4 5 6 7 }

Easy, huh?

In the case of integer lists, where the integers increase consecutively, you can also use the ... notation:

  system_A_atom_indices= { 3 ... 7 }

Just as for fixed-length real-number lists, units can be specified for variable-length real-number lists:

  harmonic_frequencies= { 1.5  3.6  7.7 } wavenumbers
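
The ... notation for consecutive integers described above can be sketched in Python as follows. The helper name is hypothetical, and the sketch assumes that ... must sit between two integers with the second larger than the first; how Tonto treats other cases is not covered.

```python
def expand_int_list(tokens: list[str]) -> list[int]:
    """Expand tokens such as ['3', '...', '7'] into [3, 4, 5, 6, 7]."""
    result = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "...":
            if not result or i + 1 >= len(tokens):
                raise ValueError("'...' needs an integer on both sides")
            first, last = result[-1], int(tokens[i + 1])
            if last <= first:
                raise ValueError("'...' needs an increasing range")
            # 'first' is already in the list, so continue from first + 1.
            result.extend(range(first + 1, last + 1))
            i += 2
        else:
            result.append(int(tokens[i]))
            i += 1
    return result
```

Here the tokens are those found between the curly brackets, so `{ 3 ... 7 }` and `{ 3 4 5 6 7 }` produce the same list.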

Inputting complex (derived) list data

Using keys= and data=

Lists of complex data are almost always of variable-length.

Hence, they are inputted using curly braces (just as for variable-length simple-data lists).

Inputting complex list data is controlled by:

  • the keys= keyword, which expects a list of data-input keywords associated with an "element" of the complex list
  • the data= keyword, which contains the actual list "element" data in the order specified by the keys= keyword list

Let's look at a simple example now to see how this works.

For inputting a variable-length list of ATOM data i.e. VEC{ATOM}* data, the input is controlled by the ATOM data keywords.

  atoms= {
   
     keys= { label= pos= }
   
     data= {
            O       0.000000    .000000     .000000
            H       1.107        1.436200     .0
            H       1.107      -1.436000     .0
     }
   
  }

Here the keys= list is defined to be { label= pos= }.

label= and pos= are ATOM data-input keywords.

This means that in the data= { ... } block

  • the first column of the ATOM list data must be input for label= i.e. it must be an atom label
  • the second, third and fourth columns of the ATOM list data must be input for pos= i.e. they must be the x, y and z coordinates
  • Of course, you can reverse the label= and pos= keywords in the keys= list, but then column 1 (the labels) would have to appear at the end.
  • Although we have set the data out in columns, there's no real need to do that since white space and new lines are ignored (as always)

Easy!

Embedded commands inside keys=

Suppose we wanted to specify the atom coordinates in angstrom. We could do it like this:

  atoms= {
   
     keys= { label= pos= }
   
     data= {
            O       0.000000    .000000     .0 angstrom
            H       1.107        1.436200     .0 angstrom
            H       1.107      -1.436000     .0 angstrom
     }
   
  }

However, this is very tedious - it involves editing every line in the same way.

To avoid this tedium, we can use the { units= angstrom } embedded command in the keys= string list, like this:

  atoms= {
   
     keys= { label= { units= angstrom } pos= }
   
     data= {
            O       0.000000    .000000     .0 
            H       1.107        1.436200     .0 
            H       1.107      -1.436000     .0 
     }
   
  }


Now Tonto will execute this `embedded command' as if it appeared just before every pos= keyword.

What does that do?

Well, normally, if you use the units= angstrom command, Tonto assumes that the next numerical item (and only that item) is supposed to have those units.

So by embedding this command before every pos=, all the positions are read in angstrom units.

Embedded commands are very useful for

  • Changing units
  • Producing custom-designed tables
    • in collaboration with the put_keys_table command (see below)
  • Specifying a new default for a block of data which itself contains within it other lists of complex data
    • in collaboration with the process_keys_once command (see below)

junk= : getting rid of junk columns of data

Sometimes you have lists of data which you want to input but some columns are junk.

You can input such data into Tonto by using the junk= input keyword which simply reads the data value and discards it.

Consider the following example:

  atoms= {
  
      keys= { junk= label= junk= pos= junk= }
  
       data= {
       1       O1        8       21.209178        4.476293       11.273725           O:DZP
       2       O2        8       18.087301        6.384450       13.391037           O:DZP
       3       O3        8       13.595420        4.561130        5.497190           O:DZP
       4       O4        8       11.061654        2.181851        7.896120           O:DZP
       }
  }

The first, third and seventh (last) columns are read and discarded; label= takes the second column and pos= takes the fourth to sixth columns.
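
The way keys= assigns the tokens in data= to each list element, including junk= values, can be sketched in Python. This is only an illustration: `KEY_WIDTHS` and `read_records` are hypothetical names, and the number of tokens consumed per key is an assumption based on the examples on this page (one token for label= and junk=, three for pos=).

```python
# Assumption: tokens consumed by each key, taken from the examples above.
KEY_WIDTHS = {"label=": 1, "pos=": 3, "junk=": 1}


def read_records(keys: list[str], data: list[str]) -> list[dict]:
    """Split a flat token list into one dict per list element, following keys."""
    per_record = sum(KEY_WIDTHS[k] for k in keys)
    if per_record == 0 or len(data) % per_record != 0:
        raise ValueError("data does not divide evenly into list elements")
    records = []
    for start in range(0, len(data), per_record):
        record, pos = {}, start
        for key in keys:
            width = KEY_WIDTHS[key]
            values = data[pos:pos + width]
            pos += width
            if key == "junk=":
                continue  # the value is read, then discarded
            record[key] = values[0] if width == 1 else [float(v) for v in values]
        records.append(record)
    return records
```

Because the data is treated as a flat stream of tokens, the column layout in the input file is purely cosmetic, which matches the remark that white space and new lines are ignored.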

altered_data= : altering data

Suppose after entering a list of data, you find that you want to alter or add some information to elements of the list.

Then you can use a new set of keys= commands, and apply these to some or all of the previously entered list, using the altered_data= keyword:

  atoms= {
  
      keys= { junk= label= junk= pos= junk= }
  
       data= {
       1       O1        8       21.209178        4.476293       11.273725           O:DZP
       2       O2        8       18.087301        6.384450       13.391037           O:DZP
       3       O3        8       13.595420        4.561130        5.497190           O:DZP
       4       O4        8       11.061654        2.181851        7.896120           O:DZP
      }             
                                                     
      keys= { thermal_tensor= }   
                                       
       altered_data= {                                                 
       1        0.017161        0.038243        0.022129        0.006034       -0.005022       -0.005719
       2        0.022339        0.028606        0.019686        0.003533       -0.001349       -0.004758
       3        0.021556        0.040304        0.018831       -0.005496       -0.003556        0.005948
       4        0.018787        0.040102        0.023193       -0.008297       -0.002893        0.003594
     }
  }

Here, in the altered_data= portion of the input, new thermal_tensor= data is added to the four oxygen atoms.

The elements to be `altered' - to have the additional thermal_tensor information - are listed in the first column.

They are the first four elements: 1 2 3 4.

In summary, altered_data= { ... } is useful for changing the data associated with some elements of a complex list.
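
The effect of altered_data= can be pictured as an indexed update of an existing list. The Python sketch below is an assumption-laden illustration, not Tonto code: `alter_records` is a hypothetical helper, list elements are modelled as dictionaries, and each row is taken to be a 1-based element index followed by a fixed number of real values (six for thermal_tensor= in the example above).

```python
def alter_records(records: list[dict], key: str, width: int, data: list[str]) -> None:
    """Update records in place from rows of: element index, then `width` values."""
    row_len = 1 + width
    if len(data) % row_len != 0:
        raise ValueError("data does not divide evenly into rows")
    for start in range(0, len(data), row_len):
        index = int(data[start])  # 1-based position in the existing list
        if not 1 <= index <= len(records):
            raise IndexError(f"no list element number {index}")
        values = [float(v) for v in data[start + 1:start + row_len]]
        records[index - 1][key] = values
```

Only the elements named in the first column are touched, so a subset of the list can be altered by listing just those indices.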

append_data= : appending data

You can use this command just like data= { ... } to append more elements to the list.

Like altered_data= { ... } above, append_data= { ... } is useful if you change the keys= { .... } used for reading in the appended data.

Warning: if the list does not have uniform data content and availability, bugs may appear.

do : executing commands on the whole list

If you want to run a command on all list elements, you must use the do syntax:

  do { <some-command> ... }

For non-list input, only the command itself is needed:

  <some-command>

The reason for this difference is that lists are set up for facile inputting via keys= and data= keywords.

process_keys : doing something to every list element

Suppose you have entered some complex list data and you want to do something to every element of that list.

For example, you might want to change the Gaussian basis set for every ATOM in an atom list VEC{ATOM}*.

Then, you can use the process_keys command, like this:

  keys= { { basis_label= DZP } }
   
  process_keys

This example will execute the embedded command basis_label= DZP for every atom in the list, thus changing the basis set for every atom.

Of course, there are much better ways to do this e.g. using the basis_name= command!

If you use this command, you have to make sure that there is actually a list of data to be entered.

This command is, therefore, rarely used.

process_keys_once : doing something to a whole list, just once

Suppose you have entered some complex list data and you want to do something to the whole list, just once.

For example, you might want to change the order in which the keys of a list contained within the list are processed.

Example

For example, suppose you want to enter a list of basis sets, which you have downloaded from the internet in some format.

Each basis set in the list of basis sets you downloaded consists of (among other things) a list of orbitals, made from a sum of basis functions.

Each basis function is an orbital of a particular `kind', e.g. 1S or 2P etc, made from a sum of Slater functions of a given angular momentum `quantum number', an exponent zeta, and a contraction coefficient.

You may want to change the order in which the data (orbital kind, angular momentum numbers, exponents and coefficients) are entered into Tonto.

Then, if you are reading in a list of Slater-type atomic functions you can use the process_keys_once command, like this:

  {
 
     keys= { { shells= { keys= { l_chr= orb_kinds= n,z,c*= } } } }
  
     process_keys_once
  
     keys= { label= configuration= shells= { analyse_configuration } }
     
     Li:Thakkar
     1S(2)2S(1) !  2S                                                    
     {
     S                       { 1S             2S }
     {
     1       10.335672      0.0014270      0.0002728
     1        5.553473     -0.0514718     -0.0059858
     1        3.453336     -0.1471640     -0.0400549
     1        2.416445     -0.8105589     -0.0273618
     1        1.555772     -0.0077788     -0.1448297
     1        0.889955      0.0004098     -0.6053390
     2        0.637402     -0.0005029      0.5955827
     1        0.626614     -0.0002691      0.9979831
     }
     }
  
     ...
  }

This example is actually used in the basis_sets/Thakkar file to input Slater-type atomic basis functions.

The embedded command in the first keys= { ... } string list ensures that the order for reading a SLATERSHELL basis function (activated by the shells= { ... } data block) has a particular order:

  • l_chr= ensures that the L-symbol comes first e.g. S, P, D, F, ...
  • orb_kinds= ensures that a string list of orbital basis function labels comes next e.g. { 1S 2S ... }
  • n,z,c*= ensures that columns of the n quantum number, the zeta exponent of the Slater function, and the contraction coefficients of each basis function come next.

The second keys= { ... } string list is more normal.

It ensures that for the following list of Slater basis sets

  • The symmetry label= of Slater orbitals is read in
  • The configuration= is read in e.g. 1S(2)2S(1) for Lithium
  • The list of shells= is read in (according to the order just specified)
  • Finally, the Slater basis set is `analysed' using an embedded command { analyse_configuration }.

put_keys_table : dynamically making your own tables from list data

For some list data, you can dynamically print out your own tables.

The way this works is that

  • You define the keys= { ... } to have commands to output the things you are interested in
  • You issue a put_keys_table command

Here's an example, for printing out custom REFLECTION data tables from a data block involving a list of reflections, VEC{REFLECTION}*:

  reflections= {
      ...
      ! Input data
      ...
      keys= { put_indices put_F_calc put_F_exp flush }
      put_keys_table
  }

This will produce a table with h, k, l indices followed by calculated (complex) structure factors, followed by the experimental structure factor magnitude.

This requires that in the reflection module

  • The appropriate put routines are defined
  • The `width' of these put routines is defined in the table_width routine

It also requires that in the VEC{REFLECTION} module

  • The process_list_keywords routine, and
  • The put_keys_table routine have been defined

These are defined by inheriting from the VEC{OBJECT} virtual module which describes abstract lists of objects.

It's very easy to make this available for your users, if you are a programmer.

Commands

Command keywords

At the end of a previous example we saw the use of an scf command keyword.

  scfdata= {
     initial_density= promolecule
     kind= rhf
     convergence= 0.00001
     diis= { convergence_tolerance= 0.00001 }
  } 
    
  scf

When you issue a command keyword, such as the scf command, all the data required to execute the command should have been entered.

For example, for the scf command, you should have at least specified scfdata= { ... } to tell Tonto exactly what type of scf you want done, what type of initial guess you want, and so on.

For the scf command, you should also have entered atom coordinates: you can't do an scf calculation without a molecule!

An scf calculation also requires a basis set (at least, in Tonto).

To find a basis set, you might have to specify a basis_directory for Tonto to look in.

Of course, you need to read the documentation for a command keyword to see exactly what data is required.

However, if you forget to enter some required data for a command, Tonto will usually (hopefully always) tell you nicely what's missing.

If not, you will see a crash!

What keywords are available?

Here is some very useful information for learning about new keywords without referring to the wiki.

Finding keywords with a question mark (?)

If you place a question mark at any point in the input file (except in a list-style data input section) you will get a list of allowed keywords that may be used at that point.

For example, suppose you forget what keywords you are allowed to use in the spacegroup section.

Try putting in a question mark like this

  {
     name= urea
  
     crystal= {
           spacegroup= { ? }
     }
  
  }

Executing the tonto executable will give you the following:

  Error in routine SPACEGROUP:process_keyword ... unknown option ?
  
  File name   = stdin
  Line number =    5
  File buffer =       spacegroup= { ? }
  Cursor ---------------------------^
  
  Allowed keyword options:
  
     }
     analyse
     hall_symbol=
     hermann_mauguin_symbol=
     hm_symbol=
     it_symbol=
     jones_faithful_symbols=
     put

Nice, huh?

The allowed options are arranged alphabetically.

This really helps when you've many options, but doesn't matter much in the above example.

Of course, you have to have some idea of what these options actually are and how they might be used.

You need to check the documentation or read the referenced literature!
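
The question-mark trick works because an unknown keyword makes the data block report everything it does understand. A minimal Python sketch of that behaviour is given below; the dispatch table and function name are hypothetical, and the message only loosely imitates the output shown above (for instance, plain sorting here will not necessarily reproduce Tonto's ordering of the } entry).

```python
def process_keyword(keyword: str, handlers: dict) -> None:
    """Run the handler for a keyword, or fail with the list of allowed keywords."""
    word = keyword.lower()  # keywords are case-insensitive
    handler = handlers.get(word)
    if handler is None:
        options = "\n".join(f"   {name}" for name in sorted(handlers))
        raise ValueError(
            f"unknown option {keyword}\n\nAllowed keyword options:\n\n{options}"
        )
    handler()
```

Since ? is never a valid keyword, placing it in a block always takes the error branch and prints the allowed options for that block.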

Finding keywords in the code

Another way to find the allowed keywords is to look in the source code.

Edit the module for which you want to know the keywords, and look for the process_keyword routine.

For example, if you wanted to know about the allowed keywords in the SPACEGROUP data block, look in the foofiles/spacegroup.foo module.

When you find the process_keyword routine in there, you will see the allowed keywords arranged there in a Fortran case statement:

  process_keyword(keyword) ::: leaky
  ! Process a command "keyword". Data is inputted from "stdin", unless
  ! "word" is a sequence of blank separated strings. In this case,
  ! the sequence is processed as if it were a separate file.
     keyword :: STR
     word :: STR
     word = keyword
     word.to_lower_case
     select case (word)
        case ("}                      ")  ! exit case
        case ("analyse                "); .analyse
        case ("hall_symbol=           "); .read_Hall_symbol
        case ("hermann_mauguin_symbol="); .read_HM_symbol
        case ("hm_symbol=             "); .read_HM_symbol
        case ("it_symbol=             "); .read_IT_symbol
        case ("jones_faithful_symbols="); .read_jones_faithful_symbols
        case ("put                    "); .put 
        case  default ;        UNKNOWN(word) 
     end
  end

It's recommended to use vim to jump to the appropriate routines used to read the data in process_keyword.

For example, to see what hm_symbol= is expecting to read in, jump to the .read_HM_symbol routine and read its documentation:

  read_HM_symbol ::: leaky
  ! Read the Hermann-Mauguin symbol
     symbol :: STR
     stdin.read(symbol)
     .set_HM_symbol(symbol)
  end

Using vim, you can then jump into the .set_HM_symbol routine to see what you should enter for the Hermann-Mauguin symbol, in case you don't already know.