PDB_EXTRACT is used to extract statistical information from the output files produced by the software used to determine protein structures by X-ray crystallography and NMR. PDB_EXTRACT merges the information into two mmCIF (macromolecular Crystallographic Information File) files, one with structure factors and one with coordinates and statistics. These two files are ready for PDB deposition. PDB_EXTRACT can be used with or without ADIT to deposit complete data. When working with ADIT (recommended), users upload the extracted mmCIF files to ADIT, enter any additional information there, and submit the files directly from ADIT. When working without ADIT, users enter the additional information in a plain text file (called data_template.text), run the program, and email the two extracted files to the Protein Data Bank (PDB) at "deposit@rcsb.rutgers.edu". The advantages of using PDB_EXTRACT:
IMPORTANT NOTES:
The source and binary versions of PDB_EXTRACT can be downloaded from
http://deposit.pdb.org/software . The source is available under an
Open Source license. The binary distributions are available for Intel-Linux,
SGI-IRIX, DEC-Alpha, and Sun-Solaris.
The web interface can be accessed at http://pdb-extract.rutgers.edu . PDB_EXTRACT has been integrated into CCP4 and the CCP4i interface (version 5.0 and above), so users can run PDB_EXTRACT under the CCP4 environment.
NOTE: if you have installed CCP4 (version 5.0 and above), you do not have to
install pdb_extract again; simply run the pdb_extract program from the CCP4 environment.
Requirements:
It is recommended to install the binary distribution, since it is quick
to install and takes little disk space. The binary distributions are
available for Intel-Linux, SGI-IRIX, DEC-Alpha, and Sun-Solaris.
Step 1. Uncompress and unbundle the distribution using the following command:
zcat pdb-extract-vX.XXX-XXX.tar.gz | tar -xf -
The result of this command is a subdirectory pdb-extract-vX.XXX-XXX in the current directory, which contains the following:
Step 2. Set up the environment variables.
A. Define the RCSBROOT environment variable to point to the installation directory. Assuming that the installation directory is /home/username/pdb-extract-vX.XXX-XXX, execute in the shell:
For C shell users:
setenv RCSBROOT /home/username/pdb-extract-vX.XXX-XXX
For Bourne shell users:
RCSBROOT=/home/username/pdb-extract-vX.XXX-XXX; export RCSBROOT
B. Add the "bin" subdirectory to the PATH environment variable. Assuming that the installation directory is /home/username/pdb-extract-vX.XXX-XXX, execute:
For C shell users:
setenv PATH "$RCSBROOT/bin:$PATH"
For Bourne shell users:
PATH="$RCSBROOT/bin:$PATH"; export PATH
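For Bourne-compatible shells, the two settings in Step 2 can be wrapped in a small function (for example in ~/.profile) that also avoids adding the bin directory to PATH twice. This is only a sketch: the function name pdb_extract_env is hypothetical, and its argument is the same placeholder installation path used above. C shell users need the setenv equivalents instead.

```shell
# Hypothetical helper for Bourne-compatible shells (sh, bash, ksh):
# sets RCSBROOT and prepends $RCSBROOT/bin to PATH only once.
# Usage: pdb_extract_env /home/username/pdb-extract-vX.XXX-XXX
pdb_extract_env() {
    if [ -z "$1" ]; then
        echo "usage: pdb_extract_env INSTALL_DIR" >&2
        return 1
    fi
    # Step 2A: point RCSBROOT at the installation directory.
    RCSBROOT=$1
    export RCSBROOT
    # Step 2B: add the bin subdirectory unless it is already on PATH.
    case ":$PATH:" in
        *":$RCSBROOT/bin:"*) ;;
        *) PATH="$RCSBROOT/bin:$PATH" ;;
    esac
    export PATH
}
```

Calling the function a second time leaves PATH unchanged, so it is safe to source from a login file that may be read more than once.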
Step 3. Make binary data from ASCII data. Change to the pdb-extract-vX.XXX-XXX/etc directory and run the script binary.sh:
cd pdb-extract-vX.XXX-XXX/etc
./binary.sh
This command creates binary data files from the ASCII data files in the data/ascii directory and stores them in the data/binary directory. This step may take several minutes to complete, and it must be executed before the tool can be used.
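Because the tools depend on the generated binary data, a quick check that data/binary has been populated can save a confusing failure later. The sketch below assumes only the directory layout described above (data/binary under the installation directory); check_binary_data is a hypothetical name, and treating an empty directory as "step not run" is a heuristic, not a check of specific files.

```shell
# Hypothetical helper: succeed only if <root>/data/binary exists and is
# non-empty, i.e. the binary-data step appears to have been run.
# Usage: check_binary_data [INSTALL_DIR]   (defaults to $RCSBROOT)
check_binary_data() {
    root=${1:-$RCSBROOT}
    if [ -z "$root" ]; then
        echo "give INSTALL_DIR or set RCSBROOT" >&2
        return 1
    fi
    dir="$root/data/binary"
    if [ ! -d "$dir" ]; then
        echo "missing $dir: run the binary-data step first" >&2
        return 1
    fi
    # 'ls -A' prints nothing for an empty directory.
    if [ -z "$(ls -A "$dir")" ]; then
        echo "$dir is empty: run the binary-data step first" >&2
        return 1
    fi
    return 0
}
```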
Step 1. Uncompress and unbundle the distribution using the following command:
zcat pdb-extract-vX.XXX-XXX.tar.gz | tar -xf -
The result of this command is a subdirectory pdb-extract-vX.XXX-XXX in the current directory. It contains subdirectories for the various source modules and the following items of importance to the user:
Step 2. Set up the environment variables.
A. Define the RCSBROOT environment variable to point to the installation directory. Assuming that the installation directory is /home/username/pdb-extract-vX.XXX-XXX, execute in the shell:
For C shell users:
setenv RCSBROOT /home/username/pdb-extract-vX.XXX-XXX
For Bourne shell users:
RCSBROOT=/home/username/pdb-extract-vX.XXX-XXX; export RCSBROOT
B. Add the "bin" subdirectory to the PATH environment variable. Assuming that the installation directory is /home/username/pdb-extract-vX.XXX-XXX, execute:
For C shell users:
setenv PATH "$RCSBROOT/bin:$PATH"
For Bourne shell users:
PATH="$RCSBROOT/bin:$PATH"; export PATH
Step 3. Build the application (compile the program). Change to the pdb-extract-vX.XXX-XXX directory and run the "make" command:
cd pdb-extract-vX.XXX-XXX
make
The application executables will be placed in the "bin" subdirectory.
NOTE: Users working on the Sun platform are advised to check the compiler flags in the etc/make.platform.sunos5 file. Depending on the compiler version, these flags may need to be modified.
Step 4. Make binary data from ASCII data. In the pdb-extract-vX.XXX-XXX directory, run "make" as follows:
make binary
NOTE: This command creates binary data files from the ASCII data files in the data/ascii directory and stores them in the data/binary directory. This step may take several minutes to complete, and it must be executed before the tool can be used.
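The build steps above can be scripted for repeated installations. The sketch assumes a Bourne-compatible shell; run, build_pdb_extract and the DRY_RUN switch are hypothetical helpers, not part of the distribution. With DRY_RUN=1 it only prints the commands, which is a cheap way to review them before building.

```shell
# Hypothetical helpers. With DRY_RUN=1, commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: build_pdb_extract SOURCE_DIR
build_pdb_extract() {
    if [ -z "$1" ]; then
        echo "usage: build_pdb_extract SOURCE_DIR" >&2
        return 1
    fi
    # The subshell keeps the caller's working directory unchanged.
    (
        run cd "$1" || exit 1
        run make || exit 1          # Step 3: executables go to bin/
        run make binary || exit 1   # Step 4: data/ascii -> data/binary
    )
}
```

For example, build_pdb_extract /home/username/pdb-extract-vX.XXX-XXX runs make and then make binary in that directory and stops at the first failing step.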
There is an example included in this distribution. Users can find more
examples at the following location:
This example is located in the subdirectory "pdb-extract-vX.X/examples/Example_1". The directory contains the following:
To execute the example, change to the example directory and invoke the test.sh and test_script.sh scripts:
cd pdb-extract-vX.XXX-XXX/pdb-extract-vX.X/examples/Example_1
A. Run the script test.sh. All the Unix commands are included in the script file test.sh.
./test.sh
B. Run the script test_script.sh. This script is an alternative way to obtain the same result as above, and it also combines several programs. The difference is that it uses the component extract instead of pdb_extract and pdb_extract_sf. All the information is included in the file log_script.inp.
./test_script.sh
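The two example scripts can also be run in sequence, stopping at the first failure, which is convenient when checking a fresh installation. This is a sketch: run_examples is a hypothetical name, and it assumes only that test.sh and test_script.sh are executable and sit in the example directory as described above.

```shell
# Hypothetical helper: run test.sh, then test_script.sh, in EXAMPLE_DIR.
# Returns non-zero as soon as a script is missing or fails.
# Usage: run_examples pdb-extract-vX.XXX-XXX/pdb-extract-vX.X/examples/Example_1
run_examples() {
    dir=${1:-.}
    (
        cd "$dir" || exit 1
        for script in test.sh test_script.sh; do
            if [ ! -x "$script" ]; then
                echo "missing or not executable: $dir/$script" >&2
                exit 1
            fi
            echo "running $script"
            ./"$script" || { echo "$script failed" >&2; exit 1; }
        done
    )
}
```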
There are four ways to extract crystallographic information and deposit complete data to the Protein Data Bank.
The four interfaces have different features. For example, the CCP4i and Web interfaces provide a simple graphical interface: users only select the program names and output file names to do the job. The full Unix command line method provides the greatest flexibility, but users need to read the command options to run the program. The script input method provides a simple local interface. Here, we give a concrete example to show how to use PDB_EXTRACT for complete data extraction. In this example, the experimental method for solving the protein structure was multiple anomalous diffraction (MAD). The information for the experiment is as follows:
Step 1. From the main window of CCP4i, select the Data Harvesting Management Tool option.
Step 2. From the Run program option, select Extract additional information for deposition.
Step 3. Select Generate a data template file from various steps. Type (or select using browse) in the yellow boxes either the PDB or mmCIF file name obtained from the final structure refinement and the output file name. In this case, the output coordinate file is refmac.pdb. Run the pdb_extract program to obtain the data template file, and edit this file according to the instructions it contains.
Step 4. Select Generate a complete mmCIF file for PDB deposition from various steps. Select the program names and the log file names generated by the selected programs.
Run the PDB_EXTRACT program to obtain the complete data in mmCIF format. The final output file can be uploaded to ADIT for online structure validation and submission.
NOTE: Each file name must start at the beginning of its yellow box. There should be no white space in any box, even if no file name is typed in.
Follow the online tutorial.
STEP 1. Obtain the template data file data_template.text using the command
extract -pdb coordinate_PDB_file_name    (if PDB format)
extract -cif coordinate_CIF_file_name    (if mmCIF format)
After running the program, you will get a file called data_template.text. There are 19 CATEGORIES for data entries in the file. CATEGORIES 1-2 contain the extracted unit cell parameters and the unique molecular chemical sequence group; please modify these two CATEGORIES as necessary. Additional structure information can be filled into CATEGORIES 3-19. The content of the data template file data_template.text is given in the Appendix.
STEP 2. Obtain coordinates and all the statistics. Run the pdb_extract program:
pdb_extract -e MAD -p SOLVE -iLOG solve.prt -d RESOLVE -iLOG resolve.log -r refmac5 -icif peak.refmac -ipdb refmac.pdb -s HKL -iLOG scale-refine.log -sp HKL scale1.log scale2.log scale3.log -iENT data_template.text -o output.cif
STEP 3. Obtain structure factors. Run pdb_extract_sf to convert the HKL format to mmCIF format and merge all the files into one file:
pdb_extract_sf -rt F -rp refmac5 -idat refmac_sf.mmcif -dt I -dp HKL -c 1 -w 1 -idat scale1.sca -c 1 -w 2 -idat scale2.sca -c 1 -w 3 -idat scale3.sca -o output_sf.cif
Since REFMAC5 uses an MTZ file for refinement, the reflection data file must be converted to CIF format using ccp4i or mtz2various. Here the file name is refmac_sf.mmcif. The output file (output_sf.cif) contains one reflection data block for refinement and one data block for protein phasing.
STEP 4. Validation and deposition. Note: the PDB validation program is not provided in the CCP4 package. Either upload the two files (output.cif, output_sf.cif) to ADIT or use the following commands:
validation-v8 -f output.cif -o 2 -public -exchange
maxit-v8.01-O -i output.cif -o 8 -exchange_in -exchange_out -keep_contact_author
Then upload the validated file and the structure factor file to ADIT for online submission. Alternatively, email the files to deposit@rcsb.rutgers.edu .
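STEPs 2 and 3 can be collected into one script so that the long option lists are written only once and the per-wavelength -c/-w/-idat groups are generated in the required order. The sketch reuses the file names and options of the MAD example exactly as given above; mad_example, run and DRY_RUN are hypothetical, it assumes one crystal measured at three wavelengths, and it should be run only after data_template.text has been generated and edited (STEP 1). Validation (STEP 4) is left out.

```shell
#!/bin/sh
# Hypothetical helpers. With DRY_RUN=1, commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

mad_example() {
    # STEP 2: coordinates and statistics from all programs.
    run pdb_extract -e MAD -p SOLVE -iLOG solve.prt -d RESOLVE -iLOG resolve.log \
        -r refmac5 -icif peak.refmac -ipdb refmac.pdb \
        -s HKL -iLOG scale-refine.log -sp HKL scale1.log scale2.log scale3.log \
        -iENT data_template.text -o output.cif || return 1

    # STEP 3: structure factors. Refinement data first, then one
    # "-c crystal -w wavelength -idat file" group per MAD data set.
    set -- -rt F -rp refmac5 -idat refmac_sf.mmcif -dt I -dp HKL
    for w in 1 2 3; do
        set -- "$@" -c 1 -w "$w" -idat "scale$w.sca"
    done
    run pdb_extract_sf "$@" -o output_sf.cif
}
```

Running DRY_RUN=1 mad_example first prints the two command lines, which should match the ones shown in STEPs 2 and 3.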
STEP 1. Obtain the plain text file log_script.inp:
extract -pdb refmac.pdb
You will get one script file called log_script.inp and one data template file data_template.text. Edit the data template file according to the instructions in the file. Enter all the log file names and program names, as well as the data_template.text file name, in the script file log_script.inp. The content of the file log_script.inp is shown in the Appendix.
STEP 2. Run the program:
extract -ext log_script.inp
You will get the same results as with the Unix command line option.
STEP 3. Validation and deposition (same as in the Unix command line option).
Listed below are the programs used from data collection to structure determination.
This section is used to collect statistical information from the LOG files generated by the programs for Data Scaling/Merging/Averaging. Important: The log files must be generated from the LAST (or BEST) trial which corresponds to the files used for phasing or molecular replacement.
The extracted information may be the following:
* Intensities (or amplitudes) and standard deviations
* Data completeness (overall, resolution shells)
* Redundancy (overall, resolution shells), mosaicity
* R-merge, R-sym (overall, resolution shells)
* Average(I/sigma) (overall, resolution shells)
* Total and unique reflections collected
* Resolution range
Some helpful hints for getting LOG files from the programs for data scaling/merging/averaging:
Using HKL/HKL2000/Scalepack
HKL (or HKL2000 or Scalepack) is a package by Otwinowski for data
collection/reduction/scaling.
You can use the graphical interface or the scalepack script to scale your
data. The LOG file (e.g. scale1.log) contains statistics for
PDB deposition.
Using D*trek
D*trek is a package by Jim Pflugrath at Rigaku/MSC for data collection/reduction/scaling.
You can use the graphical interface to scale (or merge/average) your
data. The LOG file containing the statistics (e.g. scale1.log) is produced by the scaling step.
Using SAINT
SAINT is a package by Bruker (Siemens Molecular Analytical Research Tool)
for data collection/reduction/scaling. The LOG file containing the statistics
(e.g. scale1.ls) is produced by the scaling step.
Using SCALA
SCALA is a CCP4-supported program that scales together multiple
observations of reflections. SCALA generates an
mmCIF or LOG file containing useful statistics. When you run the program,
you must ask it to export the data harvest file (mmCIF type); the
mmCIF file will be named name.scala or name.truncate. Otherwise, only a LOG file is generated.
This section is used to collect key statistical information from molecular replacement (MR). You may first generate a LOG file from the rotation function and then a LOG file from the translation function; you can upload both LOG files into this section for data extraction. You can also upload a single LOG file generated by the MR program. Important: The log files must be generated from the LAST (or BEST) trial which corresponds to the files used for density modification or refinement.
The extracted information may be the following:
* Low and high resolution used in rotation and translation
* Rotation and translation methods
* Reflection cutoff criteria, reflection completeness
* Correlation coefficients for I or F between observed and calculated
* R-factor, packing information, and model details
Using CNS/CNX/XPLOR
CNS can be used to do molecular replacement. After you finish the translation search, you get a log file called translation.list, which contains all the molecular replacement information.
Using Amore (CCP4)
Amore is a program for molecular replacement distributed in the CCP4 package. After the rotation and translation searches, you will have two log files, rotation.log and translation.log, and you may extract information from both. If you run the program in one script, you may generate a single LOG file; upload this LOG file to the web interface.
Using Molrep (CCP4)
Molrep is a program for molecular replacement distributed in the CCP4 package. When you run the script, you can specify a LOG file name (e.g. molrep.log); all the statistical information will be recorded in this log file.
Using EPMR
EPMR is a Unix command line program for molecular replacement. When you run the program, redirect its output to a log file as follows:
epmr [options] files > epmr.log
All the statistical information will be written to the log file.
Using Phaser
Phaser was developed by Randy Read's group at the University of Cambridge. It is a program for phasing macromolecular crystal structures with maximum likelihood methods. The program generates a LOG file which can be uploaded to the web interface for data extraction.
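For any of these programs, PDB_EXTRACT can only use the statistics if the program output is saved to a file. A generic way to do this while still watching the output on screen is sketched below; run_logged is a hypothetical helper, and the EPMR line in the usage comment keeps the [options] and files placeholders from the text. Unlike a plain pipe into tee, the helper returns the status of the program itself rather than that of tee.

```shell
# Hypothetical helper: run COMMAND, copy its stdout and stderr to LOGFILE
# (and to the terminal), and return COMMAND's own exit status.
# Usage: run_logged epmr.log epmr [options] files
run_logged() {
    if [ $# -lt 2 ]; then
        echo "usage: run_logged LOGFILE COMMAND [ARGS...]" >&2
        return 2
    fi
    log=$1
    shift
    rc_file=$(mktemp) || return 1
    # Save the command's status before tee runs, since a pipeline
    # reports the status of its last command (tee).
    { "$@" 2>&1; echo $? > "$rc_file"; } | tee "$log"
    rc=$(cat "$rc_file")
    rm -f "$rc_file"
    return "$rc"
}
```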
Heavy atom phasing is performed at an earlier stage of structure determination. The log files generated during phasing contain important statistical information which should be deposited to the Protein Data Bank. From heavy atom phasing, you may have LOG files and a heavy atom coordinate file.
The phasing methods are the following:
* MR molecular replacement
* SAD single anomalous dispersion
* MAD multiple anomalous dispersion
* SIR single isomorphous replacement
* SIRAS single isomorphous replacement with anomalous scattering
* MIR multiple isomorphous replacement
* MIRAS multiple isomorphous replacement with anomalous scattering
Important: The log files must be generated from the LAST (or BEST) trial which corresponds to the files used for density modification or refinement.
The following items may be extracted:
* Wavelength, f_prime, f_double_prime, resolution range
* FOM (acentric, centric, overall, resolution shells)
* R-Cullis (acentric, centric, overall, resolution shells)
* R-Kraut (acentric, centric, overall, resolution shells)
* Phasing power (acentric, centric, overall, resolution shells)
* Number of heavy atom sites, heavy atom type
* Heavy atom location method
* Heavy atom B-factor, occupancies, and xyz coordinates
Using SOLVE (version 2.00 and above):
SOLVE is a program for finding heavy atom locations and refining heavy atom parameters. The statistical information is written to a file solve.prt (the default name used by the program), and the heavy atom coordinates are written to a file ha.pdb. Note: You may upload the two files solve.prt (file type: LOG) and ha.pdb (file type: PDB).
Using CNS/CNX/XPLOR
CNS is a complete software system for protein crystallography. The scripts for heavy atom location and phasing refinement are mad_phase.inp or ir_phase.inp. When you run these scripts, you will get output files like phase_final.summary, phase_final.sdb or mad_phase.fp.
The output file phase_final.summary has all the phasing statistics. (Note: The refined heavy atom coordinates, B factors and occupancies can be found in a file like phase_final.sdb. If you prefer to convert to the PDB format, you can run the script sdb_to_pdb.inp. You will get a file phase_final.pdb with PDB format.) Note: You may input at most three files (as shown above) for extracting phase information.
Using MLPHARE (CCP4)
MLPHARE is a program in the CCP4 suite used for refining heavy atom parameters. Whether you use the CCP4i graphical interface or the script mode, you need to ask the program to write a harvesting file: select the data harvest button when using the CCP4i interface, and do not use the keyword NOHARV when using a script. After running this program, you will get a file in mmCIF format (e.g. name.mlphare) which contains all the information for heavy atom phasing refinement. To extract the wavelength information, you need to run the program REVISE in CCP4 (version 4.0-4.2.2); you may get a file such as prephadata.log. Note: You may input at most two files (as shown above) for extracting phase information.
Using SHARP (version 1.3.x and 2.0 and above):
SHARP is a program for finding heavy atom positions and refining heavy atom parameters. When you run SHARP or autoSHARP, the log files which have useful information are normally in the directory sharpfiles/logfiles_local/dirs, where dirs are all the subdirectories for your various structures. Please note that the location of generated log files may depend on how the program is installed!
SHARP produces many output files.
For version 1.3.x:
Heavy.pdb contains the heavy atom coordinates.
FOMstats.html contains figure of merit statistics.
Otherstat.html contains Rcullis, Rkraut, and phasing power.
For version 2.0 and above:
Heavy.pdb contains the heavy atom coordinates.
FOMstats.html contains figure of merit statistics.
RCullis_?.html contains Rcullis.
PhasingPower_?.html contains phasing power.
The easiest way to obtain these files is to run the program from the SUSHI interface, review all the log files in a web browser, and save them as plain text files. Note: You may input at most four files (as shown above) for extracting phase information.
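Since the location of the SHARP log files depends on the installation, it can help to search for the files named above before opening and saving them as plain text. The sketch assumes only the default sharpfiles/logfiles_local directory and the file names listed for versions 1.3.x and 2.0; find_sharp_files is a hypothetical name, and ? is treated as a single-character wildcard as in the names above.

```shell
# Hypothetical helper: list SHARP output files usable by PDB_EXTRACT,
# searching below TOP (default: sharpfiles/logfiles_local).
# Usage: find_sharp_files [TOP]
find_sharp_files() {
    top=${1:-sharpfiles/logfiles_local}
    if [ ! -d "$top" ]; then
        echo "no such directory: $top" >&2
        return 1
    fi
    # File names from the version 1.3.x and 2.0 lists; sorted with the
    # C locale so the output order does not depend on the user's locale.
    find "$top" -type f \( -name Heavy.pdb -o -name FOMstats.html \
        -o -name Otherstat.html -o -name 'RCullis_?.html' \
        -o -name 'PhasingPower_?.html' \) | LC_ALL=C sort
}
```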
Using SnB (version 2.0 and above):
SnB has no heavy atom parameter refinement, and it has no corresponding statistics. SnB gives the heavy atom or substructure coordinates (e.g. heavy.pdb) in PDB format. Note: You may input only one file (as shown above) for phasing extraction.
Using BnP (version 0.93 and above):
BnP is a combination of the programs SnB and Phases. The heavy atom positions are located by SnB and the heavy atom parameters are refined by Phases. The log file (e.g. auto.log) can be found in the directory ~/PHASES/*. The log file normally contains the phasing power for each phasing set. The file is in LOG format. Note: You may input at most one file (as shown above) for extracting phase information.
Using SHELXD or SHELXS (version 97):
Heavy atom or substructure coordinates are produced in PDB format (e.g. heavy.pdb). Note: You may input at most one file (as shown above) for extracting phase information.
Density modification is normally performed after obtaining phases. If you do density modification in your structure determination, its statistical information is needed for PDB deposition. If density modification is not done as a separate step, you may skip this step, since you will not have a log file specifically for density modification. Important: The log files must be generated from the LAST (or BEST) trial which corresponds to the file used for refinement.
The following items may be extracted:
* Density modification method
* FOM after density modification (overall, resolution shells)
* Solvent mask determination method
* Structure solution software
Using RESOLVE (version 2.00 and above):
RESOLVE is a density modification program in the SOLVE/RESOLVE package. It normally runs together with SOLVE, but it can also be run separately. When you run RESOLVE, you will get a log file like resolve.log. Only this one log file (resolve.log) is needed for extraction. File type is LOG.
Using CNS/CNX/XPLOR
The CNS user may need to run the input script like density_modify.inp. You will get a log file called density_modify.list. Only one log file (density_modify.list) is needed for extraction. File type is LOG.
Using DM (CCP4)
DM is a density modification program in the CCP4 suite. When you run DM, either with the CCP4i graphical interface or with a script, you will get a log file like dm.log. Only one log file (dm.log) is needed for extraction. File type is LOG.
Using SOLOMON (CCP4)
SOLOMON is another density modification program in the CCP4 suite. When you run SOLOMON, either with the CCP4i graphical interface or with a script, you will get a log file like Solomon.log. Only one log file (Solomon.log) is needed for extraction. File type is LOG.
Structure refinement is performed at the end of structure determination. The atom coordinates are generated in PDB or mmCIF format, and the statistics are written to log files. The pdb_extract program is used to extract the statistical information. Since statistics can be carried in the header section of the PDB file, you may not need to provide any LOG files for some programs such as CNS and REFMAC5. Important: The log file and the coordinate file must be generated from the LAST (or BEST) trial which corresponds to the file that is used for deposition to the PDB.
The following items may be extracted:
* Resolution range (highest resolution shell)
* Number of reflections used in refinement and in the R-free set
* R-factor (overall, resolution shells)
* Number of atoms refined
* Cell parameters and space group
* The xyz coordinates of all the atoms
* RMS bond distances, bond angles, chiral volumes, torsion angles
* Isotropic temperature factor restraints
* Non-crystallographic symmetry restraints
* Solvent model used
* Overall average isotropic B factor
* Overall anisotropic B factor
* Overall isotropic B factor
* Topology/parameter data used to refine the deposited model
* Refinement software
Using REFMAC5 (CCP4):
REFMAC5 is a program for structure refinement used in the CCP4 suite. If you run this program using CCP4i or the script, you can get a PDB file with all the refinement information at the header section. You may directly deposit this PDB file.
Using CNS/CNX/XPLOR
CNS/CNX/XPLOR is a program for final structure refinement. It exports coordinate files in both PDB and mmCIF format; you need the script deposit_mmcif.inp to generate the mmCIF format. The mmCIF file carries more statistical information than the PDB file. Authors are encouraged to deposit the mmCIF file; otherwise they may need to fill in more information manually. You may not need to provide any LOG file generated by CNS/CNX/XPLOR.
Using SHELXL (version 97):
SHELXL is a subprogram of the SHELX package used for structure refinement. After you finish structure refinement, you need to run the interactive program shelxpro and use option B. This produces a PDB file (e.g. name.pdb) with header information.
Using TNT (version 5f):
TNT is a crystal structure refinement program. Data from this program can be extracted from the output PDB file and some LOG files. Use the to_pdb command to convert coordinates in TNT format (name.cor) to PDB format (name.pdb):
to_pdb name.cor
After finishing refinement, you must use the command rfactor to generate a log file (e.g. rfactor.log) which contains the refinement statistics:
rfactor name.cor > rfactor.log
To extract the symmetry information, you must provide the symmetry file (e.g. p6122.dat). This information is in the control file name.tnt.
Using ARP/wARP:
ARP/wARP is an automatic program for model building and refinement; REFMAC5 is used for the structure refinement step. The new version (6.0 or above) can use CCP4i as its graphical interface. You can run this program either from CCP4i or with a script. You will get a log file (for example warpNtrace_refine.log) and a PDB file like warpNtrace.pdb. Note: Use this option only if the coordinate file warpNtrace.pdb is used directly for deposition. Otherwise, select the program that was used for the final refinement.
There are three executable components of the program (pdb_extract, pdb_extract_sf, extract). Their arguments are described in detail below.
PROGRAM DESCRIPTION:
PDB_EXTRACT is used to extract statistical information from the output files produced by the software used to determine protein structures by X-ray crystallography and NMR. PDB_EXTRACT merges the information into two mmCIF (macromolecular Crystallographic Information File) files, one with structure factors and one with coordinates and statistics. These two files are ready for PDB deposition. Type 'pdb_extract -h' or 'pdb_extract -help' to get information on how to do extractions and deposit to the PDB.
EXECUTABLE NAME: pdb_extract
SYNOPSIS: pdb_extract [OPTIONs]... [FILEs]...
OPTIONS: -o -e -m -p -d -r -s -sp -ipdb -ilog -icif -ient -idat
OPTION DESCRIPTION:
-o Followed by an output file name. For example: -o outfile.mmcif
NOTE: if you do not give this option, the default output file name (pdb_extract.mmcif) will be used.
-e Followed by one of the following experimental (phasing) methods:
* MR molecular replacement
* SAD single anomalous dispersion
* MAD multiple anomalous dispersion
* SIR single isomorphous replacement
* SIRAS single isomorphous replacement with anomalous scattering
* MIR multiple isomorphous replacement
* MIRAS multiple isomorphous replacement with anomalous scattering
For example: -e MAD
Note: If your structure was solved by a combination of the above methods (e.g. MR with MAD), you may extract information for both methods (e.g. -e MR -m program_mr -ilog Log_file -e MAD -p program_mad -ilog file_name).
-m Followed by one of the following programs for molecular replacement: CNS/CNX/XPLOR, Amore, EPMR, MOLREP, Phaser. For example: -m amore
-p Followed by one of the following program names for phasing: CNS/CNX/XPLOR, MLPHARE, SOLVE, SHARP, SHELXS, SHELXD, SnB, BnP, PHASES. For example: -p CNS
Note: if the program that you used for phasing is not in the above list, you may still give the program name.
Some information (like heavy atom coordinates) may still be extracted if the produced file is in PDB or mmCIF format.
-d Followed by one of the following program names for density modification: CNS/CNX/XPLOR, DM, SOLOMON, RESOLVE, SHELXE. For example: -d CNS
-r Followed by one of the following program names for final structure refinement: CNS/CNX/XPLOR, REFMAC5, RESTRAIN, SHELXL, TNT, PHENIX, WARP. For example: -r CNS
Note: if the program that you used for final structure refinement is not in the above list, you may still give the program name (use -r program_name). Some information (like atom coordinates) may still be extracted if the produced file is in PDB or CIF format.
-s Followed by one of the following programs for data scaling (for refinement): HKL/HKL2000/SCALEPACK, SCALA, D*trek, SAINT, XSCALE, 3DSCALE. For example: -s HKL
Note: The -s option is used to get statistics from data reduction. The reflection data must be the data used for the final structure refinement.
-sp Followed by one of the following programs for data scaling (for phasing): HKL/HKL2000/SCALEPACK, SCALA, D*trek, SAINT, XSCALE, 3DSCALE. For example: -sp HKL
Note: This option is similar to -s, but it is used to extract statistics from multiple data reductions. The reflection data sets must be the ones used for the protein phasing solution (SAD, MAD, SIR, MIR, SIRAS, MIRAS). Normally, there are multiple data sets.
-iPDB Followed by an input file in PDB format. For example: -iPDB test1.pdb
Note: The PDB files are usually generated from heavy atom phasing (heavy atom coordinates) or the final structure refinement.
-iCIF Followed by an input file in CIF format. For example: -iCIF deposit_cns.cif
Note: This file can be produced during crystal structure determination. For instance, if you use MLPHARE to locate heavy atom positions and refine heavy atom phasing, a file in mmCIF format will be generated. This file will contain statistics for heavy atom phasing.
As another instance, if you use CNS for final structure refinement, running the deposit.inp macro will produce a CIF file containing the model coordinates and refinement statistics.
-iLOG Followed by one or more input LOG files. For example: -iLOG mad_sdb.dat mad_summary.dat
Note: Log files are usually generated during crystal structure determination. Their format depends on the program used. They may contain phasing statistics or heavy atom coordinates. For instance, when CNS is used for heavy atom phasing, it generates a file (e.g. mad_sdb.dat) containing the heavy atom coordinates and a file (e.g. mad_summary.dat) containing the phase refinement statistics.
-iENT Followed by either an mmCIF file or the data_template.text file. For example: -iENT data_template.text
Note: The file data_template.text must be generated by the program extract using the command 'extract -pdb coordinate_file'. It contains the full chemical sequence and related information to be filled in for each macromolecule in the solved structure. The file is shown in the Appendix.
-idat Followed by the reflection data used for refinement. For example: -idat reflection_data_file
Note: This option is very special. It can be used ONLY with HKL/Scalepack output files. HKL/SCALEPACK does not export the average I/sigma(I) (overall and in resolution shells), but these items are required for PDB deposition; pdb_extract can calculate them for you when you provide the data used for refinement. The -s and -idat options must be used together (for example: -s program_name_scaling -iLOG log_file -idat reflection_data_file).
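The rule in the -idat note can be checked before pdb_extract is run. The sketch below is a hypothetical preflight helper (check_idat_usage is not part of PDB_EXTRACT): it reports an error when -idat appears without -s, and only warns when an HKL/SCALEPACK scaling program is given without -idat, because pdb_extract then has no reflection data from which to calculate the average I/sigma(I). It does not check anything else about the argument list.

```shell
# Hypothetical preflight check for a pdb_extract argument list.
# Usage: check_idat_usage ARGS...   (the same arguments you would pass to pdb_extract)
check_idat_usage() {
    has_s=0; has_idat=0; s_prog=
    while [ $# -gt 0 ]; do
        case $1 in
            -s) has_s=1; s_prog=$2 ;;   # remember the scaling program name
            -idat) has_idat=1 ;;
        esac
        shift
    done
    if [ "$has_idat" = 1 ] && [ "$has_s" = 0 ]; then
        echo "error: -idat given without -s" >&2
        return 1
    fi
    case $s_prog in
        HKL|HKL2000|SCALEPACK|hkl|scalepack)
            if [ "$has_idat" = 0 ]; then
                echo "warning: -s $s_prog without -idat; average I/sigma(I) cannot be calculated" >&2
            fi ;;
    esac
    return 0
}
```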
You can extract statistics separately from each step of structure determination (data processing, heavy atom phasing, density modification, molecular replacement and final structure refinement), or you can put all the steps together in one run for a complete deposition.
PROGRAM DESCRIPTION: This program can be used to capture
EXECUTABLE NAME: pdb_extract_sf

SYNOPSIS: pdb_extract_sf [OPTIONs]... [FILEs]...

ARGUMENT DESCRIPTION: (-o -rt -rp -dt -dp -c -w -idat)

-o    Followed by an output file name.
      For example: -o outfile.cif
      NOTE: if you do not specify an output file, the default output file name (pdb_extract_sf.mmcif) will be used.

-dt   Data type for initial data processing (normally intensity), followed by F (amplitude) or I (intensity).
      For example: -dt I

-dp   Data format for initial data processing, followed by one of the following program names:
      HKL/SCALEPACK, DTREK, SAINT, XPREP, XSCALE, 3DSCALE, SCALA, OTHER
      For example: -dp HKL

-c    Crystal index, followed by the crystal number (an integer: 1, 2, 3, ...).
      Example: -c 2 (the reflections were from the second crystal)

-w    Wavelength index, followed by the wavelength number (an integer: 1, 2, 3, ...).
      Example: -w 2 (the data were collected from the crystal at the second wavelength, as in a MAD experiment)

-idat Reflection data file, followed by the data file name.
      Example: -idat scalepack.sca
      NOTE: always give the combination '-c i -w j -idat file_name' in this order! Here i is the crystal index, j is the wavelength index, and file_name is the name of the file containing the reflections.

-rt   Data type used for final structure refinement, followed by F (amplitude) or I (intensity).
      For example: -rt F

-rp   Data format used in the final structure refinement, followed by one of the data format names:
      CNS/CNX/XPLOR, SHELX, TNT, HKL/SCALEPACK, DTREK, SAINT, XPREP, XSCALE, 3DSCALE, SCALA,
Extracting reflection data used for final structure refinement:
NOTE: Normally, there is only one data set. If you used several data sets for the final refinement, you need to merge all the data into one file.
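For the refinement reflections alone, a minimal sketch of the command line might look like the helper below. It assumes the reflection file is passed with -idat after -rt and -rp, which this section does not spell out, so treat the call as an assumption to check against your pdb_extract_sf version. The helper is hypothetical: it checks the data type and format against the lists in the pdb_extract_sf synopsis and accepts exactly one data file, matching the note that several refinement data sets must first be merged into one.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the pdb_extract_sf command for the single
# reflection file used in the final refinement.
# ASSUMPTION: the file is given with -idat after -rt/-rp.
sf_refine_cmd() {
    if [ $# -ne 4 ]; then
        echo "usage: sf_refine_cmd F|I format data_file output_file" >&2
        echo "(merge several refinement data sets into one file first)" >&2
        return 1
    fi
    rtype=$1 rfmt=$2 rfile=$3 outfile=$4
    case $rtype in
        F|I) ;;
        *) echo "error: data type must be F or I" >&2; return 1 ;;
    esac
    # Formats named in the -rp description; that list is cut off in this
    # manual, so other valid formats may be missing here.
    case $rfmt in
        CNS|CNX|XPLOR|SHELX|TNT|HKL|SCALEPACK|DTREK|SAINT|XPREP|XSCALE|3DSCALE|SCALA) ;;
        *) echo "error: unrecognised -rp format '$rfmt'" >&2; return 1 ;;
    esac
    echo "pdb_extract_sf -rt $rtype -rp $rfmt -idat $rfile -o $outfile"
}
```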
Extracting reflection data from the initial data processing (e.g. scaling):
NOTE: Normally, there are several data sets (e.g. in MAD, MIR ...). These reflections are used for protein phasing. The formats are those written by the initial data processing program.
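The phasing data sets are passed as repeated '-c i -w j -idat file' triplets. The hypothetical helper below takes one 'crystal,wavelength,file' argument per data set and prints the triplets in that order. It assumes SCALEPACK intensities (-dt I -dp HKL), as in the example script, and its comma-separated argument syntax is an illustration only, not a pdb_extract_sf feature.

```shell
#!/bin/sh
# Hypothetical dry-run helper: each argument after the output name is
# "crystal,wavelength,file"; emits "-c i -w j -idat file" in that order.
# Assumes SCALEPACK intensities (-dt I -dp HKL), as in the example script.
sf_phasing_cmd() {
    outfile=$1; shift
    cmd="pdb_extract_sf -dt I -dp HKL"
    for spec in "$@"; do
        case $spec in
            *,*,*) ;;
            *) echo "error: expected crystal,wavelength,file: '$spec'" >&2; return 1 ;;
        esac
        c=${spec%%,*}        # crystal index
        rest=${spec#*,}
        w=${rest%%,*}        # wavelength index
        f=${rest#*,}         # reflection file for this data set
        for n in "$c" "$w"; do
            case $n in
                ''|*[!0-9]*) echo "error: index must be a number: '$spec'" >&2; return 1 ;;
            esac
        done
        cmd="$cmd -c $c -w $w -idat $f"
    done
    echo "$cmd -o $outfile"
}
```

For a MIR case with a native crystal and one derivative crystal, `sf_phasing_cmd mir_sf.cif 1,1,native.sca 2,1,deriv1.sca` gives each data set its own crystal number, as the crystal-index note above describes.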
Converting all the reflection data into one mmCIF file (just combine the above two steps):
The output_file_name contains the reflections for refinement and the reflections for protein phasing.
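Combining both steps in one call might look like the sketch below. It is a hypothetical dry-run helper: placing the refinement set before the phasing sets is an assumption, and all wavelengths are assumed to come from crystal 1 as SCALEPACK intensities, as in the MAD example script.

```shell
#!/bin/sh
# Hypothetical dry-run helper: one pdb_extract_sf call that puts the
# refinement reflections and the MAD phasing reflections in one mmCIF file.
# ASSUMPTIONS: refinement set ("-rt -rp -idat") comes first; every wavelength
# is from crystal 1 and is a SCALEPACK intensity file.
sf_all_cmd() {
    if [ $# -lt 4 ]; then
        echo "usage: sf_all_cmd out F|I format refine_file [wavelength_files...]" >&2
        return 1
    fi
    outfile=$1      # combined output file
    refine_type=$2  # F or I
    refine_fmt=$3   # e.g. CNS
    refine_file=$4  # reflection file used for final refinement
    shift 4         # what remains: one scaled file per wavelength, in order
    cmd="pdb_extract_sf -rt $refine_type -rp $refine_fmt -idat $refine_file"
    cmd="$cmd -dt I -dp HKL"
    w=1
    for f in "$@"; do
        cmd="$cmd -c 1 -w $w -idat $f"
        w=$((w + 1))
    done
    echo "$cmd -o $outfile"
}
```

Called with the three wavelength files of the example script, it repeats that script's phasing arguments and prepends the refinement set.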
PROGRAM DESCRIPTION:
This program can be used to do the following:
1. Generate the plain text file (data_template.text) used with pdb_extract, and the plain text file (log_script.inp) used by extract itself.
2. Run all the commands in the script input file (log_script.inp).

EXECUTABLE NAME: extract

SYNOPSIS: extract [OPTIONs] [FILE]

ARGUMENT OPTIONS: (-pdb -cif -ext)

-pdb  Followed by the coordinate PDB file name.
      Example: -pdb pdb_file_name
      NOTE: generates two plain text files (data_template.text and log_script.inp) with the chemical sequences extracted from the coordinate PDB file.

-cif  Followed by the coordinate mmCIF file name.
      Example: -cif mmCIF_file_name
      NOTE: generates two plain text files (data_template.text and log_script.inp) with the chemical sequences extracted from the coordinate mmCIF file.

-ext  Followed by the generated file log_script.inp.
      Example: -ext log_script.inp
      NOTE: fill in the log file names and any additional information in log_script.inp, then run 'extract -ext log_script.inp' to get a complete mmCIF file and a structure factor file.
Obtain the input files (data_template.text and log_script.inp):
    extract -pdb pdb_file_name
or
    extract -cif cif_file_name
Get a complete mmCIF file for deposition:
    extract -ext log_script.inp
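Filling in log_script.inp (or data_template.text) can also be scripted. The function below is a hypothetical helper, not part of extract: it uses sed to replace the quoted value of one `<key = "value">` field, assuming each field sits on a single line. Keys that occur more than once, such as crystal_number in PART 2 of log_script.inp, are changed on every line, so those are better edited by hand.

```shell
#!/bin/sh
# Hypothetical helper (not part of PDB_EXTRACT): set the quoted value of one
# <key = "value"> field in log_script.inp or data_template.text.
# Assumes each field sits on a single line; every line holding the key is changed.
set_field() {
    key=$1 value=$2 file=$3
    # Keys in the templates contain only letters, digits and underscores.
    case $key in
        ''|*[!A-Za-z0-9_]*) echo "error: bad key '$key'" >&2; return 1 ;;
    esac
    # The template rules forbid double quotes inside values; also reject
    # characters that would break the sed replacement below.
    case $value in
        *'"'*|*'|'*|*'&'*|*'\'*)
            echo "error: value contains a forbidden character" >&2; return 1 ;;
    esac
    if ! grep -q "< *$key *= *\"" "$file"; then
        echo "error: no field '$key' in $file" >&2
        return 1
    fi
    # Replace whatever is between the quotes, keeping the key and spacing.
    sed "s|\(< *$key *= *\"\)[^\"]*\"|\1$value\"|" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}
```

For example, `set_field reflection_data_file_name input_data/refine.cv log_script.inp` turns `<reflection_data_file_name = " " >` into `<reflection_data_file_name = "input_data/refine.cv" >`.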
Below are two tables: one lists all the Unix command options, and the other lists the software supported by PDB_EXTRACT.
The script file test.sh:
#!/bin/sh
# use pdb_extract to extract the required statistics and get an mmCIF file.
pdb_extract -e MAD -r CNS -iCIF input_data/deposit_cns.mmcif \
    -iENT input_data/data_template.text \
    -p CNS -iLOG input_data/mad_sdb.dat input_data/mad_summary.dat \
    input_data/mad_fp.dat \
    -d CNS -iLOG input_data/density_modify.dat -o Example_1.cif

# use pdb_extract_sf to convert the structure factors to mmCIF format.
pdb_extract_sf -dt I -dp HKL -c 1 -w 1 -idat input_data/w1.sca \
    -c 1 -w 2 -idat input_data/w2.sca -c 1 -w 3 -idat input_data/w3.sca -o Example_1.sf.cif

# use validation-v8 to validate the mmCIF file.
validation-v8 -f Example_1.cif -o 2 -public -exchange -adit

# use maxit-v8.01-O to reorder the mmCIF file.
maxit-v8.01-O -i Example_1.cif -o 8 -exchange_in -exchange_out

# move the files to the output directories and delete some log files.
# (mkdir -p makes sure "deposit" is a directory; otherwise mv would rename
# the first file to "deposit" and the next mv would overwrite it.)
mkdir -p deposit validation_result
mv Example_1.cif.cif Example_1_deposit.cif
mv Example_1.cif deposit/
mv Example_1_deposit.cif deposit/
mv Example_1.sf.cif deposit/
mv Example_1* validation_result/
rm -f *log *err procheck* SEQUENCE.DAT *ERR validation.alignment

The alternative script file test_script.sh:

#!/bin/sh
# use extract to run everything in log_script.inp and get an mmCIF file.
extract -ext input_data/log_script.inp

# use validation-v8 to validate the mmCIF file.
validation-v8 -f script_example_1.cif -o 2 -public -exchange

# use maxit-v8.01-O to reorder the mmCIF file.
maxit-v8.01-O -i script_example_1.cif -o 8 -exchange_in -exchange_out

# move the files to the output directories and delete some log files.
mkdir -p deposit validation_result
mv script_example_1.cif.cif script_example_1_deposit.cif
mv script_example_1.cif deposit/
mv script_example_1_deposit.cif deposit/
mv script_example_1_sf.cif deposit/
mv *.html *.ps *.letter *.wrl validation_result/
rm -f *log *err procheck* SEQUENCE.DAT *ERR validation.alignment

The output files:
After you run the above commands (for example ./test.sh), you will get the following files in the directory pdb-extract-vX.X/examples/Example_1/deposit/
You can deposit the two files Example_1.sf.cif and either Example_1.cif or Example_1_deposit.cif. The files generated by the validation program are in the directory pdb-extract-vX.X/examples/Example_1/validation_result
Users should read this file (Example_1.letter) carefully and correct any geometrical errors before submitting the structure.

The input files:

MAD experiment
* Phasing calculation by program CNS (version 1.1).
* Density modification by program CNS (version 1.1).
* Final structure refinement by program CNS (version 1.1).

Data files:
* pdb-extract-vX.X/examples/Example_1/input_data/mad_sdb.dat
    - File format: CNS log format.
    - File source: run CNS (mad_phase.inp)
    - Data to be extracted: heavy atom coordinates, B factors, etc.
* pdb-extract-vX.X/examples/Example_1/input_data/mad_summary.dat
    - File format: CNS log format.
    - File source: run CNS (mad_phase.inp)
    - Data to be extracted: all the phasing statistics.
* pdb-extract-vX.X/examples/Example_1/input_data/mad_fp.dat
    - File format: CNS log format.
    - File source: run CNS (mad_phase.inp)
    - Data to be extracted: wavelengths, f_prime, f_double_prime.
* pdb-extract-vX.X/examples/Example_1/input_data/density_modify.dat
    - File format: CNS log format.
    - File source: run CNS (fourier_map_dm.inp)
    - Data to be extracted: FOM after density modification, density modification method.
* pdb-extract-vX.X/examples/Example_1/input_data/deposit_cns.mmcif
    - File format: mmCIF
    - File source: run CNS (deposit_mmcif.inp)
    - Data to be extracted: the atom coordinates, B factors and structure refinement statistics.
* pdb-extract-vX.X/examples/Example_1/input_data/data_template.text
    - File format: plain text (the data template format)
    - File source: generated by 'extract -pdb pdb_file_name'.
    - Data to be extracted: the complete chemical sequence.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
                      THE DATA_TEMPLATE.TEXT FILE
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
NOTES AND REMINDER
The data template file contains data entries for unique chemical sequences
present in the structure and other non-electronically captured information.

PLEASE CHECK CATEGORIES 1 & 2: Before proceeding any further, make the
necessary corrections here so that all information in these categories is
complete and correct. You may choose to fill in CATEGORIES 3-19 either here
or later in ADIT.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
GUIDELINES FOR USING THIS FILE
1. Only strings enclosed between the 'less than' and 'greater than' signs
   (<.....>) will be parsed by the program. Therefore, DO NOT write anything
   to the left of the 'less than' sign or to the right of the 'greater than'
   sign.
2. All alphanumeric values or strings that you include in the different
   categories should be within double quotes. Blank spaces or carriage
   returns within a pair of double quotes are ignored by the program.
   DO NOT use double quotes (") within the strings that you enter.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

~~~~~~~~~~~~~~~~~~~~~~~~~~~~START INPUT DATA BELOW~~~~~~~~~~~~~~~~~~~~~~~

================CATEGORY 1: Crystallographic Data=======================
Enter crystallographic data

<space_group = "?">    (use International Table conventions)
<space_group_number = "?">
<unit_cell_a = " 73.530 " >
<unit_cell_b = " 39.060 " >
<unit_cell_c = " 23.150 " >
<unit_cell_alpha = " 90.00 " >
<unit_cell_beta = " 90.00 " >
<unit_cell_gamma = " 90.00 " >

================CATEGORY 2: Sequence Information =======================
Enter the one-letter sequence for each polymeric entity in the asymmetric unit
--------------------------------------------------------------------------
SOME DEFINITIONS
An ENTITY is defined as any unique molecule present in the asymmetric unit.
Each unique biological polymer (protein or nucleic acid) in the structure is
considered an entity. Thus, if there are five copies of a single protein in
the asymmetric unit, there is still only one molecular entity. Water and
non-polymers like ions, ligands and sugars are also entities. Here we only
consider the sequences of polymeric entities (protein or nucleic acid).

GUIDELINES FOR COMPLETING THIS CATEGORY
* In a PDB or mmCIF format file, all residues of a single polymeric entity
  should have one chain ID. Multiple copies of the same entity should each
  be assigned a unique chain ID. The multiple chain IDs should be separated
  by commas as 'A,B,C,...'. If incorrect chain IDs are used, the entity
  groups extracted by this program will not be correct. To avoid this, make
  the necessary corrections in the PDB or mmCIF file used to generate the
  data_template file and regenerate the data_template.text file.
  Alternatively, edit the extracted sequence in this file to correctly
  represent the sequence and chain IDs of each polymeric entity.
* In addition to chain IDs, this program uses distance geometry to assess
  whether there are any breaks in the polymer sequence. These breaks may
  occur due to missing residues (not included in the model due to missing
  electron density) or due to poor geometry. Four question marks '????' are
  used to denote these chain breaks. Replace these question marks with the
  sequence of residues missing from the coordinates. Also add any residues
  missing from the N- and/or C-termini here.
* If there are non-standard residues in the coordinates, this program lists
  them according to the three-letter code used in the coordinate file, as
  (ABC). If all the residues in your sequence are non-standard, check and
  edit the sequence manually to represent it correctly in this file.
* If any residue was modeled as Ala or Gly due to lack of side-chain
  density, the sequence extracted here will represent it as A or G
  respectively. Correct this to the original sequence that was present in
  the crystal.
----------------------------------------------------------------------------
Below is the one-letter chemical sequence extracted from your PDB coordinate
file. The molecular entities are grouped and listed together. PLEASE CHECK
THE SEQUENCE of each entity carefully and modify it as necessary.
Make sure that you REVIEW THE FOLLOWING:
  * chain breaks due to missing residues,
  * missing residues in the N- and/or C-termini,
  * non-standard residues and
  * cases of residues modeled as Ala or Gly due to missing side-chain density.

<molecule_entity_id="1" >
<molecule_entity_type="polypeptide(L)" >
<molecule_one_letter_sequence=" QPRRKLCILHRNPGRCYDKIPAFYYNQKKKQCERFDWSGCGGNSNRFKTIEECRRTCIG" >
< molecule_chain_id="A" >
< target_DB_id=" " >

<molecule_entity_id=" " >
<molecule_entity_type=" " >
<molecule_one_letter_sequence=" " >
<target_DB_id=" " >
<molecule_chain_id=" " >

================CATEGORY 3: Contact Authors=============================
Enter information about the contact authors.
Information about the Principal investigator (PI) should be given.

For principal investigator
<contact_author_PI_name = " ">     (Surname, F.M.)
<contact_author_PI_email = " ">
<contact_author_PI_phone = " ">
<contact_author_PI_fax = " ">
<contact_author_PI_address = " ">

For other contact authors
<contact_author_name_1 = " ">
<contact_author_email_1 = " ">
<contact_author_phone_1 = " ">
<contact_author_fax_1 = " ">
<contact_author_address_1 = " ">

<contact_author_name_2 = " ">
<contact_author_email_2 = " ">
<contact_author_phone_2 = " ">
<contact_author_fax_2 = " ">
<contact_author_address_2 = " ">
...(add more if needed)...

================CATEGORY 4: Release Status==============================
Enter release status for the coordinates, structure factors and sequence.
Status should be chosen from one of the following:
(release now, hold for publication, hold for 6 weeks, hold for 6 months,
hold for 1 year)

<Release_status_for_coordinates = " ">
<Release_status_for_structure_factor = " ">
<Release_status_for_sequence = " ">

================CATEGORY 5: Title=======================================
Enter the title for the structure
<structure_title = " ">

================CATEGORY 6: Citation Authors============================
Enter citation authors (e.g. Surname, F.M.)
The primary citation is the article in which the deposited coordinates
were first reported. Other related citations may also be provided.

For the primary citation
<primary_citation_author_name_1 = " ">
<primary_citation_author_name_2 = " ">
<primary_citation_author_name_3 = " ">
<primary_citation_author_name_4 = " ">
<primary_citation_author_name_5 = " ">
...add more if needed...

For other related citations (if applicable)
<citation_1_author_name_1 = " ">
<citation_1_author_name_2 = " ">
<citation_1_author_name_3 = " ">
<citation_1_author_name_4 = " ">
<citation_1_author_name_5 = " ">
...add more if needed...

<citation_2_author_name_1 = " ">
<citation_2_author_name_2 = " ">
<citation_2_author_name_3 = " ">
<citation_2_author_name_4 = " ">
<citation_2_author_name_5 = " ">
...add more if needed...
...(add more citations if needed)...
================CATEGORY 7: Citation Article============================
Enter citation article (journal, title, year, volume, page)
If the citation has not yet been published, use 'To be published' for the
category 'journal_abbrev'.
The order of citations in this category should correspond to that in
CATEGORY 6.

For primary citation
<primary_citation_journal_abbrev = " ">
<primary_citation_title = " ">
<primary_citation_year = " ">
<primary_citation_journal_volume = " ">
<primary_citation_page_first = " ">
<primary_citation_page_last = " ">

For other related citation (if applicable)
<citation_1_journal_abbrev = " ">
<citation_1_title = " ">
<citation_1_year = " ">
<citation_1_journal_volume = " ">
<citation_1_page_first = " ">
<citation_1_page_last = " ">

<citation_2_journal_abbrev = " ">
<citation_2_title = " ">
<citation_2_year = " ">
<citation_2_journal_volume = " ">
<citation_2_page_first = " ">
<citation_2_page_last = " ">
...(add more citations if needed)...

================CATEGORY 8: Molecule Names==============================
Enter the name of the molecule for each entity
The name of the molecule should be obtained from the appropriate sequence
database reference, if available. Otherwise the gene name or other common
name of the entity may be used.
e.g.  HIV-1 integrase for protein
      RNA Hammerhead Ribozyme for RNA
The number of entities should be the same as in CATEGORY 2.

<molecule_name_1 = " ">   (entity 1)
<molecule_name_2 = " ">   (entity 2)
<molecule_name_3 = " ">   (entity 3)
...(add more if needed)...

================CATEGORY 9: Molecule Details============================
Enter additional information about each entity
Additional information would include details such as fragment name
(if applicable), mutation, and E.C. number.

For entity 1
<Molecular_entity_id_1 = " ">        (e.g. 1, 2, ...)
<Fragment_name_1 = " ">              (e.g. ligand binding domain, hairpin)
<Specific_mutation_1 = " ">          (e.g. C280S)
<Enzyme_Comission_number_1 = " ">    (if known: e.g. 2.7.7.7)

For entity 2
<Molecular_entity_id_2 = " ">
<Fragment_name_2 = " ">
<Specific_mutation_2 = " ">
<Enzyme_Comission_number_2 = " ">

For entity 3
<Molecular_entity_id_3 = " ">
<Fragment_name_3 = " ">
<Specific_mutation_3 = " ">
<Enzyme_Comission_number_3 = " ">
...(add more if needed)...

================CATEGORY 10: Genetically Manipulated Source=============
Enter data in the genetically manipulated source category
If the biomolecule has been genetically manipulated, describe its source
and expression system here.

For entity 1
<Manipulated_entity_id_1 = " ">               (e.g. 1, 2, ...)
<Source_organism_scientific_name_1 = " ">     (e.g. Homo sapiens)
<Source_organism_gene_1 = " ">                (e.g. RPOD, ALKA...)
<Expression_system_scientific_name_1 = " ">   (e.g. Escherichia coli)
<Expression_system_strain_1 = " ">            (e.g. BL21(DE3))
<Expression_system_vector_type_1 = " ">       (e.g. plasmid)
<Expression_system_plasmid_name_1 = " ">      (e.g. pET26)
<Manipulated_source_details_1 = " ">          (any other relevant information)

For entity 2
<Manipulated_entity_id_2 = " ">
<Source_organism_scientific_name_2 = " ">
<Source_organism_gene_2 = " ">
<Expression_system_scientific_name_2 = " ">
<Expression_system_strain_2 = " ">
<Expression_system_vector_type_2 = " ">
<Expression_system_plasmid_name_2 = " ">
<Manipulated_source_details_2 = " ">

For entity 3
<Manipulated_entity_id_3 = " ">
<Source_organism_scientific_name_3 = " ">
<Source_organism_gene_3 = " ">
<Expression_system_scientific_name_3 = " ">
<Expression_system_strain_3 = " ">
<Expression_system_vector_type_3 = " ">
<Expression_system_plasmid_name_3 = " ">
<Manipulated_source_details_3 = " ">
...(add more if needed)...

================CATEGORY 11: Natural Source=============================
Enter data in the natural source category
If the biomolecule was derived from a natural source, describe it here.

For entity 1
<natural_source_entity_id_1 = " ">         (e.g. 1, 2, ...)
<natural_source_scientific_name_1 = " ">   (e.g. Homo sapiens)
<natural_source_details_1 = " ">           (any other relevant information e.g. organ, tissue, cell ..)

For entity 2
<natural_source_entity_id_2 = " ">
<natural_source_scientific_name_2 = " ">
<natural_source_details_2 = " ">

For entity 3
<natural_source_entity_id_3 = " ">
<natural_source_scientific_name_3 = " ">
<natural_source_details_3 = " ">
...(add more if needed)...

================CATEGORY 12: Keywords===================================
Enter a list of keywords that describe important features of the deposited
structure. For example, beta barrel, protein-DNA complex, double helix,
hydrolase, structural genomics etc.
<structure_keywords = " ">

================CATEGORY 13: Biological Assembly========================
Enter data in the biological assembly category
Biological assembly describes the functional unit(s) present in the
structure. The asymmetric unit may contain part of a biological assembly,
one biological assembly, or more than one.

Case 1
* If the asymmetric unit is the same as the biological assembly, nothing
  special needs to be noted here.
Case 2
* If the asymmetric unit does not contain a complete biological unit,
  please provide the symmetry operations, including translations, required
  to build the biological unit.
  (example: The biological assembly is a hexamer generated from the dimer
  in the asymmetric unit by the operations: -y, x-y-1, z-1 and
  -x+y, -x-1, z-1.)
Case 3
* If the asymmetric unit has multiple biological units, please specify how
  to group the contents of the asymmetric unit into biological units.
  (example: The biological unit is a dimer. There are 2 biological units in
  the asymmetric unit (chains A & B and chains C & D).)

For biological unit 1
<biological_assembly_1 = " ">
For biological unit 2
<biological_assembly_2 = " ">
....(add more if needed)....
================CATEGORY 14: Crystals===================================
Enter the number of crystals used for diffraction
<number_of_crystals = " ">

================CATEGORY 15: Methods and Conditions=====================
Enter the crystallization conditions for each crystal

For crystal 1:
<crystal_number_1 = " ">                (e.g. 1, 2, ...)
<crystallization_method_1 = " ">        (e.g. vapor diffusion, hanging drop)
<crystallization_pH_1 = " ">            (e.g. 7.5 ...)
<crystallization_temperature_1 = " ">   (e.g. 100) (in Kelvin)
<crystallization_components_1 = " ">    (e.g. PEG 4000, NaCl etc.)

For crystal 2:
<crystal_number_2 = " ">
<crystallization_method_2 = " ">
<crystallization_pH_2 = " ">
<crystallization_temperature_2 = " ">
<crystallization_components_2 = " ">
...(add more if needed)...

================CATEGORY 16: Crystal Property===========================
Enter solvent content, Matthews coefficient
These values were calculated based on the sequence as shown in CATEGORY 2.
If there are missing residues, you need to add the missing residues and
re-run the program to get accurate values.
(The command to re-run is 'extract -sol data_template.text')

For crystal 1:
<crystals_number_1 = " 1 ">                 (e.g. 1, 2, ...)
<crystals_solvent_content_1 = "86.8 ">
<crystals_matthews_coefficient_1 = "9.4 ">
<crystals_mosaicity_1 = " ">                (e.g. 0.5 ...)
...(add more if needed)...

================CATEGORY 17: Radiation Source===========================
Enter the details of the source of radiation, the X-ray generator, and the
wavelength for each diffraction experiment.

For experiment 1:
<radiation_experiment_1 = " ">       (e.g. 1, 2, ...)
<radiation_source_type_1 = " ">      (e.g. rotating-anode, synchrotron ...)
<radiation_source_name_1= " ">       (e.g. Rigaku RU200, CHESS Beamline A1 ...)
<radiation_wavelengths_1= " ">       (e.g. 1.502 ...)
<radiation_protocol_1= " ">          (e.g. MAD, SINGLE WAVELENGTH ...)
<radiation_detector_type_1 = " ">    (e.g. CCD, IMAGE PLATE ...)
<radiation_detector_name_1= " ">     (e.g. SIEMENS-NICOLET, RIGAKU RAXIS ...)

For experiment 2:
<radiation_experiment_2 = " ">
<radiation_source_type_2 = " ">
<radiation_source_name_2 = " ">
<radiation_wavelengths_2 = " ">
<radiation_protocol_2= " ">
<radiation_detector_type_2 = " ">
<radiation_detector_name_2= " ">
....(add more if needed)....

================CATEGORY 18: Collection Temperature=====================
Enter the temperature for data collection (in Kelvin)
<collection_temperature_crystal_1 = " ">   (for crystal 1:)
<collection_temperature_crystal_2 = " ">   (for crystal 2:)
....(add more if needed)....

================CATEGORY 19: Structural Genomics========================
If this is a structural genomics project, give the following information
<SG_project_id = " 1">
<SG_project_name = " ">
<full_name_of_SG_center = " ">
<initial_of_SG_center = " ">
=====================================END==================================
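Before running pdb_extract or 'extract -ext', a quick check that CATEGORIES 1 & 2 were actually edited can catch placeholders that were left in by mistake. The sketch below is a hypothetical helper, not part of PDB_EXTRACT. It only looks for a space group still set to "?" and for '????' chain-break placeholders left inside molecule_one_letter_sequence values, which may span several lines; it does not validate anything else in the file.

```shell
#!/bin/sh
# Hypothetical pre-flight check for a filled-in data_template.text (not part
# of PDB_EXTRACT). Returns non-zero if CATEGORY 1 still has a "?" space group
# or a sequence in CATEGORY 2 still contains '????' chain-break placeholders.
check_template() {
    file=$1
    status=0
    if grep -Eq '< *space_group(_number)? *= *" *\? *"' "$file"; then
        echo "$file: space group is still '?'" >&2
        status=1
    fi
    # Collect each molecule_one_letter_sequence value up to its closing quote
    # (values may span lines), then look for '????' inside it. The guideline
    # text that mentions '????' is not inside such a field, so it is ignored.
    if ! awk '
        /< *molecule_one_letter_sequence *=/ { inseq = 1; buf = "" }
        inseq { buf = buf $0 }
        inseq && buf ~ /"[^"]*"/ { if (index(buf, "????")) bad = 1; inseq = 0 }
        END { exit bad }
    ' "$file"; then
        echo "$file: '????' chain breaks still present in a sequence" >&2
        status=1
    fi
    return $status
}
```

A typical call is `check_template data_template.text && extract -ext log_script.inp`, so extract only runs once the check passes.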
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
                        THE LOG_SCRIPT.INP FILE
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
NOTES AND REMINDER
This script file is used to enter the names of the crystallographic
software used for structure determination and the log, PDB, mmCIF or text
files generated by them.

PLEASE COMPLETE the ENTRY FIELDS according to the type of your experiment
and use the command 'extract -ext log_script.inp' to obtain the completed
structure data ready for validation and deposition.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
GUIDELINES FOR USING THIS FILE
1. Only strings enclosed between the 'less than' and 'greater than' signs
   (<.....>) will be parsed by the program. Therefore, DO NOT write anything
   to the left of the 'less than' sign or to the right of the 'greater than'
   sign.
2. All alphanumeric values or strings that you include in the different
   categories should be within double quotes. Blank spaces or carriage
   returns within a pair of double quotes are ignored by the program.
   DO NOT use double quotes (") within the strings that you enter.
3. Log files used for generating the deposition should come from the best
   (usually the last) trial with each crystallographic program.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

~~~~~~~~~~~~~~~~~~~~~~~~~~~~START INPUT DATA BELOW~~~~~~~~~~~~~~~~~~~~~~~

===============PART 1: Structure Factor for Final Refinement==============
Enter the reflection data file used for final structure refinement
NOTE:
* Usually the highest resolution or best data set is used for the
  refinement. Use that structure factor file here.
* In some cases, it may not be possible to collect a complete data set
  from a single crystal. Thus, multiple data sets have to be scaled and
  merged together for refinement. Use the merged reflection file here.
* If the reflection data format is not one of those listed below, please
  use OTHER for the data format, and provide an ASCII file that has at
  least five values [H, K, L, I (or F), sigmaI (or sigmaF)] for each
  reflection, with the items separated by one or more spaces. Include the
  test flags as the sixth column in the file (if available).
* If the reflection file is in mtz format (e.g. using REFMAC5), convert it
  to mmCIF format using the mtz2various application provided by CCP4.

Reflection data format: CNS|SHELX|TNT|REFMAC5|HKL|SCALEPACK|DTREK|SAINT|SCALA|3DSCALE
<reflection_data_type = "F" >     [enter I (intensity) or F (amplitude)]
<reflection_data_format = "CNS" >
<reflection_data_file_name = " " >

==============PART 2: Structure Factors for Protein Phasing================
Enter the reflection data files used for heavy atom or MAD phasing
NOTE:
* Fill in this part if you have more than one complete reflection file
  (e.g. in the case of MAD, SIRAS, MIR). The LOG files generated by the
  data scaling software for all these data sets are also needed.
* If the scaling program is not one of those listed below
  (HKL|SCALEPACK|DTREK|SAINT|3DSCALE), enter OTHER for the program name and
  provide an ASCII file with five values [H, K, L, I (or F), sigmaI (or
  sigmaF)] for each reflection, with the items separated by a space.
* If the same crystal was used for collecting multiple data sets, the
  crystal number will remain '1' as the wavelength numbers change. However,
  if multiple crystals were used for the data collections, the
  corresponding crystal numbers should be used for each data set.
* IT IS IMPORTANT THAT THE LOG FILE AND DATA FILE COME FROM THE SAME PROGRAM.
<scale_data_type = "I" >       [enter I (intensity) or F (amplitude)]
<scale_program_name = "HKL" >

For data set 1:
<crystal_number = "1" >
<diffract_number = "1" >
<scale_data_file_name = " " >
<scale_log_file_name = " " >

For data set 2:
<crystal_number = "1" >
<diffract_number = "2" >
<scale_data_file_name = " " >
<scale_log_file_name = " " >

For data set 3:
<crystal_number = "1" >
<diffract_number = "3" >
<scale_data_file_name = " " >
<scale_log_file_name = " " >

==================PART 3: Statistics for Indexing=====================
Enter the log file and software name for data indexing
NOTE:
* This is only for the data used in the final structure refinement.
Software for indexing is one of the following: (HKL|DENZO|DTREK|MOSFLM)
<data_indexing_software = "HKL" >
<data_indexing_LOG_file_name = " " >
<data_indexing_CIF_file_name = " " >   (if mmCIF format)

==================PART 4: Statistics for Data Scaling=====================
Enter the log file and software name for data scaling
NOTE:
* The log file included here should have the scaling statistics of the
  file used for the final structure refinement. If multiple data sets were
  scaled and merged for refinement (as described in Part 1 above), use the
  log file generated during merging of the data sets.
Software for scaling is one of the following:
(HKL|SCALEPACK|DTREK|SAINT|3DSCALE|SCALA)
<data_scaling_software = "HKL" >
<data_scaling_LOG_file_name = " " >
<data_scaling_CIF_file_name = " " >   (if mmCIF format)

==============PART 5: Statistics for Molecular Replacement================
Enter log files and software name for molecular replacement
NOTE: Software is one of the following: (CNS|AMORE|MOLREP|EPMR|PHASER)
The log file should be from the best trial of MR.
<mr_software = " " >
<mr_log_file_LOG_1 = " " >
<mr_log_file_LOG_2 = " " >

=================PART 6: Statistics for Protein Phasing===================
Enter log files and software name for heavy atom phasing
NOTE: The phasing method should be one of (SAD|MAD|SIR|SIRAS|MIR|MIRAS).
Software is one of the following:
(CNS|MLPHARE|SOLVE|SHELXS|SHELXD|SNB|BNP|SHARP|PHASES)
The log file should be from the best trial of phasing.
<phasing_method = "MAD" >
<phasing_software = "SOLVE" >
<phasing_log_file_LOG_1 = " " >
<phasing_log_file_PDB_1 = " " >   (if PDB format (heavy atom coordinates))
<phasing_log_file_CIF_1 = " " >   (if mmCIF format)
<phasing_log_file_LOG_2 = " " >
<phasing_log_file_PDB_2 = " " >
<phasing_log_file_CIF_2 = " " >
... add more if needed ...

===============PART 7: Statistics for Density Modification================
Enter log files and software name for density modification
NOTE: Software is one of the following: (CNS|DM|RESOLVE|SOLOMON|SHELXE)
The log file should be from the best trial of density modification.
<dm_software = "RESOLVE " >
<dm_log_file_LOG_1 = " " >
<dm_log_file_CIF_1 = " " >   (if mmCIF format)

===============PART 8: Statistics for Structure Refinement================
Enter log files and software name used for final structure refinement
NOTE: Software is one of the following:
(CNS|REFMAC5|SHELXL|TNT|PROLSQ|NUCLSQ|RESTRAIN)
The log file should be from the final trial of structure refinement.
<refine_software = "REFMAC5" >
<refine_log_file_PDB_1 = " " >   (coordinate file in PDB format)
<refine_log_file_CIF_1 = " " >   (mmCIF file containing refinement statistics)
<refine_log_file_LOG_1 = " " >

=======================PART 9: Data Template File=========================
Enter the file name of the data template file
NOTE: This file 'data_template.text' was generated by the command
'extract -pdb pdb_file' or 'extract -cif cif_file'. It contains the
sequences of all unique polymers (protein or nucleic acid) present in the
structure. It also contains other non-electronically captured information.
Please complete the data template file before running pdb_extract.
<data_template_file = "data_template.text" >

==========================PART 10: Output Files============================
Enter the output file names
NOTE: If you do not give the output file names, the default names
pdb_extract_sf.mmcif (containing structure factors) and pdb_extract.mmcif
(containing coordinates) will be assigned by the program.
<sf_output= " " >           (for structure factors)
<statistics_output= " " >   (for coordinates and statistics)
=====================================END==================================
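For data in the OTHER format (PART 1 and PART 2), the column layout can be checked before running extract. The awk-based function below is a hypothetical sketch, not part of PDB_EXTRACT. It skips blank lines, requires at least five whitespace-separated columns (H, K, L, I or F, sigma) with integer Miller indices, and leaves any further columns, such as the test flag, unchecked.

```shell
#!/bin/sh
# Hypothetical checker for a reflection file in the OTHER format: at least
# H K L I(or F) sigma per line, whitespace separated, optional test flag.
# Only the layout is checked, not the values themselves.
check_other_refl() {
    awk '
        NF == 0 { next }                      # ignore blank lines
        NF < 5 {
            printf "line %d: expected at least 5 columns, got %d\n", NR, NF
            bad = 1; next
        }
        $1 !~ /^-?[0-9]+$/ || $2 !~ /^-?[0-9]+$/ || $3 !~ /^-?[0-9]+$/ {
            printf "line %d: H K L must be integers\n", NR
            bad = 1; next
        }
        { n++ }                               # count well-formed reflections
        END {
            if (!bad) printf "%d reflections\n", n
            exit bad
        }
    ' "$1"
}
```

Running `check_other_refl refine_other.hkl` (a hypothetical file name) prints the reflection count when every line is well formed, or the offending line numbers otherwise.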