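The dataset is composed of tar.bz2 files that follow this naming convention:
1- The first 4 amino acids (or the first 2 for a tripeptide) in IUPAC three-letter notation.
2- The last amino acid in IUPAC three-letter notation.
3- The replica number, from 1 to 3.
For example, a valid file is: NVALALAPROALACTHR-R2.tar.bz2. As the example shows, the first residue is prefixed with N, the last residue is prefixed with C, and the replica number follows "-R".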
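The naming convention can be sketched in Python as follows. This is a minimal illustration inferred from the example file names in this document; the function names `archive_stem` and `parse_stem` are hypothetical and are not part of any dataset tooling.

```python
import re


def archive_stem(residues, replica):
    """Build a stem such as 'NVALALAPROALACTHR-R2' from three-letter codes.

    Assumption (inferred from the example names): the first residue is
    prefixed with 'N', the last one with 'C', and the replica follows '-R'.
    """
    codes = [code.upper() for code in residues]
    if len(codes) not in (3, 5):
        raise ValueError("expected a tripeptide or a pentapeptide")
    if any(len(code) != 3 or not code.isalpha() for code in codes):
        raise ValueError("residues must be three-letter codes")
    if replica < 1:
        raise ValueError("replica numbers start at 1")
    return "N" + "".join(codes[:-1]) + "C" + codes[-1] + f"-R{replica}"


def parse_stem(stem):
    """Split a stem back into (list of residues, replica number)."""
    match = re.fullmatch(r"N((?:[A-Z]{3})+)C([A-Z]{3})-R(\d+)", stem)
    if match is None:
        raise ValueError(f"unrecognized name: {stem!r}")
    body, last, replica = match.groups()
    # The body holds every residue except the last, three letters each.
    residues = [body[i:i + 3] for i in range(0, len(body), 3)]
    return residues + [last], int(replica)
```

For instance, `archive_stem(["VAL", "ALA", "PRO", "ALA", "THR"], 2) + ".tar.bz2"` reproduces the file name used in the example above.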
Inside each tarball, there are up to 4 files:
.traj: Trajectory file. A trajectory file contains one or more Atoms objects, usually interpreted as a time series. This file can be read with the ASE Python library.
.crd: Sander coordinate file. This file is generated by the LEaP program and defines the coordinates of the atoms in the system. It can be read with the Bio3D R package.
.out: Output file from Sander. This is the captured standard output from Sander. A custom parser is available in the Toyoko GitHub repository.
.top: Plain-text topology file from tLEaP (for example, p.1.top). Note: This file is present only for pentapeptides.
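As a rough sketch of how a downloaded tarball could be inspected, the Python below groups the members of an archive by file extension using only the standard library. The helper names are hypothetical. The `load_frames` helper assumes that the optional ASE package is installed and that the .traj file is in a format `ase.io.read` recognizes.

```python
import tarfile
from collections import defaultdict
from pathlib import Path


def list_by_extension(archive_path):
    """Map each file extension found in a .tar.bz2 archive to its member names."""
    groups = defaultdict(list)
    with tarfile.open(archive_path, "r:bz2") as tar:
        for member in tar.getmembers():
            if member.isfile():
                groups[Path(member.name).suffix].append(member.name)
    return dict(groups)


def load_frames(traj_path):
    """Read every frame of an extracted .traj file as a list of ASE Atoms objects.

    ASE is imported lazily so the rest of this sketch works without it.
    """
    from ase.io import read

    return read(traj_path, index=":")
```

Calling `list_by_extension("NSERALAGLYLEUCPRO-R1.tar.bz2")` on a downloaded pentapeptide archive should return a dictionary keyed by extensions such as ".traj" or ".out", depending on which files the archive contains.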
Installation and Download
The data is stored in a publicly accessible S3 bucket and can be downloaded in several ways.
Common naming standard for all methods
For tripeptides: There are three files for each replica of a tripeptide: Amber .out, .traj and .prod. Here is the general format:
For example, for the tripeptide ALA-SER-ASN, the replica 2 .traj file is named NALASERCASN-R2.dmd.1.traj
For pentapeptides: All pentapeptide files are packaged inside a tarball. Here is the general format:
For example, for the pentapeptide SER-ALA-GLY-LEU-PRO, the package filename is NSERALAGLYLEUCPRO-R1.tar.bz2
Methods to download
1- Downloading from the web.
S3 supports HTTPS download. You can use wget, curl or a web browser to download each peptide. The base URL is:
For tripeptides, the directory is "tripep" and for pentapeptides, "5pep".
To download the ALA-SER-ASN out file from replica 2:
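A web download can also be scripted. The Python sketch below builds the HTTPS URL from a base URL, the directory ("tripep" or "5pep") and a file name, then fetches it with the standard library. The base URL is not reproduced in this text, so it is taken as a parameter. The function names and the `DIRECTORIES` mapping are hypothetical.

```python
import urllib.request
from urllib.parse import quote

# Directory names as stated in this document.
DIRECTORIES = {"tripeptide": "tripep", "pentapeptide": "5pep"}


def build_url(base_url, kind, filename):
    """Join the base URL, the peptide directory and the file name."""
    return f"{base_url.rstrip('/')}/{DIRECTORIES[kind]}/{quote(filename)}"


def download(base_url, kind, filename, dest=None):
    """Download one file over HTTPS and return the local path."""
    target = dest or filename
    urllib.request.urlretrieve(build_url(base_url, kind, filename), target)
    return target
```

With the real base URL in hand, `download(base_url, "tripeptide", "NALASERCASN-R2.dmd.1.traj")` would fetch the replica 2 trajectory of ALA-SER-ASN into the current directory.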
2- Using AWS S3 command line tools. https://aws.amazon.com/cli/
The peptides can be downloaded by their S3 URI with the AWS command line tools (or any AWS-compatible library such as boto3 [URL]).
Linux: This should work on any modern Linux distribution. In a terminal, paste the following commands:
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
Windows: Download the latest installer from https://awscli.amazonaws.com/AWSCLIV2.msi, double-click it and follow the instructions.
macOS: In a terminal, paste the following commands:
curl "https://awscli.amazonaws.com/AWSCLIV2.pkg" -o "AWSCLIV2.pkg"
sudo installer -pkg AWSCLIV2.pkg -target /
Docker: If you have Docker installed, pull the AWS CLI image amazon/aws-cli and run it, replacing "command" with the AWS CLI command you want to execute:
docker run --rm -it amazon/aws-cli command
For information on updating and using your own credentials, see https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2-docker.html
To download the SER-ALA-GLY-LEU-PRO peptide (file NSERALAGLYLEUCPRO-R1.tar.bz2) using the AWS CLI:
aws s3 cp s3://toyokounqpeptides/5pep/NSERALAGLYLEUCPRO-R1.tar.bz2 .
For a tripeptide file:
aws s3 cp s3://toyokounqpeptides/tripep/NALAALACALA-R4.prod.1.crd .
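The same downloads can be scripted with boto3, which this document mentions as an alternative to the command line tools. The sketch below is a minimal illustration, not an official client. It assumes boto3 is installed and that the bucket accepts anonymous (unsigned) requests because it is described as public. The function names are hypothetical.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"not a valid S3 object URI: {uri!r}")
    return parsed.netloc, key


def download_s3(uri, dest=None):
    """Download one object without credentials and return the local path.

    Assumption: anonymous access is allowed, so requests are left unsigned.
    boto3 is imported lazily so split_s3_uri works without it.
    """
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    bucket, key = split_s3_uri(uri)
    target = dest or key.rsplit("/", 1)[-1]
    client = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    client.download_file(bucket, key, target)
    return target
```

For example, `download_s3("s3://toyokounqpeptides/5pep/NSERALAGLYLEUCPRO-R1.tar.bz2")` would save the pentapeptide tarball in the current directory, mirroring the `aws s3 cp` command above.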
Dipeptidyl-peptidase IV (DPP-IV) (EC number 3.4.14.5) is an enzyme that modulates the biological activity of peptide hormones circulating in several tissues, plasma and other body fluids. DPP-IV is a serine protease that inactivates two hormones of the incretin system that help regulate glucose: glucagon-like peptide 1 (GLP-1) and the glucose-dependent insulinotropic peptide (GIP). Therefore, suppression of DPP-IV activity is a molecular target for the treatment of diabetes mellitus (Yan et al., 2019). DPP-IV inhibitors have been used to control postprandial glycaemia in type 2 diabetes (Hatanaka et al., 2012). Among these, inhibitory peptides have gained relevance in recent years, with 427 peptides reported in Biopep. Of these, 143 have experimentally determined IC50/EC50 values, a measure of their inhibitory response in specific biochemical reactions.
Cleavage sites for DPP-IV have been determined according to reports on different substrates. In particular, it removes N-terminal segments by cleaving after Xaa-Pro and Xaa-Ala dipeptides (https://www.ebi.ac.uk/merops/cgi-bin/pepsum?id=S09.003).
Several DPP-IV structures are known and deposited in the Protein Data Bank. For example, 1WCY is co-crystallized with diprotin A, a peptide of sequence Ile-Pro-Ile that serves as a slowly hydrolyzed substrate (Hiramatsu et al., 2004). Collectively, these structures represent a diverse subset of the natural variability of this protein, thus providing insights into its biological activity and relevance.
The Conformational Space of Short Peptides dataset can be used to study the structural constraints of binding tripeptides and pentapeptides in proteins such as DPP-IV. This dataset samples a representative subset of the structural conformations available to every possible peptide of the selected length. The structure of DPP-IV bound to the Ile-Pro-Ile tripeptide can be used as a starting template for docking studies aimed at analyzing which of the alternative tripeptides in our dataset can also ‘inhabit’ the same binding pocket of DPP-IV. Positive results would suggest interesting candidates for DPP-IV inhibition. The binding affinities of this selected subset can be compared with those of inhibitory tripeptides reported in Biopep (13 for DPP-IV) to assess whether the dataset contains any stronger inhibitor. This can be done in silico by studying the interaction energies obtained from docking studies (e.g. with readily available online tools such as FlexPepDock) and the sequence and structure recognition patterns of DPP-IV towards these tripeptides.
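As a starting point for such a screen, the candidate tripeptides can be enumerated programmatically. The Python sketch below lists every tripeptide over the 20 standard amino acids and optionally fixes the middle residue, for example to proline as in Ile-Pro-Ile. It assumes the dataset uses the plain IUPAC three-letter codes listed here and the N/C-prefixed file names shown in the download examples. The function names are hypothetical.

```python
# The 20 standard amino acids in IUPAC three-letter notation.
AMINO_ACIDS = (
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
)


def tripeptides(second=None):
    """List (first, second, third) tuples, optionally fixing the middle residue."""
    middle = AMINO_ACIDS if second is None else (second,)
    return [(a, b, c) for a in AMINO_ACIDS for b in middle for c in AMINO_ACIDS]


def stem(residues, replica):
    """Build the file stem for a tripeptide, e.g. 'NILEPROCILE-R1'.

    Assumption: names follow the N/C-prefixed pattern of the examples.
    """
    first, second, third = residues
    return f"N{first}{second}C{third}-R{replica}"


# Candidates that share the Xaa-Pro motif of diprotin A (Ile-Pro-Ile).
xaa_pro_candidates = [stem(p, 1) for p in tripeptides(second="PRO")]
```

The resulting stems can then be combined with a file suffix and one of the download methods to fetch only the peptides of interest.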
Hiramatsu H, Yamamoto A, Kyono K, Higashiyama Y, Fukushima C, Shima H, Sugiyama S, Inaka K, Shimizu R. The crystal structure of human dipeptidyl peptidase IV (DPPIV) complex with diprotin A. Biol Chem. 2004, 385(6), 561-564.
Hatanaka T, Inoue Y, Arima J, Kumagi Y, Usuki H, Kawakami K, Kimura M, Mukaihara T. Production of dipeptidyl peptidase IV inhibitory peptides from defatted rice bran. Food Chem. 2012, 134, 797-802.
Yan J, Zhao J, Yang R, Zhao W. Bioactive peptides with antidiabetic properties: a review. Int J Food Sci Technol. 2019, 54, 1909-1919.
Juillerat-Jeanneret L. Dipeptidyl peptidase IV and its inhibitors: therapeutics for type 2 diabetes and what else? J Med Chem. 2014, 57(6), 2197-2212.