Conformational Space of Short Peptides Dataset

The Conformational Space of Short Peptides dataset allows us to explore the conformational space of all possible peptides built from the 20 common amino acids. It consists of a collection of exhaustive molecular dynamics simulations of tripeptides and pentapeptides.


Data

The dataset is composed of tar.bz2 files that follow this naming convention:


N[1]C[2]-R[3].tar.bz2 where:


1- First 4 Amino Acids (or 2 for tripeptide) using IUPAC three letter notation.

2- Last Amino Acid using IUPAC three letter notation

3- Replica number from 1 to 3.


For example, a valid file is: NVALALAPROALACTHR-R2.tar.bz2
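The naming convention above can be checked programmatically. The following is a minimal sketch, assuming every residue is written as exactly three uppercase letters; the function name `parse_tarball_name` and the regular expression are illustrative and are not part of the dataset tooling.

```python
import re

# Pattern inferred from the naming convention: "N", the leading residues
# (2 for a tripeptide, 4 for a pentapeptide), "C", the last residue,
# "-R" and the replica number. Assumes three-letter uppercase codes.
_NAME_RE = re.compile(
    r"^N((?:[A-Z]{3}){2}|(?:[A-Z]{3}){4})C([A-Z]{3})-R(\d+)\.tar\.bz2$"
)


def parse_tarball_name(filename):
    """Return (list of residues, replica number) for a tarball file name."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    head, last, replica = match.groups()
    # Split the leading block into three-letter residue codes.
    residues = [head[i:i + 3] for i in range(0, len(head), 3)]
    residues.append(last)
    return residues, int(replica)


if __name__ == "__main__":
    print(parse_tarball_name("NVALALAPROALACTHR-R2.tar.bz2"))
```

Because each residue code has a fixed width, a "C" that starts a residue name such as CYS does not confuse the split between the leading residues and the last one.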


Inside each tarball, there are up to 4 files:

.traj: Trajectory file. A trajectory file contains one or more Atoms objects, usually interpreted as a time series. This file can be read with the ASE Python library.

.crd: Sander coordinate file. This file is generated by the LEaP program and defines the coordinates of the atoms in the system. It can be read with the Bio3D R package.

.out: Output file from Sander. This is the captured standard output from Sander. A custom parser is available in the Toyoko GitHub repository.

.top: Text topology file generated by tLEaP (like p.1.top). Note: This file is present only for pentapeptides.
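To see which of these files a given tarball actually contains, the archive can be inspected with the Python standard library alone. This is a sketch under the assumption that the tarball has already been downloaded locally; `members_by_extension` is a hypothetical helper name, and reading the files themselves (for example the .traj file with ASE) is not shown here.

```python
import tarfile
from pathlib import PurePosixPath


def members_by_extension(tar_path):
    """Map each file extension (e.g. ".traj") to the member names in a tar.bz2."""
    grouped = {}
    with tarfile.open(tar_path, mode="r:bz2") as tar:
        for member in tar.getmembers():
            if member.isfile():
                # Only the final suffix is used, so "p.1.top" maps to ".top".
                ext = PurePosixPath(member.name).suffix
                grouped.setdefault(ext, []).append(member.name)
    return grouped


if __name__ == "__main__":
    import sys

    # Usage: python list_members.py NSERALAGLYLEUCPRO-R1.tar.bz2
    for ext, names in members_by_extension(sys.argv[1]).items():
        print(ext, names)
```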

Installation and Download

The data is stored in a publicly accessible S3 bucket and can be downloaded in several ways.

Common naming standard for all methods

For tripeptides: There are 3 files for each replica of a tripeptide. The files are: Amber .out, .traj and .prod. Here is the general format:


N{aa}{aa}C{aa}-R{replica_number}.dmd.1.{extension}


For example, the .traj file for replica 2 of the tripeptide ALA-SER-ASN is named NALASERCASN-R2.dmd.1.traj


For pentapeptides: All pentapeptide files are packaged inside a tarball. Here is the general format:


N{aa}{aa}{aa}{aa}C{aa}-R1.tar.bz2


For example, the package filename for the pentapeptide SER-ALA-GLY-LEU-PRO is NSERALAGLYLEUCPRO-R1.tar.bz2
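The tripeptide and pentapeptide naming formats can be expressed as two small helper functions. This is only a sketch: the function names are hypothetical, and the fixed `.dmd.1.` infix and the default replica `R1` for pentapeptides are taken directly from the formats shown in this section.

```python
def tripeptide_filename(residues, replica, extension):
    """Build a tripeptide file name such as NALASERCASN-R2.dmd.1.traj."""
    if len(residues) != 3:
        raise ValueError("a tripeptide needs exactly 3 residues")
    first, second, last = (r.upper() for r in residues)
    return f"N{first}{second}C{last}-R{replica}.dmd.1.{extension}"


def pentapeptide_filename(residues, replica=1):
    """Build a pentapeptide tarball name such as NSERALAGLYLEUCPRO-R1.tar.bz2."""
    if len(residues) != 5:
        raise ValueError("a pentapeptide needs exactly 5 residues")
    codes = [r.upper() for r in residues]
    head = "".join(codes[:4])
    return f"N{head}C{codes[4]}-R{replica}.tar.bz2"


if __name__ == "__main__":
    print(tripeptide_filename(["ALA", "SER", "ASN"], 2, "traj"))
    print(pentapeptide_filename(["SER", "ALA", "GLY", "LEU", "PRO"]))
```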

Methods to download

1- Downloading from the web.


S3 supports HTTPS downloads. You can use wget, curl or a web browser to download each peptide. The base URL is:


https://toyokounqpeptides.s3.us-west-2.amazonaws.com/


For tripeptides, the directory is "tripep" and for pentapeptides, "5pep".


To download the ALA-SER-ASN .out file from replica 2:


wget https://toyokounqpeptides.s3.us-west-2.amazonaws.com/tripep/NALASERCASN-R2.dmd.1.out

To download the SER-ALA-GLY-LEU-PRO peptide:


wget https://toyokounqpeptides.s3.us-west-2.amazonaws.com/5pep/NSERALAGLYLEUCPRO-R1.tar.bz2

2- Using AWS S3 command line tools. https://aws.amazon.com/cli/


The peptides can be downloaded by their S3 URI using the AWS command line tools (or any AWS-compatible library such as boto3).


Installing AWS-CLI


In Linux:

This should work in any modern Linux:

curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"

unzip awscliv2.zip

sudo ./aws/install


More information: https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2-linux.html


In macOS:

In a terminal, paste the following commands:


curl "https://awscli.amazonaws.com/AWSCLIV2.pkg" -o "AWSCLIV2.pkg"

sudo installer -pkg AWSCLIV2.pkg -target /


In Windows:

Download the latest installer from https://awscli.amazonaws.com/AWSCLIV2.msi, double-click it and follow the instructions.


Using Docker:


If you have Docker installed, just pull the AWS-CLI image from amazon/aws-cli.


Usage:

docker run --rm -it amazon/aws-cli command


For information on updating and using your own credentials, see https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2-docker.html


Using AWS-CLI:


To download the NSERALAGLYLEUCPRO-R1.tar.bz2 file using AWS-CLI:


aws s3 cp s3://toyokounqpeptides/5pep/NSERALAGLYLEUCPRO-R1.tar.bz2 .


For a tripeptide file:

aws s3 cp s3://toyokounqpeptides/tripep/NALAALACALA-R4.prod.1.crd .
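Both download methods address the same objects, once through the HTTPS base URL and once through the S3 URI. The sketch below builds both forms from a directory ("tripep" or "5pep") and a file name, and fetches a file over HTTPS with the Python standard library. The helper names are hypothetical, and the `download` function assumes network access and that the object exists in the bucket.

```python
from urllib.request import urlretrieve

# Values taken from this section; the helper names below are illustrative.
BASE_URL = "https://toyokounqpeptides.s3.us-west-2.amazonaws.com/"
BUCKET = "toyokounqpeptides"


def https_url(directory, filename):
    """HTTPS address of a file, usable with wget, curl or a browser."""
    return f"{BASE_URL}{directory}/{filename}"


def s3_uri(directory, filename):
    """S3 URI of the same file, usable with `aws s3 cp`."""
    return f"s3://{BUCKET}/{directory}/{filename}"


def download(directory, filename, dest=None):
    """Download one file over HTTPS and return the local path (needs network)."""
    target = dest or filename
    urlretrieve(https_url(directory, filename), target)
    return target


if __name__ == "__main__":
    print(https_url("tripep", "NALASERCASN-R2.dmd.1.out"))
    print(s3_uri("5pep", "NSERALAGLYLEUCPRO-R1.tar.bz2"))
```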

Use Cases

Dipeptidyl-peptidase IV (DPP-IV) (EC number 3.4.14.5) is an enzyme that modulates the biological activity of peptide hormones circulating in several tissues, plasma and other body fluids. DPP-IV is a serine protease that inactivates two hormones of the incretin system that contribute to glucose regulation: glucagon-like peptide 1 (GLP-1) and the glucose-dependent insulinotropic peptide (GIP). Therefore, suppression of DPP-IV activity is a molecular target for the treatment of diabetes mellitus (Yan et al., 2019). DPP-IV inhibitors have been used to control postprandial glycaemia in type 2 diabetes (Hatanaka et al., 2012). Among these, inhibitory peptides have gained relevance in recent years, with 427 peptides reported in Biopep. Interestingly, 143 of them have experimentally determined IC50/EC50 values, a measure of their inhibitory response in specific biochemical reactions.


Cleavage sites for DPP-IV have been determined according to reports on different substrates. In particular, it removes N-terminal segments by cleaving after Xaa-Pro and Xaa-Ala dipeptides (https://www.ebi.ac.uk/merops/cgi-bin/pepsum?id=S09.003).

Several DPP-IV structures are known and deposited in the Protein Data Bank. For example, 1WCY is co-crystallized with diprotin A, a peptide of sequence Ile-Pro-Ile that acts as a slowly hydrolyzed substrate (Hiramatsu et al., 2004). Collectively, these structures represent a diverse subset of the natural variability of this protein, thus providing insights into its biological activity and relevance.

The Conformational Space of Short Peptides dataset can be used to study the structural constraints of binding tripeptides and pentapeptides in proteins such as DPP-IV. This dataset samples a representative subset of the structural conformations available to every possible peptide of the selected length. The structure of DPP-IV bound to the Ile-Pro-Ile tripeptide can be used as a starting template for docking studies aimed at analyzing which of the alternative tripeptides in our dataset can also ‘inhabit’ the same binding pocket of DPP-IV. Positive results would suggest interesting candidates for DPP-IV inhibition. The binding affinities of this selected subset can be compared with those of inhibitory tripeptides reported in Biopep (13 for DPP-IV) to assess whether any enhanced inhibitor is available in our dataset. This can be achieved in silico by studying the interaction energies obtained from docking studies (e.g. with readily available online tools such as FlexPepDock) and the sequence and structure recognition patterns of DPP-IV towards these tripeptides.
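As a starting point for such a study, the tripeptide candidates could be narrowed before docking. The sketch below enumerates all tripeptides and keeps those with Pro or Ala in the second position, mirroring the Xaa-Pro and Xaa-Ala preference mentioned above. This pre-filter is an illustrative assumption, not a step prescribed by the dataset authors, and the three-letter codes listed here are the standard ones, which may differ from the residue names actually used in the dataset files.

```python
from itertools import product

# Standard three-letter codes for the 20 common amino acids (assumed to
# match the dataset's file names; verify against the bucket contents).
AMINO_ACIDS = [
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
]


def candidate_tripeptides(second_position=("PRO", "ALA")):
    """All tripeptides whose middle residue is in `second_position`."""
    return [
        tri for tri in product(AMINO_ACIDS, repeat=3)
        if tri[1] in second_position
    ]


if __name__ == "__main__":
    candidates = candidate_tripeptides()
    # 20 choices for the first residue, 2 for the second, 20 for the third.
    print(len(candidates))
    print(("ILE", "PRO", "ILE") in candidates)  # diprotin A
```

The resulting sequences can then be turned into file names and downloaded, and the corresponding conformations passed to a docking tool of choice.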

References:


Hiramatsu H, Yamamoto A, Kyono K, Higashiyama Y, Fukushima C, Shima H, Sugiyama S, Inaka K, Shimizu R. The crystal structure of human dipeptidyl peptidase IV (DPPIV) complex with diprotin A. Biol Chem. 2004, 385(6), 561-564.


Hatanaka T, Inoue Y, Arima J, Kumagi Y, Usuki H, Kawakami K, Kimura M, Mukaihara T. Production of dipeptidyl peptidase IV inhibitory peptides from defatted rice bran. Food Chem. 2012, 134, 797-802.


Yan J, Zhao J, Yang R, Zhao W. Bioactive peptides with antidiabetic properties: a review. Int J Food Sci Technol. 2019, 54, 1909-1919.


Juillerat-Jeanneret L. Dipeptidyl peptidase IV and its inhibitors: therapeutics for type 2 diabetes and what else? J Med Chem. 2014, 57(6), 2197-2212.

This dataset was generated by

&

Powered by Toyoko

1900 Powell St. STE 700

Emeryville, CA 94608.

TEL +1 510 545 4521

info@toyoko.io

