Proteomics: The Snapshots of Proteins

 

INTRODUCTION

Imagine peering into the microscopic world of proteins, the molecular workhorses that drive the processes vital to life. Proteomics—the study of all the proteins produced by an organism—provides a powerful framework for understanding how our cells function, adapt, and interact. Proteins are essential to nearly every biological activity, from growth and repair to immune defense and disease progression. By analyzing the vast and complex network of proteins, researchers are uncovering insights that could revolutionize medicine, health, and drug development. In this post, we’ll explore the captivating field of proteomics and how it’s reshaping our understanding of biology, health, and disease.

The term “proteomics” was coined in 1995 and refers to the large-scale characterization of the entire protein complement of a cell line, tissue, or organism.



HISTORY

·         The first protein studies that can be called proteomics began in 1975 with the introduction of two-dimensional (2-D) gel electrophoresis by O’Farrell, Klose, and Scheele, who began mapping proteins from Escherichia coli, mouse, and guinea pig, respectively. Although these gels could separate and visualize many proteins, the proteins could not be identified.

·         2-D protein electrophoresis was later used in an attempt to catalog all human proteins.

Figure 1: 2D Gel Electrophoresis.

·         The first major technology to emerge for the identification of proteins was the sequencing of proteins by Edman degradation.

Figure 2: Edman degradation.

·         A major breakthrough was the development of microsequencing techniques for electroblotted proteins. These techniques were used to identify proteins from 2-D gels, leading to the first 2-D databases.

·         One of the most important developments in protein identification has been mass spectrometry (MS). MS greatly improved sensitivity and accuracy: it can detect proteins at the femtomole level and can be used in high-throughput operations.

Figure 3: Mass Spectrometry Technology.

METHODOLOGY OF PROTEOMICS

1.      Genome sequencing and annotation.

·         Without genome sequence databases, proteins cannot be readily identified from the sequence or mass data obtained in proteomics experiments.

·         Haemophilus influenzae (a Gram-negative coccobacillus) was the first free-living organism to have its complete genome sequenced, in 1995.

·         Genome annotation is the process of identifying all the genes in a genome, along with their exons and introns.
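Real annotation pipelines combine many lines of evidence, and finding exons and introns in eukaryotic genomes requires dedicated gene-prediction software. As a deliberately simplified sketch of the core idea, the Python snippet below scans the forward strand of a DNA sequence for open reading frames (ORFs): stretches that begin with ATG and end at the first in-frame stop codon. The function name `find_orfs` and the 100-codon default threshold are illustrative assumptions; introns, the reverse strand, and alternative start codons are ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna: str, min_codons: int = 100) -> list[tuple[int, int]]:
    """Forward-strand ORFs as 0-based (start, end) pairs, end exclusive.

    Simplified: ATG start only, first in-frame stop, no introns,
    reverse strand not scanned, nested ATGs inside an open ORF ignored.
    min_codons is an assumed length cut-off to skip chance ORFs.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Length in codons, counting the stop codon
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs

# Toy sequence, so the length cut-off is lowered
print(find_orfs("CCATGAAATTTTAGCC", min_codons=2))  # → [(2, 14)]
```

Here the ORF starts at the ATG at index 2 and ends with the TAG stop codon occupying indices 11 to 13.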

2.      Protein expression studies.

·         Expression can be analyzed at the mRNA level by methods such as serial analysis of gene expression (SAGE) and DNA microarray technology.

Figure 4: Serial Analysis of Gene Expression (SAGE)

Figure 5: DNA Microarray Technique.

·         Expression studies cover transcription, post-transcriptional modifications such as splicing and editing, translation, and post-translational modification.

·         It is estimated that up to 200 different types of post-translational protein modification exist.

·         Proteins can also be regulated by proteolysis (protein degradation) and compartmentalization (the spatial organization of proteins in different regions of the cell).
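The idea behind SAGE can be sketched in a few lines: each transcript is represented by a short sequence tag taken from a fixed position, and counting how often each tag occurs estimates how abundant the corresponding mRNA is. The sketch below assumes NlaIII (recognition site CATG) as the anchoring enzyme and 10-base tags read from just downstream of the 3'-most site; these are typical choices, treated here as assumptions. It works on full transcript sequences for clarity, whereas a real SAGE experiment reads tags from sequenced concatemers. Function names are hypothetical.

```python
from collections import Counter
from typing import Optional

ANCHOR_SITE = "CATG"  # NlaIII recognition site (assumed anchoring enzyme)
TAG_LENGTH = 10       # assumed tag length

def sage_tag(transcript: str) -> Optional[str]:
    """Return the tag just downstream of the 3'-most anchor site, or None."""
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR_SITE)
    if pos == -1:
        return None
    tag = seq[pos + len(ANCHOR_SITE):pos + len(ANCHOR_SITE) + TAG_LENGTH]
    return tag if len(tag) == TAG_LENGTH else None  # too close to the end

def count_tags(transcripts: list[str]) -> Counter:
    """Tag counts; relative counts approximate relative mRNA abundance."""
    return Counter(tag for tag in map(sage_tag, transcripts) if tag is not None)

# Hypothetical transcripts: two copies of one mRNA, one copy of another
transcripts = [
    "GGCATG" + "A" * 10 + "TT",
    "GGCATG" + "A" * 10 + "TT",
    "CATG" + "C" * 10 + "GG",
]
print(count_tags(transcripts))
```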

3.      Functional analysis of proteins.

·         The functions of many proteins can only be inferred by examination of their 3-D structure.

Figure 6: HBA1 (hemoglobin alpha subunit; hemoglobin is the oxygen-carrying protein in red blood cells).

4.      Protein-protein interactions studies.

·         Of fundamental importance in biology is the understanding of protein-protein interactions.

·         Cell growth, programmed cell death, and the decision to proceed through the cell cycle are all regulated by signal transduction through protein complexes.

·         Proteomics aims to develop a complete 3-D map of all protein interactions in the cell.


 TECHNOLOGIES USED IN PROTEOMICS

A typical proteomics experiment (such as protein expression profiling) can be broken down into three stages: (i) separation and isolation of proteins from a cell line, tissue, or organism; (ii) acquisition of protein structural information for protein identification and characterization; and (iii) database utilization.

(i)   Separation and Isolation of Proteins

The predominant technology for protein separation and isolation is polyacrylamide gel electrophoresis (PAGE).

The main types of protein PAGE are:

1. SDS-PAGE (Sodium Dodecyl Sulfate-PAGE): Separates proteins based on size (molecular weight) in the presence of SDS, which denatures and coats proteins with a negative charge.

Figure 7: SDS-PAGE.

2. Native PAGE: Separates proteins based on their native charge and size, without denaturing agents, preserving protein structure and function.

3. Gradient PAGE: Uses a gradient of acrylamide concentrations to separate proteins over a wide range of sizes.

4. 1D-PAGE (One-Dimensional PAGE): Separates proteins on the basis of size.

5. 2D-PAGE (Two-Dimensional PAGE): Combines two separation techniques:

    - IEF (Isoelectric Focusing): separates proteins by charge (isoelectric point, pI)

    - SDS-PAGE: separates proteins by size (molecular weight)

6. Urea-PAGE: Uses urea to denature proteins and separate them based on size, often used for membrane proteins.

7. Gel-free electrophoresis: Uses microfluidic devices or capillary electrophoresis to separate proteins without a gel matrix (strictly speaking, this is not PAGE).

8. CN-PAGE (Clear Native PAGE): Separates proteins in their native state, preserving protein complexes and interactions.

9. BN-PAGE (Blue Native PAGE): Separates proteins and protein complexes in their native state; the dye Coomassie Blue G-250 binds the proteins and gives them a negative charge, so they migrate according to size.
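With SDS-PAGE, the molecular weight of an unknown protein is commonly estimated from marker proteins of known size run on the same gel: within the gel's resolving range, log10(MW) is roughly linear in the relative mobility Rf (distance migrated by the band divided by the distance migrated by the dye front). The sketch below fits that line by least squares and interpolates an unknown band. The marker readings are made up for illustration, and the function names are hypothetical.

```python
import math

def fit_standard_curve(markers):
    """Least-squares fit of log10(MW) = slope * Rf + intercept.

    markers: list of (Rf, molecular weight in kDa) for the marker proteins.
    """
    xs = [rf for rf, _ in markers]
    ys = [math.log10(mw) for _, mw in markers]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_mw(rf, slope, intercept):
    """Interpolate the molecular weight (kDa) of an unknown band from its Rf."""
    return 10 ** (slope * rf + intercept)

# Hypothetical marker readings: (Rf, kDa)
markers = [(0.2, 100.0), (0.4, 50.0), (0.6, 25.0), (0.8, 12.5)]
slope, intercept = fit_standard_curve(markers)
print(round(estimate_mw(0.5, slope, intercept), 1))  # → 35.4
```

Extrapolating beyond the smallest or largest marker is unreliable, because the relationship is only approximately linear.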

 

Ø  In proteomics, 1-DE is generally used for relatively simple protein mixtures. If the sample is complex, such as a crude cell lysate, 2-DE is performed.

Ø  Although 2-DE is effective for separation, it has a number of limitations:

·         It is time-consuming, taking up to 2 days.

·         Only a single sample can be analysed at a time.

·         The number and types of proteins that can be resolved are limited.

·         Many large or hydrophobic proteins do not enter the gel during the first dimension.

·         Proteins of extreme acidity or basicity (pI below 3 or above 10) are not well represented.

·         Low-abundance (low-copy-number) proteins cannot be detected when a whole-cell lysate is analysed.
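Isoelectric focusing separates proteins by pI, the pH at which their net charge is zero. A rough pI can be predicted from sequence with the Henderson-Hasselbalch equation: sum the partial charges of the ionizable groups and search for the pH where the total is zero. The sketch below uses approximate textbook pKa values (an assumption: published tools use different tables and give slightly different results) and bisection, which works because net charge falls steadily as pH rises. It also shows how very acidic or basic sequences land outside the pH 3 to 10 range.

```python
# Approximate pKa values (assumed; prediction tools differ slightly)
PKA_POSITIVE = {"Nterm": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"Cterm": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}

def net_charge(seq: str, ph: float) -> float:
    """Net charge of a protein at a given pH (Henderson-Hasselbalch)."""
    positive = 1 / (1 + 10 ** (ph - PKA_POSITIVE["Nterm"]))
    positive += sum(seq.count(aa) / (1 + 10 ** (ph - PKA_POSITIVE[aa])) for aa in "KRH")
    negative = 1 / (1 + 10 ** (PKA_NEGATIVE["Cterm"] - ph))
    negative += sum(seq.count(aa) / (1 + 10 ** (PKA_NEGATIVE[aa] - ph)) for aa in "DECY")
    return positive - negative

def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisection on pH 0-14: net charge decreases monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid   # still positive: the pI is higher
        else:
            high = mid  # negative or zero: the pI is lower
    return (low + high) / 2

for peptide in ("DDDDEEEE", "GASGAS", "KKKKRRRR"):
    print(peptide, round(isoelectric_point(peptide), 2))
```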

 

Ø  One way to overcome these issues is to digest the entire protein mixture into peptides with trypsin and then separate the peptides by methods such as capillary electrophoresis, liquid chromatography, cation exchange chromatography, and reverse-phase chromatography.

Ø  These techniques let researchers bypass 2-DE, but they have their own limitations:

·         They are time-consuming.

·         Considerable computing power is required to deconvolute the resulting data.

Ø  One of the most exciting alternatives to protein electrophoresis is the Isotope-Coded Affinity Tag (ICAT) method, which allows quantitative comparison of protein profiles between different samples without the use of electrophoresis.
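In an ICAT experiment, cysteine-containing peptides from one sample carry a light tag and those from the other sample a heavy, isotope-labeled tag. Each peptide therefore appears in the mass spectrum as a pair of peaks, and the ratio of their intensities gives its relative abundance in the two samples. The sketch below pairs peaks under stated assumptions: an 8.0 Da light/heavy mass difference (roughly that of the original deuterated reagent; check the reagent actually used), one labeled cysteine per peptide, and peak lists already annotated with charge states. The function name, tolerance, and peak values are illustrative.

```python
def icat_ratios(peaks, mass_shift=8.0, tol=0.02):
    """Pair light/heavy ICAT peaks and return heavy-to-light intensity ratios.

    peaks: list of (m/z, charge, intensity) tuples.
    mass_shift: assumed light/heavy mass difference in Da (one labeled cysteine).
    A heavy partner is expected at m/z + mass_shift / charge.
    """
    ratios = []
    for mz, charge, intensity in peaks:
        expected = mz + mass_shift / charge
        for mz2, charge2, intensity2 in peaks:
            if charge2 == charge and abs(mz2 - expected) <= tol:
                ratios.append((mz, charge, intensity2 / intensity))
    return ratios

# Hypothetical peak list containing two light/heavy pairs
peaks = [(500.0, 2, 1000.0), (504.0, 2, 2000.0), (700.0, 1, 300.0), (708.0, 1, 300.0)]
print(icat_ratios(peaks))  # → [(500.0, 2, 2.0), (700.0, 1, 1.0)]
```

A ratio of 2.0 would mean the peptide is about twice as abundant in the heavy-labeled sample; note that a doubly charged pair is separated by only half the mass difference on the m/z axis.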

(ii)     Acquisition of protein structural information

1.      Sequencing

Edman sequencing

Figure 8: Edman sequencing.

It is one of the earliest methods used for protein identification.

Also called microsequencing by Edman chemistry, it determines the N-terminal amino acid sequence of a protein one residue at a time.

Limitation: proteins whose N-terminus is blocked by a modification cannot be sequenced.

To overcome this, another method is used:

Mixed peptide sequencing

The process of mixed-peptide sequencing involves separating a complex protein mixture by polyacrylamide gel electrophoresis (1-D or 2-D) and then transferring the proteins to an inert membrane by electroblotting. The proteins of interest are visualized on the membrane surface, excised, and fragmented chemically at methionine (with CNBr) or tryptophan (with skatole) into several large peptide fragments. On average, three to five fragments are generated, consistent with the frequency of methionine and tryptophan in most proteins. Because the internal fragments have free N-termini, sequence data can be obtained even when the protein's own N-terminus is blocked. The membrane piece is placed directly into an automated Edman sequencer without further manipulation, and all the fragments are sequenced simultaneously.

Ø  After sequencing, the mixed-sequence data are fed into the FASTF or TFASTF algorithms, which sort and match the data against protein (FASTF) and DNA (TFASTF) databases to unambiguously identify the protein.

Figure 9: Mixed peptide sequencing.
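The logic of mixed peptide sequencing can be sketched in silico: cleave a protein after every methionine (the CNBr rule) or tryptophan (the skatole rule), then list which residues would be released together at each Edman cycle. The `matches` function is a much cruder consistency check than FASTF or TFASTF, which score alignments of the mixed data against database sequences; it only tests whether a candidate protein could explain every observed cycle. All names and sequences here are hypothetical, and chemical details such as CNBr converting methionine to homoserine lactone are ignored.

```python
def chemical_fragments(protein: str, site: str = "M") -> list[str]:
    """Cleave C-terminal to each `site` residue (M for CNBr, W for skatole)."""
    fragments, start = [], 0
    for i, aa in enumerate(protein):
        if aa == site:
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])
    # The first fragment keeps the original N-terminus; if that is blocked,
    # only the internal fragments would actually contribute signal.
    return fragments

def mixed_cycles(fragments: list[str], n_cycles: int) -> list[set[str]]:
    """Residues released at each Edman cycle when all fragments are sequenced together."""
    return [{f[c] for f in fragments if c < len(f)} for c in range(n_cycles)]

def matches(observed: list[set[str]], protein: str, site: str = "M") -> bool:
    """Could this candidate protein explain every observed cycle?"""
    expected = mixed_cycles(chemical_fragments(protein, site), len(observed))
    return all(obs <= exp for obs, exp in zip(observed, expected))

observed = mixed_cycles(chemical_fragments("AKMGHWMSTR"), 3)
database = {"candidate_1": "AKMGHWMSTR", "candidate_2": "PPPPPPPPPP"}  # hypothetical
print([name for name, seq in database.items() if matches(observed, seq)])
```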

2.           Mass spectrometry

Mass spectrometry gives researchers information such as peptide masses, amino acid sequences, and the type and location of protein modifications.

It involves three main steps: sample preparation, ionization, and mass analysis.

Figure 10: Mass spectrometry.

Sample Preparation

In most proteomics workflows, a protein is first resolved from a mixture on a 1-D or 2-D polyacrylamide gel.

Because extracting the intact protein from the gel is inefficient, the protein is digested with a protease inside the gel (in-gel digestion), and the resulting peptides are extracted.

The extracted peptides, which still carry gel contaminants, are purified by reverse-phase chromatography, using ZipTips (Millipore) or Poros R2 perfusion material, or by high-performance liquid chromatography (HPLC).

Figure 11: Sample preparation by HPLC.
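Both the peptide-based separations described earlier and in-gel digestion depend on where the protease cuts. Trypsin cleaves on the C-terminal side of lysine (K) and arginine (R), and the commonly used simplified rule adds that it does not cut when the next residue is proline (P). A minimal in-silico digest under that rule, ignoring missed cleavages, could look like this (the function name is illustrative; zero-width splitting needs Python 3.7 or later):

```python
import re

def tryptic_digest(protein: str) -> list[str]:
    """Split after every K or R that is not followed by P (no missed cleavages)."""
    # Zero-width split point: lookbehind for K/R, negative lookahead for P
    return [peptide for peptide in re.split(r"(?<=[KR])(?!P)", protein) if peptide]

print(tryptic_digest("MAKPLRGKDRVYIHPF"))  # → ['MAKPLR', 'GK', 'DR', 'VYIHPF']
```

The K at the third position is followed by P, so it is left uncut.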

Sample Ionization

For biological samples to be analyzed by MS, the molecules must be charged and dry. This is accomplished by converting them to desolvated ions. The two most common methods for this are electrospray ionization (ESI) and matrix-assisted laser desorption/ionization (MALDI). In both methods, peptides are converted to ions by the addition or loss of one or more protons.

Figure 12: Sample ionization by ESI.

Figure 13: Sample ionization by MALDI.
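Because ionization adds protons, what the mass spectrometer actually measures is the mass-to-charge ratio. For a peptide of neutral monoisotopic mass M carrying z extra protons, m/z = (M + z × 1.00728) / z, where M is the sum of the residue masses plus one water molecule. The sketch below applies this to angiotensin II (DRVYIHPF); ESI commonly produces multiply charged ions such as the 2+ ion, whereas MALDI mainly produces singly charged ones. Function names are illustrative, and post-translational modifications are not handled.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once for the free N- and C-termini
PROTON = 1.007276

def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER

def mz(seq: str, charge: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (peptide_mass(seq) + charge * PROTON) / charge

for z in (1, 2):
    print(z, round(mz("DRVYIHPF", z), 4))
```

This prints 1046.5418 for the 1+ ion and 523.7745 for the 2+ ion.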

Mass Analysis

Mass analysis is performed by the mass analyzer of a mass spectrometer, which separates the ions in a vacuum according to their mass-to-charge ratio (m/z).

Common types of mass analyzers include quadrupole, time-of-flight (TOF), and ion trap analyzers.

Figure 14: Quadrupole mass analyzer.

Figure 15: TOF mass analyzer.

Figure 16: Ion trap mass analyzer.
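For a time-of-flight analyzer, the principle is a short derivation: an ion of mass m and charge ze accelerated through a potential U gains kinetic energy zeU = (1/2)mv², so it crosses a field-free drift tube of length L in t = L × sqrt(m / (2zeU)). Flight time therefore grows with the square root of m/z. The sketch below evaluates this ideal formula; the 20 kV and 1 m values are hypothetical, and real instruments add reflectrons, delayed extraction, and calibration against known masses.

```python
import math

DALTON_KG = 1.66053906660e-27        # 1 Da in kg
ELEMENTARY_CHARGE = 1.602176634e-19  # C

def flight_time(mass_da: float, charge: int, voltage: float, drift_length: float) -> float:
    """Ideal linear-TOF flight time in seconds (ignores time spent in the source)."""
    mass_kg = mass_da * DALTON_KG
    return drift_length * math.sqrt(mass_kg / (2 * charge * ELEMENTARY_CHARGE * voltage))

# Hypothetical instrument: 20 kV acceleration, 1 m drift tube
for mass in (1000.0, 4000.0):
    print(mass, f"{flight_time(mass, 1, 20_000, 1.0) * 1e6:.1f} us")
```

With these settings a 1000 Da singly charged ion arrives after about 16.1 µs, and a four-times-heavier ion takes twice as long (about 32.2 µs).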

(iii)             Database Utilization for Identification of Protein

Databases allow protein structural information harvested from Edman sequencing or MS to be used for protein identification.

The goal of database searching is to be able to quickly and accurately identify large numbers of proteins. The success of database searching depends on the quality of the data obtained in the mass spectrometer, the quality of the database searched, and the method used to search the database.

 

Figure 17: The web page interfaces of PDB (Protein Data Bank) and NCBI.

Methods of Protein Identification

1.      Peptide mass fingerprinting database searching

In this method, the masses of peptides obtained from the proteolytic digestion of an unknown protein are compared to the predicted masses of peptides from the theoretical digestion of proteins in a database.

Figure 18: The web page interface of Matrix Science for peptide mass fingerprinting.

Figure 19: The method of peptide mass fingerprinting.
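The comparison at the heart of peptide mass fingerprinting can be sketched as follows: digest every database protein in silico, compute the theoretical peptide masses, and count how many observed masses each protein explains within a tolerance. This match count is a deliberately crude stand-in for the probability-based scoring that real search engines use. The two-protein database, the use of neutral (uncharged) masses, the 0.02 Da tolerance, and all function names are hypothetical.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565

def tryptic_peptides(protein):
    """Cut after K/R unless followed by P; no missed cleavages."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]

def peptide_mass(seq):
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER

def pmf_score(observed, protein, tol=0.02):
    """Number of observed neutral peptide masses matched by the protein's tryptic peptides."""
    theoretical = [peptide_mass(p) for p in tryptic_peptides(protein)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)

def rank_proteins(observed, database, tol=0.02):
    """Database entry names, best-scoring first."""
    return sorted(database, key=lambda name: pmf_score(observed, database[name], tol), reverse=True)

# Hypothetical database; "observed" masses simulated from protein_A plus one unmatched mass
database = {"protein_A": "MAKPLRGKDRVYIHPF", "protein_B": "GGGGKSSSSR"}
observed = [peptide_mass("GK"), peptide_mass("VYIHPF"), 1500.0]
for name in rank_proteins(observed, database):
    print(name, pmf_score(observed, database[name]))
```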

Limitations

Identification by mass fingerprinting can be hindered when:

·         the protein is extensively modified post-translationally, or

·         the protein is present in a complex mixture with several other proteins.

 

To work around these limitations, researchers use bioinformatics tools: ProFound enables protein identification in simple protein mixtures, the ExPASy server provides a variety of proteomics tools including programs for protein identification, and other options include PepSea, PeptIdent/MultiIdent, MS-Fit, and MOWSE.

 

2.      Amino acid sequence database searching

·         The most specific type of database searching for protein identification uses peptide amino acid sequences. One method that uses this information is peptide mass tag searching.

·         In this method, a partial amino acid sequence (the sequence tag) is obtained by interpreting the MS/MS spectrum. This tag is combined with the mass of the whole peptide and the masses of the peptide regions on either side of the tag, whose sequences are not known.

·         Peptide mass tag searching is a more specific tool for protein identification than peptide mass fingerprinting.

 

Figure 20: The web page interface of UniProt (Universal Protein Resource).
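The tag search described above can be sketched as a filter over candidate peptides: a candidate matches if it contains the tag and the residues on either side of the tag add up to the flanking masses read from the spectrum. Here the flanking regions are expressed as summed residue masses (plus water on the C-terminal side), a simplification of the fragment-ion masses used in practice. The tag, masses, candidates, and function names are hypothetical. Because isoleucine and leucine have identical masses, a real search would also treat them as equivalent.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565

def residues_mass(seq):
    return sum(RESIDUE_MASS[aa] for aa in seq)

def tag_matches(peptide, tag, n_mass, c_mass, tol=0.02):
    """Does `peptide` contain `tag` with flanking regions of the given masses?

    n_mass: summed residue mass N-terminal to the tag.
    c_mass: summed residue mass C-terminal to the tag, plus water.
    """
    start = peptide.find(tag)
    while start != -1:
        n_flank, c_flank = peptide[:start], peptide[start + len(tag):]
        if (abs(residues_mass(n_flank) - n_mass) <= tol
                and abs(residues_mass(c_flank) + WATER - c_mass) <= tol):
            return True
        start = peptide.find(tag, start + 1)  # try the next occurrence
    return False

# Hypothetical tag "YIH" with flanking masses read from a spectrum
candidates = ["DRVYIHPF", "AAVYIHPF", "GGGGKSSSSR"]
print([p for p in candidates if tag_matches(p, "YIH", 370.196, 262.132)])
```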

3.      De novo peptide sequence information

·         In this approach, peptide sequences are determined de novo from MS/MS spectra, and all of the resulting sequences are then used to search the appropriate databases.

·         Multiple peptide sequences can be used for protein identification by searching databases with the FASTS program.

·         The single biggest advantage of this method is the capability of searching peptide sequence information across both DNA and protein databases. 

 

Figure 21: PEAKS (De Novo Sequencing Software).
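The core step of de novo sequencing is reading mass differences between neighbouring fragment ions as amino acid residues. As a minimal sketch, assuming an idealized, complete ladder of singly charged b ions with no noise or missing peaks, each gap can be looked up in a residue-mass table. Real spectra have gaps, y ions, and noise, and software such as PEAKS does far more. The function name and the 0.02 Da tolerance are illustrative.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276

def read_ladder(b_ions, tol=0.02):
    """Infer residues from gaps between consecutive singly charged b ions.

    Isobaric residues (I/L) are reported together; unexplained gaps become "?".
    """
    sequence = []
    previous = PROTON  # notional "b0" ion: just the added proton
    for mass in sorted(b_ions):
        gap = mass - previous
        hits = sorted(aa for aa, m in RESIDUE_MASS.items() if abs(m - gap) <= tol)
        sequence.append("/".join(hits) if hits else "?")
        previous = mass
    return sequence

# Simulated, idealized b-ion ladder for DRVYIHPF
peptide = "DRVYIHPF"
ladder = [PROTON + sum(RESIDUE_MASS[aa] for aa in peptide[:i]) for i in range(1, len(peptide) + 1)]
print(read_ladder(ladder))
```

The isoleucine position comes back as "I/L", because mass alone cannot distinguish the two.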

4.      Uninterpreted MS/MS data searching 

·         This approach relies on error-tolerant algorithms that search MS/MS spectra directly, without first interpreting them as amino acid sequences.

·         Searches of unannotated or untranslated DNA databases with uninterpreted MS/MS data can suffer from the same pitfalls as mass fingerprinting: polymorphisms, sequencing errors, and conservative substitutions can all prevent a protein from being identified correctly. Error-tolerant search algorithms are designed to overcome these shortcomings.

·         Examples of such programs include Mascot, SONAR, and SEQUEST.
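Programs that search uninterpreted MS/MS data compare the observed fragment peaks with spectra predicted for candidate peptides. The sketch below shows the simplest form of that idea: a shared-peak count between an observed peak list and the singly charged b and y ions of each candidate. It is not the cross-correlation or probability scoring that SEQUEST or Mascot actually use, and in practice candidates would first be filtered by precursor mass. Function names, the 0.5 tolerance, and the simulated spectrum are illustrative assumptions.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276

def fragment_ions(peptide):
    """Singly charged b and y ion m/z values for every backbone cleavage."""
    b = [sum(RESIDUE_MASS[aa] for aa in peptide[:i]) + PROTON for i in range(1, len(peptide))]
    y = [sum(RESIDUE_MASS[aa] for aa in peptide[-i:]) + WATER + PROTON for i in range(1, len(peptide))]
    return b + y

def shared_peaks(spectrum, peptide, tol=0.5):
    """Number of observed peaks explained by the peptide's predicted ions."""
    predicted = fragment_ions(peptide)
    return sum(any(abs(peak - ion) <= tol for ion in predicted) for peak in spectrum)

def best_match(spectrum, candidates, tol=0.5):
    return max(candidates, key=lambda pep: shared_peaks(spectrum, pep, tol))

spectrum = fragment_ions("DRVYIHPF")  # simulated, noise-free spectrum
for candidate in ("DRVYIHPF", "DRVYLHPF", "GGGGKSSSSR"):
    print(candidate, shared_peaks(spectrum, candidate))
```

DRVYLHPF scores exactly as well as the true peptide, which again shows that leucine and isoleucine cannot be told apart by mass alone.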

  

Figure 22: Strategies for protein identification.


TYPES OF PROTEOMICS AND THEIR APPLICATIONS IN BIOLOGY

1.      Protein Expression Profiling

·         Disease Mechanism

·         Signal Transduction

·         Medical Microbiology 

2.      Post-Translational Modifications

·         Glycosylation  

·         Phosphorylation

·         Proteolysis 

3.      Protein-Protein Interactions

·         Yeast two-hybrid

·         Co-precipitation

·         Phage Display 

4.      Structural Proteomics

·         Organelle Composition

·         Sub-proteome Isolation

·         Protein Complexes 

5.      Functional Proteomics

·         Yeast genomics

·         Affinity Purified Protein Complexes

·         Mouse Knockouts 

6.      Proteome Mining

·         Drug Discovery

·         Target Identification/Validation

·         Differential Display


SIGNIFICANCE OF PROTEOMICS

Many types of information cannot be obtained from the study of genes alone. For example, proteins, not genes, are responsible for the phenotypes of cells. It is impossible to elucidate mechanisms of disease, aging, and effects of the environment solely by studying the genome. Only through the study of proteins can protein modifications be characterized and the targets of drugs identified.

 

References

Graves, P. R., and T. A. J. Haystead. 2002. Molecular biologist's guide to proteomics. Microbiol. Mol. Biol. Rev. 66:39–63.

Abbott, A. 1999. A post-genomic challenge: learning to read patterns of protein synthesis. Nature 402:715–720.

Aebersold, R., B. Rist, and S. P. Gygi. 2000. Quantitative proteome analysis: methods and applications. Ann. N. Y. Acad. Sci. 919:33–47.

Aebersold, R. H., J. Leavitt, R. A. Saavedra, L. E. Hood, and S. B. Kent. 1987. Internal amino acid sequence analysis of proteins separated by one- or two-dimensional gel electrophoresis after in situ protease digestion on nitrocellulose. Proc. Natl. Acad. Sci. USA 84:6970–6974.

Cho, W. C. S. 2016. Proteomics technologies and challenges. Clin. Transl. Med. 5(1):1–10. doi:10.1186/s40169-016-0107-6.

