A sequence-based protein feature survey on Glycoside Hydrolases Family 28 in Aspergillus
Glycoside hydrolase family 28 (GH28) is a large protein family that contributes to pectin hydrolysis. The family is widespread in Aspergillus species, which are known to be suitable for industrial biomass conversion to produce biofuel. Based on their enzymatic specificities, GH28 enzymes are divided into endopolygalacturonases, exopolygalacturonases, endorhamnogalacturonases, exorhamnogalacturonases and xylogalacturonan hydrolases. The CAZy database and PROSITE provide solid sequence models for identifying GH28 proteins from protein sequences, even at genome scale. So far, however, no detailed classification of GH28 members by enzyme activity is available. In this study, we carried out a sequence analysis of the GH28 enzymes enriched in Aspergillus species and aligned the sequences near the known enzymatic active sites. The phylogenetic tree generated from the active-site alignment revealed that GH28 enzymes cluster into distinct clades according to their specific enzyme activities. For each specificity group, unique conserved sequence features could be detected. Among all groups, the endorhamnogalacturonases were the most distinct within GH28, because they have an active site, identified by similarity, that is located outside the active-site cleft. This indicates that substrate specificity is strongly affected by the active-site structure and therefore by the amino acid sequence of the protein. For each group, the manually curated active-site alignments were used to generate Hidden Markov Models. These models can serve as a method to classify newly identified (putative) Aspergillus GH28 enzymes by function. Moreover, this report points out possible mutation sites, and the effects of mutations, that might alter the enzymatic activities of GH28 members, and can therefore help guide the choice of future experimental targets.