GH6, GH67 and CE4 acetyl xylan esterases were only relevant for prediction with the eSVMfPFAM classifier. Additionally, both models identified protein domains not commonly associated with plant biomass degradation as relevant for assignment, such as the lipoproteins DUF4352 and PF00877 and the binding domains PF10509 and PF03793.

Distinctive CAZy families of microbial plant biomass degraders

We used our method to search for CAZy families that distinguish microbial plant biomass degraders. CAZy families include glycoside hydrolases (GH), carbohydrate-binding modules (CBM), glycosyltransferases (GT), polysaccharide lyases (PL) and carbohydrate esterases (CE). The CAZy database annotations covered 64 genomes of non-lignocellulose-degrading species and 16 genomes of lignocellulose degraders.
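A genome's CAZy annotation can be turned into a feature vector either as presence/absence of each family or as relative family frequencies, optionally restricted to certain CAZy classes. The sketch below is illustrative only, not the authors' pipeline: the function names `cazy_class` and `encode`, the prefix-parsing rule, and normalising frequencies over the retained families are all assumptions.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def cazy_class(family: str) -> str:
    """Return the alphabetic class prefix of a family label, e.g. 'GH5' -> 'GH', 'CBM3' -> 'CBM'."""
    prefix = ""
    for ch in family:
        if not ch.isalpha():
            break
        prefix += ch
    return prefix


def encode(
    counts: Dict[str, int],
    vocabulary: List[str],
    mode: str = "presence",
    keep_classes: Optional[Iterable[str]] = None,
) -> Tuple[List[str], List[float]]:
    """Encode one genome's CAZy family counts as a fixed-length vector (hypothetical helper).

    mode="presence"  -> 1.0 if the family occurs at least once, else 0.0
    mode="frequency" -> count / total count over the retained families (assumed normalisation)
    keep_classes     -> restrict features to these classes, e.g. {"GH", "CBM"}
    """
    keep = None if keep_classes is None else set(keep_classes)
    vocab = [f for f in vocabulary if keep is None or cazy_class(f) in keep]
    if mode == "presence":
        return vocab, [1.0 if counts.get(f, 0) > 0 else 0.0 for f in vocab]
    if mode == "frequency":
        total = sum(counts.get(f, 0) for f in vocab)
        return vocab, [counts.get(f, 0) / total if total else 0.0 for f in vocab]
    raise ValueError(f"unknown mode: {mode}")


# Toy input (invented counts, not data from the study).
counts = {"GH5": 3, "GH6": 1, "CBM3": 4, "GT2": 2}
vocabulary = ["CBM3", "GH5", "GH6", "GH48", "GT2"]
print(encode(counts, vocabulary, "presence"))
print(encode(counts, vocabulary, "frequency", keep_classes=["GH", "CBM"]))
```

Restricting `keep_classes` mirrors the idea of dropping GT families or keeping only GH and CBM families when some samples lack those annotations; the same vocabulary must be used for every genome so that vectors are comparable.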

No CAZy annotations were available for the remaining genomes. In addition, we included the metagenomes of the gut microbiomes of the Tammar wallaby (TW) and a wood-degrading higher termite, as well as the cow rumen microbiome. We evaluated how informative the presence or absence of CAZy families, or their relative frequencies, is for identifying lignocellulose-degrading microbial genomes in the following experiments:

1. By training the classifiers eSVMCAZY A and eSVMCAZY a on genome annotations with all CAZy families.
2. By training the classifiers eSVMCAZY B and eSVMCAZY b on the annotations of the genomes and the TW sample with all CAZy families except the GT families, which were not annotated for the TW sample.
3. By training the classifiers eSVMCAZY C and eSVMCAZY c on the entire data set using only GH family and CBM annotations, as these were the only annotations available for all three metagenomes.

The macro accuracy of these classifiers ranged from 0.87 to 0.96, similar to that of the Pfam domain-based models. Notably, the genomes misclassified by the eSVMCAZY classifiers were almost exclusively Actinobacteria, the only exception being the firmicute Caldicellulosiruptor saccharolyticus. The best classification results were obtained with presence/absence information for all CAZy families except the GT families, using the microbial genomes and the TW sample. In this setting, only two species were misclassified, and these two remained misclassified by all six classifiers. Using feature selection, we determined the CAZy families from the six eSVMCAZY classifiers that are most relevant for identifying microbial cellulose degraders.
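Macro accuracy is useful here because the classes are imbalanced (far fewer degraders than non-degraders). The sketch below assumes macro accuracy is the mean of the per-class recalls (for two classes, the average of sensitivity and specificity); the text does not spell out the formula, so treat this definition and the function name `macro_accuracy` as assumptions.

```python
from typing import List, Sequence


def macro_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of per-class recalls (assumed definition of macro accuracy)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    recalls: List[float] = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


def plain_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all samples that are predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels (invented): 1 = degrader, 0 = non-degrader.
# Two of 16 degraders are predicted as non-degraders; all 64 non-degraders are correct.
y_true = [1] * 16 + [0] * 64
y_pred = [0, 0] + [1] * 14 + [0] * 64
print(macro_accuracy(y_true, y_pred))   # mean of 14/16 and 64/64
print(plain_accuracy(y_true, y_pred))   # 78/80
```

With these toy labels, plain accuracy (0.975) hides the errors on the small degrader class, whereas the macro average (0.9375) weights both classes equally.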

Many of these GH families and CBMs are present in all genomes. The analysis also identified further gene families known to be relevant for plant biomass degradation, among them cellulase-containing families, hemicellulase-containing families, families with known oligosaccharide- or side-chain-degrading activities, and several CBMs. Several of these were consistently identified by at least half of the six classifiers as distinctive for plant biomass degraders.
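One way to express "identified by at least half of the six classifiers" is to select a few top-ranked features per classifier and keep those selected by a sufficient fraction of classifiers. The sketch below is a hypothetical illustration: ranking by the largest positive linear-model weight, the `top_features` and `consensus` helpers, and the placeholder family names are assumptions, not the feature selection procedure used in the study.

```python
from collections import Counter
from typing import Dict, List, Sequence


def top_features(weights: Dict[str, float], k: int) -> List[str]:
    """Return the k features with the largest positive weights (hypothetical selection rule)."""
    positive = [(w, f) for f, w in weights.items() if w > 0]
    positive.sort(key=lambda t: (-t[0], t[1]))  # highest weight first, ties by name
    return [f for _, f in positive[:k]]


def consensus(selections: Sequence[Sequence[str]], min_fraction: float = 0.5) -> List[str]:
    """Features selected by at least min_fraction of the classifiers, sorted by name."""
    n = len(selections)
    counts = Counter(f for sel in selections for f in set(sel))
    return sorted(f for f, c in counts.items() if c >= min_fraction * n)


# Toy weights for six classifiers; placeholder names, not real CAZy results.
weights_per_model = [
    {"GH_a": 0.9, "GH_b": 0.4, "CBM_a": 0.7, "GT_a": -0.2},
    {"GH_a": 0.8, "CBM_a": 0.1, "GH_b": -0.3},
    {"GH_b": 0.6, "GH_a": 0.5},
    {"CBM_a": 0.9, "GH_b": 0.2},
    {"GH_a": 0.3, "GT_a": 0.1},
    {"GT_a": 0.4, "GH_b": -0.1},
]
selections = [top_features(w, k=2) for w in weights_per_model]
print(consensus(selections))  # features chosen by at least 3 of the 6 classifiers
```

Counting each feature at most once per classifier (via `set`) keeps the vote a simple "how many classifiers picked it" tally.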
