Fig. 4 | BioData Mining

From: LightCUD: a program for diagnosing IBD based on human gut microbiome data

The pipeline of data processing and construction of the LightCUD program. Starting from raw WGS data of 349 samples, we removed low-quality reads and assembled the remaining reads into contigs. Contigs longer than 1000 bp were taxonomically binned into strains and genera. The 16S rRNA-based discrimination modules were constructed from genus-level profiles, and the WGS-based discrimination modules from strain-level profiles. For each of the four modules, we designed a separate feature selection procedure and compared several machine learning algorithms. LightGBM was selected as the core algorithm for module construction because it performed best. For the WGS-based modules, we further optimized the model by shrinking the feature set through pre-training. Finally, we constructed LightCUD, a high-performance, dual-usage discrimination program, and released the corresponding reference databases along with the prediction modules.
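The contig length cutoff in the first step of the pipeline can be illustrated with a short Python sketch. It is a minimal example only, assuming the assembled contigs are stored in FASTA format; the function names `parse_fasta` and `filter_contigs` are hypothetical and are not part of LightCUD. The assembler, the read quality filter and the taxonomic binning tools used in the study are not shown.

```python
"""Minimal sketch (hypothetical helpers): keep only contigs longer than 1000 bp."""
from typing import Dict, Iterable, Iterator, Tuple

MIN_CONTIG_LEN = 1000  # cutoff from the caption: contigs > 1000 bp were binned


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining multi-line sequences."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # emit the final record


def filter_contigs(lines: Iterable[str], min_len: int = MIN_CONTIG_LEN) -> Dict[str, str]:
    """Return contigs strictly longer than min_len bases, keyed by header."""
    return {h: s for h, s in parse_fasta(lines) if len(s) > min_len}


if __name__ == "__main__":
    demo = [">contig_1", "A" * 1500,
            ">contig_2", "C" * 800,
            ">contig_3", "G" * 600, "T" * 600]  # 1200 bp split over two lines
    print(sorted(filter_contigs(demo)))  # ['contig_1', 'contig_3']
```

The comparison is strict (`>`), matching the "> 1000 bp" wording, so a contig of exactly 1000 bp is discarded. In practice the input would be read from the assembler's output file, e.g. `filter_contigs(open("contigs.fasta"))`, where the file name is likewise an assumption.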