
Table 2 A partial list of P. jirovecii full-length msg isoforms (~3 kb) identified in a clinical sample by PacBio sequencing and clustering-based analysis

From: Distinguishing highly similar gene isoforms with a clustering-based bioinformatics analysis of PacBio single-molecule long reads

| Contig no. | Length (bp) | Matched msg gene | Status | Identity to the known (%) | PCR verification |
|---|---|---|---|---|---|
| Contig0020 | 3086 | EF371022 | Known | 98.92 | ND |
| Contig0007 | 3086 | EF371023 | Known | 99.87 | ND |
| Contig0021 | 3008 | EF371024 | Known | 99.83 | ND |
| Contig0025 | 3011 | EF371025 | Known | 99.60 | ND |
| Contig0133 | 3068 | EF371026 | Known | 99.71 | ND |
| Contig0022 | 3104 | EF371028 | Known | 99.58 | ND |
| Contig0006 | 3062 | EF371029 | Known | 99.97 | ND |
| Contig0008 | 3041 | EF371030 | Known | 99.87 | ND |
| Contig0026 | 3032 | EF371031 | Known | 98.76 | ND |
| Contig0012 | 3005 | EF371032 | Known | 99.70 | ND |
| Contig0013 | 2996 | EF371033 | Known | 99.87 | ND |
| Contig0011 | 3092 | EF371035 | Known | 99.88 | ND |
| Contig0003 | 3038 | EF371036 | Known | 99.97 | ND |
| Contig0005 | 3002 | EF371038 | Known | 99.90 | ND |
| Contig0010 | 2996 | EF371040 | Known | 99.80 | ND |
| Contig0015 | 3129 | EF371041 | Known | 99.78 | ND |
| Contig0001 | 3002 | EF371042 | Known | 99.37 | ND |
| Contig0027 | 3065 | EF371045 | Known | 99.64 | ND |
| Contig0014 | 3077 | EF371050 | Known | 99.71 | ND |
| Contig0018 | 3044 | EF371051 | Known | 99.70 | ND |
| Contig0009 | 3029 | EF371052 | Known | 99.87 | ND |
| Contig0017 | 3023 | EF371053 | Known | 99.50 | ND |
| Contig0016 | 3050 | EF371055 | Known | 99.84 | ND |
| Contig0004 | 3026 | EF371056 | Known | 99.97 | ND |
| Contig0010b | 3060 | No | Novel | NA | 99.53 |
| Contig0004b | 3086 | No | Novel | NA | 99.45 |
| Contig0015b | 3039 | No | Novel | NA | 99.20 |
| Contig0053b | 3077 | No | Novel | NA | 98.91 |
| Contig0006b | 3062 | No | Novel | NA | 98.50 |
| Contig0054b | 3074 | No | Novel | NA | 98.32 |
| Contig0138b | 3041 | No | Novel | NA | ND |

  1. A total of 72 unique msg isoforms were identified in this study; only 31 of them are shown in this table. The first 24 contigs matched, over their full length, the 24 msg genes previously identified from the same clinical sample [22]; the GenBank accession numbers of these genes are given in the third column. NA, not applicable; ND, not determined by PCR. Additional file 4 contains the complete list of sequences for the 72 Msg isoforms.