Table 2 Results obtained by our pan-cancer signature on 57 gene expression datasets

From: Towards a potential pan-cancer prognostic signature for gene expression based on probesets and ensemble machine learning

 

|  | dataset name | cancer type | MCC | F1 score | accuracy | TPR | TNR | PPV | NPV | PR AUC | ROC AUC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | dataHeaton2011 | adrenocortical cancer | +0.082 | 0.401 | 0.404 | 0.852 | 0.245 | 0.318 | 0.813 | 0.617 | 0.631 |
| 2 | dataReister2012 | bladder cancer | +0.061 | 0.557 | 0.471 | 0.844 | 0.204 | 0.428 | 0.669 | 0.501 | 0.541 |
| 3 | dataHatzis2009 | breast cancer | +0.252 | 0.414 | 0.723 | 0.466 | 0.801 | 0.406 | 0.839 | 0.395 | 0.697 |
| 4 | dataYenamandra2015 | breast cancer | +0.219 | 0.509 | 0.568 | 0.738 | 0.490 | 0.417 | 0.798 | 0.511 | 0.648 |
| 5 | dataJezequel2015 | breast cancer | +0.189 | 0.799 | 0.703 | 0.854 | 0.308 | 0.759 | 0.479 | 0.835 | 0.654 |
| 6 | dataSchmidt2008 | breast cancer | +0.162 | 0.347 | 0.691 | 0.381 | 0.784 | 0.353 | 0.811 | 0.352 | 0.643 |
| 7 | dataMiller2013 | breast cancer | +0.151 | 0.277 | 0.634 | 0.561 | 0.650 | 0.207 | 0.908 | 0.267 | 0.640 |
| 8 | dataSinn2019 | breast cancer | +0.148 | 0.898 | 0.822 | 0.923 | 0.204 | 0.876 | 0.304 | 0.903 | 0.621 |
| 9 | dataDesmedt2007 | breast cancer | +0.129 | 0.424 | 0.520 | 0.681 | 0.462 | 0.317 | 0.802 | 0.360 | 0.597 |
| 10 | dataKarn2011 | breast cancer | +0.104 | 0.313 | 0.449 | 0.227 | 0.871 | 0.761 | 0.371 | 0.740 | 0.578 |
| 11 | dataLoi2008 | breast cancer | +0.093 | 0.253 | 0.631 | 0.433 | 0.683 | 0.235 | 0.849 | 0.311 | 0.610 |
| 12 | dataIvshina2006 | breast cancer | +0.070 | 0.669 | 0.571 | 0.740 | 0.321 | 0.624 | 0.456 | 0.675 | 0.563 |
| 13 | dataSabatier2010 | breast cancer | +0.028 | 0.248 | 0.613 | 0.210 | 0.815 | 0.357 | 0.673 | 0.366 | 0.527 |
| 14 | dataHuang2014 | breast cancer | +0.004 | 0.644 | 0.540 | 0.575 | 0.420 | 0.789 | 0.221 | 0.816 | 0.505 |
| 15 | dataDedeurwaerder2011 | breast cancer | –0.009 | 0.877 | 0.788 | 0.878 | 0.117 | 0.881 | 0.104 | 0.897 | 0.527 |
| 16 | dataWang2010 | breast cancer | –0.009 | 0.293 | 0.536 | 0.395 | 0.596 | 0.268 | 0.725 | 0.299 | 0.487 |
| 17 | dataLin2009 | breast cancer | –0.032 | 0.347 | 0.348 | 0.298 | 0.621 | 0.695 | 0.239 | 0.795 | 0.522 |
| 18 | dataKim2020 | breast cancer | –0.036 | 0.891 | 0.811 | 0.879 | 0.072 | 0.915 | 0.029 | 0.898 | 0.375 |
| 19 | dataMetzgerFilho2016 | breast cancer | –0.055 | 0.832 | 0.723 | 0.839 | 0.094 | 0.835 | 0.109 | 0.851 | 0.464 |
| 20 | dataSieber2010 | colorectal cancer | +0.384 | 0.800 | 0.725 | 0.729 | 0.711 | 0.895 | 0.442 | 0.925 | 0.801 |
| 21 | dataChen2020 | colorectal cancer | +0.374 | 0.482 | 0.766 | 0.406 | 0.912 | 0.656 | 0.795 | 0.609 | 0.776 |
| 22 | dataSmith2009b | colorectal cancer | +0.272 | 0.684 | 0.645 | 0.664 | 0.621 | 0.751 | 0.513 | 0.792 | 0.678 |
| 23 | dataShinto2020 | colorectal cancer | +0.240 | 0.356 | 0.696 | 0.540 | 0.739 | 0.339 | 0.887 | 0.492 | 0.723 |
| 24 | dataSmith2009a | colorectal cancer | +0.225 | 0.666 | 0.620 | 0.672 | 0.549 | 0.673 | 0.557 | 0.708 | 0.650 |
| 25 | dataBeauchamp2014 | colorectal cancer | +0.213 | 0.801 | 0.708 | 0.798 | 0.408 | 0.814 | 0.414 | 0.842 | 0.641 |
| 26 | dataMarisa2013 | colorectal cancer | +0.081 | 0.251 | 0.645 | 0.183 | 0.876 | 0.431 | 0.682 | 0.404 | 0.573 |
| 27 | dataGotoh2018 | colorectal cancer | +0.055 | 0.085 | 0.808 | 0.205 | 0.863 | 0.098 | 0.937 | 0.355 | 0.789 |
| 28 | dataStaub2009 | colorectal cancer | +0.002 | 0.863 | 0.769 | 0.874 | 0.142 | 0.863 | 0.116 | 0.849 | 0.442 |
| 29 | dataDelRoi2017 | colorectal cancer | –0.007 | 0.677 | 0.564 | 0.662 | 0.328 | 0.721 | 0.275 | 0.750 | 0.502 |
| 30 | dataSpivak2014 | leukemia | +0.325 | 0.572 | 0.632 | 0.525 | 0.847 | 0.864 | 0.524 | 0.873 | 0.788 |
| 31 | dataHerold2013 | leukemia | +0.235 | 0.474 | 0.610 | 0.674 | 0.590 | 0.375 | 0.837 | 0.460 | 0.682 |
| 32 | dataMetzeler2018 | leukemia | +0.136 | 0.309 | 0.632 | 0.240 | 0.866 | 0.518 | 0.662 | 0.535 | 0.685 |
| 33 | dataHerold2011 | leukemia | –0.001 | 0.187 | 0.537 | 0.189 | 0.808 | 0.363 | 0.602 | 0.479 | 0.526 |
| 34 | dataPotti2006 | lung cancer | +0.382 | 0.752 | 0.688 | 0.652 | 0.779 | 0.898 | 0.442 | 0.927 | 0.780 |
| 35 | dataKohno2011 | lung cancer | +0.369 | 0.862 | 0.785 | 0.813 | 0.626 | 0.925 | 0.391 | 0.954 | 0.801 |
| 36 | dataRousseaux2013 | lung cancer | +0.325 | 0.521 | 0.711 | 0.514 | 0.803 | 0.553 | 0.783 | 0.606 | 0.679 |
| 37 | dataSon2007 | lung cancer | +0.233 | 0.615 | 0.605 | 0.666 | 0.565 | 0.593 | 0.643 | 0.634 | 0.665 |
| 38 | dataBild2005 | lung cancer | +0.153 | 0.525 | 0.565 | 0.525 | 0.627 | 0.568 | 0.586 | 0.596 | 0.640 |
| 39 | dataBotling2012 | lung cancer | +0.129 | 0.418 | 0.522 | 0.674 | 0.469 | 0.310 | 0.806 | 0.415 | 0.613 |
| 40 | dataHeiskanen2015 | lung cancer | +0.127 | 0.622 | 0.535 | 0.827 | 0.281 | 0.508 | 0.647 | 0.584 | 0.615 |
| 41 | dataPintilie2013 | lung cancer | +0.070 | 0.669 | 0.571 | 0.740 | 0.321 | 0.624 | 0.456 | 0.675 | 0.563 |
| 42 | dataZChen2020 | lung cancer | +0.061 | 0.586 | 0.538 | 0.627 | 0.431 | 0.571 | 0.494 | 0.585 | 0.542 |
| 43 | dataTsao2010 | lung cancer | +0.055 | 0.615 | 0.519 | 0.750 | 0.289 | 0.541 | 0.539 | 0.572 | 0.531 |
| 44 | dataPhilipsen2010 | lung cancer | +0.045 | 0.375 | 0.547 | 0.386 | 0.652 | 0.434 | 0.621 | 0.491 | 0.545 |
| 45 | dataXie2011 | lung cancer | –0.016 | 0.548 | 0.496 | 0.508 | 0.480 | 0.646 | 0.334 | 0.689 | 0.491 |
| 46 | dataMicke2011\(^a\) | lung cancer | –0.059 | 0.436 | 0.460 | 0.488 | 0.459 | 0.433 | 0.500 | 0.487 | 0.464 |
| 47 | dataVanLoo2009 | lymphoma | +0.370 | 0.679 | 0.673 | 0.691 | 0.754 | 0.825 | 0.584 | 0.949 | 0.890 |
| 48 | dataLenz2008 | lymphoma | +0.327 | 0.755 | 0.685 | 0.805 | 0.506 | 0.714 | 0.630 | 0.766 | 0.723 |
| 49 | dataIqbal2015 | lymphoma | +0.168 | 0.525 | 0.565 | 0.646 | 0.520 | 0.473 | 0.698 | 0.537 | 0.623 |
| 50 | dataKawaguchi2012 | lymphoma | +0.015 | 0.215 | 0.606 | 0.251 | 0.762 | 0.291 | 0.706 | 0.458 | 0.550 |
| 51 | dataLeich2009\(^a\) | lymphoma | –0.014 | 0.458 | 0.481 | 0.466 | 0.520 | 0.486 | 0.500 | 0.503 | 0.494 |
| 52 | dataShi2010 | multiple myeloma | +0.148 | 0.680 | 0.601 | 0.640 | 0.514 | 0.745 | 0.399 | 0.755 | 0.609 |
| 53 | dataMulligan2007 | multiple myeloma | +0.033 | 0.189 | 0.423 | 0.120 | 0.901 | 0.671 | 0.393 | 0.650 | 0.529 |
| 54 | dataHiyama2009 | neuroblastoma | +0.213 | 0.869 | 0.785 | 0.964 | 0.220 | 0.803 | 0.624 | 0.928 | 0.776 |
| 55 | dataUehara2015 | ovarian cancer | –0.070 | 0.800 | 0.678 | 0.752 | 0.116 | 0.887 | 0.032 | 0.889 | 0.412 |
| 56 | dataBogunovic2009 | skin cancer | +0.328 | 0.673 | 0.637 | 0.766 | 0.543 | 0.665 | 0.694 | 0.735 | 0.712 |
| 57 | dataPasini2021 | stomach cancer | +0.385 | 0.469 | 0.750 | 0.500 | 0.959 | 0.843 | 0.772 | 0.986 | 0.974 |
|  | average | breast cancer | +0.083 | 0.531 | 0.628 | 0.593 | 0.489 | 0.570 | 0.513 | 0.604 | 0.568 |
|  | average | colorectal cancer | +0.184 | 0.567 | 0.695 | 0.573 | 0.615 | 0.624 | 0.562 | 0.673 | 0.658 |
|  | average | leukemia | +0.176 | 0.422 | 0.621 | 0.440 | 0.745 | 0.549 | 0.637 | 0.604 | 0.668 |
|  | average | lung cancer | +0.144 | 0.580 | 0.580 | 0.628 | 0.522 | 0.585 | 0.557 | 0.632 | 0.610 |
|  | average | lymphoma | +0.173 | 0.526 | 0.602 | 0.572 | 0.612 | 0.558 | 0.624 | 0.643 | 0.656 |
|  | average | multiple myeloma | +0.091 | 0.435 | 0.512 | 0.380 | 0.708 | 0.708 | 0.396 | 0.703 | 0.569 |
|  | % sufficient scores | all datasets | 33.33% | 45.45% | 61.82% | 58.18% | 21.82% | 25.45% | 50.91% | 58.18% | 58.18% |
|  | average | all datasets | +0.138 | 0.545 | 0.620 | 0.595 | 0.546 | 0.593 | 0.556 | 0.646 | 0.619 |
|  | median | all datasets | +0.129 | 0.548 | 0.620 | 0.652 | 0.549 | 0.624 | 0.586 | 0.634 | 0.615 |
|  | min | all datasets | –0.070 | 0.085 | 0.348 | 0.120 | 0.072 | 0.098 | 0.029 | 0.267 | 0.375 |
|  | max | all datasets | +0.385 | 0.898 | 0.822 | 0.964 | 0.959 | 0.925 | 0.937 | 0.986 | 0.974 |

Results obtained by the Random Forests machine learning method applied to each of the 57 prognostic gene expression cancer datasets to predict the survival or death of the patients, sorted by cancer type and then by Matthews correlation coefficient. We highlighted in bold all the sufficient scores: \(MCC \ge +0.2\), and F\(_1\) score, accuracy, TPR, TNR, PPV, NPV, PR AUC, ROC AUC \(\ge 0.6\). We marked with \(^a\) the only two datasets for which all the binary classification metrics are insufficient: dataLeich2009 and dataMicke2011. Confusion matrix threshold cut-off for MCC, F\(_1\) score, accuracy, TPR, TNR, PPV, and NPV: 0.5. MCC: Matthews correlation coefficient. TPR: true positive rate, sensitivity. TNR: true negative rate, specificity. PPV: positive predictive value, precision. NPV: negative predictive value. PR: precision-recall curve. ROC: receiver operating characteristic curve. AUC: area under the curve. MCC has worst value –1 and best value +1. F\(_1\) score, accuracy, TPR, TNR, PPV, NPV, PR AUC, and ROC AUC have worst value 0 and best value 1. The formulas of MCC, F\(_1\) score, accuracy, TPR, TNR, PPV, NPV, PR AUC, and ROC AUC can be found in the Supplementary information. % sufficient scores: percentage of datasets on which the signature achieved a sufficient score (for example, our signature obtained a sufficient accuracy score on 61.82% of the datasets). We report additional information about these datasets in Table 1.
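
As a reading aid for the table note, the following minimal sketch shows how the threshold-based metrics and the "% sufficient scores" row could be computed. It assumes the standard textbook definitions of the confusion matrix metrics, since the paper's own formulas are in the Supplementary information and are not reproduced here. The function names and the example counts are hypothetical; only the sufficiency thresholds (MCC at least +0.2, the other metrics at least 0.6) and the five lymphoma MCC values come from the table.

```python
from math import sqrt

# Sufficiency thresholds as stated in the table note.
MCC_THRESHOLD = 0.2
OTHER_THRESHOLD = 0.6


def _ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero (an assumed convention)."""
    return numerator / denominator if denominator else 0.0


def confusion_matrix_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion matrix counts."""
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "MCC": _ratio(tp * tn - fp * fn, mcc_denominator),
        "F1 score": _ratio(2 * tp, 2 * tp + fp + fn),
        "accuracy": _ratio(tp + tn, tp + fn + tn + fp),
        "TPR": _ratio(tp, tp + fn),  # sensitivity
        "TNR": _ratio(tn, tn + fp),  # specificity
        "PPV": _ratio(tp, tp + fp),  # precision
        "NPV": _ratio(tn, tn + fn),
    }


def is_sufficient(metric_name, value):
    """Apply the table's sufficiency rule to a single score."""
    threshold = MCC_THRESHOLD if metric_name == "MCC" else OTHER_THRESHOLD
    return value >= threshold


def percent_sufficient(scores, threshold):
    """Percentage of scores that reach the given threshold."""
    return 100 * sum(score >= threshold for score in scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the formulas.
    metrics = confusion_matrix_metrics(tp=40, fn=10, tn=30, fp=20)
    for name, value in metrics.items():
        print(f"{name}: {value:+.3f} sufficient={is_sufficient(name, value)}")

    # MCC values of the five lymphoma datasets (rows 47 to 51 of the table).
    lymphoma_mcc = [0.370, 0.327, 0.168, 0.015, -0.014]
    print(percent_sufficient(lymphoma_mcc, MCC_THRESHOLD))  # 40.0
```

PR AUC and ROC AUC are left out of the sketch because they are computed from the ranked prediction scores rather than from a single confusion matrix at the 0.5 cut-off.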