Table 1 List of gene expression datasets employed in our analysis, sorted by cancer type

From: Towards a potential pan-cancer prognostic signature for gene expression based on probesets and ensemble machine learning

 

| | dataset name | GEO code | cancer type | neg# | pos# | samples# | neg% | pos% |
|---|---|---|---|---|---|---|---|---|
| 1 | dataHeaton2011 | GSE33371 | adrenocortical cancer | 16 | 7 | 23 | 69.57 | 30.43 |
| 2 | dataReister2012 | GSE31684 | bladder cancer | 38 | 27 | 65 | 58.46 | 41.54 |
| 3 | dataDedeurwaerder2011 | GSE20711 | breast cancer | 63 | 25 | 88 | 71.59 | 28.41 |
| 4 | dataDesmedt2007 | GSE7390 | breast cancer | 141 | 56 | 197 | 71.57 | 28.43 |
| 5 | dataHatzis2009 | GSE25066 | breast cancer | 152 | 45 | 197 | 77.16 | 22.84 |
| 6 | dataHuang2014 | GSE48390 | breast cancer | 11 | 69 | 80 | 13.75 | 86.25 |
| 7 | dataIvshina2006 | GSE4922 | breast cancer | 160 | 89 | 249 | 64.26 | 35.74 |
| 8 | dataJezequel2015 | GSE58812 | breast cancer | 29 | 77 | 106 | 27.36 | 72.64 |
| 9 | dataKarn2011 | GSE31519 | breast cancer | 22 | 41 | 63 | 34.92 | 65.08 |
| 10 | dataKim2020 | GSE135565 | breast cancer | 7 | 76 | 83 | 8.43 | 91.57 |
| 11 | dataLin2009 | GSE19697 | breast cancer | 6 | 17 | 23 | 26.09 | 73.91 |
| 12 | dataLoi2008 | GSE9195 | breast cancer | 63 | 13 | 76 | 82.89 | 17.11 |
| 13 | dataMetzgerFilho2016 | GSE88770 | breast cancer | 19 | 97 | 116 | 16.38 | 83.62 |
| 14 | dataMiller2013 | GSE45255 | breast cancer | 116 | 18 | 134 | 86.57 | 13.43 |
| 15 | dataSabatier2010 | GSE21653 | breast cancer | 168 | 83 | 251 | 66.93 | 33.07 |
| 16 | dataSchmidt2008 | GSE11121 | breast cancer | 154 | 45 | 199 | 77.39 | 22.61 |
| 17 | dataSinn2019 | GSE124647 | breast cancer | 43 | 96 | 139 | 30.94 | 69.06 |
| 18 | dataWang2010 | GSE19615 | breast cancer | 14 | 100 | 114 | 12.28 | 87.72 |
| 19 | dataYenamandra2015 | GSE61304 | breast cancer | 38 | 20 | 58 | 65.52 | 34.48 |
| 20 | dataBeauchamp2014 | GSE38832 | colorectal cancer | 28 | 93 | 121 | 23.14 | 76.86 |
| 21 | dataChen2020 | GSE161158 | colorectal cancer | 145 | 59 | 204 | 71.08 | 28.92 |
| 22 | dataDelRoi2017 | GSE72970 | colorectal cancer | 32 | 91 | 123 | 26.02 | 73.98 |
| 23 | dataGotoh2018 | GSE92921 | colorectal cancer | 53 | 5 | 58 | 91.38 | 8.62 |
| 24 | dataMarisa2013 | GSE39582 | colorectal cancer | 384 | 194 | 578 | 66.44 | 33.56 |
| 25 | dataShinto2020 | GSE143985 | colorectal cancer | 75 | 15 | 90 | 83.33 | 16.67 |
| 26 | dataSieber2010 | GSE14333 | colorectal cancer | 50 | 176 | 226 | 22.12 | 77.88 |
| 27 | dataSmith2009a | GSE17536 | colorectal cancer | 73 | 103 | 176 | 41.48 | 58.52 |
| 28 | dataSmith2009b | GSE17537 | colorectal cancer | 20 | 34 | 54 | 37.04 | 62.96 |
| 29 | dataStaub2009 | GSE12945 | colorectal cancer | 12 | 49 | 61 | 19.67 | 80.33 |
| 30 | dataHerold2011 | GSE22762 | leukemia | 26 | 17 | 43 | 60.47 | 39.53 |
| 31 | dataHerold2013 | GSE37642 | leukemia | 307 | 109 | 416 | 73.80 | 26.20 |
| 32 | dataMetzeler2018 | GSE12417 | leukemia | 103 | 59 | 162 | 63.58 | 36.42 |
| 33 | dataSpivak2014 | GSE47018 | leukemia | 7 | 13 | 20 | 35.00 | 65.00 |
| 34 | dataBild2005 | GSE3141 | lung cancer | 57 | 53 | 110 | 51.82 | 48.18 |
| 35 | dataBotling2012 | GSE37745 | lung cancer | 144 | 51 | 195 | 73.85 | 26.15 |
| 36 | dataHeiskanen2015 | GSE68465 | lung cancer | 236 | 207 | 443 | 53.27 | 46.73 |
| 37 | dataKohno2011 | GSE31210 | lung cancer | 35 | 191 | 226 | 15.49 | 84.51 |
| 38 | dataMicke2011 | GSE28571 | lung cancer | 52 | 47 | 99 | 52.53 | 47.47 |
| 39 | dataPhilipsen2010 | GSE19188 | lung cancer | 49 | 32 | 81 | 60.49 | 39.51 |
| 40 | dataPintilie2013 | GSE50081 | lung cancer | 75 | 105 | 180 | 41.67 | 58.33 |
| 41 | dataPotti2006 | GSE3593 | lung cancer | 54 | 143 | 197 | 27.41 | 72.59 |
| 42 | dataRousseaux2013 | GSE30219 | lung cancer | 199 | 93 | 292 | 68.15 | 31.85 |
| 43 | dataSon2007 | GSE8894 | lung cancer | 68 | 69 | 137 | 49.64 | 50.36 |
| 44 | dataTsao2010 | GSE14814 | lung cancer | 60 | 72 | 132 | 45.45 | 54.55 |
| 45 | dataXie2011 | GSE29013 | lung cancer | 18 | 36 | 54 | 33.33 | 66.67 |
| 46 | dataZChen2020 | GSE157011 | lung cancer | 219 | 264 | 483 | 45.34 | 54.66 |
| 47 | dataIqbal2015 | GSE58445 | lymphoma | 76 | 50 | 126 | 60.32 | 39.68 |
| 48 | dataKawaguchi2012 | GSE34771 | lymphoma | 23 | 10 | 33 | 69.70 | 30.30 |
| 49 | dataLeich2009 | GSE16131 | lymphoma | 91 | 88 | 179 | 50.84 | 49.16 |
| 50 | dataLenz2008 | GSE10846 | lymphoma | 165 | 249 | 414 | 39.86 | 60.14 |
| 51 | dataVanLoo2009 | GSE7788 | lymphoma | 6 | 9 | 15 | 40.00 | 60.00 |
| 52 | dataMulligan2007 | GSE9782 | multiple myeloma | 103 | 160 | 263 | 39.16 | 60.84 |
| 53 | dataShi2010 | GSE24080 | multiple myeloma | 78 | 480 | 558 | 13.98 | 86.02 |
| 54 | dataHiyama2009 | GSE16237 | neuroblastoma | 11 | 39 | 50 | 22.00 | 78.00 |
| 55 | dataUehara2015 | GSE65986 | ovarian cancer | 6 | 48 | 54 | 11.11 | 88.89 |
| 56 | dataBogunovic2009 | GSE19234 | skin cancer | 20 | 23 | 43 | 46.51 | 53.49 |
| 57 | dataPasini2021 | GSE38749 | stomach cancer | 9 | 5 | 14 | 64.29 | 35.71 |
| | average | | | 77.70 | 79.68 | 157.39 | 48.29 | 51.71 |
| | median | | | 53 | 56 | 121 | 49.64 | 50.36 |
| | minimum | | | 6 | 5 | 14 | 8.43 | 8.62 |
| | maximum | | | 384 | 480 | 578 | 91.38 | 91.57 |

All these datasets are based on the GPL96, GPL97, or GPL570 Affymetrix platforms and were downloaded from Gene Expression Omnibus (GEO) in April and May 2021. Positive sample: a patient diagnosed with cancer who survived. Negative sample: a patient diagnosed with cancer who died. pos#: number of positive samples in the dataset. neg#: number of negative samples in the dataset. samples#: total number of samples in the dataset (neg# plus pos#). pos%: percentage of positive samples in the dataset. neg%: percentage of negative samples in the dataset. These prognostic datasets cover 12 different cancer types: 17 breast cancer datasets, 13 lung cancer datasets, 10 colorectal cancer datasets, 5 lymphoma datasets, 4 leukemia datasets, 2 multiple myeloma datasets, and 1 dataset each for adrenocortical cancer, bladder cancer, neuroblastoma, ovarian cancer, skin cancer, and stomach cancer.
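As an illustration of how the derived columns relate to the counts, the short Python sketch below recomputes samples#, neg%, and pos% for three rows copied from Table 1. The function name `class_balance` and the rounding to two decimals are assumptions made for this example; they are not taken from the authors' code.

```python
# Three rows copied from Table 1: (dataset name, neg#, pos#).
rows = [
    ("dataHeaton2011", 16, 7),
    ("dataKim2020", 7, 76),
    ("dataMarisa2013", 384, 194),
]


def class_balance(neg, pos):
    """Return (samples#, neg%, pos%), with percentages rounded to two decimals.

    Hypothetical helper for illustration only.
    """
    total = neg + pos
    neg_pct = round(100 * neg / total, 2)
    pos_pct = round(100 * pos / total, 2)
    return total, neg_pct, pos_pct


for name, neg, pos in rows:
    total, neg_pct, pos_pct = class_balance(neg, pos)
    print(f"{name}: samples#={total}, neg%={neg_pct:.2f}, pos%={pos_pct:.2f}")
```

The three rows reproduce the tabulated values, including the most imbalanced breast cancer dataset (dataKim2020, with 8.43% negative samples).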