O brave new world that has such machines in it

Malley, James D; Malley, Karen G; Moore, Jason H

doi:10.1186/1756-0381-7-26

Editorial
Open access
Published: 17 November 2014

BioData Mining volume 7, Article number: 26 (2014)

Machine learning is a critical component of any biological data mining. Despite its established advantages, challenges remain that need to be addressed before it can be considered practical and persuasive.

Thus, not quite The Tempest (Act V, Scene One), above, but Shakespeare would still have understood: learning machines have achieved stunning success in a stunning range of areas, but they are still often—correctly—seen as strange and mysterious. They make predictions {tumor, not tumor} with ease and rapidity, but how do we understand the forecasts? How were the forecasts made, and how do they apply to this patient, with this set of symptoms and exposure factors? How does a low mean square error translate into guidance for patient treatment and care? How is any machine translated?

These questions point to an unfortunate separation between the advances of learning machines and the needs of biomedical research or patient care. We have written about the nearly invisible interest shown by computer science groups in the shared task of communication and joint application, and about a disconnect between some statistical teaching practices and the needs of researchers and medical practitioners [1]. Statisticians and computer scientists need to move ahead, together, to provide methods for interpreting the results of analysis and computation. As it stands, many good methods, such as random forests or penalized regression, do not transparently provide the subject-matter researcher with directly understood conclusions.

We suggest that solving this harder problem, the interpretation of models derived from learning machines and algorithms, is both fundamental and possible. We can, for example, move away from pure binary yes/no classification algorithms to probability machines [2]. These take the same data with zero/one outcomes {tumor, not tumor} and return consistent estimates of the probability of each event, doing so in a model-free context. From the same data, risk machines can then be deployed to estimate familiar endpoints such as relative risk or log odds [3]. The point here is not to promote any specific methods but to show that methods exist that reduce the obscurity of the black-box machine and help return us to familiar terms and the language of inference.
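To make the idea concrete, here is a minimal sketch of the move from classification to probability and risk estimation. It is not the implementation from [2] or [3]: a plain k-nearest-neighbour average is swapped in as the nonparametric learner for brevity, and the synthetic cohort, the function names, the age and exposure variables, and the choice of k are all assumptions made for illustration. Averaging the zero/one outcomes of nearby cases yields a probability estimate rather than a class label, and comparing the estimates for an exposed and an unexposed patient of the same age gives a relative risk and a log odds ratio.

```python
import math
import random


def knn_probability(xs, ys, query, k):
    """Estimate P(y = 1 | x = query) as the mean 0/1 outcome among the k nearest points."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - query))[:k]
    return sum(ys[i] for i in nearest) / len(nearest)


def log_odds(p, eps=1e-6):
    """Log odds of p, clipped away from 0 and 1 so the logarithm stays finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def simulate(n, seed=0):
    """Synthetic cohort (hypothetical): age, binary exposure, binary tumor outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.uniform(20.0, 80.0)
        exposed = rng.randint(0, 1)
        # Assumed data-generating model, used only to create example data.
        p_true = 1.0 / (1.0 + math.exp(-(-2.0 + 0.03 * age + 1.5 * exposed)))
        rows.append((age, exposed, 1 if rng.random() < p_true else 0))
    return rows


def stratum_probability(rows, exposed, age, k):
    """Probability estimate at a given age, using only rows with the given exposure."""
    xs = [a for a, e, _ in rows if e == exposed]
    ys = [y for _, e, y in rows if e == exposed]
    return knn_probability(xs, ys, age, k)


rows = simulate(4000)
p1 = stratum_probability(rows, exposed=1, age=50.0, k=200)
p0 = stratum_probability(rows, exposed=0, age=50.0, k=200)

print(f"Estimated P(tumor | age 50, exposed):   {p1:.3f}")
print(f"Estimated P(tumor | age 50, unexposed): {p0:.3f}")
if p0 > 0:
    print(f"Estimated relative risk: {p1 / p0:.2f}")
print(f"Estimated log odds ratio: {log_odds(p1) - log_odds(p0):.2f}")
```

The learner never sees a regression equation; the probabilities come straight from local averages of the observed outcomes, and the risk summaries are computed from those probabilities afterwards. Estimating within each exposure group separately is a simplification chosen to keep the sketch short.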

All these arts, the simple and evolved, the practical and theory-driven, the computational and the analytic, need collaborative attention for interpretable biomedical research. Prospero, Miranda, Ariel, and perhaps even troubled and treacherous Caliban would have understood.

References

1. Malley JD, Moore JH: The disconnect between classical biostatistics and the biological data mining community. BioData Min. 2013, 6: 12. doi:10.1186/1756-0381-6-12.
2. Malley JD, Kruppa J, Dasgupta A, Malley KG, Ziegler A: Probability machines: consistent probability estimation using nonparametric learning machines. Methods Inf Med. 2012, 51: 74-81.
3. Dasgupta A, Szymczak S, Moore JH, Bailey-Wilson JE, Malley JD: Risk estimation using probability machines. BioData Min. 2014, 7: 2. doi:10.1186/1756-0381-7-2.

Author information

Authors and Affiliations

Center for Information Technology, The National Institutes of Health, Bethesda, MD, USA
James D Malley
Malley Research Programming, Inc, Rockville, MD, USA
Karen G Malley
Departments of Genetics and Community and Family Medicine, Institute for Quantitative Biomedical Sciences, The Geisel School of Medicine, Dartmouth College, One Medical Center Dr, Lebanon, NH, 03756, USA
Jason H Moore

Corresponding author

Correspondence to Jason H Moore.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Malley, J.D., Malley, K.G. & Moore, J.H. O brave new world that has such machines in it. BioData Mining 7, 26 (2014). https://doi.org/10.1186/1756-0381-7-26

Received: 01 October 2014
Accepted: 18 October 2014
Published: 17 November 2014
DOI: https://doi.org/10.1186/1756-0381-7-26
