DISCOVERING SCIENCE THROUGH DATA SCIENCE: Scientific Discovery is the game, Data Scientists are the players

Data Science (or, more specifically, data mining) has long been used for business-oriented applications such as stock trading, credit scoring, and mail service optimization, among others. These activities require processing large volumes of data in order to reveal patterns and tendencies of individuals or single entities (Read, 2010).

With the rise of cheap computational power and fast internet connections, Data Science approaches are ideal for these activities. They provide an integrated framework for processing large amounts of data and learning from it efficiently, with reduced human interaction: once the attributes for the model are set, the machine learns, analyzes, and builds the model, leaving the analyst to evaluate it.

From Business to Scientific Research

But a very important characteristic of data mining approaches is their ability to extract human-readable, statistically significant rules from the data, which an expert can evaluate and use for practical purposes (Embcrechts, et al., 2005).

In science, a great part of research consists of analyzing large amounts of data to support or refute hypotheses and to find discernible patterns. Take, for example, pattern recognition in brain scans or electrocardiograms, which looks for defects that might give us some insight into the early detection of abnormalities (Vasileios, et al., 2000).
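
To make the idea of a human-readable rule concrete, here is a minimal sketch of one of the simplest rule learners: a single-threshold rule (a "decision stump") on one numeric attribute. It is not the method of the cited chapter. The function names, the heart-rate values, and the labels are all hypothetical and serve only as an illustration.

```python
from collections import Counter


def learn_threshold_rule(values, labels):
    """Find the threshold on one numeric attribute that best separates the labels.

    Returns (threshold, label_below, label_above, training_accuracy),
    or None when all values are identical and no split is possible.
    """
    pairs = sorted(zip(values, labels))
    best = None
    # Candidate thresholds are midpoints between consecutive distinct values.
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue
        threshold = (v1 + v2) / 2
        below = [lab for v, lab in pairs if v <= threshold]
        above = [lab for v, lab in pairs if v > threshold]
        # Predict the majority label on each side of the threshold.
        label_below, hits_below = Counter(below).most_common(1)[0]
        label_above, hits_above = Counter(above).most_common(1)[0]
        accuracy = (hits_below + hits_above) / len(pairs)
        if best is None or accuracy > best[3]:
            best = (threshold, label_below, label_above, accuracy)
    return best


def rule_to_text(attribute, rule):
    """Render a learned rule as a sentence an expert can read and judge."""
    threshold, label_below, label_above, accuracy = rule
    return (f"IF {attribute} <= {threshold:g} THEN {label_below} "
            f"ELSE {label_above} (training accuracy {accuracy:.0%})")


# Hypothetical toy data: resting heart rate in bpm with made-up labels.
heart_rate = [58, 62, 65, 70, 72, 88, 95, 101, 110, 120]
label = ["normal"] * 5 + ["abnormal"] * 5

rule = learn_threshold_rule(heart_rate, label)
print(rule_to_text("heart_rate", rule))
```

The point of the sketch is the output format: the model is a sentence that a domain expert can accept, reject, or refine. Real data mining tools build far richer rule sets and test them on data held out from training.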

Figure 1 – Steps of the Scientific Process (Shuttleworth, 2013)

And that’s where Data Science comes into play, providing the power and flexibility to derive patterns from enormous amounts of data in a fraction of the time.

Bioinformatics: Data science to Save Lives

For a long time, scientists researching potential cures and early detection techniques for diseases have relied on visual identification of patterns in tissue samples, analyzed through microscopes and medical devices in laboratories, which was often time-consuming.

But now several enterprises and researchers are shifting from that paradigm to the Data Science approach. One example is the research at the University of California, Santa Cruz (Schatz, 2015). Traditionally, doctors have treated cancerous tumors according to the part of the body in which they appear. UCSC, however, is working on a Cancer Genome Atlas to process and cross-reference tumors, looking for similarities among seemingly different types of cancer in order to improve detection and treatment.

Figure 2 – Bioinformatics Twitter Hashtags (Bioinformatics Jobs, 2016)

We can also mention the case of the pharmaceutical conglomerate Novartis, which made great advances in detecting kidney disease. The company reported that a team formed in collaboration with Necker Children’s Hospital-Imagine Foundation in Paris used Big Data to discover, in just six weeks, a previously unnoticed gene abnormality that causes focal segmental glomerulosclerosis (Brien, 2013).

Certainly, there’s no shortage of Big Data advancements in the field of Bioinformatics, with numerous companies actively developing and using Data Science software for research (KDnuggets, 2000).

Unveiling the Cosmos with Data Science

The immensity of the universe makes its study a prime field for Big Data. The amount of data produced can reach 25 zettabytes a year, doubling each year at a pace reminiscent of Moore’s law (Stephens, et al., 2015). These levels of information have been reached thanks to advances in telescope building and detector sensitivity.

Figure 3 – Four Domains of Big Data (Stephens, et al., 2015)

But the problem lies not only in storage but also in processing. That is why projects such as GALEX and the Kepler Space Telescope have enormous data and image processing frameworks, and the ALMA and Square Kilometer Array projects have even bigger data infrastructure planned (Andersen, 2012).

Moving Physics to Hyper Drive

CERN has been prominent in the media with its latest discoveries, which come with the big problem of having extremely large data sets to go through. Thanks to its data processing frameworks and capabilities, it has succeeded in numerous discoveries, including that of the Higgs boson with its Large Hadron Collider. Big Data solutions are also at work in the Daya Bay Reactor Neutrino Experiment, a separate project based in China that seeks a better understanding of neutrinos, subatomic particles produced by decaying radioactive elements (Prabhat, 2015).

These endeavors have been so successful that CERN is planning to expand its Big Data analysis framework by upgrading its detectors and processing units, in the hope of improving its understanding of dark matter (University of Bristol, 2016).

Data Science in the Aid of All Sciences

These are only a few examples of scientific players that have entered the Data Science game, but any field with large amounts of data can take advantage of Big Data processing and prediction models.

Data Science is even present in scientific research as a whole: a company called Iris AI has developed a machine learning algorithm that lets researchers find relevant publications by entering a text describing the subject at hand (Frank, 2016).
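
As a rough illustration of the general idea of matching a free-text description against publications, the sketch below ranks a few abstracts by cosine similarity of their word counts. This is a standard textbook technique and not Iris AI's algorithm, which the source does not describe. The function names, titles, abstracts, and query are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counter objects)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_publications(query, abstracts):
    """Return (score, title) pairs sorted from most to least similar."""
    query_vector = Counter(tokenize(query))
    scored = [
        (cosine_similarity(query_vector, Counter(tokenize(text))), title)
        for title, text in abstracts.items()
    ]
    return sorted(scored, reverse=True)


# Hypothetical abstracts, invented for this example.
abstracts = {
    "Neutrino oscillation": "measuring neutrino oscillation with reactor antineutrino detectors",
    "Tumor genomics": "comparing tumor genomes across cancer types to find shared mutations",
    "Sky surveys": "processing telescope images from large sky surveys",
}

ranked = rank_publications(
    "shared mutations in tumor genomes of different cancer types", abstracts
)
for score, title in ranked:
    print(f"{score:.2f}  {title}")
```

Word counting only matches publications that use the same vocabulary as the query. A production system would need to handle synonyms and context, which is where machine learning earns its place.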

Figure 4 – Data Processing approach used at CERN (Jones, 2011)

So it is only a matter of time until other disciplines adopt Data Science as an intrinsic part of the scientific process of research and discovery.

Bibliography

Andersen, R., 2012. How Big Data Is Changing Astronomy (Again). [Online]
Available at: http://www.theatlantic.com/technology/archive/2012/04/how-big-data-is-changing-astronomy-again/255917/
[Accessed May 2016].

Bioinformatics Jobs, 2016. Twitter. [Online]
Available at: https://twitter.com/bioinformaticsj
[Accessed May 2016].

Brien, T. O., 2013. Surfing the wave of big data analytics. [Online]
Available at: https://www.novartis.com/stories/discovery/surfing-wave-big-data-analytics
[Accessed May 2016].

Embcrechts, Szymanski & Sternickel, 2005. Chapter 10: Introduction to Scientific Data Mining. In: Computationally Intelligent Hybrid Systems. New York: s.n., pp. 317-365.

Frank, A., 2016. Machine Learning’s Next Trick Will Transform How Research Is Done. [Online]
Available at: http://singularityhub.com/2016/05/26/machine-learnings-next-trick-will-transform-how-research-is-done/
[Accessed May 2016].

Jones, B., 2011. Massive Computing at CERN and lessons learnt. [Online]
Available at: http://slideplayer.com/slide/6388912/
[Accessed May 2016].

KDnuggets, 2000. Bioinformatics Companies. [Online]
Available at: http://www.kdnuggets.com/companies/bioinformatics.html
[Accessed February 2016].

Prabhat, 2015. Big science problems, big data solutions. [Online]
Available at: https://www.oreilly.com/ideas/big-science-problems-big-data-solutions
[Accessed May 2016].

Read, B., 2010. Data Mining and Science?. [Online]
Available at: http://www.ercim.eu/publication/ws-proceedings/12th-EDRG/EDRG12_Re.pdf
[Accessed May 2016].

Schatz, R. D., 2015. Decoding and Defeating Cancer with Data Science. [Online]
Available at: http://www.slate.com/articles/health_and_science/ucsc2015/2015/04/decoding_and_defeating_cancer_with_data_science.html
[Accessed May 2016].

Shuttleworth, M., 2013. What is Research?. [Online]
Available at: https://explorable.com/what-is-research
[Accessed May 2016].

Stephens, et al., 2015. Big Data: Astronomical or Genomical?. [Online]
Available at: http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195
[Accessed May 2016].

University of Bristol, 2016. Dark Matter search enhanced by LHC’s new turbocharged ‘Brain’. [Online]
Available at: http://www.bristol.ac.uk/news/2016/may/dark-matter-search.html
[Accessed May 2016].

Vasileios, et al., 2000. Data mining in brain imaging - Abstract. [Online]
Available at: http://smm.sagepub.com/content/9/4/359.abstract
[Accessed May 2016].

Monday, 30 May 2016