Monday, 13 June 2016

What Data Science does, can do and will do for scientific discoveries

So far, we have shown that Data Science is not only for business, marketing and advertising, but also has practical uses in many scientific fields, from Astronomy to Physics and beyond. But the examples presented are just the beginning.

We have seen how Data Science can be useful in the pursuit of scientific discoveries. Now it is time to review the vanguard of the industry: where it is headed, what the future holds and what roadblocks might stand in the way.

Deep learning the future

The concept of deep learning is based on neural networks, which have been around for a long time (Roberts, 2014). It has gathered speed in recent years, with corporations and universities investing in multi-layer neural networks for the most varied uses.

One much-discussed example is Google DeepMind, whose work includes human-level control through deep reinforcement learning, among many other projects (Google Deepmind, 2011).

Figure 1 – Example of how deep learning works for face recognition (Mayer, 2015)

But perhaps the project's most impressive and best-known feat is AlphaGo, a deep learning program that learned the complex game of Go and defeated Lee Sedol, one of the world's top Go players, in a five-game match (Gibney, 2016). This accomplishment shows how far deep learning has advanced, since experts had said that a computer would never beat a top human player (Cho, 2016).

Deep learning is also being tested for cell classification (Chen, et al., 2016), chemical mapping, X-ray scattering image classification and much more (Brookhaven National Laboratory, 2015). Major research centers are investing in deep learning too. For example, NERSC and Berkeley Lab have joined forces to test what the technology can do for science, including breakthroughs in health and medicine (Kincade, 2015).

Data Science as an aid to broadening human knowledge

With science advancing in giant leaps in several fields, and instruments getting more powerful and sophisticated, the amount of data to process keeps growing. That is where data science comes into play.

The detection of gravitational waves has been one of the biggest scientific headlines of recent months (Overbye, 2016), confirming a prediction Einstein made 100 years ago. The Laser Interferometer Gravitational-Wave Observatory (LIGO) received a particularly strong signal that confirmed the theory. This was a hard feat, because such signals are very difficult to tell apart from noise. Data Science could make this separation easier by processing the enormous amount of data produced by the detectors and finding the underlying evidence (Yuan, 2016).
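The idea of pulling a weak signal of known shape out of noise can be sketched with template matching, a simplified form of the matched filtering used in gravitational-wave searches. This is a minimal sketch under stated assumptions, not LIGO's pipeline. The "chirp" template, the Gaussian noise, the injection position and the amplitude are all hypothetical. Real searches also whiten the data and use large banks of templates.

```python
import math
import random

def sliding_correlation(data, template):
    """Correlate the (normalized) template against every window of the data."""
    n = len(template)
    norm = math.sqrt(sum(t * t for t in template))
    scores = []
    for start in range(len(data) - n + 1):
        window = data[start:start + n]
        scores.append(sum(w * t for w, t in zip(window, template)) / norm)
    return scores

def find_signal(data, template):
    """Return (start index, score) of the window that best matches the template."""
    scores = sliding_correlation(data, template)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

# Hypothetical "chirp": a sine wave whose frequency and amplitude rise over time.
template = [math.sin(0.02 * i * i) * (i / 60) for i in range(60)]

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(1000)]  # pure unit-variance noise
inject_at = 400
for i, t in enumerate(template):
    data[inject_at + i] += 3.0 * t  # bury a scaled copy of the template in the noise

position, score = find_signal(data, template)
print(position, round(score, 2))
```

The score peaks where the data lines up with the template. A signal too faint to spot in any single sample can therefore still stand out once the whole waveform is taken into account.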

Figure 2 – Consistent signals detected in LIGO sites located 2000 miles apart (Circus Bazaar, 2016)

Data Science could also help Astronomy. As telescopes become more complex and more sensitive to light, the amount of data gathered is growing larger and harder to manage. That is why several projects are using data mining to recognize celestial bodies and keep up with the data production (Galaxy Zoo, 2016).
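As a hedged illustration of recognizing celestial objects automatically, here is a k-nearest-neighbour classifier. Galaxy Zoo itself relies largely on volunteer classifications. The features (a colour index and a light-concentration measure), the training values and k=3 below are hypothetical choices made only for this sketch.

```python
import math
from collections import Counter

# Hypothetical training set: (colour index, concentration) -> morphology label.
TRAINING = [
    ((0.9, 3.0), "elliptical"),
    ((1.0, 2.8), "elliptical"),
    ((0.95, 3.2), "elliptical"),
    ((0.4, 2.0), "spiral"),
    ((0.5, 1.8), "spiral"),
    ((0.35, 2.2), "spiral"),
]

def classify(features, training=TRAINING, k=3):
    """Label an object by majority vote among its k nearest training examples."""
    nearest = sorted(training, key=lambda item: math.dist(features, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(classify((0.92, 2.9)))  # elliptical
print(classify((0.45, 2.1)))  # spiral
```

Once trained on labelled examples, a classifier like this can label new objects as fast as the telescope produces them. Human effort can then go to the unusual cases.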

But the road ahead is not paved

As science advances, so does the fear that humans will be replaced by robots. Some predict that computers with advanced neural networks will replace entry-level lawyers (Kravets, 2015), and IBM Watson is learning from hospital case histories which diagnoses and treatments to recommend (Cohn, 2013). As a result, there is concern about how advances in Data Science will affect the rest of the population.

Figure 3 – Example of IBM Watson’s healthcare capabilities (Saxena, 2012)

Also, for data science to thrive, it needs data. Scientific papers, research and publications are difficult or expensive to get hold of (The Cost of Knowledge, 2012). Access to the raw data or sources needed to discover something novel can therefore be something of a utopia, with publishers charging enormous amounts for a glimpse of their material (Elbakyan, 2015).

Data Science has yet to find its limits

While there are still titanic challenges in the sciences that Data Science has yet to conquer, breakthroughs are made every day that help overcome shortcomings and achieve a better understanding in several fields of science (Prabhat, 2015).

So the future looks bright for Data Science. Demand for experts in the field is increasing significantly (Islam, 2015), and a growing number of companies are getting into the game and becoming active participants in scientific discovery. It remains to be seen how bright it can be (NeRSC, 2015).

Bibliography

Brookhaven National Laboratory, 2015. Deep Learning for Analysis of Materials Science Data. [Online]
Available at: https://www.bnl.gov/compsci/projects/deep-learning.php
[Accessed June 2016].

Chen, C. L. et al., 2016. Deep Learning in Label-free Cell Classification. [Online]
Available at: http://www.nature.com/articles/srep21471
[Accessed June 2016].

Cho, A., 2016. Computer that mimics human brain beats professional at game of Go. [Online]
Available at: http://www.sciencemag.org/news/2016/01/huge-leap-forward-computer-mimics-human-brain-beats-professional-game-go
[Accessed June 2016].

Circus Bazaar, 2016. On Einstein's Gravitational Waves: The Paper. [Online]
Available at: http://www.circusbazaar.com/on-einsteins-gravitational-waves-the-paper/
[Accessed June 2016].

Cohn, J., 2013. The Robot Will See You Now. The Atlantic, March 2013 issue.

Elbakyan, A., 2015. Sci-Hub Reply. [Online]
Available at: https://torrentfreak.com/images/sci-hub-reply.pdf
[Accessed June 2016].

Galaxy Zoo, 2016. Galaxy Zoo. [Online]
Available at: https://www.galaxyzoo.org/
[Accessed June 2016].

Gibney, E., 2016. Google AI algorithm masters ancient game of Go. [Online]
Available at: http://www.nature.com/news/google-ai-algorithm-masters-ancient-game-of-go-1.19234
[Accessed June 2016].

Google Deepmind, 2011. Publications. [Online]
Available at: https://deepmind.com/publications
[Accessed June 2016].

Islam, M., 2015. Future of Data Science and Data Scientists. [Online]
Available at: https://www.linkedin.com/pulse/future-data-science-scientist-mohammad-islam
[Accessed June 2016].

Kincade, K., 2015. NERSC, Berkeley Lab Explore Frontiers of Deep Learning for Science. [Online]
Available at: http://www.nersc.gov/news-publications/nersc-news/science-news/2015/nersc-berkeley-lab-explore-frontiers-of-deep-learning-for-science/
[Accessed June 2016].

Kravets, D., 2015. Law firm bosses envision Watson-type computers replacing young lawyers. [Online]
Available at: http://arstechnica.com/tech-policy/2015/10/law-firm-bosses-envision-watson-type-computers-replacing-young-lawyers/
[Accessed June 2016].

Mayer, R., 2015. Deep Learning Smarts Up Your Smart Phone. [Online]
Available at: http://www.amax.com/blog/wp-content/uploads/2015/12/blog_deeplearning3.jpg
[Accessed June 2016].

NeRSC, 2015. Berkeley Lab Climate Software Honored for Pattern Recognition Advances. [Online]
Available at: https://www.nersc.gov/news-publications/nersc-news/nersc-center-news/2015/berkeley-lab-climate-software-honored-for-pattern-recognition-advances/
[Accessed June 2016].

Overbye, D., 2016. Gravitational Waves Detected, Confirming Einstein’s Theory. [Online]
Available at: http://mobile.nytimes.com/2016/02/12/science/ligo-gravitational-waves-black-holes-einstein.html
[Accessed June 2016].

Prabhat, 2015. Big science problems, big data solutions. [Online]
Available at: https://www.oreilly.com/ideas/big-science-problems-big-data-solutions
[Accessed June 2016].

Roberts, E., 2014. Neural Networks History: The 1940's to the 1970's. [Online]
Available at: https://cs.stanford.edu/people/eroberts/courses/soco/projects/neural-networks/History/history1.html
[Accessed June 2016].

Saxena, M., 2012. Putting IBM Watson to Work. [Online]
Available at: http://www.slideshare.net/manojsaxena2/putting-ibm-watson-to-work-saxena
[Accessed June 2016].

The Cost of Knowledge, 2012. Statement of Purpose. [Online]
Available at: https://gowers.files.wordpress.com/2012/02/elsevierstatementfinal.pdf
[Accessed June 2016].

Yuan, M., 2016. Gravitational Wave Ushers in a New Wave of Data Science. [Online]
Available at: https://austinstartups.com/gravitational-wave-ushers-in-a-new-wave-of-data-science-928d620d727#.8x5fq0g9i
[Accessed June 2016].

Monday, 6 June 2016

Pushing The Boundaries Of Discoveries With Data Science

So far we have covered what data science is, how it is helping scientific discovery and who the active players are. Now it is important to understand which tools are the latest available for reaching diverse Data Science goals, what limitations these tools currently face and whether better solutions are available.

The field of Data Science attempts to create value from information by passing it through four principal stages: data preparation, data analysis, data reflection and data dissemination. This process has opened the doors to accelerating discoveries but, as the saying goes, “nothing comes easy in life.”

Figure 1 – Data Science Workflow (Guo, 2016)

As the world becomes more connected every day, information is growing vast and complex. Alex Szalay, an astrophysicist at Johns Hopkins University, says “How to make sense of all these data? People should be worried about how we train the next generation, not just of scientists, but people in government and industry” (The Economist, 2010). Today's information demands that companies and research institutes come up with cost-effective, cutting-edge technologies for better insight and decision making.

Let’s have a look at a few of the challenges that information brings to the field of data science, and a few of the techniques researchers use to meet them. These challenges are often described in terms of the five V’s: Volume, Variety, Velocity, Veracity and Value.

Figure 2 – The 5 V's of Big Data (Sweetlysocial, 2016)

Cluster processing to the rescue

To start with, the ever-expanding volume and variety of information demands a solid infrastructure in which large-scale datasets can be stored and analyzed. Hadoop is one of the tools emerging as an efficient framework for storing big data. It is an open source platform that implements MapReduce, the programming model described by Google. MapReduce processes large datasets by splitting the work into small pieces that run in parallel.

In addition, Hadoop provides the Hadoop Distributed File System (HDFS), which spreads data over many nodes so that it can be processed in parallel. In May 2009, Hadoop set world records by sorting a petabyte of data in 16.25 hours and 1 TB of data in 62 seconds (Rosenberg, 2009). Hadoop has enormous potential for medical discoveries and is therefore used by several large genomics and medical projects. However, an even bigger challenge than storing data is processing it in a timely manner. Hadoop is constrained by disk I/O and by the advanced programming skills it demands of developers.
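The MapReduce model that Hadoop implements can be sketched in plain Python with the classic word-count example. This sketch only mimics the three phases (map, shuffle, reduce) on a single machine. It does not use Hadoop's own Java API, and the function names are illustrative.

```python
from collections import defaultdict
from itertools import chain

def map_phase(record):
    """Mapper: emit (word, 1) for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: combine all values emitted for one key."""
    return key, sum(values)

def map_reduce(records):
    # In Hadoop, each record (or block of records) could be mapped on a different node.
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

records = ["big data big science", "data science"]
print(map_reduce(records))  # {'big': 2, 'data': 2, 'science': 2}
```

Because mappers and reducers only see their own slice of the data, the same program can scale from one laptop to thousands of nodes. The price is that intermediate results are written to and read from disk, which is the I/O constraint mentioned above.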

To deal with the velocity and veracity of information, Apache Spark comes to the rescue. Spark has been reported to process some workloads up to 100 times faster than MapReduce. It also offers in-memory data processing and a built-in machine learning library (MLlib). Keeping data in memory lets Spark avoid repeated reads and writes to disk, which is where much of its speed comes from. The machine learning library contains many algorithms that researchers can easily apply to complex structured and unstructured data.

A neural way of discovering

So far we have seen how big data can be stored and processed faster. But an important question remains: are the insights generated from analyzing the data useful? How do we identify valuable information in zettabytes of data? To tackle this challenge, many machine learning and visualization techniques are being developed. One of the hottest trends in machine learning is deep learning, which is built on artificial neural networks with many layers. It combines simple features into more complex ones, layer by layer, to extract high-level, abstract representations of the data.
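To show what "layer by layer" means, here is a minimal sketch of a forward pass through a tiny fully connected network with 4 inputs, 3 hidden units and 2 outputs. The weights are hypothetical hand-picked numbers. In real deep learning they are learned from data, for example by backpropagation, and frameworks do this at far larger scale.

```python
def relu(x):
    """Rectified linear activation: pass positive values, zero out negative ones."""
    return max(0.0, x)

def dense_layer(inputs, weights, biases):
    """One fully connected layer: each neuron weighs all inputs, adds a bias, applies ReLU."""
    return [relu(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(inputs, layers):
    """Feed the input through each layer in turn; each layer builds on the previous layer's features."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations

# Hypothetical weights for a 4 -> 3 -> 2 network (a trained network would learn these).
layers = [
    ([[0.5, -0.2, 0.1, 0.0], [0.3, 0.8, -0.5, 0.2], [-0.6, 0.1, 0.4, 0.9]], [0.0, 0.1, -0.1]),
    ([[1.0, -0.5, 0.3], [0.2, 0.7, -0.4]], [0.05, 0.0]),
]
print(forward([1.0, 0.5, -0.3, 0.8], layers))
```

The hidden layer turns raw inputs into simple combined features. The output layer then combines those features again. Stacking many such layers is what makes a network "deep".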

Figure 3 – Deep Learning (Toy, 2016)

Notably, a newly built computer model has detected genetic determinants of autism, colon cancer and spinal muscular atrophy in large regions of the genome where such determinants previously could not be identified. It used DNA sequences from five autistic patients and identified 39 new genes in autism spectrum disorder. This is claimed to be a 40 percent increase over the roughly 100 previously known autism genes. Brendan Frey, a CIFAR senior fellow at the University of Toronto, says “My participation in the Neural Computation & Adaptive Perception program enabled my group to have access to the best techniques in deep learning.” (Cifar.ca, 2016)

Open source is the way to go

The techniques mentioned above are not the only ways to tackle these challenges. Companies and universities alike are developing free and open source solutions that broaden the possibilities of Data Science and help technology advance in a timely manner. Examples include Google’s TensorFlow, scikit-learn and H2O, alongside cloud services such as Amazon Machine Learning.

The journey of data science has been thrilling and the road ahead looks even more exciting. We will be back with more interesting content on data science so stay tuned!

References

The Economist. (2010). Data, data everywhere. [online] Available at: http://www.economist.com/node/15557443 [Accessed 6 Jun. 2016].

Ieeexplore.ieee.org. (2016). IEEE Xplore Full-Text PDF:. [online] Available at: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7067026&tag=1 [Accessed 6 Jun. 2016].

Guo, P. (2016). Data Science Workflow: Overview and Challenges. [online] Cacm.acm.org. Available at: http://cacm.acm.org/blogs/blog-cacm/169199-data-science-workflow-overview-and-challenges/fulltext [Accessed 6 Jun. 2016].

Rosenberg, D. (2009). Hadoop breaks data-sorting world records. [online] CNET. Available at: http://www.cnet.com/au/news/hadoop-breaks-data-sorting-world-records/ [Accessed 6 Jun. 2016].

Cifar.ca. (2016). Deep learning finds autism, cancer mutations in unexplored regions of the genome : CIFAR. [online] Available at: https://www.cifar.ca/assets/deep-learning-finds-autism-cancer-mutations-in-unexplored-regions-of-the-genome/ [Accessed 6 Jun. 2016].

Toy, J. (2016). opening up deep learning for everyone. [online] Jtoy.net. Available at: http://www.jtoy.net/2016/02/14/opening-up-deep-learning-for-everyone.html [Accessed 6 Jun. 2016].


Anon, (2016). [online] Available at: http://cra.org/ccc/wp-content/uploads/sites/2/2015/05/bigdatawhitepaper.pdf [Accessed 6 Jun. 2016].

Sweetlysocial.net. (2014). Big Data, Better Marketing | Sweetly Social. [online] Available at: http://sweetlysocial.net/big-data-better-marketing/ [Accessed 6 Jun. 2016].

Kdnuggets.com. (2016). 5 Best Machine Learning APIs for Data Science. [online] Available at: http://www.kdnuggets.com/2015/11/machine-learning-apis-data-science.html [Accessed 6 Jun. 2016].

Monday, 30 May 2016

Scientific Discovery is the game, Data Scientists are the players

Data Science (or, more specifically, data mining) has long been used for business-oriented applications and solutions such as stock trading, credit scoring and mail service optimization. These activities require processing a lot of data in order to find patterns and tendencies in the behaviour of individuals or single entities (Read, 2010).

With the rise of cheap computing power and fast internet connections, Data Science approaches are ideal for these activities. They provide an integrated framework to process large amounts of data and learn from it efficiently, with little human interaction. Once the attributes for the model have been set, the machines learn from the data, analyze it and create the model, leaving the analyst to evaluate it.

From Business to Scientific Research

A very important characteristic of data mining approaches is their ability to extract human-readable, statistically significant rules from the data, which an expert can evaluate and use for practical purposes (Embcrechts, et al., 2005).
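In its simplest form, a human-readable rule of this kind can be extracted by a decision stump. A stump searches for the single threshold on one variable that best separates two classes. The marker values and labels below are hypothetical. This sketch only measures accuracy; checking that a rule is statistically significant would need an additional test.

```python
def best_threshold_rule(values, labels):
    """Find the rule 'value > threshold' that classifies the most examples correctly.

    Returns (threshold, accuracy). Candidate thresholds are the observed values themselves.
    """
    best = (None, 0.0)
    for threshold in sorted(set(values)):
        predictions = [v > threshold for v in values]
        accuracy = sum(p == l for p, l in zip(predictions, labels)) / len(labels)
        if accuracy > best[1]:
            best = (threshold, accuracy)
    return best

# Hypothetical measurements (e.g. a blood marker) and whether each patient had the condition.
marker = [2.1, 3.5, 4.0, 5.2, 6.8, 7.1, 7.9, 8.4]
has_condition = [False, False, False, False, True, True, True, True]

threshold, accuracy = best_threshold_rule(marker, has_condition)
print(f"IF marker > {threshold} THEN condition (accuracy {accuracy:.0%})")
# IF marker > 5.2 THEN condition (accuracy 100%)
```

The output is a rule a domain expert can read, question and check against their own knowledge. That is the property that makes data mining attractive for science.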

In science, a large part of research consists of analyzing great amounts of data to support or refute hypotheses and to find discernible patterns. Take, for example, pattern recognition in brain scans or electrocardiograms, used to find problems or defects that might give us insight for the early detection of abnormalities (Vasileios, et al., 2000).
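The electrocardiogram example can be made concrete with a simple heartbeat-detection sketch. It finds the tall "R" peaks, then flags beat-to-beat intervals that deviate from the typical one. The synthetic trace, the threshold, the minimum gap and the 20% tolerance are all assumptions. Real detectors, such as the Pan–Tompkins algorithm, filter the signal and adapt their thresholds first.

```python
import statistics

def detect_peaks(signal, threshold, min_gap):
    """Return indices of local maxima above `threshold` that are at least `min_gap` samples apart."""
    peaks = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i - 1] < signal[i] >= signal[i + 1]
        if is_local_max and signal[i] > threshold and (not peaks or i - peaks[-1] >= min_gap):
            peaks.append(i)
    return peaks

def flag_irregular(peaks, tolerance=0.2):
    """Return positions of beat-to-beat intervals that differ from the median by more than `tolerance`."""
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    typical = statistics.median(gaps)
    return [i for i, gap in enumerate(gaps) if abs(gap - typical) > tolerance * typical]

# Hypothetical, noise-free trace: flat baseline with sharp "R waves" at these sample positions.
beat_positions = [10, 30, 50, 62, 82]
ecg = [0.0] * 100
for p in beat_positions:
    ecg[p - 1], ecg[p], ecg[p + 1] = 0.4, 1.0, 0.4

peaks = detect_peaks(ecg, threshold=0.5, min_gap=5)
print(peaks)                  # [10, 30, 50, 62, 82]
print(flag_irregular(peaks))  # [2]: the 50 -> 62 interval is much shorter than the usual 20 samples
```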

Figure 1 – Steps of the Scientific Process (Shuttleworth, 2013)

This is where Data Science comes into play, providing the power and flexibility to derive patterns from enormous amounts of data in a fraction of the time.

Bioinformatics: Data Science to Save Lives

For a long time, scientists researching potential cures and early detection techniques for diseases have relied on visually identifying patterns in tissue samples analyzed through microscopes and medical devices in laboratories. This process often took a lot of time.

But now several enterprises and researchers are shifting from that paradigm to a Data Science approach. One example is the research at the University of California, Santa Cruz (UCSC) (Schatz, 2015). Traditionally, doctors have treated cancerous tumors according to the part of the body in which they appear. UCSC, however, is working on a Cancer Genome Atlas to process and cross-reference tumors. The goal is to find similarities among seemingly different types of cancer in order to improve detection and treatment.
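Cross-referencing tumors to find similarities between cancers from different organs can be sketched as comparing mutation profiles with cosine similarity. The tumor names and the binary profiles over five hypothetical genes are invented for illustration. This is not a description of UCSC's actual methods.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two profiles: 1.0 = same pattern, 0.0 = nothing in common."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical profiles over five genes: 1 if the tumor carries a mutation in that gene.
tumors = {
    "breast_1": [1, 1, 0, 0, 1],
    "lung_1":   [1, 1, 0, 0, 1],
    "colon_1":  [0, 0, 1, 1, 0],
    "lung_2":   [0, 1, 1, 1, 0],
}

def most_similar(name, profiles):
    """Return (other tumor, similarity) for the closest profile to `name`."""
    others = [(other, cosine_similarity(profiles[name], vec))
              for other, vec in profiles.items() if other != name]
    return max(others, key=lambda pair: pair[1])

print(most_similar("breast_1", tumors))  # ('lung_1', 1.0)
```

Here a breast tumor and a lung tumor share the same molecular pattern despite their different locations. This is the kind of similarity that could suggest a shared treatment.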

Figure 2 – Bioinformatics Twitter Hashtags (Bioinformatics Jobs, 2016)

We can also mention the pharmaceutical conglomerate Novartis, which made great advances in detecting kidney disease. The company claimed that a team formed in a coalition with the Necker Children’s Hospital-Imagine Foundation in Paris used Big Data to discover, in just six weeks, a previously unnoticed gene abnormality that causes focal segmental glomerulosclerosis (Brien, 2013).

Certainly, there’s no shortage of Big Data advancements in the field of Bioinformatics, with numerous companies actively developing and using Data Science Software for research (KDnuggets, 2000).

Unveiling the Cosmos with Data Science

The vast immensity of the universe makes its study a prime field for Big Data. The amount of data produced can reach 25 zettabytes a year, doubling each year, a growth rate reminiscent of Moore’s law (Stephens, et al., 2015). These volumes have been reached thanks to advances in telescope building and detector sensitivity.

Figure 3 – Four Domains of Big Data (Stephens, et al., 2015)

But the problem lies not only in storage, but also in processing. That is why projects such as GALEX and the Kepler Space Telescope have enormous data and image processing frameworks. It is also why the ALMA and Square Kilometre Array projects have even bigger data infrastructures planned (Andersen, 2012).

Moving Physics to Hyper Drive

CERN has frequently appeared in the media with its latest discoveries, which involve the big problem of sifting through extremely large datasets. Thanks to its data processing frameworks and capabilities, it has made numerous discoveries, including the Higgs boson at its Large Hadron Collider. Large-scale data processing also supports the Daya Bay Reactor Neutrino Experiment in China, which seeks a better understanding of the neutrino, a subatomic particle produced by decaying radioactive elements (Prabhat, 2015).

These endeavors have been so successful that CERN is planning to expand its Big Data analysis framework. It intends to upgrade its detectors and processing units in the hope of improving our understanding of dark matter (University of Bristol, 2016).

Data Science in the Aid of All Sciences

These are only a few examples of scientific players that have entered the Data Science game. Any field with large amounts of data can take advantage of big data processing and prediction models.

Data Science is even present in the research process as a whole. A company called Iris AI has developed a machine learning algorithm that lets researchers find relevant publications by entering a text that describes the subject at hand (Frank, 2016).
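The source does not describe how Iris AI works internally, so the following shows only a classic stand-in technique. It ranks documents against a free-text query using TF-IDF weights and cosine similarity. The abstracts and the query are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    """Lower-case words with surrounding punctuation stripped."""
    return [w.strip(".,;:()").lower() for w in text.split() if w.strip(".,;:()")]

def tf_idf_vectors(documents):
    """Weight each term by its count in a document times log(N / document frequency)."""
    counts = [Counter(tokenize(d)) for d in documents]
    doc_freq = Counter(term for c in counts for term in c)
    idf = {term: math.log(len(documents) / df) for term, df in doc_freq.items()}
    return [{t: n * idf[t] for t, n in c.items()} for c in counts], idf

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def search(query, documents):
    """Return document indices ordered from most to least relevant to the query."""
    vectors, idf = tf_idf_vectors(documents)
    # Terms that never appear in the corpus carry no information, so they are skipped.
    q_vec = {t: n * idf[t] for t, n in Counter(tokenize(query)).items() if t in idf}
    scores = [cosine(q_vec, d) for d in vectors]
    return sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)

abstracts = [
    "Deep learning for galaxy morphology classification",
    "Gene expression changes in colon cancer tumours",
    "Neutrino oscillation measured at a reactor experiment",
]
print(search("classifying galaxies with deep learning", abstracts))  # [0, 1, 2]
```

A production system would add stemming (so "galaxies" matches "galaxy") and, most likely, learned text representations. Still, the principle of scoring documents by overlap with the query's informative words is the same.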

Figure 4 – Data Processing approach used at CERN (Jones, 2011)

So it is only a matter of time until other disciplines adopt Data Science as an intrinsic part of the scientific process of research and discovery.

Bibliography

Andersen, R., 2012. How Big Data Is Changing Astronomy (Again). [Online]
Available at: http://www.theatlantic.com/technology/archive/2012/04/how-big-data-is-changing-astronomy-again/255917/
[Accessed May 2016].

Bioinformatics Jobs, 2016. Twitter. [Online]
Available at: https://twitter.com/bioinformaticsj
[Accessed May 2016].

Brien, T. O., 2013. Surfing the wave of big data analytics. [Online]
Available at: https://www.novartis.com/stories/discovery/surfing-wave-big-data-analytics
[Accessed May 2016].

Embcrechts, Szymanski & Sternickel, 2005. Chapter 10: Introduction to Scientific Data Mining. In: Computationally Intelligent Hybrid Systems. New York: s.n., pp. 317-365.

Frank, A., 2016. Machine Learning’s Next Trick Will Transform How Research Is Done. [Online]
Available at: http://singularityhub.com/2016/05/26/machine-learnings-next-trick-will-transform-how-research-is-done/
[Accessed May 2016].

Jones, B., 2011. Massive Computing at CERN and lessons learnt. [Online]
Available at: http://slideplayer.com/slide/6388912/
[Accessed May 2016].

KDnuggets, 2000. Bioinformatics Companies. [Online]
Available at: http://www.kdnuggets.com/companies/bioinformatics.html
[Accessed February 2016].

Prabhat, 2015. Big science problems, big data solutions. [Online]
Available at: https://www.oreilly.com/ideas/big-science-problems-big-data-solutions
[Accessed May 2016].

Read, B., 2010. Data Mining and Science?. [Online]
Available at: http://www.ercim.eu/publication/ws-proceedings/12th-EDRG/EDRG12_Re.pdf
[Accessed May 2016].

Schatz, R. D., 2015. Decoding and Defeating Cancer with Data Science. [Online]
Available at: http://www.slate.com/articles/health_and_science/ucsc2015/2015/04/decoding_and_defeating_cancer_with_data_science.html
[Accessed May 2016].

Shuttleworth, M., 2013. What is Research?. [Online]
Available at: https://explorable.com/what-is-research
[Accessed May 2016].

Stephens, et al., 2015. Big Data: Astronomical or Genomical?. [Online]
Available at: http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195
[Accessed May 2016].

University of Bristol, 2016. Dark Matter search enhanced by LHC’s new turbocharged ‘Brain’. [Online]
Available at: http://www.bristol.ac.uk/news/2016/may/dark-matter-search.html
[Accessed May 2016].

Vasileios, et al., 2000. Data mining in brain imaging - Abstract. [Online]
Available at: http://smm.sagepub.com/content/9/4/359.abstract
[Accessed May 2016].

Monday, 23 May 2016

Data Science And Scientific Discovery

We live in a world where every day brings a new challenge. As we solve these challenges, we create volumes of information that change the way we perceive the world around us. This systematic method of research, aimed at understanding every aspect of our perceivable universe based on evidence, is what we call science.

THE ROLE OF SCIENCE IN HUMAN HISTORY

The history of science spans from ancient times to the present day. During this period, numerous scientific revolutionaries, methods and discoveries have emerged. According to Britannica,

“A new view of nature emerged, replacing the Greek view that had dominated science for almost 2,000 years. Science became an autonomous discipline, distinct from both philosophy and technology and came to be regarded as having utilitarian goals”- (Encyclopedia Britannica, 2016)

Over the centuries, the domain of science has expanded exponentially. To a great extent, it has helped us to answer what is happening, why it is happening and what will happen in a wide array of fields such as Astronomy, Biology, Ecology, Genetics, Physics and many more.

So far, scientific research has followed the traditional approach of deductive reasoning. In the deductive process, a hypothesis is created and then experiments are carried out to test its validity. Unfortunately, with this approach, it could be years before sufficient data is gathered from tests to support the claim and back it up with resounding and definitive results.

Figure 1 - Data Intensive Science (Slideshare.net, 2016)

However, this approach has been changing, and at an exponential speed, thanks to advances in knowledge and technology. As evidence of this growth, today we create 2.5 quintillion bytes of data every day (Storagenewsletter.com, 2016) and have cheap computational power at hand. This new scenario has made possible research techniques that were not feasible earlier because of technological limitations. The field that combines these data-oriented techniques is known as “Data Science”.

DATA SCIENCE: A SHIFT OF PARADIGM

Data science is a multidisciplinary field that combines machine learning, artificial intelligence, data mining, statistics, applied mathematics and visualization. It supports both deductive and inductive reasoning. The former is hypothesis driven. The latter refines existing hypotheses or generates new ones by spotting interesting patterns in huge, heterogeneous and unstructured data. This approach is helping the scientific community accelerate the rate of scientific discovery.
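The inductive side can be sketched as a scan of every pair of measured variables for strong correlations, with the strongest pairs treated as candidate hypotheses. Those candidates must then be tested deductively, since correlation alone does not establish cause. The variables, values and the 0.8 cut-off are hypothetical.

```python
import itertools
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def candidate_hypotheses(table, min_abs_r=0.8):
    """Return (var_a, var_b, r) for pairs with |r| >= min_abs_r, strongest first."""
    pairs = []
    for a, b in itertools.combinations(table, 2):
        r = pearson(table[a], table[b])
        if abs(r) >= min_abs_r:
            pairs.append((a, b, round(r, 3)))
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)

# Hypothetical observations: each key is a measured variable, each list a series of samples.
observations = {
    "temperature": [12, 15, 18, 21, 24, 27],
    "enzyme_activity": [1.1, 1.6, 2.0, 2.4, 3.1, 3.4],
    "ph": [7.1, 6.8, 7.3, 7.0, 6.9, 7.2],
}
print(candidate_hypotheses(observations))
```

Only the temperature and enzyme-activity pair passes the cut-off. That suggests a hypothesis ("enzyme activity rises with temperature") which a controlled experiment would then confirm or refute.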

Figure 2 - Timeline of Data Science (What's The Big Data?, 2015)

DATA SCIENCE IN THE AID OF SCIENTIFIC DISCOVERIES

For example, in the field of high-energy particle physics, instruments such as the Large Hadron Collider (O'Reilly Media, 2015) collide particles at very high energies to examine their constituents. This process produces exabytes of data, which makes its analysis dependent on powerful supercomputers and advanced data science techniques. These techniques recently led to the discovery of the Higgs boson, which is considered a landmark achievement in the history of particle physics.

Another example worth mentioning comes from genetics. Researchers there are also using data science techniques to understand the relationship between complex diseases and genetic effects (Feero, Guttmacher and Manolio, 2010). So far they have identified connections between 2000 genes and 300 common human diseases and traits.
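Genome-wide association studies test many genetic variants for a statistical link to a disease. As a minimal sketch, the following applies a Pearson chi-square test to allele counts for a single variant in cases and controls. It then compares the p-value with the conventional genome-wide significance threshold of 5 × 10^-8. The counts are hypothetical and do not come from Feero, Guttmacher and Manolio (2010).

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def p_value_1df(chi2):
    """Upper-tail probability of a chi-square with 1 degree of freedom: erfc(sqrt(chi2 / 2))."""
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical allele counts for one variant:
#             risk allele   other allele
# cases           1200           800
# controls        1000          1000
a, b, c, d = 1200, 800, 1000, 1000

chi2 = chi_square_2x2(a, b, c, d)
p_value = p_value_1df(chi2)
odds_ratio = (a * d) / (b * c)
GENOME_WIDE_THRESHOLD = 5e-8  # conventional cut-off, roughly a Bonferroni correction for ~1 million tests

print(f"chi2={chi2:.2f}, p={p_value:.1e}, OR={odds_ratio}, "
      f"significant={p_value < GENOME_WIDE_THRESHOLD}")
```

The strict threshold matters because a real study repeats this test for hundreds of thousands of variants. At a looser cut-off such as 0.05, thousands of "connections" would appear by chance alone.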

Figure 3 - Large Hadron Collider (Apod.nasa.gov, 2016) and Genome (IFLScience, 2015)

WHERE DATA SCIENCE IS GOING IS YET TO BE TOLD

The potential of data science is vast and inspiring. Watch this space for more on data science. Over the coming weeks we will study it in more depth, starting with the active users of data science for scientific discovery, the challenges they face and the ways they solve them.

Tags: data science, big data, scientific discovery, research, data processing, data analytics, scientific process, inductive process.

Bibliography

Encyclopedia Britannica. (2016). physical science | Definition, History, & Topics. [online] Available at: http://www.britannica.com/science/physical-science [Accessed 22 May 2016].

Storagenewsletter.com. (2016). StorageNewsletter » Every Day We Create 2.5 Quintillion Bytes of Data. [online] Available at: http://www.storagenewsletter.com/rubriques/market-reportsresearch/ibm-cmo-study/ [Accessed 22 May 2016].

O'Reilly Media. (2015). Big science problems, big data solutions. [online] Available at: https://www.oreilly.com/ideas/big-science-problems-big-data-solutions [Accessed 22 May 2016].

Feero, W., Guttmacher, A. and Manolio, T. (2010). Genomewide Association Studies and Assessment of the Risk of Disease. New England Journal of Medicine, 363(2), pp.166-176.

Anon, (2016). [online] Available at: http://renci.org/wp-content/uploads/2015/11/SCi-Discovery-BigData-FINAL-11.23.15.pdf [Accessed 22 May 2016].

Anon, (2016). [online] Available at: https://www.boozallen.com/content/dam/boozallen/documents/2015/12/2015-FIeld-Guide-To-Data-Science.pdf [Accessed 23 May 2016].

What's The Big Data? (2015). History of Data Science (Infographic). [online] Available at: https://whatsthebigdata.com/2015/02/17/history-of-data-science-infographic/ [Accessed 23 May 2016].

Apod.nasa.gov. (2016). APOD: 2011 December 18 - Hints of Higgs from the Large Hadron Collider. [online] Available at: http://apod.nasa.gov/apod/ap111218.html [Accessed 23 May 2016].

IFLScience. (2015). Entire Human Genome Can Now Be Sequenced For Just $1,000. [online] Available at: http://www.iflscience.com/health-and-medicine/entire-human-genome-can-now-be-read-1000 [Accessed 23 May 2016].

Slideshare.net. (2016). The fourth paradigm: data intensive scientific discovery - Jisc Digif…. [online] Available at: http://www.slideshare.net/JISC/the-fourth-paradigm-data-intensive-scientific-discovery-jisc-digifest-2016/4 [Accessed 23 May 2016].