How best can we retrieve value from the rich streams of data in our profession, and introduce a solid, systematic process for analyzing that data? Here Kayser et al. describe such a process from the perspective of data science experts at Ernst & Young, offering a model that “aims to structure and systematize exploratory analytics approaches.” After discussing the building blocks for value creation, they lay out a thorough process for developing data analytics approaches. They conclude that “[t]he process as described in this work [effectively] guides personnel through analytics projects and illustrates the differences to known IT management approaches.”
The development of data science: Implications for education, employment, research, and the data revolution for sustainable development
In this 2018 paper, Murtagh and Devlin offer a historical and professional perspective on data science and on how collaborative work across multiple disciplines is increasingly common to the field. This “convergence and bridging of disciplines” strengthens methodology transfer and collaborative effort, and the integration of data and analytics guides approaches to data management. But education, research, and application challenges still await data scientists. The takeaway for the authors is that “the importance is noted of how data science builds collaboratively on other domains, potentially with innovative methodologies and practice.”
Scientists everywhere nod to the power of big data but are still left to develop tools to manage it. This is just as true in the field of agriculture, where practitioners of precision agriculture are still developing tools to do their work better. Leroux et al. have been developing their own solution, GeoFIS, to better handle geolocalized data visualization and analysis. In this 2018 paper, they use three case studies to show off how GeoFIS visualizes data, processes it, and incorporates associated data (metadata and industry knowledge) for improved agricultural outcomes. They conclude that the software fills a significant gap while also promoting the adoption of precision agriculture practices.
In this 2017 paper published in the Data Science Journal, University of Oxford’s Louise Bezuidenhout makes a case for how local challenges with laboratory equipment, research speeds, and design principles hinder adoption of open data policies in resource-strapped countries. Noting that “openness of data online is a global priority” in research, Bezuidenhout uses interviews in various African countries and corresponding research to draw conclusions that many in high-income countries may not. The main conclusion: “Without careful and sensitive attention to [the issues stated in the paper], it is likely that [low- and middle-income country] scholars will continue to exclude themselves from opportunities to share data, thus missing out on improved visibility online.”
In this educational journal article published in PLOS Computational Biology, Cole and Moore of the University of Pennsylvania’s Institute for Biomedical Informatics offer 11 tips to help health informatics researchers and practitioners improve reproducibility, knowledge sharing, and costs by adopting cloud computing. The authors compare more traditional “in-house enterprise compute systems such as high-performance computing (HPC) clusters” located in academic institutions with more agile cloud computing installations, showing various ways researchers can benefit from building biomedical informatics workflows on the cloud. After sharing their tips, they conclude that “[c]loud computing offers the potential to completely transform biomedical computing by fundamentally shifting computing from local hardware and software to on-demand use of virtualized infrastructure in an environment which is accessible to all other researchers.”
Welcome to Jupyter: Improving collaboration and reproduction in psychological research by using a notebook system
Jupyter Notebook, an open-source interactive web application for the data science and scientific computing community (and with some of the features of an electronic laboratory notebook), has been publicly available since 2015, helping scientists make computational records of their research. In this 2018 tutorial article, Friedrich-Schiller-Universität Jena’s Philipp Sprengholz presents the installation procedures and features, particularly in the context of helping psychological researchers make their research more reproducible and shareable.
Developing a file system structure to solve healthcare big data storage and archiving problems using a distributed file system
There’s been plenty of talk about big data management over the past few years, particularly in the domain of software-based management of said data. But what of the IT infrastructure, particularly in the world of healthcare, where file size and number continue to grow? Ergüzen and Ünver describe in this 2018 paper published in Applied Sciences how they researched and developed a modern file system structure that handles the intricacies of big data in healthcare for Kırıkkale University. After discussing big data problems and common architectures, the duo lay out the various puzzle pieces that make up their file system, reporting system performance “97% better than the NoSQL system, 80% better than the RDBMS, and 74% better than the operating system” via improvements in read-write performance, robustness, load balancing, integration, security, and scalability.
In this 2018 paper published in the International Journal of Interactive Multimedia and Artificial Intelligence, Baldominos et al. present DataCare, a scalable healthcare data management solution built on a big data architecture to improve healthcare performance, including patient outcomes. The system is designed to provide “a complete architecture to retrieve data from sensors installed in the healthcare center, process and analyze it, and finally obtain relevant information, which is displayed in a user-friendly dashboard.” The researchers explain the architecture and how it was evaluated in a real-life facility in Madrid, Spain. They also explain how key performance indicators are affected and how the system could be improved in the future.
Application of text analytics to extract and analyze material–application pairs from a large scientific corpus
Text analytics is a data analysis technique that is used in several industries to develop new insights, make new discoveries, and improve operations. This should be applicable to materials scientists and informaticists also, say Kalathil et al. in this 2018 paper published in Frontiers in Research Metrics and Analytics. Using a coclustering text analysis technique and custom tools, the researchers demonstrate how others can “better understand how specific components or materials are involved in a given technology or research stream, thereby increasing their potential to create new inventions or discover new scientific findings.” In their example, they reviewed nearly 438,000 titles and abstracts to examine 16 materials, allowing them to “associate individual materials with specific clean energy applications, evaluate the importance of materials to specific applications, and assess their importance to clean energy overall.”
What is “information management”? How is it used in the context of research papers across a wide variety of industries and scientific disciplines? How do the definitions vary, and can an improved definition be created? Ladislav Buřita of the University of Defense in Brno attempts to answer those questions and more in this 2018 paper published in the Journal of Systems Integration.
A systematic framework for data management and integration in a continuous pharmaceutical manufacturing processing line
Cao et al. describe the design and methodology they used in constructing a system of tighter data integration for pharmaceutical research and manufacturing in this 2018 paper published in Processes. Recognizing that the “integration of data in a consistent, organized, and reliable manner is a big challenge for the pharmaceutical industry,” the authors developed an ontological information structure relying on the ANSI/ISA-88 batch control standard, process control systems, a content management system, a purpose-built electronic laboratory notebook, and cloud services, among other aspects. The authors conclude, after describing two use cases, that “data from different process levels and distributed locations can be integrated and contextualized with meaningful information” with the help of their information structure, allowing “industrial practitioners to better monitor and control the process, identify risk, and mitigate process failures.”
In this brief education article published in PLOS Computational Biology, Barone et al. present the results of a survey asking funded National Science Foundation (NSF) Biological Sciences Directorate principal investigators whether and how their computational needs were being met. Citing several other past surveys and reports, the authors describe the state of cyberinfrastructure needs as they understood them before their survey. Then they present their results. “Training on integration of multiple data types (89%), on data management and metadata (78%), and on scaling analysis to cloud/HPC (71%) were the three greatest unmet needs,” they conclude, also noting that while hardware isn’t a bottleneck, a “growing gap between the accumulation of big data and researchers’ knowledge about how to use it effectively” is concerning.
What happens when you combine clinical big data tools and data with clinical decision support systems (CDSS)? In this 2018 journal article published in Frontiers in Digital Humanities, Dagliati et al. report two such effective implementations affecting diabetes and arrhythmogenic disease research. Through the lens of the “learning healthcare system cycle,” the authors walk through the benefits of big data tools to clinical decision support and then provide their examples of live use. They conclude that through the use of big data and CDSS, “when information is properly organized and displayed, it may highlight clinical patterns not previously considered … [which] generates new reasoning cycles where explanatory assumptions can be formed and evaluated.”
Implementation and use of cloud-based electronic lab notebook in a bioprocess engineering teaching laboratory
In this 2017 paper published in the Journal of Biological Engineering, Riley et al. of Northwestern University describe their experience with implementing the LabArchives cloud-based electronic laboratory notebook (ELN) in their bioprocess engineering laboratory course. The ultimate goal was to train students to use the ELN during the course while promoting proper electronic record keeping, including good documentation and data integrity practices. They concluded that not only was the ELN training successful and useful but also that through the use of the ELN and its audit trail features, “a true historical record of the lab course” could be maintained so as to improve future attempts to integrate the ELN into laboratory training.
When it comes to experimental materials science, there simply aren’t enough “large and diverse datasets” made publicly available, say National Renewable Energy Laboratory’s Zakutayev et al. Noting this lack, the researchers built their own High Throughput Experimental Materials (HTEM) database containing 140,000 sample entries and underpinned by a custom laboratory information management system (LIMS). In this 2018 paper, the researchers discuss HTEM, the LIMS, and how the contained sample data was derived and analyzed. They conclude that HTEM and other databases like it are “expected to play a role in emerging materials virtual laboratories or ‘collaboratories’ and empower the reuse of the high-throughput experimental materials data by researchers that did not generate it.”
“Cannabis … is an iconic yet controversial crop,” begin Dufresnes et al. in this 2017 paper published in PLOS ONE. They reveal that in actuality, due to regulations and limitations on supply, we haven’t performed the same level of genetic testing on the crop as we have on others. By turning to next-generation sequencing (NGS) and genotyping, we can empower the field of Cannabis forensics and other research tracks to make new discoveries. The researchers discuss their genetic database and how it was derived, ultimately concluding that databases like theirs and the “joint efforts between Cannabis genetics experts worldwide would allow unprecedented opportunities to extend forensic advances and promote the development of the industrial and therapeutic potential of this emblematic species.”
In this 2018 paper published in Frontiers in Neuroinformatics, Antolik and Davison present Arkheia, “a web-based open science platform for computational models in systems neuroscience.” The duo first describes the reasoning for creating the platform, as well as similar systems and their deficiencies. They then describe the platform architecture and its deployment, pointing out its benefits along the way. They conclude that as a whole, “Arkheia provides users with an automatic means to communicate information about not only their models but also individual simulation results and the entire experimental context in an approachable, graphical manner, thus facilitating the user’s ability to collaborate in the field and outreach to a wider audience.”
This brief case study by the National Institutes of Health’s (NIH) Nathan Hosburgh takes an inside look at how the NIH took on the responsibility of bioinformatics training after the National Center for Biotechnology Information (NCBI) had to scale back its training efforts. Hosburgh provides a little background on bioinformatics and its inherent challenges. Then he delves into how the NIH—with significant help from Dr. Medha Bhagwat and Dr. Lynn Young—approached the daunting task of filling the education gap on bioinformatics, with the hope of providing “a dynamic and valuable suite of bioinformatics services to NIH and the larger medical research community well into the future.”
This 2018 article published in the International Journal of Interactive Multimedia and Artificial Intelligence sees Rosas and Carnicero provide their professional take, drawn from their experience with the Spanish and other European public health systems, on the benefits and challenges of implementing big data management solutions in the world of health care. After citing numbers on public and private health expenditures in relation to population, as well as reviewing literature on the subject of big data in healthcare, the authors provide insight into some of the data systems, how they’re used, and what challenges their implementation poses. They conclude that “the implementation of big data must be one of the main instruments for change in the current health system model, changing it into one with improved effectiveness and efficiency, taking into account both healthcare and economic outcomes of health services.”
Generating big data sets from knowledge-based decision support systems to pursue value-based healthcare
With the push for evidence-based medicine and advances in health information management over the past 30 years, the process of clinical decision making has changed significantly. However, new challenges have emerged regarding how to put the disparate data found in information management technologies such as electronic health records and clinical research databases to better use while at the same time honoring regulations and industry standards. González-Ferrer et al. discuss these problems and how they’ve put solutions in place in this 2018 paper in the International Journal of Interactive Multimedia and Artificial Intelligence. They conclude that despite the benefits of clinical decision support systems and other electronic data systems, “the development and maintenance of repositories of dissociated and normalized relevant clinical data from the daily clinical practice, the contributions of the patients themselves, and the fusion with open-access data of the social environment” will all still be required to optimize their benefits.