For information technology professionals and informaticists alike, when handling data, the idea of “garbage in, garbage out” remains a popular refrain. Collecting data isn’t enough; its quality for future analysis, sharing, and use is also important. Similarly, with the growth of the internet, the amount of health-related information being published online increases, but its quality isn’t always attended to. In this 2018 paper by Al-Jefri et al., the topic of online health information quality (IQ) gets addressed in the form of a developed framework “that can be applied to websites and defines which IQ criteria are important for a website to be trustworthy and meet users’ expectations.” The authors conclude with various observations, including differences in how education, gender, and linguistic background affect users’ ability to gauge information quality, and how there seems to be an overall lack of concern among the public at large about the ethical trustworthiness of online health information.
In this 2018 article published in Future Internet, Teixeira et al. test five machine learning algorithms in a supervisory control and data acquisition (SCADA) system testbed to determine whether or not machine learning is useful in cybersecurity research. Given the increasing number and sophistication of network-based attacks on industrial and research sensor networks (among others), the authors assessed the prior research of others in the field and integrated their findings into their own SCADA testbed dedicated to controlling a water storage tank. After training the algorithms and testing the system with attacks, they concluded that the Random Forest and Decision Tree algorithms were best suited for the task, showing “the feasibility of detecting reconnaissance attacks in [industrial control system] environments.”
Semantics for an integrative and immersive pipeline combining visualization and analysis of molecular data
The field of bioinformatics has taken off over the past decade, and with it have grown the number of data sources and the need for improved visualization tools, including in the realm of three-dimensional visualization of molecular data. As such, Trellet et al. have developed the infrastructure for “an integrated pipeline especially designed for immersive environments, promoting direct interactions on semantically linked 2D and 3D heterogeneous data, displayed in a common working space.” The group discusses in detail bioinformatics ontologies and semantic representation of bioinformatics knowledge, as well as vocal-based query management with such a detailed system. They conclude that their “pipeline might be a solid base for immersive analytics studies applied to structural biology,” including the ability to propose “contextualized analysis choices to the user” during interactive sessions.
From bioinformatics applications to social media research, the volume and velocity of data to manage continue to grow. Analysis of this massive stream of data requires new ways of thinking, including new software, hardware, and programming tools. In this 2019 paper published in Journal of Cloud Computing, Domenico Talia of the University of Calabria in Italy presents a detailed look at exascale computing systems as a way to manage and analyze this river of data, including the use of cloud computing platforms and exascale programming systems. After a thorough discussion, the author concludes that while “[c]loud-based solutions for big data analysis tools and systems are in an advanced phase both on the research and the commercial sides,” more work remains in the form of finding solutions to a number of design challenges, including on the data mining side of algorithms.
Transferring exome sequencing data from clinical laboratories to healthcare providers: Lessons learned at a pediatric hospital
We go back in time a year for this brief paper published by Swaminathan et al. of Nationwide Children’s Hospital in Ohio. The researchers present their experiences handling the nuances of transferring large genomic data files of individual patients from a sequencing lab to the hospital, all while handling the security and privacy protections surrounding the data. Handling only 19 patients’ genomic files, at least initially, presented a number of workflow and protocol challenges for both hospital and laboratory. They conclude with barriers (file size and workflow management consistency) and suggestions (EHR-based alerts, blockchain) about what could be improved with such data transfers in the future to better realize the “massive potential to leverage genomic data to advance human health overall.”
Whether it’s a document management system or a laboratory information management system, some sort of query function is involved to help the user find specific documents or data. How that data is retrieved—using information retrieval methods—can vary, however. In this 2019 paper published in EURASIP Journal on Wireless Communications and Networking, Binbin Yu details a modified information retrieval methodology that uses a domain-ontology-based approach that integrates document processing and retrieval aspects of the query. Domain ontology takes into account semantic information and keywords, which improves recall and precision of results. After explaining the mathematics and experimentation methodology, Yu concludes “the genetic algorithm shortens the distance compared with simulated annealing, and the ontology retrieval model exhibits a better precision and recall rate to understand the users’ requirements.”
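For readers less familiar with the two measures mentioned above, precision and recall can be illustrated with a minimal Python sketch. This is not Yu's ontology-based method or experimental code; the function name `precision_recall` and the sample document IDs are hypothetical, chosen only to show how the two rates are computed for a single query.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document IDs, not data from the paper.
retrieved_docs = ["d1", "d2", "d3", "d4"]  # what the system returned
relevant_docs = ["d2", "d4", "d5"]         # what the user actually needed

p, r = precision_recall(retrieved_docs, relevant_docs)
```

Here two of the four returned documents are relevant (precision of 0.5), and two of the three relevant documents were found (recall of about 0.67). A retrieval model that improves both numbers returns fewer irrelevant results while missing fewer relevant ones.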
In this late 2018 paper published in BMC Medical Informatics and Decision Making, Pathinarupothi et al. with the Amrita Institute of Medical Sciences present their Rapid Active Summarization for Effective Prognosis (RASPRO) framework for healthcare facilities. Noting an increasing volume of data coming from body-attached sensors and a lack of means to make sense of it, the researchers developed RASPRO to provide summarized patient/disease-specific trends via body sensor data and aid physicians in being proactive in more rapidly identifying the onset of critical conditions. This is done through the implementation of “physician assist filters” or PAFs, which also enable succinctness and decision making even over bandwidth-limited communication networks. They conclude the system “helps in personalized, precision, and preventive diagnosis of the patients” while also providing the benefits of availability, accessibility, and affordability for healthcare systems.
What do you do when your newborn screening program grows in importance, beyond its original data management origins, in a time of cloud computing and integrated informatics systems for healthcare? Entities such as Newborn Screening Ontario (NSO) have risen to the challenges inherent to this question, undertaking an end-to-end assessment of their needs and existing capabilities, in the process deciding on “a holistic full product lifecycle redesign approach.” This paper describes the full process as conducted by NSO, from theory to practice. The authors conclude “that developing, implementing, and deploying a [screening information management system] is about much more than the technology; team engagement, strong leadership, and clear vision and strategy can lead newborn screening programs looking to do the same to success and long-term gains in patient outcomes.”
Adapting data management education to support clinical research projects in an academic medical center
In this 2019 paper written by New York University School of Medicine’s Kevin B. Read, the topic of clinical research data management (CRDM) is discussed, particularly in its application at the NYU Health Sciences Library. Identifying a strong need by the clinical research community at the university for CRDM training, Read, acting as the Data Services Librarian and Data Discovery Lead, developed a curriculum to support such a mission and offered training. This article details that journey, ending with supporting data and a strong feeling that the end result is a “research community being better trained, more compliant, and increasingly aware of established institutional workflows for clinical research.”
Development of an electronic information system for the management of laboratory data of tuberculosis and atypical mycobacteria at the Pasteur Institute in Côte d’Ivoire
In this 2019 paper, Koné et al. of the Pasteur Institute of Côte d’Ivoire provide insight into their self-developed laboratory information system (LIS) specifically designed to meet the needs of clinicians treating patients infected with Mycobacterium tuberculosis. After discussing its design, architecture, installation, training sessions, and assessment, the group describes system launch and how its laboratorians perceived the change from paper to digital. With some discussion, they conclude they have improved, more real-time “indicators on the follow-up of samples, the activity carried out in the laboratory, and the state of resistance to antituberculosis treatments” with the conversion.
Codesign of the Population Health Information Management System to measure reach and practice change of childhood obesity programs
Attempting to implement a regional public health initiative affecting thousands of children is daunting enough, but collecting, analyzing, and reporting critical data that shows efficacy can be even more challenging. This 2018 article published in Public Health Research & Practice demonstrates one approach to such an endeavor in New South Wales, Australia. Green et al. discuss the design and implementation of their Population Health Information Management System (PHIMS) to integrate and act upon data associated with not one but two related public health programs targeting the prevention of childhood obesity. The article also discusses some of the challenges with the project, from funding and training all 15 New South Wales local health districts to ensuring support across all the districts for consistent operation and security despite differing IT infrastructures. They conclude that despite the challenges, their award-winning PHIMS solution has been vital to the two programs’ success.
In this brief paper published in Folia Forestalia Polonica, Series A – Forestry, Dorota Grygoruk of Poland’s Forest Research Institute presents the development of the open data concept within the context of Poland and other countries, while also addressing how data sharing and management are challenged by the paradigm. Grygoruk first defines the open data and open access concepts and then describes how policy in Poland and the European Union has been adopted to specify those concepts within institutions. The author then analyzes the challenges of implementing data sharing inherent to research data management, including within the context of forestry informatics. The conclusion? The “organizational and technological solutions that enable analysis” are increasingly vital, and “it becomes necessary for research institutions to implement data management policies,” including data sharing policies.
In the inaugural issue of the journal Energy Informatics, Watson et al. of the University of Georgia – Athens provide research and insight into how databases, data streams, and schedulers can be joined with an information system to drive more cost-effective energy production for greenhouses. Combining past research and new technologies, the authors turn their sights to food security and the importance of developing more efficient systems for greater sustainability. They conclude that an energy informatics framework applied to controlled-environment agriculture can significantly reduce energy usage for lighting, though “engaging growers will be critical to adoption of information-systems-augmented adaptive lighting.”
In this brief collaborative article by various researchers in the United Kingdom, a statement of fact is quickly set out for the reader: health data science and clinical informatics have a considerable gap between each other that must be addressed. Wasting no time, Scott et al. dig into the U.K. context of “the operational realities of health data quality and the implications for data science.” Collected clinical data is “problematic,” they claim, and clinical informaticians don’t always link the “two cultures” of using 1. clinical data and knowledge as a primary tool to 2. improve human health outcomes. They close by recognizing existing efforts to bridge the gap between the two cultures and make recommendations of their own such as recognizing “the interdisciplinary nature of biomedical informatics” and a need for “a significant expansion of clinical informatics capacity and capability.”
Kristin Briney, Data Services Librarian at the University of Wisconsin – Milwaukee, gives a brief commentary on the perils of managing research data with inconsistent or non-standardized date formats. Tapping into the stories of statisticians and ecologists, Briney notes that despite being a more western, Gregorian-based system, the international standard ISO 8601 provides benefits of consistency, formatting, extensibility, and sorting. And while ISO 8601 doesn’t play nicely with Microsoft Excel, the author provides several ways around the problem. She concludes that “ISO 8601 is a natural partner for research data management” and encourages other researchers to adopt the standard.
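The sorting benefit noted above can be shown with a short Python sketch. The sample dates and the helper `to_iso` are hypothetical illustrations, not taken from Briney's commentary, and the month-first input format is an assumption made for the example.

```python
from datetime import datetime

# Hypothetical sample dates, written two ways.
iso_dates = ["2019-03-07", "2018-11-23", "2019-01-15"]
us_dates = ["03/07/2019", "11/23/2018", "01/15/2019"]

# ISO 8601 strings run from largest unit (year) to smallest (day),
# so a plain text sort is also a chronological sort.
iso_sorted = sorted(iso_dates)

# Month-first strings sort by month, which puts November 2018 last.
us_sorted = sorted(us_dates)


def to_iso(text, fmt="%m/%d/%Y"):
    """Convert a date string in the assumed format to ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(text, fmt).date().isoformat()


# Converting first restores chronological order under a plain sort.
converted = sorted(to_iso(d) for d in us_dates)
```

After conversion, `converted` matches `iso_sorted`, while `us_sorted` places the 2018 date after both 2019 dates. The same property holds for file names and spreadsheet columns stored as text, which is one reason the format suits research data management.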
What can medical librarians do to better support patrons? How can clinical medicine and research librarians work together to foster an environment of improved research cycles and patient outcomes? Bardyn et al. address these concerns and others through a demonstration of what the University of Washington’s Translational Research and Information Lab (TRAIL) program has accomplished since its inception. The authors introduce basic concepts in clinical and translational research and then provide background and methodology for how they improved researcher-focused spaces, clinical research support services, and research data management services. They conclude that “initiatives like TRAIL are vital to supporting universities’ clinical data research efforts,” noting that “[i]n uniting leading on-campus health sciences organizations, such initiatives build off the strengths of each partner” and encourage new skill sets to be developed to support cross-discipline research on campus.
This brief article published in Journal of Taibah University Medical Sciences in 2017 looks at public health informatics (PHI) from the perspective of a researcher in the Kingdom of Saudi Arabia. Aziz discusses the concept of PHI and then looks at the various surveillance systems within PHI. Later he delves into the challenges provided by paper-based systems and how electronic systems can alleviate them. He closes with a discussion of PHI in the Kingdom of Saudi Arabia and concludes that various “applications and initiatives are currently available to meet the growing needs for faster and accurate data collection methods” in the country, as well as around the world.
The development and application of bioinformatics core competencies to improve bioinformatics training and education
In this 2018 article by Mulder et al., a broad collective of knowledge and experience is brought together to better shape the competencies required for a modern bioinformatics education program and their training contexts. Need is immense, yet methodologies are diverse, necessitating cooperation to refine core competencies for different groups. The authors describe the development of these competencies and then provide practical use cases for them. They conclude the competencies “provide a basis for the community of bioinformatics educators, despite widely divergent goals and student populations, to draw upon their common experiences in designing, refining, and evaluating their own training programs.” However, they also caution that they shouldn’t be viewed as “a prescription for a specific set of curricula or curricular standards.”
This paper by Malykh and Rudetskiy “discusses different approaches to building a clinical decision support system based on big data,” with a focus on non-biased processing methods and their comparative assessments. After an in-depth analysis of methods and objectives, the authors present their findings from the clinical decision support data and their significance. They conclude that case-based and precedent-based approaches each have their advantages, including more accurate recommendations and faster system speeds, but are not without disadvantages. The authors suggest future research is needed to address “problems with optimization of provided metrics, compression of state descriptions, and construction of training procedures.”