One tool to find them all: A case of data integration and querying in a distributed LIMS platform

In this early 2019 article published in the journal Database, Grand et al. of the Candiolo Cancer Institute present the fine details of their laboratory information management system (LIMS), the Laboratory Assistant Suite (LAS), built for cancer and other genomic research. Citing “a substantial mismatch between the LIMS solutions on offer and the functional requirements dictated by research practice,” the authors describe the requirements they had for a LIMS at their institution and how they went about creating it. After describing the data models, functionalities, modular architecture, and usage of the LAS, the authors conclude that it, in conjunction with a custom data management module, allows researchers to “execute complex queries without any knowledge of query languages or database structures, and easily integrate heterogeneous data stored in multiple databases,” while also improving data quality, reducing the effort of data entry and retrieval, and yielding new insights through the enabled data interconnections.

What is the “source” of open-source hardware?

While the established definitions surrounding “open-source” software have largely sufficed for software, the term as applied to tangible hardware products remains insufficiently defined, argue Bonvoisin et al. in this 2017 paper published in the Journal of Open Hardware. Their work analyzes 132 proclaimed open-source “non-electronic and complex open-source hardware products.” After lengthy background information and discussion of their methods and results, the authors conclude: “The empirical results strongly indicate the existence of two main usages of open-source principles in the context of tangible products: publication of product-related documentation as a means to support community-based product development and to disseminate privately developed innovations. It also underlines the high variety of interpretations and even misuses of the concept of open-source hardware.”

From command-line bioinformatics to bioGUI

The topic of making bioinformatics applications more approachable to researchers and students has been discussed off and on for years, and some efforts have even been made in that regard. Another step forward for bioinformatics applications is offered by Joppich and Zimmer of Ludwig-Maximilians-Universität München, with their open-source bioGUI. The software attempts to address two problems of bioinformatics applications that rely heavily on the command line: many of them work on Unix-based systems but not Microsoft Windows, and researchers have a tendency to shy away from complex command-line apps despite their utility. The authors present in detail their framework and its use cases, showing how a graphical user interface or GUI can make many such command-line apps more approachable. They conclude that providing a GUI and easy-to-use install modules for bioinformatics apps using the command line makes “execution and usage of these tools more comfortable” while allowing scientists to better analyze their data.
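
At its core, a tool like bioGUI translates values entered in a graphical form into the command-line invocation a researcher would otherwise have to type by hand. The sketch below illustrates that idea in minimal form; the tool name and flags shown are illustrative examples, not bioGUI's actual template format.

```python
import shlex

# Hedged sketch: render a dict of GUI form values (text fields, checkboxes)
# as a safely quoted command line, the way a GUI wrapper for a CLI tool might.
def build_command(tool, params):
    """Build a shell-safe command string from GUI-style form inputs."""
    parts = [tool]
    for flag, value in params.items():
        if value is True:                     # checkbox -> bare flag
            parts.append(flag)
        elif value not in (None, False, ""):  # text field -> flag + value
            parts.extend([flag, str(value)])
    return " ".join(shlex.quote(p) for p in parts)

# Example: form inputs for an aligner run (illustrative values)
cmd = build_command("hisat2", {"-x": "genome index", "-U": "reads.fq",
                              "--no-unal": True})
# cmd -> "hisat2 -x 'genome index' -U reads.fq --no-unal"
```

Quoting each argument with `shlex.quote` is what keeps values containing spaces (a common stumbling block for command-line novices) from silently breaking the call.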

ChromaWizard: An open-source image analysis software for multicolor fluorescence in situ hybridization analysis

In this brief article published in Cytometry Part A, researchers at the Austrian Centre for Industrial Biotechnology present their open-source multiplex fluorescence in situ hybridization (M‐FISH) software for chromosome painting. The tool—ChromaWizard—acts as a free and open-source option for hybridization analysis, integrating “image processing, multicolor integration, chromosome separation, and visualization with false color assignments.” The software can handle images in TIFF, PNG, and JPEG formats and provides robust visualization tools. The authors conclude that ChromaWizard “allows direct inspection of the original hybridization signals and enables either manual or automatic assignment of colors, making it a functional and versatile tool that can also be used for other multicolor applications.”

Haves and have nots must find a better way: The case for open scientific hardware

Open-source software has been a topic of discussion for decades, both as a model of software development and distribution and for the broader potential of what can be done with it. But the concept of open-source hardware, particularly in the field of science, has been more challenging to address. In this brief perspective article, the University of Tübingen’s André Maia Chagas discusses the benefits of open science hardware in addressing the growing concern of “haves and have nots” in the scientific research community. Citing high prices, the overall closed-source nature of equipment, businesses that can close without warning, and poor customer support, Chagas attempts to demonstrate that advances in modern design—such as the smartphone—and organizational efforts to implement and promote open hardware philosophies provide the opportunity for more people to engage in scientific endeavors. He concludes we “need to reassess our relationship to knowledge and technology, how it determines our role in society, and how we want to spend grant money entrusted to us by the people,” and that by focusing on making hardware more open and accessible, we’ll shrink the divide and improve scientific research as a result.

CytoConverter: A web-based tool to convert karyotypes to genomic coordinates

In this brief article published in BMC Bioinformatics, Wang and LaFramboise address the topic of cytogenetic data and genomic coordinates, which the authors describe as “precisely [specifying] a chromosomal location according to its distance from the end of the chromosome.” The authors note that despite changes in techniques over the years, karyotypes and cytogenetic nomenclature have long been the primary means of characterizing chromosomal aberrations, and they remain in use today; archival data in particular is recorded in that nomenclature. Given the lack of a maintained, robust tool for converting that nomenclature to genomic coordinates, the authors present their creation, CytoConverter, and explain how it accomplishes such conversions. They conclude that the tool should have “considerable value to the community for analyzing archival patient samples, as well as samples for which higher-resolution copy number data is unavailable.”
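
The essence of such a conversion is a lookup from a cytogenetic band name to its base-pair span on the chromosome. The sketch below shows that idea with a tiny hand-made band table; the coordinates are illustrative placeholders, not CytoConverter's data (real tools draw on full cytoband tables such as those distributed by UCSC).

```python
# Minimal sketch of karyotype-band-to-coordinate conversion.
# Coordinates below are illustrative only, not real cytoband boundaries.
BAND_COORDS = {
    ("17", "p13.3"): (1, 3_400_000),
    ("17", "p13.1"): (6_500_000, 10_800_000),
}

def band_to_coordinates(chrom: str, band: str):
    """Return the (start, end) genomic span for a cytogenetic band."""
    try:
        return BAND_COORDS[(chrom, band)]
    except KeyError:
        raise ValueError(f"Unknown band {chrom}{band}")

# e.g. the karyotype segment 'del(17)(p13.1)' names chromosome 17, band p13.1,
# which this table resolves to a base-pair interval
start, end = band_to_coordinates("17", "p13.1")
```

A full converter additionally has to parse ISCN karyotype strings and handle ranges spanning several bands, but the band table is the core of the mapping.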

Implementing a novel quality improvement-based approach to data quality monitoring and enhancement in a multipurpose clinical registry

The clinical registry is an important tool in assisting those affected by rare diseases. Using observational study methods, a registry collects, and assists with the analysis of, data related to a particular disease or condition, serving researchers and informing policies that aid the afflicted. While studies have examined the “fit for use” nature of registry data, data quality studies of registries using specific quality improvement approaches have been fewer. Pratt et al. give their take on a quality improvement scheme for registries in this 2019 paper, tapping into the significant data generated by the ImproveCareNow (ICN) Network and its goal of improving outcomes of pediatric and adolescent inflammatory bowel diseases. After walking through their methodology, the authors conclude that data quality improvement campaigns, particularly those that include support for training and tools, can improve “the completeness and consistency” of registry data and, by extension, result “in a higher level of confidence when accessing the data for various purposes, including clinical decision making.”

Fast detection of 10 cannabinoids by RP-HPLC-UV method in Cannabis sativa L.

In this 2019 article published in Molecules, Mandrioli et al. detail the process they developed for detecting 10 cannabinoids in Cannabis sativa L. using reversed-phase high-performance liquid chromatography with an ultraviolet detector (RP-HPLC-UV). Citing numerous other published methods, many of them requiring more expensive mass spectrometry detectors, the authors sought to “identify and titrate cannabinoids in a simple way” using a method that optimally would be “fast, easy, robust, and cost-efficient,” making the process more approachable not only for research laboratories but also for small businesses focused on quality control. The authors conclude that their method can produce results in eight minutes, with high sensitivity and simple methodology.

What is this sensor and does this app need access to it?

We use our mobile phones daily, and many of us don’t give consideration to whether or not those devices are tracking or monitoring our activities. At the root of this cybersecurity issue is, most frequently, the permissions given to one or more applications on the device to access one or more sensors contained in the device. The lackadaisical attitude of the average user towards the cybersecurity of their mobile device can be attributed to a variety of factors, including poor education regarding smartphone use, low public awareness, and ignorance due to developers’ stealthy or “permission hungry” methodologies. Mehrnezhad and Toreini discuss these issues and more at length in this 2019 paper published in Informatics, concluding that while “teaching about general aspects of sensors might not immediately improve people’s ability to perceive the risks,” over time users may “successfully identify over-privileged apps” and make more informed decisions about “modifying the app permissions, uninstalling, or keeping it as-is.”

AI meets exascale computing: Advancing cancer research with large-scale high-performance computing

In this 2019 review paper published in Frontiers in Oncology, Bhattacharya et al. describe the state of collaborative, artificial-intelligence-based computational cancer research within various agencies and departments of the United States. The researchers point to three major initiatives that aim “to push the frontiers of computing technologies in specific areas of cancer research” at the cellular, molecular, and population levels. They present details concerning the three initiatives, enacted as pilot programs with specific goals: Pilot One “to develop predictive capabilities of drug response in pre-clinical models of cancer,” Pilot Two “on delivering a validated multiscale model of Ras biology on a cell membrane,” and Pilot Three “to leverage high-performance computing and artificial intelligence to meet the emerging needs of cancer surveillance.” The emerging opportunities and challenges that continue to arise out of these pilots are also addressed, before the authors conclude that “opportunities for extreme-scale computing in AI and cancer research extend well beyond these pilots.”

Building infrastructure for African human genomic data management

Genomic and sequencing data are inherently complex and have significant storage requirements. They require a robust infrastructure with well-considered policies to make the most of their potential. While North America and Europe have helped lead the way in this goal, Africa is behind them in the adoption of genomic technologies. Parker et al., of the Human Hereditary and Health in Africa (H3Africa) program, have taken on the challenge of provisioning and managing the infrastructure required to meet the goals of various Africa-based genetic research projects. This paper describes the H3Africa Data Archive, “the first formalized human genomic data archive on the continent.” The authors discuss their process and findings, noting various challenges that arose during the implementation process, as well as recognizing the various benefits of such a project.

Process variation detection using missing data in a multihospital community practice anatomic pathology laboratory

In this brief journal article by Ochsner Health System’s Gretchen Galliano, a case is made for a programmatic approach to analyzing missing data in various laboratory information systems (LIS) and determining potential correlations with procedural and systemic processes in the health system. Using the R programming language, custom scripts, and existing R packages, the health system visualized and analyzed data from more than 70,000 cases with missing timestamp data, splitting the cases into five pools. Galliano concludes that the process of “evaluating cases with missing predefined process timestamps” has the potential for improving “the ability to detect other data variations and procedure noncompliance in the AP workflow in a prospective fashion.” She adds that as an additional benefit, “[p]eriodically evaluating data patterns can give AP LIS teams and operations teams insight into user–LIS interactions and may help identify areas that need focus or updating.”
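
The paper's analysis was done in R; as a hedged, language-agnostic illustration of the core idea, the Python sketch below flags cases whose predefined process timestamps are absent and tallies them by workflow step. The step and column names are assumptions for illustration, not the author's actual schema.

```python
import pandas as pd

# Illustrative anatomic pathology cases; None marks a missing timestamp.
steps = ["accessioned", "grossed", "signed_out"]
cases = pd.DataFrame({
    "case_id":    ["AP1", "AP2", "AP3"],
    "accessioned": ["2019-01-02", "2019-01-02", "2019-01-03"],
    "grossed":     ["2019-01-03", None, "2019-01-04"],
    "signed_out":  [None, None, "2019-01-05"],
})

# Reshape to one row per (case, step), then keep only the missing timestamps
missing = (
    cases.melt(id_vars="case_id", value_vars=steps,
               var_name="step", value_name="ts")
    .loc[lambda df: df["ts"].isna()]
)

# Count missing timestamps per workflow step -- pools like these are the
# starting point for asking which process produced the gap
counts = missing.groupby("step")["case_id"].count()
```

Grouping the gaps by step (rather than just counting them) is what turns missing data from a nuisance into a signal about where the workflow deviates.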

Development and validation of a fast gas chromatography–mass spectrometry method for the determination of cannabinoids in Cannabis sativa L

In this 2018 paper published in the Journal of Food and Drug Analysis, Cardenia et al. discuss their development of a “routine method for determining cannabinoids” in the flowers of Cannabis sativa L. using fast gas chromatography coupled to mass spectrometry (fast GC-MS), with appropriate derivatization approaches that take into account potential decarboxylation. The authors discuss the various problems with other methods and then present their materials and methods. After considerable discussion of their results, they conclude that the procedure is fast (within seven minutes), has good resolution (R > 1.1), and remains cost-effective. Sensitivity was also high, with “a high repeatability and robustness in both cannabinoid standard mixtures and hemp inflorescence samples.”

Design and refinement of a data quality assessment workflow for a large pediatric research network

Clinical data research networks (CDRNs)—consisting of a variety of healthcare delivery organizations that share deidentified clinical data for clinical research purposes—constitute yet another collaborative mechanism for scientific researchers to pool data and make new discoveries. However, one of the faults of CDRN data is that it typically comes from electronic health records (EHRs), which contain data geared more towards supporting “clinical operations rather than clinical research.” This makes data quality of the utmost importance when pooling and putting to effective use such disparate data sources. In this research, Khare et al. propose a systematic workflow for assessing the quality of a CDRN’s data before use, a workflow that includes hundreds of systematic data checks and a GitHub-based reporting system to track and correct issues in a more timely fashion. They conclude that their publicly available toolkit has definite value, though implementers should be advised that “sufficient resources should be dedicated for investigating problems and optimizing data” due to the time-intensive nature of the entire process.
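
The workflow's core pattern, running a battery of named checks over extracted records and reporting only the failures as trackable issues, can be sketched briefly. The check names, fields, and rules below are illustrative assumptions, not the authors' actual toolkit.

```python
# Hedged sketch of systematic data-quality checks over EHR-derived records.
def check_no_future_dates(records):
    """Flag records whose visit date falls after the data extraction date."""
    return [r["id"] for r in records if r["visit_date"] > r["extract_date"]]

def check_required_fields(records, fields=("id", "birth_date")):
    """Flag records missing any required field."""
    return [r.get("id") for r in records
            if any(r.get(f) in (None, "") for f in fields)]

def run_checks(records):
    """Run every registered check; report only checks that found problems,
    mirroring the idea of filing each failure as a trackable issue."""
    checks = {
        "future_visit_date": check_no_future_dates,
        "missing_required_field": check_required_fields,
    }
    return {name: hits for name, fn in checks.items() if (hits := fn(records))}

# Two illustrative records: p1 is clean, p2 fails both checks
records = [
    {"id": "p1", "birth_date": "2010-01-01",
     "visit_date": "2019-05-01", "extract_date": "2019-06-01"},
    {"id": "p2", "birth_date": "",
     "visit_date": "2019-07-01", "extract_date": "2019-06-01"},
]
report = run_checks(records)
```

Keeping each check as a small named function is what lets a real toolkit scale to hundreds of checks while still producing issue reports a data coordinator can act on.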

Identification of Cannabis sativa L. (hemp) retailers by means of multivariate analysis of cannabinoids

In this 2019 article published in Molecules, Palmieri et al. demonstrate their ability to use nine cannabinoids, a specific analytical method, and multivariate analysis—without any other identifying information—to identify the retailer of 161 hemp samples from four retailers. Highlighting the fact that simply using analyses of Δ9-tetrahydrocannabinol (THC) and cannabidiol (CBD) “to extrapolate the phytochemical composition of hemp” may be insufficient in some cases, the researchers turn to high-performance liquid chromatography–tandem mass spectrometry (HPLC-MS/MS) and partial least squares discriminant analysis (PLS-DA) to identify hemp sample origins. The authors note that using their techniques, “92% of the hemp samples were correctly classified by the cannabinoid variables in both fitting and cross-validation.” They conclude “that a simple chemical analysis coupled with a robust chemometric method could be a powerful tool for forensic purposes.”
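
The authors' classifier is PLS-DA; as a deliberately simplified stand-in for the underlying idea of assigning a sample to a retailer by its cannabinoid profile, the sketch below uses a nearest-centroid rule on synthetic nine-cannabinoid concentration vectors. Everything here (data, retailer count, noise level) is a fabricated illustration, not the paper's data or method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data: 3 "retailers", 10 samples each, 9 cannabinoids
centroids_true = rng.uniform(0.1, 2.0, size=(3, 9))
train = {r: centroids_true[r] + rng.normal(0, 0.05, size=(10, 9))
         for r in range(3)}

# Per-retailer mean profile learned from the training samples
centroids = {r: x.mean(axis=0) for r, x in train.items()}

def classify(profile):
    """Assign a cannabinoid profile to the retailer with the nearest centroid."""
    return min(centroids, key=lambda r: np.linalg.norm(profile - centroids[r]))

# A new sample drawn near retailer 1's true profile
sample = centroids_true[1] + rng.normal(0, 0.05, size=9)
```

PLS-DA improves on this naive rule by projecting the correlated cannabinoid variables onto a few latent components before discriminating, which is what makes it robust for real, noisy chemometric data.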

Data sharing at scale: A heuristic for affirming data cultures

The concepts of data sharing and open science have been touted increasingly over the past decade, often in the face of claims of a lack of reproducibility and the need for more collaboration across scientific disciplines. At times researchers will point to a specific “culture” evident in their organization that helps or hinders the move towards data sharing. But aligning data cultures—particularly through the lens of identifying and solving the inherent differences between disciplines—isn’t the way to look at data sharing, argue Poirier and Costelloe-Kuehn. Instead, we must “showcase and affirm the diversity of traditions and modes of analysis that have shaped how data gets collected, organized, and interpreted in diverse settings,” they say. In this essay, the authors present their heuristic (a problem-solving and self-discovery method) for sharing data at scale, from the meta level down to the nano level, giving researchers the tools to “affirm and respect the diversity of cultures that guide global and interdisciplinary research practice.”

Design and evaluation of a LIS-based autoverification system for coagulation assays in a core clinical laboratory

In this 2019 paper published in BMC Medical Informatics and Decision Making, Wang et al. of China Medical University demonstrate the results of an attempt to add autoverification mechanisms for coagulation assays into their laboratory information system (LIS) to better improve both operations and patient care. After providing background on coagulation assays and autoverification guidelines, the researchers describe their methodology for programmatically developing autoverification decision rules and implementing them into their laboratory workflow. Additionally, they discuss how best to assess and validate the new system and its results. The authors conclude that the new system has not only improved turnaround time in the lab but also improved the level of medical safety in diagnoses at the affiliated hospitals.
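
Autoverification logic of this kind typically chains rules such as reportable-range, delta, and normal-range checks, releasing a result only if every rule passes. The sketch below illustrates that pattern for a prothrombin time (PT) result; the limits and delta threshold are illustrative assumptions, not the validated rules from the paper.

```python
# Hedged sketch of LIS-style autoverification for a coagulation result.
NORMAL_RANGE = (9.0, 13.0)      # PT, seconds (illustrative)
REPORTABLE_RANGE = (5.0, 120.0) # instrument reportable range (illustrative)
DELTA_LIMIT = 3.0               # max change vs. patient's previous result

def autoverify(pt_seconds, previous=None):
    """Return (released, reason). A result failing any rule is held
    for manual technologist review instead of being auto-released."""
    lo, hi = REPORTABLE_RANGE
    if not lo <= pt_seconds <= hi:
        return False, "outside reportable range"
    if previous is not None and abs(pt_seconds - previous) > DELTA_LIMIT:
        return False, "delta check failed"
    n_lo, n_hi = NORMAL_RANGE
    if not n_lo <= pt_seconds <= n_hi:
        return False, "abnormal result flagged for review"
    return True, "auto-released"
```

The value of encoding the rules this way is that every release decision is reproducible and auditable, which is also what makes the validation step the authors describe tractable.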

CyberMaster: An expert system to guide the development of cybersecurity curricula

In this 2019 paper published in the International Journal of Online and Biomedical Engineering, authors Hodhod et al. present their expert system CyberMaster, designed “to assist inexperienced instructors with cybersecurity course design.” They highlight the need for improved cybersecurity training not only in universities but also in the workplace, and give some underlying reasons for why cybersecurity issues seem to be increasing. The authors turn to the National Institute of Standards and Technology (NIST) National Initiative for Cybersecurity Education Framework (NICE Framework) for the development of CyberMaster. After describing its creation and implementation, they conclude that “[t]he system contributes to changing the current status of cybersecurity education by helping instructors anywhere in the world to develop cybersecurity courses.”

Costs of mandatory cannabis testing in California

Compared to other U.S. states, California arguably has some of the strictest laws regarding the laboratory testing of cannabis. Economically, what have been some of the effects of these regulations? Valdes-Donoso et al. attempt to contribute to that conversation in this 2019 paper published in California Agriculture. Using state regulations, expert opinions, primary data from California’s laboratories, and data from cannabis testing equipment manufacturers, the authors estimate the cost per pound of testing and sampling under the state’s regulatory framework, including what they consider particularly costly: cases where cannabis is rejected after failing testing. They conclude by discussing the economic and regulatory implications of their findings, including supply and demand issues, the costs of legal versus illegal cannabis, and comparisons to other state-mandated agricultural testing in the state.

An integrated data analytics platform

In this brief paper published in Frontiers in Marine Science, Armstrong et al. present the details of their OceanWorks integrated data analytics platform (IDAP) (which later was open sourced as the Apache Science Data Analytics Platform [SDAP]). Confronted with disparate data management solutions for performing research on oceanographic research data, the authors developed OceanWorks to provide an integrated platform capable of advanced data queries, data analysis, anomaly detection, data matching, data subsetting, and more. Since its creation, OceanWorks has been deployed in multiple NASA environments to handle a wide variety of data management tasks at various deployment intensities. They conclude that under its open-source SDAP iteration, the software platform will “continue to evolve and leverage any new open-source big data technology” in order “to deliver fast, web-accessible services for working with oceanographic measurements.”