Practical approaches for mining frequent patterns in molecular datasets

Most researchers in the life sciences are now familiar with the concept of “big data” and the push to better organize and mine the data that biological research produces. However, the techniques used to extract promising and interesting information from accumulating biological datasets are still developing. In this 2016 paper published in Bioinformatics and Biology Insights, Naulaerts et al. look at three software tools for mining frequent patterns in complex molecular datasets and presenting the results. They conclude that while the Apriori and arules tools each have their benefits and drawbacks, “MIME showed its value when subsequent mining steps were required” and is inherently easy to use.
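Apriori, the first tool named above, rests on a simple level-wise idea: an itemset can only be frequent if all of its subsets are frequent, so candidate itemsets of size k are built only from frequent itemsets of size k-1. The following is a minimal pure-Python sketch of that idea. It is not the implementation evaluated by Naulaerts et al. and not the arules or MIME interface; the function name and the toy gene-sample transactions are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset whose support
    (fraction of transactions containing it) is >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            # Keep only unions of size k whose (k-1)-subsets are all frequent
            # (the Apriori property prunes everything else before counting).
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy data: genes observed in five samples.
samples = [
    {"geneA", "geneB", "geneC"},
    {"geneA", "geneB"},
    {"geneA", "geneC"},
    {"geneB", "geneC"},
    {"geneA", "geneB", "geneC"},
]
result = apriori(samples, min_support=0.6)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With a support threshold of 0.6, every single gene and every pair is frequent, but the full triple (present in only two of five samples) is not. Counting support by scanning all transactions keeps the sketch short; real tools use more efficient counting structures.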

Improving the creation and reporting of structured findings during digital pathology review

In this 2016 paper published in Journal of Pathology Informatics, Cervin et al. look at the state of pathology reporting, in particular structured or synoptic reporting. The group writes about its prototype system, which “sees reporting as an activity interleaved with image review rather than a separate final step” and offers “an interface to collect, sort, and display findings for the most common reporting needs, such as tumor size, grading, and scoring.” They conclude that their synoptic reporting approach to combined pathology and image review can simplify the process enough to improve pathologists’ time to report as well as their ability to communicate with the referring physician.

The challenges of data quality and data quality assessment in the big data era

While big data is a popular topic these days, the quality of that data is sometimes overlooked. This 2015 paper published in Data Science Journal attempts to address that oversight and lay out a framework for big data quality assessment. After conducting a literature review on the topic, Cai and Zhu analyzed the challenges of ensuring the quality of big data. “Poor data quality will lead to low data utilization efficiency and even bring serious decision-making mistakes,” they conclude, presenting “a dynamic big data quality assessment process with a feedback mechanism, which has laid a good foundation for further study of the assessment model.”
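The summary describes Cai and Zhu’s process only at the framework level, but the feedback idea can be illustrated concretely: score a dataset on a few quality dimensions and, if any score falls below its threshold, send the failing dimensions back to a remediation step and assess again. The sketch below assumes two hypothetical dimensions (completeness and validity), hypothetical thresholds, and a deliberately crude remediation step that simply drops bad records. It is not the authors’ assessment model.

```python
def completeness(records, fields):
    """Share of required field values that are present (not None or empty)."""
    total = len(records) * len(fields)
    if total == 0:
        return 1.0
    present = sum(1 for r in records for f in fields if r.get(f) not in (None, ""))
    return present / total


def validity(records, rules):
    """Share of records that pass every validation rule."""
    if not records:
        return 1.0
    return sum(1 for r in records if all(rule(r) for rule in rules)) / len(records)


def assess_with_feedback(records, fields, rules, thresholds, remediate, max_rounds=3):
    """Assess, feed failing dimensions back to remediation, and re-assess.
    The returned records always match the last entry in history."""
    history = []
    for round_no in range(max_rounds):
        scores = {
            "completeness": completeness(records, fields),
            "validity": validity(records, rules),
        }
        history.append(scores)
        failing = [dim for dim, score in scores.items() if score < thresholds[dim]]
        if not failing or round_no == max_rounds - 1:
            break
        records = remediate(records, failing)  # the feedback step
    return records, history


# Hypothetical schema, rule, and remediation used only for illustration.
FIELDS = ["id", "age"]
RULES = [lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120]


def drop_bad_records(records, failing):
    """Crude remediation: drop the records responsible for each failing dimension."""
    kept = records
    if "completeness" in failing:
        kept = [r for r in kept if all(r.get(f) not in (None, "") for f in FIELDS)]
    if "validity" in failing:
        kept = [r for r in kept if all(rule(r) for rule in RULES)]
    return kept


records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": -5},
    {"id": 4, "age": 51},
]
cleaned, history = assess_with_feedback(
    records, FIELDS, RULES, {"completeness": 0.95, "validity": 0.95}, drop_bad_records
)
print(history)
print(cleaned)
```

In practice the remediation step would more likely correct or re-collect data than discard it; dropping records just keeps the loop easy to follow.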

Water, water, everywhere: Defining and assessing data sharing in academia

In this 2016 article published in PLOS ONE, Van Tuyl and Whitmire take a close look at what “data sharing” means and what data sharing practices researchers have been using since the National Science Foundation’s data management plan (DMP) requirements went into effect in 2011. Making federally funded research “data functional for reuse, validation, meta-analysis, and replication of research” should be a priority, they argue, but they conclude that not enough is being done overall. The researchers close by making “simple recommendations to data producers, publishers, repositories, and funding agencies that [they] believe will support more effective data sharing.”

Principles and application of LIMS in mouse clinics

Clinical researchers conducting mouse studies at seven different facilities around the world shared their experiences using a laboratory information management system (LIMS) “to facilitate or even enable mouse and data management” in their facilities. This 2015 paper by Maier et al. presents those discussions and their findings in a “review” format, concluding that “the unique LIMS environment in a particular facility strongly influences strategic LIMS decisions and LIMS development,” though “there is no universal LIMS for the mouse research domain that fits all requirements.”

Multilevel classification of security concerns in cloud computing

In this 2016 article in Applied Computing and Informatics, Hussain et al. take a closer look at the types of security attacks specific to cloud-based offerings and propose a new multi-level classification model to clarify them, with the end goal “to determine the risk level and type of security required for each service at different cloud layers for a cloud consumer and cloud provider.”

Assessment of and response to data needs of clinical and translational science researchers and beyond

Published in the Journal of eScience Librarianship, this 2016 article by Norton et al. looks at “big data” management in clinical and translational research from the standpoint of the university and its libraries. As academic libraries play a major role in supporting such research, Norton et al. reached out to the various medical colleges at the University of Florida and sought to clarify researcher needs. The group concludes that its research has led to “addressing common campus-wide concerns through data management training, collaboration with campus IT infrastructure and research units, and creating a Data Management Librarian position” to strengthen the library system’s role in data management for clinical researchers.

SUSHI: An exquisite recipe for fully documented, reproducible and reusable NGS data analysis

Many next-generation sequencing (NGS) data analysis frameworks exist, from Galaxy to bpipe. However, Hatakeyama et al. at the University of Zürich noted the lack of a framework that (1) offers both web-based and scripting options and (2) “puts an emphasis on having a human-readable and portable file-based representation of the meta-information and associated data.” In response, the researchers created SUSHI (Support Users for SHell-script Integration). They conclude that “[i]n one solution, SUSHI provides at the same time fully documented, high level NGS analysis tools to biologists and an easy to administer, reproducible approach for large and complicated NGS data to bioinformaticians.”

Open source data logger for low-cost environmental monitoring

This 2014 paper by Ed Baker of London’s Natural History Museum outlines a methodology for combining open-source software such as Drupal with open hardware such as Arduino to create a low-power, low-cost, real-time environmental monitoring station. Baker walks through his approach step by step (he calls it a “how to guide”) to building an open-source environmental data logger that incorporates a digital temperature and humidity sensor. Though he offers no formal conclusions, Baker states: “It is hoped that the publication of this device will encourage biodiversity scientists to collaborate outside of their discipline, whether it be with citizen engineers or professional academics.”

Evaluating health information systems using ontologies

Evaluating health information systems and technology is no easy task. Eivazzadeh et al. recognize this, as well as the fact that developing evaluation frameworks presents its own set of challenges. Having looked at several different models, the researchers set out to develop their own evaluation method, one that derives “evaluation aspects for a set of one or more health information systems — whether similar or heterogeneous — by organizing, unifying, and aggregating the quality attributes extracted from those systems and from an external evaluation framework.” The result is the UVON method, which they conclude can be used “to create ontologies for evaluation” of health information systems as well as “to mix them with elements from other evaluation frameworks.”

From the desktop to the grid: Scalable bioinformatics via workflow conversion

This featured article from the journal BMC Bioinformatics comes on the heels of several years of discussion about the reproducibility of scientific results. De la Garza et al. point to workflows and their repeatability as vital cogs in such efforts. “Breaking down the complexity of such experiments into the joint collaboration of small, repeatable, well defined tasks, each with well defined inputs, parameters, and outputs, offers the immediate benefit of identifying bottlenecks, pinpoint sections which could benefit from parallelization,” they state. The researchers developed their own set of free, platform-independent tools for designing, executing, and sharing workflows. They conclude: “We are confident that our work presented in this document … not only provides scientists a way to design and test workflows on their desktop computers, but also enables them to use powerful resources to execute their workflows, thus producing scientific results in a timely manner.”
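The benefit described above, breaking an experiment into small tasks with well-defined inputs and outputs so that parallelizable sections become visible, can be shown with a small dependency analysis. The sketch below assumes a hypothetical dictionary format in which each task lists the files it reads and writes. It infers task dependencies from those files and groups tasks into stages with a level-by-level topological sort (Kahn’s algorithm); tasks in the same stage do not depend on one another and could run in parallel. It illustrates the general idea only and is not the authors’ tooling.

```python
from collections import defaultdict


def parallel_stages(tasks):
    """tasks: {name: {"inputs": [...], "outputs": [...]}} (hypothetical format).
    Returns a list of stages; tasks within a stage are mutually independent."""
    producer = {}
    for name, spec in tasks.items():
        for out in spec["outputs"]:
            producer[out] = name

    deps = {name: set() for name in tasks}
    dependents = defaultdict(set)
    for name, spec in tasks.items():
        for inp in spec["inputs"]:
            if inp in producer:  # inputs nobody produces are external files
                deps[name].add(producer[inp])
                dependents[producer[inp]].add(name)

    # Kahn's algorithm, processed one whole level at a time.
    remaining = {name: len(d) for name, d in deps.items()}
    stage = sorted(n for n, count in remaining.items() if count == 0)
    stages = []
    while stage:
        stages.append(stage)
        next_stage = []
        for done in stage:
            for child in dependents[done]:
                remaining[child] -= 1
                if remaining[child] == 0:
                    next_stage.append(child)
        stage = sorted(next_stage)

    if sum(len(s) for s in stages) != len(tasks):
        raise ValueError("workflow contains a cycle")
    return stages


# Hypothetical sequencing workflow.
workflow = {
    "align_A": {"inputs": ["reads_A.fastq"], "outputs": ["A.bam"]},
    "align_B": {"inputs": ["reads_B.fastq"], "outputs": ["B.bam"]},
    "merge": {"inputs": ["A.bam", "B.bam"], "outputs": ["merged.bam"]},
    "call": {"inputs": ["merged.bam"], "outputs": ["variants.vcf"]},
}
stages = parallel_stages(workflow)
print(stages)
```

Here the two alignment tasks land in the first stage and could be dispatched to separate machines, while the merge step is the bottleneck that must wait for both.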

Terminology spectrum analysis of natural-language chemical documents: Term-like phrases retrieval routine

In this 2016 article published in Journal of Cheminformatics, Alperin et al. present the results of their attempt “to develop, test and assess a methodology” for both extracting and categorizing words and terminology from chemistry-related PDFs, with the goal of being able to apply “textual analysis across document collections.” They conclude that “[t]erminology spectrum retrieval may be used to perform various types of text analysis across document collections” as well as “to find out research trends and new concepts in the subject field by registering changes in terminology usage in the most rapidly developing areas of research.”
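As a rough illustration of retrieving term-like phrases across a document collection, the sketch below uses a simple n-gram heuristic: it extracts one- to three-word phrases that neither begin nor end with a function word and keeps those that appear in at least a minimum number of documents. This is a swapped-in technique for illustration, not the routine Alperin et al. developed. The stopword list, thresholds, and sample sentences are hypothetical, and the input is assumed to be plain text already extracted from the PDFs.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list.
STOPWORDS = {"the", "of", "and", "a", "an", "in", "on", "for", "to",
             "with", "is", "was", "by", "at", "as", "were"}


def candidate_phrases(text, max_len=3):
    """Yield n-grams (n = 1..max_len) that do not start or end with a stopword."""
    tokens = re.findall(r"[a-z][a-z0-9\-]*", text.lower())
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram)


def term_spectrum(documents, min_docs=2, max_len=3):
    """Map each candidate phrase to the number of documents containing it,
    keeping only phrases found in at least min_docs documents."""
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(candidate_phrases(doc, max_len)))
    return {phrase: count for phrase, count in doc_freq.items() if count >= min_docs}


docs = [
    "Catalytic hydrogenation of the alkene was performed over palladium on carbon.",
    "We report catalytic hydrogenation with palladium on carbon at room temperature.",
    "Room temperature conditions gave poor yields.",
]
spectrum = term_spectrum(docs)
for phrase, count in sorted(spectrum.items()):
    print(phrase, count)
```

Multi-word phrases such as “palladium on carbon” survive because the stopword test applies only to the first and last word, while fragments such as “on carbon” are discarded. Comparing such spectra between collections from different years is one simple way to look for shifts in terminology usage.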

A legal framework to support development and assessment of digital health services

Digital health services are an expanding force, empowering people to track and manage their health. However, they come with cost and legal concerns, which call for a legal framework for developing and assessing those services. In this 2016 paper appearing in JMIR Medical Informatics, Garrell et al. lay out such a framework based on Swedish law, while leaving room for the framework to be adapted to other regions of the world. They conclude that their framework “can be used in prospective evaluation of the relationship of a potential health-promoting digital service with the existing laws and regulations” of a particular region.

The GAAIN Entity Mapper: An active-learning system for medical data mapping

In this 2016 article appearing in Frontiers in Neuroinformatics, Ashish et al. present GEM, “an intelligent software assistant for automated data mapping across different datasets or from a dataset to a common data model.” GEM was developed for Alzheimer’s disease research but is applicable to many other fields. The group concludes: “Our experimental evaluations demonstrate significant mapping accuracy improvements obtained with our approach, particularly by leveraging the detailed information synthesized for data dictionaries.”
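GEM’s own algorithms are not described in this summary, so the sketch below only illustrates the general shape of active learning for data mapping: score each source data-dictionary element against each target element, propose the best match, and ask a human to review the elements the matcher is least sure about. It assumes a hypothetical token-overlap (Jaccard) similarity over element descriptions and uses the margin between the best and second-best candidates as the uncertainty measure; the element names and descriptions are invented.

```python
import re


def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def rank_matches(source, target):
    """For each source element, return (best_target, best_score, margin),
    where margin = best score minus second-best score (assumes >= 1 target)."""
    target_tokens = {name: tokens(desc) for name, desc in target.items()}
    results = {}
    for name, desc in source.items():
        src = tokens(desc)
        scored = sorted(
            ((jaccard(src, tt), t) for t, tt in target_tokens.items()), reverse=True
        )
        best_score, best = scored[0]
        runner_up = scored[1][0] if len(scored) > 1 else 0.0
        results[name] = (best, best_score, best_score - runner_up)
    return results


def query_for_review(results, budget):
    """Uncertainty sampling: pick the elements with the smallest margin."""
    return sorted(results, key=lambda name: results[name][2])[:budget]


# Hypothetical data-dictionary entries (source) and common data model (target).
source = {
    "PTAGE": "age of participant at baseline visit in years",
    "MMSE_TOT": "mini mental state examination total score",
    "SEX": "participant sex",
}
target = {
    "age": "participant age in years",
    "mmse": "mini mental state examination score",
    "gender": "participant sex or gender",
}
results = rank_matches(source, target)
to_review = query_for_review(results, budget=1)
print(results)
print("ask a human about:", to_review)
```

This covers only the query-selection half of the loop. In a full active-learning system, the reviewer’s answers would then be used to retrain or re-weight the matcher before the next round of queries.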

Visualizing the quality of partially accruing data for use in decision making

Data management across the sciences is growing increasingly complex as data stores build up, and public health is no exception. Making sense of data is one part of management, but quality analysis is an important and somewhat understated aspect as well. This 2015 paper by Eaton et al. focuses on partially accruing public health data, describing a series of “data quality tools developed to gain insight into the data quality problems associated with these data.” The group concludes that “our key insight was the need to assess temporal patterns in the data in terms of accrual lag.”
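Accrual lag is the delay between when an event happens and when its record arrives in the dataset, so the most recent periods always look sparser than they eventually will. One simple way to quantify this, sketched below under the assumption that each record carries hypothetical `event` and `received` dates, is to build a cumulative accrual curve from historical lags and read off how complete a recent period is expected to be. This illustrates the concept only and is not one of the tools Eaton et al. describe; the dates are invented.

```python
from datetime import date


def cumulative_accrual(records, max_lag):
    """Fraction of historical records that arrived within d days of their
    event, for d = 0..max_lag."""
    lags = [(r["received"] - r["event"]).days for r in records]
    n = len(lags)
    return [sum(1 for lag in lags if lag <= d) / n for d in range(max_lag + 1)]


def expected_completeness(event_day, as_of, curve):
    """Rough share of eventual records for event_day already present on as_of.
    Ages beyond the curve reuse its last value (lags > max_lag are not modeled)."""
    age = (as_of - event_day).days
    if age < 0:
        return 0.0
    return curve[min(age, len(curve) - 1)]


# Hypothetical historical records.
history = [
    {"event": date(2015, 3, 1), "received": date(2015, 3, 1)},  # lag 0
    {"event": date(2015, 3, 1), "received": date(2015, 3, 3)},  # lag 2
    {"event": date(2015, 3, 2), "received": date(2015, 3, 4)},  # lag 2
    {"event": date(2015, 3, 2), "received": date(2015, 3, 9)},  # lag 7
]
curve = cumulative_accrual(history, max_lag=7)
print(curve)
# Data for an event two days before the analysis date is only partly in.
print(expected_completeness(date(2015, 4, 10), date(2015, 4, 12), curve))
```

Plotting such a curve, or shading recent periods by their expected completeness, makes it easier to avoid reading a lag-driven dip in counts as a real decline.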

Digital pathology and anatomic pathology laboratory information system integration to support digital pathology sign-out

What happens when you integrate a digital pathology system (DPS) with an anatomic pathology laboratory information system (APLIS)? In the case of Guo et al. and the University of Pittsburgh Medical Center, “[t]he integration streamlined our digital sign-out workflow, diminished the potential for human error related to matching slides, and improved the sign-out experience for pathologists.” This paper, published in Journal of Pathology Informatics in 2016, describes their line of thinking, integration plans, and final results.

A polyglot approach to bioinformatics data integration: A phylogenetic analysis of HIV-1

In this 2016 paper published in Evolutionary Bioinformatics, Reisman et al. discuss a polyglot approach “involving multiple languages, libraries, and persistence mechanisms” to managing genomic sequence data. Using a NoSQL and RESTful web service approach, the team tested the resulting pipeline on an evolutionary study of HIV-1. They conclude that “the case study highlights the abilities of the tool,” and that “although utilized for the investigation of a virus here, the approach can be applied to any species of interest.”

The systems biology format converter

Rodriguez et al. found that while many tools exist for converting computational models from one format to another, they tend not to be very interoperable and are often redundant. Additionally, some are unmaintained or abandoned. The researchers saw a need for a modular, open-source software system “to support rapid implementation and integration of new converters” in a more collaborative way. They developed the Systems Biology Format Converter (SBFC), a Java-based tool that, per their conclusion, “helps computational biologists to process or visualise their models using different software tools, and software developers to implement format conversion.”
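SBFC itself is written in Java, and its API is not described here. The Python sketch below only illustrates the modular idea behind such a framework: converters register themselves as plugins for one source format and one target format, and a registry looks up a chain of converters with a breadth-first search when no direct converter exists. The class, the `ToyFormat` format, and the two string-wrapping converters are hypothetical placeholders, and the sketch makes no claim that SBFC chains converters this way.

```python
from collections import deque


class ConverterRegistry:
    """Hypothetical plugin registry: each converter maps one format to another."""

    def __init__(self):
        self._converters = {}  # (source_format, target_format) -> function

    def register(self, source, target):
        """Decorator that adds a converter without touching the registry code."""
        def decorator(func):
            self._converters[(source, target)] = func
            return func
        return decorator

    def find_chain(self, source, target):
        """Breadth-first search for the shortest sequence of formats."""
        previous = {source: None}
        queue = deque([source])
        while queue:
            fmt = queue.popleft()
            if fmt == target:
                chain = []
                while fmt is not None:
                    chain.append(fmt)
                    fmt = previous[fmt]
                return chain[::-1]
            for src, dst in self._converters:
                if src == fmt and dst not in previous:
                    previous[dst] = fmt
                    queue.append(dst)
        return None

    def convert(self, model, source, target):
        chain = self.find_chain(source, target)
        if chain is None:
            raise ValueError(f"no converter path from {source} to {target}")
        for src, dst in zip(chain, chain[1:]):
            model = self._converters[(src, dst)](model)
        return model


registry = ConverterRegistry()


# Placeholder converters: they only wrap the input to show the call order.
@registry.register("ToyFormat", "SBML")
def toy_to_sbml(model):
    return f"sbml({model})"


@registry.register("SBML", "Octave")
def sbml_to_octave(model):
    return f"octave({model})"


print(registry.find_chain("ToyFormat", "Octave"))
print(registry.convert("m1", "ToyFormat", "Octave"))
```

Because each converter only declares its two endpoint formats, adding a new one is a single decorated function, which mirrors the “rapid implementation and integration of new converters” goal quoted above.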

Chemozart: A web-based 3D molecular structure editor and visualizer platform

In this 2015 article published in the open-access Journal of Cheminformatics, Mohebifar and Sajadi describe Chemozart, their web-based HTML5/CSS3 3D molecule editor and visualizer. Chemozart can be used from the public website or from a self-hosted instance and is useful for both educational and research purposes. The authors emphasize “that there’s no need to install anything and it can be accessed easily via a URL.”

Perceptions of pathology informatics by non-informaticist pathologists and trainees

Perhaps frustrated with the state of education and with misperceptions regarding pathology informatics (PI), Walker et al. surveyed non-informatics-oriented pathologists and trainees at the Cleveland Clinic and Massachusetts General Hospital to better understand their views of the field. In this paper published in Journal of Pathology Informatics in April 2016, the researchers present their findings and offer their opinions on the state of pathology informatics education. They conclude: “Improved understanding and acceptance of PI throughout the pathology community could facilitate the communication and cooperation necessary to realize the type of informatics initiatives capable of advancing the importance of pathologists in the changing healthcare environment.”