Optimizing Artificial Intelligence (AI) Starts with a Harmonized LIMS


Artificial intelligence (AI) and machine learning (ML) are must-have applications within labs of the future. AI allows deep mining of data across databases from labs throughout the enterprise, and ML is integral in forging otherwise undiscovered insights and linkages among data points that can accelerate projects and, sometimes, point them in novel directions. Getting the most from these systems requires more than optimizing the AI system. The quality and thoroughness AI delivers depend largely upon your laboratory information management system (LIMS).

Your LIMS must be robust, reliable, and flexible enough to accommodate future needs, and the data must also be harmonized to ensure that the results returned reflect a comprehensive search of the relevant data within the system. Only then can AI and the researchers using it be confident that their conclusions are supported by science.

Those conclusions may lead to new insights that may, for example, increase understanding of mechanisms of action or pharmaceutical/target interactions, which can result in identifying new pharmaceutical targets, designing novel molecules in silico, and modeling protein interactions more accurately. AI-enabled insights can be instrumental in accelerating the path toward filing an investigational new drug (IND) application.

Because your LIMS contains experiment outcomes and sample-centric test results that may span multiple tests or protocols, it is the source of truth for everything that follows. Therefore, it's important to ensure that both the master data and the data generated by the individual labs are harmonized. Harmonization ensures that all the relevant data can be found and, thus, considered by the AI. In short, it enables the AI to discover more correlations.

For new LIMS implementations, it’s advisable for all stakeholders – lab personnel, managers, production managers, quality assurance experts and others – to outline their workflow processes relevant to the LIMS, identify bottlenecks and optimize processes. This exercise will enhance the efficiency of the overall system, with or without AI.

If the LIMS has been in use for a while, start by reviewing the LIMS’ master data – the information users need to operate the LIMS properly. First, ensure it is available in one location. This core data is non-transactional, so it only changes when new technologies or processes are added. Examples of master data include a list of reserved keywords that can’t be used as file names, and conventions that dictate how special characters or capital letters may be used. This file should be easy to find and easy to update when changes are made to the labs or as the LIMS software is updated.

Also, take the time to establish or update naming conventions so they are meaningful enterprise-wide. Consider naming conventions that reference the project, a variation, lab type, manufacturing line, or site. For example, multisite production facilities might name an assay based upon the site, building, production line, and purpose. A pH assay for line one in building two at the Boston facility thus might be labeled "Bos2-Line1pH." This level of detail enables assays to be identified easily later.
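A structured convention like the one above pays off when names can be validated and decomposed by software rather than by eye. The sketch below assumes the hypothetical site/building/line/purpose pattern described in the example; a real convention would be defined by your own master data.

```python
import re

# Hypothetical pattern matching the convention described above:
# "Bos2-Line1pH" -> site "Bos", building 2, line 1, assay "pH".
ASSAY_NAME = re.compile(
    r"^(?P<site>[A-Za-z]+)(?P<building>\d+)-Line(?P<line>\d+)(?P<assay>\w+)$"
)

def parse_assay_name(name: str) -> dict:
    """Split a structured assay name into its components, or fail loudly."""
    match = ASSAY_NAME.match(name)
    if match is None:
        raise ValueError(f"Name does not follow the convention: {name!r}")
    parts = match.groupdict()
    parts["building"] = int(parts["building"])
    parts["line"] = int(parts["line"])
    return parts
```

Rejecting nonconforming names at entry, rather than tolerating them, is what keeps the convention meaningful over time.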

For non-manufacturing applications, consider using the department name, method number, and type of reference. Other organizations may find it helpful to include regulatory reference numbers within the file name. If more specificity is needed, link the data to a lookup chart. The value of that approach may seem negligible now but becomes more apparent when historic data is incorporated into the LIMS. In that case, a file name that references an MRI machine that was replaced a decade ago may be less helpful than noting the regulatory process number or the type of lab. Also, standardize the allowable length of data files among labs to ensure that searches can include very long as well as very short file names.

Ensure the LIMS and the labs themselves use standard terms and definitions of tests and products to streamline any searches. Detail how the LIMS deals with sample submissions and tests results, interacts with lab instruments, and documents chain of custody.

Global organizations must also account for international differences in the master data. These include language (e.g., English and Chinese), spellings (e.g., UK and US), currencies, and time zones. Therefore, establish standard conventions to govern those differences. For example, determine whether times should be listed in the local time zone or using Greenwich Mean Time (GMT), and whether reports should be in the organization's primary language or in the lab's local language, and ensure that value inputs translate correctly.
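Of these conventions, the time-zone one is the easiest to enforce mechanically. A minimal sketch, assuming timestamps are normalized to UTC (the modern successor to GMT) at the point of entry and that each site's IANA zone name is held in master data:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def to_utc(local_timestamp: str, site_tz: str) -> str:
    """Convert a site-local ISO timestamp to UTC for canonical storage.

    Storing one canonical zone keeps cross-site queries comparable;
    site_tz is an assumed piece of master data (e.g. "America/New_York").
    """
    local = datetime.fromisoformat(local_timestamp).replace(tzinfo=ZoneInfo(site_tz))
    return local.astimezone(timezone.utc).isoformat()
```

Reports can then render the canonical UTC value back into any local zone without ambiguity about what was stored.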

Once the LIMS structure is in place, stakeholders can begin to harmonize the data by establishing standards and formats so the LIMS will have the consistency users need to use the system effectively and efficiently.

Although a LIMS isn’t a data warehouse, it faces many of the same issues. With multiple labs using the LIMS, many are likely to have their own ontologies, naming and filing conventions. Unless they are harmonized, certain data is effectively siloed and won’t appear in search results unless the searchers know to use those specific terms.

For example, the LIMS needs to know that weight, wt., pounds, and kilograms all refer to weight, and users need to agree upon a standard convention. The need for such standardization also applies to terminology and naming conventions for tests, methods, materials, and even containers, as well as sample identifications and the status of inventories.
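That kind of agreement is typically enforced with a synonym map and a unit-conversion table applied before records reach the search index. The names and factors below are illustrative assumptions, not a prescribed vocabulary:

```python
# Hypothetical synonym map: every lab-specific spelling resolves to one
# canonical field name.
CANONICAL_TERMS = {
    "weight": "weight",
    "wt.": "weight",
    "wt": "weight",
    "mass": "weight",
}

# Conversions to a single agreed base unit (kilograms, as an example).
TO_KILOGRAMS = {
    "kg": 1.0,
    "kilograms": 1.0,
    "g": 0.001,
    "lb": 0.45359237,
    "pounds": 0.45359237,
}

def harmonize_measurement(field: str, value: float, unit: str) -> tuple:
    """Return the canonical field name and the value in the base unit."""
    name = CANONICAL_TERMS[field.strip().lower()]
    return name, value * TO_KILOGRAMS[unit.strip().lower()]
```

With this in place, a search for "weight" finds records that originally arrived as "Wt." in pounds, because both the term and the unit were normalized on the way in.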

If the LIMS has been in use for a while, take the time now to ensure that any changes to processes, raw materials, or other elements are reflected in the system. Small changes (such as changes in assay specificity that resulted from a change of vendor, for example) can affect the accuracy of the data and the AI’s conclusions.

At the project level, it’s a good practice to separate data into raw analysis and summary analysis. Noting the distinction allows researchers to drill down to the raw data to check outcomes or to perform new analysis without destroying the original data.

Before implementing an AI application is also a good time to de-duplicate files. Storage space is not unlimited. In fact, many organizations are beginning to think of storage as consumable and are starting to curate data rather than merely purchasing more storage. Curation entails keeping useful data and destroying data that can serve no further purpose. Take an instrument quality check, for example. If it's good, store it with the rest of the data, but it's unnecessary to store multiple failed quality tests while tweaking parameters before an acceptable check is achieved.
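Exact duplicates are the easy part of that curation and can be found mechanically. A minimal sketch using content hashing; real curation would also honor retention and audit rules before anything is deleted:

```python
import hashlib
from pathlib import Path

def find_duplicates(paths: list) -> dict:
    """Group files by SHA-256 content hash.

    Any group with more than one path is a set of byte-identical
    duplicates that can be reviewed before deletion.
    """
    groups: dict = {}
    for path in paths:
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        groups.setdefault(digest, []).append(path)
    return {h: ps for h, ps in groups.items() if len(ps) > 1}
```

Hashing catches only exact copies; near-duplicates (the same result exported twice with different metadata) still require the naming and harmonization conventions discussed earlier to surface.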

If your LIMS doesn’t include historic data, add it as a secondary project. Including legacy data, especially if it is input from paper sources, will probably be tedious, but could be invaluable if data from prior years – that may not have seemed important at the time – sheds light on current questions.

Ideally, all the data will have been harmonized when the LIMS was first implemented throughout the organization, and the labs will still use the standardized terms consistently. In practice, some drift should be expected as time passes and personnel change. Therefore, it's worth the time for lab managers to ensure their lab's interactions with the LIMS still reflect the agreed-upon standards.

Achieving actionable, robust results from AI analysis is possible only when the data it accesses is robust. That requires a harmonized LIMS that enables comprehensive AI analysis. Making the effort to clean the data now – and to ensure it remains clean – will pay off in the long run.