Artificial Intelligence (AI) within LIMS and the Laboratory

A Laboratory Information Management System (LIMS) provides the central repository for data produced in the laboratory. Mining that data using artificial intelligence (AI) allows managers to base decisions on real insights. But what do we mean by AI, when and how can a laboratory start to incorporate it into its processes, and when will it become all-pervasive in the laboratory?

What is AI?

People misuse the term ‘AI’; sometimes because they misunderstand what it is, but often because they want to hype up the subject. One definition (from the Oxford English Dictionary) is “the theory and development of computer systems able to perform tasks normally requiring human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.” But this does not clearly capture what it means for a software program to be ‘intelligent’. John McCarthy’s 2004 definition is closer to the truth: “It is the science and engineering of making intelligent machines, especially intelligent computer programs….” However, more fundamental questions remain: what, exactly, is intelligence, and how will we know if a machine displays it?

Can a lump of silicon think in the same way as a human? Well, no; or at least, maybe not yet. But perhaps we are getting closer. Artificial intelligence as it is often understood combines computer science with robust datasets to enable problem-solving. This greatly expands the power of business analytics to find patterns and answers in huge data sets, and it supports and extends the concepts of machine learning. Deep learning algorithms help to eliminate some of the data pre-processing that has typically been involved in machine learning: they can ingest and process unstructured data, such as text and images, and automate feature extraction, removing some dependency on human experts.

However, actual intelligence is probably more complex than this. It is about more than following rules: it is not just translating a language, it is knowing when to coin a new word because the current language does not quite cut it; it is adapting to the environment you find yourself in. Intelligence is not using complex mathematical probability to predict the next word in an article and repeating that process until the article is complete. It is learning from the dataset, adapting to new inputs, and recognising patterns in the data.

Whilst true AI is the long-term goal, the common technologies in use today are Machine Learning and Advanced Business Analytics, and the term AI is often used interchangeably with them. However, we may be closer than we think to a computer passing the Turing Test. This was proposed by Alan Turing, whom many consider the father of modern computing, in his 1950 paper “Computing Machinery and Intelligence”, the essence of which is “Can machines think?”

Using AI In Laboratories Today

Business analytics and machine learning powered by the concepts behind AI already exist, and many fields are moving forward quickly. The UK’s National Health Service (NHS) has an AI Lab to foster their use. Programmes include imaging software that detects likely cancerous lesions for further investigation. Hitherto an intensively manual task, automated screening quickly draws attention to the small percentage of images that warrant a closer look. Automating image interpretation is a natural place to start, but where next? And how does this apply to a general laboratory?

Data Lakes and Evolution Will Drive AI Adoption

Having robust datasets to work on is key to all current AI solutions. Easy, you might say: thanks to digitisation, laboratories are adopting laboratory information management systems (LIMS). But, along with sample data, laboratories must also keep relevant sample metadata. So what is metadata, and what do I need to keep? That is the 10,000-dollar question. Metadata is information that gives sample data context: Where is it from? How was it collected? Is it related to other samples? How is it stored? Who is the custodian? And so on. Different data will have different metadata. For instance, if you are collecting surveillance data from herds of cows to check for bovine spongiform encephalopathy (BSE, or mad cow disease), relevant metadata might include breed, geographic location, ZIP/postcode, herd statistics, related animals, animal feed used, insemination type, even veterinary case history. If it is a water testing laboratory, then you would be interested in metadata around the sampling point, date and time, location, sample route, sampler and so forth.
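To make this concrete, the sketch below shows one way a BSE surveillance sample and its metadata might be represented alongside the result itself. The field names are purely illustrative assumptions, not a standard LIMS schema; the point is simply that the contextual fields travel with the sample record.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

# Illustrative only: these field names are assumptions, not a standard LIMS schema.
@dataclass
class BseSurveillanceSample:
    sample_id: str
    collected_at: datetime
    breed: str
    postcode: str                       # geographic context
    herd_id: str
    herd_size: int
    feed_type: Optional[str] = None
    insemination_type: Optional[str] = None
    related_sample_ids: list = field(default_factory=list)
    custodian: Optional[str] = None     # who is responsible for the sample

# A hypothetical sample record carrying its contextual metadata.
sample = BseSurveillanceSample(
    sample_id="BSE-2024-0001",
    collected_at=datetime(2024, 5, 14, 9, 30),
    breed="Holstein-Friesian",
    postcode="YO61 1AA",
    herd_id="HERD-042",
    herd_size=180,
    feed_type="compound feed",
)
```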

Once we have all this data, we need somewhere to store it so that it is easily accessible. This is where Data Warehouses and Data Lakes come in. These allow large volumes of data, generally from many different sources, to be brought together, and it is typically on these large collections that the Machine Learning and AI algorithms will run. However, bringing all this data together can reveal another important issue: data compatibility. This can range from something as simple as ensuring that the data is in a standard format to something as complex as ensuring that data items from different systems are identified in the same way. The latter has plagued the clinical arena, where multiple identifiers often exist for the same test. This is where data standardisation and data standards become important.
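As a hedged illustration of that compatibility problem, the snippet below maps local test codes from two hypothetical source systems onto a single canonical identifier and harmonises units on the way into the data lake. The codes, system names and limits are invented for illustration; a real deployment would map to a published standard such as LOINC.

```python
# Hypothetical mapping from local test codes (different in each source system)
# to one canonical identifier. The codes here are invented for illustration.
LOCAL_TO_CANONICAL = {
    ("lims_a", "NA"):  "SODIUM_SERUM",
    ("lims_a", "SOD"): "SODIUM_SERUM",
    ("lims_b", "Na+"): "SODIUM_SERUM",
    ("lims_b", "GLU"): "GLUCOSE_SERUM",
}

def standardise(record: dict) -> dict:
    """Return a copy of the record with a canonical test code and consistent units."""
    key = (record["source_system"], record["test_code"])
    out = dict(record)
    out["canonical_code"] = LOCAL_TO_CANONICAL.get(key, "UNMAPPED")
    # Example unit harmonisation: glucose reported in mg/dL converted to mmol/L.
    if out["canonical_code"] == "GLUCOSE_SERUM" and record.get("units") == "mg/dL":
        out["value"] = round(record["value"] * 0.0555, 2)
        out["units"] = "mmol/L"
    return out

print(standardise({"source_system": "lims_b", "test_code": "GLU",
                   "value": 95.0, "units": "mg/dL"}))
```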

Data Analysis

The function of laboratories outside the R&D area is to analyse samples and report results, and the primary output is often a certificate of analysis or a report. It is not surprising, therefore, that many laboratories have not invested in general database-analysis functionality for their LIMS. Such tools do exist, though. LIMS suppliers tend to use Power BI or Tableau as their go-to analytics tools. Power BI is particularly useful as it is ‘free’, at least at a basic level, for corporate clients who already hold Microsoft licences.

Using Data Analytics tools is one step on the road to AI. However, you need to ask defined questions to extract the answers you want: show me where clusters of BSE are occurring; which rivers are the most contaminated in my water samples, and which farmers are applying pesticides nearby? Analytics tools are good at finding trends and outliers in big data sets: an increasing failure rate on certain tests when operator A is in the lab; instrument Y going down more often than other instruments of the same type. But is this enough to answer the real question, which is why labs should bother investing in Business Analytics, Machine Learning and AI at all?
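For example, the short pandas sketch below answers one of those directed questions, the failure rate per operator and test, from a toy result set. The column names and data are assumptions for illustration, not a real LIMS export.

```python
import pandas as pd

# Assumed columns and toy data; a real LIMS export would differ.
results = pd.DataFrame({
    "operator": ["A", "A", "B", "B", "A", "B", "A", "B"],
    "test":     ["pH", "pH", "pH", "pH", "TDS", "TDS", "TDS", "TDS"],
    "passed":   [False, False, True, True, True, True, False, True],
})

# Failure rate per operator and test: a directed question, not 'AI', but exactly
# the kind of trend a business analytics layer surfaces.
failure_rate = (
    results.assign(failed=~results["passed"])
           .groupby(["operator", "test"])["failed"]
           .mean()
           .rename("failure_rate")
           .reset_index()
)
print(failure_rate.sort_values("failure_rate", ascending=False))
```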

Non-R&D laboratories, in general, exist to perform a testing function and report on it. They may need to collate information for audit purposes, but all too often they do not see asking what-if questions as within their remit. No matter how much metadata you collect, many will not bother to look for the correlations and peaks, because that is not in their job description.

Where is AI Gaining Ground?

Accountants often dictate where data analytics is most used, because they think they know where the money invested will pay back. In the laboratory field, R&D laboratories, rather than QC laboratories, are more likely to keep and repurpose their data. In the pharmaceutical industry, for instance, it is not uncommon for results from one drug development programme to be re-screened against different limits or used in other development programmes. Directing deep learning programmes at the data lakes within large pharma companies is one area where data analytics, and its modern offspring AI, will pay dividends.
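A minimal sketch of that re-screening idea, under assumed column names and invented limits: historical assay results already held in the data lake are simply filtered against a new programme’s acceptance criteria rather than being generated again in the lab.

```python
import pandas as pd

# Hypothetical historical screening results from an earlier programme.
history = pd.DataFrame({
    "compound_id":   ["CMP-001", "CMP-002", "CMP-003", "CMP-004"],
    "ic50_nM":       [12.0, 450.0, 85.0, 3.2],
    "solubility_uM": [40.0, 210.0, 15.0, 95.0],
})

# The original programme may have screened at IC50 < 10 nM; a new programme can
# re-apply different (here, looser) limits to the same stored results.
new_limits = (history["ic50_nM"] < 100.0) & (history["solubility_uM"] > 20.0)
rescreened_hits = history[new_limits]
print(rescreened_hits)
```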

Much, if not everything, depends on the quality of the data, and especially the metadata. The old adage ‘rubbish in, rubbish out’ has never been truer. No matter how automated or ‘intelligent’ the data analytics process, millions of dollars can be wasted throwing new ideas against the wall to see what sticks if the data is not up to scratch.

Laboratory Automation and AI Drives Efficiency

Robotic systems are increasingly being used for tasks such as sample handling, liquid dispensing, and data analysis, reducing human error and increasing throughput. Over time, AI algorithms will assess the overall workload and juggle resource allocation in order to maximise overall efficiency.

AI could help with predictive maintenance of laboratory equipment, predicting when instruments might fail or require maintenance and so keeping downtime to a minimum. In a similar way, AI can be used for real-time monitoring and quality control in manufacturing processes, identifying when results start to drift and allowing tighter process tolerances. Such adaptive learning techniques will help improve bottom-line profitability and are a relatively small step on from the more ‘fixed’ process limits we use today.
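As a simple, hedged stand-in for that kind of drift monitoring, the sketch below watches a simulated QC measurement series and flags the first point where a rolling mean leaves a fixed control band. A production system would use proper control-chart rules (EWMA, Western Electric) or a learned model rather than this toy threshold.

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated QC readings: a stable process that starts drifting upward halfway through.
readings = np.concatenate([
    rng.normal(10.0, 0.2, 50),
    rng.normal(10.0, 0.2, 50) + np.linspace(0.0, 1.0, 50),
])

target, sigma, window = 10.0, 0.2, 10
rolling_mean = np.convolve(readings, np.ones(window) / window, mode="valid")

# Flag the first point where the rolling mean leaves a +/- 3-sigma-of-the-mean band.
limit = 3 * sigma / np.sqrt(window)
drift_points = np.where(np.abs(rolling_mean - target) > limit)[0]
if drift_points.size:
    print(f"Drift first flagged at reading {drift_points[0] + window - 1}")
```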

A Golden Tomorrow

As we have discussed, the typical QC laboratory using a LIMS records results for very specific quality control functions: the strength of alcohol in beer, ensuring food is safe to eat, safeguarding river water quality, checking the purity and value of a precious metal, and so on.

These laboratories see the value of a LIMS for automation and efficiency (going paperless, speeding up reporting, integrating systems, and so forth). Data Analytics within these laboratories is usually highly directed: help me produce my monthly report; show me which tests are most or least profitable for the lab; who could do with more training? Data Analytics, in the form of Power BI or similar, provides the algorithms required, and even a natural language search function, to help derive those answers quickly.

Few non-R&D laboratories, though, are resourced to research their data and learn new things from it, so the science in this area will move forward more slowly. The vanguard, however, is the pharmaceutical companies that are repurposing and rigorously interrogating their data. It is easy to imagine that large datasets such as those held by the UK’s NHS would be a rich source of data for developing new treatments (say in Parkinson’s disease or cystic fibrosis). Proposals such as this do, however, raise issues of data confidentiality and ownership, something that the implementation of AI in various areas has brought to the fore.

The emphasis for most laboratory organisations today should be on improving the quality of their data lake so that it can be used for such purposes in the future. Ensuring your lab uses a LIMS and keeps data digitally prepares it for data analysis, or directed searches, looking for efficiency paybacks either within the laboratory or across the wider business. Directed searches using data analytics tools will be enough for many, but AI and deep learning are increasingly important within the R&D laboratory, where using data to answer new questions, and repurposing past work for future drugs, creates a new profit stream for the business. AI will penetrate the wider laboratory community only slowly, as it becomes easier to integrate, and will most likely arrive as an extension of the data analytics tools you already use as part of your LIMS solution today.