{"ID":79638,"post_author":"9412100","post_date":"2019-04-01 14:02:48","post_date_gmt":"2019-04-01 18:02:48","post_content":"","post_title":"LIMSjournal - Spring 2019","post_excerpt":"","post_status":"publish","comment_status":"closed","ping_status":"closed","post_password":"","post_name":"limsjournal-spring-2019","to_ping":"","pinged":"","post_modified":"2019-04-01 14:56:39","post_modified_gmt":"2019-04-01 18:56:39","post_content_filtered":"","post_parent":0,"guid":"https:\/\/www.limsforum.com\/?post_type=ebook&p=79638","menu_order":0,"post_type":"ebook","post_mime_type":"","comment_count":"0","filter":"","_ebook_metadata":{"enabled":"on","private":"0","guid":"E2B4928C-F4A4-44C6-B7F1-9E2CB8EC103F","title":"LIMSjournal - Spring 2019","subtitle":"Volume 5, Issue 1","cover_theme":"nico_7","cover_image":"https:\/\/www.limsforum.com\/wp-content\/plugins\/rdp-ebook-builder\/pl\/cover.php?cover_style=nico_7&subtitle=Volume+5%2C+Issue+1&editor=Shawn+Douglas&title=LIMSjournal+-+Spring+2019&title_image=https%3A%2F%2Fwww.limsforum.com%2Fwp-content%2Fuploads%2FFig1_Talia_JOfCloudComp2019_8.png&publisher=LabLynx+Press","editor":"Shawn Douglas","publisher":"LabLynx Press","author_id":"26","image_url":"https:\/\/www.limsforum.com\/wp-content\/uploads\/Fig1_Talia_JOfCloudComp2019_8.png","items":{"15ab90bc3c6b03e3f0954255a3ab8dc7_type":"article","15ab90bc3c6b03e3f0954255a3ab8dc7_title":"Research on information retrieval model based on ontology (Yu 2019)","15ab90bc3c6b03e3f0954255a3ab8dc7_url":"https:\/\/www.limswiki.org\/index.php\/Journal:Research_on_information_retrieval_model_based_on_ontology","15ab90bc3c6b03e3f0954255a3ab8dc7_plaintext":"\n\n\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\n\t\t\t\tJournal:Research on information retrieval model based on ontology\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\tFrom LIMSWiki\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\tJump to: navigation, search\n\n\t\t\t\t\t\n\t\t\t\t\tFull article title\n \nResearch on information retrieval model based on ontologyJournal\n 
\nEURASIP Journal on Wireless Communications and Networking\nAuthor(s)\n \nYu, Binbin\nAuthor affiliation(s)\n \nJilin University, Beihua University\nPrimary contact\n \nEmail: yubinbin80 at sina dot com\nYear published\n \n2019\nVolume and issue\n \n2019\nPage(s)\n \n30\nDOI\n \n10.1186\/s13638-019-1354-z\nISSN\n \n1687-1499\nDistribution license\n \nCreative Commons Attribution 4.0 International\nWebsite\n \nhttps:\/\/jwcn-eurasipjournals.springeropen.com\/articles\/10.1186\/s13638-019-1354-z\nDownload\n \nhttps:\/\/jwcn-eurasipjournals.springeropen.com\/track\/pdf\/10.1186\/s13638-019-1354-z (PDF)\n\n\nContents\n\n1 Abstract \n2 Introduction \n3 Methods \n4 Based on the domain ontology information retrieval model \n\n4.1 Ontology documents processing \n4.2 Ontology document retrieval \n\n\n5 Experiment and results \n\n5.1 The experimental design of the information retrieval model based on ontology \n5.2 Analysis of experimental results \n\n\n6 Conclusion \n7 Abbreviations \n8 Declarations \n\n8.1 Acknowledgements \n\n8.1.1 Funding \n8.1.2 Availability of data and materials \n8.1.3 Author\u2019s contributions \n8.1.4 Author\u2019s information \n8.1.5 Competing interests \n\n\n\n\n9 References \n10 Notes \n\n\n\nAbstract \nAn information retrieval system not only occupies an important position in the network information platform, but also plays an important role in information acquisition, query processing, and wireless sensor networks. It serves as a document retrieval tool, helping researchers extract documents from data sets. The classic keyword-based information retrieval models neglect semantic information and thus cannot represent the user\u2019s needs. Therefore, how to efficiently acquire the personalized information that users need is of concern. 
The ontology-based systems lack an expert list to obtain accurate index term frequency. In this paper, a domain ontology model with document processing and document retrieval is proposed, and the feasibility and superiority of the domain ontology model are demonstrated experimentally.\nKeywords: ontology, information retrieval, genetic algorithm, sensor networks\n\nIntroduction \nInformation retrieval is the process of extracting relevant documents from large data sets. Along with the increasing accumulation of data and the rising demand for high-quality retrieval results, traditional information retrieval techniques are unable to deliver high-quality search results. As a newly emerged knowledge organization system, ontology is vitally important in promoting the function of information retrieval in knowledge management.\nExisting information retrieval models, such as the vector space model (VSM)[1], rely on certain rules to model text in pattern recognition and other fields. For example, a VSM splits, filters, and classifies otherwise abstract-looking text and, using certain rules, calculates statistics such as word frequency.\nProbability models[2] mainly rely on probabilistic operations and Bayes rules to match data information, in which the weight values of feature words are multivalued. The probabilistic model uses the index word to represent the user\u2019s interest, that is, the personalized query request submitted by the user. Meanwhile, there is no vocabulary set with a standard semantic feature and document label. Traditional weighting strategies lack the semantic information of the document and are therefore not representative of the document description. On the basis of semantic annotation results, weighted item frequency[3] and the semantic relations of a domain ontology are used to express the semantics of the document.[4]\nThe VSM and probability model can simplify text processing into a vector space or probability set. 
These models use the \"term frequency\" property to describe the number of occurrences of query words in a paper. Given the particularities of document segmentation, a word carries a different summarizing weight depending on the section it appears in, meaning that simply counting word appearances is not sufficient. Meanwhile, there is no vocabulary set with standard semantic features and document labels.\nThe introduction of ontology into the information retrieval system makes it possible to query users\u2019 semantic information based on ontology and better satisfy users\u2019 personalized retrieval needs.[5] Short of a vocabulary set with semantic descriptions, attempts at a logic view of user information demand are insufficient to express the semantics of the user\u2019s requirement. In such an information retrieval model, even if we choose an appropriate sort function R (R is the reciprocal of the distance between points), the logical view cannot represent the requirements of the document and the user, and the retrieval results will be unconvincing to the user.\nIn order to improve the accuracy and efficiency of user retrieval, we build a model based on information retrieval and a domain ontology knowledge base. The ontology-based information retrieval system provides semantic retrieval, while the keyword-based information retrieval system calculates a better factor set in document processing, yielding better recall and precision results.\nIn order to accomplish this, a genetic algorithm was designed and implemented. A genetic algorithm is a search method modeled on the evolutionary rules of the biological world. It mainly includes coding mechanisms and control parameters. The genetic algorithm provides a heuristic method which simulates population evolution, searching through the solution space with selection, crossover, and mutation in each generation to select an optimal factor set from combinations of factors. 
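As a concrete illustration of this selection, crossover, and mutation loop, the following is a minimal sketch in Python. The fitness function and bounds here are toy stand-ins (the paper's actual fitness is the rank-list distance against expert lists); the factor names echo the weighted factors w_tit, w_abs, w_key used later in the paper, and everything else is assumed for illustration only.

```python
import random

def genetic_search(fitness, bounds, pop_size=20, generations=200,
                   crossover_rate=0.8, mutation_rate=0.1):
    """Minimize `fitness` over factor vectors constrained to `bounds`."""
    dim = len(bounds)
    pop = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half (lower value = fitter here).
        pop.sort(key=fitness)
        parents = pop[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            # Crossover: blend two parent factor sets.
            child = [(x + y) / 2 if random.random() < crossover_rate else x
                     for x, y in zip(a, b)]
            # Mutation: perturb one factor, clamped to its interval.
            if random.random() < mutation_rate:
                i = random.randrange(dim)
                lo, hi = bounds[i]
                child[i] = min(hi, max(lo, child[i] + random.uniform(-0.2, 0.4)))
            children.append(child)
        pop = parents + children
    return min(pop, key=fitness)

# Toy fitness: squared distance of (w_tit, w_abs, w_key) from a known optimum.
random.seed(0)
best = genetic_search(lambda w: (w[0] - 3) ** 2 + (w[1] - 2) ** 2 + (w[2] - 0.6) ** 2,
                      bounds=[(0, 5), (0, 5), (0, 2)])
```

Because the fitter half is carried over unchanged, the best factor set found so far is never lost, which mirrors the strategy of excluding low-fitness combinations from the next generation.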
The option-weighted factor, tuned on a training set using genetic algorithms, is applied to a practical retrieval system.[6]\nDomain ontology was applied as the basis of semantic representation to effectively represent user requirements and document semantics. Domain ontology involves the detailed description of a domain conceptualization, expressing the abstract objects, relations, and classes in one vocabulary set.[7]\nThe design and implementation of the information retrieval system comprised two parts: document processing and document retrieval. In this information retrieval model, an ontology server is added to tag and index the retrieval sources based on ontology; the query conversion module applies semantic processing to users\u2019 needs and expands the initial query with its synonyms, hypernyms, and word senses. The retrieval agent module uses the converted queries to retrieve from the information source.\nWe've already provided an overview of an ontology-based information retrieval system. The next part introduces the relevant work and methods of this study. The third part discusses the design of an information retrieval model based on domain ontology. The fourth part details the experimental study and analyses of the results. The final part summarizes the full text and declares related issues that need further study.\n\nMethods \nFaced with the problem of managing a large volume of data in a network, it remains vital for users to acquire information accurately and efficiently. So far, retrieval methods have been developed using various mathematical models. The classical information retrieval models include the Boolean model[8], probability model[9], vector model[10], binary independence retrieval model, and BM25 model. The following summarizes these models.\nSuppose ki is the index term, dj is the document, and wi,j\u2009\u2265\u20090 is the weight of the tuple (ki, dj), representing the significance of ki to the semantic content of dj. 
Let t refer to the number of index terms. K\u2009=\u2009{k1, \u2026, kt} is the index term set. If an index term does not appear in the document, then wi,j\u2009=\u20090. The document dj is thus represented by an index term vector:\n\n{\\displaystyle {\\overset {\\rightarrow }{d}}_{j}=\\left(w_{1j},w_{2j},w_{3j},\\ldots ,w_{tj}\\right)}\n\nThe Boolean model is a classical information retrieval (IR) model based on set theory and Boolean algebra. Boolean retrieval can be effective if a query requires unambiguous selection.[11] But it can only determine whether a document is related or not related. The Boolean model lacks the ability to describe the situation in which query words partially match a paper. The similarity result of document dj and query q is binary, either 0 or 1. The binary value has limitations, and Boolean queries are hard to construct.\nThe VSM, proposed earlier by Salton, is based on vector space theory and linear algebra operations, abstracting the query conditions and text into vectors in a multidimensional vector space. The multi-keyword matching here can express the meaning of the text more fully.[1] Compared with the Boolean model, the VSM ranks relevant documents by comparing the angular (cosine) similarity between the vector of each document and the original query vector in the spatial representation.\nThe probabilistic model[2] mainly relies on probabilistic operations and Bayes rules to match data information. The probabilistic model not only considers the internal relations between keywords and documents, but it also retrieves texts based on probability dependency. 
The model, usually based on a group of parameterized probability distributions, captures the internal relation between keywords and documents and retrieves according to probabilistic dependency. The model requires strong independence assumptions for tractability.\nThe binary independence retrieval model[12] evolved from the probabilistic model with better performance. Assuming that document D is described by a two-valued vector (x1, x2, \u2026 xn), if index term ki\u2009\u2208\u2009D, then xi\u2009=\u20091; otherwise, xi\u2009=\u20090. The correlation function of index terms and document is shown below.\n\n{\\displaystyle {Sim}\\left(D,q\\right)=\\sum \\log {\\frac {p_{i}\\left(1-q_{i}\\right)}{q_{i}\\left(1-p_{i}\\right)}}}\n\nHere, pi =\u2009ri\/r and qi =\u2009(fi \u2212\u2009ri)\/(f\u2009\u2212\u2009r), where f refers to the number of documents in the training document set, r is the number of documents related to the user query in the training document set, fi represents the number of documents containing index term ki in the training document set, and ri is the number of documents containing ki among the r relevant documents.\nThe Okapi BM25 model[13][14], commonly called BM25, is a ranking algorithm developed from the probabilistic model that incorporates term frequency and length normalization. The local weights are computed as parameterized frequencies, including term frequency and document frequency, and the global weights are RSJ weights. Local weights are based on a 2D Poisson model, while the global weights are based on the Robertson-Sp\u00e4rck-Jones Probabilistic Model. 
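A simplified sketch of this weighting scheme follows (one common variant of the BM25 formula; the k1 and b values are the usual defaults, and the toy corpus is assumed purely for illustration):

```python
import math

def bm25_score(query_terms, doc_terms, doc_freq, n_docs, avg_len, k1=1.2, b=0.75):
    """Score one document: RSJ-style global weight (idf) times a saturating,
    length-normalized term-frequency local weight."""
    score, dl = 0.0, len(doc_terms)
    for term in query_terms:
        tf = doc_terms.count(term)
        if tf == 0:
            continue
        df = doc_freq.get(term, 0)
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))            # global weight
        local = tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avg_len))  # local weight
        score += idf * local
    return score

# Toy corpus of two pre-tokenized documents.
docs = [["ontology", "retrieval", "model"],
        ["wireless", "sensor", "network", "network"]]
doc_freq = {t: sum(t in d for d in docs) for d in docs for t in d}
avg_len = sum(len(d) for d in docs) / len(docs)
scores = [bm25_score(["ontology"], d, doc_freq, len(docs), avg_len) for d in docs]
```

Note how the local weight saturates as tf grows, so repeating a term many times yields diminishing returns, unlike raw term-frequency counting.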
By using these heuristic techniques to reduce the number of parameters to be learned and approximated, BM25 often achieves better performance than TF-IDF (term frequency\u2013inverse document frequency).\n\nBased on the domain ontology information retrieval model \nEach concept in a domain ontology is simultaneously related to other concepts. The interrelation between concepts in the semantic relation network enables synonym expansion retrieval, semantic entailment expansion, and semantic correlation expansion. We introduce a domain ontology information retrieval model that applies ontology to the traditional information retrieval model through query expansion to improve efficiency.\nAn illustration of the structure of the information retrieval model is shown in Fig. 1.\n\n Fig. 1 An illustration of the structure of the information retrieval model\n\nThe system consists of two parts: ontology document processing (including domain ontology servers, data source, document process unit, and information database) and ontology document retrieval (including domain ontology server, query transition, custom process, and retrieval agent).\n\nOntology documents processing \nDocument processing extracts useful information from unstructured text and establishes mapping relations between document terms and concepts based on domain ontology.[15] Document processing is shown in Fig. 2.\n\n Fig. 2 Ontology document processing\n\nIn the preprocessing procedure, each document in the document set is tokenized and analyzed, and numbers, hyphens, and punctuation are filtered out. A stop word list removes function words, leaving useful words such as nouns and verbs.[16] Extracting stem words by removing prefixes and suffixes improves the accuracy of retrieval. 
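These preprocessing steps can be sketched as follows. The stop list and the naive suffix stripper below are illustrative stand-ins for the paper's stop word list and stemming step, not the actual tools used:

```python
import re

STOP_WORDS = {"the", "a", "an", "of", "and", "or", "in", "is", "to", "for"}
SUFFIXES = ("ing", "ed", "es", "s")  # crude stand-in for a real stemmer

def preprocess(text):
    # Tokenize and filter out numbers, hyphens, and punctuation.
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = []
    for tok in tokens:
        # Drop function words using the stop word list.
        if tok in STOP_WORDS:
            continue
        # Strip a trailing suffix to approximate stemming.
        for suf in SUFFIXES:
            if tok.endswith(suf) and len(tok) > len(suf) + 2:
                tok = tok[: -len(suf)]
                break
        stems.append(tok)
    return stems
```

For example, preprocess("The retrieval of indexed documents") drops the function words and reduces "indexed" and "documents" to their stems.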
Finally, certain words are selected as index elements to express the document's conceptual content.\nAnnotating semantics on a retrieved object by analyzing characteristic vocabulary builds the mapping relation between words and concepts. First, characteristic words are extracted, and the weight of each word is calculated by counting word frequency to distinguish the importance of words. In this paper, the genetic algorithm is used to calculate the best weighting factors. In the end, they are applied to the actual retrieval system.\nThe system automatically learns the weighting factors by genetic algorithm. This is a heuristic method which simulates biological evolution, eliminating non-ideal factor sets through factor mutation and leaving the optimal factor set. The algorithm tries to maximize the fitness function as a parameter estimate, searching for a population consisting of the fittest individuals; in our case, those are the parameters of the weighted terms in retrieval. In Fig. 3, the pseudo-code of the genetic algorithm for weighted term frequency is described.\n\n Fig. 3 Pseudo-code for selecting weight factors by genetic algorithm\n\nThis algorithm simulates the evolutionary process by gradually adjusting weight factors and eliminating factor combinations with a low fitness value. If the fitness result for one combination is lower than that of another, this group will likely be excluded in the next generation. To avoid local optima, we select many original generations and eliminate the unqualified groups over time. In each iteration, the factor interval lies in [wi \u2212\u20090.2, wi\u2009+\u20090.4] to limit negative factors. The fitness function P(t) determines how fit an individual is with the new weighted combination (w'tit,\u2009w'key,\u2009w'abs). The traditional factor set is replaced by the one with higher fitness P(t), the similarity of each paper to a query word is then calculated, and the rank list is generated. 
The penalty function f is used to get the distance of the expert list.\nThen, for each semantic meaning of ontology term, whether it exists in the extracting characteristic vocabulary is checked. If the semantic exists, the document and weight of the semantic term is calculated to manifest the text with semantic information.\nAfter document feature extraction, a document index based on the concept to reflect the internal relation between text index terms is established, and ambiguity during annotation is excluded. An index based on the concept consists of feature words with their relation given by semantic parsing. Feature words connect through ontology instance and documents. The structure of the ontology concept index is shown in Fig. 4.\n\r\n\n\n\n\n\n\n\n\n\n\n Fig. 4 Index structure based on the ontology concept\n\n\n\nOntology document retrieval \nThe procedure of document retrieval is listed below:\n\nThe user inputs search words or phrases in the search interface, then the system removes function words and reserves the nouns and verbs. Term extraction from words is implemented to get semantic conceptual words and phrases. The result is passed to the query transition module.\nThe query transition module sends the results to the ontology server to search for a corresponding semantic concept, including hypernym, hyponym, synonym, and conceptual meanings.[17] If the word is not found in the ontology database, it prompts the user to adjust the retrieval strategy.\nFor the matching concept in domain ontology, the query transition module implements search, semantic judgment, and query extension to add semantic information to the query. The module submits the query to a retrieval agent for searching. 
For words with an uncertain semantic message, it executes a keyword matching method to search.\nHandled by the custom process module, the user interface then lists query results according to exact words, synonyms, hypernyms, and hyponyms.\nBefore the retrieval process, the system executes semantic analysis of the user query request. Keywords are extracted after stop word removal, and a determination is made as to whether each keyword belongs to the ontology database. By combining concepts in the ontology library, more semantic information is obtained through semantic reasoning. The pseudo-code of the query semantic analysis algorithm is shown in Fig. 5.\n\n Fig. 5 Pseudo-code for the query semantic analysis algorithm\n\nAfter applying semantic analysis to the user request, the semantic information can be used in the retrieval strategy. The pseudo-code of the information retrieval algorithm is shown in Fig. 6.\n\n Fig. 6 Pseudo-code for the information retrieval algorithm\n\nExperiment and results \nThe experimental design of the information retrieval model based on ontology \nIn order to evaluate the performance of the information retrieval model based on ontology, it is necessary to use appropriate tools: Prot\u00e9g\u00e9[18] as an ontology modeling tool, ICTCLAS[19] as a word segmentation tool, Jena[20] as a semantic parsing tool, and Lucene as a semantic indexing tool.\nThe data set contains 1000 scientific papers, as well as papers from the IEEE digital library, which are used to extract the core concepts in the domain ontology. The final conceptualization system is then established. The literature is divided into 10 groups. Each group contains 100 papers related to a query subject or keywords (e.g., \"computer architecture\" and \"operating system\"). 
Therefore, 10 expert rank lists are available for retrieval.\nThe evaluation criterion considers the similarity of each paper with respect to every query word. A ranking mistake among the top papers is penalized more heavily than one among the lowest papers. The formula below is used to compute the distance between rank lists R and R':\n\n{\\displaystyle P(t)={\\frac {\\sum \\limits _{i=1}^{n}\\left\\lbrack \\left(n-i\\right)\\times {dis}(i)\\right\\rbrack }{\\sum \\limits _{i=1}^{\\lfloor {\\frac {n}{2}}\\rfloor }\\left\\lbrack \\left(n-i\\right)\\times i\\right\\rbrack +\\sum \\limits _{\\lfloor {\\frac {n}{2}}\\rfloor }^{n}\\left\\lbrack \\left(n-i\\right)^{2}\\right\\rbrack }}}\n\nHere, n represents the number of papers in the rank list, and dis(i) represents the distance between the positions of paper i in the rank list and in the expert rank list. P(t) represents the distance between the two rank lists, normalized by the denominator.\n\nAnalysis of experimental results \nThe genetic algorithm is compared with the simulated annealing method with respect to iteration number and average distance of the rank list. The result is shown in Fig. 7.\n\n Fig. 7 Comparison of the simulated annealing and genetic algorithm in average distance and iteration times\n\nThe X-axis is the number of iterations of the two algorithms, and the Y-axis is the average distance calculated by the formula above, demonstrating the difference between the ranking list and the expert list. After 200 iterations, the average distance is close to the overall optimum. 
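The distance P(t) described above can be computed directly. A minimal sketch, assuming dis(i) is the absolute difference between paper i's position in the system rank list and in the expert rank list:

```python
def rank_distance(rank, expert_rank):
    """P(t): position-weighted distance between a system ranking and an
    expert ranking, normalized so mistakes near the top count more."""
    n = len(rank)
    expert_pos = {doc: i for i, doc in enumerate(expert_rank, start=1)}
    # Numerator: weight each paper's displacement by (n - i).
    num = sum((n - i) * abs(i - expert_pos[doc])
              for i, doc in enumerate(rank, start=1))
    # Denominator: normalization term over the two halves of the list.
    half = n // 2
    den = (sum((n - i) * i for i in range(1, half + 1))
           + sum((n - i) ** 2 for i in range(half, n + 1)))
    return num / den

identical = rank_distance(["a", "b", "c", "d"], ["a", "b", "c", "d"])
swapped = rank_distance(["b", "a", "c", "d"], ["a", "b", "c", "d"])
```

A ranking identical to the expert list yields a distance of zero, and any displacement near the head of the list raises the distance more than the same displacement near the tail.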
The algorithm deduces the optimized weight combination of factors, which is wtit\u2009=\u20093, wabs\u2009=\u20092, wkey\u2009=\u20090.6.\nDifferent values of the similarity threshold \u03b6 are taken, in which, for example, \u03b6\u2009=\u20090.55 means sim(Sq,\u2009Sj)\u2009\u2265\u20090.55. Every experiment counts the retrieved document set |A|, the ontology-relevant documents |B|, and the user-query-relevant documents in the retrieval set |A\u2009\u2229\u2009B| to calculate the precision and recall rates. The result is shown in Table 1.\n\nTable 1. Precision and recall rate of ontology retrieval\n\nGroup num.   \u03b6\u2009=\u20090.5               \u03b6\u2009=\u20090.55              \u03b6\u2009=\u20090.6\n             Precision  Recall     Precision  Recall     Precision  Recall\n1            84.50%     83.36%     100.00%    82.45%     100.00%    81.85%\n2            38.92%     100.00%    93.12%     100.00%    100.00%    51.00%\n3            74.35%     100.00%    94.65%     94.43%     99.12%     94.65%\n4            83.23%     100.00%    93.68%     100.00%    96.34%     45.74%\n5            51.36%     100.00%    95.44%     100.00%    100.00%    100.00%\nAverage      66.47%     96.67%     95.38%     95.38%     99.09%     74.65%\n\nThe precision rate improves as the threshold increases. The precision rate reaches more than 99% when \u03b6\u2009=\u20090.6. However, the recall rate only reaches 74%, which means the query results lose critical information.\nWhen \u03b6\u2009=\u20090.5, the recall rate remains high while precision remains low, because the system retrieves all documents whose ontology concepts relate to the query. The setting \u03b6\u2009=\u20090.55 balances the precision and recall rates.\n\nConclusion \nIn order to better satisfy users\u2019 retrieval needs and optimize the performance of information retrieval, domain ontology is introduced into the information retrieval system. In this paper, an information retrieval model based on domain ontology is proposed. 
The system includes document processing and ontology document retrieval with the ontology server, information database, and query transition and retrieval agent modules. We present a genetic algorithm to calculate the optimum combination of weighted factors of word frequency. Based on the evaluation criterion, we applied the system to query documents and compare with expert lists. In the end, the genetic algorithm shortens the distance compared with simulated annealing, and the ontology retrieval model exhibits a better precision and recall rate to understand the users\u2019 requirements.\nIn the future, we wish to further implement an automatic or semi-automatic method such as data mining to an established ontology database to prevent the high difficulty in ontology establishment. And we may further implement modeling personalized query preferences and return retrieval results according to different user query demands.\n\nAbbreviations \nIR: Information retrieval\nVSM: Vector space model\n\nDeclarations \nAcknowledgements \nThis work is supported by the Science and Technology Research Project of the Department of Education of Jilin Province (Grant 201657).\n\nFunding \nThe Science and Technology Research Project of Department of Education of Jilin Province (Grant 201657).\n\nAvailability of data and materials \nThe data are included in this published article.\n\nAuthor\u2019s contributions \nThe manuscript was written through contributions of the author. The author read and approved the final manuscript.\n\nAuthor\u2019s information \nBinbin Yu: Ph.D. candidate, College of Computer Science and Technology, Jilin University. Lecturer, College of Information Technology and Media, Beihua University. His research interests include network security and so on.\n\nCompeting interests \nThe author declares that he has no competing interests.\n\nReferences \n\n\n\u2191 1.0 1.1 Tang, M.; Bian, Y.; Tao, F. (2010). 
\"The Research of Document Retrieval System Based on the Semantic Vector Space Model\". Journal of Intelligence 5 (29): 167\u201377. http:\/\/en.cnki.com.cn\/Article_en\/CJFDTOTAL-QBZZ201005036.htm .   \n\n\u2191 2.0 2.1 Ma, C.; Liang, W.; Zheng, M. et al. (2016). \"A Connectivity-Aware Approximation Algorithm for Relay Node Placement in Wireless Sensor Networks\". IEEE Sensors Journal 16 (2): 515-528. doi:10.1109\/JSEN.2015.2456931.   \n\n\u2191 Yang, X.Q.; Yang, D.; Yuan, M. (2014). \"Scientific Literature Retrieval Model Based on Weighted Term Frequency\". Proceedings of the 2014 Tenth International Conference on Intelligent Information Hiding and Multimedia Signal Processing: 427\u2013430. doi:10.1109\/IIH-MSP.2014.113.   \n\n\u2191 Xu, M.; Yang, Q.; Kwak, K.S. (2016). \"Distributed Topology Control With Lifetime Extension Based on Non-Cooperative Game for Wireless Sensor Networks\". IEEE Sensors Journal 16 (9): 3332-3342. doi:10.1109\/JSEN.2016.2527056.   \n\n\u2191 Yang, Y.; Du, J.P.; Ping, Y. (2015). \"Ontology-based intelligent information retrieval system\". Journal of Software 26 (7): 1675\u201387. https:\/\/mathscinet.ams.org\/mathscinet-getitem?mr=3408856 .   \n\n\u2191 Lu, T.; Liang, M. (2014). \"Improvement of Text Feature Extraction with Genetic Algorithm\". New Technology of Library and Information Service 30 (4): 48\u201357. doi:10.11925\/infotech.1003-3513.2014.04.08.   \n\n\u2191 Vallet, D.; Fern\u00e1ndez, M.; Castells, P. (2005). \"An Ontology-Based Information Retrieval Model\". Proceedings from ESWC 2005, The Semantic Web: Research and Applications: 455\u201370. doi:10.1007\/11431053_31.   \n\n\u2191 Manning, C.D.; Raghavan, P.; Sch\u00fctze, H. (2008). Introduction to Information Retrieval. Cambridge University Press. doi:10.1017\/CBO9780511809071. ISBN 9780511809071.   \n\n\u2191 Jones, K.S.; Walker, S.; Robertson, S.E. (2000). \"A probabilistic model of information retrieval: Development and comparative experiments: Part 1\". 
Information Processing & Management: 779\u2013808. doi:10.1016\/S0306-4573(00)00015-7.   \n\n\u2191 Wong, S.K.M.; Ziarko, W.; Wong, P.C.N. (1985). \"Generalized vector spaces model in information retrieval\". Proceedings of the 8th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval: 18\u201325. doi:10.1145\/253495.253506.   \n\n\u2191 Baeza-Yates, R.; Ribeiro-Neto, B. (1999). Modern Information Retrieval. Addison Wesley. pp. 544. ISBN 9780201398298.   \n\n\u2191 Premalatha, R.; Srinivasan, S. (2014). \"Text processing in information retrieval system using vector space model\". Proceedings from the 2014 International Conference on Information Communication and Embedded Systems: 1\u20136. doi:10.1109\/ICICES.2014.7033837.   \n\n\u2191 Voorhees, E.M.; Harman, D.K., ed. (2005). TREC: Experiment and Evaluation in Information Retrieval. MIT Press. pp. 368. ISBN 9780262220736.   \n\n\u2191 Pereira, R.A.M.; Molinari, A.; Pasi, G. (2005). \"Contextual weighted representations and indexing models for the retrieval of HTML documents\". Soft Computing 9 (7): 481-92. doi:10.1007\/s00500-004-0361-z.   \n\n\u2191 Zhang, K.; Nan, K.; Ma, Y. (2008). \"Research on ontology-based information retrieval system models\". Application Research of Computers 8 (25): 2241-49. https:\/\/www.oriprobe.com\/journals\/jsjyyyj\/2008_8.html .   \n\n\u2191 Kim, H.; Han, S.-W. (2015). \"An Efficient Sensor Deployment Scheme for Large-Scale Wireless Sensor Networks\". IEEE Communications Letters 19 (1): 98\u2013101. doi:10.1109\/LCOMM.2014.2372015.   \n\n\u2191 Messerly, J.J.; Heidorn, G.E.; Richardson, S.D. et al. (07 March 1997). \"Information retrieval utilizing semantic representation of text by identifying hypernyms and indexing multiple tokenized semantic structures to a same passage of text\". Google Patents. https:\/\/patents.google.com\/patent\/US6161084 .   \n\n\u2191 Ke\u00dfler, C.; Raubal, M.; Wosniok, C. (2009). 
\"Semantic Rules for Context-Aware Geographical Information Retrieval\". Proceedings from EuroSSC 2009 Smart Sensing and Context: 77\u201392. doi:10.1007\/978-3-642-04471-7_7.   \n\n\u2191 Cao, Y.-G.; Cao, Y.-Z.; Jin, M.-Z.; Liu, C. (2006). \"Information Retrieval Oriented Adaptive Chinese Word Segmentation System\". Journal of Software 3 (17). http:\/\/en.cnki.com.cn\/Article_en\/CJFDTOTAL-RJXB200603003.htm .   \n\n\u2191 Castells, P.; Fern\u00e1ndez, M.; Vallet, D. et al. (2005). \"Self-tuning Personalized Information Retrieval in an Ontology-Based Framework\". Proceedings from On the Move to Meaningful Internet Systems 2005: 977\u2013986. doi:10.1007\/11575863_119.   \n\n\nNotes \nThis presentation is faithful to the original, with only a few minor changes to presentation. Grammar and punctuation was edited to American English, and in some cases additional context was added to text when necessary. In some cases important information was missing from the references, and that information was added.\n\nSource: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:Research_on_information_retrieval_model_based_on_ontology\">https:\/\/www.limswiki.org\/index.php\/Journal:Research_on_information_retrieval_model_based_on_ontology<\/a>\n\nCategories: LIMSwiki journal articles (added in 2019) | LIMSwiki journal articles (all) | LIMSwiki journal articles (with rendered math) | LIMSwiki journal articles on information retrieval | LIMSwiki journal articles on sensor networks\n\nThis page was last modified on 13 February 2019, at 01:19.\nThis page has been accessed 166 times.\nContent is available under a Creative Commons Attribution-ShareAlike 4.0 International License unless otherwise noted.\n\n","15ab90bc3c6b03e3f0954255a3ab8dc7_html":"<body class=\"mediawiki ltr sitedir-ltr ns-206 ns-subject page-Journal_Research_on_information_retrieval_model_based_on_ontology skin-monobook action-view\">\n<div id=\"rdp-ebb-globalWrapper\">\n\t\t<div id=\"rdp-ebb-column-content\">\n\t\t\t<div id=\"rdp-ebb-content\" class=\"mw-body\" 
role=\"main\">\n\t\t\t\t<a id=\"rdp-ebb-top\"><\/a>\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t<h1 id=\"rdp-ebb-firstHeading\" class=\"firstHeading\" lang=\"en\">Journal:Research on information retrieval model based on ontology<\/h1>\n\t\t\t\t\n\t\t\t\t<div id=\"rdp-ebb-bodyContent\" class=\"mw-body-content\">\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\n\t\t\t\t\t<!-- start content -->\n\t\t\t\t\t<div id=\"rdp-ebb-mw-content-text\" lang=\"en\" dir=\"ltr\" class=\"mw-content-ltr\">\n\n\n\n<h2><span class=\"mw-headline\" id=\"Abstract\">Abstract<\/span><\/h2>\n<p>An information retrieval system not only occupies an important position in the network information platform, but also plays an important role in <a href=\"https:\/\/www.limswiki.org\/index.php\/Information\" title=\"Information\" class=\"wiki-link\" data-key=\"6300a14d9c2776dcca0999b5ed940e7d\">information<\/a> acquisition, query processing, and wireless sensor networks. It serves as a document retrieval tool that helps researchers extract relevant documents from data sets. Classic keyword-based information retrieval models neglect semantic information and therefore cannot represent the user\u2019s needs, so efficiently acquiring the personalized information that users need remains an open concern. Existing ontology-based systems lack an expert list from which to obtain accurate index term frequencies. In this paper, a domain ontology model covering document processing and document retrieval is proposed, and the feasibility and superiority of the model are demonstrated experimentally.\n<\/p><p><b>Keywords<\/b>: ontology, information retrieval, genetic algorithm, sensor networks\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Introduction\">Introduction<\/span><\/h2>\n<p>Information retrieval is the process of extracting relevant documents from large data sets. 
Along with the increasing accumulation of data and the rising demand for high-quality retrieval results, traditional information retrieval techniques can no longer deliver high-quality search results. As a newly emerged knowledge organization system, ontology is vitally important in promoting the function of information retrieval in <a href=\"https:\/\/www.limswiki.org\/index.php\/Information_management\" title=\"Information management\" class=\"wiki-link\" data-key=\"f8672d270c0750a858ed940158ca0a73\">knowledge management<\/a>.\n<\/p><p>Existing information retrieval models, such as the vector space model (VSM)<sup id=\"rdp-ebb-cite_ref-TangTheRes10_1-0\" class=\"reference\"><a href=\"#cite_note-TangTheRes10-1\">[1]<\/a><\/sup>, are based on certain rules to model text in pattern recognition and other fields. For example, a VSM splits, filters, and classifies otherwise abstract-looking text and uses fixed rules to calculate statistics such as word frequency.\n<\/p><p>Probability models<sup id=\"rdp-ebb-cite_ref-MaAConn16_2-0\" class=\"reference\"><a href=\"#cite_note-MaAConn16-2\">[2]<\/a><\/sup> mainly rely on probabilistic operations and Bayes\u2019 rule to match data information, in which the weight values of feature words are all multivalued. The probabilistic model uses index words to represent the user\u2019s interest, that is, the personalized query request submitted by the user. However, there is no vocabulary set with standard semantic features and document labels. Traditional weighting strategies lack the semantic information of the document and are therefore not representative of the document\u2019s content. 
On the basis of semantic annotation results, weighted item frequency<sup id=\"rdp-ebb-cite_ref-YangScient14_3-0\" class=\"reference\"><a href=\"#cite_note-YangScient14-3\">[3]<\/a><\/sup> and the semantic relations of the domain ontology are used to express the semantics of the document.<sup id=\"rdp-ebb-cite_ref-XuDist16_4-0\" class=\"reference\"><a href=\"#cite_note-XuDist16-4\">[4]<\/a><\/sup>\n<\/p><p>The VSM and probability model can simplify text processing into a vector space or probability set. Both use the \"term frequency\" property to describe the number of occurrences of query words in a paper. Given the particularities of document segmentation, however, a word carries a different summarizing weight depending on the section in which it appears, so merely counting word occurrences is not sufficient.\n<\/p><p>Introducing ontology into the information retrieval system makes it possible to query users\u2019 semantic information based on ontology and better satisfy users\u2019 personalized retrieval needs.<sup id=\"rdp-ebb-cite_ref-YangOnto15_5-0\" class=\"reference\"><a href=\"#cite_note-YangOnto15-5\">[5]<\/a><\/sup> Without a vocabulary set with semantic descriptions, a logical view of the user\u2019s information demand is insufficient to express the semantics of the user\u2019s requirement. In such an information retrieval model, even if we choose an appropriate sort function <i>R<\/i> (<i>R<\/i> is the reciprocal of the distance between points), the logical view cannot represent the requirements of the document and the user, and the retrieval results will be unconvincing to the user.\n<\/p><p>In order to improve the accuracy and efficiency of user retrieval, we build a model based on information retrieval and a domain ontology knowledge base. 
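The classic term-frequency ranking discussed above can be sketched in a few lines. This is a minimal, illustrative example: the document set, the query, and the raw term-frequency weighting are hypothetical stand-ins, not the weighted scheme used later in the paper.

```python
import math
from collections import Counter

def tf_vector(tokens):
    """Raw term-frequency vector for one document (illustrative weighting)."""
    return Counter(tokens)

def cosine_sim(q, d):
    """Cosine of the angle between query and document vectors."""
    common = set(q) & set(d)
    dot = sum(q[t] * d[t] for t in common)
    nq = math.sqrt(sum(v * v for v in q.values()))
    nd = math.sqrt(sum(v * v for v in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0

# Two toy documents and a query; d1 shares terms with the query, d2 does not.
docs = {
    "d1": "ontology based information retrieval model".split(),
    "d2": "wireless sensor network deployment".split(),
}
query = "ontology retrieval".split()
qv = tf_vector(query)
ranking = sorted(docs, key=lambda k: cosine_sim(qv, tf_vector(docs[k])), reverse=True)
print(ranking)  # → ['d1', 'd2']
```

Because d2 shares no terms with the query, its similarity is zero, which illustrates the pure-keyword limitation the paper addresses with ontology.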
The ontology-based information retrieval system provides semantic retrieval, while on the keyword side it calculates a better factor set for document processing, yielding better recall and precision results.\n<\/p><p>In order to accomplish this, a genetic algorithm was designed and implemented. A genetic algorithm is a search method modeled on the evolutionary rules of the biological world; it mainly comprises a coding mechanism and control parameters. The genetic algorithm provides a heuristic method that simulates population evolution by searching the solution space, applying selection, crossover, and mutation in each generation to select an optimal factor set from combinations of factors. The weighting factors, tuned on a training set using the genetic algorithm, are then applied to a practical retrieval system.<sup id=\"rdp-ebb-cite_ref-LuImprov14_6-0\" class=\"reference\"><a href=\"#cite_note-LuImprov14-6\">[6]<\/a><\/sup>\n<\/p><p>Domain ontology was applied as the basis of semantic representation to effectively represent user requirements and document semantics. A domain ontology is a detailed description of a domain conceptualization that expresses abstract objects, relations, and classes in one vocabulary set.<sup id=\"rdp-ebb-cite_ref-ValletAnOnt05_7-0\" class=\"reference\"><a href=\"#cite_note-ValletAnOnt05-7\">[7]<\/a><\/sup>\n<\/p><p>The information retrieval system was designed and implemented in two parts: document processing and document retrieval. In this information retrieval model, an ontology server is added to tag and index the retrieval sources based on ontology; the query conversion module applies semantic processing to the user\u2019s needs and expands the initial query with its synonyms, hypernyms, and word senses. The retrieval agent module then uses the converted query to retrieve from the information source.\n<\/p><p>We've already provided an overview of an ontology-based information retrieval system. 
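As a rough sketch of the query conversion step just described, the following expands a query with synonyms and hypernyms. The in-memory `ONTOLOGY` dictionary, its terms, and the `expand_query` helper are all illustrative stand-ins for the paper's ontology server, not part of its actual implementation.

```python
# Hypothetical in-memory stand-in for the domain ontology server.
ONTOLOGY = {
    "laptop": {"synonyms": ["notebook"], "hypernyms": ["computer"]},
    "computer": {"synonyms": [], "hypernyms": ["machine"]},
}

def expand_query(terms, ontology):
    """Expand each query term with its synonyms and hypernyms, if known.

    Terms missing from the ontology are kept as-is, mirroring the fallback
    to plain keyword matching described in the text.
    """
    expanded = []
    for term in terms:
        expanded.append(term)
        entry = ontology.get(term)
        if entry:
            expanded.extend(entry["synonyms"])
            expanded.extend(entry["hypernyms"])
    return expanded

print(expand_query(["laptop", "price"], ONTOLOGY))
# → ['laptop', 'notebook', 'computer', 'price']
```

The expanded term list would then be handed to the retrieval agent in place of the original keywords.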
The next part introduces the relevant work and methods of this study. The third part discusses the design of an information retrieval model based on domain ontology. The fourth part details the experimental study and analyses of the results. The final part summarizes the full text and declares related issues that need further study.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Methods\">Methods<\/span><\/h2>\n<p>Faced with the problem of managing a large volume of data in a network, it remains vital for users to acquire information accurately and efficiently. So far, retrieval methods have been developed using various mathematical models. The classical information retrieval models include the Boolean model<sup id=\"rdp-ebb-cite_ref-8\" class=\"reference\"><a href=\"#cite_note-8\">[8]<\/a><\/sup>, probability model<sup id=\"rdp-ebb-cite_ref-JonesAProb00_9-0\" class=\"reference\"><a href=\"#cite_note-JonesAProb00-9\">[9]<\/a><\/sup>, vector model<sup id=\"rdp-ebb-cite_ref-WongGener85_10-0\" class=\"reference\"><a href=\"#cite_note-WongGener85-10\">[10]<\/a><\/sup>, binary independent retrieval model, and BM25 model. The following are the solutions of these models.\n<\/p><p>Suppose <i>k<sub>i<\/sub><\/i> is the index term, <i>d<sub>j<\/sub><\/i> is the document, <i>w<sub>i,j<\/sub><\/i>\u2009\u2265\u20090 is the weight of tuples (<i>k<sub>i<\/sub><\/i>, <i>d<sub>j<\/sub><\/i>), which is the significance of <i>k<sub>i<\/sub><\/i> to <i>d<sub>j<\/sub><\/i> semantic contents. Let <i>t<\/i> refer to the number of index terms. <i>K<\/i>\u2009=\u2009{<i>k<sub>1<\/sub><\/i>, \u2026, <i>k<sub>t<\/sub><\/i>} is index term set. If an index term does not appear in the document, then <i>w<sub>i,j<\/sub><\/i>\u2009=\u20090. 
So the document <i>d<sub>j<\/sub><\/i> is represented by an index term vector <i>d<sub>j<\/sub><\/i>:\n<\/p><p><i>d<sub>j<\/sub><\/i>\u2009=\u2009(<i>w<sub>1,j<\/sub><\/i>, <i>w<sub>2,j<\/sub><\/i>, \u2026, <i>w<sub>t,j<\/sub><\/i>)\n<\/p><p><br \/>\nThe Boolean model is a classical information retrieval (IR) model based on set theory and Boolean algebra. Boolean retrieval can be effective if a query requires unambiguous selection.<sup id=\"rdp-ebb-cite_ref-Baeza-YatesModern99_11-0\" class=\"reference\"><a href=\"#cite_note-Baeza-YatesModern99-11\">[11]<\/a><\/sup> However, it can only state whether a document is related or not related; the Boolean model lacks the ability to describe a situation in which the query words partially match a paper. The similarity of document <i>d<sub>j<\/sub><\/i> and query <i>q<\/i> is binary, either 0 or 1. This binary value is limiting, and Boolean queries are hard to construct.\n<\/p><p>The VSM, proposed by Salton, is based on vector space theory and linear algebra operations, which abstract the query conditions and text into vectors in a multidimensional vector space. 
Multi-keyword matching can express the meaning of the text more fully.<sup id=\"rdp-ebb-cite_ref-TangTheRes10_1-1\" class=\"reference\"><a href=\"#cite_note-TangTheRes10-1\">[1]<\/a><\/sup> Compared with the Boolean model, the VSM ranks relevant documents by comparing the angle-based similarity between each document vector and the original query vector in the spatial representation.\n<\/p><p>The probabilistic model<sup id=\"rdp-ebb-cite_ref-MaAConn16_2-1\" class=\"reference\"><a href=\"#cite_note-MaAConn16-2\">[2]<\/a><\/sup> mainly relies on probabilistic operations and Bayes\u2019 rule to match data information. It not only considers the internal relations between keywords and documents but also retrieves texts based on probabilistic dependency. The model is usually built on a group of parameterized probability distributions and requires strong independence assumptions for tractability.\n<\/p><p>The binary independence retrieval model<sup id=\"rdp-ebb-cite_ref-PremalathaText14_12-0\" class=\"reference\"><a href=\"#cite_note-PremalathaText14-12\">[12]<\/a><\/sup> evolved from the probabilistic model and offers better performance. Assume that document <i>D<\/i> is described, with respect to the index terms of query <i>q<\/i>, by a two-valued vector (<i>x<sub>1<\/sub><\/i>, <i>x<sub>2<\/sub><\/i>, \u2026 <i>x<sub>n<\/sub><\/i>): if index term <i>k<sub>i<\/sub><\/i>\u2009\u2208\u2009<i>D<\/i>, then <i>x<sub>i<\/sub><\/i>\u2009=\u20091; otherwise, <i>x<sub>i<\/sub><\/i>\u2009=\u20090. 
The correlation function of index term and document is shown below:\n<\/p><p>sim(<i>D<\/i>,\u2009<i>q<\/i>)\u2009=\u2009\u2211<sub><i>i<\/i><\/sub>\u2009<i>x<sub>i<\/sub><\/i>\u2009log\u2009[<i>p<sub>i<\/sub><\/i>(1\u2009\u2212\u2009<i>q<sub>i<\/sub><\/i>)\u2009\/\u2009(<i>q<sub>i<\/sub><\/i>(1\u2009\u2212\u2009<i>p<sub>i<\/sub><\/i>))]\n<\/p><p><br \/>\nHere, <i>p<sub>i<\/sub><\/i>\u2009=\u2009<i>r<sub>i<\/sub>\/r<\/i> and <i>q<sub>i<\/sub><\/i>\u2009=\u2009(<i>f<sub>i<\/sub><\/i>\u2009\u2212\u2009<i>r<sub>i<\/sub><\/i>)\/(<i>f<\/i>\u2009\u2212\u2009<i>r<\/i>), where <i>f<\/i> refers to the number of documents in the training document set, <i>r<\/i> is the number of documents related to the user query in the training document set, <i>f<sub>i<\/sub><\/i> is the number of documents in the training document set that contain index term <i>k<sub>i<\/sub><\/i>, and <i>r<sub>i<\/sub><\/i> is the number of the <i>r<\/i> relevant documents that contain <i>k<sub>i<\/sub><\/i>.\n<\/p><p>The Okapi BM25 model<sup id=\"rdp-ebb-cite_ref-VoorheesTREC05_13-0\" class=\"reference\"><a href=\"#cite_note-VoorheesTREC05-13\">[13]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-PereiraContext05_14-0\" class=\"reference\"><a href=\"#cite_note-PereiraContext05-14\">[14]<\/a><\/sup>, usually simply called BM25, is developed from the probabilistic model and incorporates term frequency and length normalization. The local weights are computed as parameterized frequencies, including term frequency and document frequency, and the global weights as RSJ weights. Local weights are based on a 2-Poisson model, while the global weights are based on the Robertson\u2013Sp\u00e4rck Jones probabilistic model. 
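Returning to the binary independence weights defined above (with p<sub>i</sub> = r<sub>i</sub>/r and q<sub>i</sub> = (f<sub>i</sub> − r<sub>i</sub>)/(f − r)), the score can be computed as follows. This is a sketch, not the paper's code; the clamping of degenerate 0/1 probabilities is an added safeguard, not part of the original formulation.

```python
import math

def bir_score(x, r_i, r, f_i, f):
    """Binary-independence score: sum over terms present in the document of
    log(p_i(1 - q_i) / (q_i(1 - p_i))), with p_i = r_i/r and
    q_i = (f_i - r_i)/(f - r)."""
    score = 0.0
    for present, ri, fi in zip(x, r_i, f_i):
        if not present:  # only terms occurring in the document contribute
            continue
        p = ri / r
        q = (fi - ri) / (f - r)
        # guard the degenerate probabilities 0 and 1, which break the log-odds
        p = min(max(p, 1e-6), 1 - 1e-6)
        q = min(max(q, 1e-6), 1 - 1e-6)
        score += math.log(p * (1 - q) / (q * (1 - p)))
    return score

# A term frequent in relevant documents (8 of 10) but rare overall
# (10 of 100) contributes a large positive weight.
print(bir_score([1], [8], 10, [10], 100))
```

A term that is no more common in relevant documents than overall (p<sub>i</sub> = q<sub>i</sub>) contributes zero, matching the intuition behind the formula.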
By reducing the number of parameters to be learned and approximated, based on these heuristic techniques, BM25 often achieves better performance compared to TF-IDF (term frequency\u2013inverse document frequency).\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Based_on_the_domain_ontology_information_retrieval_model\">Based on the domain ontology information retrieval model<\/span><\/h2>\n<p>The concept of domain ontology has a relation to other concepts simultaneously. The interrelation between concepts of the semantic relative network implements synonym expansion retrieval, semantic entailment expansion, and semantic correlation expansion. We introduce a domain ontology information retrieval model to apply ontology into the traditional information retrieval model by query expansion to improve efficiency.\n<\/p><p>An illustration of the structure for the information retrieval model is shown in Fig. 1.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig1_Yu_JOnWireCommNet2019_2019.png\" class=\"image wiki-link\" data-key=\"b48865d6996bc6965db9c99de1cde5a0\"><img alt=\"Fig1 Yu JOnWireCommNet2019 2019.png\" src=\"https:\/\/www.limswiki.org\/images\/e\/ec\/Fig1_Yu_JOnWireCommNet2019_2019.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 
1<\/b> An illustration of the structure of the information retrieval model.<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>The system consists of two parts: ontology document processing (including the domain ontology server, data source, document process unit, and information database) and ontology document retrieval (including the domain ontology server, query transition, custom process, and retrieval agent).\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Ontology_documents_processing\">Ontology document processing<\/span><\/h3>\n<p>Document processing extracts useful information from an unstructured text message and establishes mapping relations between document terms and concepts based on domain ontology.<sup id=\"rdp-ebb-cite_ref-ZhangResearch08_15-0\" class=\"reference\"><a href=\"#cite_note-ZhangResearch08-15\">[15]<\/a><\/sup> Document processing is shown in Fig. 2.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig2_Yu_JOnWireCommNet2019_2019.png\" class=\"image wiki-link\" data-key=\"15af619033e83be247501c6c5982d330\"><img alt=\"Fig2 Yu JOnWireCommNet2019 2019.png\" src=\"https:\/\/www.limswiki.org\/images\/a\/a7\/Fig2_Yu_JOnWireCommNet2019_2019.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 2<\/b> Ontology document processing<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>In the preprocessing procedure, each document in the document set is tokenized into its vocabulary, its words are analyzed, and numbers, hyphens, and punctuation are filtered out. 
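A minimal sketch of this kind of preprocessing pipeline follows. The stop-word list is illustrative, and the suffix stripping is a crude stand-in for a real stemmer, not the segmentation tooling the paper uses.

```python
import re

STOP_WORDS = {"the", "a", "of", "is", "and", "in", "to"}  # illustrative list

def preprocess(text):
    """Tokenize, filter digits/punctuation, drop stop words, and strip a few
    common suffixes (a crude stand-in for a real stemmer)."""
    # keep only alphabetic runs: drops numbers, hyphens, and punctuation
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [t for t in tokens if t not in STOP_WORDS]
    stems = []
    for t in tokens:
        for suffix in ("ing", "ed", "s"):
            if t.endswith(suffix) and len(t) > len(suffix) + 2:
                t = t[: -len(suffix)]
                break
        stems.append(t)
    return stems

print(preprocess("The retrieval of 1,000 indexed documents"))
# → ['retrieval', 'index', 'document']
```

The surviving stems would then serve as candidate index elements for the concept-mapping step described next.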
A stop word list is then used to remove function words, leaving useful words such as nouns and verbs.<sup id=\"rdp-ebb-cite_ref-KimAnEff15_16-0\" class=\"reference\"><a href=\"#cite_note-KimAnEff15-16\">[16]<\/a><\/sup> Extracting word stems by removing prefixes and suffixes improves the accuracy of retrieval. Finally, certain words are selected as index elements that express the conceptual content of the literature.\n<\/p><p>Semantically annotating a retrieved object by analyzing its characteristic vocabulary builds the mapping relation between words and concepts. First, characteristic words are extracted, and the weight of each word is calculated by counting word frequency to distinguish its importance. In this paper, the genetic algorithm is used to calculate the best weighting factors, which are in the end applied to the actual retrieval system.\n<\/p><p>The system automatically learns the weighting factors with a genetic algorithm, a heuristic method that simulates biological evolution and, through factor mutation, eliminates non-ideal factor sets and retains the optimal factor set. The algorithm tries to maximize the fitness function, as a form of parameter estimation, while searching a population for the fittest individuals; in our case, these are the parameters of the weighted terms used in retrieval. In Fig. 
3, the pseudo-code of the genetic algorithm for weighted term frequency is described.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig3_Yu_JOnWireCommNet2019_2019.png\" class=\"image wiki-link\" data-key=\"99672edf8469bf5c519844b8eb094c45\"><img alt=\"Fig3 Yu JOnWireCommNet2019 2019.png\" src=\"https:\/\/www.limswiki.org\/images\/1\/14\/Fig3_Yu_JOnWireCommNet2019_2019.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 3<\/b> Pseudo-code for selecting weight factors by genetic algorithm<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>This algorithm simulates the evolution process by gradually adjusting the weight factors and eliminating factor combinations with low fitness values. If the fitness result for one combination is lower than that of the others, the group is likely to be excluded from the next generation. To avoid local optima, we select many initial generations and eliminate unqualified groups generation by generation. In each iteration, each factor is perturbed within the interval [<i>w<sub>i<\/sub><\/i> \u2212\u20090.2, <i>w<sub>i<\/sub><\/i>\u2009+\u20090.4] to suppress negative factors. The fitness function <i>P<\/i>(<i>t<\/i>) determines how fit an individual with the new weighted combination (<i>w<\/i>'<sub>tit<\/sub>,\u2009<i>w<\/i>'<sub>key<\/sub>,\u2009<i>w<\/i>'<sub>abs<\/sub>) is. Factor sets with higher fitness under <i>P<\/i>(<i>t<\/i>) replace the traditional set; the similarity of each paper to the query words is then calculated, and the rank list is generated. 
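The generation loop just described can be sketched as follows. The `fitness` function here is a hypothetical stand-in for the paper's rank-list distance against the expert list (including the `TARGET` optimum), and only the mutation interval [w − 0.2, w + 0.4] is taken from the text.

```python
import random

random.seed(0)

TARGET = (3.0, 0.6, 2.0)  # hypothetical "ideal" (w_tit, w_key, w_abs)

def fitness(w):
    """Stand-in fitness: closer to the assumed optimum scores higher.
    In the paper this is derived from rank-list distance to an expert list."""
    return -sum((a - b) ** 2 for a, b in zip(w, TARGET))

def evolve(pop_size=20, generations=100):
    # random initial population of weight triples
    pop = [tuple(random.uniform(0, 5) for _ in range(3)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # selection (elitist)
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randint(1, 2)
            child = a[:cut] + b[cut:]             # one-point crossover
            # mutation: perturb each factor within [w - 0.2, w + 0.4]
            child = tuple(w + random.uniform(-0.2, 0.4) for w in child)
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
```

Because survivors carry over unmutated, the best combination never degrades between generations, which mirrors the paper's gradual elimination of low-fitness factor sets.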
The penalty function <i>f<\/i> is used to get the distance of the expert list.\n<\/p><p>Then, for each semantic meaning of ontology term, whether it exists in the extracting characteristic vocabulary is checked. If the semantic exists, the document and weight of the semantic term is calculated to manifest the text with semantic information.\n<\/p><p>After document feature extraction, a document index based on the concept to reflect the internal relation between text index terms is established, and ambiguity during annotation is excluded. An index based on the concept consists of feature words with their relation given by semantic parsing. Feature words connect through ontology instance and documents. The structure of the ontology concept index is shown in Fig. 4.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig4_Yu_JOnWireCommNet2019_2019.png\" class=\"image wiki-link\" data-key=\"3e291af209eb69d2d587ba25d80b7a76\"><img alt=\"Fig4 Yu JOnWireCommNet2019 2019.png\" src=\"https:\/\/www.limswiki.org\/images\/c\/c8\/Fig4_Yu_JOnWireCommNet2019_2019.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 4<\/b> Index structure based on the ontology concept<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<h3><span class=\"mw-headline\" id=\"Ontology_document_retrieval\">Ontology document retrieval<\/span><\/h3>\n<p>The procedure of document retrieval is listed below:\n<\/p>\n<ol><li>The user inputs search words or phrases in the search interface, then the system removes function words and reserves the nouns and verbs. Term extraction from words is implemented to get semantic conceptual words and phrases. 
The result is passed to the query transition module.<\/li>\n<li>The query transition module sends the results to the ontology server to search for a corresponding semantic concept, including hypernym, hyponym, synonym, and conceptual meanings.<sup id=\"rdp-ebb-cite_ref-MesserlyInfo97_17-0\" class=\"reference\"><a href=\"#cite_note-MesserlyInfo97-17\">[17]<\/a><\/sup> If the word is not found in the ontology database, it prompts the user to adjust the retrieval strategy.<\/li>\n<li>For the matching concept in domain ontology, the query transition module implements search, semantic judgment, and query extension to add semantic information to the query. The module submits the query to a retrieval agent for searching. For words with an uncertain semantic message, it executes a keyword matching method to search.<\/li>\n<li>Handled by the custom process module, the user interface then lists query results according to exact word, synonym, hypernym, and hyponym words.<\/li><\/ol>\n<p>Before the retrieval process, the system executes semantic analysis for the user query request. A keyword is extracted from stop words, and the determination of whether or not the keyword belongs to the ontology database is made. Through combining concepts in the ontology library, more semantic information is obtained by semantic reasoning. The pseudo-code of a query using the semantic analysis algorithm is shown in Fig. 
5.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig5_Yu_JOnWireCommNet2019_2019.png\" class=\"image wiki-link\" data-key=\"173915ef84410a10c09c51ca3a3a8cf8\"><img alt=\"Fig5 Yu JOnWireCommNet2019 2019.png\" src=\"https:\/\/www.limswiki.org\/images\/3\/30\/Fig5_Yu_JOnWireCommNet2019_2019.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 5<\/b> Pseudo-code for the query semantic analysis algorithm<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>After applying semantic analysis on the user request, semantic information is able to be used in the retrieval strategy. The pseudo-code of information retrieval algorithm is shown in Fig. 6.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig6_Yu_JOnWireCommNet2019_2019.png\" class=\"image wiki-link\" data-key=\"3471b47037b0ca27c6ae0755d3c127a2\"><img alt=\"Fig6 Yu JOnWireCommNet2019 2019.png\" src=\"https:\/\/www.limswiki.org\/images\/8\/8b\/Fig6_Yu_JOnWireCommNet2019_2019.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 
6<\/b> Pseudo-code for the information retrieval algorithm<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<h2><span class=\"mw-headline\" id=\"Experiment_and_results\">Experiment and results<\/span><\/h2>\n<h3><span class=\"mw-headline\" id=\"The_experimental_design_of_the_information_retrieval_model_based_on_ontology\">The experimental design of the information retrieval model based on ontology<\/span><\/h3>\n<p>In order to evaluate the performance of the information retrieval model based on ontology, it is necessary to use ontology tools for modeling, such as Prot\u00e9g\u00e9<sup id=\"rdp-ebb-cite_ref-Ke.C3.9FlerSemantic09_18-0\" class=\"reference\"><a href=\"#cite_note-Ke.C3.9FlerSemantic09-18\">[18]<\/a><\/sup> as an ontology modeling tool, ICTCLAS<sup id=\"rdp-ebb-cite_ref-CaoInfo06_19-0\" class=\"reference\"><a href=\"#cite_note-CaoInfo06-19\">[19]<\/a><\/sup> as a word segmentation tool, Jena as a semantic parsing tool, and Lucene as a semantic indexing tool.<sup id=\"rdp-ebb-cite_ref-CastellsSelf05_20-0\" class=\"reference\"><a href=\"#cite_note-CastellsSelf05-20\">[20]<\/a><\/sup>\n<\/p><p>The data set contains 1000 scientific papers, including papers from the IEEE digital library, which are used to extract the core concepts of the domain ontology; the final conceptualization system is then established. The literature is divided into 10 groups, each containing 100 papers related to a query subject or keywords (e.g., \"computer architecture\" and \"operating system\"). Therefore, 10 expert rank lists are available for retrieval.\n<\/p><p>The evaluation criterion considers the similarity of each paper to every query word. For example, the sorting-error distance for papers mistakenly ranked near the top is higher than for those ranked near the bottom. 
The formula below is used to collect the distance within rank list <i>R<\/i> and <i>R<\/i>':\n<\/p><p><span><span class=\"mwe-math-mathml-inline mwe-math-mathml-a11y\" style=\"display: none;\"><\/span><meta class=\"mwe-math-fallback-image-inline\" aria-hidden=\"true\" style=\"background-image: url('https:\/\/en.wikipedia.org\/api\/rest_v1\/media\/math\/render\/svg\/d9974178576e21b59dc0b3e130734059cb2fc623'); background-repeat: no-repeat; background-size: 100% 100%; vertical-align: -8.505ex; width:40.017ex; height:15.509ex;\" \/><\/span>\n<\/p><p>Here, <i>n<\/i> represents the paper numbers in the rank list. The dis(<i>i<\/i>) represents the position distance for paper <i>i<\/i> in the rank list and expert rank list. <i>P<\/i>(<i>t<\/i>) represents the distance between the two rank lists of the denominator specification.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Analysis_of_experimental_results\">Analysis of experimental results<\/span><\/h3>\n<p>The genetic algorithm with simulated annealing method is compared in relation to iteration numbers and average distance of the rank list. The result is shown in Fig. 7. \n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig7_Yu_JOnWireCommNet2019_2019.png\" class=\"image wiki-link\" data-key=\"c32fbc126f45264fa8def4d5e8c38760\"><img alt=\"Fig7 Yu JOnWireCommNet2019 2019.png\" src=\"https:\/\/www.limswiki.org\/images\/b\/b3\/Fig7_Yu_JOnWireCommNet2019_2019.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 
7<\/b> Comparison of the simulated annealing and genetic algorithm in average distance and iteration times<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>The <i>X<\/i>-axis gives the number of iterations for the two algorithms, and the <i>Y<\/i>-axis gives the average distance calculated by the preceding formula, showing how far the generated ranking list differs from the expert list. After 200 iterations, the average distance is close to the global optimum. The algorithm deduces the optimized weight combination <i>w<sub>tit<\/sub><\/i>\u2009=\u20093, <i>w<sub>abs<\/sub><\/i>\u2009=\u20092, <i>w<sub>key<\/sub><\/i>\u2009=\u20090.6.\n<\/p><p>Different similarity threshold values \u03b6 are taken, where a threshold \u03b6 admits documents with sim(<i>S<sub>q<\/sub><\/i>,\u2009<i>S<sub>j<\/sub><\/i>)\u2009\u2265\u2009\u03b6. Every experiment counts the retrieved document set |<i>A<\/i>|, the ontology-relevant documents |<i>B<\/i>|, and the user-query-relevant documents in the retrieved set |<i>A<\/i>\u2009\u2229\u2009<i>B<\/i>| to calculate the precision and recall rate. 
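From the counts |A|, |B|, and |A ∩ B|, precision and recall follow directly; a minimal sketch (the example sets are illustrative):

```python
def precision_recall(retrieved, relevant):
    """Precision = |A ∩ B| / |A| and recall = |A ∩ B| / |B|,
    where A is the retrieved set and B the relevant set."""
    hit = len(retrieved & relevant)
    precision = hit / len(retrieved) if retrieved else 0.0
    recall = hit / len(relevant) if relevant else 0.0
    return precision, recall

A = {1, 2, 3, 4}       # retrieved documents (|A| = 4)
B = {2, 3, 4, 5, 6}    # documents actually relevant to the query (|B| = 5)
p, r = precision_recall(A, B)
print(p, r)  # → 0.75 0.6
```

Raising the threshold ζ shrinks A, which tends to raise precision at the expense of recall, consistent with the trend in Table 1.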
The result is shown in Table 1.\n<\/p>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table class=\"wikitable\" border=\"1\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\" colspan=\"7\"><b>Table 1.<\/b> Precision and recall rate of ontology retrieval\n<\/td><\/tr>\n\n<tr>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Threshold\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\" colspan=\"2\">\u03b6\u2009=\u20090.5\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\" colspan=\"2\">\u03b6\u2009=\u20090.55\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\" colspan=\"2\">\u03b6\u2009=\u20090.6\n<\/th><\/tr>\n<tr>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Group num.\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Precision\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Recall\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Precision\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Recall\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Precision\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Recall\n<\/th><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">84.50%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">83.36%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">100.00%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; 
padding-right:10px;\">82.45%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">100.00%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">81.85%\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">2\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">38.92%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">100.00%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">93.12%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">100.00%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">100.00%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">51.00%\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">3\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">74.35%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">100.00%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">94.65%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">94.43%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">99.12%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">94.65%\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">4\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">83.23%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">100.00%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">93.68%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; 
100.00%">
padding-right:10px;\">100.00%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">96.34%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">45.74%\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">5\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">51.36%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">100.00%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">95.44%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">100.00%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">100.00%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">100.00%\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Average\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">66.47%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">96.67%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">95.38%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">95.38%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">99.09%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">74.65%\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>The precision rate improves as the threshold increases, reaching more than 99% when \u03b6\u2009=\u20090.6. However, the recall rate only reaches about 74%, which means the query results lose critical information.\n<\/p><p>When \u03b6\u2009=\u20090.5, the recall rate stays high while the precision rate remains low. 
This is because the system retrieves all documents whose ontology relates to the query. Setting \u03b6\u2009=\u20090.55 balances the precision rate and recall rate.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Conclusion\">Conclusion<\/span><\/h2>\n<p>In order to better satisfy users\u2019 retrieval needs and optimize the performance of information retrieval, domain ontology is introduced into the information retrieval system. In this paper, an information retrieval model based on domain ontology is proposed. The system includes document processing and ontology-based document retrieval, built on the ontology server, information database, query transition, and retrieval agent modules. We present a genetic algorithm to calculate the optimal combination of word-frequency weight factors. Based on the evaluation criterion, we applied the system to query documents and compared the results with expert lists. In the end, the genetic algorithm achieves a shorter distance than simulated annealing, and the ontology retrieval model exhibits better precision and recall rates in capturing users\u2019 requirements.\n<\/p><p>In the future, we wish to apply an automatic or semi-automatic method, such as data mining, to build the ontology database, reducing the difficulty of ontology establishment. 
We may also model personalized query preferences and return retrieval results tailored to different users\u2019 query demands.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Abbreviations\">Abbreviations<\/span><\/h2>\n<p><b>IR<\/b>: Information retrieval\n<\/p><p><b>VSM<\/b>: Vector space model\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Declarations\">Declarations<\/span><\/h2>\n<h3><span class=\"mw-headline\" id=\"Acknowledgements\">Acknowledgements<\/span><\/h3>\n<p>This work is supported by the Science and Technology Research Project of the Department of Education of Jilin Province (Grant 201657).\n<\/p>\n<h4><span class=\"mw-headline\" id=\"Funding\">Funding<\/span><\/h4>\n<p>The Science and Technology Research Project of the Department of Education of Jilin Province (Grant 201657).\n<\/p>\n<h4><span class=\"mw-headline\" id=\"Availability_of_data_and_materials\">Availability of data and materials<\/span><\/h4>\n<p>The data are included in this published article.\n<\/p>\n<h4><span class=\"mw-headline\" id=\"Author.E2.80.99s_contributions\">Author\u2019s contributions<\/span><\/h4>\n<p>The manuscript was written through the contributions of the author. The author read and approved the final manuscript.\n<\/p>\n<h4><span class=\"mw-headline\" id=\"Author.E2.80.99s_information\">Author\u2019s information<\/span><\/h4>\n<p>Binbin Yu: Ph.D. candidate, College of Computer Science and Technology, Jilin University. Lecturer, College of Information Technology and Media, Beihua University. 
His research interests include network security.\n<\/p>\n<h4><span class=\"mw-headline\" id=\"Competing_interests\">Competing interests<\/span><\/h4>\n<p>The author declares that he has no competing interests.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"References\">References<\/span><\/h2>\n<div class=\"reflist references-column-width\" style=\"-moz-column-width: 30em; -webkit-column-width: 30em; column-width: 30em; list-style-type: decimal;\">\n<ol class=\"references\">\n<li id=\"cite_note-TangTheRes10-1\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-TangTheRes10_1-0\">1.0<\/a><\/sup> <sup><a href=\"#cite_ref-TangTheRes10_1-1\">1.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Tang, M.; Bian, Y.; Tao, F. (2010). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.cnki.com.cn\/Article_en\/CJFDTOTAL-QBZZ201005036.htm\" data-key=\"76f31c5346b0c75b2144691168d9b3b1\">\"The Research of Document Retrieval System Based on the Semantic Vector Space Model\"<\/a>. <i>Journal of Intelligence<\/i> <b>5<\/b> (29): 167\u201377<span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/en.cnki.com.cn\/Article_en\/CJFDTOTAL-QBZZ201005036.htm\" data-key=\"76f31c5346b0c75b2144691168d9b3b1\">http:\/\/en.cnki.com.cn\/Article_en\/CJFDTOTAL-QBZZ201005036.htm<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=The+Research+of+Document+Retrieval+System+Based+on+the+Semantic+Vector+Space+Model&rft.jtitle=Journal+of+Intelligence&rft.aulast=Tang%2C+M.%3B+Bian%2C+Y.%3B+Tao%2C+F.&rft.au=Tang%2C+M.%3B+Bian%2C+Y.%3B+Tao%2C+F.&rft.date=2010&rft.volume=5&rft.issue=29&rft.pages=167%E2%80%9377&rft_id=http%3A%2F%2Fen.cnki.com.cn%2FArticle_en%2FCJFDTOTAL-QBZZ201005036.htm&rfr_id=info:sid\/en.wikipedia.org:Journal:Research_on_information_retrieval_model_based_on_ontology\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MaAConn16-2\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-MaAConn16_2-0\">2.0<\/a><\/sup> <sup><a href=\"#cite_ref-MaAConn16_2-1\">2.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Ma, C.; Liang, W.; Zheng, M. et al. (2016). \"A Connectivity-Aware Approximation Algorithm for Relay Node Placement in Wireless Sensor Networks\". <i>IEEE Sensors Journal<\/i> <b>16<\/b> (2): 515-528. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FJSEN.2015.2456931\" data-key=\"b807ec92d509e8c1ed2c3fe724d81b35\">10.1109\/JSEN.2015.2456931<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+Connectivity-Aware+Approximation+Algorithm+for+Relay+Node+Placement+in+Wireless+Sensor+Networks&rft.jtitle=IEEE+Sensors+Journal&rft.aulast=Ma%2C+C.%3B+Liang%2C+W.%3B+Zheng%2C+M.+et+al.&rft.au=Ma%2C+C.%3B+Liang%2C+W.%3B+Zheng%2C+M.+et+al.&rft.date=2016&rft.volume=16&rft.issue=2&rft.pages=515-528&rft_id=info:doi\/10.1109%2FJSEN.2015.2456931&rfr_id=info:sid\/en.wikipedia.org:Journal:Research_on_information_retrieval_model_based_on_ontology\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-YangScient14-3\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-YangScient14_3-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Yang, X.Q.; Yang, D.; Yuan, M. (2014). \"Scientific Literature Retrieval Model Based on Weighted Term Frequency\". <i>Proceedings of the 2014 Tenth International Conference on Intelligent Information Hiding and Multimedia Signal Processing<\/i>: 427\u2013430. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FIIH-MSP.2014.113\" data-key=\"281ce8cc7cc354cbb3c8e004c5c7607e\">10.1109\/IIH-MSP.2014.113<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Scientific+Literature+Retrieval+Model+Based+on+Weighted+Term+Frequency&rft.jtitle=Proceedings+of+the+2014+Tenth+International+Conference+on+Intelligent+Information+Hiding+and+Multimedia+Signal+Processing&rft.aulast=Yang%2C+X.Q.%3B+Yang%2C+D.%3B+Yuan%2C+M.&rft.au=Yang%2C+X.Q.%3B+Yang%2C+D.%3B+Yuan%2C+M.&rft.date=2014&rft.pages=427%E2%80%93430&rft_id=info:doi\/10.1109%2FIIH-MSP.2014.113&rfr_id=info:sid\/en.wikipedia.org:Journal:Research_on_information_retrieval_model_based_on_ontology\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-XuDist16-4\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-XuDist16_4-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Xu, M.; Yang, Q.; Kwak, K.S. (2016). \"Distributed Topology Control With Lifetime Extension Based on Non-Cooperative Game for Wireless Sensor Networks\". <i>IEEE Sensors Journal<\/i> <b>16<\/b> (9): 3332-3342. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FJSEN.2016.2527056\" data-key=\"3abe0b4af08f5e3fc18dfdadf49e0a63\">10.1109\/JSEN.2016.2527056<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Distributed+Topology+Control+With+Lifetime+Extension+Based+on+Non-Cooperative+Game+for+Wireless+Sensor+Networks&rft.jtitle=IEEE+Sensors+Journal&rft.aulast=Xu%2C+M.%3B+Yang%2C+Q.%3B+Kwak%2C+K.S.&rft.au=Xu%2C+M.%3B+Yang%2C+Q.%3B+Kwak%2C+K.S.&rft.date=2016&rft.volume=16&rft.issue=9&rft.pages=3332-3342&rft_id=info:doi\/10.1109%2FJSEN.2016.2527056&rfr_id=info:sid\/en.wikipedia.org:Journal:Research_on_information_retrieval_model_based_on_ontology\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-YangOnto15-5\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-YangOnto15_5-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Yang, Y.; Du, J.P.; Ping, Y. (2015). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/mathscinet.ams.org\/mathscinet-getitem?mr=3408856\" data-key=\"74f31de4ff64362958b8c96e345a996e\">\"Ontology-based intelligent information retrieval system\"<\/a>. <i>Journal of Software<\/i> <b>26<\/b> (7): 1675\u201387<span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/mathscinet.ams.org\/mathscinet-getitem?mr=3408856\" data-key=\"74f31de4ff64362958b8c96e345a996e\">https:\/\/mathscinet.ams.org\/mathscinet-getitem?mr=3408856<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Ontology-based+intelligent+information+retrieval+system&rft.jtitle=Journal+of+Software&rft.aulast=Yang%2C+Y.%3B+Du%2C+J.P.%3B+Ping%2C+Y.&rft.au=Yang%2C+Y.%3B+Du%2C+J.P.%3B+Ping%2C+Y.&rft.date=2015&rft.volume=26&rft.issue=7&rft.pages=1675%E2%80%9387&rft_id=https%3A%2F%2Fmathscinet.ams.org%2Fmathscinet-getitem%3Fmr%3D3408856&rfr_id=info:sid\/en.wikipedia.org:Journal:Research_on_information_retrieval_model_based_on_ontology\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-LuImprov14-6\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-LuImprov14_6-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Lu, T.; Liang, M. (2014). \"Improvement of Text Feature Extraction with Genetic Algorithm\". <i>New Technology of Library and Information Service<\/i> <b>30<\/b> (4): 48\u201357. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.11925%2Finfotech.1003-3513.2014.04.08\" data-key=\"7bdd33695d836b86062d0007c64b2b3d\">10.11925\/infotech.1003-3513.2014.04.08<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Improvement+of+Text+Feature+Extraction+with+Genetic+Algorithm&rft.jtitle=New+Technology+of+Library+and+Information+Service&rft.aulast=Lu%2C+T.%3B+Liang%2C+M.&rft.au=Lu%2C+T.%3B+Liang%2C+M.&rft.date=2014&rft.volume=30&rft.issue=4&rft.pages=48%E2%80%9357&rft_id=info:doi\/10.11925%2Finfotech.1003-3513.2014.04.08&rfr_id=info:sid\/en.wikipedia.org:Journal:Research_on_information_retrieval_model_based_on_ontology\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ValletAnOnt05-7\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ValletAnOnt05_7-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Vallet, D.; Fern\u00e1ndez, M.; Castells, P. (2005). \"An Ontology-Based Information Retrieval Model\". <i>Proceedings from ESWC 2005, The Semantic Web: Research and Applications<\/i>: 455\u201370. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1007%2F11431053_31\" data-key=\"b8b62d925f09ed22136f7be276fa0faa\">10.1007\/11431053_31<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=An+Ontology-Based+Information+Retrieval+Model&rft.jtitle=Proceedings+from+ESWC+2005%2C+The+Semantic+Web%3A+Research+and+Applications&rft.aulast=Vallet%2C+D.%3B+Fern%C3%A1ndez%2C+M.%3B+Castells%2C+P.&rft.au=Vallet%2C+D.%3B+Fern%C3%A1ndez%2C+M.%3B+Castells%2C+P.&rft.date=2005&rft.pages=455%E2%80%9370&rft_id=info:doi\/10.1007%2F11431053_31&rfr_id=info:sid\/en.wikipedia.org:Journal:Research_on_information_retrieval_model_based_on_ontology\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-8\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-8\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Manning, C.D.; Raghavan, P.; Sch\u00fctze, H. (2008). <i>Introduction to Information Retrieval<\/i>. Cambridge University Press. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1017%2FCBO9780511809071\" data-key=\"65da3cc87853d0ca6749762a108cb88f\">10.1017\/CBO9780511809071<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" data-key=\"f64947ba21e884434bd70e8d9e60bae6\">ISBN<\/a> 9780511809071.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Introduction+to+Information+Retrieval&rft.aulast=Manning%2C+C.D.%3B+Raghavan%2C+P.%3B+Sch%C3%BCtze%2C+H.&rft.au=Manning%2C+C.D.%3B+Raghavan%2C+P.%3B+Sch%C3%BCtze%2C+H.&rft.date=2008&rft.pub=Cambridge+University+Press&rft_id=info:doi\/10.1017%2FCBO9780511809071&rft.isbn=9780511809071&rfr_id=info:sid\/en.wikipedia.org:Journal:Research_on_information_retrieval_model_based_on_ontology\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-JonesAProb00-9\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-JonesAProb00_9-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Jones, K.S.; Walker, S.; Robertson, S.E. (2000). \"A probabilistic model of information retrieval: Development and comparative experiments: Part 1\". <i>Information Processing & Management<\/i>: 779\u2013808. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1016%2FS0306-4573%2800%2900015-7\" data-key=\"ef66d2a8a6ddf04b45d84d9aecb55415\">10.1016\/S0306-4573(00)00015-7<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+probabilistic+model+of+information+retrieval%3A+Development+and+comparative+experiments%3A+Part+1&rft.jtitle=Information+Processing+%26+Management&rft.aulast=Jones%2C+K.S.%3B+Walker%2C+S.%3B+Robertson%2C+S.E.&rft.au=Jones%2C+K.S.%3B+Walker%2C+S.%3B+Robertson%2C+S.E.&rft.date=2000&rft.pages=779%E2%80%93808&rft_id=info:doi\/10.1016%2FS0306-4573%2800%2900015-7&rfr_id=info:sid\/en.wikipedia.org:Journal:Research_on_information_retrieval_model_based_on_ontology\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-WongGener85-10\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-WongGener85_10-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Wong, S.K.M.; Ziarko, W.; Wong, P.C.N. (1985). \"Generalized vector spaces model in information retrieval\". <i>Proceedings of the 8th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval<\/i>: 18\u201325. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1145%2F253495.253506\" data-key=\"e04d6c6f427f75879f6831e646aad648\">10.1145\/253495.253506<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Generalized+vector+spaces+model+in+information+retrieval&rft.jtitle=Proceedings+of+the+8th+Annual+International+ACM+SIGIR+Conference+on+Research+and+Development+in+Information+Retrieval&rft.aulast=Wong%2C+S.K.M.%3B+Ziarko%2C+W.%3B+Wong%2C+P.C.N.&rft.au=Wong%2C+S.K.M.%3B+Ziarko%2C+W.%3B+Wong%2C+P.C.N.&rft.date=1985&rft.pages=18%E2%80%9325&rft_id=info:doi\/10.1145%2F253495.253506&rfr_id=info:sid\/en.wikipedia.org:Journal:Research_on_information_retrieval_model_based_on_ontology\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-Baeza-YatesModern99-11\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-Baeza-YatesModern99_11-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Baeza-Yates, R.; Ribeiro-Neto, B. (1999). <i>Modern Information Retrieval<\/i>. Addison Wesley. pp. 544. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" data-key=\"f64947ba21e884434bd70e8d9e60bae6\">ISBN<\/a> 9780201398298.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Modern+Information+Retrieval&rft.aulast=Baeza-Yates%2C+R.%3B+Ribeiro-Neto%2C+B.&rft.au=Baeza-Yates%2C+R.%3B+Ribeiro-Neto%2C+B.&rft.date=1999&rft.pages=pp.%26nbsp%3B544&rft.pub=Addison+Wesley&rft.isbn=9780201398298&rfr_id=info:sid\/en.wikipedia.org:Journal:Research_on_information_retrieval_model_based_on_ontology\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-PremalathaText14-12\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-PremalathaText14_12-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Premalatha, R.; Srinivasan, S. (2014). \"Text processing in information retrieval system using vector space model\". <i>Proceedings from the 2014 International Conference on Information Communication and Embedded Systems<\/i>: 1\u20136. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FICICES.2014.7033837\" data-key=\"31e1200a6875b391c21434c2768b6218\">10.1109\/ICICES.2014.7033837<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Text+processing+in+information+retrieval+system+using+vector+space+model&rft.jtitle=Proceedings+from+the+2014+International+Conference+on+Information+Communication+and+Embedded+Systems&rft.aulast=Premalatha%2C+R.%3B+Srinivasan%2C+S.&rft.au=Premalatha%2C+R.%3B+Srinivasan%2C+S.&rft.date=2014&rft.pages=1%E2%80%936&rft_id=info:doi\/10.1109%2FICICES.2014.7033837&rfr_id=info:sid\/en.wikipedia.org:Journal:Research_on_information_retrieval_model_based_on_ontology\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-VoorheesTREC05-13\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-VoorheesTREC05_13-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Voorhees, E.M.; Harman, D.K., ed. (2005). <i>TREC: Experiment and Evaluation in Information Retrieval<\/i>. MIT Press. pp. 368. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" data-key=\"f64947ba21e884434bd70e8d9e60bae6\">ISBN<\/a> 9780262220736.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=TREC%3A+Experiment+and+Evaluation+in+Information+Retrieval&rft.date=2005&rft.pages=pp.%26nbsp%3B368&rft.pub=MIT+Press&rft.isbn=9780262220736&rfr_id=info:sid\/en.wikipedia.org:Journal:Research_on_information_retrieval_model_based_on_ontology\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-PereiraContext05-14\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-PereiraContext05_14-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Pereira, R.A.M.; Molinari, A.; Pasi, G. (2005). \"Contextual weighted representations and indexing models for the retrieval of HTML documents\". <i>Soft Computing<\/i> <b>9<\/b> (7): 481-92. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1007%2Fs00500-004-0361-z\" data-key=\"a09850596fdfc0a81bc4f57778f368f6\">10.1007\/s00500-004-0361-z<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Contextual+weighted+representations+and+indexing+models+for+the+retrieval+of+HTML+documents&rft.jtitle=Soft+Computing&rft.aulast=Pereira%2C+R.A.M.%3B+Molinari%2C+A.%3B+Pasi%2C+G.&rft.au=Pereira%2C+R.A.M.%3B+Molinari%2C+A.%3B+Pasi%2C+G.&rft.date=2005&rft.volume=9&rft.issue=7&rft.pages=481-92&rft_id=info:doi\/10.1007%2Fs00500-004-0361-z&rfr_id=info:sid\/en.wikipedia.org:Journal:Research_on_information_retrieval_model_based_on_ontology\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ZhangResearch08-15\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ZhangResearch08_15-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Zhang, K.; Nan, K.; Ma, Y. (2008). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.oriprobe.com\/journals\/jsjyyyj\/2008_8.html\" data-key=\"4dad8c5fe2acedb9164ccb027e19a1f4\">\"Research on ontology-based information retrieval system models\"<\/a>. <i>Application Research of Computers<\/i> <b>8<\/b> (25): 2241-49<span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.oriprobe.com\/journals\/jsjyyyj\/2008_8.html\" data-key=\"4dad8c5fe2acedb9164ccb027e19a1f4\">https:\/\/www.oriprobe.com\/journals\/jsjyyyj\/2008_8.html<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Research+on+ontology-based+information+retrieval+system+models&rft.jtitle=Application+Research+of+Computers&rft.aulast=Zhang%2C+K.%3B+Nan%2C+K.%3B+Ma%2C+Y.&rft.au=Zhang%2C+K.%3B+Nan%2C+K.%3B+Ma%2C+Y.&rft.date=2008&rft.volume=8&rft.issue=25&rft.pages=2241-49&rft_id=https%3A%2F%2Fwww.oriprobe.com%2Fjournals%2Fjsjyyyj%2F2008_8.html&rfr_id=info:sid\/en.wikipedia.org:Journal:Research_on_information_retrieval_model_based_on_ontology\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-KimAnEff15-16\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-KimAnEff15_16-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Kim, H.; Han, S.-W. (2015). \"An Efficient Sensor Deployment Scheme for Large-Scale Wireless Sensor Networks\". <i>IEEE Communications Letters<\/i> <b>19<\/b> (1): 98\u2013101. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FLCOMM.2014.2372015\" data-key=\"0cb61809e8650e10f52eb96b8fb55a4b\">10.1109\/LCOMM.2014.2372015<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=An+Efficient+Sensor+Deployment+Scheme+for+Large-Scale+Wireless+Sensor+Networks&rft.jtitle=IEEE+Communications+Letters&rft.aulast=Kim%2C+H.%3B+Han%2C+S.-W.&rft.au=Kim%2C+H.%3B+Han%2C+S.-W.&rft.date=2015&rft.volume=19&rft.issue=1&rft.pages=98%E2%80%93101&rft_id=info:doi\/10.1109%2FLCOMM.2014.2372015&rfr_id=info:sid\/en.wikipedia.org:Journal:Research_on_information_retrieval_model_based_on_ontology\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MesserlyInfo97-17\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-MesserlyInfo97_17-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Messerly, J.J.; Heidorn, G.E.; Richardson, S.D. et al. (07 March 1997). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/patents.google.com\/patent\/US6161084\" data-key=\"83ffc8d3200e77746b62c6122f95556d\">\"Information retrieval utilizing semantic representation of text by identifying hypernyms and indexing multiple tokenized semantic structures to a same passage of text\"<\/a>. <i>Google Patents<\/i><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/patents.google.com\/patent\/US6161084\" data-key=\"83ffc8d3200e77746b62c6122f95556d\">https:\/\/patents.google.com\/patent\/US6161084<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Information+retrieval+utilizing+semantic+representation+of+text+by+identifying+hypernyms+and+indexing+multiple+tokenized+semantic+structures+to+a+same+passage+of+text&rft.atitle=Google+Patents&rft.aulast=Messerly%2C+J.J.%3B+Heidorn%2C+G.E.%3B+Richardson%2C+S.D.+et+al.&rft.au=Messerly%2C+J.J.%3B+Heidorn%2C+G.E.%3B+Richardson%2C+S.D.+et+al.&rft.date=07+March+1997&rft_id=https%3A%2F%2Fpatents.google.com%2Fpatent%2FUS6161084&rfr_id=info:sid\/en.wikipedia.org:Journal:Research_on_information_retrieval_model_based_on_ontology\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-Ke.C3.9FlerSemantic09-18\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-Ke.C3.9FlerSemantic09_18-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Ke\u00dfler, C.; Raubal, M.; Wosniok, C. (2009). \"Semantic Rules for Context-Aware Geographical Information Retrieval\". <i>Proceedings from EuroSSC 2009 Smart Sensing and Context<\/i>: 77\u201392. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1007%2F978-3-642-04471-7_7\" data-key=\"630ccd1e3b65ecde125e6959027f65b0\">10.1007\/978-3-642-04471-7_7<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Semantic+Rules+for+Context-Aware+Geographical+Information+Retrieval&rft.jtitle=Proceedings+from+EuroSSC+2009+Smart+Sensing+and+Context&rft.aulast=Ke%C3%9Fler%2C+C.%3B+Raubal%2C+M.%3B+Wosniok%2C+C.&rft.au=Ke%C3%9Fler%2C+C.%3B+Raubal%2C+M.%3B+Wosniok%2C+C.&rft.date=2009&rft.pages=77%E2%80%9392&rft_id=info:doi\/10.1007%2F978-3-642-04471-7_7&rfr_id=info:sid\/en.wikipedia.org:Journal:Research_on_information_retrieval_model_based_on_ontology\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-CaoInfo06-19\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-CaoInfo06_19-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Cao, Y.-G.; Cao, Y.-Z.; Jin, M.-Z.; Liu, C. (2006). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.cnki.com.cn\/Article_en\/CJFDTOTAL-RJXB200603003.htm\" data-key=\"1926cd7d73b107b836aa38b303b734e5\">\"Information Retrieval Oriented Adaptive Chinese Word Segmentation System\"<\/a>. <i>Journal of Software<\/i> <b>3<\/b> (17)<span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/en.cnki.com.cn\/Article_en\/CJFDTOTAL-RJXB200603003.htm\" data-key=\"1926cd7d73b107b836aa38b303b734e5\">http:\/\/en.cnki.com.cn\/Article_en\/CJFDTOTAL-RJXB200603003.htm<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Information+Retrieval+Oriented+Adaptive+Chinese+Word+Segmentation+System&rft.jtitle=Journal+of+Software&rft.aulast=Cao%2C+Y.-G.%3B+Cao%2C+Y.-Z.%3B+Jin%2C+M.-Z.%3B+Liu%2C+C.&rft.au=Cao%2C+Y.-G.%3B+Cao%2C+Y.-Z.%3B+Jin%2C+M.-Z.%3B+Liu%2C+C.&rft.date=2006&rft.volume=3&rft.issue=17&rft_id=http%3A%2F%2Fen.cnki.com.cn%2FArticle_en%2FCJFDTOTAL-RJXB200603003.htm&rfr_id=info:sid\/en.wikipedia.org:Journal:Research_on_information_retrieval_model_based_on_ontology\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-CastellsSelf05-20\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-CastellsSelf05_20-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Castells, P.; Fern\u00e1ndez, M.; Vallet, D. et al. (2005). \"Self-tuning Personalized Information Retrieval in an Ontology-Based Framework\". <i>Proceedings from On the Move to Meaningful Internet Systems 2005<\/i>: 977\u2013986. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1007%2F11575863_119\" data-key=\"579af4cda12b0df381a77723657789a0\">10.1007\/11575863_119<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Self-tuning+Personalized+Information+Retrieval+in+an+Ontology-Based+Framework&rft.jtitle=Proceedings+from+On+the+Move+to+Meaningful+Internet+Systems+2005&rft.aulast=Castells%2C+P.%3B+Fern%C3%A1ndez%2C+M.%3B+Vallet%2C+D.+et+al.&rft.au=Castells%2C+P.%3B+Fern%C3%A1ndez%2C+M.%3B+Vallet%2C+D.+et+al.&rft.date=2005&rft.pages=977%E2%80%93986&rft_id=info:doi\/10.1007%2F11575863_119&rfr_id=info:sid\/en.wikipedia.org:Journal:Research_on_information_retrieval_model_based_on_ontology\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<\/ol><\/div>\n<h2><span class=\"mw-headline\" id=\"Notes\">Notes<\/span><\/h2>\n<p>This presentation is faithful to the original, with only a few minor changes to presentation. Grammar and punctuation was edited to American English, and in some cases additional context was added to text when necessary. 
In some cases important information was missing from the references, and that information was added.\n<\/p>\n<\/div><div class=\"printfooter\">Source: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:Research_on_information_retrieval_model_based_on_ontology\">https:\/\/www.limswiki.org\/index.php\/Journal:Research_on_information_retrieval_model_based_on_ontology<\/a><\/div>\n\t\t\t\t\t\t\t\t\t\t<!-- end content -->\n\t\t\t\t\t\t\t\t\t\t<div class=\"visualClear\"><\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t\t<!-- end of the left (by default at least) column -->\n\t\t<div 
class=\"visualClear\"><\/div>\n\t\t\t\t\t\n\t\t<\/div>\n\t\t\n\n<\/body>","15ab90bc3c6b03e3f0954255a3ab8dc7_images":["https:\/\/www.limswiki.org\/images\/e\/ec\/Fig1_Yu_JOnWireCommNet2019_2019.png","https:\/\/www.limswiki.org\/images\/a\/a7\/Fig2_Yu_JOnWireCommNet2019_2019.png","https:\/\/www.limswiki.org\/images\/1\/14\/Fig3_Yu_JOnWireCommNet2019_2019.png","https:\/\/www.limswiki.org\/images\/c\/c8\/Fig4_Yu_JOnWireCommNet2019_2019.png","https:\/\/www.limswiki.org\/images\/3\/30\/Fig5_Yu_JOnWireCommNet2019_2019.png","https:\/\/www.limswiki.org\/images\/8\/8b\/Fig6_Yu_JOnWireCommNet2019_2019.png","https:\/\/www.limswiki.org\/images\/b\/b3\/Fig7_Yu_JOnWireCommNet2019_2019.png"],"15ab90bc3c6b03e3f0954255a3ab8dc7_timestamp":1554145016,"6ee24d5f7bd1af8e24033922d437ffd0_type":"article","6ee24d5f7bd1af8e24033922d437ffd0_title":"Semantics for an integrative and immersive pipeline combining visualization and analysis of molecular data (Trellet et al. 2018)","6ee24d5f7bd1af8e24033922d437ffd0_url":"https:\/\/www.limswiki.org\/index.php\/Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data","6ee24d5f7bd1af8e24033922d437ffd0_plaintext":"\n\n\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\n\t\t\t\tJournal:Semantics for an integrative and immersive pipeline combining visualization and analysis of molecular data\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\tFrom LIMSWiki\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\tJump to: navigation, search\n\n\t\t\t\t\t\n\t\t\t\t\tFull article title\n \nSemantics for an integrative and immersive pipeline combining visualization and analysis of molecular dataJournal\n \nJournal of Integrative BioinformaticsAuthor(s)\n \nTrellet, Mikael; F\u00e9rey, Nicolas; Floty\u0144ski, Jakub; Baaden, Marc; Bourdot, PatrickAuthor affiliation(s)\n \nBijvoet Center for Biomolecular Research, Universit\u00e9 Paris Sud, Pozna\u0144 Univ. 
of Economics and Business, Laboratoire de Biochimie Th\u00e9oriquePrimary contact\n \nEmail: m dot e dot trellet at uu dot nlYear published\n \n2018Volume and issue\n \n15(2)Page(s)\n \n20180004DOI\n \n10.1515\/jib-2018-0004ISSN\n \n1613-4516Distribution license\n \nCreative Commons Attribution-NonCommercial-NoDerivatives 4.0 InternationalWebsite\n \nhttps:\/\/www.degruyter.com\/view\/j\/jib.2018.15.issue-2\/jib-2018-0004\/jib-2018-0004.xmlDownload\n \nhttps:\/\/www.degruyter.com\/downloadpdf\/j\/jib.2018.15.issue-2\/jib-2018-0004\/jib-2018-0004.xml (PDF)\n\nContents\n\n1 Abstract \n2 Introduction \n3 Related works \n\n3.1 Semantic modeling formalism and semantic web \n3.2 Ontologies in bioinformatics \n\n\n4 Using a semantic representation to efficiently store, query, and link heterogeneous structural biology data \n\n4.1 Knowledge formalism choice \n4.2 Ontology for modeling of structural biology concepts \n4.3 Storing molecular data linked by a structural biology ontology \n\n\n5 Using semantic queries to support direct interactions for a new generation of molecular visualization applications \n\n5.1 From vocal keywords to application command \n\n5.1.1 Keyword recognition \n5.1.2 Keyword classification \n5.1.3 Command creation \n5.1.4 Performances \n5.1.5 Limits and perspectives \n\n\n5.2 Synchronizing interactive selections between 2D and 3D workspaces \n5.3 Semi-automated analyses triggered by direct interactions \n5.4 Platform architecture \n\n\n6 Scenario and evaluation \n\n6.1 Scenario \n6.2 Evaluation of high-level task completion based on hierarchical task analysis \n\n\n7 Conclusion \n8 Acknowledgements \n\n8.1 Conflict of interest \n\n\n9 References \n10 Notes \n\n\n\nAbstract \nThe advances made in recent years in the field of structural biology significantly increased the throughput and complexity of data that scientists have to deal with. 
Combining and analyzing such heterogeneous amounts of data has become a major time sink in the daily tasks of scientists. However, only a few efforts have been made to offer scientists an alternative to the standard compartmentalized tools they use to explore their data, tools that impose a regular back and forth between them. We propose here an integrated pipeline especially designed for immersive environments, promoting direct interactions on semantically linked 2D and 3D heterogeneous data, displayed in a common working space. The creation of a semantic definition describing the content and the context of a molecular scene leads to the creation of an intelligent system where data are (1) combined through pre-existing or inferred links present in our hierarchical definition of the concepts, (2) enriched with suitable and adaptive analyses proposed to the user with respect to the current task, and (3) interactively presented in a unique working environment to be explored.\nKeywords: virtual reality, semantics for interaction, structural biology\n\nIntroduction \nRecent years have seen a profound change in the way structural biologists interact with their data. New techniques that try to capture the structure and dynamics of bio-molecules have reached an extraordinarily high throughput of structural data.[1][2] Scientists must try to combine and analyze data flows from different sources to draw their hypotheses and conclusions. However, despite this increasing complexity, they tend to rely mainly on compartmentalized tools that only visualize or analyze limited portions of their data. This situation leads to a constant back and forth between the different tools and their associated environments. Consequently, a significant amount of time is dedicated to transforming data to fit the heterogeneous input data types each tool accepts.\nThe need for platforms capable of handling this intricate data flow is therefore strong. 
In structural biology, the numerical simulation process is now able to deal with very large and heterogeneous molecular structures. These molecular assemblies may be composed of several million particles and consist of many different types of molecules, including a biologically realistic environment. This overall complexity raises the need to go beyond common visualization solutions and move towards integrated exploration systems where visualization and analysis can be merged.\nImmersive environments play an important role in this context, providing both a better comprehension of the three-dimensional structure of molecules, and offering new interaction techniques to reduce the number of data manipulations executed by the experts (see Figure 1). A few studies took advantage of recent developments in virtual reality to enhance some structural biology tasks. Visualization is the first and most obvious task that was improved through new adaptive stereoscopic screens and immersive environments, plunging experts into the very center of their molecules.[3][4][5][6][7] Structure manipulations during specific docking experiments have been improved thanks to the use of haptic devices and audio feedback to drive a simulation.[8] However, if 3D objects can rather easily be represented and manipulated in such environments, the integration of analytical values (energies, distance to reference, etc.)\u20142D by nature\u2014leads to a certain complexity and is not a solved problem yet. As a consequence, no specific development has been made to set up an immersive platform where the expert could manipulate data coming from different sources to accelerate and improve the development of new hypotheses.\n\r\n\n\n\n\n\n\n\n\n\n\n Figure 1. 
Immersive, augmented reality, and screen wall environments used for molecular visualization: (A) EVE platform, a multi-user CAVE-system composed of 4 screens (LIMSI-CNRS\/VENISE team, Orsay), (B) Microsoft Hololens and (C) screen wall of 8.3 m2 composed of 12 screens at full HD resolution with 120 Hz refresh rate in stereoscopy (IBPC-CNRS\/LBT, Paris).\n\n\n\nThis lack of development can also be partly explained by the significant differences between the data handled by the 3D visualization software packages and the analytical tools. On one side, 3D visualization solutions such as PyMol[9], VMD[10], and UnityMol[11] explore and manipulate 3D structure coordinates composing the molecular complex that will be displayed. The scene seen by the user is composed of 3D objects reporting the overall shape of a particular molecule and its environment at a particular state. This scene is static if we are interested in only one state of a given molecule, but is often dynamic when a whole simulated trajectory of conformational changes over time is considered. Analysis tools, on the other side, handle raw numbers, vectors, and matrices in various formats and dimensions, from various input sources depending on the analysis pipeline used to generate them. Their outputs are graphical representations of trends or comparisons between parameters or properties in 1 to N dimensions formatted in a way that experts can quickly understand and use such information to guide their hypotheses.\nSome of the aforementioned software do provide tools to gather analyses as static plots aside the 3D visualization space. Interactivity is limited and flexibility mainly depends on the user capability to create and tune scripts to improve the information displayed. 
We believe that a major improvement of tools available today would bring into play a scenario where the 3D visualization of a molecular event is coupled to monitoring the evolution of analytical properties, e.g., sub-elements such as distance variations and progression of simulation parameters, into a single working environment. The expert would be able to see any action performed in one space (either 3D visualization or analysis) with a coherent graphical impact on the second space to filter or highlight the parameter or sub-ensemble of objects targeted by the expert.\nWe have developed a pipeline that aims to bring within the same immersive environment the visualization and analysis of heterogeneous data coming from molecular simulations. This pipeline addresses the lack of integrated tools efficiently combining the stereoscopic visualization of 3D objects and the representation\/interaction with their associated physicochemical and geometric properties (both 2D and 3D) generated by standard analysis tools and that are either combined to the 3D objects (shape, colour, etc.) or displayed on a dedicated space integrated in the working environment (second mobile screen, 2D integration in the virtual scene, etc.).\nIn this pipeline, we systematically combine structural and analytical data by using a semantic definition of the content (scientific data) and the context (immersive environments and interfaces). Such a high-level definition can be translated into an ontology from which instances or individuals of ontological concepts can then be created from real data to build a database of linked data for a defined phenomenon. 
On top of the data collection, an extensive list of possible interactions and actions defined in the ontology and based on the provided data can be computed and presented to the user.\nThe creation of a semantic definition describing the content and the context of a molecular scene in immersion leads to the creation of an intelligent system where data and 3D molecular representations are (1) combined through pre-existing or inferred links present in our hierarchical definition of the concepts, (2) enriched with suitable and adaptive analyses proposed to the user with respect to the current task, and (3) manipulated by direct interaction, allowing the user to perform 3D visualization and exploration as well as analysis in a single immersive environment.\nOur method reduces the need for complex interactions by considering which actions the user can perform with the data they are currently manipulating and the means of interaction their immersive environment provides.\nWe will highlight our developments and the first outcomes of our work through three main sections: the first section attempts to provide a complete background on the usage of semantics in the fields of VR\/AR systems and structural biology. In the second section, we describe and justify our implementation choices and how we linked the different technologies highlighted in the previous section. 
Finally, in a third section, we will show several applications of our platform and its capabilities to address the issues raised previously.\n\nRelated works \nWe present here the state of the art in the two fields related to this paper: the semantic formalism chosen to represent the data and how semantic representations are applied in bioinformatics.\n\nSemantic modeling formalism and semantic web \nFrom classical logic to description logic, from which was derived the \"conceptual graph\" representation introduced by Sowa[12], many semantic formalisms have been used to embed knowledge into applications in order to query and reason about them.\nThe conceptual graph formalism represents concepts and properties as connected graphs and allows complex operations on them. However, it quickly reaches limitations in terms of performance and implementation flexibility. Classical logic is another well-known formalism, but it is not broadly used in biology and suffers from a lack of implementation tools and libraries. A semantic network limits itself to the representation of concepts and their relations through directed or undirected graphs. It lacks the possibility to reason over the concepts and their links, a capability that our intended platform needs. The different requirements of our platform, coupled with our aim to make it as generic as possible, led us to choose description logics as the formalism for knowledge representation, and more precisely the semantic web as the underlying standard for the creation of our ontology and the associated knowledge base.\nThe semantic web was created by the World Wide Web Consortium under the lead of Tim Berners-Lee, with the aim of sharing semantic data on the web.[13] It is broadly used by the biggest web companies to uniformly store and share data. It belongs to the family of description logics that use the notions of concepts, roles, and individuals. 
The concepts are represented by the sub-ensemble of elements in a specific universe, the roles are the links between the elements, and the individuals are the elements of the universe. Each layer of the semantic web (ontology, experimental data, querying process, etc.) has been associated with a language or a format.\nThe following four standards create the core of the semantic web and act as the layers evoked previously: the Resource Description Framework (RDF)[14], the Resource Description Framework Schema (RDFS)[15], the Web Ontology Language (OWL)[16], and SPARQL.[17] Whereas the first three standards enable semantic descriptions of data in the form of ontologies and knowledge bases, the last standard enables queries to ontologies and knowledge bases (see Figure 2).\n\n Figure 2. Web semantics and its different layers. This figure describes the main format classically used for each layer: RDF, RDFS, OWL, SPARQL, etc. Source: http:\/\/www.w3.org\/2001\/sw\/\n\nRDF is a data model that allows the creation of statements to describe resources. Each statement is a triple comprising a subject (the resource described by the statement), a predicate (a property of the subject), and an object (a literal value, or a resource identified by a URI, which describes the subject). An example of a triple is: <#Molecule, #has-charge, -1>\nRDFS and OWL are semantic web standards that extend the expressiveness of RDF by providing additional concepts. RDFS provides hierarchies of classes and properties as well as property domains and ranges. OWL, built upon RDF and RDFS, provides symmetry, transitivity, equivalence, and restrictions of properties as well as operations on sets of resources. In turn, SPARQL is a query language for ontologies and knowledge bases built using RDF, RDFS, and OWL. 
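To make the triple model concrete, here is a minimal sketch in Python of a triple store and the kind of wildcard pattern matching a SPARQL engine performs. The vocabulary follows the `<#Molecule, #has-charge, -1>` example above; the extra facts and the store itself are toy illustrations, not part of the platform described in this paper, and a real system would use an RDF engine rather than a Python list:

```python
# Toy illustration of RDF-style (subject, predicate, object) triples
# and pattern queries. A real system would use an RDF store and SPARQL.

def match(triples, s=None, p=None, o=None):
    """Return all triples matching a pattern; None acts as a wildcard,
    like a ?variable in a SPARQL basic graph pattern."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

store = [
    ("#Molecule", "#has-charge", -1),          # the triple from the text
    ("#Molecule", "#has-name", "acetate"),     # hypothetical extra facts
    ("#Atom",     "#is-part-of", "#Molecule"),
]

# Roughly "SELECT ?p ?o WHERE { #Molecule ?p ?o }" in SPARQL terms:
facts_about_molecule = match(store, s="#Molecule")
print(facts_about_molecule)
```

The subject/predicate/object positions correspond directly to the resource, property, and value described in the text; leaving a position unbound is what turns a stored fact into a queryable pattern.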
Conceptually, in terms of possible operations on data, SPARQL is similar to SQL, as it enables data selection, insertion, update, and removal.\nIn the semantic web, two types of statements are distinguished. Terminological statements (T-Box) specify the conceptualization: the classes and properties of resources[18], without describing any particular resources. Assertion statements (A-Box) specify the utilization: particular resources (also called individuals or objects), which are instances of classes and are described by properties with particular values assigned. For example, a T-Box specifies different classes of molecules (different chemical compounds) and properties that can be used to describe them (e.g., charge and the number of neutrons), while an A-Box specifies particular molecules (instances of the classes) with given charges. In this paper, an ontology is a T-Box, while a knowledge base is the union of a T-Box and an A-Box. Ontologies and knowledge bases constitute the foundation of the semantic web across diverse domains and applications. In particular, ontologies can specify schemes of molecular descriptions, while knowledge bases\u2014particular descriptions (instances of such schemes) with individual objects\u2014are used for analysis and visualization. Due to the use of standards encoded in XML or equivalent formats, ontologies and knowledge bases are interpretable by software while remaining intelligible to users. 
Moreover, since RDFS and OWL are built upon description logics, which are formal knowledge representation techniques, ontologies and knowledge bases can be subject to reasoning, a process of inferring implicit (tacit) properties of resources (properties not explicitly specified by the author) on the basis of their explicitly specified properties.\nFor instance, from the following triples explicitly specified by the content author:\n<my:is-composed-of> <my:is-a> <owl:TransitiveProperty>\n<my:Protein> <my:is-composed-of> <my:Amino-acid>\n<my:Amino-acid> <my:is-composed-of> <my:Atom>\nthe following statement can be inferred by software:\n<my:Protein> <my:is-composed-of> <my:Atom>\nHere, thanks to the definition of the property \u201cis-composed-of\u201d as transitive, we can infer that the atoms that compose amino acids also compose a protein, since amino acids compose proteins. The inferred statement does not need to be added to the ontology, since it is derived automatically. This significantly reduces the number of statements to store in the database and potentially allows for more complex inferences.\nOntologies in bioinformatics \nOn the application side, the use of ontologies to standardize knowledge in scientific fields underwent an important and spontaneous growth at the end of the 1990s.[19] Bioinformatics, tightly anchored in structural biology, has used ontologies for a long time. The most significant example is the fast-growing genomic field, in which it became impossible to handle the data flow without a proper and standardized organization of the data.[20] The Gene Ontology[21] tool gathers genomic data into a uniform format and a knowledge base. Currently, it is one of the most frequently cited ontologies in the literature. Rabattu et al.[22] propose an approach to spatio-temporal reasoning on semantic descriptions of an evolving human embryo. 
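The "is-composed-of" inference described earlier can be sketched as a transitive-closure computation over the explicitly stored pairs. This is only an illustration of the single inference pattern from the example, under the assumption that the property has been declared transitive; a real OWL reasoner handles far more than one property:

```python
# Sketch of the transitive inference from the "is-composed-of" example:
# given a property declared transitive, derive the implicit triples by
# computing the transitive closure of the explicitly stated pairs.

def transitive_closure(pairs):
    """Repeatedly add (a, c) whenever (a, b) and (b, c) are known."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new

explicit = {("my:Protein", "my:Amino-acid"),
            ("my:Amino-acid", "my:Atom")}

# The difference between the closure and the explicit pairs is exactly
# what a reasoner materializes without it being stored by the author.
inferred = transitive_closure(explicit) - explicit
print(inferred)
```

This is why the inferred statement never needs to be written into the database: it can be recomputed (or materialized on demand) from the explicit facts and the transitivity declaration.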
Several biological databases and organizations, such as UniProtKB and the Open Biomedical Ontologies[23], provide ways to access data or ontologies in RDF or OWL format to allow their use in expert tools or specific pipelines. One can also note the open-source project Bio2RDF[24], which aims to build and provide the largest network of \"Linked Data for the Life Sciences\" using semantic web approaches.\nOnly a few expert software packages based on ontologies have been developed for structural biology. Avogadro[25] and DIVE[26] appear as exceptions, implementing, in different ways, a semantic description of data that can be manipulated in these environments. Avogadro uses the Chemical Markup Language (CML)[27] as the format for describing data semantics, and it adds a semantic description layer on top of the data being described. However, the tool leverages neither ontologies nor other knowledge representation formalisms, and thus it does not permit reasoning on the described data.\nDIVE partially creates ontologies and datasets derived from the input data upon loading. Pre-formatted input in a row\/column representation is converted into an SQL-like structure where rows are individuals and columns are properties. This data representation conforms to a common data model that the software libraries use. Therefore, links between data values and concepts can be created, and different DIVE components for data presentation (analyses, 3D visualization, etc.), as well as links and relationships between dataset elements, can be queried. In addition, DIVE includes a powerful and generic ontology creator that depends directly on the type of the input data. However, reasoning on ontologies in DIVE is limited to inheritance between classes. Consequently, only a few ontological relationships are available: is-a, contains, is-part-of, and bound-by. There is no notion of cardinality or logical operators to define the concept classes. 
Then, it is not possible, for instance, to force the presence of a property, or to impose that only a fixed number of values are associated to a specific property (e.g., a molecule must have at least one atom, an Alanine side-chain has a minimum of three atoms and a maximum of four atoms, etc.). These limitations render the DIVE environment insufficient to solve the problem stated in this paper.\n\nUsing a semantic representation to efficiently store, query, and link heterogeneous structural biology data \nSeveral important choices have been made to integrate the different technologies required for the establishment of a platform that would allow a proper 3D immersion of users together with an accurate and intelligent way to interact with their data. Our platform heavily relies on the ontology\/knowledge base couple. The way to represent and access the data present in the databases is of a crucial importance, and this point led us to ask ourselves the question of the most appropriate formalism for the data representation.\n\nKnowledge formalism choice \nThe formalism of knowledge representation used in our approach must address the following three rules to properly fit our platform needs:\n\n Hierarchical data representation via concepts and properties\n Advanced reasoning possibility in order to extend the ontology or the dataset ruled by the ontology\n Efficient query time on the data to stay within interaction time\nWe mentioned previously that several formalisms exist to create ontologies and define databases. A quick comparison of these formalisms, complementary to their introduction in the previous section, can be found in Table 1.\n\n\n\n\n\n\n\nTable 1. 
Comparison of different knowledge representation formalisms with respect to key criteria\n\nFormalism | Domain description | Reasoning on knowledge | Big data management | Efficient | Implementation flexibility\nConceptual graphs | X | X | - | X | -\nSemantic networks | X | - | X | X | -\nClassical logics | X | X | X | X | -\nDescription logics | X | X | X | X | -\n\nOur first implementation of a semantic representation of knowledge in molecular biology was applied through conceptual graphs (CG) within Cogitant\u2019s software.[28] The use of CGs through the Cogitant API quickly proved to be incompatible with the constraints of the interactive context. This limitation had already been highlighted by the work of Yannick Dennemont[29] with the Prolog CG API, and it was confirmed by our own experience with the Cogitant library in C++. The need for high performance imposed by the interactive context has led us to description logic and the semantic web for the representation of knowledge and the efficient extraction of information within a massive fact base to support Visual Analytics functionalities in molecular biology.\nOntology for modeling of structural biology concepts \nAn OWL-based ontology was implemented as the core of the platform, thereby creating a broad description of the concepts an expert has to interact with during his\/her visualization and analysis activities. We previously mentioned that several bio-ontologies already exist. We extended one of them, a bio-ontology describing amino acids and their biophysical and geometrical properties, to define the molecular objects and principles manipulated in structural biology. Each component structuring molecular complexes and each associated property coming from various common bio-informatics tools have been systematically defined and added to this ontology. 
However, since needs may vary, we have designed this ontology so that it can easily be updated and enriched with new concepts. A tiny subpart of our ontology is illustrated in Figure 3. Our ontology has been designed around five categories, addressing five different parts of our platform:

- Biomolecular knowledge – Field-related concepts and objects in structural biology
- 3D structure representation – Concepts related to the representation and visualization of 3D molecular complexes
- 2D data representation – Concepts related to the representation of numerical analyses and their results
- 3D interactions – Concepts related to interactions in 3D environments
- 2D interactions – Concepts related to interactions in 2D environments

[Figure 3. A part of our structural biology ontology used in our application]

The separation of the categories does not imply the absence of relationships between them. For instance, the "Atom" concept belongs to the "Biomolecular knowledge" category but is directly linked to the "Sphere" concept from 3D structure representation. This whole network of connections permits reasoning on the ontology in order to support the advanced level of interactivity required by our platform.

Concepts and properties in the 3D structure representation and 2D data representation categories gather the graphical elements that allow the "Biomolecular knowledge" category to be represented. Shapes, colors, but also graph types are notions defined in these two categories. It is worth noting that analytical concepts are defined by graphical or abstract elements that play a role in the creation and visualization of an analytical result.
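A cross-category link such as the Atom–Sphere example above could be written, schematically, in OWL/Turtle form. The prefix, property name, and class names below are illustrative stand-ins, not the published ontology:

```turtle
@prefix my:   <http://example.org/structbio#> .               # illustrative prefix
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

my:Atom   rdfs:subClassOf my:Biomolecular_knowledge ;
          my:hasRepresentation my:Sphere .                    # link across categories
my:Sphere rdfs:subClassOf my:3D_structure_representation .
```

A reasoner traversing such links is what allows a selection expressed in biological terms ("Atom") to be resolved into graphical terms ("Sphere") and vice versa.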
However, we voluntarily chose not to define the various calculations and analyses related to molecular simulation data, because of their high complexity and their heterogeneous nature, which varies significantly across the range of available specialized tools. This choice does not imply that the results of such analyses will not be used within the platform, merely that it is not relevant to include their definitions in the ontology. The values of their results are nevertheless defined in the ontology as properties of the individuals they involve.

In addition to the biomolecular concepts and representations cited previously, we also defined every concept around the interaction between users and the data they directly or indirectly manipulate. These interactions include the commands offered by most common visualization software packages and analysis tools.

Our full ontology is publicly available online.

Storing molecular data linked by a structural biology ontology

Once we had set up our ontology, it was possible to feed the database by adding biological information gathered by the expert. The new information has to fit the vocabulary and classification defined by the rules present in the ontology in order to be adequately stored in the database. This combination of ontology and knowledge base forms the RDF database (as illustrated later in this article in Figure 6).

The description of a molecular system is constructed from the analysis of any biological information that can be described by a character string or a value and that corresponds to a concept or property identified in the ontology. Each piece of information is exhaustively gathered in the RDF database as triples. Within the scope of our study, we focused on numerical molecular simulations. These simulations output time series of static snapshots of the molecular system at a regular time step.
The Hamiltonian of the simulated model drives the system towards specific states that experts try to decipher in order to understand the underlying molecular mechanisms. The whole simulation creates a trajectory where each state, at a precise time, is associated with a snapshot. Our ontology defines a snapshot by the "model" concept. A model gathers all the atom coordinates of the molecular system at a defined time step. In order to distinguish the different components of a system, these components are identified by "chain," another concept of our ontology. Each chain in the system is composed of a sequence of "residues" (known as amino acids in proteins). The inference rules present in the ontology save us from explicitly specifying all the links between the different hierarchical components of a specific model. As a result, a residue that belongs to a specific chain is automatically associated with the corresponding model in which the chain appears. Similarly, groups of atoms, the smallest entities of a molecular structure at our scale, constitute residues and are thus directly linked to chains and models.

Every geometrical property (position, angle, distance, etc.), physicochemical property (solvent accessibility, partial charge, bond, etc.), or analytical property (interaction energy, RMSD, temperature, etc.) is then integrated into the database and associated with individuals created from 3D structures (Model/Chain/Residue/Atom) for each step of the simulation. As a reminder, any individual is an instance of concepts defined in the ontology.
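The inference just described, where a residue linked only to its chain is automatically associated with the chain's model, can be sketched as a transitive closure over direct containment links. This is a toy, self-contained Python stand-in for the ontology's inference rules; all identifiers are illustrative, not the platform's actual API:

```python
# Toy illustration of hierarchical inference: only direct containment links
# are asserted, and membership in higher levels (Residue -> Chain -> Model)
# is derived by transitive closure, as the ontology's rules do.
DIRECT_PART_OF = [
    ("ATOM_1", "RES_1"), ("ATOM_2", "RES_1"),    # atoms -> residue
    ("RES_1", "CHAIN_A"), ("RES_2", "CHAIN_A"),  # residues -> chain
    ("CHAIN_A", "MODEL_1"),                      # chain -> model
]

def ancestors(individual, links=DIRECT_PART_OF):
    """Return every container the individual belongs to, directly or not."""
    found = set()
    frontier = {individual}
    while frontier:
        parents = {p for child, p in links if child in frontier}
        frontier = parents - found
        found |= parents
    return found

# RES_1 was never explicitly linked to MODEL_1, yet the link is inferred:
print(ancestors("RES_1"))   # == {"CHAIN_A", "MODEL_1"}
print(ancestors("ATOM_1"))  # == {"RES_1", "CHAIN_A", "MODEL_1"}
```

In the real platform this derivation is performed by the RDF reasoner rather than application code, which is precisely what keeps the asserted triples compact.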
Individuals and their properties form the population of the molecular database.

Using semantic queries to support direct interactions for a new generation of molecular visualization applications

Once all the data has been integrated into the RDF database, it is necessary to set up an interrogation system able to retrieve the data for visualization and processing following interaction events in the working space. Our implementation of the query system relies mainly on SPARQL, as introduced before, and provides several ways to address the different needs of our platform.

From vocal keywords to application commands

The richness and flexibility of SPARQL queries allowed us to design a keyword-to-command interpretation engine that transforms a list of keywords into a comprehensive application command triggering an action in the working space.

One of the most widely used interactive techniques in immersive environments is the vocal command. Based on a voice recognition process, it consists of translating a sentence or a group of words spoken by the user into an application command. Vocal commands have the strong advantage that they can be combined with gestures to express complex multimodal commands.

Most of the actions identified in our platform involve a structural group designated by the expert. These structural groups can be characterized by identifiers having a biological meaning (for example, residue IDs are, by convention, numbered from one extremity of the chain to the other), by unique identifiers in the RDF database, or via their properties.
Interpreting commands vocalized by the expert in natural language, using specific field-related vocabulary, requires a representation carrying the complexity of the knowledge and linking the objects targeted by the user to the virtual objects involved in the interaction.

For this purpose, we set up a process that takes a vocal command from the user as input and translates it into an application command for the operating system. This procedure can be divided into three main parts:

- Recognition of keywords from a vocal command
- Keyword classification into a decomposed command structure
- Creation of the final, operational command

Our conceptualization effort and the use of the ontology mainly focused on the second part. Parts one and three are more implementation oriented and will not be described in depth.

Keyword recognition

We use the keyword-spotting capability of Sphinx[30], a voice recognition toolkit, to recognize keywords. Based on a dictionary created from the ontology's list of concepts, it detects any word spoken by the user that matches a word present in the dictionary.

Keyword classification

Each keyword recognized in the previous step is assigned to a category. This classification is based on the structure of our ontology, which identifies five categories of words that can be found in a vocal command, semantically modeled as:

- Action
- Component
- Identifier
- Property
- Representation

This classification is achieved through successive SPARQL queries to the ontology. The Action, Component, Property, and Representation categories have their own concepts and can be identified by a single word ("Hide," "Chain," "Charged," "Sphere," etc.). In contrast, the Identifier category is linked to a concept instance from the Component category. A biological identifier is very likely to be redundant, because the molecular system is repeated at each time step.
It is therefore mandatory to pair an identifier with a component among the keywords in order to validate its presence. Without a component, any identifier is withdrawn from the list. If the identifier and the associated component exist in the database, the pair is validated.

SPARQL commands use the ASK operator to determine whether a keyword belongs to a category. This operator takes one or several triples and returns a boolean reflecting whether the set of triples holds true with respect to the database. Some example queries:

ASK {my:Hide rdfs:subClassOf my:Action}
ASK {my:Alanine rdfs:subClassOf my:Biological_component}
ASK {my:Cartoon rdfs:subClassOf my:Representation}
ASK {my:Aliphatic rdfs:subClassOf my:Property}

Reasoning and inference rules are automatically used in SPARQL queries. For instance, the following query:

ASK {my:Alanine rdfs:subClassOf my:Biological_component}

will output true despite the absence of an explicit direct link between the two concepts (Alanine and Biological_component), since AminoAcid, Residue, and Molecule sit between the two concepts (see Figure 4).

[Figure 4. Extract from our OWL ontology for the Alanine concept]

Command creation

Once each keyword is validated and associated with a category, i.e., identified as a concept of the database (or as an individual, for identifiers), and possibly grouped with another keyword, it forms a syntactic group. Each syntactic group carries a piece of information corresponding to a specific part of the application command.

In our platform, a vocal command is composed of a succession of syntactic groups linked together to create an action query to the immersive platform. The type of command can be described in the following manner:

action [parameter]+, ( structural_group [identifier]+ )+

Syntactic groups between [] are optional, whereas the others are mandatory.
The + indicates that the syntactic group may occur zero, one, or several times. Finally, () indicates a block of syntactic groups. This command architecture is present in our ontology in the form of prerequisite concepts associated with the action concepts. For instance, the action concept "Color" requires a property of type "Colors" and a structural component to work on. These pieces of information are stored in the ontology, making them automatically checkable by the engine, which can detect whether all requirements are fulfilled for a specific action. This feature simplifies the definition of further actions in the ontology, as the changes that have to be applied to the engine are minimal, typically none or minor. The checking process stays the same as long as the action is well defined within the ontology.

Just as with an action, a structural group is always mandatory to trigger a command. The different ways to obtain a structural sub-ensemble are:

- Component only: every individual that belongs to the concept is taken into account
- Combination of a component and an ensemble of identifiers: coherency checking between component and identifiers
- Property only: every individual that possesses the property is taken into account
- Combination of a component and a property: coherency checking between component and property

The structural group always refers to a group of individuals, in order to disambiguate the results between commands. This disambiguation implies that final commands are more complex. The hierarchical classification of structural components (Model/Chain/Residue/Atom) has a significant impact on the results of a given command. Indeed, the nature of the structural components targeted by an action is compared to the nature of the structural components currently studied.
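The ASK-based classification and the action-requirement check described above can be sketched in a few lines. This is a toy, self-contained Python stand-in: the concept names echo the article's examples, but the data structures and functions are illustrative, not the platform's actual engine:

```python
# Toy stand-in for the ontology: direct subClassOf links only. The ASK-style
# check below follows them transitively, as the SPARQL reasoner does.
SUBCLASS_OF = {
    "Hide": "Action", "Color": "Action",
    "Alanine": "AminoAcid", "AminoAcid": "Residue",
    "Residue": "Molecule", "Molecule": "Biological_component",
    "Sphere": "Representation", "Cartoon": "Representation",
    "Aliphatic": "Property", "Blue": "Colors", "Colors": "Property",
}

def is_a(concept, category):
    """ASK {concept rdfs:subClassOf category}, with transitive closure."""
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
        if concept == category:
            return True
    return False

# Prerequisite syntactic groups per action concept, as stored in the ontology.
REQUIREMENTS = {"Color": ["Colors", "Biological_component"],
                "Hide": ["Biological_component"]}

def command_complete(action, keywords):
    """Check that every group required by the action appears in the keywords."""
    return all(any(is_a(k, req) or k == req for k in keywords)
               for req in REQUIREMENTS[action])

print(is_a("Alanine", "Biological_component"))         # True, via inference
print(command_complete("Color", ["Alanine", "Blue"]))  # True
print(command_complete("Color", ["Alanine"]))          # False: no color given
```

The point of keeping the requirements in the ontology rather than in code is visible even in this sketch: adding a new action means adding one `REQUIREMENTS` entry, while `command_complete` is untouched.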
Depending on whether the individuals in the command are of a higher or lower hierarchical order, the command may trigger an action either on a subpart of the displayed scene (for lower-classified individuals) or as a change of scene composition (for higher- or equally-classified individuals). For instance, if only two models are being studied when a vocal command is transmitted, the putative amino acids individually targeted by an action will be those that belong to the two displayed models. If the individuals targeted by the command action had been models different from the displayed ones, an update of the displayed molecular complexes would have occurred first.

Once the different checks for command coherency and validity have been carried out, the command is sent to both spaces (visualization and analysis) in order to synchronize the visual results.

Performances

The performance of our interpretation engine has been tested on several simple and complex voice commands, and execution times have been measured (see Table 2). To keep the results table clear, we performed the tests on an RDF database containing information from a molecular simulation of a 19-amino-acid peptide whose primary sequence is KETAAAKFERQHMDSSTSA. This structure was artificially created with PyMol[9], and a short molecular dynamics run using GROMACS[28] was used to simulate the newly created system and obtain a short trajectory. The ontology used here is the one created for our platform. We placed ourselves in a context where the hierarchical structural level of the environment is the amino acid, mainly to take advantage of the many properties associated with this hierarchical level in the ontology and thus avoid complex commands. The syntax of the commands is adapted to be interpreted by the PyMol software.
Finally, these tests were carried out independently of the Sphinx software so that they could be compared among themselves without any side effects from the vocal interpreter's performance. The set of input keywords was therefore provided manually for each test.

Table 2. Example of commands used to evaluate performance of the inference engine for voice recognition

| Keywords | Expected command | Generated command | Completion time |
|---|---|---|---|
| Hide, Lines, Model, 128 | Hide lines, residue 1+2+3+4+5+6+7+8+9+10+11+12+13+14+15+16+17+18+19 and model 128 | Hide lines, residue 1+2+3+4+5+6+7+8+9+10+11+12+13+14+15+16+17+18+19 and model 128 | Approx. 54 milliseconds |
| Color, Alanine, Blue | Color blue, residue 4+5+6+19 | Color blue, residue 4+5+6+19 | Approx. 72 milliseconds |
| Show, Secondary_structure, Residue, [2,5], Cartoon | Show cartoon, residue 2+3+4+5 | Show secondary_structure, residue 2+3+4+5 | Approx. 56 milliseconds |
| Show, Positive, Residue, Polar, Sphere, Chain, A | Show sphere, residue 1+7+10 and chain A | Show sphere, residue 1+2+7+9+10+11+12+14 and chain A | Approx. 550 milliseconds |

As Table 2 shows, the overall precision of the interpretation engine is rather good, and only the last generated command differs significantly from the expected command (last row of the table).

One could argue that the third command shows only a partial match between the expected and generated commands. However, we can observe that the engine successfully identified the concepts "Secondary structure" and "Cartoon" as equivalent (as illustrated in Figure 3) but chose to keep only the former, based solely on its position in the keyword list, to create the query.
In this case, "Cartoon" refers directly to a particular visual representation, whereas "Secondary structure" relates more to a biological concept, the spatial arrangement of consecutive residues within a protein. Adding a filter defining which representation keywords are allowed at the software level would be necessary to remove any command ambiguity.

The fourth and last command was supposed to show, as spheres, all residues that were both polar and positive. The difference in the list of residue IDs is due to the lack of a logical connector between the two properties. The engine interpreted this missing connector as a logical "OR" instead of the expected "AND," and therefore output all residues that were either positive or polar (or both). This error points to the problem of keyword-based interpretation when logical connectors must be used. It is thus necessary to take both possibilities into account and add their interpretation to the inference engine.

Limits and perspectives

Our interpretation engine is able to convert a wide range of keyword lists, ordered and unordered, into a functional and understandable software command for a specific molecular viewer. It does, however, have some limitations that provide interesting opportunities for future work. We have seen that integrating the concept of logical connectors is essential in order to handle situations with multiple filters on individuals. These logical connectors fit poorly into our current ontology, as they do not really belong to any of the five definition sets around which it was built. But logical operations are possible in SPARQL, which implements logical operators such as AND, OR, and UNION.
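The effect of the missing connector can be made concrete with a toy filter over residue properties. The property assignments below are hypothetical, chosen only so that the two readings of "Positive ... Polar" reproduce the two residue lists of Table 2:

```python
# Hypothetical residue -> property sets (illustrative, not measured data),
# arranged so AND/OR reproduce the expected vs. generated lists of Table 2.
PROPS = {1: {"polar", "positive"}, 2: {"polar"}, 7: {"polar", "positive"},
         9: {"polar"}, 10: {"polar", "positive"}, 11: {"polar"},
         12: {"polar"}, 14: {"polar"}}

def select(wanted, connector):
    """Filter residue IDs by a set of properties with explicit semantics."""
    if connector == "AND":
        return sorted(r for r, props in PROPS.items() if wanted <= props)
    return sorted(r for r, props in PROPS.items() if wanted & props)  # OR

print(select({"polar", "positive"}, "AND"))  # [1, 7, 10], the expected command
print(select({"polar", "positive"}, "OR"))   # [1, 2, 7, 9, 10, 11, 12, 14]
```

With the connector left implicit, both interpretations are equally defensible, which is why the keyword vocabulary itself has to carry "and"/"or" tokens.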
The missing part therefore lies in the interpretation engine, which needs to incorporate these keywords and handle them properly to form the SPARQL command that will query the database.

It is important to note that the efficiency of the inference engine also depends on the quality of the keywords collected by the speech recognition step. Here this relates to our implementation but, more generally, to the generation step for these keywords. The absence of one or more keywords, or the recognition of an erroneous keyword, are errors that can be considered common. To provide a more pedagogical and intelligent way of issuing a command than simple error feedback and an invitation to repeat, the knowledge accumulated in the ontology can be used to offer the user a controlled subset of relevant keywords with which to complete the command. This feature contributes to the effort to provide an informed interaction mode between experts and their visualization space, thus improving the user experience. In the same spirit, providing the expert with a finite number of identifiers from which to make a selection could anticipate certain user errors. It would therefore be possible to disambiguate a keyword identified as non-compliant with what was expected, or to complete a partial command for which one or more keywords are missing.

Synchronizing interactive selections between 2D and 3D workspaces

We have seen in the previous section that our interpretation engine is able to translate a list of vocalized keywords into an application command, but its semantic-based architecture offers further possibilities. Each interaction of the user with a structural group, a property, or an analytical value is ultimately translated into a list of individuals and their associated representations.
This capability makes it possible not only to execute commands within the dedicated software but also to synchronize the visual and analytical spaces with each other. As a consequence, each command that involves a selection is interpreted not only by the software but also by the platform, which passes the selection information on to all spaces and their components (e.g., plots, graphs, etc.). (See Figure 5.)

Any selection made by the user triggers an event transmitted to a management module, resulting in an adaptation of the visualization to highlight the selected individual(s).

[Figure 5. 3D structure visualization and analytical plot of residue distance to the center of mass for the KETAAAKFERQHMDSSTSA peptide in two different spaces of the same environment. The highlighted selection is the result of the second command from Table 2.]

Beyond its highlighting effect, a selection also narrows the user's focus to a subset of individuals, in both the analysis space and the visualization space. It is possible to adapt this focus to the user's needs by modifying the context level at which they want their selection to appear. Three levels of contextualization are possible:

- No context – The selection of individual(s) leads to the visualization of these individuals alone in the visualization and analysis spaces, hiding any unselected individuals.
- Weak context – The selection of individual(s) highlights these individuals in the workspaces and reduces the visibility of the other individuals in the dataset (grey color, transparency, simplified visual rendering, etc.).
- Strong context – The selection of individual(s) is perceived only through a simple emphasis on these individuals in the workspaces; every other individual still appears with visual parameters close to those of the selected individuals.

These different levels make it possible either to highlight the differences between the selection and the rest of the dataset, or to set up a streamlined working environment around a selection of interest to the user. These levels apply to both the visual and analytical parts through visual rendering systems specific to each space.

Semi-automated analyses triggered by direct interactions

Although the majority of the data is present in the database created by the user, a regular work session often requires additional data, for example data resulting from post-simulation calculations and therefore missing from the original database. These calculations are usually managed within scripts, sometimes linked to simulation tools, and executed outside the visualization loop, following the observation of a particular phenomenon during exploration or following other analyses already performed beforehand. In order not to overload the database, and to leave users in control of the analyses they want to perform, we added the possibility of launching semi-automated analyses during the working session.

The SPARQL query language allows one not only to query a database but also to modify, delete, or add data to it. This makes it possible to feed the database with the results of analyses launched during a user's working session. A list of analyses has been compiled, and an ontological definition has been specified for each of them. This definition provides the type of data used as input and the type of data produced as output. Thus, for a desired analysis, our platform proposes a filtered choice of individuals to select, whose types match the expected data types.
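The two write-back shapes this produces, a direct property for a simple analysis versus a reified instance for a complex one, can be sketched with a toy in-memory triple store. The property names mirror the article's ontology (my:temperature, my:objectA, my:objectB, my:distance); the store, helper functions, and numeric values are illustrative:

```python
# Toy in-memory triple store illustrating the two write-back shapes.
triples = []

def add_simple(individual, prop, value):
    """Simple analysis: the value becomes a direct property of one individual."""
    triples.append((individual, prop, value))

def add_distance(instance, res_a, res_b, value):
    """Complex analysis: a reified Distance instance linking two individuals."""
    triples.extend([
        (instance, "rdf:type", "my:Distance"),
        (instance, "my:objectA", res_a),
        (instance, "my:objectB", res_b),
        (instance, "my:distance", value),
    ])

add_simple("my:MODEL_161", "my:temperature", 309.8)               # made-up value
add_distance("my:DIST_1", "my:RES_3622", "my:RES_3626", 5.4)      # made-up value

def distance_between(a, b):
    """Mimic the SELECT pattern: join Distance instances on objectA/objectB."""
    inst = ({s for s, p, o in triples if p == "my:objectA" and o == a} &
            {s for s, p, o in triples if p == "my:objectB" and o == b})
    return [o for s, p, o in triples if s in inst and p == "my:distance"]

print(distance_between("my:RES_3622", "my:RES_3626"))  # [5.4]
```

The reified shape costs an extra join when querying, but it is the only way to attach one value to a *pair* of individuals without losing either endpoint.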
In the same way, the values generated as output of the analysis are automatically entered into the database according to their ontological definition.

A "distance" tool, for example, requires two individuals of the same hierarchical level, or a selection of individuals of a higher hierarchical level, between which the distances will be calculated. These analyses can be classified into two categories:

- Simple analyses group together analyses that generate a value that can be added directly to the properties of the individuals concerned. These include solvent accessibility, hydrophobicity, energy, and so on.
- Complex analyses produce a property describing a relationship between two individuals, and thus require knowledge of these individuals to be interpretable. The distance between two atoms, the RMSD between two sets of individuals, and the angle between two chains are just some of the complex analyses that link two individuals.

While simple analyses simply add a property and its associated value to an individual, complex analyses must create a specific instance of one of the "analysis" concepts of the ontology. This instance brings together the information and definitions needed to understand it. For example, the ontology's distance (analysis type) concept stores any calculated distance between two individuals for a selection of defined parent structures. The value of the distance, the URIs of the two individuals involved, and all the structures within which the calculation was carried out are properties of a distance instance and are accessible only through that instance. The difference between a SPARQL query accessing values from a simple analysis and one accessing values from a complex analysis is illustrated below:

SELECT DISTINCT ?temp WHERE {my:MODEL_161 my:temperature ?temp}

SELECT DISTINCT ?distance WHERE {?indiv rdf:type my:Distance . ?indiv my:objectA my:RES_3622 . ?indiv my:objectB my:RES_3626 . ?indiv my:distance ?distance}

Platform architecture

The different components highlighted in the previous sections must communicate with each other efficiently to provide realistic feedback to users. Our platform architecture, from both a hardware and a software perspective, had to be carefully planned to ensure that all tasks performed by users are handled within an interactive time frame (on the order of magnitude of a second for the analyses). Our platform design is based on a complex software architecture. In the diagram shown in Figure 6, we deliberately placed it in the middle of a double-sided communication loop connecting the visualization space to the analysis space. Our database is hosted on a local server accessible from the network to guarantee privileged and optimized access to our data. All communications are optimized to reduce the latency between a request triggered by the front-end sensors, its translation into a database query together with the treatment and transformation of the query results in the back end, and finally the response presented to the user, once again at the front-end level.

[Figure 6. Software and hardware architecture of our platform as a UML deployment diagram]

Scenario and evaluation

Scenario

To illustrate the full capacity of our platform architecture, we chose a typical example of a molecular system study. This example sets up a local visualization solution coupled to a remote web server where interactive graphs can be created. Both spaces can be rendered in an immersive environment, either in the same screen space or split across one 3D screen for the visualization and a tablet providing analysis results through a web server (see Figure 9, later).
We assume, as is the case in real studies, that the expert knows the molecular system well and can therefore interact vocally or by selecting elements in one of the spaces.

Our scenario studies the results of a molecular dynamics (MD)[31][32] experiment applied to a protein. We voluntarily skip the MD parametrization details, since this was set up as a proof of concept and follows a very standard protocol.

In the first step of our scenario, the analytical space (web server) triggers a SPARQL query to retrieve every numerical value from our database. A list containing all the values is then created and presented to the user for each structural component level (Model/Chain/Residue/Atom), as illustrated in Figure 7. Once the data values are gathered, the expert chooses which structural component hierarchy they are interested in and which combination of properties they want to plot in the analytical space.

[Figure 7. Query results showing all numerical values present in the database for each representation level available (Model/Chain/Residue/Atom)]

Several queries then retrieve the property values to be plotted using the D3.js graph library. Here the RMSD of each model with respect to the starting conformation has been plotted. The X-axis shows the time step corresponding to each model of the MD trajectory, and the Y-axis their associated RMSD values.

Several models of interest can be selected, either via a vocal command or by direct selection in the 2D interactive plots, as shown in the first step of Figure 8. We selected here the three lowest-RMSD models (including the reference). The selection is synchronized over all previously created scatter plots and triggers a synchronous visualization of the individuals in the visual space (see the second step of Figure 8).

[Figure 8. On the right, the analytical space, where interactive plots are added upon user actions in the visualization space or through the available menus. On the left, the visualization space, where each object is displayed synchronously with the selected individuals of the analytical space.]

The expert may then switch to the visualization space and select some elements of the displayed structures to focus on. We selected here three residues from the three different models. These sub-elements of the current models are sent to the analytical space, which asks the expert for the properties to be plotted. As in the previous step, a list of available numerical values associated with the residues is provided. Once the choice is made, the selection is highlighted in the analytical space, as shown in the third step of Figure 8. We chose here to display the solvent-exposed area with respect to the residue IDs. The three residues we selected in the visualization space are displayed in blue, as shown in the fourth step of Figure 8.

New graphs can be added at runtime and synchronized with the current ones. However, it is important to note that full synchronization between the visualization and analytical spaces requires the same hierarchy of structural elements to be selected in both spaces. If a new selection is made at the model level, any graphs of lower hierarchy will be reset with the newly selected models, and the visualization will be reset with the new models at the same time.

Evaluation of high-level task completion based on hierarchical task analysis

The evaluation process started from the observation that a systematic evaluation of field-related tasks is rather complicated to set up, for four reasons. (1) The usage and nature of the evaluated tools, in particular in molecular visualization, differ between experts. (2) Implementing and adapting our developments over a representative sample of the tools is complex and very time-consuming. (3) Our approach is biased, since it is based on the execution of expert tasks. (4) Applying standard statistical methods for evaluation requires gathering enough participants, yet the number of experts in our application field is rather limited.

[Figure 9. Platform illustration in a hybrid environment made of a 3D immersive CAVE2 system (EVL/UIC, Chicago) together with a graphical tablet for 2D analytical representation]

We therefore propose an evaluation method that is more theoretical than empirical: the hierarchical task analysis (HTA) method.[33] The HTA method consists of dividing a primary task into several sub-tasks. Each sub-task can be subdivided again until the sub-tasks reach a degree of precision sufficient for their execution times to be evaluated accurately. This method is particularly useful for comparing similar tasks performed under different conditions. It allows one to evaluate both the task methodology with respect to specific conditions and the performance of the conditions for a specific task. HTA requires only one expert to compare the different sub-task execution times (see Figure 10).

[Figure 10. Subdivision by HTA of an expert task performed (A) in normal conditions and (B) within our platform setup]

We evaluate here a typical task that a structural biologist would perform on a daily basis: we asked experts to measure the diameter of the main pore of a transmembrane protein in two different setups. The first is a typical setup where visualization software and some analysis files are available along with an atomic model of the transmembrane protein. The second setup involves our platform, where the visualization software is connected to a web page where interactive graphs can be displayed. The expert can interact with both spaces through two different devices connected locally, where network latency is negligible.
Tests were made with a laptop running an instance of PyMol and a tablet displaying 2D plots within a web browser.

The task can be divided into three distinct steps. The first step processes analytical data: the lowest-energy model is sought among the models that are more than 10 Å of RMSD distant from the reference model, this distance reflecting significant conformational changes. Once the model is identified, it is visualized in order to see the pore and select its ends. The third and final step consists of calculating the distance between two atoms on either side of the pore.

The runtime is significantly shorter when using our platform (19 seconds) compared to a standard use of the analysis and visualization tools (29 seconds). The first, analytical step is the stage where the difference is most important, as highlighted by the orange sub-tasks in the HTA graph in Figure 10. This difference can be explained by the use of interactive graphs to visualize RMSD and energy values for all models. The interactive graph and its associated selection tools (voice recognition or manual selection) allow the user to quickly query all models more than 10 Å away from the reference. Identifying the model with the lowest energy is then a very quick visual analysis of the energy graph. By contrast, using standard command-line tools is more complicated because it requires a more complex visual analysis.
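The three steps of the task can be sketched as a filter-then-minimize over per-model properties, followed by an atom-atom distance. The model records, the default cutoff value, and the coordinates below are illustrative assumptions, not real data.

```python
import math

# Step 1: among models more than 10 Å RMSD from the reference, pick the one
# with the lowest energy. Step 3: measure the pore diameter as the distance
# between two atoms on either side of the pore.

models = [
    {"id": 1, "rmsd": 4.2,  "energy": -1250.0},
    {"id": 2, "rmsd": 12.8, "energy": -1310.0},
    {"id": 3, "rmsd": 11.1, "energy": -1290.0},
    {"id": 4, "rmsd": 15.3, "energy": -1275.0},
]

def lowest_energy_model(models, rmsd_cutoff=10.0):
    """Return the lowest-energy model among those beyond the RMSD cutoff."""
    candidates = [m for m in models if m["rmsd"] > rmsd_cutoff]
    return min(candidates, key=lambda m: m["energy"])

def atom_distance(a, b):
    """Euclidean distance between two atom coordinates (Å)."""
    return math.dist(a, b)

best = lowest_energy_model(models)
pore = atom_distance((1.0, 2.0, 3.0), (1.0, 2.0, 11.5))  # 8.5 Å along z
```

In the platform, the filtering and the minimum search are replaced by direct selections on the interactive RMSD and energy plots, which is what shortens the first step.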
It is indeed more tedious to find a minimum value by going over a text file than by looking at a cloud of dots.

Moreover, the synchronization of plot selections within the analytical space further shortens the time required to find the lowest-energy model in the second plot among the ones selected in the first.

Loading the model into the visualization software is also made easier in the platform, since our application automatically passes the selection of a model from a plot directly into the visualization space. The similar steps, shown in green in Figure 10, involve comparable execution times and are therefore independent of the working conditions in which the sub-tasks are performed.

Conclusion

Immersive virtual reality is still used only sparsely to explore biomolecules, which may be due to limitations imposed by several important constraints.

On the one hand, applications usable in virtual reality do not offer enough interaction modalities adapted to the immersive context to access the essential and usual features of molecular visualization software. In such a context, paradigms of direct interaction are lacking, both to make selections directly on the 3D representation of the molecule and, through complex criteria, to interactively change the different molecular representation modes used to render these selections. Until now, these selection tasks have had to be performed by the usual means, such as mouse and keyboard.

On the other hand, the impossibility of performing other analysis, pre-, and post-processing tasks, or of visualizing their results, which belong to the field of information visualization rather than 3D visualization, forces the user to systematically come back to an office context.

To address these issues, we have set up a semantic layer over an immersive environment dedicated to the interactive visualization and analysis of molecular simulation data.
This setup was achieved through the implementation of an ontology describing both the structural biology and interaction concepts manipulated by the experts during a study process. As a result, we believe that our pipeline may be a solid base for immersive analytics studies applied to structural biology. In the same vein as the projects by Chandler et al.[34][35], we successfully combine several immersive views over a particular phenomenon.

Our architecture, built around heterogeneous components, succeeds in bringing together the visualization and analytical spaces thanks to a common ontology-driven module that maintains a perfect synchronization between the different representations of the same elements in the two spaces. One strength of the platform is its independence regarding the visualization technology used for both spaces. Combinations are numerous, from a CAVE system coupled to a tablet to a VR headset showcasing a room where each wall displays either a 3D structure or some analysis. Our semantic layer lies beneath the visualization technology used and only provides bridges between heterogeneous tools, exploring molecular structures on one side and complex analyses on the other.

The knowledge provided by the ontology can also significantly improve the interactive capability of the platform by proposing contextualized analysis choices to the user, adapted to the types of elements in their current focus. All along the study process, a set of specific analyses, not redundant with the ones already performed, can be interactively chosen to populate the database.
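Such a contextual proposal of analyses can be sketched as a simple comparison of declared input and output types against the current selection and the database contents. The type names and the ANALYSES table below are hypothetical illustrations, not the authors' ontology.

```python
# Each analysis declares an input type (what kind of selection it accepts)
# and an output type (what values it produces). An analysis is proposed only
# if it applies to the current selection and its result is not already stored.

ANALYSES = {
    "solvent_exposed_area": {"input": "residue", "output": "area_per_residue"},
    "rmsd_to_reference":    {"input": "model",   "output": "rmsd_per_model"},
    "potential_energy":     {"input": "model",   "output": "energy_per_model"},
}

def pertinent_analyses(selection_type, database):
    """Analyses applicable to the selection whose results are not yet stored."""
    return [
        name for name, spec in ANALYSES.items()
        if spec["input"] == selection_type and spec["output"] not in database
    ]

# The database already holds per-model RMSD values, so only the energy
# analysis remains pertinent for a model-level selection.
db = {"rmsd_per_model"}
print(pertinent_analyses("model", db))
```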
A simple definition of the analyses in the ontology, including their input and output types, is sufficient to decide whether an analysis is pertinent for a given selection, and whether the resulting values are already present in the database.

The reasoning capability of the ontology allowed us to develop an efficient interpretation engine that can transform a vocal command composed of keywords into an application command. This framework paves the way for a multimodal supervision tool that would use the high-level description of the manipulated elements, as well as the heterogeneous interaction natures, to merge inputs and create intelligent and complex commands, in line with the work of M.E. Latoschik.[36][37] The RDF/RDFS/OWL model coupled with the SPARQL language makes it possible to state inference rules, which is particularly important for the decision-making process in collaborative contexts. In such contexts, two users may jointly trigger a multimodal command that can be difficult to interpret without proper rules. An effort would then have to be made to integrate these rules in a future supervisor of the input modalities, based on the semantic model, considering users as elements of modality in a multimodal interaction.

Our approach is a proof-of-concept application, available as a GitHub repository, but it opens the way to a new generation of scientific tools. We illustrated our developments through the field of structural biology, but it is worth noting that the generic nature of the semantic web allows us to extend our developments to most scientific fields where a tight coupling between visualization and analyses is important.
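A minimal sketch of such keyword-based interpretation, assuming a hypothetical vocabulary and command format (the actual engine is ontology-backed and far richer):

```python
# Spoken keywords are mapped onto slots of a command template (action, target
# type, optional property); unrecognized words are ignored. This is a toy
# slot-filling interpreter, not the authors' implementation.

VOCABULARY = {
    "action":   {"select", "show", "plot"},
    "target":   {"model", "chain", "residue", "atom"},
    "property": {"energy", "rmsd", "distance"},
}

def interpret(keywords):
    """Fill command slots from recognized keywords; None if a slot is missing."""
    slots = {}
    for word in keywords:
        for slot, words in VOCABULARY.items():
            if word in words:
                slots[slot] = word
    if "action" not in slots or "target" not in slots:
        return None  # not enough information to build an application command
    return (slots["action"], slots["target"], slots.get("property"))

print(interpret(["plot", "the", "energy", "of", "each", "model"]))
```

In the real platform, the slots and their admissible fillers come from the ontology itself, so the same mechanism adapts automatically as concepts are added.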
We especially aim to integrate all the concepts described in this paper into new molecular visualization tools such as UnityMol[38], which allows a more comfortable code integration compared to classical molecular visualization applications.

Acknowledgements

The authors wish to thank Xavier Martinez for kindly providing the UnityMol pictures. This work was supported in part by the French national research agency project Exaviz (ANR-11-MONU-0003) and by the "Initiative d'Excellence" program from the French State (grant "DYNAMO", ANR-11-LABX-0011-01; equipment grants Digiscope, ANR-10-EQPX-0026 and Cacsice, ANR-11-EQPX-0008).

Conflict of interest

The authors state no conflict of interest. All authors have read the journal's publication ethics and publication malpractice statement available at the journal's website and hereby confirm that they comply with all its parts applicable to the present scientific work.

References

1. Zhao, G.; Perilla, J.R.; Yufenyuy, E.L. et al. (2013). "Mature HIV-1 capsid structure by cryo-electron microscopy and all-atom molecular dynamics". Nature 497 (7451): 643–6. doi:10.1038/nature12162. PMC3729984. PMID 23719463. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC3729984

2. Zhang, J.; Ma, J.; Liu, D. et al. (2017). "Structure of phycobilisome from the red alga Griffithsia pacifica". Nature 551 (7678): 57–63. doi:10.1038/nature24278. PMID 29045394.

3. van Dam, A.; Forsberg, A.S.; Laidlaw, D.H. et al. (2000). "Immersive VR for scientific visualization: A progress report". IEEE Computer Graphics and Applications 20 (6): 26–52. doi:10.1109/38.888006.

4. Stone, J.E.; Kohlmeyer, A.; Vandivort, K.L.; Schulten, K. (2010). "Immersive molecular visualization and interactive modeling with commodity hardware". Proceedings of the 6th International Conference on Advances in Visual Computing: 382–93.
doi:10.1007/978-3-642-17274-8_38.

5. O'Donoghue, S.I.; Goodsell, D.S.; Frangakis, A.S. et al. (2010). "Visualization of macromolecular structures". Nature Methods 7 (3 Suppl.): S42–55. doi:10.1038/nmeth.1427. PMID 20195256.

6. Hirst, J.D.; Glowacki, D.R.; Baaden, M. et al. (2014). "Molecular simulations and visualization: Introduction and overview". Faraday Discussions 169: 9–22. doi:10.1039/c4fd90024c. PMID 25285906.

7. Goddard, T.D.; Huang, C.C.; Meng, E.C. et al. (2018). "UCSF ChimeraX: Meeting modern challenges in visualization and analysis". Protein Science 27 (1): 14–25. doi:10.1002/pro.3235. PMC5734306. PMID 28710774. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC5734306

8. Férey, N.; Nelson, J.; Martin, C. et al. (2009). "Multisensory VR interaction for protein-docking in the CoRSAIRe project". Virtual Reality 13: 273. doi:10.1007/s10055-009-0136-z.

9. DeLano, W. (4 September 2000). "The PyMOL Molecular Graphics System". http://pymol.sourceforge.net/overview/index.htm

10. Humphrey, W.; Dalke, A.; Schulten, K. (1996). "VMD: Visual molecular dynamics". Journal of Molecular Graphics 14 (1): 33–8. doi:10.1016/0263-7855(96)00018-5.

11. Lv, Z.; Tek, A.; Da Silva, F. et al. (2013). "Game on, science - How video game technology may help biologists tackle visualization challenges". PLoS One 8 (3): e57990. doi:10.1371/journal.pone.0057990. PMC3590297. PMID 23483961. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC3590297

12. Sowa, J.F. (1984). Conceptual Structures: Information Processing in Mind and Machine. Addison-Wesley Longman Publishing Co. ISBN 0201144727.

13. Berners-Lee, T.; Hendler, J.; Lassila, O. (2001). "The Semantic Web". Scientific American 284: 28–37.

14. Cyganiak, R.; Wood, D.; Lanthaler, M., ed.
(25 February 2014). "RDF 1.1 Concepts and Abstract Syntax". World Wide Web Consortium. https://www.w3.org/TR/rdf11-concepts/

15. Brickley, D.; Guha, R.V., ed. (25 February 2014). "RDF Schema 1.1". World Wide Web Consortium. https://www.w3.org/TR/rdf-schema/

16. Motik, B.; Patel-Schneider, P.F.; Parsia, B., ed. (11 December 2012). "OWL 2 Web Ontology Language". World Wide Web Consortium. https://www.w3.org/TR/owl2-syntax/

17. Harris, S.; Seaborne, A., ed. (21 March 2013). "SPARQL 1.1 Query Language". World Wide Web Consortium. https://www.w3.org/TR/sparql11-query/

18. De Giacomo, G.; Lenzerini, M. (1996). "TBox and ABox Reasoning in Expressive Description Logics". Proceedings of the Fifth International Conference on Principles of Knowledge Representation and Reasoning: 316–27. ISBN 1558604219.

19. Schulze-Kremer, S. (2002). "Ontologies for molecular biology and bioinformatics". In Silico Biology 2 (3): 179–93. PMID 12542404.

20. Schuurman, N.; Leszczynski, A. (2008). "Ontologies for bioinformatics". Bioinformatics and Biology Insights 2: 187–200. PMC2735951. PMID 19812775. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC2735951

21. The Gene Ontology Consortium; Ashburner, M.; Ball, C.A. et al. (2000). "Gene ontology: Tool for the unification of biology". Nature Genetics 25 (1): 25–9. doi:10.1038/75556. PMC3037419. PMID 10802651. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC3037419

22. Rabattu, P.Y.; Massé, B.; Ulliana, F. et al. (2015). "My Corporis Fabrica Embryo: An ontology-based 3D spatio-temporal modeling of human embryo development". Journal of Biomedical Semantics 6: 36. doi:10.1186/s13326-015-0034-0. PMC4582726. PMID 26413258. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC4582726
23. Smith, B.; Ashburner, M.; Rosse, C. et al. (2007). "The OBO Foundry: Coordinated evolution of ontologies to support biomedical data integration". Nature Biotechnology 25 (11): 1251–5. doi:10.1038/nbt1346. PMC2814061. PMID 17989687. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC2814061

24. Belleau, F.; Nolin, M.A.; Tourigny, N. et al. (2008). "Bio2RDF: Towards a mashup to build bioinformatics knowledge systems". Journal of Biomedical Informatics 41 (5): 706–16. doi:10.1016/j.jbi.2008.03.004. PMID 18472304.

25. Hanwell, M.D.; Curtis, D.E.; Lonie, D.C. et al. (2012). "Avogadro: An advanced semantic chemical editor, visualization, and analysis platform". Journal of Cheminformatics 4 (1): 17. doi:10.1186/1758-2946-4-17. PMC3542060. PMID 22889332. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC3542060

26. Rysavy, S.J.; Bromley, D.; Daggett, V. (2014). "DIVE: A Graph-Based Visual-Analytics Framework for Big Data". IEEE Computer Graphics and Applications 34 (2): 26–37. doi:10.1109/MCG.2014.27.

27. Rzepa, H. (2012). "Chemical Markup Language". CMLC. http://www.xml-cml.org/

28. Huang, X.; Alleva, F.; Hsiao-Wuen, H. et al. (1993). "The SPHINX-II speech recognition system: An overview". Computer Speech & Language 7 (2): 137–148. doi:10.1006/csla.1993.1007.

29. Genest, D.; Salvat, E. (1998). "A platform allowing typed nested graphs: How CoGITo became CoGITaNT". Proceedings from the 1998 International Conference on Conceptual Structures: 1154–61. doi:10.1007/BFb0054912.

30. Dennemont, Y. (2013). "Une assistance à l'interaction 3D en réalité virtuelle par un raisonnement sémantique et une conscience du contexte". Allen Institute for Artificial Intelligence.
https://www.semanticscholar.org/paper/Une-assistance-%C3%A0-l'interaction-3D-en-r%C3%A9alit%C3%A9-par-un-Dennemont/254289782f5feb44e0a0db19ea2f7661578241a1

31. Abraham, M.J.; Murtola, T.; Schulz, R. et al. (2015). "GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers". SoftwareX 1–2: 19–25. doi:10.1016/j.softx.2015.06.001.

32. Dror, R.O.; Dirks, R.M.; Grossman, J.P. et al. (2012). "Biomolecular simulation: A computational microscope for molecular biology". Annual Review of Biophysics 41: 429–52. doi:10.1146/annurev-biophys-042910-155245. PMID 22577825.

33. Annett, J. (2003). "Hierarchical Task Analysis". In Hollnagel, E. Handbook of Cognitive Task Design. 1 (1st ed.). pp. 17–35. ISBN 9780805840032.

34. Chandler, T.; Cordell, M.; Czauderna, T. et al. (2015). "Immersive Analytics". Proceedings of Big Data Visual Analytics 2015 1: 1–8. doi:10.1109/BDVA.2015.7314296.

35. Sommer, B.; Barnes, D.G.; Boyd, S. et al. (2017). "3D-Stereoscopic Immersive Analytics Projects at Monash University and University of Konstanz". Proceedings of Electronic Imaging, Stereoscopic Displays and Applications XXVIII: 179–87, 189. doi:10.2352/ISSN.2470-1173.2017.5.SDA-109.

36. Wiebusch, D.; Latoschik, M.E. (2015). "Decoupling the entity-component-system pattern using semantic traits for reusable realtime interactive systems". IEEE 8th Workshop on Software Engineering and Architectures for Realtime Interactive Systems: 25–32. doi:10.1109/SEARIS.2015.7854098.

37. Gutierrez, M.; Vexo, F.; Thalmann, D. (2005). "Semantics-based representation of virtual environments". International Journal of Computer Applications in Technology 23 (2–4): 229–38. doi:10.1504/IJCAT.2005.006484.

38. Doutreligne, S.; Cragnolini, T.; Pasquali, S. et al. (2014).
\"UnityMol: Interactive scientific visualization for integrative biology\". IEEE 4th Symposium on Large Data Analysis and Visualization: 109\u201310. doi:10.1109\/LDAV.2014.7013213.   \n\n\nNotes \nThis presentation is faithful to the original, with only a few minor changes to presentation. Some grammar and punctuation was cleaned up to improve readability. In some cases important information was missing from the references, and that information was added. The original references after 27 were slightly out of order in the original; due to the way this wiki works, references are listed in the order they appear. The original shows a reference [33] for Perilla et al., but no inline citation for 33 exists anywhere in the text; it has been omitted for this version. Footnotes were turned into inline URLs. Figure 5 and 9 is shown in the original, but no reference was made to them in the text; a presumption was made where to put the inline reference for each figure for this version. Nothing else was changed in accordance with the NoDerivatives portion of the license.\n\n\n\n\n\n\nSource: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\">https:\/\/www.limswiki.org\/index.php\/Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data<\/a>\n\t\t\t\t\tCategories: LIMSwiki journal articles (added in 2019)LIMSwiki journal articles (all)LIMSwiki journal articles (with rendered math)LIMSwiki journal articles on data analysisLIMSwiki journal articles on data visualization\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\n\t\t\t\n\t\t\n\t\t\n\t\t\tNavigation menu\n\t\t\t\t\t\n\t\t\tViews\n\n\t\t\t\n\t\t\t\t\n\t\t\t\tJournal\n\t\t\t\tDiscussion\n\t\t\t\tView source\n\t\t\t\tHistory\n\t\t\t\t\n\t\t\t\t\t\t\t\n\t\t\n\t\t\t\t\n\t\t\t\tPersonal 
This page was last modified on 6 March 2019, at 04:32. Content is available under a Creative Commons Attribution-ShareAlike 4.0 International License unless otherwise noted.
action-view\">\n<div id=\"rdp-ebb-globalWrapper\">\n\t\t<div id=\"rdp-ebb-column-content\">\n\t\t\t<div id=\"rdp-ebb-content\" class=\"mw-body\" role=\"main\">\n\t\t\t\t<a id=\"rdp-ebb-top\"><\/a>\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t<h1 id=\"rdp-ebb-firstHeading\" class=\"firstHeading\" lang=\"en\">Journal:Semantics for an integrative and immersive pipeline combining visualization and analysis of molecular data<\/h1>\n\t\t\t\t\n\t\t\t\t<div id=\"rdp-ebb-bodyContent\" class=\"mw-body-content\">\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\n\t\t\t\t\t<!-- start content -->\n\t\t\t\t\t<div id=\"rdp-ebb-mw-content-text\" lang=\"en\" dir=\"ltr\" class=\"mw-content-ltr\">\n\n\n<h2><span class=\"mw-headline\" id=\"Abstract\">Abstract<\/span><\/h2>\n<p>The advances made in recent years in the field of structural biology significantly increased the throughput and complexity of data that scientists have to deal with. Combining and <a href=\"https:\/\/www.limswiki.org\/index.php\/Data_analysis\" title=\"Data analysis\" class=\"wiki-link\" data-key=\"545c95e40ca67c9e63cd0a16042a5bd1\">analyzing<\/a> such heterogeneous amounts of data became a crucial time consumer in the daily tasks of scientists. However, only few efforts have been made to offer scientists an alternative to the standard compartmentalized tools they use to explore their data and that involve a regular back and forth between them. We propose here an integrated pipeline especially designed for immersive environments, promoting direct interactions on semantically linked 2D and 3D heterogeneous data, displayed in a common working space. 
The creation of a semantic definition describing the content and the context of a molecular scene leads to the creation of an intelligent system where data are (1) combined through pre-existing or inferred links present in our hierarchical definition of the concepts, (2) enriched with suitable and adaptive analyses proposed to the user with respect to the current task, and (3) interactively presented in a unique working environment to be explored.

Keywords: virtual reality, semantics for interaction, structural biology

Introduction

Recent years have seen a profound change in the way structural biologists interact with their data. New techniques that try to capture the structure and dynamics of biomolecules have reached an extraordinarily high throughput of structural data.[1][2] Scientists must try to combine and analyze data flows from different sources to draw their hypotheses and conclusions. However, despite this increasing complexity, they tend to rely mainly on compartmentalized tools to only visualize or analyze limited portions of their data. This situation leads to a constant back and forth between the different tools and their associated environments. Consequently, a significant amount of time is dedicated to the transformation of data to account for the heterogeneous input data types each tool allows.

The need for platforms capable of handling this intricate data flow is therefore strong.
In structural biology, the numerical simulation process is now able to deal with very large and heterogeneous molecular structures. These molecular assemblies may be composed of several million particles and consist of many different types of molecules, including a biologically realistic environment. This overall complexity raises the need to go beyond common visualization solutions and move towards integrated exploration systems where visualization and analysis can be merged.

Immersive environments play an important role in this context, providing both a better comprehension of the three-dimensional structure of molecules and new interaction techniques to reduce the number of data manipulations executed by the experts (see Figure 1). A few studies have taken advantage of recent developments in virtual reality to enhance some structural biology tasks. Visualization is the first and most obvious task that was improved through new adaptive stereoscopic screens and immersive environments, plunging experts into the very center of their molecules.[3][4][5][6][7] Structure manipulations during specific docking experiments have been improved thanks to the use of haptic devices and audio feedback to drive a simulation.[8] However, if 3D
objects can rather easily be represented and manipulated in such environments, the integration of analytical values (energies, distance to a reference, etc.), which are 2D by nature, leads to a certain complexity and is not a solved problem yet. As a consequence, no specific development has been made to set up an immersive platform where the expert could manipulate data coming from different sources to accelerate and improve the development of new hypotheses.

Figure 1. Immersive, augmented reality, and screen wall environments used for molecular visualization: (A) EVE platform, a multi-user CAVE system composed of 4 screens (LIMSI-CNRS/VENISE team, Orsay), (B) Microsoft HoloLens, and (C) screen wall of 8.3 m² composed of 12 screens at full HD resolution with 120 Hz refresh rate in stereoscopy (IBPC-CNRS/LBT, Paris).

This lack of development can also be partly explained by the significant differences between the data handled by the 3D visualization software packages and the analytical tools.
On one side, 3D visualization solutions such as PyMol[9], VMD[10], and UnityMol[11] explore and manipulate the 3D structure coordinates composing the molecular complex that will be displayed. The scene seen by the user is composed of 3D objects reporting the overall shape of a particular molecule and its environment at a particular state. This scene is static if we are interested in only one state of a given molecule, but is often dynamic when a whole simulated trajectory of conformational changes over time is considered. Analysis tools, on the other side, handle raw numbers, vectors, and matrices in various formats and dimensions, from various input sources depending on the analysis pipeline used to generate them. Their outputs are graphical representations of trends or comparisons between parameters or properties in 1 to N dimensions, formatted in a way that experts can quickly understand and use to guide their hypotheses.

Some of the aforementioned software packages do provide tools to gather analyses as static plots alongside the 3D visualization space. Interactivity is limited, and flexibility mainly depends on the user's ability to create and tune scripts to improve the information displayed.
We believe that a major improvement of the tools available today would bring into play a scenario where the 3D visualization of a molecular event is coupled with the monitoring of the evolution of analytical properties, e.g., sub-element properties such as distance variations and the progression of simulation parameters, in a single working environment. The expert would be able to see any action performed in one space (either 3D visualization or analysis) have a coherent graphical impact on the second space, filtering or highlighting the parameter or sub-ensemble of objects targeted by the expert.

We have developed a pipeline that aims to bring within the same immersive environment the visualization and analysis of heterogeneous data coming from molecular simulations. This pipeline addresses the lack of integrated tools efficiently combining the stereoscopic visualization of 3D objects with the representation of, and interaction with, their associated physicochemical and geometric properties (both 2D and 3D) generated by standard analysis tools, which are either combined with the 3D objects (shape, colour, etc.) or displayed on a dedicated space integrated in the working environment (second mobile screen, 2D integration in the virtual scene, etc.).

In this pipeline, we systematically combine structural and analytical data by using a semantic definition of the content (scientific data) and the context (immersive environments and interfaces). Such a high-level definition can be translated into an ontology, from which instances or individuals of ontological concepts can then be created from real data to build a database of linked data for a defined phenomenon.
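How individuals of ontological concepts might be instantiated as linked data can be sketched with subject-predicate-object triples. The concept and relation names below are illustrative, not the authors' ontology.

```python
# A tiny in-memory triple store: individuals ("model_1", "res_42") are linked
# to concepts ("Model", "Residue") and to each other through named relations,
# mirroring how an ontology-backed database of linked data is populated.

triples = set()

def add(subject, predicate, obj):
    triples.add((subject, predicate, obj))

# Instantiate individuals for one model and two of its residues.
add("model_1", "is_a", "Model")
add("res_42", "is_a", "Residue")
add("res_43", "is_a", "Residue")
add("res_42", "part_of", "model_1")
add("res_43", "part_of", "model_1")
add("res_42", "has_property", "solvent_exposed_area")

def query(predicate, obj):
    """All subjects linked to obj through predicate, in sorted order."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)

print(query("part_of", "model_1"))  # residues belonging to model_1
```

In the actual pipeline, this role is played by RDF triples queried through SPARQL, which additionally supports inference over the concept hierarchy.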
On top of the data collection, an extensive list of possible interactions and actions defined in the ontology and based on the provided data can be computed and presented to the user.

The creation of a semantic definition describing the content and the context of a molecular scene in immersion leads to the creation of an intelligent system where data and 3D molecular representations are (1) combined through pre-existing or inferred links present in our hierarchical definition of the concepts, (2) enriched with suitable and adaptive analyses proposed to the user with respect to the current task, and (3) manipulated by direct interaction, allowing the user to perform 3D visualization and exploration as well as analysis in a unique immersive environment.

Our method narrows the need for complex interactions by considering what actions the user can perform with the data they are currently manipulating and the means of interaction their immersive environment provides.

We will highlight our developments and the first outcomes of our work through three main sections: the first section attempts to provide a complete background on the usage of semantics in the fields of VR/AR systems and structural biology. In the second section, we will describe and justify our implementation choices and how we linked the different technologies highlighted in the previous section.
Finally, in a third section, we will show several applications of our platform and its capabilities to address the issues raised previously.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Related_works\">Related works<\/span><\/h2>\n<p>We present here the state of the art in the two fields related to this paper: the semantic formalism chosen to represent the data and how semantic representations are applied in <a href=\"https:\/\/www.limswiki.org\/index.php\/Bioinformatics\" title=\"Bioinformatics\" class=\"wiki-link\" data-key=\"8f506695fdbb26e3f314da308f8c053b\">bioinformatics<\/a>.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Semantic_modeling_formalism_and_semantic_web\">Semantic modeling formalism and semantic web<\/span><\/h3>\n<p>From classical logic to description logic, from which was derived the \"conceptual graph\" representation introduced by Sowa<sup id=\"rdp-ebb-cite_ref-SowaConcept84_12-0\" class=\"reference\"><a href=\"#cite_note-SowaConcept84-12\">[12]<\/a><\/sup>, many semantic formalisms have been used to embed knowledge into applications in order to query and reason about them.\n<\/p><p>The conceptual graph formalism represents concepts and properties as connected graphs and allows complex operations on them. However, it quickly reaches limitations in terms of performance and implementation flexibility. Classical logic is another well-known formalism, but it is not broadly used in biology and suffers from a lack of implementation tools and libraries. A semantic network limits itself to the representation of concepts and their relations through directed or undirected graphs. It lacks the ability to reason over the concepts and their links, a capability that our intended platform needs. 
The different requirements of our platform, coupled with our aim to make it as generic as possible, led us to choose description logics as the formalism for knowledge representation, and more precisely the semantic web as the underlying standard for the creation of our ontology and the associated knowledge base.\n<\/p><p>The semantic web was created by the World Wide Web Consortium under the lead of Tim Berners-Lee, with the aim of sharing semantic data on the web.<sup id=\"rdp-ebb-cite_ref-Berners-LeeTheSemantic01_13-0\" class=\"reference\"><a href=\"#cite_note-Berners-LeeTheSemantic01-13\">[13]<\/a><\/sup> It is broadly used by the biggest web companies to uniformly store and share data. It belongs to the family of description logics, which use the notions of concepts, roles, and individuals. Concepts are represented by sub-ensembles of elements in a specific universe, roles are the links between the elements, and individuals are the elements of the universe. Each layer of the semantic web (ontology, experimental data, querying process, etc.) 
has been associated to a language or a format.\n<\/p><p>The following four standards create the core of the semantic web and act as the layers evoked previously: the Resource Description Framework (RDF)<sup id=\"rdp-ebb-cite_ref-CyganiakRDF14_14-0\" class=\"reference\"><a href=\"#cite_note-CyganiakRDF14-14\">[14]<\/a><\/sup>, the Resource Description Framework Schema (RDFS)<sup id=\"rdp-ebb-cite_ref-BrickleyRDF14_15-0\" class=\"reference\"><a href=\"#cite_note-BrickleyRDF14-15\">[15]<\/a><\/sup>, the Web Ontology Language (OWL)<sup id=\"rdp-ebb-cite_ref-MotikOWL12_16-0\" class=\"reference\"><a href=\"#cite_note-MotikOWL12-16\">[16]<\/a><\/sup>, and SPARQL.<sup id=\"rdp-ebb-cite_ref-HarrisSPARQL13_17-0\" class=\"reference\"><a href=\"#cite_note-HarrisSPARQL13-17\">[17]<\/a><\/sup> Whereas the first three standards enable semantic descriptions of data in the form of ontologies and knowledge bases, the last standard enables queries to ontologies and knowledge bases (see Figure 2).\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig2_Trellet_JOfIntegBioinfo2018_15-2.jpg\" class=\"image wiki-link\" data-key=\"ce960339b6b97d59e0713622567ccdef\"><img alt=\"Fig2 Trellet JOfIntegBioinfo2018 15-2.jpg\" src=\"https:\/\/www.limswiki.org\/images\/d\/df\/Fig2_Trellet_JOfIntegBioinfo2018_15-2.jpg\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 2.<\/b> Web semantics and its different layers. This figure describes the main format classically used for each layer: RDF, RDFS, OWL, SPARQL, etc. 
Source : <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.w3.org\/2001\/sw\/\" data-key=\"ed75c99c94673dc7016e09d26f2cd78b\">http:\/\/www.w3.org\/2001\/sw\/<\/a><\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>RDF is a data model, which allows the creation of statements to describe resources. Each statement is a triple comprised of: a subject (resource described by the statement), a predicate (property of the subject), and an object (literal value or resource identified by a URI, which describes the subject). An example of a triple is: <code><#Molecule, #has-charge, -1><\/code>\n<\/p><p>RDFS and OWL are semantic web standards that extend the expressiveness of RDF by providing additional concepts. RDFS provides hierarchies of classes and properties as well as property domains and ranges. OWL, built upon RDF and RDFS, provides symmetry, transitivity, equivalence, and restrictions of properties as well as operations on sets of resources. In turn, SPARQL is a query language for ontologies and knowledge bases built using RDF, RDFS, and OWL. Conceptually, in terms of possible operations on data, SPARQL is similar to SQL, as it enables data selection, insertion, update, and removal.\n<\/p><p>In the semantic web, two types of statements are distinguished. Terminological statements (T-Box) specify conceptualization, classes and properties of resources<sup id=\"rdp-ebb-cite_ref-GiacomoTBox96_18-0\" class=\"reference\"><a href=\"#cite_note-GiacomoTBox96-18\">[18]<\/a><\/sup>, without describing any particular resources. Assertion statements (A-Box) specify utilization, particular resources (also called individuals or objects), which are instances of classes described by properties with particular values assigned. 
For example, a T-Box specifies different classes of molecules (different chemical compounds) and properties that can be used to describe them (e.g., charge and the number of neutrons), while an A-Box specifies particular molecules (instances of the classes) with given charges. In this paper, an ontology is a T-Box, while a knowledge base is the union of a T-Box and an A-Box. Ontologies and knowledge bases constitute the foundation of the semantic web across diverse domains and applications. In particular, ontologies can specify schemes of molecular descriptions, while knowledge bases\u2014particular descriptions (instances of such schemes) with individual objects\u2014are used for analysis and visualization. Due to the use of the standards encoded in <a href=\"https:\/\/www.limswiki.org\/index.php\/XML\" title=\"XML\" class=\"mw-redirect wiki-link\" data-key=\"fda82e3b4db7e4b2856b016933a1d2d1\">XML<\/a> or equivalent formats, ontologies and knowledge bases are interpretable by software as well as intelligible to users. 
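The T-Box/A-Box distinction can be illustrated with a toy example. The following Python sketch is our own illustration (the individual names such as `my:water-42` are hypothetical, not from the paper's ontology): both kinds of statements are modeled as sets of subject-predicate-object triples, and the knowledge base is simply their union.

```python
# Sketch only: T-Box and A-Box statements modeled as RDF-style triples.
# Names prefixed "my:" are hypothetical illustrations.

# T-Box: terminological statements -- classes and properties, no individuals.
t_box = {
    ("my:Molecule", "rdf:type", "owl:Class"),
    ("my:has-charge", "rdf:type", "owl:DatatypeProperty"),
    ("my:has-neutron-count", "rdf:type", "owl:DatatypeProperty"),
}

# A-Box: assertion statements -- particular individuals with property values.
a_box = {
    ("my:water-42", "rdf:type", "my:Molecule"),
    ("my:water-42", "my:has-charge", "0"),
    ("my:chloride-7", "rdf:type", "my:Molecule"),
    ("my:chloride-7", "my:has-charge", "-1"),
}

# In the paper's terminology: the ontology is the T-Box, and the
# knowledge base is the union of a T-Box and an A-Box.
knowledge_base = t_box | a_box

def charges(kb):
    """Select all (individual, charge) pairs, SQL/SPARQL-style."""
    return {(s, o) for (s, p, o) in kb if p == "my:has-charge"}

print(sorted(charges(knowledge_base)))
```

A real system would store such triples in an RDF store and select them with SPARQL rather than a Python set comprehension; the point here is only the separation between schema-level and instance-level statements.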
Moreover, since RDFS and OWL are built upon description logics, which are formal knowledge representation techniques, ontologies and knowledge bases can be subject to reasoning, which is a process of inferring implicit (tacit) properties of resources (those not explicitly specified by the author) on the basis of their explicitly specified properties.\n<\/p><p>For instance, from the following triples explicitly specified by the content author:\n<\/p><p><code><my:is-composed-of> <my:is-a> <owl:TransitiveProperty><\/code>\n<\/p><p><code><my:Protein> <my:is-composed-of> <my:Amino-acid><\/code>\n<\/p><p><code><my:Amino-acid> <my:is-composed-of> <my:Atom><\/code>\n<\/p><p>the following statement can be inferred by software:\n<\/p><p><code><my:Protein> <my:is-composed-of> <my:Atom><\/code>\n<\/p><p>Here, thanks to the definition of the property \u201cis-composed-of\u201d as transitive, we can infer that the atoms that compose amino acids also compose a protein, since amino acids compose proteins. The inferred statement does not need to be added to the ontology, since it is derived automatically. This significantly reduces the number of statements to store in the database and potentially allows for more complex inferences.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Ontologies_in_bioinformatics\">Ontologies in bioinformatics<\/span><\/h3>\n<p>On the application side, the use of ontologies to standardize knowledge in scientific fields grew substantially and spontaneously at the end of the 1990s.<sup id=\"rdp-ebb-cite_ref-Schulze-KremerOnto02_19-0\" class=\"reference\"><a href=\"#cite_note-Schulze-KremerOnto02-19\">[19]<\/a><\/sup> Bioinformatics, tightly anchored in structural biology, has used ontologies for a long time. 
The most significant example is the fast-growing <a href=\"https:\/\/www.limswiki.org\/index.php\/Genomics\" title=\"Genomics\" class=\"wiki-link\" data-key=\"96a82dabf51cf9510dd00c5a03396c44\">genomic field<\/a>, in which it became impossible to handle data flow without a proper and standardized organization of the data.<sup id=\"rdp-ebb-cite_ref-SchuurmanOnto08_20-0\" class=\"reference\"><a href=\"#cite_note-SchuurmanOnto08-20\">[20]<\/a><\/sup> The tool Gene Ontology<sup id=\"rdp-ebb-cite_ref-GOCGene00_21-0\" class=\"reference\"><a href=\"#cite_note-GOCGene00-21\">[21]<\/a><\/sup> regroups genomic data into a uniform format and a knowledge base. Currently, it is one of the most referred to ontologies in the literature. Rabattu <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-RabattuMyCorporis15_22-0\" class=\"reference\"><a href=\"#cite_note-RabattuMyCorporis15-22\">[22]<\/a><\/sup> propose an approach to spatio-temporal reasoning on semantic descriptions of an evolving human embryo. Several biological databases or organizations such as <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.uniprot.org\/uniprot\/\" data-key=\"290b6a8a2b000bb4c8d888178a78e57f\">UniProtKB<\/a> and the Open Biomedical Ontologies<sup id=\"rdp-ebb-cite_ref-SmithTheOBO07_23-0\" class=\"reference\"><a href=\"#cite_note-SmithTheOBO07-23\">[23]<\/a><\/sup> provide ways to access data or ontologies under RDF or OWL format to allow their use in expert tools or specific pipelines. One can also note the open-source project Bio2RDF<sup id=\"rdp-ebb-cite_ref-BelleauBio2RDF_24-0\" class=\"reference\"><a href=\"#cite_note-BelleauBio2RDF-24\">[24]<\/a><\/sup> that aims to build and provide the largest network of \"Linked Data for the Life Sciences\" using semantic web approaches.\n<\/p><p>Only a few expert software packages based on ontologies have been developed for structural biology. 
Avogadro<sup id=\"rdp-ebb-cite_ref-HanwellAvo12_25-0\" class=\"reference\"><a href=\"#cite_note-HanwellAvo12-25\">[25]<\/a><\/sup> and DIVE<sup id=\"rdp-ebb-cite_ref-RysavyDIVE14_26-0\" class=\"reference\"><a href=\"#cite_note-RysavyDIVE14-26\">[26]<\/a><\/sup> appear as exceptions, implementing, in different ways, a semantic description of data that can be manipulated in these environments. Avogadro uses the Chemical Markup Language (CML)<sup id=\"rdp-ebb-cite_ref-RzepaCML12_27-0\" class=\"reference\"><a href=\"#cite_note-RzepaCML12-27\">[27]<\/a><\/sup> as the format for describing data semantics, and it adds a semantic description layer on top of the data being described. However, the tool leverages neither ontologies nor other knowledge representation formalisms; thus, it does not permit reasoning on the described data.\n<\/p><p>DIVE partially creates ontologies and datasets derived from the input data upon loading. Pre-formatted input in a row\/column representation is converted into an SQL-like structure where rows are individuals and columns are properties. This data representation conforms to a common data model that the software libraries use. Therefore, the creation of links between data values and concepts is possible, and different DIVE components for data presentation (analyses, 3D visualization, etc.), as well as links and relationships between dataset elements, can be queried. In addition, DIVE includes a powerful and generic ontology creator that depends directly on the type of the input data. However, reasoning on ontologies in DIVE is limited to inheritance between classes. Consequently, only a few ontological relationships are available: is-a, contains, is-part-of, and bound-by. There is no notion of cardinality or logical operators to define the concept classes. 
Thus, it is not possible, for instance, to force the presence of a property, or to impose that only a fixed number of values be associated with a specific property (e.g., a molecule must have at least one atom, an Alanine side-chain has a minimum of three atoms and a maximum of four atoms, etc.). These limitations render the DIVE environment insufficient to solve the problem stated in this paper.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Using_a_semantic_representation_to_efficiently_store.2C_query.2C_and_link_heterogeneous_structural_biology_data\">Using a semantic representation to efficiently store, query, and link heterogeneous structural biology data<\/span><\/h2>\n<p>Several important choices were made to integrate the different technologies required for a platform that allows proper 3D immersion of users together with an accurate and intelligent way to interact with their data. Our platform heavily relies on the ontology\/knowledge base couple. The way the data present in the databases is represented and accessed is of crucial importance, and this point led us to consider which formalism is most appropriate for the data representation.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Knowledge_formalism_choice\">Knowledge formalism choice<\/span><\/h3>\n<p>The formalism of knowledge representation used in our approach must satisfy the following three requirements to properly fit our platform's needs:\n<\/p>\n<ol><li> Hierarchical data representation via concepts and properties<\/li>\n<li> Advanced reasoning capabilities in order to extend the ontology or the dataset ruled by the ontology<\/li>\n<li> Efficient query time on the data to stay within interaction time<\/li><\/ol>\n<p>We mentioned previously that several formalisms exist to create ontologies and define databases. 
A quick comparison of these formalisms, complementary to their introduction in the previous section, can be found in Table 1.\n<\/p>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table class=\"wikitable\" border=\"1\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\" colspan=\"6\"><b>Table 1.<\/b> Comparison of different knowledge representation formalisms with respect to key criteria\n<\/td><\/tr>\n<tr>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Formalism\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Domain description\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Reasoning on knowledge\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Big data management\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Efficient\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Implementation flexibility\n<\/th><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Conceptual graphs\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">X\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">X\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">-\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">X\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">-\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Semantic networks\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">X\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">-\n<\/td>\n<td 
style=\"background-color:white; padding-left:10px; padding-right:10px;\">X\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">X\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">-\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Classical logics\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">X\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">X\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">X\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">X\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">-\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Description logics\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">X\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">X\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">X\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">X\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">-\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>Our first implementation of a semantic representation of knowledge in molecular biology was applied through conceptual graphs (CG) within Cogitant\u2019s software.<sup id=\"rdp-ebb-cite_ref-HuangTheSPHINX93_28-0\" class=\"reference\"><a href=\"#cite_note-HuangTheSPHINX93-28\">[28]<\/a><\/sup> The use of CGs through the Cogitant API quickly proved to be incompatible with the constraints of the interactive context. 
This limitation had already been highlighted by the work of Yannick Dennemont<sup id=\"rdp-ebb-cite_ref-GenestAPlat98_29-0\" class=\"reference\"><a href=\"#cite_note-GenestAPlat98-29\">[29]<\/a><\/sup> with the Prolog CG API, limitations confirmed by our own experience with the Cogitant library in C++. The need for high performance imposed by the interactive context has led us to the path of description logic and semantic web for the representation of knowledge and the efficient extraction of information within a massive fact base to support Visual Analytics functionalities in molecular biology.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Ontology_for_modeling_of_structural_biology_concepts\">Ontology for modeling of structural biology concepts<\/span><\/h3>\n<p>An OWL-based ontology was implemented as the core of the platform, thereby creating a broad description of concepts an expert has to interact with during his\/her visualization and analysis activities. We previously mentioned that several bio-ontologies already exist. We extended one of them, a bio-ontology <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/bioportal.bioontology.org\/ontologies\/AMINO-ACID\" data-key=\"42bf682d4ac71763f75730831215b72f\">describing amino acids<\/a> and their biophysical and geometrical properties to define the molecular objects and principles manipulated in structural biology. Each component structuring molecular complexes and each associated property coming from various common bio-informatics tools have been systematically defined and added to this ontology. However, since needs may vary, we have designed this ontology such that it could easily be updated and enriched with new concepts. A tiny subpart of our ontology is illustrated in Figure 3. 
Our ontology has been designed around five categories, addressing five different parts of our platform:\n<\/p>\n<ul><li> Biomolecular knowledge \u2013 Field-related concepts and objects in structural biology<\/li>\n<li> 3D structure representation \u2013 Concepts related to the representation and visualization of 3D molecular complexes<\/li>\n<li> 2D data representation \u2013 Concepts related to the representation of numerical analyses and their results<\/li>\n<li> 3D interactions \u2013 Concepts related to the interactions in 3D environments<\/li>\n<li> 2D interactions \u2013 Concepts related to the interactions in 2D environments<\/li><\/ul>\n<p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig3_Trellet_JOfIntegBioinfo2018_15-2.jpg\" class=\"image wiki-link\" data-key=\"7ee91955376e5744292ee5e5038b0f3d\"><img alt=\"Fig3 Trellet JOfIntegBioinfo2018 15-2.jpg\" src=\"https:\/\/www.limswiki.org\/images\/a\/af\/Fig3_Trellet_JOfIntegBioinfo2018_15-2.jpg\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 3.<\/b> A part of our structural biology ontology used in our application<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>The separation of the categories does not induce the absence of relationships between them. For instance, the \u201cAtom\u201d concept belongs to the \"Biomolecular\" knowledge category but is directly linked to the \u201cSphere\u201d concept from 3D structure representation. 
The whole network of connections will then permit reasoning on the ontology in order to support the advanced interactivity level required in our platform.\n<\/p><p>Concepts and properties in the 3D structure representation and 2D data representation categories gather the graphical elements that allow for the representation of the \"Biomolecular\" knowledge category. Shapes, colors, and also graph types are notions defined in these two categories. It is worth noting that analytical concepts are defined by graphical or abstract elements that play a role in the creation and visualization of an analytical result. However, we deliberately chose not to define the different calculations and analyses related to molecular simulation data because of their high complexity and their heterogeneous nature, which varies significantly across the range of available specialized tools. This choice does not imply that the results of such analyses will not be used within the platform, merely that it is not relevant to include their definition in the ontology. The values of their results are nevertheless defined in the ontology in the form of properties of the individuals they bring into play.\n<\/p><p>In addition to the biomolecular concepts and representations previously cited, we also defined every concept around the interaction between the user and the data they will directly or indirectly manipulate. 
These interactions include commands proposed by most of the common visualization software packages and analysis tools.\n<\/p><p>Our full ontology is <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/raw.githubusercontent.com\/mtrellet\/PyMol_Interactive_Plotting\/master\/data\/visual_analytics_owl.ttl\" data-key=\"f60e25c2d0c8907c2acefc37c5d3a710\">publicly available online<\/a>.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Storing_molecular_data_linked_by_a_structural_biology_ontology\">Storing molecular data linked by a structural biology ontology<\/span><\/h3>\n<p>Once we set up our ontology, it was possible to feed the database by adding biological information gathered by the expert. The new information has to fit the vocabulary and classification defined by the rules present in the ontology in order to be adequately stored in the database. This combination of ontology and knowledge base will form the RDF database (as illustrated later in this article in Figure 6).\n<\/p><p>The description of a molecular system is constructed from the analysis of any biological information that can be described by a character chain or a value and that corresponds to a concept or property identified in the ontology. Each information will be exhaustively gathered in the RDF database as triples. Within the scope of our study, we focused on numerical molecular simulations. These simulations output time series of static snapshots of the molecular system at a regular time step. The Hamiltonian of the simulated model will drive the system towards specific states that experts try to decipher in order to understand underlying molecular mechanisms. The whole simulation creates a trajectory where each state, at a precise time, is associated to a snapshot. Our ontology defines a snapshot by the \"model\" concept. A model gathers all the atom coordinates of the molecular system at a defined time step. 
In order to distinguish the different components of a system, these components are identified by \"chain,\" another concept of our ontology. Each chain in the system is composed of a sequence of \"residues\" (also known as amino acids in proteins). The different inference rules present in the ontology save us from explicitly specifying all the links between the different hierarchical components of a specific model. As a result, a residue that belongs to a specific chain will be automatically associated with the corresponding model where the chain appears. Similarly, groups of atoms, the smallest entities of a molecular structure at our scale, constitute residues and are thus directly linked to chains and models.\n<\/p><p>Every geometrical property (position, angle, distance, etc.), physicochemical property (solvent accessibility, partial charge, bond, etc.), or analytical property (interaction energy, RMSD, temperature, etc.) is then integrated in the database and associated with individuals created from 3D structures (Model\/Chain\/Residue\/Atom) for each step of the simulation. As a reminder, any individual is an instance of concepts defined in the ontology. Individuals and their properties form the population of the molecular database.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Using_semantic_queries_to_support_direct_interactions_for_a_new_generation_of_molecular_visualization_applications\">Using semantic queries to support direct interactions for a new generation of molecular visualization applications<\/span><\/h2>\n<p>Once all data has been integrated in the RDF database, it is necessary to set up an interrogation system able to retrieve the data for visualization and processing following interaction events in the working space. 
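The hierarchical linking described above can be sketched in a few lines of plain Python (our own illustration, not the actual RDF store; the entity names are hypothetical): only the direct Atom-to-Residue, Residue-to-Chain, and Chain-to-Model links are asserted, and membership in every enclosing level is inferred transitively, mirroring how the ontology's inference rules spare us from storing every link explicitly.

```python
# Illustrative sketch only (plain Python, not an RDF triple store):
# direct containment links are asserted once, and the enclosing
# chain and model are inferred via transitive "part-of" traversal.

direct_part_of = {            # explicitly asserted containment links
    "atom:CA_1": "res:ALA_1",
    "res:ALA_1": "chain:A",
    "chain:A": "model:0",
}

def all_containers(entity):
    """Infer every enclosing component via transitive 'part-of'."""
    containers = []
    while entity in direct_part_of:
        entity = direct_part_of[entity]
        containers.append(entity)
    return containers

# The atom is linked to its residue only, yet chain and model membership
# can be retrieved without ever having been stored explicitly.
print(all_containers("atom:CA_1"))  # ['res:ALA_1', 'chain:A', 'model:0']
```

In the actual platform this inference is performed by the OWL reasoner over a transitive property, not by hand-written traversal; the sketch only shows why a residue never needs an explicit link to its model.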
Our implementation of the query system mainly relies on the usage of SPARQL, as introduced before, and provides several ways to address the different needs of our platform.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"From_vocal_keywords_to_application_command\">From vocal keywords to application command<\/span><\/h3>\n<p>The richness and flexibility of SPARQL queries allowed us to design a keyword-to-command interpretation engine that transforms a list of keywords into a complete application command triggering an action in the working space.\n<\/p><p>One of the most widely used interaction techniques in immersive environments is the vocal command. Based on a vocal recognition process, it consists of translating a sentence or a group of words spoken by the user into an application command. Vocal commands have the strong advantage that they can be associated with gestures to express complex multimodal commands.\n<\/p><p>Most of the actions identified in our platform involve a structural group designated by the expert. These structural groups can be characterized by identifiers having a biological meaning (for example, residue IDs are, by convention, numbered from one extremity of the chain to the other), by unique identifiers in the RDF database, or via their properties. The interpretation of commands vocalized by the expert in natural language, using specific field-related vocabulary, requires a representation that carries the complexity of the knowledge and links the objects targeted by the user to the virtual objects involved in the interaction.\n<\/p><p>For this purpose, we set up a process that takes as input a vocal command of the user and translates it into an application command for the operating system. 
This procedure can be divided into three main parts:\n<\/p>\n<ol><li> Recognition of keywords from a vocal command<\/li>\n<li> Keyword classification into a decomposed command structure<\/li>\n<li> Creation of the final and operational command<\/li><\/ol>\n<p>Our conceptualization effort and the use of the ontology mainly focused on the second part. Parts one and three are more implementation-oriented and will not be described in depth.\n<\/p>\n<h4><span class=\"mw-headline\" id=\"Keyword_recognition\">Keyword recognition<\/span><\/h4>\n<p>We are using the keyword-spotting capability of <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/cmusphinx.github.io\/\" data-key=\"91b0dad9741cfb112a433ac793932894\">Sphinx<\/a><sup id=\"rdp-ebb-cite_ref-DennemontUneAss13_30-0\" class=\"reference\"><a href=\"#cite_note-DennemontUneAss13-30\">[30]<\/a><\/sup>, a vocal recognition toolkit, to recognize keywords. Based on a dictionary created from the ontology's list of concepts, it detects any word spoken by the user that matches a word present in the dictionary.\n<\/p>\n<h4><span class=\"mw-headline\" id=\"Keyword_classification\">Keyword classification<\/span><\/h4>\n<p>Each keyword recognized in the previous step is assigned to a category. This classification is based on the division of our ontology, which identifies five categories of words that can be found in a vocal command, semantically modeled as:\n<\/p>\n<ul><li> Action<\/li>\n<li> Component<\/li>\n<li> Identifier<\/li>\n<li> Property<\/li>\n<li> Representation<\/li><\/ul>\n<p>This classification is achieved through successive SPARQL queries to the ontology. The Action, Component, Property, and Representation categories have their own concepts and can be identified by a unique word (\u201cHide,\u201d \u201cChain,\u201d \u201cCharged,\u201d \u201cSphere,\u201d etc.). In contrast, the Identifier category is linked to a concept instance from the Component category. 
A biological identifier is very likely to be redundant because of the repetition of the molecular system at each time step. Therefore, it is mandatory to pair an identifier with a component among the keywords in order to validate its presence. Without a component, any identifier is withdrawn from the list. If the identifier and the associated component exist in the database, the couple is validated.\n<\/p><p>SPARQL commands use the ASK operator to determine whether a keyword belongs to a category. This operator takes one or several triples and returns a boolean reflecting whether the ensemble of triples holds true with respect to the database. Some examples of queries can be found below:\n<\/p><p><code>ASK {my:Hide rdfs:subClassOf my:Action}<\/code>\n<\/p><p><code>ASK {my:Alanine rdfs:subClassOf my:Biological_component}<\/code>\n<\/p><p><code>ASK {my:Cartoon rdfs:subClassOf my:Representation}<\/code>\n<\/p><p><code>ASK {my:Aliphatic rdfs:subClassOf my:Property}<\/code>\n<\/p><p>Reasoning and inference rules are automatically used in SPARQL queries. 
For instance, the following query:\n<\/p><p><code>ASK {my:Alanine rdfs:subClassOf my:Biological_component}<\/code>\n<\/p><p>will return true despite the absence of an explicit direct link between the two concepts (Alanine and Biological_component), since AminoAcid, Residue, and Molecule lie between them in the class hierarchy (see Figure 4).\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig4_Trellet_JOfIntegBioinfo2018_15-2.jpg\" class=\"image wiki-link\" data-key=\"ebaf4352eda561d02100fe0852b2589f\"><img alt=\"Fig4 Trellet JOfIntegBioinfo2018 15-2.jpg\" src=\"https:\/\/www.limswiki.org\/images\/b\/bb\/Fig4_Trellet_JOfIntegBioinfo2018_15-2.jpg\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 4.<\/b> Extract from our OWL ontology for the Alanine concept<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<h4><span class=\"mw-headline\" id=\"Command_creation\">Command creation<\/span><\/h4>\n<p>Once each keyword is validated and associated with a category, i.e., identified as a concept of the database (or as an individual for identifiers), and possibly grouped with another keyword, it forms a syntactic group. Each syntactic group carries a piece of information corresponding to a specific part of the application command.\n<\/p><p>In our platform, a vocal command is composed of a succession of syntactic groups linked together to form an action query to the immersive platform. 
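\n<\/p><p>The subclass reasoning illustrated above can be sketched in a few lines of Python. The toy hierarchy below mirrors the chain shown in Figure 4 but is an illustrative assumption, not the actual ontology:\n<\/p>

```python
# Minimal sketch of an rdfs:subClassOf ASK query with transitive inference.
# The toy hierarchy (Alanine -> AminoAcid -> Residue -> Molecule ->
# Biological_component) mirrors Figure 4; it is illustrative only.

SUBCLASS_OF = {
    "Alanine": "AminoAcid",
    "AminoAcid": "Residue",
    "Residue": "Molecule",
    "Molecule": "Biological_component",
    "Hide": "Action",
    "Cartoon": "Representation",
}

def ask_subclass(concept: str, category: str) -> bool:
    """Emulate: ASK {my:<concept> rdfs:subClassOf my:<category>},
    following subClassOf links transitively, as a reasoner would."""
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
        if concept == category:
            return True
    return False

# True despite no direct link between Alanine and Biological_component:
print(ask_subclass("Alanine", "Biological_component"))  # True
print(ask_subclass("Cartoon", "Action"))                # False
```

<p>In the real system this closure is provided by the SPARQL engine\u2019s reasoner rather than by explicit traversal. 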
The type of command we defined can be described in the following manner:\n<\/p><p><tt>action\u2003[parameter]<sup>+<\/sup>,\u2003(\u2003structural_group\u2003[identifier]<sup>+<\/sup>\u2003)<sup>+<\/sup><\/tt>\n<\/p><p>Syntactic groups between [] are optional, whereas the others are mandatory. The + indicates the possibility of zero, one, or several occurrences of the syntactic group. Finally, () indicates a block of syntactic groups. This command architecture is encoded in our ontology in the form of prerequisite concepts associated with the action concepts. For instance, the action concept \"Color\" requires a property of \"Colors\" type and a structural component to work with. These elements of information are stored in the ontology, rendering them automatically checkable by the engine to detect whether all requirements are fulfilled for a specific action. This feature simplifies the definition of other actions in the ontology, as the changes that have to be applied to the engine are minimal, typically no changes at all or only minor ones. The checking process stays the same as long as the action is well-defined within the ontology.\n<\/p><p>As with an action, a structural group is always mandatory to trigger a command. The different ways to obtain a structural sub-ensemble are:\n<\/p>\n<ol><li> Component only: every individual that belongs to the concept will be taken into account<\/li>\n<li> Combination of a component and an ensemble of identifiers: coherency checking between component and identifiers<\/li>\n<li> Property only: every individual that possesses the property will be taken into account<\/li>\n<li> Combination of a component and a property: coherency checking between component and property<\/li><\/ol>\n<p>The structural group always refers to a group of individuals in order to disambiguate the results between commands. This disambiguation implies that final commands are more complex. 
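\n<\/p><p>As a minimal sketch of the prerequisite check described above, the engine\u2019s completeness test can be emulated as follows; the hand-written prerequisite table is a simplified stand-in for what the ontology actually stores:\n<\/p>

```python
# Sketch of the prerequisite check: each action concept lists the
# categories that must appear among the classified keywords before a
# command can be formed. The table below is illustrative only.

PREREQUISITES = {
    "Color": {"Property", "Component"},  # "Color" needs a color and a target
    "Hide": {"Component"},
    "Show": {"Representation", "Component"},
}

def command_is_complete(action: str, classified: dict) -> bool:
    """classified maps each recognized keyword to its category, e.g.
    {"Blue": "Property", "Alanine": "Component"}."""
    present = set(classified.values())
    return PREREQUISITES.get(action, set()) <= present

print(command_is_complete("Color", {"Blue": "Property", "Alanine": "Component"}))  # True
print(command_is_complete("Color", {"Blue": "Property"}))  # False: no target
```

<p>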
The hierarchical classification of structural components (Model\/Chain\/Residue\/Atom) has a significant impact on the results of a given command. Indeed, the nature of the structural components targeted by an action is compared to the nature of the structural components currently studied. Depending on whether the individuals in the command are of a higher or lower hierarchical order, the command may either trigger an action on a subpart of the displayed scene (for lower-level individuals) or change the scene composition (for individuals of higher or equal level). For instance, if only two models are studied when a vocal command is transmitted, the amino acids individually targeted by an action will be those that belong to the two displayed models. If the individuals targeted by the command action had been models different from the displayed ones, an update of the displayed molecular complexes would have occurred first.\n<\/p><p>Once the different checks for command coherency and validity have been carried out, the command is sent to both spaces (visualization and analysis) in order to synchronize the visual results.\n<\/p>\n<h4><span class=\"mw-headline\" id=\"Performances\">Performance<\/span><\/h4>\n<p>The performance of our interpretation engine has been tested on several simple and complex voice commands, and execution times have been calculated (see Table 2). To keep the results table clear, we performed the tests on an RDF database containing information from a molecular simulation of a 19-amino-acid peptide whose primary sequence is KETAAAKFERQHMDSSTSA. 
This structure was artificially created with PyMol<sup id=\"rdp-ebb-cite_ref-DeLanoThePy00_9-1\" class=\"reference\"><a href=\"#cite_note-DeLanoThePy00-9\">[9]<\/a><\/sup> and a short molecular dynamics (MD) run using GROMACS<sup id=\"rdp-ebb-cite_ref-HuangTheSPHINX93_28-1\" class=\"reference\"><a href=\"#cite_note-HuangTheSPHINX93-28\">[28]<\/a><\/sup> was used to simulate the newly created system and obtain a short trajectory. The ontology used here is the one created for our platform. We work in a context where the hierarchical structural level of the environment is the amino acid, mainly to take advantage of the many properties associated with this hierarchical level in the ontology and thus avoid overly complex commands. The syntax of the commands is adapted to be interpreted by the PyMol software. Finally, these tests were carried out independently of the SPHINX software so that they could be compared among themselves without side effects from the vocal interpreter\u2019s performance. The set of input keywords was then provided manually for each test.\n<\/p><p><br \/>\n<\/p>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table class=\"wikitable\" border=\"1\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\" colspan=\"4\"><b>Table 2.<\/b> Example of commands used to evaluate the performance of the inference engine for voice recognition\n<\/td><\/tr>\n<tr>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Keywords\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Expected command\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Generated command\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Completion time\n<\/th><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Hide, Lines, Model, 
128\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Hide lines, residue 1+2+3+4+5+6+7+8+9+10+11+12+13+14+15+16+17+18+19 and model 128\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Hide lines, residue 1+2+3+4+5+6+7+8+9+10+11+12+13+14+15+16+17+18+19 and model 128\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Approx. 54 milliseconds\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Color, Alanine, Blue\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Color blue, residue 4+5+6+19\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Color blue, residue 4+5+6+19\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Approx. 72 milliseconds\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Show, Secondary_structure, Residue, [2,5], Cartoon\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Show cartoon, residue 2+3+4+5\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Show secondary_structure, residue 2+3+4+5\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Approx. 56 milliseconds\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Show, Positive, Residue, Polar, Sphere, Chain, A\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Show sphere, residue 1+7+10 and chain A\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Show sphere, residue 1+2+7+9+10+11+12+14 and chain A\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Approx. 
550 milliseconds\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>As we can see in Table 2, the overall precision of the interpretation engine is rather good, and only the last generated command differs significantly from the expected command reported in the table (5th line, 2nd column).\n<\/p><p>One could argue that the third command shows only a partial match between the expected and generated commands. However, we can observe that the engine successfully identified the concepts of \u201cSecondary structure\u201d and \u201cCartoon\u201d as equivalent (as illustrated in Figure 3) but kept only the former, based solely on its position in the keyword list, to create the query. In this case, \u201cCartoon\u201d refers directly to a particular visual representation, whereas \u201cSecondary structure\u201d is more related to a biological concept, the spatial arrangement of consecutive residues within a protein. Adding a filter at the software level to define which representation keywords are allowed would be necessary to remove any command ambiguity.\n<\/p><p>The fourth and last command was supposed to show, as spheres, all residues that were both polar and positive. The difference in the list of residue IDs is due to the lack of a logical connector between the two properties. The engine interpreted this absence as a logical \u201cOR\u201d instead of the expected \u201cAND\u201d and thus output all residues that were either positive or polar (or both). This error points to the limits of keyword-based interpretation when logical connectors are required. 
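\n<\/p><p>The OR-versus-AND ambiguity can be made concrete with a toy filter; the two property sets below are inferred from the residue IDs in Table 2 and are illustrative only:\n<\/p>

```python
# Toy illustration of the OR-vs-AND ambiguity (4th command of Table 2).
# The property sets below are inferred from the table and illustrative only.
POSITIVE = {1, 7, 10}
POLAR = {1, 2, 7, 9, 10, 11, 12, 14}

# Expected interpretation: residues that are positive AND polar.
print(sorted(POSITIVE & POLAR))  # [1, 7, 10]

# What the engine produced without a connector: positive OR polar.
print(sorted(POSITIVE | POLAR))  # [1, 2, 7, 9, 10, 11, 12, 14]
```

<p>Supporting both interpretations explicitly would remove this ambiguity. 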
It is then necessary to take these two possibilities into account and add their interpretation within the inference engine.\n<\/p>\n<h4><span class=\"mw-headline\" id=\"Limits_and_perspectives\">Limits and perspectives<\/span><\/h4>\n<p>Our interpretation engine is able to convert a wide range of keyword lists, ordered and unordered, into a functional and understandable software command for a specific molecular viewer. It does, however, have some limitations that provide interesting opportunities for future work. We have seen that the integration of logical connectors is essential in order to handle multiple filter situations on individuals. These logical connectors hardly fit into our current ontology, as they do not really belong to any of the five definition sets around which it was built. However, logical operations are possible in SPARQL, which implements operators such as AND, OR, and UNION. The missing part therefore lies in the interpretation engine, which needs to incorporate those keywords and handle them properly to form the SPARQL command that will query the database.\n<\/p><p>It is important to note that the efficiency of the inference engine also depends on the quality of the keywords collected by the speech recognition step. In our case this relates to our implementation but, more generally, to whichever step generates these keywords. The absence of one or more keywords, or the recognition of an erroneous keyword, are errors that can be considered common. To provide a more pedagogical and intelligent response than simple error feedback and an invitation to repeat the command, it is possible to use the knowledge accumulated in the ontology to offer the user a controlled subset of relevant keywords to complete the command. This feature contributes to the effort to provide an informed interaction mode between the expert and the visualization space, thus improving the user experience. 
In the same spirit, the ability to provide the expert with a finite number of identifiers from which to perform a selection could anticipate certain user errors. It would then be possible to disambiguate a keyword identified as non-compliant with what is expected, or to complete a partial command from which one or more keywords are missing.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Synchronizing_interactive_selections_between_2D_and_3D_workspaces\">Synchronizing interactive selections between 2D and 3D workspaces<\/span><\/h3>\n<p>We have seen in the previous section that our interpretation engine is able to translate a list of vocalized keywords into an application command, but its semantic-based architecture provides further possibilities. Each interaction of the user with a structural group, a property, or an analytical value is ultimately translated into a list of individuals and their associated representations. This capability makes it possible not only to execute commands within the dedicated software but also to synchronize the visual and analytical spaces with each other. As a consequence, each command that involves a selection is interpreted not only by the software but also by the platform, which passes the selection information on to all spaces and their components (e.g., plots, graphs, etc.). 
(See Figure 5.)\n<\/p><p>Any selection made by the user triggers an event transmitted to a management module, resulting in an adaptation of the visualization to highlight the individual(s) selected.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig5_Trellet_JOfIntegBioinfo2018_15-2.jpg\" class=\"image wiki-link\" data-key=\"7e5826600323a41302edee24b1d0aa86\"><img alt=\"Fig5 Trellet JOfIntegBioinfo2018 15-2.jpg\" src=\"https:\/\/www.limswiki.org\/images\/1\/13\/Fig5_Trellet_JOfIntegBioinfo2018_15-2.jpg\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 5.<\/b> 3D structure visualization and analytical plot of residue distance to the center of mass for the KETAAAKFERQHMDSSTSA peptide in two different spaces of the same environment. The highlighted selection is the result of the 2nd command from Table 2.<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>Beyond its highlighting impact, a selection also reduces the user focus to a subset of individuals, both in the analysis space and the visualization one. It is possible to adapt this focus according to the user\u2019s needs by modifying the context level at which they want their selection to appear. 
Three levels of contextualization are possible:\n<\/p>\n<ul><li> No context \u2013 The selection of individual(s) leads to the visualization of these individuals alone in the visualization and analysis spaces, hiding any unselected individuals.<\/li>\n<li> Weak context \u2013 The selection of individual(s) highlights these individuals in the workspaces and reduces the visibility of the other individuals in the dataset (grey color, transparency, simplified visual rendering, etc.).<\/li>\n<li> Strong context \u2013 The selection of individual(s) is perceived only through a simple emphasis on these individuals in the workspaces. All other individuals still appear with visual parameters close to those of the selected individuals.<\/li><\/ul>\n<p>These different levels make it possible either to highlight the differences between the selection and the rest of the dataset, or to set up a streamlined working environment around a selection of interest to the user. These levels apply to both the visual and analytic parts through visual rendering systems specific to each space.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Semi-automated_analyses_triggered_by_direct_interactions\">Semi-automated analyses triggered by direct interactions<\/span><\/h3>\n<p>Although the majority of the data is present in the database created by the user, a regular work session often requires additional data, for example resulting from post-simulation calculations and therefore missing from the original database. These calculations are usually managed within scripts, sometimes linked to simulation tools, and executed outside the visualization loop, following the observation of a particular phenomenon during exploration or other analyses performed beforehand. 
To avoid overloading the database and to leave the user in control of the analyses to perform, we have made it possible to launch semi-automated analyses during the working session.\n<\/p><p>The SPARQL query language makes it possible not only to query a database but also to modify, delete, or add data. This allows the database to be fed with the results of analyses launched during a user\u2019s working session. A list of analyses has been compiled and an ontological definition has been defined for each of them. This definition provides the type of data used as input and the type of data output. Thus, depending on the desired analysis, our platform proposes a filtered choice of individuals to select, whose type matches the expected data type. In the same way, the values generated as output of the analysis are automatically entered into the database according to their ontological definition.\n<\/p><p>A \"distance\" tool requires, for example, two individuals of the same hierarchical level, or a selection of individuals of higher hierarchical level, between which these distances will be calculated. It is possible to classify these analyses into two categories:\n<\/p>\n<ul><li> Simple analyses group together analyses that generate a value that can be added directly to the properties of the individuals concerned. These include solvent accessibility, hydrophobicity, energy, and so on.<\/li>\n<li> Complex analyses result from a property describing a relationship between two individuals, which thus requires knowledge of both individuals to be meaningful. 
Examples include the distance between two atoms, the RMSD between two sets of individuals, and the angle between two chains.<\/li><\/ul>\n<p>While simple analyses simply add a property and the associated value to an individual, complex analyses must create a particular instance of one of the \"analysis\" concepts of the ontology. This instance brings together the information and definitions needed to interpret it. For example, the ontology\u2019s distance (analysis type) concept will store any calculated distance between two individuals for a selection of defined parent structures. The value of the distance, the URIs of the two individuals involved, and all the structures within which the calculation was carried out will be properties of a distance instance and will be accessible only through that instance. The difference between a SPARQL query accessing values from a simple analysis and one accessing values from a complex analysis is illustrated below:\n<\/p><p><code>SELECT DISTINCT ?temp WHERE {my:MODEL_161 my:temperature ?temp}<\/code>\n<\/p><p><code>SELECT DISTINCT ?distance WHERE {?indiv rdf:type my:Distance . ?indiv my:objectA my:RES_3622 . ?indiv my:objectB my:RES_3626 . ?indiv my:distance ?distance}<\/code>\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Platform_architecture\">Platform architecture<\/span><\/h3>\n<p>The different components highlighted in the previous sections must communicate efficiently with each other to provide responsive feedback to the users. Our platform architecture, from both a hardware and a software perspective, had to be carefully planned to ensure that all tasks performed by the users are handled within an interactive time frame (on the order of a second for the analyses). Our platform design is based on a complex software architecture. 
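\n<\/p><p>The difference between the two query styles above (a direct property for a simple analysis versus a reified instance for a complex one) can be emulated over a small in-memory triple list; the triples below are invented for illustration:\n<\/p>

```python
# Sketch of simple vs. complex analysis retrieval over toy triples.
# A simple analysis is a direct (subject, property, value) triple; a
# complex analysis is a reified instance carrying several properties.
TRIPLES = [
    ("MODEL_161", "temperature", 300.0),   # simple analysis result
    ("dist_1", "rdf:type", "Distance"),    # complex analysis instance
    ("dist_1", "objectA", "RES_3622"),
    ("dist_1", "objectB", "RES_3626"),
    ("dist_1", "distance", 5.4),
]

def simple_value(subject, prop):
    """Emulate: SELECT ?v WHERE {subject prop ?v}"""
    return [o for s, p, o in TRIPLES if s == subject and p == prop]

def complex_distance(a, b):
    """Find Distance instances linking a and b, then read their value."""
    inst = {s for s, p, o in TRIPLES if p == "rdf:type" and o == "Distance"}
    inst &= {s for s, p, o in TRIPLES if p == "objectA" and o == a}
    inst &= {s for s, p, o in TRIPLES if p == "objectB" and o == b}
    return [o for s, p, o in TRIPLES if s in inst and p == "distance"]

print(simple_value("MODEL_161", "temperature"))  # [300.0]
print(complex_distance("RES_3622", "RES_3626"))  # [5.4]
```

<p>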
In the diagram shown in Figure 6, we deliberately placed it in the middle of a double-sided communication loop connecting the visualization space to the analysis space. Our database is hosted on a local server accessible from the network to guarantee privileged and optimized access to our data. All communications are optimized to reduce the latency between a request triggered by the front-end sensors, its translation into a query in the database together with the treatment and transformation of the query results in the back-end, and finally the response presented to the user, once again at the front-end level.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig6_Trellet_JOfIntegBioinfo2018_15-2.jpg\" class=\"image wiki-link\" data-key=\"adcfa6e966605733892d916185866c4d\"><img alt=\"Fig6 Trellet JOfIntegBioinfo2018 15-2.jpg\" src=\"https:\/\/www.limswiki.org\/images\/1\/1d\/Fig6_Trellet_JOfIntegBioinfo2018_15-2.jpg\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 6.<\/b> Software and hardware architecture of our platform as UML deployment diagram<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<h2><span class=\"mw-headline\" id=\"Scenario_and_evaluation\">Scenario and evaluation<\/span><\/h2>\n<h3><span class=\"mw-headline\" id=\"Scenario\">Scenario<\/span><\/h3>\n<p>To illustrate the full-capacity of our platform architecture, we chose a typical example of a molecular system study. This example sets up a local visualization solution coupled to a distant web server where interactive graphs can be created. 
Both spaces can be rendered in an immersive environment, either in the same screen space or split between a 3D screen for the visualization and a tablet providing analysis results through a web server (see Figure 9, later). We assume, as is the case in real studies, that the expert knows the molecular system well and can therefore interact vocally or by selecting elements in one of the spaces.\n<\/p><p>Our scenario studies the results of a molecular dynamics<sup id=\"rdp-ebb-cite_ref-AbrahamGROMACS15_31-0\" class=\"reference\"><a href=\"#cite_note-AbrahamGROMACS15-31\">[31]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-DrorBiomol12_32-0\" class=\"reference\"><a href=\"#cite_note-DrorBiomol12-32\">[32]<\/a><\/sup> experiment applied to a protein. We deliberately skip the MD parametrization details, since this study was set up as a proof of concept and follows a very standard protocol.\n<\/p><p>In the first step of our scenario, the analytical space (web server) triggers a SPARQL query to retrieve every numerical value from our database. A list containing all the values is then created and presented to the user for each structural component level (Model\/Chain\/Residue\/Atom), as illustrated in Figure 7. 
Once the data values are gathered, the expert chooses which structural component hierarchy is of interest and which combination of properties to plot in the analytical space.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig7_Trellet_JOfIntegBioinfo2018_15-2.jpg\" class=\"image wiki-link\" data-key=\"dfe1e41dbe2286c7fd68cdd02e23e3a2\"><img alt=\"Fig7 Trellet JOfIntegBioinfo2018 15-2.jpg\" src=\"https:\/\/www.limswiki.org\/images\/c\/c0\/Fig7_Trellet_JOfIntegBioinfo2018_15-2.jpg\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 7.<\/b> Query results showing all present numerical values from the database for each representation level available (Model\/Chain\/Residue\/Atom)<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>Several queries then retrieve the property values, which are plotted using the graphing library <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/d3js.org\/\" data-key=\"60d264620d6e336595a95461e46415a4\">D3.js<\/a>. Here the RMSD of each model with respect to the starting conformation has been plotted. On the X-axis is the time step corresponding to each model of the MD trajectory, and on the Y-axis the associated RMSD value.\n<\/p><p>Several models of interest can be selected, either via a vocal command or by direct selection on the 2D interactive plots, as shown in the first step of Figure 8. We selected here the three lowest-RMSD models (including the reference). 
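\n<\/p><p>The RMSD values plotted here follow the standard definition, the root of the mean squared deviation between corresponding atom positions; a minimal sketch with invented coordinates (and without the structural superposition step a real pipeline would perform first):\n<\/p>

```python
# Minimal per-model RMSD sketch. Coordinates are invented for
# illustration; no superposition/fitting step is performed.
import math

def rmsd(model, reference):
    """RMSD between two equal-length lists of (x, y, z) coordinates."""
    assert len(model) == len(reference)
    sq = sum((a - b) ** 2
             for atom, ref_atom in zip(model, reference)
             for a, b in zip(atom, ref_atom))
    return math.sqrt(sq / len(model))

ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
mod = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(mod, ref))  # 1.0 (every atom shifted by 1 along z)
```

<p>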
The selection is synchronized over all previously created scatter plots and triggers a synchronous visualization of the individuals in the visual space (see the second step of Figure 8).\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig8_Trellet_JOfIntegBioinfo2018_15-2.jpg\" class=\"image wiki-link\" data-key=\"8b3e08fd0546bb61e6a13fac4b2f5d16\"><img alt=\"Fig8 Trellet JOfIntegBioinfo2018 15-2.jpg\" src=\"https:\/\/www.limswiki.org\/images\/e\/e3\/Fig8_Trellet_JOfIntegBioinfo2018_15-2.jpg\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 8.<\/b> On the right, the analytical space, where interactive plots are added upon user actions in the visualization space or through the available menus. On the left, the visualization space, where each object is displayed synchronously with the selected individuals of the analytical space.<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>The expert may then switch to the visualization space and select elements of the displayed structures to focus on. We selected here three residues from the three different models. These sub-elements of the current models are sent to the analytical space, which asks the expert for the properties to be plotted. As in the previous step, a list of available numerical values associated with the residues is provided. Once the choice is made, the selection is highlighted in the analytical space, as shown in the third step of Figure 8. We chose here to display the solvent-exposed area with respect to the residue IDs. 
In blue, the three residues we have selected in the visualization space are displayed as mentioned in the fourth step of Figure 8.\n<\/p><p>New graphs can be added at runtime and synchronized with the current ones. However, it is important to note that a full synchronization between the visualization and analytical spaces requires the same hierarchy of structural elements to be selected in both spaces. If a new selection is made at a model level, any graphs of lower hierarchy will be reset with the new selected models and the visualization will be reset with the new models at the same time.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Evaluation_of_high-level_task_completion_based_on_hierarchical_task_analysis\">Evaluation of high-level task completion based on hierarchical task analysis<\/span><\/h3>\n<p>The evaluation process started from the observation that the systematic evaluation of field-related tasks is rather complicated to set up for four reasons. (1) Usage and nature of the evaluated tools, in particular in molecular visualization, differ between experts. (2) Implementation and adaptation of our developments over a representative sample of the tools is complex and very time-consuming. (3) Our approach is biased since it is based on the execution of expert tasks. 
(4) In order to apply standard statistical methods for evaluation, it is necessary to gather enough participants, yet the number of experts in our application field is rather limited.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig9_Trellet_JOfIntegBioinfo2018_15-2.jpg\" class=\"image wiki-link\" data-key=\"a380a6547ae7c570abdb5fb43fe5fcf7\"><img alt=\"Fig9 Trellet JOfIntegBioinfo2018 15-2.jpg\" src=\"https:\/\/www.limswiki.org\/images\/e\/e0\/Fig9_Trellet_JOfIntegBioinfo2018_15-2.jpg\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 9.<\/b> Platform illustration in a hybrid environment made of a 3D immersive CAVE2 system (EVL\/UIC, Chicago) together with a graphical tablet for 2D analytical representation.<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>We therefore propose an evaluation method that is more theoretical than empirical: the HTA (\u201cHierarchical Task Analysis\u201d) method.<sup id=\"rdp-ebb-cite_ref-AnnettHier03_33-0\" class=\"reference\"><a href=\"#cite_note-AnnettHier03-33\">[33]<\/a><\/sup> The HTA method consists of dividing a primary task into several sub-tasks. Each sub-task can be subdivided again until the sub-tasks reach a degree of precision sufficient for their execution times to be evaluated accurately. This method is particularly useful for comparing similar tasks performed under different conditions. It makes it possible to evaluate both the task methodology with respect to specific conditions and the performance of those conditions for a specific task. 
HTA requires only one expert to compare the different sub-task execution times (see Figure 10).\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig10_Trellet_JOfIntegBioinfo2018_15-2.jpg\" class=\"image wiki-link\" data-key=\"3bd7b1a9cb1ac7cd93ac811e1185cc98\"><img alt=\"Fig10 Trellet JOfIntegBioinfo2018 15-2.jpg\" src=\"https:\/\/www.limswiki.org\/images\/0\/03\/Fig10_Trellet_JOfIntegBioinfo2018_15-2.jpg\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 10.<\/b> Subdivision by HTA of an expert task performed (A) in normal conditions and (B) within our platform setup<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>We evaluate here a typical task that a structural biologist would perform on a daily basis. Here we asked experts to measure the diameter of the main pore of a transmembrane protein in two different setups. One is a typical setup where visualization software and some analysis files are available with an atomic model of the transmembrane protein. The second setup involves our platform, where the visualization software is now connected to a web page where interactive graphs can be displayed. The expert can interact with both spaces through two different devices connecting locally and where network latency is negligible. Tests have been made with a laptop where an instance of PyMol was running and a tablet where 2D plots were displayed within a web browser.\n<\/p><p>The task can be divided into three distinct steps. 
The first step consists of processing the analytical data to find the lowest-energy model among the models more than 10 \u00c5 of RMSD distant from the reference model, this distance reflecting significant conformational changes. Once the model concerned is identified, it must be visualized in order to see the pore and to select its ends. The third and final step consists of calculating the distance between two atoms on either side of the pore.\n<\/p><p>There is a significantly shorter runtime when using our platform (19 seconds) compared to a standard use of the analysis and visualization tools (29 seconds). The first step of analysis is the stage where the difference is most important, as highlighted by the orange sub-tasks in the HTA graph in Figure 10. This difference can be explained by the use of interactive graphs to visualize RMSD and energy values for all models. The interactive graph and its associated selection tools (vocal recognition or manual selection) allow the user to quickly query all models more than 10 \u00c5 away from the reference. Identifying the model with the lowest energy then requires only a quick visual analysis of the energy graph. In contrast, use of standard command-line tools is more complicated because it requires a more complex visual analysis. It is indeed more tedious to find a minimum value by going over a text file than by looking at a cloud of dots.\n<\/p><p>The synchronization of the plot selections within the analytical space then allows us to further shorten the time required to find the lowest-energy model in the second plot among the ones selected in the first.\n<\/p><p>Loading the model into the visualization software is also made easier in the platform, since our application makes it possible to automatically pass the selection of the model from a plot directly into the visualization space. 
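<\/p><p>The three steps of the task can be sketched as follows; the model data and atom coordinates are hypothetical, and the real platform performs these steps through interactive plots and the visualization software rather than plain Python lists.<\/p>

```python
import math

# Step 1 (hypothetical data): among models more than 10 A RMSD from the
# reference, find the one with the lowest energy.
models = [
    {"name": "m1", "rmsd": 4.2, "energy": -120.0},
    {"name": "m2", "rmsd": 12.5, "energy": -95.0},
    {"name": "m3", "rmsd": 14.1, "energy": -140.0},
]
candidates = [m for m in models if m["rmsd"] > 10.0]
best = min(candidates, key=lambda m: m["energy"])

# Steps 2 and 3: once two atoms on either side of the pore have been
# selected visually, the pore diameter is their Euclidean distance.
def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

atom_a = (1.0, 2.0, 0.0)  # hypothetical coordinates, in angstroms
atom_b = (1.0, 2.0, 8.5)
print(best["name"], round(distance(atom_a, atom_b), 2))  # m3 8.5
```

<p>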
The similar steps, shown in green in Figure 10, involve comparable execution times and are therefore independent of the working conditions in which the sub-tasks are performed.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Conclusion\">Conclusion<\/span><\/h2>\n<p>Immersive virtual reality is still only sparsely used to explore biomolecules, which may be due to limitations imposed by several important constraints.\n<\/p><p>On the one hand, applications usable in virtual reality do not offer enough interaction modalities adapted to the immersive context to access the essential and usual features of molecular visualization software. In such a context, paradigms of direct interaction are lacking, both to make selections directly on the 3D representation of the molecule, and through complex criteria, to interactively change the different modes of molecular representation used to represent these selections. Until now, these selection tasks have had to be performed by the usual means, such as mouse and keyboard.\n<\/p><p>On the other hand, the impossibility of performing other analysis, pre-processing, and post-processing tasks, or of visualizing the results of these analyses (tasks closer to the field of information visualization than to 3D visualization), forces the user to systematically come back to an office context.\n<\/p><p>To address these issues, we have set up a semantic layer over an immersive environment dedicated to the interactive visualization and analysis of molecular simulation data. This setup was achieved through the implementation of an ontology describing both the structural biology and interaction concepts manipulated by the experts during a study process. As a result, we believe that our pipeline might be a solid base for immersive analytics studies applied to structural biology. 
In the same vein as projects by Chandler <i>et al.<\/i>,<sup id=\"rdp-ebb-cite_ref-ChandlerImmersive15_34-0\" class=\"reference\"><a href=\"#cite_note-ChandlerImmersive15-34\">[34]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-Sommer3D17_35-0\" class=\"reference\"><a href=\"#cite_note-Sommer3D17-35\">[35]<\/a><\/sup> we successfully combine several immersive views over a particular phenomenon.\n<\/p><p>Our architecture, built around heterogeneous components, succeeds in bringing together visualization and analytical spaces thanks to a common ontology-driven module that maintains perfect synchronization between the different representations of the same elements in the two spaces. One strength of the platform is its independence regarding the visualization technology used for both spaces. Combinations are numerous, from a CAVE system coupled to a tablet to a VR headset showcasing a room where each wall would display either a 3D structure or some analysis. Our semantic layer lies beneath the visualization technology used and only provides bridges between heterogeneous tools aimed at exploring molecular structures on one side and complex analyses on the other.\n<\/p><p>The knowledge provided by the ontology can also significantly improve the interactive capability of the platform by proposing contextualized analysis choices to the user, adapted to the types of elements in their current focus. Throughout the study process, a set of specific analyses, non-redundant with the ones already performed, can be interactively chosen to populate the database. A simple definition of analyses in the ontology, with their input and output types, is sufficient to decide whether an analysis is pertinent for a given selection and whether the resulting values are already present in the database.\n<\/p><p>The reasoning capability of the ontology allowed us to develop an efficient interpretation engine that can transform a vocal command composed of keywords into an application command. 
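<\/p><p>A minimal sketch of such keyword-to-command interpretation follows; the vocabulary and command names are hypothetical, since the actual engine resolves keywords through the ontology rather than a flat dictionary.<\/p>

```python
# Hypothetical keyword vocabularies; the real engine queries the ontology
# to resolve recognized words into application concepts.
actions = {"show": "display", "hide": "undisplay", "color": "color"}
targets = {"protein": "polymer", "water": "solvent"}

def interpret(keywords):
    """Map a recognized keyword sequence to an application command,
    or return None when no complete command can be formed."""
    action = next((actions[w] for w in keywords if w in actions), None)
    target = next((targets[w] for w in keywords if w in targets), None)
    if action and target:
        return f"{action} {target}"
    return None

print(interpret(["please", "show", "the", "water"]))  # display solvent
```

<p>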
This framework paves the way for a multimodal supervision tool that would use the high-level description of the manipulated elements, as well as the heterogeneous natures of the interactions, to merge inputs and create intelligent, complex commands, in line with the work of M.E. Latoschik.<sup id=\"rdp-ebb-cite_ref-WiebuschDecoup15_36-0\" class=\"reference\"><a href=\"#cite_note-WiebuschDecoup15-36\">[36]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-GuttierezSemantics05_37-0\" class=\"reference\"><a href=\"#cite_note-GuttierezSemantics05-37\">[37]<\/a><\/sup> The RDF\/RDFS\/OWL model coupled with the SPARQL language allows one to state inference rules, which is particularly important for the decision-making process in collaborative contexts. In these contexts, two users may jointly trigger a multimodal command that can be difficult to interpret without proper rules. An effort would then have to be made to integrate these rules into a future supervisor of the input modalities, based on the semantic model, that considers users as elements of modality in a multimodal interaction.\n<\/p><p>Our approach is a proof-of-concept application, available as a <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/github.com\/mtrellet\/PyMol_Interactive_Plotting\" data-key=\"3a06067d9da375addf2a598135b26050\">GitHub repository<\/a>, but it opens the way to a new generation of scientific tools. We illustrated our developments through the field of structural biology, but it is worth noting that the generic nature of the Semantic Web allows our developments to be extended to most scientific fields where a tight coupling between visualization and analyses is important. 
We especially aim to integrate all the concepts described in this paper into new molecular visualization tools such as UnityMol<sup id=\"rdp-ebb-cite_ref-DoutreligneUnity14_38-0\" class=\"reference\"><a href=\"#cite_note-DoutreligneUnity14-38\">[38]<\/a><\/sup>, which allows more comfortable code integration compared to classical molecular visualization applications.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Acknowledgements\">Acknowledgements<\/span><\/h2>\n<p>The authors wish to thank Xavier Martinez for kindly providing the UnityMol pictures. This work was supported in part by the French national agency research project Exaviz (ANR-11-MONU-0003) and by the \u201cInitiative d\u2019Excellence\u201d program from the French State (grant \u201cDYNAMO\u201d, ANR-11-LABX-0011-01; equipment grants Digiscope, ANR-10-EQPX-0026 and Cacsice, ANR-11-EQPX-0008).\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Conflict_of_interest\">Conflict of interest<\/span><\/h3>\n<p>The authors state no conflict of interest. All authors have read the journal\u2019s publication ethics and publication malpractice statement available at the journal\u2019s website and hereby confirm that they comply with all its parts applicable to the present scientific work.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"References\">References<\/span><\/h2>\n<div class=\"reflist references-column-width\" style=\"-moz-column-width: 30em; -webkit-column-width: 30em; column-width: 30em; list-style-type: decimal;\">\n<ol class=\"references\">\n<li id=\"cite_note-ZhaoMature13-1\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ZhaoMature13_1-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Zhao, G.; Perilla, J.R.; Yufenyuy, E.L. et al. (2013). 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3729984\" data-key=\"26ce04756838a11f8785921c47549d43\">\"Mature HIV-1 capsid structure by cryo-electron microscopy and all-atom molecular dynamics\"<\/a>. <i>Nature<\/i> <b>497<\/b> (7451): 643\u20136. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1038%2Fnature12162\" data-key=\"ca66837b5da9e12e48307f1e2fdd9e7d\">10.1038\/nature12162<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3729984\/\" data-key=\"b3531d7164dadfa9aedcb057d6a01075\">PMC3729984<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/23719463\" data-key=\"83b3731c6a8fb8ea10c818d8d7f10a8e\">23719463<\/a><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3729984\" data-key=\"26ce04756838a11f8785921c47549d43\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3729984<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Mature+HIV-1+capsid+structure+by+cryo-electron+microscopy+and+all-atom+molecular+dynamics&rft.jtitle=Nature&rft.aulast=Zhao%2C+G.%3B+Perilla%2C+J.R.%3B+Yufenyuy%2C+E.L.+et+al.&rft.au=Zhao%2C+G.%3B+Perilla%2C+J.R.%3B+Yufenyuy%2C+E.L.+et+al.&rft.date=2013&rft.volume=497&rft.issue=7451&rft.pages=643%E2%80%936&rft_id=info:doi\/10.1038%2Fnature12162&rft_id=info:pmc\/PMC3729984&rft_id=info:pmid\/23719463&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3729984&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ZhangStructure17-2\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ZhangStructure17_2-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Zhang, J.; Ma, J.; Liu, D. et al. (2017). \"Structure of phycobilisome from the red alga Griffithsia pacifica\". <i>Nature<\/i> <b>551<\/b> (7678): 57\u201363. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1038%2Fnature24278\" data-key=\"e2aabafd82f9ba1a87dabbf505445286\">10.1038\/nature24278<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/29045394\" data-key=\"1bcaec3db5ace4e49e38f0a36f95b7b5\">29045394<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Structure+of+phycobilisome+from+the+red+alga+Griffithsia+pacifica&rft.jtitle=Nature&rft.aulast=Zhang%2C+J.%3B+Ma%2C+J.%3B+Liu%2C+D.+et+al.&rft.au=Zhang%2C+J.%3B+Ma%2C+J.%3B+Liu%2C+D.+et+al.&rft.date=2017&rft.volume=551&rft.issue=7678&rft.pages=57%E2%80%9363&rft_id=info:doi\/10.1038%2Fnature24278&rft_id=info:pmid\/29045394&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-vanDamImmersive00-3\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-vanDamImmersive00_3-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">van Dam, A.; Forsberg, A.S.; Laidlaw, D.H. et al. (2000). \"Immersive VR for scientific visualization: A progress report\". <i>IEEE Computer Graphics and Applications<\/i> <b>20<\/b> (6): 26\u201352. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2F38.888006\" data-key=\"843007730afc4498822248a266eacd6f\">10.1109\/38.888006<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Immersive+VR+for+scientific+visualization%3A+A+progress+report&rft.jtitle=IEEE+Computer+Graphics+and+Applications&rft.aulast=van+Dam%2C+A.%3B+Forsberg%2C+A.S.%3B+Laidlaw%2C+D.H.+et+al.&rft.au=van+Dam%2C+A.%3B+Forsberg%2C+A.S.%3B+Laidlaw%2C+D.H.+et+al.&rft.date=2000&rft.volume=20&rft.issue=6&rft.pages=26%E2%80%9352&rft_id=info:doi\/10.1109%2F38.888006&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-StoneImmersive10-4\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-StoneImmersive10_4-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Stone. J.E.; Kohlmeyer, A.; Vandivort, K.L.; Schulten, K. (2010). \"Immersive molecular visualization and interactive modeling with commodity hardware\". <i>Proceedings of the 6th International Conference on Advances in Visual Computing<\/i>: 382\u201393. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1007%2F978-3-642-17274-8_38\" data-key=\"dbdc5108c942ec55aeec887cbbdf8dd1\">10.1007\/978-3-642-17274-8_38<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Immersive+molecular+visualization+and+interactive+modeling+with+commodity+hardware&rft.jtitle=Proceedings+of+the+6th+International+Conference+on+Advances+in+Visual+Computing&rft.aulast=Stone.+J.E.%3B+Kohlmeyer%2C+A.%3B+Vandivort%2C+K.L.%3B+Schulten%2C+K.&rft.au=Stone.+J.E.%3B+Kohlmeyer%2C+A.%3B+Vandivort%2C+K.L.%3B+Schulten%2C+K.&rft.date=2010&rft.pages=382%E2%80%9393&rft_id=info:doi\/10.1007%2F978-3-642-17274-8_38&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ODonoghueVisual10-5\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ODonoghueVisual10_5-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">O'Donoghue, S.I.; Goodsell, D.S.; Frangakis, A.S. et al. (2010). \"Visualization of macromolecular structures\". <i>Nature Methods<\/i> <b>7<\/b> (3 Suppl.): S42\u201355. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1038%2Fnmeth.1427\" data-key=\"e8a59b095db34056c03bf09dfc90b3be\">10.1038\/nmeth.1427<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/20195256\" data-key=\"2a092edb2a19a32a3a771ade7fefd202\">20195256<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Visualization+of+macromolecular+structures&rft.jtitle=Nature+Methods&rft.aulast=O%27Donoghue%2C+S.I.%3B+Goodsell%2C+D.S.%3B+Frangakis%2C+A.S.+et+al.&rft.au=O%27Donoghue%2C+S.I.%3B+Goodsell%2C+D.S.%3B+Frangakis%2C+A.S.+et+al.&rft.date=2010&rft.volume=7&rft.issue=3+Suppl.&rft.pages=S42%E2%80%9355&rft_id=info:doi\/10.1038%2Fnmeth.1427&rft_id=info:pmid\/20195256&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HirstMolec14-6\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-HirstMolec14_6-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Hirst, J.D.; Glowacki, D.R.; Baaden, M. et al. (2014). \"Molecular simulations and visualization: Introduction and overview\". <i>Faraday Discussions<\/i> <b>169<\/b>: 9\u201322. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1039%2Fc4fd90024c\" data-key=\"06bacc94c9456d049bf8080d6d023a5e\">10.1039\/c4fd90024c<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/25285906\" data-key=\"660f6a529a07e1e241f6308284fbc41a\">25285906<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Molecular+simulations+and+visualization%3A+Introduction+and+overview&rft.jtitle=Faraday+Discussions&rft.aulast=Hirst%2C+J.D.%3B+Glowacki%2C+D.R.%3B+Baaden%2C+M.+et+al.&rft.au=Hirst%2C+J.D.%3B+Glowacki%2C+D.R.%3B+Baaden%2C+M.+et+al.&rft.date=2014&rft.volume=169&rft.pages=9%E2%80%9322&rft_id=info:doi\/10.1039%2Fc4fd90024c&rft_id=info:pmid\/25285906&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-GoddardUCSF18-7\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-GoddardUCSF18_7-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Goddard, T.D., Huang, C.C.; Meng, E.C. et al. (2018). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5734306\" data-key=\"8fe14fec86e7abc39daed57a04c19c4c\">\"UCSF ChimeraX: Meeting modern challenges in visualization and analysis\"<\/a>. <i>Protein Science<\/i> <b>27<\/b> (1): 14\u201325. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1002%2Fpro.3235\" data-key=\"ff35cdde614fe6d59f14e78b3fabbe28\">10.1002\/pro.3235<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC5734306\/\" data-key=\"96a5004dffec460179408d949420e0e5\">PMC5734306<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/28710774\" data-key=\"eea7936bc8042996f8de9e2f15b7ee93\">28710774<\/a><span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5734306\" data-key=\"8fe14fec86e7abc39daed57a04c19c4c\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5734306<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=UCSF+ChimeraX%3A+Meeting+modern+challenges+in+visualization+and+analysis&rft.jtitle=Protein+Science&rft.aulast=Goddard%2C+T.D.%2C+Huang%2C+C.C.%3B+Meng%2C+E.C.+et+al.&rft.au=Goddard%2C+T.D.%2C+Huang%2C+C.C.%3B+Meng%2C+E.C.+et+al.&rft.date=2018&rft.volume=27&rft.issue=1&rft.pages=14%E2%80%9325&rft_id=info:doi\/10.1002%2Fpro.3235&rft_id=info:pmc\/PMC5734306&rft_id=info:pmid\/28710774&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC5734306&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-F.C3.A9reyMulti09-8\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-F.C3.A9reyMulti09_8-0\">\u2191<\/a><\/span> 
<span class=\"reference-text\"><span class=\"citation Journal\">F\u00e9rey, N.; Nelson, J.; Martin, C. et al. (2009). \"Multisensory VR interaction for protein-docking in the CoRSAIRe project\". <i>Virtual Reality<\/i> <b>13<\/b>: 273. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1007%2Fs10055-009-0136-z\" data-key=\"96bfc89a757c13ea69c9087cc219f757\">10.1007\/s10055-009-0136-z<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Multisensory+VR+interaction+for+protein-docking+in+the+CoRSAIRe+project&rft.jtitle=Virtual+Reality&rft.aulast=F%C3%A9rey%2C+N.%3B+Nelson%2C+J.%3B+Martin%2C+C.+et+al.&rft.au=F%C3%A9rey%2C+N.%3B+Nelson%2C+J.%3B+Martin%2C+C.+et+al.&rft.date=2009&rft.volume=13&rft.pages=273&rft_id=info:doi\/10.1007%2Fs10055-009-0136-z&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DeLanoThePy00-9\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-DeLanoThePy00_9-0\">9.0<\/a><\/sup> <sup><a href=\"#cite_ref-DeLanoThePy00_9-1\">9.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation web\">DeLano, W. (04 September 2000). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/pymol.sourceforge.net\/overview\/index.htm\" data-key=\"62e6ec4b35ff6589298bbd65d265d75e\">\"The PyMOL Molecular Graphics System\"<\/a><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/pymol.sourceforge.net\/overview\/index.htm\" data-key=\"62e6ec4b35ff6589298bbd65d265d75e\">http:\/\/pymol.sourceforge.net\/overview\/index.htm<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=The+PyMOL+Molecular+Graphics+System&rft.atitle=&rft.aulast=DeLano%2C+W.&rft.au=DeLano%2C+W.&rft.date=04+September+2000&rft_id=http%3A%2F%2Fpymol.sourceforge.net%2Foverview%2Findex.htm&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HumphreyVMD96-10\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-HumphreyVMD96_10-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Humphrey, W.; Dalke, A.; Schulten, K. et al. (1996). \"VMD: Visual molecular dynamics\". <i>Journal of Molecular Graphics<\/i> <b>14<\/b> (1): 33\u20138. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1016%2F0263-7855%2896%2900018-5\" data-key=\"7ce858aa6d8f67efa06a0d387160afb2\">10.1016\/0263-7855(96)00018-5<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=VMD%3A+Visual+molecular+dynamics&rft.jtitle=Journal+of+Molecular+Graphics&rft.aulast=Humphrey%2C+W.%3B+Dalke%2C+A.%3B+Schulten%2C+K.+et+al.&rft.au=Humphrey%2C+W.%3B+Dalke%2C+A.%3B+Schulten%2C+K.+et+al.&rft.date=1996&rft.volume=14&rft.issue=1&rft.pages=33%E2%80%938&rft_id=info:doi\/10.1016%2F0263-7855%2896%2900018-5&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-LvGame13-11\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-LvGame13_11-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Lv, Z.; Tek, A.; Da Silva, F. et al. (2013). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3590297\" data-key=\"df75cf562397d7bd0f79a82775cc3edb\">\"Game on, science - How video game technology may help biologists tackle visualization challenges\"<\/a>. <i>PLoS One<\/i> <b>8<\/b> (3): e57990. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1371%2Fjournal.pone.0057990\" data-key=\"6fabb3d1d077f3eafaa4b2b1ff3ab995\">10.1371\/journal.pone.0057990<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3590297\/\" data-key=\"0757907bf29943d65e48e591c4c0be1d\">PMC3590297<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/23483961\" data-key=\"54fc205456afbc8d6b4b5ae02931e503\">23483961<\/a><span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3590297\" data-key=\"df75cf562397d7bd0f79a82775cc3edb\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3590297<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Game+on%2C+science+-+How+video+game+technology+may+help+biologists+tackle+visualization+challenges&rft.jtitle=PLoS+One&rft.aulast=Lv%2C+Z.%3B+Tek%2C+A.%3B+Da+Silva%2C+F.+et+al.&rft.au=Lv%2C+Z.%3B+Tek%2C+A.%3B+Da+Silva%2C+F.+et+al.&rft.date=2013&rft.volume=8&rft.issue=3&rft.pages=e57990&rft_id=info:doi\/10.1371%2Fjournal.pone.0057990&rft_id=info:pmc\/PMC3590297&rft_id=info:pmid\/23483961&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3590297&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SowaConcept84-12\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-SowaConcept84_12-0\">\u2191<\/a><\/span> 
<span class=\"reference-text\"><span class=\"citation book\">Sowa, J.F. (1984). <i>Conceptual structures: Information processing in mind and machine<\/i>. Addison-Wesley Longman Publishing Co. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" data-key=\"f64947ba21e884434bd70e8d9e60bae6\">ISBN<\/a> 0201144727.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Conceptual+structures%3A+Information+processing+in+mind+and+machine&rft.aulast=Sowa%2C+J.F.&rft.au=Sowa%2C+J.F.&rft.date=1984&rft.pub=Addison-Wesley+Longman+Publishing+Co&rft.isbn=0201144727&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-Berners-LeeTheSemantic01-13\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-Berners-LeeTheSemantic01_13-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Berners-Lee, T.; Hendler, J.; Lassila, O. (2001). \"The Semantic Web\". 
<i>Scientific American<\/i> <b>284<\/b>: 28\u201337.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=The+Semantic+Web&rft.jtitle=Scientific+American&rft.aulast=Berners-Lee%2C+T.%3B+Hendler%2C+J.%3B+Lassila%2C+O.&rft.au=Berners-Lee%2C+T.%3B+Hendler%2C+J.%3B+Lassila%2C+O.&rft.date=2001&rft.volume=284&rft.pages=28%E2%80%9337&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-CyganiakRDF14-14\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-CyganiakRDF14_14-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Cyganiak, R.; Wood, D.; Lanthaler, M., ed. (25 February 2014). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.w3.org\/TR\/rdf11-concepts\/\" data-key=\"90369aecf4a9908f6bfaca19e8707b51\">\"RDF 1.1 Concepts and Abstract Syntax\"<\/a>. World Wide Web Consortium<span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.w3.org\/TR\/rdf11-concepts\/\" data-key=\"90369aecf4a9908f6bfaca19e8707b51\">https:\/\/www.w3.org\/TR\/rdf11-concepts\/<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=RDF+1.1+Concepts+and+Abstract+Syntax&rft.atitle=&rft.date=25+February+2014&rft.pub=World+Wide+Web+Consortium&rft_id=https%3A%2F%2Fwww.w3.org%2FTR%2Frdf11-concepts%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BrickleyRDF14-15\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-BrickleyRDF14_15-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Brickley, D.; Guha, R.V., ed. (25 February 2014). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.w3.org\/TR\/rdf-schema\/\" data-key=\"9b149e906d2e8f7490d8e08e75b0361b\">\"RDF Schema 1.1\"<\/a>. World Wide Web Consortium<span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.w3.org\/TR\/rdf-schema\/\" data-key=\"9b149e906d2e8f7490d8e08e75b0361b\">https:\/\/www.w3.org\/TR\/rdf-schema\/<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=RDF+Schema+1.1&rft.atitle=&rft.date=25+February+2014&rft.pub=World+Wide+Web+Consortium&rft_id=https%3A%2F%2Fwww.w3.org%2FTR%2Frdf-schema%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MotikOWL12-16\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-MotikOWL12_16-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Motik, B.; Patel-Schneider, P.F.; Parsia, B., ed. (11 December 2012). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.w3.org\/TR\/owl2-syntax\/\" data-key=\"f49337e5f644dbdf6e4e3550978c8ed5\">\"OWL 2 Web Ontology Language\"<\/a>. World Wide Web Consortium<span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.w3.org\/TR\/owl2-syntax\/\" data-key=\"f49337e5f644dbdf6e4e3550978c8ed5\">https:\/\/www.w3.org\/TR\/owl2-syntax\/<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=OWL+2+Web+Ontology+Language&rft.atitle=&rft.date=11+December+2012&rft.pub=World+Wide+Web+Consortium&rft_id=https%3A%2F%2Fwww.w3.org%2FTR%2Fowl2-syntax%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HarrisSPARQL13-17\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-HarrisSPARQL13_17-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Harris, S.; Seaborne, A., ed. (21 March 2013). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.w3.org\/TR\/sparql11-query\/\" data-key=\"82300a47b55e13c29fd0a540a8fd2994\">\"SPARQL 1.1 Query Language\"<\/a>. World Wide Web Consortium<span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.w3.org\/TR\/sparql11-query\/\" data-key=\"82300a47b55e13c29fd0a540a8fd2994\">https:\/\/www.w3.org\/TR\/sparql11-query\/<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=SPARQL+1.1+Query+Language&rft.atitle=&rft.date=21+March+2013&rft.pub=World+Wide+Web+Consortium&rft_id=https%3A%2F%2Fwww.w3.org%2FTR%2Fsparql11-query%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-GiacomoTBox96-18\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-GiacomoTBox96_18-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">De Giacomo, G.; Lenzerini, M. (1996). \"TBox and ABox Reasoning in Expressive Description Logics\". <i>Proceedings of the Fifth International Conference on Principles of Knowledge Representation and Reasoning<\/i>: 316\u201327. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" data-key=\"f64947ba21e884434bd70e8d9e60bae6\">ISBN<\/a> 1558604219.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=TBox+and+ABox+Reasoning+in+Expressive+Description+Logics&rft.jtitle=Proceedings+of+the+Fifth+International+Conference+on+Principles+of+Knowledge+Representation+and+Reasoning&rft.aulast=De+Giacomo%2C+G.%3B+Lenzerini%2C+M.&rft.au=De+Giacomo%2C+G.%3B+Lenzerini%2C+M.&rft.date=1996&rft.pages=316%E2%80%9327&rft.isbn=1558604219&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-Schulze-KremerOnto02-19\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-Schulze-KremerOnto02_19-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Schulze-Kremer, S. (2002). \"Ontologies for molecular biology and bioinformatics\". <i>In Silico Biology<\/i> <b>2<\/b> (3): 179\u201393. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/12542404\" data-key=\"3d8fcfdf68caaeecbb001cfd6f42ca7c\">12542404<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Ontologies+for+molecular+biology+and+bioinformatics&rft.jtitle=In+Silico+Biology&rft.aulast=Schulze-Kremer%2C+S.&rft.au=Schulze-Kremer%2C+S.&rft.date=2002&rft.volume=2&rft.issue=3&rft.pages=179%E2%80%9393&rft_id=info:pmid\/12542404&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SchuurmanOnto08-20\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-SchuurmanOnto08_20-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Schuurman, N.; Leszcynski, A. (2008). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2735951\" data-key=\"a240e118316421a7417fde3ad76ea00a\">\"Ontologies for bioinformatics\"<\/a>. <i>Bioinformatics and Biology Insights<\/i> <b>2<\/b>: 187\u2013200. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC2735951\/\" data-key=\"7a2923f50df7317285bc179f82e24000\">PMC2735951<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/19812775\" data-key=\"66daa7e4dd356b336d188305cee5e58e\">19812775<\/a><span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2735951\" data-key=\"a240e118316421a7417fde3ad76ea00a\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2735951<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Ontologies+for+bioinformatics&rft.jtitle=Bioinformatics+and+Biology+Insights&rft.aulast=Schuurman%2C+N.%3B+Leszcynski%2C+A.&rft.au=Schuurman%2C+N.%3B+Leszcynski%2C+A.&rft.date=2008&rft.volume=2&rft.pages=187%E2%80%94200&rft_id=info:pmc\/PMC2735951&rft_id=info:pmid\/19812775&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC2735951&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-GOCGene00-21\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-GOCGene00_21-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">The Gene Ontology Consortium, Ashburner, M.; Ball, C.A. et al. (2000). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3037419\" data-key=\"49448e34cb09513c0519502f6294d617\">\"Gene ontology: Tool for the unification of biology\"<\/a>. <i>Nature Genetics<\/i> <b>25<\/b> (1): 25\u20139. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1038%2F75556\" data-key=\"14b282d31d13538dd4b9d0844a3be11f\">10.1038\/75556<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3037419\/\" data-key=\"75a84bde67690fe2ff39f00f109fe6f6\">PMC3037419<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/10802651\" data-key=\"8789dc73118e4e112a6429e80e55dceb\">10802651<\/a><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3037419\" data-key=\"49448e34cb09513c0519502f6294d617\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3037419<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Gene+ontology%3A+Tool+for+the+unification+of+biology&rft.jtitle=Nature+Genetics&rft.aulast=The+Gene+Ontology+Consortium%2C+Ashburner%2C+M.%3B+Ball%2C+C.A.+et+al.&rft.au=The+Gene+Ontology+Consortium%2C+Ashburner%2C+M.%3B+Ball%2C+C.A.+et+al.&rft.date=2000&rft.volume=25&rft.issue=1&rft.pages=25%E2%80%939&rft_id=info:doi\/10.1038%2F75556&rft_id=info:pmc\/PMC3037419&rft_id=info:pmid\/10802651&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3037419&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-RabattuMyCorporis15-22\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-RabattuMyCorporis15_22-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Rabattu, P.Y.; Mass\u00e9, B.; Ulliana, F. et al. (2015). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4582726\" data-key=\"2daa63405376c7430d94d8dae9405b29\">\"My Corporis Fabrica Embryo: An ontology-based 3D spatio-temporal modeling of human embryo development\"<\/a>. <i>Journal of Biomedical Semantics<\/i> <b>6<\/b>: 36. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1186%2Fs13326-015-0034-0\" data-key=\"5a0f70570103d62fcf51696c7d63af6e\">10.1186\/s13326-015-0034-0<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC4582726\/\" data-key=\"7075dae9ab5154315c1933e3df9daab9\">PMC4582726<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/26413258\" data-key=\"f17641e572e004ef1abb445f9887bb59\">26413258<\/a><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4582726\" data-key=\"2daa63405376c7430d94d8dae9405b29\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4582726<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=My+Corporis+Fabrica+Embryo%3A+An+ontology-based+3D+spatio-temporal+modeling+of+human+embryo+development&rft.jtitle=Journal+of+Biomedical+Semantics&rft.aulast=Rabattu%2C+P.Y.%3B+Mass%C3%A9%2C+B.%3B+Ulliana%2C+F.+et+al.&rft.au=Rabattu%2C+P.Y.%3B+Mass%C3%A9%2C+B.%3B+Ulliana%2C+F.+et+al.&rft.date=2015&rft.volume=6&rft.pages=36&rft_id=info:doi\/10.1186%2Fs13326-015-0034-0&rft_id=info:pmc\/PMC4582726&rft_id=info:pmid\/26413258&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC4582726&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SmithTheOBO07-23\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-SmithTheOBO07_23-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Smith, B.; Ashburner, M.; Rosse, C. et al. (2007). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2814061\" data-key=\"fe2836b6dedc2f6ad6b4ab3e0935834d\">\"The OBO Foundry: Coordinated evolution of ontologies to support biomedical data integration\"<\/a>. <i>Nature Biotechnology<\/i> <b>25<\/b> (11): 1251\u20135. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1038%2Fnbt1346\" data-key=\"e7d48850f61ee2d1b8e809b50a62576f\">10.1038\/nbt1346<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC2814061\/\" data-key=\"749cce53281afcb42350248630fd6a61\">PMC2814061<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/17989687\" data-key=\"07c7fafba153d1cf0e415e1ab5c663a8\">17989687<\/a><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2814061\" data-key=\"fe2836b6dedc2f6ad6b4ab3e0935834d\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2814061<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=The+OBO+Foundry%3A+Coordinated+evolution+of+ontologies+to+support+biomedical+data+integration&rft.jtitle=Nature+Biotechnology&rft.aulast=Smith%2C+B.%3B+Ashburner%2C+M.%3B+Rosse%2C+C.+et+al.&rft.au=Smith%2C+B.%3B+Ashburner%2C+M.%3B+Rosse%2C+C.+et+al.&rft.date=2007&rft.volume=25&rft.issue=11&rft.pages=1251%E2%80%935&rft_id=info:doi\/10.1038%2Fnbt1346&rft_id=info:pmc\/PMC2814061&rft_id=info:pmid\/17989687&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC2814061&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BelleauBio2RDF-24\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-BelleauBio2RDF_24-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Belleau, F.; Nolin, M.A.; Tourigny, N. et al. (2008). \"Bio2RDF: Towards a mashup to build bioinformatics knowledge systems\". <i>Journal of Biomedical Informatics<\/i> <b>41<\/b> (5): 706\u201316. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1016%2Fj.jbi.2008.03.004\" data-key=\"6257d62ee8253a593e20606ee2dad1d4\">10.1016\/j.jbi.2008.03.004<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/18472304\" data-key=\"21297fdccdc57738b3fb69d4ff98ab23\">18472304<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Bio2RDF%3A+towards+a+mashup+to+build+bioinformatics+knowledge+systems&rft.jtitle=Journal+of+Biomedical+Informatics&rft.aulast=Belleau%2C+F.%3B+Nolin%2C+M.A%3B.+Tourigny%2C+N.+et+al.&rft.au=Belleau%2C+F.%3B+Nolin%2C+M.A%3B.+Tourigny%2C+N.+et+al.&rft.date=2008&rft.volume=41&rft.issue=5&rft.pages=706%E2%80%9316&rft_id=info:doi\/10.1016%2Fj.jbi.2008.03.004&rft_id=info:pmid\/18472304&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HanwellAvo12-25\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-HanwellAvo12_25-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Hanwell, M.D.; Curtis, D.E.; Lonie, D.C. et al. (2012). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3542060\" data-key=\"bfd41b3cff27981bc383dfbc651b1df7\">\"Avogadro: An advanced semantic chemical editor, visualization, and analysis platform\"<\/a>. <i>Journal of Cheminformatics<\/i> <b>4<\/b> (1): 17. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1186%2F1758-2946-4-17\" data-key=\"97138de3fb1676150eca85b7edc982eb\">10.1186\/1758-2946-4-17<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3542060\/\" data-key=\"a243f4fe06dc4ae9037a000536ba993b\">PMC3542060<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/22889332\" data-key=\"733096d01fc2887abaa1711224e57fb2\">22889332<\/a><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3542060\" data-key=\"bfd41b3cff27981bc383dfbc651b1df7\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3542060<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Avogadro%3A+An+advanced+semantic+chemical+editor%2C+visualization%2C+and+analysis+platform&rft.jtitle=Journal+of+Cheminformatics&rft.aulast=Hanwell%2C+M.D.%3B+Curtis%2C+D.E.%3B+Lonie%2C+D.C.+et+al.&rft.au=Hanwell%2C+M.D.%3B+Curtis%2C+D.E.%3B+Lonie%2C+D.C.+et+al.&rft.date=2012&rft.volume=4&rft.issue=1&rft.pages=17&rft_id=info:doi\/10.1186%2F1758-2946-4-17&rft_id=info:pmc\/PMC3542060&rft_id=info:pmid\/22889332&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3542060&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-RysavyDIVE14-26\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-RysavyDIVE14_26-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Rysavy, S.J.; Bromley, D.; Daggett, V. (2014). \"DIVE: A Graph-Based Visual-Analytics Framework for Big Data\". <i>IEEE Computer Graphics and Applications<\/i> <b>34<\/b> (2): 26\u201337. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FMCG.2014.27\" data-key=\"30989dd4914b7a3583340f0aeb78cc22\">10.1109\/MCG.2014.27<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=DIVE%3A+A+Graph-Based+Visual-Analytics+Framework+for+Big+Data&rft.jtitle=IEEE+Computer+Graphics+and+Applications&rft.aulast=Rysavy%2C+S.J.%3B+Bromley%2C+D.%3B+Daggett%2C+V.&rft.au=Rysavy%2C+S.J.%3B+Bromley%2C+D.%3B+Daggett%2C+V.&rft.date=2014&rft.volume=34&rft.issue=2&rft.pages=26%E2%80%9337&rft_id=info:doi\/10.1109%2FMCG.2014.27&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-RzepaCML12-27\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-RzepaCML12_27-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Rzepa, H. (2012). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.xml-cml.org\/\" data-key=\"e125c971a8a7064cafe041dbded3efce\">\"Chemical Markup Language\"<\/a>. CMLC<span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.xml-cml.org\/\" data-key=\"e125c971a8a7064cafe041dbded3efce\">http:\/\/www.xml-cml.org\/<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Chemical+Markup+Language&rft.atitle=&rft.aulast=Rzepa%2C+H.&rft.au=Rzepa%2C+H.&rft.date=2012&rft.pub=CMLC&rft_id=http%3A%2F%2Fwww.xml-cml.org%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HuangTheSPHINX93-28\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-HuangTheSPHINX93_28-0\">28.0<\/a><\/sup> <sup><a href=\"#cite_ref-HuangTheSPHINX93_28-1\">28.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Huang, X.; Alleva, F.; Hsiao-Wuen, H. et al. (1993). \"The SPHINX-II speech recognition system: An overview\". <i>Computer Speech & Language<\/i> <b>7<\/b> (2): 137\u2013148. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1006%2Fcsla.1993.1007\" data-key=\"baf52418c3699b18a3cf4e9ba3805b7d\">10.1006\/csla.1993.1007<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=The+SPHINX-II+speech+recognition+system%3A+An+overview&rft.jtitle=Computer+Speech+%26+Language&rft.aulast=Huang%2C+X.%3B+Alleva%2C+F.%3B+Hsiao-Wuen%2C+H.+et+al.&rft.au=Huang%2C+X.%3B+Alleva%2C+F.%3B+Hsiao-Wuen%2C+H.+et+al.&rft.date=1993&rft.volume=7&rft.issue=2&rft.pages=137%E2%80%93148&rft_id=info:doi\/10.1006%2Fcsla.1993.1007&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-GenestAPlat98-29\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-GenestAPlat98_29-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Genest, D.; Salvat, E. (1998). \"A platform allowing typed nested graphs: How CoGITo became CoGITaNT\". <i>Proceedings from the 1998 International Conference on Conceptual Structures<\/i>: 1154\u201361. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1007%2FBFb0054912\" data-key=\"803204ef46d2b99b08b6396cfaba9310\">10.1007\/BFb0054912<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+platform+allowing+typed+nested+graphs%3A+How+CoGITo+became+CoGITaNT&rft.jtitle=Proceedings+from+the+1998+International+Conference+on+Conceptual+Structures&rft.aulast=Genest%2C+D.%3B+Salvat%2C+E.&rft.au=Genest%2C+D.%3B+Salvat%2C+E.&rft.date=1998&rft.pages=1154%E2%80%9361&rft_id=info:doi\/10.1007%2FBFb0054912&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DennemontUneAss13-30\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-DennemontUneAss13_30-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Dennemont, Y. (2013). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.semanticscholar.org\/paper\/Une-assistance-%C3%A0-l'interaction-3D-en-r%C3%A9alit%C3%A9-par-un-Dennemont\/254289782f5feb44e0a0db19ea2f7661578241a1\" data-key=\"06fb3db4eb8083d7e6d9747d121676b4\">\"Une assistance \u00e0 l'interaction 3D en r\u00e9alit\u00e9 virtuelle par un raisonnement s\u00e9mantique et une conscience du contexte\"<\/a>. Allen Institute for Artificial Intelligence<span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.semanticscholar.org\/paper\/Une-assistance-%C3%A0-l'interaction-3D-en-r%C3%A9alit%C3%A9-par-un-Dennemont\/254289782f5feb44e0a0db19ea2f7661578241a1\" data-key=\"06fb3db4eb8083d7e6d9747d121676b4\">https:\/\/www.semanticscholar.org\/paper\/Une-assistance-%C3%A0-l'interaction-3D-en-r%C3%A9alit%C3%A9-par-un-Dennemont\/254289782f5feb44e0a0db19ea2f7661578241a1<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Une+assistance+%C3%A0+l%27interaction+3D+en+r%C3%A9alit%C3%A9+virtuelle+par+un+raisonnement+s%C3%A9mantique+et+une+conscience+du+contexte&rft.atitle=&rft.aulast=Dennemont%2C+Y.&rft.au=Dennemont%2C+Y.&rft.date=2013&rft.pub=Allen+Institute+for+Artificial+Intelligence&rft_id=https%3A%2F%2Fwww.semanticscholar.org%2Fpaper%2FUne-assistance-%25C3%25A0-l%27interaction-3D-en-r%25C3%25A9alit%25C3%25A9-par-un-Dennemont%2F254289782f5feb44e0a0db19ea2f7661578241a1&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-AbrahamGROMACS15-31\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-AbrahamGROMACS15_31-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Abraham, M.J.; Murtola, T.; Schulz, R. et al. (2015). \"GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers\". <i>SoftwareX<\/i> <b>1\u20132<\/b>: 19\u201325. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1016%2Fj.softx.2015.06.001\" data-key=\"80259c4e5c997860750c60dce803e852\">10.1016\/j.softx.2015.06.001<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=GROMACS%3A+High+performance+molecular+simulations+through+multi-level+parallelism+from+laptops+to+supercomputers&rft.jtitle=SoftwareX&rft.aulast=Abraham%2C+M.J.%3B+Murtola%2C+T.%3B+Schulz%2C+R.+et+al.&rft.au=Abraham%2C+M.J.%3B+Murtola%2C+T.%3B+Schulz%2C+R.+et+al.&rft.date=2015&rft.volume=1%E2%80%932&rft.pages=19%E2%80%9325&rft_id=info:doi\/10.1016%2Fj.softx.2015.06.001&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DrorBiomol12-32\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-DrorBiomol12_32-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Dror, R.O.; Dirks, R.M.; Grossman, J.P. et al. (2012). \"Biomolecular simulation: A computational microscope for molecular biology\". <i>Annual Review of Biophysics<\/i> <b>41<\/b>: 429\u201352. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1146%2Fannurev-biophys-042910-155245\" data-key=\"8ac6b849435c25408349fa8eaf335c6d\">10.1146\/annurev-biophys-042910-155245<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/22577825\" data-key=\"87178ed1ed0408f58e31ed6cc1e8354e\">22577825<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Biomolecular+simulation%3A+A+computational+microscope+for+molecular+biology&rft.jtitle=Annual+Review+of+Biophysics&rft.aulast=Dror%2C+R.O.%3B+Dirks%2C+R.M.%3B+Grossman%2C+J.P.+et+al.&rft.au=Dror%2C+R.O.%3B+Dirks%2C+R.M.%3B+Grossman%2C+J.P.+et+al.&rft.date=2012&rft.volume=41&rft.pages=429%E2%80%9352&rft_id=info:doi\/10.1146%2Fannurev-biophys-042910-155245&rft_id=info:pmid\/22577825&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-AnnettHier03-33\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-AnnettHier03_33-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Annett, J. (2003). \"Hierarchical Task Analysis\". In Hollnagel, E.. <i>Handbook of Cognitive Task Design<\/i>. <b>1<\/b> (1st ed.). pp. 17\u201335. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" data-key=\"f64947ba21e884434bd70e8d9e60bae6\">ISBN<\/a> 9780805840032.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Hierarchical+Task+Analysis&rft.atitle=Handbook+of+Cognitive+Task+Design&rft.aulast=Annett%2C+J.&rft.au=Annett%2C+J.&rft.date=2003&rft.volume=1&rft.pages=pp.%26nbsp%3B17%E2%80%9335&rft.edition=1st&rft.isbn=9780805840032&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ChandlerImmersive15-34\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ChandlerImmersive15_34-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Chandler, T.; Cordell, M.; Czaudema, T. et al. (2015). \"Immersive Analytics\". <i>Proceedings of Big Data Visual Analytics 2015<\/i> <b>1<\/b>: 1\u20138. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FBDVA.2015.7314296\" data-key=\"7804444fa7bde09a5e31435c2552c154\">10.1109\/BDVA.2015.7314296<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Immersive+Analytics&rft.jtitle=Proceedings+of+Big+Data+Visual+Analytics+2015&rft.aulast=Chandler%2C+T.%3B+Cordell%2C+M.%3B+Czaudema%2C+T.+et+al.&rft.au=Chandler%2C+T.%3B+Cordell%2C+M.%3B+Czaudema%2C+T.+et+al.&rft.date=2015&rft.volume=1&rft.pages=1%E2%80%938&rft_id=info:doi\/10.1109%2FBDVA.2015.7314296&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-Sommer3D17-35\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-Sommer3D17_35-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Sommer, B.; Barnes, D.G.; Boyd, S. et al. (2017). \"3D-Stereoscopic Immersive Analytics Projects at Monash University and University of Konstanz\". <i>Proceedings of Electronic Imaging, Stereoscopic Displays and Applications XXVIII<\/i>: 179\u201387, 189. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.2352%2FISSN.2470-1173.2017.5.SDA-109\" data-key=\"fd7c676c20e4ea195cac7761cdca550e\">10.2352\/ISSN.2470-1173.2017.5.SDA-109<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=3D-Stereoscopic+Immersive+Analytics+Projects+at+Monash+University+and+University+of+Konstanz&rft.jtitle=Proceedings+of+Electronic+Imaging%2C+Stereoscopic+Displays+and+Applications+XXVIII&rft.aulast=Sommer%2C+B.%3B+Barnes%2C+D.G.%3B+Boyd%2C+S.+et+al.&rft.au=Sommer%2C+B.%3B+Barnes%2C+D.G.%3B+Boyd%2C+S.+et+al.&rft.date=2017&rft.pages=179%E2%80%9387%2C+189&rft_id=info:doi\/10.2352%2FISSN.2470-1173.2017.5.SDA-109&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-WiebuschDecoup15-36\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-WiebuschDecoup15_36-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Wiebusch, D.; Latoschik, M.E. (2015). \"Decoupling the entity-component-system pattern using semantic traits for reusable realtime interactive systems\". <i>IEEE 8th Workshop on Software Engineering and Architectures for Realtime Interactive Systems<\/i>: 25\u201332. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FSEARIS.2015.7854098\" data-key=\"f33abdd84f437fe6b605b618804e1ead\">10.1109\/SEARIS.2015.7854098<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Decoupling+the+entity-component-system+pattern+using+semantic+traits+for+reusable+realtime+interactive+systems&rft.jtitle=IEEE+8th+Workshop+on+Software+Engineering+and+Architectures+for+Realtime+Interactive+Systems&rft.aulast=Wiebusch%2C+D.%3B+Latoschik%2C+M.E.&rft.au=Wiebusch%2C+D.%3B+Latoschik%2C+M.E.&rft.date=2015&rft.pages=25%E2%80%9332&rft_id=info:doi\/10.1109%2FSEARIS.2015.7854098&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-GuttierezSemantics05-37\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-GuttierezSemantics05_37-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Gutierrez, M.; Vexo, F.; Thalmann, D. (2005). \"Semantics-based representation of virtual environments\". <i>International Journal of Computer Applications in Technology<\/i> <b>23<\/b> (2\u20134): 229\u201338. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1504%2FIJCAT.2005.006484\" data-key=\"e7ff71f95176a02417b24b3dbe3fd5e3\">10.1504\/IJCAT.2005.006484<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Semantics-based+representation+of+virtual+environments&rft.jtitle=International+Journal+of+Computer+Applications+in+Technology&rft.aulast=Gutierrez%2C+M.%3B+Vexo%2C+F.%3B+Thalmann%2C+D.&rft.au=Gutierrez%2C+M.%3B+Vexo%2C+F.%3B+Thalmann%2C+D.&rft.date=2005&rft.volume=23&rft.issue=2%E2%80%934&rft.pages=229%E2%80%9338&rft_id=info:doi\/10.1504%2FIJCAT.2005.006484&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DoutreligneUnity14-38\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-DoutreligneUnity14_38-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Doutreligne, S.; Cragnolimi, T.; Pasquali, S. et al. (2014). \"UnityMol: Interactive scientific visualization for integrative biology\". <i>IEEE 4th Symposium on Large Data Analysis and Visualization<\/i>: 109\u201310. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FLDAV.2014.7013213\" data-key=\"5b7e727d883655fcac605eebae993025\">10.1109\/LDAV.2014.7013213<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=UnityMol%3A+Interactive+scientific+visualization+for+integrative+biology&rft.jtitle=IEEE+4th+Symposium+on+Large+Data+Analysis+and+Visualization&rft.aulast=Doutreligne%2C+S.%3B+Cragnolimi%2C+T.%3B+Pasquali%2C+S.+et+al.&rft.au=Doutreligne%2C+S.%3B+Cragnolimi%2C+T.%3B+Pasquali%2C+S.+et+al.&rft.date=2014&rft.pages=109%E2%80%9310&rft_id=info:doi\/10.1109%2FLDAV.2014.7013213&rfr_id=info:sid\/en.wikipedia.org:Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<\/ol><\/div>\n<h2><span class=\"mw-headline\" id=\"Notes\">Notes<\/span><\/h2>\n<p>This presentation is faithful to the original, with only a few minor changes to presentation. Some grammar and punctuation was cleaned up to improve readability. In some cases important information was missing from the references, and that information was added. The original references after 27 were slightly out of order in the original; due to the way this wiki works, references are listed in the order they appear. The original shows a reference [33] for <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/25845770\" data-key=\"75b44910bd31ca34f53e3be70871850e\">Perilla <i>et al.<\/i><\/a>, but no inline citation for 33 exists anywhere in the text; it has been omitted for this version. Footnotes were turned into inline URLs. 
Figures 5 and 9 are shown in the original, but no reference was made to them in the text; a presumption was made as to where to place the inline reference for each figure in this version. Nothing else was changed, in accordance with the NoDerivatives portion of the license.\n<\/p>\n<\/div><div class=\"printfooter\">Source: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data\">https:\/\/www.limswiki.org\/index.php\/Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data<\/a><\/div>\n\t\t\t\t\t\t\t\t\t\t<!-- end content -->\n\t\t\t\t\t\t\t\t\t\t<div class=\"visualClear\"><\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t\t<!-- end of the left (by default at least) column -->\n\t\t<div 
class=\"visualClear\"><\/div>\n\t\t\t\t\t\n\t\t<\/div>\n\t\t\n\n<\/body>","6ee24d5f7bd1af8e24033922d437ffd0_images":["https:\/\/www.limswiki.org\/images\/4\/41\/Fig1_Trellet_JOfIntegBioinfo2018_15-2.jpg","https:\/\/www.limswiki.org\/images\/d\/df\/Fig2_Trellet_JOfIntegBioinfo2018_15-2.jpg","https:\/\/www.limswiki.org\/images\/a\/af\/Fig3_Trellet_JOfIntegBioinfo2018_15-2.jpg","https:\/\/www.limswiki.org\/images\/b\/bb\/Fig4_Trellet_JOfIntegBioinfo2018_15-2.jpg","https:\/\/www.limswiki.org\/images\/1\/13\/Fig5_Trellet_JOfIntegBioinfo2018_15-2.jpg","https:\/\/www.limswiki.org\/images\/1\/1d\/Fig6_Trellet_JOfIntegBioinfo2018_15-2.jpg","https:\/\/www.limswiki.org\/images\/c\/c0\/Fig7_Trellet_JOfIntegBioinfo2018_15-2.jpg","https:\/\/www.limswiki.org\/images\/e\/e3\/Fig8_Trellet_JOfIntegBioinfo2018_15-2.jpg","https:\/\/www.limswiki.org\/images\/e\/e0\/Fig9_Trellet_JOfIntegBioinfo2018_15-2.jpg","https:\/\/www.limswiki.org\/images\/0\/03\/Fig10_Trellet_JOfIntegBioinfo2018_15-2.jpg"],"6ee24d5f7bd1af8e24033922d437ffd0_timestamp":1554145014,"804be563fdd6e10a6921069440e3e962_type":"article","804be563fdd6e10a6921069440e3e962_title":"A view of programming scalable data analysis: From clouds to exascale (Talia 2019)","804be563fdd6e10a6921069440e3e962_url":"https:\/\/www.limswiki.org\/index.php\/Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale","804be563fdd6e10a6921069440e3e962_plaintext":"\n\n\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\n\t\t\t\tJournal:A view of programming scalable data analysis: From clouds to exascale\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\tFrom LIMSWiki\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\tJump to: navigation, search\n\n\t\t\t\t\t\n\t\t\t\t\tFull article title\n \nA view of programming scalable data analysis: From clouds to exascaleJournal\n \nJournal of Cloud ComputingAuthor(s)\n \nTalia, DomenicoAuthor affiliation(s)\n \nDIMES at Universit\u00e0 della CalabriaPrimary contact\n \nEmail: talia at dimes dot unical dot itYear published\n 
\n2019\n\nVolume and issue\n \n8\n\nPage(s)\n \n4\n\nDOI\n \n10.1186\/s13677-019-0127-x\n\nISSN\n \n2192-113X\n\nDistribution license\n \nCreative Commons Attribution 4.0 International\n\nWebsite\n \nhttps:\/\/link.springer.com\/article\/10.1186\/s13677-019-0127-x\n\nDownload\n \nhttps:\/\/link.springer.com\/content\/pdf\/10.1186%2Fs13677-019-0127-x.pdf (PDF)\n\n\n\n\n \n This article contains rendered mathematical formulae. You may require the Math Anywhere plugin for Chrome or the Native MathML add-on and fonts for Firefox if they don't render properly for you. \n\n\nContents\n\n1 Abstract \n2 Introduction \n3 Data analysis on cloud computing platforms \n\n3.1 Cloud-based data analysis tools \n\n\n4 Exascale and big data analysis \n\n4.1 Extreme data sources and scientific computing \n4.2 Programming model features for exascale data analysis \n\n\n5 Exascale programming systems \n6 Exascale programming systems comparison \n\n6.1 Summary of data security issues \n\n\n7 Requirements of exascale runtime for data analysis \n8 Concluding remarks and future work \n9 Abbreviations \n10 Appendix \n\n10.1 Scalability in parallel systems \n\n\n11 Acknowledgements \n\n11.1 Funding \n11.2 Availability of data and materials \n11.3 Authors\u2019 contributions \n11.4 Competing interests \n\n\n12 References \n13 Notes \n\n\n\nAbstract \nScalability is a key feature for big data analysis and machine learning frameworks and for applications that need to analyze very large and real-time data available from data repositories, social media, sensor networks, smartphones, and the internet. Scalable big data analysis today can be achieved by parallel implementations that are able to exploit the computing and storage facilities of high-performance computing (HPC) systems and cloud computing systems, whereas in the near future exascale systems will be used to implement extreme-scale data analysis. 
This article discusses how cloud computing currently supports the development of scalable data mining solutions, as well as the main challenges that must be addressed and solved to implement innovative data analysis applications on exascale systems.\nKeywords: big data analysis, cloud computing, exascale computing, data mining, parallel programming, scalability\n\nIntroduction \nSolving problems in science and engineering was the first motivation for inventing computers. Many years later, computer science remains the main area in which innovative solutions and technologies are being developed and applied. Due in part to the extraordinary advancement of computer technology, data are nowadays generated as never before. In fact, the amount of structured and unstructured digital data is going to increase beyond any estimate. Databases, file systems, data streams, social media, and data repositories are increasingly pervasive and decentralized.\nAs the data scale increases, we must address new challenges and attack ever-larger problems. New discoveries will be achieved and more accurate investigations can be carried out due to the increasingly widespread availability of large amounts of data. Scientific sectors that fail to make full use of the volume of digital data available today risk losing out on the significant opportunities that big data can offer.\nTo benefit from big data availability, specialists and researchers need advanced data analysis tools and applications running on scalable architectures that allow for the extraction of useful knowledge from such huge data sources. High-performance computing (HPC) systems and cloud computing systems today are capable platforms for addressing both the computational and data storage needs of big data mining and parallel knowledge discovery applications. 
These computing architectures are needed to run data analysis because complex data mining tasks involve data- and compute-intensive algorithms that require large, reliable, and effective storage facilities together with high-performance processors to obtain results in a timely fashion.\nNow that data sources have become pervasively huge, reliable and effective programming tools and applications for data analysis are needed to extract value and find useful insights in them. New ways to correctly and proficiently compose different distributed models and paradigms are required, and interaction between hardware resources and programming levels must be addressed. Users, professionals, and scientists working in the area of big data need advanced data analysis programming models and tools coupled with scalable architectures to support the extraction of useful information from such massive repositories. The scalability of a parallel computing system is a measure of its capacity to reduce program execution time in proportion to the number of its processing elements. (The appendix of this article introduces and discusses in detail scalability in parallel systems.) According to this definition of scalability, scalable data analysis refers to the ability of a hardware\/software parallel system to exploit increasing computing resources effectively in the analysis of (very) large datasets.\nToday, complex analysis of real-world massive data sources requires using high-performance computing systems such as massively parallel machines or clouds. However, in the coming years, as parallel technologies advance, exascale computing systems will be exploited for implementing scalable big data analysis in all areas of science and engineering.[1] To reach this goal, new design and programming challenges must be addressed and solved. 
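The scalability definition above is commonly quantified through speedup and efficiency, as detailed in the article's appendix. A minimal sketch follows; the timing values are illustrative only, not measurements from the article:

```python
# Speedup and efficiency: standard measures of parallel scalability.
# T1 is the sequential execution time; Tp the time on p processing elements.

def speedup(t1: float, tp: float) -> float:
    """Speedup S(p) = T1 / Tp."""
    return t1 / tp

def efficiency(t1: float, tp: float, p: int) -> float:
    """Efficiency E(p) = S(p) / p; 1.0 means perfectly linear scaling."""
    return speedup(t1, tp) / p

# A hypothetical analysis job taking 1,000 s sequentially and 140 s on 8 cores:
s = speedup(1000.0, 140.0)        # ~7.14
e = efficiency(1000.0, 140.0, 8)  # ~0.89
print(f"speedup={s:.2f}, efficiency={e:.2f}")
```

A system scales well when efficiency stays close to 1.0 as p grows; communication overhead and serial fractions of the algorithm pull it down.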
As such, the focus of this paper is on discussing current cloud-based design and programming solutions for data analysis and suggesting new programming requirements and approaches to be conceived for meeting big data analysis challenges on future exascale platforms.\nCurrent cloud computing platforms and parallel computing systems represent two different technological solutions for addressing the computational and data storage needs of big data mining and parallel knowledge discovery applications. Indeed, parallel machines offer high-end processors with the main goal of supporting HPC applications, whereas cloud systems implement a computing model in which dynamically scalable virtualized resources are provided to users and developers as a service over the internet. In fact, clouds do not mainly target HPC applications; they represent scalable computing and storage delivery platforms that can be adapted to the needs of different classes of people and organizations by exploiting a service-oriented architecture (SOA) approach. Clouds offer large facilities to many users who would otherwise be unable to own their own parallel\/distributed computing systems for running applications and services. In particular, big data analysis applications that require accessing and manipulating very large datasets with complex mining algorithms will significantly benefit from the use of cloud platforms.\nAlthough not many cloud-based data analysis frameworks are available today for end users, within a few years they will become common.[2] Some current solutions are based on open-source systems, such as Apache Hadoop and Mahout, Spark, and SciDB, while others are proprietary solutions provided by companies such as Google, Microsoft, EMC, Amazon, BigML, Splunk Hunk, and InsightsOne. 
As more such platforms emerge, researchers and professionals will port increasingly powerful data mining programming tools and frameworks to the cloud to exploit complex and flexible software models such as the distributed workflow paradigm. The growing utilization of the service-oriented computing model could accelerate this trend.\nFrom the definition of the term \"big data,\" which refers to datasets so large and complex that traditional hardware and software data processing solutions are inadequate to manage and analyze, we can infer that conventional computer systems are not powerful enough to process and mine big data[3], nor are they able to scale with the size of problems to be solved. As mentioned before, to cope with the limits of sequential machines, advanced systems like HPC, cloud computing, and even more scalable architectures are used today to analyze big data. Starting from this scenario, exascale computing systems will represent the next computing step.[4][5] Exascale systems are high-performance computing systems capable of at least one exaFLOPS, so their implementation represents a significant research and technology challenge. Their design and development is currently under investigation with the goal of building by 2020 high-performance computers composed of a very large number of multi-core processors expected to deliver a performance of 10^18 operations per second. Cloud computing systems used today are able to store very large amounts of data; however, they do not provide the high performance expected from massively parallel exascale systems. This is the main motivation for developing exascale systems. Exascale technology will represent the most advanced model of supercomputers. 
They have been conceived for single-site supercomputing centers, not for distributed infrastructures that could use multi-clouds or fog computing systems for decentralizing computing and pervasive data management, and later be interconnected with exascale systems that could be used as a backbone for very large scale data analysis.\nThe development of exascale systems spurs a need to address and solve issues and challenges at both the hardware and software level. Indeed, it requires the design and implementation of novel software tools and runtime systems able to manage a high degree of parallelism, reliability, and data locality in extreme scale computers.[6] New programming constructs and runtime mechanisms are needed that can adapt to the most appropriate parallelism degree and communication decomposition for making data analysis tasks scalable and reliable. Their dependence on parallelism grain size and data analysis task decomposition must be deeply studied. This is needed because parallelism exploitation depends on several features like parallel operations, communication overhead, input data size, I\/O speed, problem size, and hardware configuration. Moreover, reliability and reproducibility are two additional key challenges to be addressed. At the programming level, constructs for handling and recovering from communication, data access, and computing failures must be designed. At the same time, reproducibility in scalable data analysis requires rich information useful for assuring similar results in environments that may dynamically change. 
All these factors must be taken into account in designing data analysis applications and tools that will be scalable on exascale systems.\nMoreover, reliable and effective methods for storing, accessing, and communicating data; intelligent techniques for massive data analysis; and software architectures enabling the scalable extraction of knowledge from data are needed.[3] To reach this goal, models and technologies enabling cloud computing systems and HPC architectures must be extended, adapted, or completely changed to be reliable and scalable on the very large number of processors\/cores that compose extreme scale platforms and to support the implementation of clever data analysis algorithms that ought to be scalable and dynamic in resource usage. Exascale computing infrastructures will play the role of an extraordinary platform for addressing both the computational and data storage needs of big data analysis applications. However, as mentioned before, to have a complete scenario, efforts must be made to implement big data analytics algorithms, architectures, programming tools, and applications in exascale systems.[7]\nIf this objective is pursued, within a few years scalable data access and analysis systems will become the most used platforms for big data analytics on large-scale clouds. In the long term, new exascale computing infrastructures will appear as viable platforms for big data analytics in the coming decades, and data mining algorithms, tools, and applications will be ported to such platforms for implementing extreme data discovery solutions.\nIn this paper we first discuss cloud-based scalable data mining and machine learning solutions, then we examine the main research issues that must be addressed for implementing massively parallel data mining applications on exascale computing systems. Data-related issues are discussed together with communication, multi-processing, and programming issues. 
We then introduce issues and systems for scalable data analysis on clouds, followed by design and programming issues for big data analysis in exascale systems. We close by outlining some open design challenges.\n\nData analysis on cloud computing platforms \nCloud computing platforms implement elastic services, scalable performance, and scalable data storage used by a large and ever-increasing number of users and applications.[8][9] In fact, cloud platforms have enlarged the arena of distributed computing systems by providing advanced internet services that complement and complete the functionalities of distributed computing provided by the internet, grid systems, and peer-to-peer networks. In particular, most cloud computing applications use big data repositories stored within the cloud itself, so in those cases large datasets are analyzed with low latency to effectively extract data analysis models.\n\"Big data\" is a new and overused term that refers to massive, heterogeneous, and often unstructured digital content that is difficult to process using traditional data management tools and techniques. The term includes the complexity and variety of data and data types, real-time data collection and processing needs, and the value that can be obtained by smart analytics. However, we should recognize that data are not necessarily important per se, but they become very important if we are able to extract value from them\u2014that is, if we can exploit them to make discoveries. The extraction of useful knowledge from big digital datasets requires smart and scalable analytics algorithms, services, programming tools, and applications. All these tools require insights into big data to make them more useful for people.\nThe growing use of service-oriented computing is accelerating the use of cloud-based systems for scalable big data analysis. 
Developers and researchers are adopting the three main cloud models\u2014software as a service (SaaS), platform as a service (PaaS), and infrastructure as a service (IaaS)\u2014to implement big data analytics solutions in the cloud.[10][11] According to a specialization of these three models, data analysis tasks and applications can be offered as services at the software, platform, or infrastructure level and made available anytime, from anywhere. A methodology for implementing them defines a new model stack to deliver data analysis solutions that is a specialization of the XaaS (everything as a service) stack and is called \"data analysis as a service\" (DAaaS). It adapts and specifies the three general service models (SaaS, PaaS, and IaaS) for supporting the structured development of big data analysis systems, tools, and applications according to a service-oriented approach. The DAaaS methodology is then based on the three basic models for delivering data analysis services at different levels as described here (see also Fig. 1):\n\n Data analysis infrastructure as a service (DAIaaS): This model provides a set of hardware\/software virtualized resources that developers can assemble and use as an integrated infrastructure in which to store large datasets, run data mining applications, and\/or implement data analytics systems from scratch;\n Data analysis platform as a service (DAPaaS): This model defines a supporting software platform that developers can use for programming and running their data analytics applications or extending existing ones without worrying about the underlying infrastructure or specific distributed architecture issues; and\n Data analysis software as a service (DASaaS): This is a higher-level model that offers end users data mining algorithms, data analysis suites, or ready-to-use knowledge discovery applications as internet services that can be accessed and used directly through a web browser. 
According to this approach, all data analysis software is provided as a service, so that end users do not have to worry about implementation and execution details.\n\n Figure 1. The three models of the DAaaS software methodology. The DAaaS software methodology is based on three basic models for delivering data analysis services at different levels (application, platform, and infrastructure). The DAaaS methodology defines a new model stack to deliver data analysis solutions that is a specialization of the XaaS (everything as a service) stack and is called \"data analysis as a service\" (DAaaS). It adapts and specifies the three general service models (SaaS, PaaS, and IaaS) for supporting the structured development of big data analysis systems, tools, and applications according to a service-oriented approach.\n\nCloud-based data analysis tools \nUsing the DASaaS methodology, we designed a cloud-based system, the Data Mining Cloud Framework (DMCF)[12], which supports three main classes of data analysis and knowledge discovery applications:\n\n Single-task applications, in which a single data mining task such as classification, clustering, or association rules discovery is performed on a given dataset;\n Parameter-sweeping applications, in which a dataset is analyzed by multiple instances of the same data mining algorithm with different parameters; and\n Workflow-based applications, in which knowledge discovery applications are specified as graphs linking together data sources, data mining tools, and data mining models.\nDMCF includes a large variety of processing patterns to express knowledge discovery workflows as graphs whose nodes denote resources (datasets, data analysis tools, mining models) and whose edges denote dependencies among resources. A web-based user interface allows users to compose their applications and submit them for execution to the cloud platform, following the data analysis software as a service approach. 
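The workflow-as-graph model just described (nodes are resources, edges are dependencies) can be illustrated with a generic sketch. The node names and the topological-execution helper below are hypothetical illustrations, not DMCF's actual API:

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# A knowledge discovery workflow as a dependency graph: each key maps a
# resource (dataset, tool, or model) to the resources it depends on,
# mirroring the node/edge structure of a DMCF-style workflow.
workflow = {
    "dataset":      [],
    "partitioner":  ["dataset"],
    "classifier_1": ["partitioner"],
    "classifier_2": ["partitioner"],
    "voter":        ["classifier_1", "classifier_2"],
}

def execution_order(graph):
    """Return one valid execution order that respects all dependencies."""
    return list(TopologicalSorter(graph).static_order())

order = execution_order(workflow)
print(order)
```

A workflow engine would run nodes with no unmet dependencies concurrently; a plain topological sort, as here, yields one valid sequential schedule.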
Visual workflows can be programmed in DMCF through a language called VL4Cloud (Visual Language for Cloud), whereas script-based workflows can be programmed with JS4Cloud (JavaScript for Cloud), a JavaScript-based language for data analysis programming.
Figure 2 shows a sample data mining workflow composed of several sequential and parallel steps; it serves as an example to present the main features of the VL4Cloud programming interface.[12] The example workflow analyzes a dataset using n instances of a classification algorithm, which work on n portions of the training set and generate the same number of knowledge models. Using the n generated models and the test set, n classifiers produce in parallel n classified datasets (n classifications). In the final step of the workflow, a voter generates the final classification by assigning to each data item the class predicted by the majority of the models.

Figure 2. A parallel classification workflow designed with the VL4Cloud programming interface, shown during execution. The workflow implements a parallel classification application. Tasks/services included in square brackets are executed in parallel. The results produced by the classifiers are selected by a voter task that produces the final classification.

Although DMCF has been mainly designed to coordinate coarse-grain data and task parallelism in big data analysis applications by exploiting the workflow paradigm, the DMCF script-based programming interface (JS4Cloud) also allows fine-grain operations in data mining algorithms to be parallelized, since it permits programming any data mining algorithm, such as classification, clustering, and others, in a JavaScript style.
This is possible because loops and data-parallel methods are run in parallel on the virtual machines of a cloud.[13][14]
Like DMCF, other innovative cloud-based systems designed for programming data analysis applications include Apache Spark, Sphere, Swift, Mahout, and CloudFlows, most of which are open source. Apache Spark is an open-source framework developed at the University of California, Berkeley for in-memory data analysis and machine learning.[5] Spark has been designed to run both batch processing and dynamic applications like streaming, interactive queries, and graph analysis. Spark provides developers with a programming interface centered on a data structure called the "resilient distributed dataset" (RDD), which represents a read-only multi-set of data items distributed over a cluster of machines and maintained in a fault-tolerant way. Differently from other systems, including Hadoop, Spark stores data in memory and queries it repeatedly so as to obtain better performance. This feature can be useful for a future implementation of Spark on exascale systems.
Swift is a workflow-based framework for implementing functional data-driven task parallelism in data-intensive applications. The Swift language provides a functional programming paradigm where workflows are designed as a set of calls with associated command-line arguments and input and output files. Swift uses implicit data-driven task parallelism.[15] In fact, it looks like a sequential language, but because it is a dataflow language, all variables are futures, and execution is thus based on data availability. Parallelism can also be exploited through the use of the foreach statement. Swift/T is a new implementation of the Swift language for high-performance computing. In this implementation, a Swift program is translated into an MPI program that uses the Turbine and ADLB runtime libraries for scalable dataflow processing over MPI.
Recently, the porting of Swift/T to large cloud systems for the execution of numerous tasks has been investigated.
Unlike the other frameworks discussed here, DMCF is the only system that offers both a visual and a script-based programming interface. Visual programming is a very convenient design approach for high-level users, such as domain-expert analysts with a limited understanding of programming. On the other hand, script-based workflows are a useful paradigm for expert programmers, who can code complex applications rapidly, more concisely, and with greater flexibility. Finally, the workflow-based model exploited in DMCF and Swift makes these frameworks more general than Spark, which offers a very restricted set of programming patterns (e.g., map, filter, and reduce), thus limiting the variety of data analysis applications that can be implemented with it.
These and other related systems are currently used for the development of big data analysis applications on HPC and cloud platforms. However, additional research in this field is needed, along with the development of new models, solutions, and tools.[7][16] Some active and promising research topics are listed here, ordered by importance:
1. Programming models for big data analytics: New abstract programming models and constructs hiding the system complexity are needed for big data analytics tools. The MapReduce model and workflow models are often used in HPC and cloud implementations, but more research effort is needed to develop other scalable, adaptive, general-purpose higher-level models and tools. Research in this area is even more important for exascale systems; in the next section we discuss some of these topics in exascale computing.
2.
Reliability in scalable data analysis: As the number of processing elements increases, the reliability of systems and applications decreases, and mechanisms for detecting and handling hardware and software faults are therefore needed. Although Fekete et al.[17] have proven that no reliable communication protocol can tolerate crashes of the processors on which the protocol runs, there are ways in which systems can cope with this impossibility result. Among them, at the programming level it is necessary to design constructs for handling communication, data access, and computing failures and for recovering from them. Programming models, languages, and APIs must provide general and data-oriented mechanisms for failure detection and isolation, preventing an entire application from failing and assuring its completion. Reliability is a much more important issue in the exascale domain, where the number of processing elements is massive and fault occurrence increases, making detection and recovery vital.
3. Application reproducibility: Reproducibility is another open research issue for designers of complex applications running on parallel systems. Reproducibility in scalable data analysis must, for example, deal with data communication, data-parallel manipulation, and dynamic computing environments. Reproducibility demands that current data analysis frameworks (such as those based on MapReduce and on workflows) and future ones, especially those implemented on exascale systems, provide additional information and knowledge on how data are managed, on algorithm characteristics, and on the configuration of software and execution environments.
4. Data and tool integration and openness: Code coordination and data integration are main issues in large-scale applications that use data and computing resources.
Standard formats, data exchange models, and common application programming interfaces (APIs) are needed to support interoperability and ease cooperation among design teams using different data formats and tools.
5. Interoperability of big data analytics frameworks: The service-oriented paradigm allows large-scale distributed applications to run on heterogeneous cloud platforms along with software components developed using different programming languages or tools. Cloud service paradigms must be designed to allow the worldwide integration of multiple data analytics frameworks.

Exascale and big data analysis
As we discussed in the previous sections, data analysis has gained a primary role because of the very large availability of datasets and the continuous advancement of methods and algorithms for finding knowledge in them. Data analysis solutions advance by exploiting the power of data mining and machine learning techniques and are changing several scientific and industrial areas. For example, the amount of data that social media generate daily is impressive and continuous: several hundred terabytes of data, including hundreds of millions of photos, are uploaded daily to Facebook and Twitter.
It is therefore central to design scalable solutions for processing and analyzing such massive datasets. As a general forecast, IDC experts estimate that the data generated worldwide will reach about 45 zettabytes by 2020.[18] This impressive amount of digital data calls for scalable high-performance data analysis solutions. However, today only one-quarter of the available digital data would be a candidate for analysis, and only about five percent of that is actually analyzed. By 2020, the useful percentage could grow to about 35 percent, thanks to data mining technologies.

Extreme data sources and scientific computing
Scalability and performance requirements are challenging conventional data storage, file systems, and database management systems.
The architectures of such systems have reached their limits in handling extremely large processing tasks involving petabytes of data because they were not built to scale beyond a given threshold. New architectures and analytics platform solutions that can process big data for extracting complex predictive and descriptive models have become necessary.[19] Exascale systems, on both the hardware and the software side, can play a key role in supporting solutions to these problems.[1]
An IBM study reports that we are generating around 2.5 exabytes of data per day.[20] Because of this continuous and explosive growth of data, many applications require the use of scalable data analysis platforms. A well-known example is the ATLAS detector at the Large Hadron Collider at CERN in Geneva. The ATLAS infrastructure has a capacity of 200 PB of disk space and 300,000 processor cores, with more than 100 computing centers connected via 10 Gbps links. The data collection rate is massive, and only a portion of the data produced by the collider is stored. Several teams of scientists run complex applications to analyze subsets of those huge volumes of data. This analysis would be impossible without a high-performance infrastructure that supports data storage, communication, and processing. Computational astronomers, too, are collecting and producing increasingly large datasets each year that cannot be stored and processed without scalable infrastructures. Another significant case is the Energy Sciences Network (ESnet), the U.S. Department of Energy's high-performance network managed by Berkeley Lab, which in late 2012 rolled out a 100 gigabits-per-second national network to accommodate the growing scale of scientific data.
Moving from science to society, social data and eHealth are good examples to discuss.
Social networks, such as Facebook and Twitter, have become very popular and are receiving increasing attention from the research community because of the huge amount of user-generated data they hold, which provides valuable information concerning human behavior, habits, and travel. When the volume of data to be analyzed is on the order of terabytes or petabytes (billions of tweets or posts), scalable storage and computing solutions must be used, but no clear solutions exist today for the analysis of exascale datasets. The same occurs in the eHealth domain, where huge amounts of patient data are available and can be used for improving therapies, for forecasting and tracking of health data, and for the management of hospitals and health centers. Very complex data analysis in this area will need novel hardware/software solutions; exascale computing, however, is also promising in scientific fields where scalable storage and databases are not used or required. Examples of scientific disciplines where future exascale computing will be extensively used are quantum chromodynamics, materials simulation, molecular dynamics, materials design, earthquake simulation, subsurface geophysics, climate forecasting, nuclear energy, and combustion. All of these applications require the use of sophisticated models and algorithms to solve complex equation systems that will benefit from the exploitation of exascale systems.

Programming model features for exascale data analysis
Implementing scalable data analysis applications in exascale computing systems is a complex job requiring high-level fine-grain parallel models, appropriate programming constructs, and skills in parallel and distributed programming.
In particular, mechanisms and expertise are needed for expressing task dependencies and inter-task parallelism, for designing synchronization and load-balancing mechanisms, for handling failures, and for properly managing distributed memory and concurrent communication among a very large number of tasks. Moreover, when the target computing infrastructures are heterogeneous and require different libraries and tools for programming applications on them, the programming issues become even more complex. To cope with some of these issues in data-intensive applications, different scalable programming models have been proposed.[21]
Scalable programming models may be categorized by:

i. Their level of abstraction, expressing high-level and low-level programming mechanisms, and
ii. How they allow programmers to develop applications, using visual or script-based formalisms.

Using high-level scalable models, a programmer defines only the high-level logic of an application while hiding the low-level details that are not essential for application design, including infrastructure-dependent execution details. The programmer is assisted in application definition, and application performance depends on the compiler, which analyzes the application code and optimizes its execution on the underlying infrastructure. Low-level scalable models, on the other hand, allow programmers to interact directly with the computing and storage elements composing the underlying infrastructure and thus to define the application's parallelism directly.
Data analysis applications implemented with some frameworks can be programmed through a visual interface, which is a convenient design approach for high-level users, for instance domain-expert analysts with a limited understanding of programming.
In addition, a visual representation of workflows or components intrinsically captures parallelism at the task level, without the need to make parallelism explicit through control structures.[6] Visual-based data analysis is typically implemented by providing workflow-based languages or component-based paradigms (Fig. 3). Dataflow-based approaches, which share the same application structure as workflows, are also used; in dataflow models, however, the grain of parallelism and the size of data items are generally smaller than in workflows. In general, visual programming tools are not very flexible because they often implement a limited set of visual patterns and provide restricted ways to configure them. To address this issue, some visual languages allow users to customize the behavior of patterns by adding code that specifies the operations a pattern executes when an event occurs.

Figure 3. Main visual and script-based programming models used today for data analysis programming

On the other hand, code-based (or script-based) formalisms allow users to program complex applications more rapidly, more concisely, and with higher flexibility.[13] Script-based applications can be designed in different ways (see Fig. 3):

 Use a complete language or a language extension that allows parallelism to be expressed in applications, following a general-purpose or a domain-specific approach. This approach requires the design and implementation of a new parallel programming language or of a complete set of data types and parallel constructs to be fully inserted in an existing language.
 Use annotations in the application code that allow the compiler to identify which instructions will be executed in parallel.
According to this approach, parallel statements are separated from sequential constructs and are clearly identified in the program code because they are denoted by special symbols or keywords.
 Use a library in the application code that adds parallelism to the data analysis application. Currently this is the most-used approach, since it is orthogonal to the host language. MPI and MapReduce are two well-known examples of this approach.

Given the variety of data analysis applications and classes of users (from skilled programmers to end users) that can be envisioned for future exascale systems, there is a need for scalable programming models with different levels of abstraction (high-level and low-level) and different design formalisms (visual and script-based), according to the classification outlined above.
As we discussed, data-intensive applications are software programs that have a significant need to process large volumes of data.[22] Such applications devote most of their processing time to running I/O operations and to exchanging and moving data among the processing elements of a parallel computing infrastructure. Parallel processing in data analysis applications typically involves accessing, pre-processing, partitioning, distributing, aggregating, querying, mining, and visualizing data that can be processed independently.
The main challenges for programming data analysis applications on exascale computing systems come from potential scalability; network latency and reliability; reproducibility of data analysis; and the resilience of the mechanisms and operations offered to developers for accessing, exchanging, and managing data.
Indeed, processing extremely large data volumes requires operations and new algorithms able to scale in loading, storing, and processing massive amounts of data that generally must be partitioned into very small data grains, on which thousands to millions of simple parallel operations perform the analysis.

Exascale programming systems
Exascale systems force new requirements on programming systems to target platforms with hundreds of homogeneous and heterogeneous cores. Evolutionary models have recently been proposed for exascale programming that extend or adapt traditional parallel programming models like MPI (e.g., EPiGRAM,[23] which uses a library-based approach, and Open MPI for exascale in the ECP initiative), OpenMP (e.g., OmpSs,[24] which exploits an annotation-based approach, and the SOLLVE project), and MapReduce (e.g., Pig Latin,[25] which implements a domain-specific complete language). These new frameworks limit the communication overhead in message-passing paradigms or limit the synchronization control if a shared-memory model is used.[26]
As exascale systems are likely to be based on large distributed-memory hardware, MPI is one of the most natural programming systems. MPI is currently used on over one million cores, and it is therefore reasonable to expect MPI to be one programming paradigm used on exascale systems. The same holds for MapReduce-based libraries, which today run on very large HPC and cloud systems. Both of these paradigms are widely used for implementing big data analysis applications.
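The MapReduce paradigm mentioned above can be illustrated with a minimal, single-process sketch in plain Python (an illustration of the programming model only, not a distributed implementation): each record is mapped to key-value pairs, the pairs are shuffled by key, and each key's values are reduced. In a real framework, the map and reduce phases run in parallel across many nodes.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Toy MapReduce: map each record to (key, value) pairs,
    shuffle the pairs by key, then reduce each key's values."""
    groups = defaultdict(list)
    for record in records:              # map phase
        for key, value in mapper(record):
            groups[key].append(value)   # shuffle: group values by key
    # reduce phase: one reducer call per distinct key
    return {key: reducer(key, values) for key, values in groups.items()}

# Word count, the canonical MapReduce example
def wc_mapper(line):
    return [(word, 1) for word in line.split()]

def wc_reducer(word, counts):
    return sum(counts)
```

For example, `map_reduce(["a b a", "b c"], wc_mapper, wc_reducer)` yields `{"a": 2, "b": 2, "c": 1}`.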
As expected, general MPI all-to-all communication does not scale well in exascale environments; to solve this issue, new MPI releases have introduced neighbor collectives, which support sparse "all-to-some" communication patterns that restrict data exchange to limited regions of processors.[26]
Ensuring the reliability of exascale systems requires a holistic approach, including several hardware and software technologies for both predicting crashes and keeping systems stable despite failures. In the runtime of parallel APIs (like MPI and MapReduce-based libraries such as Hadoop), a reliable communication layer must be provided if incorrect behavior in case of processor failure is to be mitigated. The lower, unreliable layer is used by implementing on top of it a correct protocol that works safely with every implementation of the unreliable layer, even though that layer cannot tolerate crashes of the processors on which it runs. Concerning MapReduce frameworks, reference [27] reports on an adaptive MapReduce framework called P2P-MapReduce, which has been developed to manage node churn, master node failures, and job recovery in a decentralized way, so as to provide a more reliable MapReduce middleware that can be effectively exploited in dynamic large-scale infrastructures.
On the other hand, new complete languages such as X10,[28] ECL,[29] UPC,[30] Legion,[31] and Chapel[32] have been defined by exploiting a data-centric approach. Furthermore, new APIs based on a revolutionary approach, such as GA[33] and SHMEM,[34] have been implemented according to a library-based model. These novel parallel paradigms are devised to address the requirements of data processing using massive parallelism.
In particular, languages such as X10, UPC, and Chapel and the GA library are based on a partitioned global address space (PGAS) memory model that is suited to implementing data-intensive exascale applications because it uses private data structures and limits the amount of shared data among parallel threads.
Together with different approaches, such as Pig Latin and ECL, these programming models, languages, and APIs must be further investigated, designed, and adapted to provide data-centric scalable programming models useful for supporting the reliable and effective implementation of exascale data analysis applications composed of up to millions of computing units that process small data elements and exchange them with a very limited set of processing elements. PGAS-based models, dataflow and data-driven paradigms, and local-data approaches today represent promising solutions that could be used for exascale data analysis programming. The APGAS model is, for example, implemented in the X10 language, based on the notions of places and asynchrony. A place is an abstraction of shared, mutable data and worker threads operating on the data. A single APGAS computation can consist of hundreds or potentially tens of thousands of places. Asynchrony is implemented by a single block-structured control construct, async. Given a statement ST, the construct async ST executes ST in a separate thread of control. Memory locations in one place can contain references to locations at other places.
To compute upon data at another place, the following statement must be used:

at(p) ST

This allows the task to change its place of execution to p, execute ST at p, and return, leaving behind tasks that may have been spawned during the execution of ST.
Another interesting language based on the PGAS model is Chapel.[32] Its locality mechanisms can be effectively used for scalable data analysis where light data mining (sub-)tasks are run on local processing elements and partial results must be exchanged. Chapel's data locality provides control over where data values are stored and where tasks execute, so that developers can ensure that parallel data analysis computations execute near the variables they access, or vice versa, minimizing communication and synchronization costs. For example, Chapel programmers can specify how domains and arrays are distributed among the system nodes. Another appealing feature in Chapel is the expression of synchronization in a data-centric style: by associating synchronization constructs with data (variables), locality is enforced and data-driven parallelism can easily be expressed, also at large scale. In Chapel, "locales" and "domains" are abstractions for referring to machine resources and for mapping tasks and data to them. Locales are language abstractions for naming a portion of a target architecture (e.g., a GPU, a single core, or a multicore node) that has processing and storage capabilities. A locale specifies where (on which processing node) to execute tasks/statements/operations.
For example, in a system composed of four locales:

const Locs: [4] locale

we can use the following to execute the method Filter(D) on the first locale:

on Locs[0] do Filter(D)

And to execute the Kmeans() algorithm on the four locales, we can use:

forall lc in Locs(i) do on lc do Kmeans()

Whereas locales are used to map tasks to machine nodes, domain maps are used for mapping data to a target architecture.
Here is a simple example of a declaration of a rectangular domain:

const D: domain(2) = {1..n, 1..n}

Domains can also be mapped to locales. Similar concepts (logical regions and mapping interfaces) are used in the Legion programming model.[31][21]

Exascale programming is a strongly evolving research field, and it is not possible to discuss in detail all the programming models, languages, and libraries that are contributing features and mechanisms useful for exascale data analysis application programming. However, the next section introduces, discusses, and classifies current programming systems for exascale computing according to the most-used programming and data management models.

Exascale programming systems comparison
As mentioned, several parallel programming models, languages, and libraries are under development to provide high-level programming interfaces and tools for implementing high-performance applications on future exascale computers. Here we introduce the most significant proposals and discuss their main features. Table 1 lists and classifies the considered systems and summarizes some pros and cons of the different classes.

Table 1.
Exascale programming systems classification

| Programming model | Languages | Libraries/APIs | Pros and cons |
|---|---|---|---|
| Distributed memory | Charm++, Legion, High Performance Fortran (HPF), ECL, PaRSEC | MPI, BSP, Pig Latin, AllScale | Distributed-memory languages/APIs are very close to the exascale hardware model. Systems in this class consider and deal with communication latency; however, data exchange costs are the main source of overhead. Except for AllScale and some MPI versions, systems in this class do not manage network and CPU failures. |
| Shared memory | TBB, Cilk++ | OpenMP, OmpSs | Shared-memory models do not map efficiently onto exascale systems. Extensions have been proposed to improve performance when dealing with synchronization and network failures; no single convincing solution exists so far. |
| Partitioned memory | UPC, Chapel, X10, CAF | GA, SHMEM, DASH, OpenSHMEM, GASPI | The local memory model is very useful, but combining it with global/shared memory mechanisms introduces too much overhead. GASPI is the only system in this class that enables applications to recover from failures. |
| Hybrid models | UPC + MPI, C++/MPI | MPI + OpenMP, Spark-MPI, FLUX, EMPI4Re, DPLASMA | Hybrid models facilitate mapping to the hardware architectures; however, the different programming routines compete for resources, making concurrency and contention harder to control. Resilience mechanisms are harder to implement because of the mixing of different constructs and data models. |

Since exascale systems will be composed of millions of processing nodes, distributed-memory paradigms, and message-passing systems in particular, are candidate tools for programming such systems. In this area, MPI is currently the most used and studied system, and several adaptations of this well-known model are under development, such as Open MPI for Exascale.
Other systems based on distributed-memory programming are Pig Latin, Charm++, Legion, PaRSEC, Bulk Synchronous Parallel (BSP), the AllScale API, and Enterprise Control Language (ECL). Considering just Pig Latin, we notice that some of its parallel operators, such as FILTER, which selects a set of tuples from a relation based on a condition, and SPLIT, which partitions a relation into two or more relations, can be very useful in many highly parallel big data analysis applications.
On the other side are shared-memory models, where the major system is OpenMP. OpenMP offers a simple parallel programming model, although it does not provide mechanisms to explicitly map and control data distribution, and it includes non-scalable synchronization operations that make its implementation on massively parallel systems a challenging prospect. Other programming systems in this area are Threading Building Blocks (TBB), OmpSs, and Cilk++. The OpenMP synchronization model, based on locks and on atomic and sequential sections, limits parallelism exploitation in exascale systems; recent OpenMP implementations are therefore being extended with new techniques and routines that increase asynchronous operations and parallelism exploitation. A similar approach is used in Cilk++, which supports parallel loops and hyperobjects, a new construct designed to solve data race problems created by parallel accesses to global variables: a hyperobject allows multiple tasks to share state without race conditions and without using explicit locks.
As a tradeoff between distributed- and shared-memory organizations, the partitioned global address space (PGAS) model has been designed to implement a global memory address space that is logically partitioned, with portions of it local to single processes. The main goal of the PGAS model is to limit data exchange and isolate failures in very large-scale systems.
Languages and libraries based on PGAS are Unified Parallel C (UPC), Chapel, X10, Global Arrays (GA), Co-Array Fortran (CAF), DASH, and SHMEM. PGAS appears to be well suited for implementing data-intensive exascale applications because it uses private data structures and limits the amount of shared data among parallel threads. Its memory-partitioning model also facilitates failure detection and resilience. Another programming mechanism useful for decentralized data analysis is data synchronization. In the SHMEM library it is implemented through the shmem_barrier operation, which performs a barrier operation on a subset of processing elements and then enables them to go further by sharing synchronized data.
Starting from these three main programming approaches, hybrid systems have been proposed and developed to better map application tasks and data onto the hardware architectures of exascale systems. In hybrid systems that combine distributed and shared memory, message-passing routines are used for data communication and inter-node processing, whereas shared-memory operations are used for exploiting intra-node parallelism. A major example in this area is given by the different MPI + OpenMP systems recently implemented. Hybrid systems have also been designed by combining message-passing models, like MPI, with PGAS models to restrict data communication overhead and improve MPI efficiency in execution time and memory consumption. The PGAS-based MPI implementation EMPI4Re, developed in the EPiGRAM project, is an example of this class of hybrid system.
Associated with the programming model issues is a set of challenges concerning the design of runtime systems, which in exascale computing systems must be tightly integrated with the programming tools level. The main challenges for runtime systems include parallelism exploitation, limited data communication, data dependence management, data-aware task scheduling, processor heterogeneity, and energy efficiency.
However, together with those main issues, runtime systems must address other aspects such as storage\/memory hierarchies, storage and processor heterogeneity, performance adaptability, resource allocation, performance analysis, and performance portability. In addressing those issues, the currently used approaches aim at providing simplified abstractions and machine models that allow algorithm developers and application programmers to generate code that can run and scale on a wide range of exascale computing systems.\nThis is a complex task that can be achieved by exploiting techniques that allow the runtime system to cooperate with the compiler, the libraries, and the operating system to find integrated solutions and make smarter use of hardware resources through efficient ways of mapping the application code to the exascale hardware. Finally, due to the specific features of exascale hardware, runtime systems need to find methods and techniques that bring the computing system closer to the application requirements. Research work in this area is carried out in projects like XPRESS, StarPU, Corvette, DEGAS, libWater[35], Traleika-Glacier, OmpSs[24], SnuCL, D-TEC, SLEEC, PIPER, and X-TUNE, which are proposing innovative solutions for large-scale parallel computing systems that can be used in exascale machines. For instance, a system that aims at integrating the runtime with the language level is OmpSs, which provides mechanisms for data dependence management (based on DAG analysis, as in libWater) and for mapping tasks to computing nodes and handling processor heterogeneity (the target construct). Another issue to be taken into account in the interaction between the programming level and the runtime is performance and scalability monitoring. 
In the StarPU project, for example, performance feedback is provided through task profiling and trace analysis.\nIn large-scale high-performance machines and in exascale systems, the runtime systems are more complex than in traditional parallel computers. In fact, performance and scalability issues must be addressed at the inter-node runtime level and appropriately integrated with intra-node runtime mechanisms.[36] All these issues relate to system and application scalability. In particular, vertical scaling of systems with multicore parallelism within a single node must be addressed. Scalability is still an open issue in exascale systems also because speed-up requirements for system software and runtimes are much higher than in traditional HPC systems, and different portions of code in applications or runtimes can generate performance bottlenecks.\nConcerning application resiliency, the runtime of exascale systems must include mechanisms for restarting tasks and accessing data in case of software or hardware faults without requiring developer involvement. Traditional approaches for providing reliability in HPC include checkpointing and restart (see, for instance, MPI_Checkpoint), reliable data storage (through file and in-memory replication or double buffering), and message logging for minimizing the checkpointing overhead. Whereas the global checkpointing\/restart technique is the most widely used to limit system\/application faults, in the exascale scenario new mechanisms with low overhead and high scalability must be designed. These mechanisms should limit task and data duplication through smart approaches for selective replication. For example, silent data corruption (SDC) is recognized to be a critical problem in exascale computing. Although replication is useful here, its inherent inefficiency must be limited. Research work is carried out in this area to define techniques that limit replication costs while offering protection from SDC. 
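As an illustration of the selective replication idea, the sketch below runs a deterministic task twice and compares the results; a mismatch is treated as suspected silent data corruption and triggers re-execution. This is a minimal sketch only: the `replicated` helper is a hypothetical name, threads stand in for the replicated processing elements of a real system, and a production SDC detector would compare checksums rather than full results.

```python
from concurrent.futures import ThreadPoolExecutor

def replicated(task, *args, retries=1):
    """Run a deterministic task twice in parallel and compare the results.

    A mismatch is treated as suspected silent data corruption (SDC):
    the task is re-executed until two runs agree or the retry budget
    is exhausted. (Hypothetical helper, for illustration only.)
    """
    for _ in range(retries + 1):
        with ThreadPoolExecutor(max_workers=2) as pool:
            a = pool.submit(task, *args)
            b = pool.submit(task, *args)
            ra, rb = a.result(), b.result()
        if ra == rb:           # replicas agree: accept the result
            return ra
    raise RuntimeError("unrecoverable silent data corruption suspected")

# Example: a deterministic reduction protected by duplicate execution.
print(replicated(sum, range(1000)))  # 499500
```

The cost of this naive scheme is the full duplication the text warns about; selective replication would apply it only to tasks whose corruption is most damaging.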
For application\/task checkpointing, instead of checkpointing the entire address space of the application, as occurs in OpenMP and MPI, the minimal task state needed for fault recovery must be identified and checkpointed, thus limiting data size and recovery overhead.\n\nRequirements of exascale runtime for data analysis \nOne of the most important aspects to consider in applications that run on exascale systems and analyze big datasets is the tradeoff between sharing data among processing elements and computing locally to reduce communication and energy costs, while keeping performance and fault-tolerance levels. A scalable programming model founded on basic operations for data intensive\/data-driven applications must include mechanisms and operations for:\n\n parallel data access, which allows increasing data access bandwidth by partitioning data into multiple chunks, according to different methods, and accessing several data elements in parallel to meet high throughput requirements;\n fault resiliency, a major issue as machines expand in size and complexity; on exascale systems with huge numbers of processes, non-local communication must be prepared for a potential failure of one of the communication sides, and runtimes must feature failure-handling mechanisms for recovering from node and communication faults;\n data-driven local communication, which is useful for limiting the data exchange overhead in massively parallel systems composed of many cores; in this case, data availability among neighbor nodes dictates the operations taken by those nodes;\n data processing on limited groups of cores, which allows concentrating data analysis operations involving limited sets of cores and large amounts of data on localities of exascale machines, facilitating a form of data affinity that co-locates related data and computation;\n near-data synchronization, to limit the overhead generated by synchronization mechanisms and protocols that involve several far-away cores 
in keeping data up-to-date;\n in-memory querying and analytics, needed to reduce query response times and the execution of analytics operations by caching large volumes of data in the RAM of the computing nodes and issuing queries and other operations in parallel on the main memory of computing nodes;\n group-level data aggregation in parallel systems, which is useful for efficient summarization, graph traversal, and matrix operations, making it of great importance in programming models for data analysis on massively parallel systems; and\n locality-based data selection and classification, for limiting the latency of basic data analysis operations running in parallel on large-scale machines in such a way that the subsets of data needed together in a given phase are locally available (in a subset of nearby cores).\nA reliable and high-level programming model and its associated runtime must be able to manage and provide implementation solutions for those operations, together with the reliable exploitation of a very large amount of parallelism.\nReal-world big data analysis applications cannot be practically solved on sequential machines. Each large-scale data mining and machine learning software platform under development today in the areas of social data analysis and bioinformatics will certainly benefit from the availability of exascale computing systems. They will also benefit from the use of exascale programming environments that will offer massive and adaptive-grain parallelism, data locality, local communication, and synchronization mechanisms, together with the other features discussed in the previous sections that are needed for reducing execution time and making feasible the solution of new problems and challenges. For example, in bioinformatics applications, parallel data partitioning is a key feature for running statistical analysis or machine learning algorithms on high-performance computing systems. 
After that, clever and complex data mining algorithms must be run on each single core\/node of an exascale machine on subsets of data to produce data models in parallel. When partial models are produced, they can be checked locally and must be merged among nearby processors to obtain, for example, a general model of gene expression correlations or of drug-gene interactions. Therefore, for those applications, data locality, highly parallel correlation algorithms, and limited communication structures are very important to reduce execution time from several days to a few minutes. Moreover, fault tolerance software mechanisms are also useful in long-running bioinformatics applications to avoid restarting them from the beginning when a software\/hardware failure occurs.\nMoving to social media applications, the huge volumes of user-generated data in platforms such as Facebook, Twitter, and Instagram are precious sources from which to extract insights concerning human dynamics and behaviors. In fact, social media analysis is a fast-growing research area that will benefit from the use of exascale computing systems. For example, social media users moving through a sequence of places in a city or a region may create a huge amount of geo-referenced data that includes extensive knowledge about human dynamics and mobility behaviors. A methodology for discovering the behavior and mobility patterns of users from social media posts and tweets includes a set of steps such as collection and pre-processing of geotagged items, organization of the input dataset, data analysis and trajectory mining algorithm execution, and results visualization. In all those data analysis steps, the utilization of scalable programming techniques and tools is vital to obtain practical results in feasible time when massive datasets are analyzed. 
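The partition, local-model, merge pattern described above for the bioinformatics scenario can be sketched in a few lines. In this minimal sketch, `local_model` and `merge` are hypothetical stand-ins for a real mining algorithm and its model-combination step (here: per-partition count and mean, merged as a weighted mean), and threads stand in for the cores\/nodes of an exascale machine.

```python
from concurrent.futures import ThreadPoolExecutor

def local_model(chunk):
    """Produce a 'partial model' on one data partition: (count, mean)."""
    n = len(chunk)
    return n, sum(chunk) / n

def merge(models):
    """Merge partial models into a global model (count-weighted mean)."""
    total = sum(n for n, _ in models)
    return sum(n * m for n, m in models) / total

data = list(range(100))                  # stand-in for a massive dataset
chunks = [data[i::4] for i in range(4)]  # partition across 4 workers
with ThreadPoolExecutor(max_workers=4) as pool:
    partial = list(pool.map(local_model, chunks))  # models built in parallel
print(merge(partial))  # 49.5, the mean of 0..99
```

Real partial models (correlation matrices, frequency tables, ensemble members) merge the same way provided the merge operation is associative, which is exactly what makes the pattern scale with limited communication.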
The exascale programming features and requirements discussed here and in the previous sections will be very useful in social data analysis, particularly for executing parallel tasks like concurrent data acquisition (where data items are collected through parallel queries to different data sources); parallel data filtering and data partitioning through local and in-memory algorithms; and classification, clustering, and association mining algorithms, which are compute-intensive and need a large number of processing elements working asynchronously to produce learning models from billions of posts containing text, photos, and videos. The management and processing of the terabytes of data involved in those applications cannot be done efficiently without solving issues like data locality, near-data processing, large asynchronous execution, and the other similar issues addressed in exascale computing systems.\nTogether with an accurate modeling of basic operations and of the programming languages\/APIs that include them, supporting correct and effective data-intensive applications on exascale systems will also require a significant programming effort from developers when they need to implement complex algorithms and data-driven applications such as those used, for example, in big data analysis and distributed data mining. Parallel and distributed data mining strategies, like collective learning, meta-learning, and ensemble learning, must be devised using fine-grain parallel approaches adapted to exascale computers. Programmers must be able to design and implement scalable algorithms by using the operations sketched above, specifically adapted to those new systems. 
To reach this goal, a coordinated effort between the operation\/language designers and the application developers would be fruitful.\nIn exascale systems, the cost of accessing, moving, and processing data across a parallel system is enormous.[19][7] This requires mechanisms, techniques, and operations for efficient data access, placement, and querying. In addition, scalable operations must be designed in such a way as to avoid global synchronizations, centralized control, and global communications. Many data scientists want to be abstracted away from these tricky, lower-level aspects of HPC at least until they have their code working; afterwards, they can tweak communication and distribution choices in a high-level manner to further tune their code. Interoperability and integration with the MapReduce model and MPI must be investigated, with the main goal of achieving scalability on large-scale data processing.\nDifferent data-driven abstractions can be combined to provide a programming model and an API that allow the reliable and productive programming of very large-scale heterogeneous and distributed memory systems. In order to simplify the development of applications in heterogeneous distributed memory environments, large-scale data parallelism can be exploited on top of the abstraction of n-dimensional arrays subdivided into partitions, so that different array partitions are placed on different cores\/nodes, which will process the array partitions in parallel. This approach allows the computing nodes to process data partitions in parallel at each core\/node using a set of statements\/library calls that hide the complexity of the underlying process. Data dependency in this scenario limits scalability, so it should be avoided or limited to a local scale.\nAbstract data types provided by libraries, so that they can be easily integrated into existing applications, should support this abstraction. 
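The partitioned-array abstraction just described can be sketched as follows, assuming a simple block decomposition by rows; `block_partitions` and `process_block` are illustrative names (not part of any real library), and threads again stand in for the cores\/nodes that would each own one partition.

```python
from concurrent.futures import ThreadPoolExecutor

def block_partitions(rows, nparts):
    """Split an array (given as a list of rows) into contiguous row
    blocks, one block per core/node."""
    step = (len(rows) + nparts - 1) // nparts
    return [rows[i:i + step] for i in range(0, len(rows), step)]

def process_block(block):
    # Placeholder per-partition kernel: sum of squares over the block.
    return sum(x * x for row in block for x in row)

matrix = [[r * 10 + c for c in range(10)] for r in range(8)]  # 8x10 array
parts = block_partitions(matrix, nparts=4)
with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(process_block, parts))  # merge partial results
print(total)  # 167480, i.e., the sum of squares of 0..79
```

Because each block is processed independently and only the small partial results are combined, the pattern has no cross-partition data dependency, which is the property the text identifies as essential for scalability.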
As we mentioned above, another issue is the gap between users with HPC needs and experts with the skills to make the most of these technologies. An appropriate directive-based approach can be to design, implement, and evaluate a compiler framework that allows generic translations from high-level languages to exascale heterogeneous platforms. A programming model should be designed at a level that is higher than that of standards, such as OpenCL, including also checkpointing and fault resiliency. Efforts must be carried out to show the feasibility of transparent checkpointing of exascale programs and quantitatively evaluate the runtime overhead. Approaches like CheCL show that it is also possible to enable transparent checkpoint and restart in high-performance and dependable GPU computing, including support for process migration among different processors such as a CPU and a GPU.\nThe model should enable rapid development with reduced effort for different heterogeneous platforms. These heterogeneous platforms need to include low-energy architectures and mobile devices. The new model should allow a preliminary evaluation of results on the target architectures.\n\nConcluding remarks and future work \nCloud-based solutions for big data analysis tools and systems are in an advanced phase both on the research and the commercial sides. On the other hand, new exascale hardware\/software solutions must be studied and designed to allow the mining of very large-scale datasets on those new platforms.\nExascale systems raise new requirements on application developers and programming systems to target architectures composed of a significantly large number of homogeneous and heterogeneous cores. General issues like energy consumption, multitasking, scheduling, reproducibility, and resiliency must be addressed together with other data-oriented issues like data distribution and mapping, data access, data communication, and synchronization. 
Programming constructs and runtime systems will play a crucial role in enabling future data analysis programming models, runtime models, and hardware platforms to address these challenges, supporting the scalable implementation of real big data analysis applications.\nIn particular, here we summarize a set of open design challenges that are critical for designing exascale programming systems and for their scalable implementation. The following design choices, among others, must be taken into account:\n\n Application reliability: Data analysis programming models must include constructs and\/or mechanisms for handling task and data access failures as well as system recoveries. As data analysis platforms grow ever larger, fully reliable operation can no longer be implicitly assumed, and explicit failure-handling solutions must be proposed.\n Reproducibility requirements: Big data analysis running on massively parallel systems demands reproducibility. New data analysis programming frameworks must collect and generate metadata and provenance information about algorithm characteristics, software configuration, and execution environment for supporting application reproducibility on large-scale computing platforms.\n Communication mechanisms: Novel approaches must be devised for facing network unreliability[17] and network latency, for example by expressing asynchronous data communications and locality-based data exchange\/sharing.\n Communication patterns: A correct paradigm design should include communication patterns allowing application-dependent features and data access models, limiting data movement, and simplifying the burden on exascale runtimes and interconnection.\n Data handling and sharing patterns: Data locality mechanisms\/constructs like near-data computing must be designed and evaluated on big data applications, so that subsets of data needed together are stored in nearby processors, while avoiding that locality constraints are imposed when data must be moved. Other challenges concern data affinity control, data querying (NoSQL approaches), global data distribution, and sharing patterns.\n Data-parallel constructs: Useful models like data-driven\/data-centric constructs, dataflow parallel operations, independent data parallelism, and SPMD patterns must be deeply considered and studied.\n Grain of parallelism: Everything from fine-grain to process-grain parallelism must be analyzed, also in combination with the different degrees of parallelism that the exascale hardware supports. Perhaps different grain sizes should be considered in a single model to address hardware needs and heterogeneity.\nFinally, since big data mining algorithms often require the exchange of raw data or, better, of mining parameters and partial models, to achieve scalability and reliability on thousands of processing elements, metadata-based information, limited-communication programming mechanisms, and partition-based data structures with associated parallel operations must be proposed and implemented.\n\nAbbreviations \nAPGAS: asynchronous partitioned global address space\nBSP: bulk synchronous parallel\nCAF: Co-Array Fortran\nDAaaS: data analysis as a service\nDAIaaS: data analysis infrastructure as a service\nDAPaaS: data analysis platform as a service\nDASaaS: data analysis software as a service\nDMCF: Data Mining Cloud Framework\nECL: Enterprise Control Language\nESnet: Energy Sciences Network\nGA: Global Arrays\nHPC: high-performance computing\nIaaS: infrastructure as a service\nJS4Cloud: JavaScript for Cloud\nPaaS: platform as a service\nPGAS: partitioned global address space\nRDD: resilient distributed dataset\nSaaS: software as a service\nSOA: service-oriented architecture\nTBB: threading building blocks\nVL4Cloud: Visual Language for Cloud\nXaaS: everything as a service\n\nAppendix \nScalability in parallel systems \nParallel computing systems aim at usefully employing all their processing elements during application 
execution. Indeed, only an ideal parallel system can do that fully; real systems fall short because of sequential portions that cannot be parallelized (as Amdahl\u2019s law suggests[37]) and due to several sources of overhead such as sequential operations, communication, synchronization, I\/O and memory access, network speed, I\/O system speed, hardware and software failures, problem size, and program input. All these issues related to the ability of parallel systems to fully exploit their resources are referred to as system or program scalability.[38]\nThe scalability of a parallel computing system is a measure of its capacity to reduce program execution time in proportion to the number of its processing elements. According to this definition, scalable computing refers to the ability of a hardware\/software parallel system to exploit increasing computing resources effectively in the execution of a software application.[39]\nDespite the difficulties that can be faced in the parallel implementation of an application, a framework, or a programming system, a scalable parallel computation can always be made cost-optimal if the number of processing elements, the size of memory, the network bandwidth, and the size of the problem are chosen appropriately.\nFor evaluating and measuring the scalability of a parallel program, some metrics have been defined and are largely used: parallel runtime T(p), speedup S(p), and efficiency E(p). Parallel runtime is the total processing time of the program using p processors (with p\u2009>\u20091). Speedup is the ratio between the total processing time of the program on one processor and the total processing time on p processors: S(p)\u2009=\u2009T(1)\/T(p). Efficiency is the ratio between speedup and the total number of used processors: E(p)\u2009=\u2009S(p)\/p.\nApplication scalability is influenced by the available hardware and software resources, their performance and reliability, and by the sources of overhead discussed before. 
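The metrics just defined can be computed directly; the short sketch below uses illustrative timings (1,000 seconds on one processor, 40 seconds on 32 processors), not measurements from any real system.

```python
def speedup(t1, tp):
    """S(p) = T(1) / T(p): how many times faster than one processor."""
    return t1 / tp

def efficiency(t1, tp, p):
    """E(p) = S(p) / p: fraction of ideal linear speedup achieved."""
    return speedup(t1, tp) / p

# Illustrative timings: T(1) = 1000 s, T(32) = 40 s.
print(speedup(1000, 40))         # 25.0
print(efficiency(1000, 40, 32))  # 0.78125
```

An efficiency well below 1 like this one signals exactly the overheads listed above (communication, synchronization, I\/O, sequential portions) eating into the ideal linear speedup.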
In particular, the scalability of data analysis applications is tightly related to the exploitation of parallelism in data-driven operations and to the overhead generated by data management mechanisms and techniques. Moreover, application scalability also depends on the programmer's ability to design the algorithms, reducing sequential time and exploiting parallel operations. Finally, the designers of programming constructs and the implementers of runtimes also contribute to the exploitation of scalability.[40] All these arguments mean that realizing exascale computing in practice requires that many issues and aspects be taken into account across all the layers of the hardware\/software stack involved in the execution of exascale programs.\nIn addressing parallel system scalability, system dependability must also be tackled. As the number of processors and network interconnections increases\u2014and as tasks, threads, and message exchanges increase\u2014the rate of failures and faults increases too.[41] As discussed in reference[42], the design of scalable parallel systems requires assuring system dependability. Therefore, understanding failure characteristics is a key issue for coupling high performance and reliability in massively parallel systems of exascale size.\n\nAcknowledgements \nFunding \nThis work has been partially funded by the ASPIDE Project, funded by the European Union\u2019s Horizon 2020 research and innovation programme under grant agreement No 801091.\n\nAvailability of data and materials \nData sharing not applicable to this article as no datasets were generated or analyzed during the current study.\n\nAuthors\u2019 contributions \nDT carried out all the work presented in the paper. The author read and approved the final manuscript.\n\nCompeting interests \nThe author declares that he\/she has no competing interests.\n\nReferences \n\n\n\u2191 1.0 1.1 Petcu, D.; Iuhasz, G.; Pop, D. et al. (2015). \"On Processing Extreme Data\". Scalable Computing: Practice and Experience 16 (4). 
doi:10.12694\/scpe.v16i4.1134.   \n\n\u2191 Tardieu, O.; Herta, B.; Cunningham, D. et al. (2016). \"X10 and APGAS at Petascale\". ACM Transactions on Parallel Computing (TOPC) 2 (4): 25. doi:10.1145\/2894746.   \n\n\u2191 3.0 3.1 Talia, D. (2015). \"Making knowledge discovery services scalable on clouds for big data mining\". Proceedings from the Second IEEE International Conference on Spatial Data Mining and Geographical Knowledge Services (ICSDM): 1\u20134. doi:10.1109\/ICSDM.2015.7298015.   \n\n\u2191 Amarasinghe, S.; Campbell, D.; Carlson, W. et al. (14 September 2009). \"ExaScale Software Study: Software Challenges in Extreme Scale Systems\". DARPA IPTO. pp. 153. doi:10.1.1.205.3944. http:\/\/citeseerx.ist.psu.edu\/viewdoc\/summary?doi=10.1.1.205.3944 .   \n\n\u2191 5.0 5.1 Zaharia. M.; Xin, R.S.; Wendell, P. et al. (2016). \"Apache Spark: A unified engine for big data processing\". Communications of the ACM 59 (11): 56\u201365. doi:10.1145\/2934664.   \n\n\u2191 6.0 6.1 Maheshwari, K.; Montagnat, J. (2010). \"Scientific Workflow Development Using Both Visual and Script-Based Representation\". 6th World Congress on Services: 328\u201335. doi:10.1109\/SERVICES.2010.14.   \n\n\u2191 7.0 7.1 7.2 Reed, D.A.; Dongarra, J. (2015). \"Exascale computing and big data\". Communications of the ACM 58 (7): 56\u201368. doi:10.1145\/2699414.   \n\n\u2191 Armbrust, M.; Fox, A.; Griffith, R. et al. (2010). \"A view of cloud computing\". Communications of the ACM 53 (4): 50\u201358. doi:10.1145\/1721654.1721672.   \n\n\u2191 Gu, Y.; Grossman, R.L. (2009). \"Sector and Sphere: The design and implementation of a high-performance data cloud\". Philosophical Transactions, Series A: Mathematical, Physical, and Engineering Sciences 367 (1897): 2429\u201345. doi:10.1098\/rsta.2009.0053. PMC PMC3391065. PMID 19451100. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3391065 .   \n\n\u2191 Talia, D.; Trunfio, P.; Marozzo, F. (2015). 
Data Analysis in the Cloud. Elsevier. pp. 150. ISBN 9780128029145.   \n\n\u2191 Hwang, K. (2017). Cloud Computing for Machine Learning and Cognitive Applications. MIT Press. pp. 624. ISBN 9780262036412.   \n\n\u2191 12.0 12.1 Marozzo, F.; Talia, D.; Trunfio, P. (2013). \"A Cloud Framework for Big Data Analytics Workflows on Azure\". In Catlett, C., Gentzsch, W., Grandinetti, L. et al.. Cloud Computing and Big Data. Advances in Parallel Computing. 23. pp. 182\u201391. doi:10.3233\/978-1-61499-322-3-182. ISBN 9781614993223.   \n\n\u2191 13.0 13.1 Marozzo, F.; Talia, D.; Trunfio, P. (2015). \"JS4Cloud: script\u2010based workflow programming for scalable data analysis on cloud platforms\". Concurrency and Computation: Practice and Experience 27 (17): 5214\u201337. doi:10.1002\/cpe.3563.   \n\n\u2191 Talia, D. (2013). \"Clouds for Scalable Big Data Analytics\". Computer 46 (5): 98\u2013101. doi:10.1109\/MC.2013.162.   \n\n\u2191 Wozniak, J.M.; Wilde, M.; Foster, I.T. (2014). \"Language Features for Scalable Distributed-Memory Dataflow Computing\". Fourth Workshop on Data-Flow Execution Models for Extreme Scale Computing: 50\u201353. doi:10.1109\/DFM.2014.17.   \n\n\u2191 Lucas, R.; Ang, J.; Bergman, K. et al. (10 February 2014). \"Top Ten Exascale Research Challenges\" (PDF). U.S. Department of Energy. pp. 80. https:\/\/science.energy.gov\/~\/media\/ascr\/ascac\/pdf\/meetings\/20140210\/Top10reportFEB14.pdf .   \n\n\u2191 17.0 17.1 Fekete, A.; Lynch, N.; Mansour, Y.; Spinelli, J. (1993). \"The impossibility of implementing reliable communication in the face of crashes\". Journal of the ACM 40 (5): 1087\u20131107. doi:10.1145\/174147.169676.   \n\n\u2191 IDC (April 2014). \"The Digital Universe of Opportunities: Rich Data and the Increasing Value of the Internet of Things\". Dell EMC. https:\/\/www.emc.com\/leadership\/digital-universe\/2014iview\/executive-summary.htm .   \n\n\u2191 19.0 19.1 Chen, J.; Choudhary, A.; Feldman, S. et al. (March 2013). 
\"Synergistic Challenges in Data-Intensive Science and Exascale Computing: DOE ASCAC Data Subcommittee Report\". Department of Energy, Office of Science. https:\/\/www.scholars.northwestern.edu\/en\/publications\/synergistic-challenges-in-data-intensive-science-and-exascale-com .   \n\n\u2191 \"What will we make of this moment?\" (PDF). IBM. 2013. pp. 151. https:\/\/www.ibm.com\/annualreport\/2013\/bin\/assets\/2013_ibm_annual.pdf .   \n\n\u2191 21.0 21.1 Diaz, J.; Mu\u00f1oz-Caro, C.; Ni\u00f1o, A. (2012). \"A Survey of Parallel Programming Models and Tools in the Multi and Many-Core Era\". IEEE Transactions on Parallel and Distributed Systems 23 (8): 1369\u201386. doi:10.1109\/TPDS.2011.308.   \n\n\u2191 Gorton, I.; Greenfield, P.; Szalay, A.; Willimas, R. (2008). \"Data-Intensive Computing in the 21st Century\". Computer 41 (4): 30\u201332. doi:10.1109\/MC.2008.122.   \n\n\u2191 Markidis, S.; Peng, I.B.; Larsson, J. et al. (2016). \"The EPiGRAM Project: Preparing Parallel Programming Models for Exascale\". High Performance Computing - ISC High Performance 2016: 56\u201368. doi:10.1007\/978-3-319-46079-6_5.   \n\n\u2191 24.0 24.1 Fern\u00e1ndez, A.; Beltran, V.; Martorell, X. et al. (2014). \"Task-Based Programming with OmpSs and Its Application\". Euro-Par 2014: Parallel Processing Workshops: 601\u201312. doi:10.1007\/978-3-319-14313-2_51.   \n\n\u2191 Olston, C.; Reed, B.; Srivastava, U. et al. (2008). \"Pig Latin: A not-so-foreign language for data processing\". Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data: 1099\u20131110. doi:10.1145\/1376616.1376726.   \n\n\u2191 26.0 26.1 Gropp, W.; Snir, M. (2013). \"Programming for Exascale Computers\". Computing in Science & Engineering 15 (6): 27\u201335. doi:10.1109\/MCSE.2013.96.   \n\n\u2191 Marozzo, F.; Talia, D.; Trunfio, P. (2012). \"P2P-MapReduce: Parallel data processing in dynamic cloud environments\". Journal of Computer and System Sciences 78 (5): 1382\u20131402. 
doi:10.1016\/j.jcss.2011.12.021.   \n\n\u2191 Tardieu, O.; Herta, B.; Cunningham, D. et al. (2014). \"X10 and APGAS at Petascale\". Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming: 53\u201366. doi:10.1145\/2555243.2555245.   \n\n\u2191 Yoo, A.; Kaplan, I. (2009). \"Evaluating use of data flow systems for large graph analysis\". Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers: 5. doi:10.1145\/1646468.1646473.   \n\n\u2191 Nishtala, R.; Zheng, Y.; Hargrove, P.H. et al. (2011). \"Tuning collective communication for Partitioned Global Address Space programming models\". Parallel Computing 37 (9): 576\u201391. doi:10.1016\/j.parco.2011.05.006.   \n\n\u2191 31.0 31.1 Bauer, M.; Treichler, S.; Slaughter, E.; Aiken, A. (2012). \"Legion: Expressing locality and independence with logical regions\". Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis: 66. https:\/\/dl.acm.org\/citation.cfm?id=2389086 .   \n\n\u2191 32.0 32.1 Chamberlain, B.L.; Callahan, D.; Zima, H.P. (2007). \"Parallel Programmability and the Chapel Language\". The International Journal of High Performance Computing Applications 21 (3): 291\u2013312. doi:10.1177\/1094342007078442.   \n\n\u2191 Nieplocha, J.; Palmer, B.; Tipparaju, V. et al. (2006). \"Advances, Applications and Performance of the Global Arrays Shared Memory Programming Toolkit\". The International Journal of High Performance Computing Applications 20 (2): 203\u201331. doi:10.1177\/1094342006064503.   \n\n\u2191 Meswani, M.R.; Carrington, L.; Snavely, A.; Poole, S. (2012). \"Tools for Benchmarking, Tracing, and Simulating SHMEM Applications\". CUG2012 Final Proceedings: 1\u20136. https:\/\/cug.org\/proceedings\/attendee_program_cug2012\/by_auth.html .   \n\n\u2191 Crasso, I.; Pellagrini, S.; Cosenza, B.; Fahringer, T. (2013). \"LibWater: Heterogeneous distributed computing made easy\". 
Proceedings of the 27th International ACM conference on Supercomputing: 161\u201372. doi:10.1145\/2464996.2465008.   \n\n\u2191 Sarkar, V.; Budimlic, Z.; Kulkani, M. (19 September 2016). \"2014 Runtime Systems Summit. Runtime Systems Report\". U.S. Department of Energy. doi:10.2172\/1341724. https:\/\/www.osti.gov\/biblio\/1341724-runtime-systems-summit-runtime-systems-report .   \n\n\u2191 Amdahl, G.M. (1967). \"Validity of single-processor approach to achieving large-scale computing capability\". Proceedings of AFIPS Conference: 483\u201385.   \n\n\u2191 Bailey, D.H. (1991). \"Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Computers\" (PDF). Supercomputing Review: 54\u201355. https:\/\/crd-legacy.lbl.gov\/~dhbailey\/dhbpapers\/twelve-ways.pdf .   \n\n\u2191 Grama, A.; Karypis, G.; Kumar, V.; Gupta, A. (2003). Introduction to Parallel Computing (2nd ed.). Pearson. pp. 656. ISBN 9780201648652.   \n\n\u2191 Gustafson, J.L. (1988). \"Reevaluating Amdahl's law\". Communications of the ACM 31 (5): 532\u201333. doi:10.1145\/42411.42415.   \n\n\u2191 Shi, J.Y.; Taifi, M.; Pradeep, A. et al. (2012). \"Program Scalability Analysis for HPC Cloud: Applying Amdahl's Law to NAS Benchmarks\". 2012 SC Companion: High Performance Computing, Networking Storage and Analysis: 1215\u20131225. doi:10.1109\/SC.Companion.2012.147.   \n\n\u2191 Schroeder, B.; Gibson, G. (2010). \"A Large-Scale Study of Failures in High-Performance Computing Systems\". IEEE Transactions on Dependable and Secure Computing 7 (4): 337\u201350. doi:10.1109\/TDSC.2009.4.   \n\n\nNotes \nThis presentation is faithful to the original, with only a few minor changes to presentation. Some grammar and punctuation was cleaned up to improve readability. In some cases important information was missing from the references, and that information was added. The original article lists references alphabetically, but this version\u2014by design\u2014lists them in order of appearance. 
The lone footnote was turned into an inline reference.\n\nSource: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\">https:\/\/www.limswiki.org\/index.php\/Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale<\/a>\n
This page was last modified on 27 February 2019, at 01:25. Content is available under a Creative Commons Attribution-ShareAlike 4.0 International License unless otherwise noted.\n\n","804be563fdd6e10a6921069440e3e962_html":"<body class=\"mediawiki ltr sitedir-ltr ns-206 ns-subject page-Journal_A_view_of_programming_scalable_data_analysis_From_clouds_to_exascale skin-monobook action-view\">\n<div id=\"rdp-ebb-globalWrapper\">\n\t\t<div id=\"rdp-ebb-column-content\">\n\t\t\t<div id=\"rdp-ebb-content\" class=\"mw-body\" role=\"main\">\n\t\t\t\t<a id=\"rdp-ebb-top\"><\/a>\n\t\t\t\t<h1 id=\"rdp-ebb-firstHeading\" class=\"firstHeading\" lang=\"en\">Journal:A view of programming scalable data analysis: From clouds to exascale<\/h1>\n\t\t\t\t<div id=\"rdp-ebb-bodyContent\" class=\"mw-body-content\">\n\t\t\t\t\t<div id=\"rdp-ebb-mw-content-text\" lang=\"en\" dir=\"ltr\" class=\"mw-content-ltr\">\n\n<h2><span class=\"mw-headline\" id=\"Abstract\">Abstract<\/span><\/h2>\n<p>Scalability is a key feature for big data analysis and machine learning frameworks and for applications that need to analyze very large and real-time data available from data repositories, social media, sensor networks, smartphones, and the internet. 
Scalable big data analysis today can be achieved by parallel implementations that are able to exploit the computing and storage facilities of high-performance computing (HPC) systems and <a href=\"https:\/\/www.limswiki.org\/index.php\/Cloud_computing\" title=\"Cloud computing\" class=\"wiki-link\" data-key=\"fcfe5882eaa018d920cedb88398b604f\">cloud computing<\/a> systems, whereas in the near future exascale systems will be used to implement extreme-scale <a href=\"https:\/\/www.limswiki.org\/index.php\/Data_analysis\" title=\"Data analysis\" class=\"wiki-link\" data-key=\"545c95e40ca67c9e63cd0a16042a5bd1\">data analysis<\/a>. Here we discuss how cloud computing currently supports the development of scalable data mining solutions, as well as the main challenges that must be addressed and solved to implement innovative data analysis applications on exascale systems.\n<\/p><p><b>Keywords<\/b>: big data analysis, cloud computing, exascale computing, data mining, parallel programming, scalability\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Introduction\">Introduction<\/span><\/h2>\n<p>Solving problems in science and engineering was the first motivation for inventing computers. Today, computer science remains the main area in which innovative solutions and technologies are being developed and applied. Due in part to the extraordinary advancement of computer technology, data are now generated as never before. In fact, the amount of structured and unstructured digital data is going to increase beyond any estimate. Databases, file systems, data streams, social media, and data repositories are increasingly pervasive and decentralized.\n<\/p><p>As the data scale increases, we must address new challenges and attack ever-larger problems. New discoveries will be achieved and more accurate investigations can be carried out due to the increasingly widespread availability of large amounts of data. 
Scientific sectors that fail to make full use of the volume of digital data available today risk losing out on the significant opportunities that big data can offer.\n<\/p><p>To benefit from big data availability, specialists and researchers need advanced data analysis tools and applications running on scalable architectures allowing for the extraction of useful knowledge from such huge data sources. High-performance computing (HPC) systems and cloud computing systems today are capable platforms for addressing both the computational and data storage needs of big data mining and parallel knowledge discovery applications. These computing architectures are needed to run data analysis because complex data mining tasks involve data- and compute-intensive algorithms that require large, reliable, and effective storage facilities together with high-performance processors to obtain results in a timely fashion.\n<\/p><p>Now that data sources have become pervasively huge, reliable and effective programming tools and applications for data analysis are needed to extract value and find useful insights in them. New ways to correctly and proficiently compose different distributed models and paradigms are required, and interaction between hardware resources and programming levels must be addressed. Users, professionals, and scientists working in the area of big data need advanced data analysis programming models and tools coupled with scalable architectures to support the extraction of useful <a href=\"https:\/\/www.limswiki.org\/index.php\/Information\" title=\"Information\" class=\"wiki-link\" data-key=\"6300a14d9c2776dcca0999b5ed940e7d\">information<\/a> from such massive repositories. The scalability of a parallel computing system is a measure of its capacity to reduce program execution time in proportion to the number of its processing elements. (The appendix of this article introduces and discusses in detail scalability in parallel systems.) 
According to this definition of scalability, scalable data analysis refers to the ability of a hardware\/software parallel system to exploit increasing computing resources effectively in the analysis of (very) large datasets.\n<\/p><p>Today, complex analysis of real-world massive data sources requires using high-performance computing systems such as massively parallel machines or clouds. However, in the coming years, as parallel technologies advance, exascale computing systems will be exploited for implementing scalable big data analysis in all areas of science and engineering.<sup id=\"rdp-ebb-cite_ref-PetcuOnProc15_1-0\" class=\"reference\"><a href=\"#cite_note-PetcuOnProc15-1\">[1]<\/a><\/sup> To reach this goal, new design and programming challenges must be addressed and solved. As such, the focus of this paper is on discussing current cloud-based design and programming solutions for data analysis and suggesting new programming requirements and approaches for meeting big data analysis challenges on future exascale platforms.\n<\/p><p>Current cloud computing platforms and parallel computing systems represent two different technological solutions for addressing the computational and data storage needs of big data mining and parallel knowledge discovery applications. Indeed, parallel machines offer high-end processors with the main goal of supporting HPC applications, whereas cloud systems implement a computing model in which dynamically scalable virtualized resources are provided to users and developers as a service over the internet. In fact, clouds do not mainly target HPC applications; they represent scalable computing and storage delivery platforms that can be adapted to the needs of different classes of people and organizations by exploiting a service-oriented architecture (SOA) approach. Clouds offer large facilities to many users who could not otherwise own parallel\/distributed computing systems to run applications and services. 
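As a concrete illustration of the scalability notion just defined (a sketch added here, not part of the original article), Amdahl's law, discussed in the appendix references, bounds the speedup a parallel system can achieve when a fraction of the program remains serial:

```python
# Illustrative sketch (assumption: Amdahl's fixed-workload model):
# speedup and efficiency of a parallel system in which a fraction
# "serial" of the program cannot be parallelized.

def amdahl_speedup(serial: float, processors: int) -> float:
    """Speedup = 1 / (serial + (1 - serial) / p)."""
    return 1.0 / (serial + (1.0 - serial) / processors)

def efficiency(serial: float, processors: int) -> float:
    """Efficiency = speedup / p; scalability degrades as this falls."""
    return amdahl_speedup(serial, processors) / processors

if __name__ == "__main__":
    # Even a 5% serial fraction caps speedup at 20x, however many
    # processing elements an extreme-scale platform provides.
    for p in (1, 16, 256, 4096):
        print(p, amdahl_speedup(0.05, p), efficiency(0.05, p))
```

This is why the paper stresses that exascale-scale processor counts pay off only if the serial and communication overheads of data analysis tasks are kept very small.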
In particular, big data analysis applications that require accessing and manipulating very large datasets with complex mining algorithms will significantly benefit from the use of cloud platforms.\n<\/p><p>Although not many cloud-based data analysis frameworks are available today for end users, within a few years they will become common.<sup id=\"rdp-ebb-cite_ref-TardieuX10_16_2-0\" class=\"reference\"><a href=\"#cite_note-TardieuX10_16-2\">[2]<\/a><\/sup> Some current solutions are based on open-source systems, such as Apache Hadoop and Mahout, Spark, and SciDB, while others are proprietary solutions provided by companies such as Google, Microsoft, EMC, Amazon, BigML, Splunk Hunk, and InsightsOne. As more such platforms emerge, researchers and professionals will port increasingly powerful data mining programming tools and frameworks to the cloud to exploit complex and flexible software models such as the distributed <a href=\"https:\/\/www.limswiki.org\/index.php\/Workflow\" title=\"Workflow\" class=\"wiki-link\" data-key=\"92bd8748272e20d891008dcb8243e8a8\">workflow<\/a> paradigm. The growing utilization of the service-oriented computing model could accelerate this trend.\n<\/p><p>From the definition of the term \"big data,\" which refers to datasets so large and complex that traditional hardware and software data processing solutions are inadequate to manage and analyze, we can infer that conventional computer systems are not powerful enough to process and mine big data<sup id=\"rdp-ebb-cite_ref-TaliaMaking15_3-0\" class=\"reference\"><a href=\"#cite_note-TaliaMaking15-3\">[3]<\/a><\/sup>, and they are not able to scale with the size of problems to be solved. As mentioned before, to cope with the limits of sequential machines, advanced systems like HPC, cloud computing, and even more scalable architectures are used today to analyze big data. 
Starting from this scenario, exascale computing systems will represent the next computing step.<sup id=\"rdp-ebb-cite_ref-AmarasingheExa09_4-0\" class=\"reference\"><a href=\"#cite_note-AmarasingheExa09-4\">[4]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-ZahariaApache16_5-0\" class=\"reference\"><a href=\"#cite_note-ZahariaApache16-5\">[5]<\/a><\/sup> Exascale systems are high-performance computing systems capable of at least one exaFLOPS, so their implementation represents a significant research and technology challenge. Their design and development are currently under investigation with the goal of building, by 2020, high-performance computers composed of a very large number of multi-core processors expected to deliver a performance of 10<sup>18<\/sup> operations per second. Cloud computing systems used today are able to store very large amounts of data; however, they do not provide the high performance expected from massively parallel exascale systems. This is the main motivation for developing exascale systems. Exascale technology will represent the most advanced class of supercomputers. Exascale systems have been conceived for single-site supercomputing centers rather than for distributed infrastructures; nevertheless, multi-cloud or fog computing systems could decentralize computing and pervasive data management, and later be interconnected with exascale systems acting as a backbone for very large-scale data analysis.\n<\/p><p>The development of exascale systems spurs a need to address and solve issues and challenges at both the hardware and software level. 
Indeed, it requires the design and implementation of novel software tools and runtime systems able to manage a high degree of parallelism, reliability, and data locality in extreme-scale computers.<sup id=\"rdp-ebb-cite_ref-MaheshwariScientific10_6-0\" class=\"reference\"><a href=\"#cite_note-MaheshwariScientific10-6\">[6]<\/a><\/sup> New programming constructs and runtime mechanisms are needed that can adopt the most appropriate parallelism degree and communication decomposition to make data analysis tasks scalable and reliable. How these mechanisms depend on parallelism grain size and data analysis task decomposition must be studied in depth. This is needed because parallelism exploitation depends on several features like parallel operations, communication overhead, input data size, I\/O speed, problem size, and hardware configuration. Moreover, reliability and reproducibility are two additional key challenges to be addressed. At the programming level, constructs for handling and recovering communication, data access, and computing failures must be designed. At the same time, reproducibility in scalable data analysis calls for rich information that helps assure similar results in environments that may dynamically change. 
All these factors must be taken into account in designing data analysis applications and tools that will be scalable on exascale systems.\n<\/p><p>Moreover, reliable and effective methods for storing, accessing, and communicating data; intelligent techniques for massive data analysis; and software architectures enabling the scalable extraction of knowledge from data are needed.<sup id=\"rdp-ebb-cite_ref-TaliaMaking15_3-1\" class=\"reference\"><a href=\"#cite_note-TaliaMaking15-3\">[3]<\/a><\/sup> To reach this goal, models and technologies enabling cloud computing systems and HPC architectures must be extended, adapted, or completely changed so that they are reliable and scalable on the very large number of processors\/cores that compose extreme-scale platforms, and so that they support the implementation of clever data analysis algorithms that are scalable and dynamic in resource usage. Exascale computing infrastructures will play the role of an extraordinary platform for addressing both the computational and data storage needs of big data analysis applications. However, as mentioned before, to complete this scenario, effort must be devoted to implementing big data analytics algorithms, architectures, programming tools, and applications in exascale systems.<sup id=\"rdp-ebb-cite_ref-ReedExa15_7-0\" class=\"reference\"><a href=\"#cite_note-ReedExa15-7\">[7]<\/a><\/sup>\n<\/p><p>If this objective is pursued, within a few years scalable data access and analysis systems will become the most-used platforms for big data analytics on large-scale clouds. 
Over the next decades, new exascale computing infrastructures will emerge as viable platforms for big data analytics, and data mining algorithms, tools, and applications will be ported to such platforms to implement extreme data discovery solutions.\n<\/p><p>In this paper we first discuss cloud-based scalable data mining and machine learning solutions, then we examine the main research issues that must be addressed for implementing massively parallel data mining applications on exascale computing systems. Data-related issues are discussed together with communication, multi-processing, and programming issues. We then introduce issues and systems for scalable data analysis on clouds and subsequently discuss design and programming issues for big data analysis in exascale systems. We close by outlining some open design challenges.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Data_analysis_on_cloud_computing_platforms\">Data analysis on cloud computing platforms<\/span><\/h2>\n<p>Cloud computing platforms implement elastic services, scalable performance, and scalable data storage used by an ever-increasing number of users and applications.<sup id=\"rdp-ebb-cite_ref-ArmbrustAView10_8-0\" class=\"reference\"><a href=\"#cite_note-ArmbrustAView10-8\">[8]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-GuSector09_9-0\" class=\"reference\"><a href=\"#cite_note-GuSector09-9\">[9]<\/a><\/sup> In fact, cloud platforms have enlarged the arena of distributed computing systems by providing advanced internet services that complement and complete the functionalities of distributed computing provided by the internet, grid systems, and peer-to-peer networks. 
In particular, most cloud computing applications use big data repositories stored within the cloud itself, so in those cases large datasets are analyzed with low latency to effectively extract data analysis models.\n<\/p><p>\"Big data\" is a new and overused term that refers to massive, heterogeneous, and often unstructured digital content that is difficult to process using traditional data management tools and techniques. The term includes the complexity and variety of data and data types, real-time data collection and processing needs, and the value that can be obtained by smart analytics. However, we should recognize that data are not necessarily important per se; they become very important if we are able to extract value from them\u2014that is, if we can exploit them to make discoveries. The extraction of useful knowledge from big digital datasets requires smart and scalable analytics algorithms, services, programming tools, and applications. All of these are needed to extract insights from big data and make them more useful for people.\n<\/p><p>The growing use of <a href=\"https:\/\/www.limswiki.org\/index.php\/Service-level_agreement\" title=\"Service-level agreement\" class=\"wiki-link\" data-key=\"e2cebf861e03b214f4a0accedfac3f5a\">service-oriented<\/a> computing is accelerating the use of cloud-based systems for scalable big data analysis. 
Developers and researchers are adopting the three main cloud models\u2014<a href=\"https:\/\/www.limswiki.org\/index.php\/Software_as_a_service\" title=\"Software as a service\" class=\"wiki-link\" data-key=\"ae8c8a7cd5ee1a264f4f0bbd4a4caedd\">software as a service<\/a> (SaaS), platform as a service (PaaS), and infrastructure as a service (IaaS)\u2014to implement big data analytics solutions in the cloud.<sup id=\"rdp-ebb-cite_ref-TaliaData15_10-0\" class=\"reference\"><a href=\"#cite_note-TaliaData15-10\">[10]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-HwangCloud17_11-0\" class=\"reference\"><a href=\"#cite_note-HwangCloud17-11\">[11]<\/a><\/sup> According to a specialization of these three models, data analysis tasks and applications can be offered as services at the software, platform, or infrastructure level and made available at any time and from anywhere. A methodology for implementing them, called \"data analysis as a service\" (DAaaS), defines a new model stack for delivering data analysis solutions as a specialization of the XaaS (everything as a service) stack. It adapts and specializes the three general service models (SaaS, PaaS, and IaaS) to support the structured development of big data analysis systems, tools, and applications according to a service-oriented approach. The DAaaS methodology is then based on the three basic models for delivering data analysis services at different levels as described here (see also Fig. 
1):\n<\/p>\n<ul><li> <i>Data analysis infrastructure as a service (DAIaaS)<\/i>: This model provides a set of hardware\/software virtualized resources that developers can assemble and use as an integrated infrastructure in which to store large datasets, run data mining applications, and\/or implement data analytics systems from scratch;<\/li><\/ul>\n<ul><li> <i>Data analysis platform as a service (DAPaaS)<\/i>: This model defines a supporting software platform that developers can use for programming and running their data analytics applications or extending existing ones without worrying about the underlying infrastructure or specific distributed architecture issues; and<\/li><\/ul>\n<ul><li> <i>Data analysis software as a service (DASaaS)<\/i>: This is a higher-level model that offers end users data mining algorithms, data analysis suites, or ready-to-use knowledge discovery applications as internet services that can be accessed and used directly through a web browser. According to this approach, all data analysis software is provided as a service, relieving end users of implementation and execution details.<\/li><\/ul>\n<p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig1_Talia_JOfCloudComp2019_8.png\" class=\"image wiki-link\" data-key=\"a1fa7f570f9be57899e3c4086266fbe9\"><img alt=\"Fig1 Talia JOfCloudComp2019 8.png\" src=\"https:\/\/www.limswiki.org\/images\/b\/b6\/Fig1_Talia_JOfCloudComp2019_8.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 1.<\/b> The three models of the DAaaS software methodology. 
The DAaaS software methodology is based on three basic models for delivering data analysis services at different levels (application, platform, and infrastructure). The DAaaS methodology defines a new model stack to deliver data analysis solutions that are a specialization of the XaaS (everything as a service) stack and is called \"data analysis as a service\" (DAaaS). It adapts and specifies the three general service models (SaaS, PaaS, and IaaS) for supporting the structured development of big data analysis systems, tools, and applications according to a service-oriented approach.<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<h3><span class=\"mw-headline\" id=\"Cloud-based_data_analysis_tools\">Cloud-based data analysis tools<\/span><\/h3>\n<p>Using the DASaaS methodology, we designed a cloud-based system, the Data Mining Cloud Framework (DMCF),<sup id=\"rdp-ebb-cite_ref-MarozzoACloud13_12-0\" class=\"reference\"><a href=\"#cite_note-MarozzoACloud13-12\">[12]<\/a><\/sup> which supports three main classes of data analysis and knowledge discovery applications:\n<\/p>\n<ul><li> <i>Single-task applications<\/i>, in which a single data mining task such as classification, clustering, or association rules discovery is performed on a given dataset;<\/li><\/ul>\n<ul><li> <i>Parameter-sweeping applications<\/i>, in which a dataset is analyzed by multiple instances of the same data mining algorithm with different parameters; and<\/li><\/ul>\n<ul><li> <i>Workflow-based applications<\/i>, in which knowledge discovery applications are specified as graphs linking together data sources, data mining tools, and data mining models.<\/li><\/ul>\n<p>DMCF includes a large variety of processing patterns to express knowledge discovery workflows as graphs whose nodes denote resources (datasets, data analysis tools, mining models) and whose edges denote dependencies among resources. 
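The workflow-as-graph idea just described can be sketched in a few lines of ordinary Python (an illustration added here, not the DMCF API; the node names are hypothetical): nodes are resources, edges are dependencies, and a run of the workflow is any execution respecting dependency order.

```python
# Illustrative sketch (not the DMCF API): a knowledge discovery workflow
# as a graph whose nodes are resources (datasets, tools, models) and
# whose edges are dependencies; execution follows a topological order.
from graphlib import TopologicalSorter

# node -> set of nodes it depends on (all names are hypothetical)
workflow = {
    "train_partitions": {"dataset"},
    "classifier": {"train_partitions"},
    "model": {"classifier"},
    "voter": {"model", "test_set"},
}

# A valid execution order: every node runs after all its dependencies.
order = list(TopologicalSorter(workflow).static_order())
```

In a system like DMCF, independent nodes in such a graph (e.g., parallel classifier instances) can be scheduled concurrently on cloud resources.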
A web-based user interface allows users to compose their applications and submit them for execution to the cloud platform, following the data analysis software as a service approach. Visual workflows can be programmed in DMCF through a language called VL4Cloud (Visual Language for Cloud), whereas script-based workflows can be programmed by JS4Cloud (JavaScript for Cloud), a JavaScript-based language for data analysis programming.\n<\/p><p>Figure 2 shows a sample data mining workflow composed of several sequential and parallel steps. It is just an example for presenting the main features of the VL4Cloud programming interface.<sup id=\"rdp-ebb-cite_ref-MarozzoACloud13_12-1\" class=\"reference\"><a href=\"#cite_note-MarozzoACloud13-12\">[12]<\/a><\/sup> The example workflow analyses a dataset by using <i>n<\/i> instances of a classification algorithm, which work on <i>n<\/i> portions of the training set and generate the same number of knowledge models. By using the <i>n<\/i> generated models and the test set, <i>n<\/i> classifiers produce in parallel <i>n<\/i> classified datasets (<i>n<\/i> classifications). 
In the final step of the workflow, a voter generates the final classification by assigning to each data item the class predicted by the majority of the models.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig2_Talia_JOfCloudComp2019_8.png\" class=\"image wiki-link\" data-key=\"ecdf2d3ca3d8f55489a3a24d8d1e52a1\"><img alt=\"Fig2 Talia JOfCloudComp2019 8.png\" src=\"https:\/\/www.limswiki.org\/images\/4\/41\/Fig2_Talia_JOfCloudComp2019_8.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 2.<\/b> A parallel classification workflow designed by the VL4Cloud programming interface. The figure shows a workflow designed by the VL4Cloud programming interface during its execution. The workflow implements a parallel classification application. Tasks\/services included in square brackets are executed in parallel. The results produced by the classifiers are selected by a voter task that produces the final classification.<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>Although DMCF has been mainly designed to coordinate coarse-grain data and task parallelism in big data analysis applications by exploiting the workflow paradigm, the DMCF script-based programming interface (JS4Cloud) also allows for parallelizing fine-grain operations in data mining algorithms, as it permits any data mining algorithm, such as classification, clustering, and others, to be programmed in a JavaScript style. 
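The classify-then-vote pattern of Figure 2 can be sketched in plain Python (an illustration added here; JS4Cloud itself is JavaScript-based, and the toy "classifiers" below are stand-ins for real data mining algorithms):

```python
# Illustrative sketch of the Figure 2 pattern: n models trained on n
# partitions of the training set, then a voter assigns each test item
# the class predicted by the majority of the models. NOT the DMCF API,
# whose workflows run the n branches as parallel cloud tasks.
from collections import Counter

def train(partition):
    """Toy 'classifier': always predicts the partition's majority class."""
    majority = Counter(label for _, label in partition).most_common(1)[0][0]
    return lambda item: majority

def vote(classifications):
    """Voter step: majority class per item across the n classifications."""
    return [Counter(preds).most_common(1)[0][0] for preds in zip(*classifications)]

# n partitions of a labeled training set -> n models (parallel branches)
partitions = [[("a", 0), ("b", 0)], [("c", 1), ("d", 1)], [("e", 0)]]
models = [train(part) for part in partitions]

test_set = ["x", "y"]
classifications = [[m(item) for item in test_set] for m in models]
final = vote(classifications)  # the workflow's final classification
```

In DMCF the list comprehensions above would correspond to parallel branches of the workflow, executed concurrently on cloud virtual machines.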
This can be done because loops and data-parallel methods are run in parallel on the virtual machines of a cloud.<sup id=\"rdp-ebb-cite_ref-MarozzoJS4_15_13-0\" class=\"reference\"><a href=\"#cite_note-MarozzoJS4_15-13\">[13]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-TaliaClouds13_14-0\" class=\"reference\"><a href=\"#cite_note-TaliaClouds13-14\">[14]<\/a><\/sup>\n<\/p><p>Like DMCF, other innovative cloud-based systems designed for programming data analysis applications include Apache Spark, Sphere, Swift, Mahout, and CloudFlows. Most of them are open-source. Apache Spark is an open-source framework developed at the University of California, Berkeley for in-memory data analysis and machine learning.<sup id=\"rdp-ebb-cite_ref-ZahariaApache16_5-1\" class=\"reference\"><a href=\"#cite_note-ZahariaApache16-5\">[5]<\/a><\/sup> Spark has been designed to run both batch processing and dynamic applications like streaming, interactive queries, and graph analysis. Spark provides developers with a programming interface centered on a data structure called the \"resilient distributed dataset\" (RDD), which represents a read-only multi-set of data items distributed over a cluster of machines and maintained in a fault-tolerant way. Unlike other systems, including Hadoop, Spark stores data in memory and queries it repeatedly so as to obtain better performance. This feature can be useful for a future implementation of Spark on exascale systems.\n<\/p><p>Swift is a workflow-based framework for implementing functional data-driven task parallelism in data-intensive applications. The Swift language provides a functional programming paradigm where workflows are designed as a set of calls with associated command-line arguments and input and output files. 
Swift uses implicit data-driven task parallelism.<sup id=\"rdp-ebb-cite_ref-WozniakLang14_15-0\" class=\"reference\"><a href=\"#cite_note-WozniakLang14-15\">[15]<\/a><\/sup> In fact, it looks like a sequential language, but being a dataflow language, all variables are futures; thus execution is driven by data availability. Parallelism can also be exploited through the use of the <tt>foreach<\/tt> statement. Swift\/T is a new implementation of the Swift language for high-performance computing. In this implementation, a Swift program is translated into an MPI program that uses the Turbine and ADLB runtime libraries for scalable dataflow processing over MPI. Recently, porting Swift\/T to large cloud systems for the execution of numerous tasks has been investigated.\n<\/p><p>DMCF, differently from the other frameworks discussed here, is the only system that offers both a visual and a script-based programming interface. Visual programming is a very convenient design approach for high-level users, like domain-expert analysts having a limited understanding of programming. On the other hand, script-based workflows are a useful paradigm for expert programmers who can code complex applications rapidly, in a more concise way and with greater flexibility. Finally, the workflow-based model exploited in DMCF and Swift makes these frameworks more general-purpose than Spark, which offers a very restricted set of programming patterns (e.g., map, filter, and reduce), thus limiting the variety of data analysis applications that can be implemented with it.\n<\/p><p>These and other related systems are currently used for the development of big data analysis applications on HPC and cloud platforms. 
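The restricted pattern set attributed to Spark above (map, filter, reduce) can be illustrated with ordinary Python built-ins (a local, single-machine sketch added here, not Spark's distributed RDD API):

```python
# Local sketch of the map/filter/reduce pattern set; Spark applies the
# same operators to RDDs partitioned across a cluster of machines.
from functools import reduce

records = ["3", "11", "4", "25", "9"]          # raw input records

parsed   = map(int, records)                   # map: parse each record
filtered = filter(lambda v: v > 4, parsed)     # filter: keep values > 4
total    = reduce(lambda a, b: a + b, filtered, 0)  # reduce: sum survivors
```

The point made in the text is that many data analysis applications (e.g., iterative or graph-structured ones) do not decompose naturally into only these three operators, which is where workflow-based models are more general.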
However, additional research in this field must be done and the development of new models, solutions, and tools is needed.<sup id=\"rdp-ebb-cite_ref-ReedExa15_7-1\" class=\"reference\"><a href=\"#cite_note-ReedExa15-7\">[7]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-LucasTopTen14_16-0\" class=\"reference\"><a href=\"#cite_note-LucasTopTen14-16\">[16]<\/a><\/sup> Just to mention a few, active and promising research topics are listed here, ordered by importance:\n<\/p><p>1. <b>Programming models for big data analytics<\/b>: New abstract programming models and constructs hiding the system complexity are needed for big data analytics tools. The MapReduce model and workflow models are often used on HPC and cloud implementations, but more research effort is needed to develop other scalable, adaptive, general-purpose higher-level models and tools. Research in this area is even more important for exascale systems; in the next section we will discuss some of these topics in exascale computing.\n<\/p><p>2. <b>Reliability in scalable data analysis<\/b>: As the number of processing elements increases, reliability of systems and applications decreases, and therefore mechanisms for detecting and handling hardware and software faults are needed. Although Fekete <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-FeketeTheImp93_17-0\" class=\"reference\"><a href=\"#cite_note-FeketeTheImp93-17\">[17]<\/a><\/sup> have proven that no reliable communication protocol can tolerate crashes of processors on which the protocol runs, some ways in which systems cope with the impossibility result can be found. Among them, at the programming level it is necessary to design constructs for handling communication, data access, and computing failures and for recovering from them. Programming models, languages, and APIs must provide general and data-oriented mechanisms for failure detection and isolation, preventing an entire application from failing and assuring its completion. 
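A minimal sketch of the kind of programming-level construct called for above (added here as an assumption-laden illustration, not an API from any of the cited systems): a wrapper that detects a failing task, isolates the failure, and retries or degrades gracefully instead of letting the whole application fail.

```python
# Hypothetical sketch of a fault-handling construct: detect a task
# failure, isolate it, and retry, so one fault does not abort the
# entire analysis. Names and policy are illustrative only.
def run_with_recovery(task, args, retries=3, fallback=None):
    for _attempt in range(retries):
        try:
            return task(*args)
        except Exception:
            continue          # failure detected and isolated; retry
    return fallback           # degrade gracefully after all retries

calls = {"n": 0}
def flaky_square(x):
    """Simulated task that fails on its first two attempts."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient node failure")
    return x * x

result = run_with_recovery(flaky_square, (7,))  # succeeds on 3rd attempt
```

At exascale, where fault occurrence grows with the number of processing elements, such constructs would need to cover communication and data access failures as well, as the text notes.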
Reliability is a much more important issue in the exascale domain, where the number of processing elements is massive and fault occurrence increases, making detection and recovery vital.\n<\/p><p>3. <b>Application reproducibility<\/b>: Reproducibility is another open research issue for designers of complex applications running on parallel systems. Reproducibility in scalable data analysis must, for example, contend with data communication, parallel data manipulation, and dynamic computing environments. Reproducibility demands that current data analysis frameworks (like those based on MapReduce and on workflows) and future ones, especially those implemented on exascale systems, provide additional information and knowledge on how data are managed, on algorithm characteristics, and on the configuration of software and execution environments.\n<\/p><p>4. <b>Data and tool integration and openness<\/b>: Code coordination and data integration are central issues in large-scale applications that use data and computing resources. Standard formats, data exchange models, and common <a href=\"https:\/\/www.limswiki.org\/index.php\/Application_programming_interface\" title=\"Application programming interface\" class=\"wiki-link\" data-key=\"36fc319869eba4613cb0854b421b0934\">application programming interfaces<\/a> (APIs) are needed to support interoperability and ease cooperation among design teams using different data formats and tools.\n<\/p><p>5. <b>Interoperability of big data analytics frameworks<\/b>: The service-oriented paradigm allows running large-scale distributed applications on heterogeneous cloud platforms along with software components developed using different programming languages or tools. 
Cloud service paradigms must be designed to allow worldwide integration of multiple data analytics frameworks.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Exascale_and_big_data_analysis\">Exascale and big data analysis<\/span><\/h2>\n<p>As we discussed in the previous sections, data analysis has gained a primary role because of the wide availability of datasets and the continuous advancement of methods and algorithms for finding knowledge in them. Data analysis solutions advance by exploiting the power of data mining and machine learning techniques and are changing several scientific and industrial areas. For example, the amount of data that social media generate daily is impressive and continuous. Several hundred terabytes of data, including several hundred million photos, are uploaded daily to Facebook and Twitter.\n<\/p><p>It is therefore essential to design scalable solutions for processing and analyzing such massive datasets. As a general forecast, IDC experts estimate the data generated worldwide to reach about 45 zettabytes by 2020.<sup id=\"rdp-ebb-cite_ref-IDCTheDig14_18-0\" class=\"reference\"><a href=\"#cite_note-IDCTheDig14-18\">[18]<\/a><\/sup> This impressive amount of digital data calls for scalable high-performance data analysis solutions. However, today only about one-quarter of the available digital data would be a candidate for analysis, and only about five percent of that is actually analyzed. By 2020, the useful percentage could grow to about 35 percent, thanks to data mining technologies.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Extreme_data_sources_and_scientific_computing\">Extreme data sources and scientific computing<\/span><\/h3>\n<p>Scalability and performance requirements are challenging conventional data storage, file systems, and database management systems. 
Architectures of such systems have reached their limits in handling extremely large processing tasks involving petabytes of data, because they were not built to scale beyond a given threshold. New architectures and analytics platforms that can process big data to extract complex predictive and descriptive models have become necessary.<sup id=\"rdp-ebb-cite_ref-ChenSyn13_19-0\" class=\"reference\"><a href=\"#cite_note-ChenSyn13-19\">[19]<\/a><\/sup> Exascale systems, both from the hardware and the software side, can play a key role in supporting solutions to these problems.<sup id=\"rdp-ebb-cite_ref-PetcuOnProc15_1-1\" class=\"reference\"><a href=\"#cite_note-PetcuOnProc15-1\">[1]<\/a><\/sup>\n<\/p><p>An IBM study reports that we are generating around 2.5 exabytes of data per day.<sup id=\"rdp-ebb-cite_ref-IBMWhat13_20-0\" class=\"reference\"><a href=\"#cite_note-IBMWhat13-20\">[20]<\/a><\/sup> Because of that continuous and explosive growth of data, many applications require the use of scalable data analysis platforms. A well-known example is the ATLAS detector from the Large Hadron Collider at CERN in Geneva. The ATLAS infrastructure has a capacity of 200\u2009PB of disk space and 300,000 processor cores, with more than 100 computing centers connected via 10 Gbps links. The data collection rate is massive, and only a portion of the data produced by the collider is stored. Several teams of scientists run complex applications to analyze subsets of those huge volumes of data. This analysis would be impossible without a high-performance infrastructure that supports data storage, communication, and processing. Computational astronomers, too, are collecting and producing ever-larger datasets each year that cannot be stored and processed without scalable infrastructures. Another significant case is represented by the Energy Sciences Network (ESnet), the U.S. 
Department of Energy\u2019s high-performance network managed by Berkeley Lab, which in late 2012 rolled out a 100 gigabit-per-second national network to accommodate the growing scale of scientific data.\n<\/p><p>Moving from science to society, social data and <a href=\"https:\/\/www.limswiki.org\/index.php\/EHealth\" title=\"EHealth\" class=\"wiki-link\" data-key=\"39df6aac1fbe4ad737280794f3a81d80\">eHealth<\/a> are good examples to discuss. Social networks, such as Facebook and Twitter, have become very popular and are receiving increasing attention from the research community because of the huge amount of user-generated data, which provide valuable information concerning human behavior, habits, and travel. When the volume of data to be analyzed is on the order of terabytes or petabytes (billions of tweets or posts), scalable storage and computing solutions must be used, but no clear solutions exist today for the analysis of exascale datasets. The same occurs in the eHealth domain, where huge amounts of patient data are available and can be used for improving therapies, for forecasting and tracking of health data, and for the management of <a href=\"https:\/\/www.limswiki.org\/index.php\/Hospital\" title=\"Hospital\" class=\"wiki-link\" data-key=\"b8f070c66d8123fe91063594befebdff\">hospitals<\/a> and health centers. Very complex data analysis in this area will need novel hardware\/software solutions; meanwhile, exascale computing is also promising in other scientific fields where scalable storage and databases are not used or required. Examples of scientific disciplines where future exascale computing will be extensively used are quantum chromodynamics, materials simulation, molecular dynamics, materials design, earthquake simulations, subsurface geophysics, climate forecasting, nuclear energy, and combustion. 
All those applications require the use of sophisticated models and algorithms to solve complex equation systems and will benefit from the exploitation of exascale systems.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Programming_model_features_for_exascale_data_analysis\">Programming model features for exascale data analysis<\/span><\/h3>\n<p>Implementing scalable data analysis applications on exascale computing systems is a complex job requiring high-level, fine-grain parallel models, appropriate programming constructs, and skills in parallel and distributed programming. In particular, mechanisms and expertise are needed for expressing task dependencies and inter-task parallelism, designing synchronization and load-balancing mechanisms, handling failures, and properly managing distributed memory and concurrent communication among a very large number of tasks. Moreover, when the target computing infrastructures are heterogeneous and require different libraries and tools to program applications on them, the programming issues are even more complex. To cope with some of these issues in data-intensive applications, different scalable programming models have been proposed.<sup id=\"rdp-ebb-cite_ref-DiazASurv12_21-0\" class=\"reference\"><a href=\"#cite_note-DiazASurv12-21\">[21]<\/a><\/sup>\n<\/p><p>Scalable programming models may be categorized by:\n<\/p>\n<dl><dd>i. Their level of abstraction, expressing high-level and low-level programming mechanisms, and<\/dd><\/dl>\n<dl><dd>ii. How they allow programmers to develop applications, using visual or script-based formalisms.<\/dd><\/dl>\n<p>Using high-level scalable models, a programmer defines only the high-level logic of an application while hiding the low-level details that are not essential for application design, including infrastructure-dependent execution details. 
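<\/p><p>The two levels of abstraction in point i can be contrasted with a small Python sketch (illustrative only; real scalable systems differ): the high-level version states what to compute and lets the executor decide how, while the low-level version partitions and places the work explicitly.
<\/p>
```python
# Contrast of high-level vs. low-level parallel programming (toy example).
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

data = list(range(8))

# High-level: declare *what* to compute; the executor hides scheduling.
with ThreadPoolExecutor() as pool:
    high = list(pool.map(square, data))

# Low-level: the programmer partitions the data and places each chunk
# on a worker explicitly (analogous to infrastructure-level control).
chunks = [data[:4], data[4:]]
with ThreadPoolExecutor(max_workers=2) as pool:
    parts = [pool.submit(lambda c: [square(x) for x in c], ch) for ch in chunks]
    low = [y for f in parts for y in f.result()]

print(high == low)   # -> True
```
<p>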
A programmer is assisted in application definition, and application performance depends on the compiler that analyzes the application code and optimizes its execution on the underlying infrastructure. On the other hand, low-level scalable models allow programmers to interact directly with the computing and storage elements composing the underlying infrastructure and thus define the application's parallelism directly.\n<\/p><p>Data analysis applications implemented by some frameworks can be programmed through a visual interface, which is a convenient design approach for high-level users, for instance domain-expert analysts with a limited understanding of programming. In addition, a visual representation of workflows or components intrinsically captures parallelism at the task level, without the need to make parallelism explicit through control structures.<sup id=\"rdp-ebb-cite_ref-MaheshwariScientific10_6-1\" class=\"reference\"><a href=\"#cite_note-MaheshwariScientific10-6\">[6]<\/a><\/sup> Visual-based data analysis is typically implemented through workflow-based languages or component-based paradigms (Fig. 3). Dataflow-based approaches, which share the same application structure as workflows, are also used. However, in dataflow models the grain of parallelism and the size of data items are generally smaller than in workflows. In general, visual programming tools are not very flexible, because they often implement a limited set of visual patterns and provide restricted ways to configure them. 
To address this issue, some visual languages let users customize the behavior of patterns by adding code that specifies the operations a pattern executes when an event occurs.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig3_Talia_JOfCloudComp2019_8.png\" class=\"image wiki-link\" data-key=\"ff9b3c741f2cd3b085fb69f063b8e0fb\"><img alt=\"Fig3 Talia JOfCloudComp2019 8.png\" src=\"https:\/\/www.limswiki.org\/images\/9\/95\/Fig3_Talia_JOfCloudComp2019_8.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 3.<\/b> Main visual and script-based programming models used today for data analysis programming<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>On the other hand, code-based (or script-based) formalism allows users to program complex applications more rapidly, in a more concise way, and with higher flexibility.<sup id=\"rdp-ebb-cite_ref-MarozzoJS4_15_13-1\" class=\"reference\"><a href=\"#cite_note-MarozzoJS4_15-13\">[13]<\/a><\/sup> Script-based applications can be designed in different ways (see Fig. 3):\n<\/p>\n<ul><li> Use a complete language or a language extension that allows programmers to express parallelism in applications, following a general-purpose or a domain-specific approach. This approach requires the design and implementation of a new parallel programming language, or of a complete set of data types and parallel constructs to be fully inserted in an existing language.<\/li><\/ul>\n<ul><li> Use annotations in the application code that allow the compiler to identify which instructions will be executed in parallel. 
According to this approach, parallel statements are separated from sequential constructs and are clearly identified in the program code because they are denoted by special symbols or keywords.<\/li><\/ul>\n<ul><li> Use a library in the application code that adds parallelism to the data analysis application. Currently, this is the most used approach, since it is orthogonal to host languages. MPI and MapReduce are two well-known examples of this approach.<\/li><\/ul>\n<p>Given the variety of data analysis applications and classes of users (from skilled programmers to end users) that can be envisioned for future exascale systems, there is a need for scalable programming models with different levels of abstraction (high-level and low-level) and different design formalisms (visual and script-based), according to the classification outlined above.\n<\/p><p>As we discussed, data-intensive applications are software programs that have a significant need to process large volumes of data.<sup id=\"rdp-ebb-cite_ref-GortonData08_22-0\" class=\"reference\"><a href=\"#cite_note-GortonData08-22\">[22]<\/a><\/sup> Such applications devote most of their processing time to running I\/O operations and to exchanging and moving data among the processing elements of a parallel computing infrastructure. Parallel processing in data analysis applications typically involves accessing, pre-processing, partitioning, distributing, aggregating, querying, mining, and visualizing data that can be processed independently.\n<\/p><p>The main challenges for programming data analysis applications on exascale computing systems come from potential scalability; network latency and reliability; reproducibility of data analysis; and resilience of the mechanisms and operations offered to developers for accessing, exchanging, and managing data. 
Indeed, processing extremely large data volumes requires new operations and algorithms that scale when loading, storing, and processing massive amounts of data, which generally must be partitioned into very small data grains on which thousands to millions of simple parallel operations perform the analysis.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Exascale_programming_systems\">Exascale programming systems<\/span><\/h2>\n<p>Exascale systems impose new requirements on programming systems to target platforms with hundreds of homogeneous and heterogeneous cores. Evolutionary models have recently been proposed for exascale programming that extend or adapt traditional parallel programming models like MPI (e.g., EPiGRAM<sup id=\"rdp-ebb-cite_ref-MarkidisTheEPi16_23-0\" class=\"reference\"><a href=\"#cite_note-MarkidisTheEPi16-23\">[23]<\/a><\/sup>, which uses a library-based approach, and Open MPI for exascale in the ECP initiative), OpenMP (e.g., OmpSs<sup id=\"rdp-ebb-cite_ref-Fern.C3.A1ndezTask14_24-0\" class=\"reference\"><a href=\"#cite_note-Fern.C3.A1ndezTask14-24\">[24]<\/a><\/sup>, which exploits an annotation-based approach, and the SOLLVE project), and MapReduce (e.g., Pig Latin<sup id=\"rdp-ebb-cite_ref-OlstonPig08_25-0\" class=\"reference\"><a href=\"#cite_note-OlstonPig08-25\">[25]<\/a><\/sup>, which implements a domain-specific complete language). These new frameworks limit the communication overhead in message-passing paradigms or limit the synchronization control if a shared-memory model is used.<sup id=\"rdp-ebb-cite_ref-GroppProg13_26-0\" class=\"reference\"><a href=\"#cite_note-GroppProg13-26\">[26]<\/a><\/sup>\n<\/p><p>As exascale systems are likely to be based on large distributed-memory hardware, MPI is one of the most natural programming systems. MPI is currently used on over one million cores, and therefore it is reasonable to have MPI as one programming paradigm used on exascale systems. 
The same holds for MapReduce-based libraries, which today run on very large HPC and cloud systems. Both of these paradigms are widely used for implementing big data analysis applications. As expected, general MPI all-to-all communication does not scale well in exascale environments; to address this issue, new MPI releases introduced neighbor collectives, which support sparse \u201call-to-some\u201d communication patterns that confine data exchange to limited regions of processors.<sup id=\"rdp-ebb-cite_ref-GroppProg13_26-1\" class=\"reference\"><a href=\"#cite_note-GroppProg13-26\">[26]<\/a><\/sup>\n<\/p><p>Ensuring the reliability of exascale systems requires a holistic approach, including several hardware and software technologies for both predicting crashes and keeping systems stable despite failures. In the runtime of parallel APIs (such as MPI and MapReduce-based libraries like Hadoop), a reliable communication layer must be provided if incorrect behavior in the case of processor failure is to be mitigated: on top of the lower, unreliable layer, a protocol is implemented that works safely with every implementation of that layer, even though, per the impossibility result above, no such protocol can tolerate crashes of the processors on which it runs. 
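<\/p><p>The pressure that full all-to-all exchange puts on such systems is easy to quantify with a back-of-envelope message count (illustrative Python, assuming <i>P<\/i> ranks and <i>k<\/i> neighbors per rank):
<\/p>
```python
# Back-of-envelope: point-to-point messages per exchange round for a full
# all-to-all versus a sparse "all-to-some" neighbor pattern.
def all_to_all_msgs(p):
    return p * (p - 1)        # every rank sends to every other rank

def neighbor_msgs(p, k):
    return p * k              # every rank sends only to its k neighbors

p = 1_000_000                 # hypothetical exascale-scale rank count
print(all_to_all_msgs(p))     # -> 999999000000
print(neighbor_msgs(p, 8))    # -> 8000000
```
<p>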
Concerning MapReduce frameworks, reference<sup id=\"rdp-ebb-cite_ref-MarozzoP2P12_27-0\" class=\"reference\"><a href=\"#cite_note-MarozzoP2P12-27\">[27]<\/a><\/sup> reports on an adaptive MapReduce framework, called P2P-MapReduce\u2014which has been developed to manage node churn, master node failures, and job recovery in a decentralized way\u2014to provide more reliable MapReduce middleware that can be effectively exploited in dynamic large-scale infrastructures.\n<\/p><p>On the other hand, new complete languages such as X10<sup id=\"rdp-ebb-cite_ref-TardieuX10_14_28-0\" class=\"reference\"><a href=\"#cite_note-TardieuX10_14-28\">[28]<\/a><\/sup>, ECL<sup id=\"rdp-ebb-cite_ref-YooEval09_29-0\" class=\"reference\"><a href=\"#cite_note-YooEval09-29\">[29]<\/a><\/sup>, UPC<sup id=\"rdp-ebb-cite_ref-NishtalaTuning11_30-0\" class=\"reference\"><a href=\"#cite_note-NishtalaTuning11-30\">[30]<\/a><\/sup>, Legion<sup id=\"rdp-ebb-cite_ref-BauerLegion12_31-0\" class=\"reference\"><a href=\"#cite_note-BauerLegion12-31\">[31]<\/a><\/sup>, and Chapel<sup id=\"rdp-ebb-cite_ref-ChamberlainParallel07_32-0\" class=\"reference\"><a href=\"#cite_note-ChamberlainParallel07-32\">[32]<\/a><\/sup> have been defined around a data-centric approach. Furthermore, new APIs based on a revolutionary approach, such as GA<sup id=\"rdp-ebb-cite_ref-NieplochaAdvances06_33-0\" class=\"reference\"><a href=\"#cite_note-NieplochaAdvances06-33\">[33]<\/a><\/sup> and SHMEM<sup id=\"rdp-ebb-cite_ref-MeswaniTools12_34-0\" class=\"reference\"><a href=\"#cite_note-MeswaniTools12-34\">[34]<\/a><\/sup>, have been implemented according to a library-based model. These novel parallel paradigms are devised to address the requirements of data processing using massive parallelism. 
In particular, languages such as X10, UPC, and Chapel and the GA library are based on a partitioned global address space (PGAS) memory model, which is well suited to implementing data-intensive exascale applications because it uses private data structures and limits the amount of shared data among parallel threads.\n<\/p><p>Together with different approaches, such as Pig Latin and ECL, those programming models, languages, and APIs must be further investigated, designed, and adapted to provide data-centric scalable programming models that support the reliable and effective implementation of exascale data analysis applications composed of up to millions of computing units, each processing small data elements and exchanging them with a very limited set of processing elements. PGAS-based models, data-flow and data-driven paradigms, and local-data approaches today represent promising solutions that could be used for exascale data analysis programming. The APGAS model is, for example, implemented in the X10 language, based on the notions of places and asynchrony. A place is an abstraction of shared, mutable data and worker threads operating on the data. A single APGAS computation can consist of hundreds or potentially tens of thousands of places. Asynchrony is implemented by a single block-structured control construct, <tt>async<\/tt>. Given a statement <i>ST<\/i>, the construct <tt>async<\/tt> <i>ST<\/i> executes <i>ST<\/i> in a separate thread of control. Memory locations in one place can contain references to locations at other places. 
To compute upon data at another place, the following statement must be used:\n<\/p><p><br \/>\n<span><span class=\"mwe-math-mathml-inline mwe-math-mathml-a11y\" style=\"display: none;\"><\/span><meta class=\"mwe-math-fallback-image-inline\" aria-hidden=\"true\" style=\"background-image: url('https:\/\/en.wikipedia.org\/api\/rest_v1\/media\/math\/render\/svg\/725aa6771bf80ade0582e5313cd0a0892624829c'); background-repeat: no-repeat; background-size: 100% 100%; vertical-align: -0.838ex; width:8.958ex; height:2.843ex;\" \/><\/span>\n<\/p><p><br \/>\nThis allows the task to change its place of execution to <i>p<\/i>, execute <i>ST<\/i> at <i>p<\/i>, and return, leaving behind tasks that may have been spawned during the execution of <i>ST<\/i>.\n<\/p><p>Another interesting language based on the PGAS model is Chapel.<sup id=\"rdp-ebb-cite_ref-ChamberlainParallel07_32-1\" class=\"reference\"><a href=\"#cite_note-ChamberlainParallel07-32\">[32]<\/a><\/sup> Its locality mechanisms can be effectively used for scalable data analysis, where light data mining (sub-)tasks are run on local processing elements and partial results must be exchanged. Chapel's data locality provides control over where data values are stored and where tasks execute, so that developers can ensure parallel data analysis computations execute near the variables they access, or vice versa, minimizing communication and synchronization costs. For example, Chapel programmers can specify how domains and arrays are distributed among the system nodes. Another appealing feature in Chapel is the expression of synchronization in a data-centric style. By associating synchronization constructs with data (variables), locality is enforced and data-driven parallelism can be easily expressed even at large scale. In Chapel, \"locales\" and \"domains\" are abstractions for referring to machine resources and for mapping tasks and data to them. 
Locales are language abstractions for naming a portion of a target architecture (e.g., a GPU, a single core, or a multicore node) that has processing and storage capabilities. A locale specifies where (on which processing node) to execute tasks\/statements\/operations. For example, in a system composed of four locales:\n<\/p><p><br \/>\n<span><span class=\"mwe-math-mathml-inline mwe-math-mathml-a11y\" style=\"display: none;\"><\/span><meta class=\"mwe-math-fallback-image-inline\" aria-hidden=\"true\" style=\"background-image: url('https:\/\/en.wikipedia.org\/api\/rest_v1\/media\/math\/render\/svg\/2b3f845e09bfb20e947e55d687dd936e9ba7e09c'); background-repeat: no-repeat; background-size: 100% 100%; vertical-align: -0.838ex; width:22.807ex; height:2.843ex;\" \/><\/span>\n<\/p><p><br \/>\nwe can use the following for executing the method <tt>Filter (D)<\/tt> on the first locale:\n<\/p><p><br \/>\n<span><span class=\"mwe-math-mathml-inline mwe-math-mathml-a11y\" style=\"display: none;\"><\/span><meta class=\"mwe-math-fallback-image-inline\" aria-hidden=\"true\" style=\"background-image: url('https:\/\/en.wikipedia.org\/api\/rest_v1\/media\/math\/render\/svg\/42ced52ef92060f5c1c6aab5572ecf9c1e5359eb'); background-repeat: no-repeat; background-size: 100% 100%; vertical-align: -0.838ex; width:25.172ex; height:2.843ex;\" \/><\/span>.\n<\/p><p><br \/>\nAnd to execute the <tt>K-means()<\/tt> algorithm on the four locales, we can use:\n<\/p><p><br \/>\n<span><span class=\"mwe-math-mathml-inline mwe-math-mathml-a11y\" style=\"display: none;\"><\/span><meta class=\"mwe-math-fallback-image-inline\" aria-hidden=\"true\" style=\"background-image: url('https:\/\/en.wikipedia.org\/api\/rest_v1\/media\/math\/render\/svg\/19254292eaa49fe2b25094a29bd88c24dc9b4a8c'); background-repeat: no-repeat; background-size: 100% 100%; vertical-align: -0.838ex; width:40.438ex; height:2.843ex;\" \/><\/span>.\n<\/p><p><br \/>\nWhereas locales are used to map tasks to machine nodes, domain maps are 
used for mapping data to a target architecture. Here is a simple example of a declaration of a rectangular domain:\n<\/p><p><br \/>\n<span><span class=\"mwe-math-mathml-inline mwe-math-mathml-a11y\" style=\"display: none;\"><\/span><meta class=\"mwe-math-fallback-image-inline\" aria-hidden=\"true\" style=\"background-image: url('https:\/\/en.wikipedia.org\/api\/rest_v1\/media\/math\/render\/svg\/0ceb3b73b6eac4b1dea34b4ea5c4c127c3762f92'); background-repeat: no-repeat; background-size: 100% 100%; vertical-align: -0.838ex; width:36.253ex; height:2.843ex;\" \/><\/span>\n<\/p><p><br \/>\nDomains can also be mapped to locales. Similar concepts (logical regions and mapping interfaces) are used in the Legion programming model.<sup id=\"rdp-ebb-cite_ref-BauerLegion12_31-1\" class=\"reference\"><a href=\"#cite_note-BauerLegion12-31\">[31]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-DiazASurv12_21-1\" class=\"reference\"><a href=\"#cite_note-DiazASurv12-21\">[21]<\/a><\/sup>\n<\/p><p><br \/>\nExascale programming is a rapidly evolving research field, and it is not possible to discuss in detail all the programming models, languages, and libraries that contribute features and mechanisms useful for exascale data analysis application programming. However, the next section introduces, discusses, and classifies current programming systems for exascale computing according to the most used programming and data management models.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Exascale_programming_systems_comparison\">Exascale programming systems comparison<\/span><\/h2>\n<p>As mentioned, several parallel programming models, languages, and libraries are under development to provide high-level programming interfaces and tools for implementing high-performance applications on future exascale computers. Here we introduce the most significant proposals and discuss their main features. 
Table 1 lists and classifies the considered systems and summarizes some pros and fallacies of different classes.\n<\/p>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table class=\"wikitable\" border=\"1\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\" colspan=\"4\"><b>Table 1.<\/b> Exascale programming systems classification\n<\/td><\/tr>\n<tr>\n<th style=\"background-color:#dddddd; padding-left:10px; padding-right:10px;\">Programming Models\n<\/th>\n<th style=\"background-color:#dddddd; padding-left:10px; padding-right:10px;\">Languages\n<\/th>\n<th style=\"background-color:#dddddd; padding-left:10px; padding-right:10px;\">Libraries\/APIs\n<\/th>\n<th style=\"background-color:#dddddd; padding-left:10px; padding-right:10px;\">Pros and Fallacies\n<\/th><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Distributed memory\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Charm++, Legion, High Performance Fortran (HPF), ECL, PaRSEC\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">MPI, BSP, Pig Latin, AllScale\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Distributed memory languages\/APIs are very close to the exascale hardware model. Systems in this class consider and deal with communication latency; however, data exchange costs are the main source of overhead. 
Except for AllScale and some MPI versions, systems in this class do not manage network and CPU failures.\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Shared memory\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">TBB, Cilk++\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">OpenMP, OmpSs\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Shared memory models do not map efficiently on exascale systems. Extensions have been proposed to improve performance when dealing with synchronization and network failures. No single convincing solution exists so far.\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Partitioned memory\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">UPC, Chapel, X10, CAF\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">GA, SHMEM, DASH, OpenSHMEM, GASPI\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">The local memory model is very useful, but combining it with global\/shared memory mechanisms introduces too much overhead. 
GASPI is the only system in this class enabling applications to recover from failures.\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Hybrid models\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">UPC\u2009+\u2009MPI, C++\/MPI\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">MPI\u2009+\u2009OpenMP, Spark-MPI, FLUX, EMPI4Re, DPLASMA\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Hybrid models facilitate the mapping to the hardware architectures; however, the different programming routines compete for resources, making it harder to control concurrency and contention. Resilient mechanisms are harder to implement because of the mixing of different constructs and data models.\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>Since exascale systems will be composed of millions of processing nodes, distributed memory paradigms, and message passing systems in particular, are candidate programming systems for this class of systems. In this area, MPI is currently the most used and studied system. Different adaptations of this well-known model, such as Open MPI for Exascale, are under development. Other systems based on distributed memory programming are Pig Latin, Charm++, Legion, PaRSEC, Bulk Synchronous Parallel (BSP), AllScale API, and Enterprise Control Language (ECL). 
Considering just Pig Latin, we notice that some of its parallel operators, such as <tt>FILTER<\/tt>, which selects a set of tuples from a relation based on a condition, and <tt>SPLIT<\/tt>, which partitions a relation into two or more relations, can be very useful in many highly parallel big data analysis applications.\n<\/p><p>On the other side are shared-memory models, where the major system is OpenMP, which offers a simple parallel programming model, although it does not provide mechanisms to explicitly map and control data distribution and includes non-scalable synchronization operations that make its implementation on massively parallel systems a challenging prospect. Other programming systems in this area are Threading Building Blocks (TBB), OmpSs, and Cilk++. The OpenMP synchronization model, based on locks and atomic and sequential sections that limit parallelism exploitation in exascale systems, is being modified in recent OpenMP implementations with new techniques and routines that increase asynchronous operation and parallelism exploitation. A similar approach is used in Cilk++, which supports parallel loops and hyperobjects, a new construct designed to solve data race problems created by parallel accesses to global variables. In fact, a hyperobject allows multiple tasks to share state without race conditions and without using explicit locks.\n<\/p><p>As a tradeoff between distributed and shared memory organizations, the Partitioned Global Address Space (PGAS) model has been designed to implement a global memory address space that is logically partitioned, with portions of it local to single processes. The main goal of the PGAS model is to limit data exchange and isolate failures in very large-scale systems. Languages and libraries based on PGAS are Unified Parallel C (UPC), Chapel, X10, Global Arrays (GA), Co-Array Fortran (CAF), DASH, and SHMEM. 
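<\/p><p>The core PGAS idea, a logically global address space whose partitions are each local to one process, can be mimicked in a few lines of Python (a toy model with invented names, not the API of UPC, GA, or any real PGAS system):
<\/p>
```python
# Toy PGAS model: a global index space partitioned into per-place blocks.
# Reads within a place's own block are "local"; all others are "remote".
class Place:
    def __init__(self, pid, block, block_size):
        self.pid, self.block, self.block_size = pid, block, block_size

    def get(self, i, places):
        owner = i // self.block_size
        kind = "local" if owner == self.pid else "remote"
        return kind, places[owner].block[i % self.block_size]

BS = 4
data = list(range(16))                                 # the "global" array
places = [Place(p, data[p * BS:(p + 1) * BS], BS) for p in range(4)]

print(places[0].get(2, places))                        # -> ('local', 2)
print(places[0].get(9, places))                        # -> ('remote', 9)
```
<p>Keeping most accesses on the local partition, as the model encourages, is what limits data exchange among parallel threads.
<\/p><p>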
PGAS appears to be suited for implementing data-intensive exascale applications because it uses private data structures and limits the amount of shared data among parallel threads. Its memory-partitioning model facilitates failure detection and resilience. Another programming mechanism useful for decentralized data analysis is data synchronization. In the SHMEM library it is implemented through the <tt>shmem_barrier<\/tt> operation, which performs a barrier operation on a subset of processing elements, then enables them to go further by sharing synchronized data.\n<\/p><p>Starting from those three main programming approaches, hybrid systems have been proposed and developed to better map application tasks and data onto the hardware architectures of exascale systems. In hybrid systems that combine distributed and shared memory, message-passing routines are used for data communication and inter-node processing, whereas shared-memory operations are used for exploiting intra-node parallelism. A major example in this area is given by the different MPI\u2009+\u2009OpenMP systems recently implemented. Hybrid systems have also been designed by combining message passing models, like MPI, with PGAS models for restricting data communication overhead and improving MPI efficiency in execution time and memory consumption. The PGAS-based MPI implementation EMPI4Re, developed in the EPiGRAM project, is an example of this class of hybrid system.\n<\/p><p>Associated with the programming model issues, a set of challenges concern the design of runtime systems, which in exascale computing systems must be tightly integrated with the programming tools level. The main challenges for runtime systems obviously include parallelism exploitation, limited data communication, data dependence management, data-aware task scheduling, processor heterogeneity, and energy efficiency. 
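\n<\/p><p>The subset-level synchronization described above for SHMEM's <tt>shmem_barrier<\/tt> can be sketched with Python threads standing in for processing elements (an illustrative analogy only, not the SHMEM API):\n<\/p>
```python
import threading

# Sketch of barrier synchronization over a subset of processing elements,
# in the spirit of SHMEM's shmem_barrier. Python threads stand in for PEs
# and a dict stands in for the symmetric heap; this is not the SHMEM API.

NUM_PES = 4
SUBSET = {0, 1}                      # only these PEs synchronize
barrier = threading.Barrier(len(SUBSET))
shared = {}                          # stand-in for remotely accessible data
results = {}

def pe(rank):
    if rank in SUBSET:
        shared[rank] = rank * 10     # publish data before the barrier
        barrier.wait()               # all PEs in the subset arrive here
        # after the barrier, data published by the other members is visible
        results[rank] = sum(shared[r] for r in SUBSET)
    else:
        results[rank] = None         # PEs outside the subset proceed freely

threads = [threading.Thread(target=pe, args=(r,)) for r in range(NUM_PES)]
for t in threads: t.start()
for t in threads: t.join()
```
<p>Only the processing elements in the subset pay the synchronization cost; the others proceed independently, which is the property that makes subset barriers attractive for decentralized data analysis.\n<\/p><p>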
However, together with those main issues, other aspects are addressed in runtime systems, like storage\/memory hierarchies, storage and processor heterogeneity, performance adaptability, resource allocation, performance analysis, and performance portability. In addressing those issues, the currently used approaches aim at providing simplified abstractions and machine models that allow algorithm developers and application programmers to generate code that can run and scale on a wide range of exascale computing systems.\n<\/p><p>This is a complex task that can be achieved by exploiting techniques that allow the runtime system to cooperate with the compiler, the libraries, and the operating system to find integrated solutions and make smarter use of hardware resources through efficient ways of mapping the application code to the exascale hardware. Finally, due to the specific features of exascale hardware, runtime systems need to find methods and techniques that allow bringing the computing system closer to the application requirements. Research work in this area is carried out in projects like XPRESS, StarPU, Corvette, DEGAS, libWater<sup id=\"rdp-ebb-cite_ref-GrassoLibWater13_35-0\" class=\"reference\"><a href=\"#cite_note-GrassoLibWater13-35\">[35]<\/a><\/sup>, Traleika-Glacier, OmpSs<sup id=\"rdp-ebb-cite_ref-Fern.C3.A1ndezTask14_24-1\" class=\"reference\"><a href=\"#cite_note-Fern.C3.A1ndezTask14-24\">[24]<\/a><\/sup>, SnuCL, D-TEC, SLEEC, PIPER, and X-TUNE, which are proposing innovative solutions for large-scale parallel computing systems that can be used in exascale machines. For instance, a system that aims at integrating the runtime with the language level is OmpSs, where mechanisms are provided for data dependence management (based on DAG analysis, as in libWater) and for mapping tasks to computing nodes and handling processor heterogeneity (the <tt>target<\/tt> construct). 
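\n<\/p><p>The DAG-based data-dependence management mentioned for OmpSs and libWater can be illustrated with a minimal dependence-driven executor (a toy Python sketch, not either system's API):\n<\/p>
```python
# Minimal sketch of DAG-based data-dependence scheduling, in the spirit of
# OmpSs/libWater task dependences (illustrative only; task names are made up).

def run_dag(tasks):
    """tasks: {name: (deps, fn)}; run each task only after all of its deps."""
    done, order = set(), []
    while len(done) < len(tasks):
        ready = [n for n, (deps, _) in tasks.items()
                 if n not in done and all(d in done for d in deps)]
        if not ready:
            raise RuntimeError("cyclic dependence graph")
        for name in ready:           # tasks in 'ready' could run in parallel
            tasks[name][1]()
            done.add(name)
            order.append(name)
    return order

log = []
tasks = {
    "load":  ((),               lambda: log.append("load")),
    "fft":   (("load",),        lambda: log.append("fft")),
    "stats": (("load",),        lambda: log.append("stats")),
    "merge": (("fft", "stats"), lambda: log.append("merge")),
}
order = run_dag(tasks)
```
<p>Tasks in the same ready set have no mutual dependences, so a real runtime would dispatch them to different cores or nodes concurrently; the DAG is what makes that safe.\n<\/p><p>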
Another issue to be taken into account in the interaction between the programming level and the runtime is performance and scalability monitoring. In the StarPU project, for example, performance feedback through task profiling and trace analysis is provided.\n<\/p><p>In large-scale high-performance machines and in exascale systems, the runtime systems are more complex than in traditional parallel computers. In fact, performance and scalability issues must be addressed at the inter-node runtime level, and they must be appropriately integrated with intra-node runtime mechanisms.<sup id=\"rdp-ebb-cite_ref-Sarkar2014_16_36-0\" class=\"reference\"><a href=\"#cite_note-Sarkar2014_16-36\">[36]<\/a><\/sup> All these issues relate to system and application scalability. In fact, vertical scaling of systems with multicore parallelism within a single node must be addressed. Scalability is still an open issue in exascale systems also because speed-up requirements for system software and runtimes are much higher than in traditional HPC systems, and different portions of code in applications or runtimes can generate performance bottlenecks.\n<\/p><p>Concerning application resiliency, the runtime of exascale systems must include mechanisms for restarting tasks and accessing data in case of software or hardware faults without requiring developer involvement. Traditional approaches for providing reliability in HPC include checkpointing and restart (see for instance MPI_Checkpoint), reliable data storage (through file and in-memory replication or double buffering), and message logging for minimizing the checkpointing overhead. In fact, whereas the global checkpointing\/restart technique is the most widely used to limit system\/application faults, in the exascale scenario new mechanisms with low overhead and high scalability must be designed. These mechanisms should limit task and data duplication through smart approaches for selective replication. 
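\n<\/p><p>The idea of checkpointing only the minimal task state, rather than the entire address space, can be sketched as follows (an illustrative Python sketch with a hypothetical task; real exascale checkpointing protocols are far more sophisticated):\n<\/p>
```python
import os
import pickle
import tempfile

# Sketch of minimal-state task checkpointing: instead of saving the whole
# address space, only the small state needed to resume the task (here, a
# loop index and an accumulator) is persisted after each step.

def checkpoint(path, state):
    with open(path, "wb") as f:
        pickle.dump(state, f)            # persist only the minimal task state

def restore(path, default):
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return default

def summed_squares(n, ckpt_path, fail_at=None):
    state = restore(ckpt_path, {"i": 0, "acc": 0})
    for i in range(state["i"], n):
        if i == fail_at:
            raise RuntimeError("simulated fault")
        state = {"i": i + 1, "acc": state["acc"] + i * i}
        checkpoint(ckpt_path, state)     # tiny state, tiny overhead
    return state["acc"]

path = os.path.join(tempfile.mkdtemp(), "task.ckpt")
try:
    summed_squares(10, path, fail_at=6)  # fault after six completed iterations
except RuntimeError:
    pass
result = summed_squares(10, path)        # restart resumes from i == 6
```
<p>Because only a few bytes of state are written per step, the checkpointing overhead stays low, which is exactly the property sought at exascale.\n<\/p><p>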
For example, silent data corruption (SDC) is recognized to be a critical problem in exascale computing. However, although replication is useful, its inherent inefficiency must be limited. Research work is carried out in this area to define techniques that limit replication costs while offering protection from SDC. For application\/task checkpointing, instead of checkpointing the entire address space of the application, as occurs in OpenMP and MPI, the minimal task state needed for fault recovery must be identified and checkpointed, thus limiting data size and recovery overhead.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Requirements_of_exascale_runtime_for_data_analysis\">Requirements of exascale runtime for data analysis<\/span><\/h2>\n<p>One of the most important aspects to ponder in applications that run on exascale systems and analyze big datasets is the tradeoff between sharing data among processing elements and computing things locally to reduce communication and energy costs, while keeping performance and fault-tolerance levels. 
A scalable programming model founded on basic operations for data-intensive\/data-driven applications must include mechanisms and operations for:\n<\/p>\n<ul><li> parallel data access that allows increasing data access bandwidth by partitioning data into multiple chunks, according to different methods, and accessing several data elements in parallel to meet high throughput requirements;<\/li><\/ul>\n<ul><li> fault resiliency, a major issue as machines expand in size and complexity; on exascale systems with huge numbers of processes, non-local communication must be prepared for a potential failure of one of the communication sides; runtimes must feature failure-handling mechanisms for recovering from node and communication faults;<\/li><\/ul>\n<ul><li> data-driven local communication that is useful for limiting the data exchange overhead in massively parallel systems composed of many cores; in this case, data availability among neighbor nodes dictates the operations taken by those nodes;<\/li><\/ul>\n<ul><li> data processing on limited groups of cores, which allows concentrating data analysis operations involving limited sets of cores and large amounts of data on localities of exascale machines, facilitating a form of data affinity that co-locates related data and computation;<\/li><\/ul>\n<ul><li> near-data synchronization, to limit the overhead generated by synchronization mechanisms and protocols that involve several far-away cores in keeping data up-to-date;<\/li><\/ul>\n<ul><li> in-memory querying and analytics, needed to reduce query response times and execution of analytics operations by caching large volumes of data in the computing node RAMs and issuing queries and other operations in parallel on the main memory of computing nodes;<\/li><\/ul>\n<ul><li> group-level data aggregation in parallel systems, which is useful for efficient summarization, graph traversal, and matrix operations, making it of great importance in programming models for data analysis on 
massively parallel systems; and<\/li><\/ul>\n<ul><li> locality-based data selection and classification, for limiting the latency of basic data analysis operations running in parallel on large scale machines in a way that the subset of data needed together in a given phase are locally available (in a subset of nearby cores).<\/li><\/ul>\n<p>A reliable and high-level programming model and its associated runtime must be able to manage and provide implementation solutions for those operations, together with the reliable exploitation of a very large amount of parallelism.\n<\/p><p>Real-world big data analysis applications cannot be practically solved on sequential machines. If we refer to real-world applications, each large-scale data mining and machine learning software platform that today is under development in the areas of social data analysis and <a href=\"https:\/\/www.limswiki.org\/index.php\/Bioinformatics\" title=\"Bioinformatics\" class=\"wiki-link\" data-key=\"8f506695fdbb26e3f314da308f8c053b\">bioinformatics<\/a> will certainly benefit from the availability of exascale computing systems. They will also benefit from the use of exascale programming environments that will offer massive and adaptive-grain parallelism, data locality, local communication, and synchronization mechanisms, together with the other features discussed in the previous sections that are needed for reducing execution time and making feasible the solution of new problems and challenges. For example, in bioinformatics applications, parallel data partitioning is a key feature for running statistical analysis or machine learning algorithms on high-performance computing systems. After that, clever and complex data mining algorithms must be run on each single core\/node of an exascale machine on subsets of data to produce data models in parallel. 
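\n<\/p><p>The partition\/local-model\/merge pattern just described for bioinformatics workloads can be sketched with a deliberately trivial "model" (per-partition counts and sums, which merge exactly); real mining models are far richer:\n<\/p>
```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of the partition / local-model / merge pattern described above:
# data is split into chunks, a toy local model is fitted on each chunk in
# parallel, and the partial models are merged into a global one.

def partition(data, n_parts):
    size = (len(data) + n_parts - 1) // n_parts
    return [data[i:i + size] for i in range(0, len(data), size)]

def local_model(chunk):
    """Toy per-partition 'model': count and sum (enough to merge exactly)."""
    return {"n": len(chunk), "sum": sum(chunk)}

def merge(models):
    n = sum(m["n"] for m in models)
    total = sum(m["sum"] for m in models)
    return {"n": n, "mean": total / n}

data = list(range(1, 101))                 # stand-in for a large dataset
with ThreadPoolExecutor(max_workers=4) as ex:
    partials = list(ex.map(local_model, partition(data, 4)))
global_model = merge(partials)
```
<p>Each local fit touches only its own partition, so the expensive step parallelizes with no communication; only the small partial models are exchanged for merging, which is the communication pattern advocated above.\n<\/p><p>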
When partial models are produced, they can be checked locally and must be merged among nearby processors to obtain, for example, a general model of gene expression correlations or of drug-gene interactions. Therefore, for those applications, data locality, highly parallel correlation algorithms, and limited communication structures are very important to reduce execution time from several days to a few minutes. Moreover, fault tolerance software mechanisms are also useful in long-running bioinformatics applications to avoid restarting them from the beginning when a software\/hardware failure occurs.\n<\/p><p>Moving to social media applications, the huge volumes of user-generated data on social media platforms such as Facebook, Twitter, and Instagram are nowadays very precious sources of data from which to extract insights concerning human dynamics and behaviors. In fact, social media analysis is a fast-growing research area that will benefit from the use of exascale computing systems. For example, social media users moving through a sequence of places in a city or a region may create a huge amount of geo-referenced data that includes extensive knowledge about human dynamics and mobility behaviors. A methodology for discovering behavior and mobility patterns of users from social media posts and tweets includes a set of steps such as collection and pre-processing of geotagged items, organization of the input dataset, data analysis and trajectory mining algorithm execution, and results visualization. In all those data analysis steps, the utilization of scalable programming techniques and tools is vital to obtain practical results in feasible time when massive datasets are analyzed. 
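\n<\/p><p>The first three steps of that methodology can be sketched as follows (illustrative Python only; the post records, field names, and the trivial trajectory step are all hypothetical stand-ins for real geotagged data and real mining algorithms):\n<\/p>
```python
# Illustrative sketch of the social media analysis steps described above:
# pre-processing of geotagged items, organization of the input dataset,
# and a toy trajectory-extraction step. Data and field names are made up.

posts = [
    {"user": "u1", "t": 3, "place": "museum"},
    {"user": "u1", "t": 1, "place": "station"},
    {"user": "u2", "t": 2, "place": "park"},
    {"user": "u1", "t": 2, "place": "cafe"},
    {"user": "u2", "t": 1, "place": None},      # not geotagged: dropped
]

# Step 1: collection and pre-processing -- keep only geotagged items.
geotagged = [p for p in posts if p["place"] is not None]

# Step 2: organization of the input dataset -- group by user, order by time.
by_user = {}
for p in sorted(geotagged, key=lambda p: p["t"]):
    by_user.setdefault(p["user"], []).append(p["place"])

# Step 3: trajectory mining (toy version) -- each user's ordered place list.
trajectories = by_user
```
<p>Each step is independently parallelizable over data partitions or over users, which is where the scalable programming techniques discussed above come into play.\n<\/p><p>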
The exascale programming features and requirements discussed here and in the previous sections will be very useful in social data analysis, particularly for executing parallel tasks like concurrent data acquisition (where data items are collected by exploiting parallel queries on different data sources), parallel data filtering, and data partitioning through the exploitation of local and in-memory algorithms, as well as classification, clustering, and association mining algorithms that are compute-intensive and need a large number of processing elements working asynchronously to produce learning models from billions of posts containing text, photos, and videos. The management and processing of terabytes of data that are involved in those applications cannot be done efficiently without solving issues like data locality, near-data processing, large asynchronous execution, and the other similar issues addressed in exascale computing systems.\n<\/p><p>Together with an accurate modeling of basic operations and of the programming languages\/APIs that include them, supporting correct and effective data-intensive applications on exascale systems will also require a significant programming effort from developers when they need to implement complex algorithms and data-driven applications such as those used, for example, in big data analysis and distributed data mining. Parallel and distributed data mining strategies, like collective learning, meta-learning, and ensemble learning, must be devised using fine-grain parallel approaches to be adapted to exascale computers. Programmers must be able to design and implement scalable algorithms by using the operations sketched above, specifically adapted to those new systems. 
To reach this goal, a coordinated effort between the operation\/language designers and the application developers would be fruitful.\n<\/p><p>In exascale systems, the cost of accessing, moving, and processing data across a parallel system is enormous.<sup id=\"rdp-ebb-cite_ref-ChenSyn13_19-1\" class=\"reference\"><a href=\"#cite_note-ChenSyn13-19\">[19]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-ReedExa15_7-2\" class=\"reference\"><a href=\"#cite_note-ReedExa15-7\">[7]<\/a><\/sup> This requires mechanisms, techniques, and operations for efficient data access, placement, and querying. In addition, scalable operations must be designed in such a way as to avoid global synchronizations, centralized control, and global communications. Many data scientists want to be abstracted away from these tricky, lower-level aspects of HPC at least until they have their code working, afterwards potentially tweaking communication and distribution choices in a high-level manner in order to further tune their code. Interoperability and integration with the MapReduce model and MPI must be investigated, with the main goal of achieving scalability on large-scale data processing.\n<\/p><p>Different data-driven abstractions can be combined for providing a programming model and an API that allow the reliable and productive programming of very large-scale heterogeneous and distributed memory systems. In order to simplify the development of applications in heterogeneous distributed memory environments, large-scale data-parallelism can be exploited on top of the abstraction of <i>n<\/i>-dimensional arrays subdivided in partitions, so that different array partitions are placed on different cores\/nodes that will process the array partitions in parallel. This approach can allow the computing nodes to process data partitions in parallel at each core\/node using a set of statements\/library calls that hide the complexity of the underlying process. 
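\n<\/p><p>The partitioned-array abstraction just described can be sketched in a few lines of Python (an illustrative, single-process abstraction, not a distributed implementation; here partitions are plain lists rather than remote memory, and 1-D for brevity):\n<\/p>
```python
# Illustrative sketch of the partitioned array abstraction described above:
# the array is subdivided into partitions, each notionally placed on a
# different core/node, and element-wise operations run per partition.

class PartitionedArray:
    def __init__(self, values, n_parts):
        size = max(1, (len(values) + n_parts - 1) // n_parts)
        # each partition would live in the memory of a different core/node
        self.parts = [values[i:i + size] for i in range(0, len(values), size)]

    def map(self, fn):
        out = PartitionedArray([], 1)
        # partitions are independent, so a runtime could process them in parallel
        out.parts = [[fn(x) for x in part] for part in self.parts]
        return out

    def gather(self):
        """Collect all partitions back into one local list (for inspection)."""
        return [x for part in self.parts for x in part]

a = PartitionedArray(list(range(8)), n_parts=4)   # partitions: [0,1] [2,3] [4,5] [6,7]
b = a.map(lambda x: x * x)
squares = b.gather()
```
<p>Operations like <tt>map<\/tt> that touch only local partitions are the cheap ones; anything that needs data from other partitions implies communication, which is why cross-partition data dependency is what limits scalability.\n<\/p><p>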
Data dependency in this scenario limits scalability, so it should be avoided or limited to a local scale.\n<\/p><p>This abstraction should be supported by abstract data types provided by libraries, so that they can be easily integrated into existing applications. As we mentioned above, another issue is the gap between users with HPC needs and experts with the skills to make the most of these technologies. An appropriate directive-based approach can be to design, implement, and evaluate a compiler framework that allows generic translations from high-level languages to exascale heterogeneous platforms. A programming model should be designed at a level that is higher than that of standards such as OpenCL, also including checkpointing and fault resiliency. Efforts must be carried out to show the feasibility of transparent checkpointing of exascale programs and to quantitatively evaluate the runtime overhead. Approaches like CheCL show that it is also possible to enable transparent checkpoint and restart in high-performance and dependable GPU computing, including support for process migration among different processors such as a CPU and a GPU.\n<\/p><p>The model should enable rapid development with reduced effort for different heterogeneous platforms. These heterogeneous platforms need to include low-energy architectures and mobile devices. The new model should allow a preliminary evaluation of results on the target architectures.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Concluding_remarks_and_future_work\">Concluding remarks and future work<\/span><\/h2>\n<p>Cloud-based solutions for big data analysis tools and systems are in an advanced phase on both the research and commercial sides. 
On the other hand, new exascale hardware\/software solutions must be studied and designed to allow the mining of very large-scale datasets on those new platforms.\n<\/p><p>Exascale systems raise new requirements on application developers and programming systems to target architectures composed of a significantly large number of homogeneous and heterogeneous cores. General issues like energy consumption, multitasking, scheduling, reproducibility, and resiliency must be addressed together with other data-oriented issues like data distribution and mapping, data access, data communication, and synchronization. Programming constructs and runtime systems will play a crucial role in enabling future data analysis programming models, runtime models, and hardware platforms to address these challenges, supporting the scalable implementation of real big data analysis applications.\n<\/p><p>In particular, here we summarize a set of open design challenges that are critical for designing exascale programming systems and for their scalable implementation. The following design choices, among others, must be taken into account:\n<\/p>\n<ul><li> <i>Application reliability<\/i>: Data analysis programming models must include constructs and\/or mechanisms for handling task and data access failures as well as system recoveries. As data analysis platforms grow ever larger, fully reliable operation cannot be assumed implicitly; this assumption becomes less credible at scale, and therefore explicit failure-handling solutions must be proposed.<\/li><\/ul>\n<ul><li> <i>Reproducibility requirements<\/i>: Big data analysis running on massively parallel systems demands reproducibility. 
New data analysis programming frameworks must collect and generate metadata and provenance information about algorithm characteristics, software configuration, and execution environment for supporting application reproducibility on large-scale computing platforms.<\/li><\/ul>\n<ul><li> <i>Communication mechanisms<\/i>: Novel approaches must be devised for facing network unreliability<sup id=\"rdp-ebb-cite_ref-FeketeTheImp93_17-1\" class=\"reference\"><a href=\"#cite_note-FeketeTheImp93-17\">[17]<\/a><\/sup> and network latency, for example by expressing asynchronous data communications and locality-based data exchange\/sharing.<\/li><\/ul>\n<ul><li> <i>Communication patterns<\/i>: A correct paradigm design should include communication patterns allowing application-dependent features and data access models, limiting data movement, and simplifying the burden on exascale runtimes and interconnection.<\/li><\/ul>\n<ul><li> <i>Data handling and sharing patterns<\/i>: Data locality mechanisms\/constructs like near-data computing must be designed and evaluated on big data applications so that subsets of data that are used together are stored in nearby processors, while avoiding the imposition of locality when data must be moved. Other challenges concern data affinity control, data querying (NoSQL approaches), global data distribution, and sharing patterns.<\/li><\/ul>\n<ul><li> <i>Data-parallel constructs<\/i>: Useful models like data-driven\/data-centric constructs, dataflow parallel operations, independent data parallelism, and SPMD patterns must be deeply considered and studied.<\/li><\/ul>\n<ul><li> <i>Grain of parallelism<\/i>: Anything from fine-grain to process-grain parallelism must be analyzed, also in combination with the different degrees of parallelism that the exascale hardware supports. 
Perhaps different grain sizes should be considered in a single model to address hardware needs and heterogeneity.<\/li><\/ul>\n<p>Finally, since big data mining algorithms often require the exchange of raw data or, better, of mining parameters and partial models, to achieve scalability and reliability on thousands of processing elements, metadata-based information, limited-communication programming mechanisms, and partition-based data structures with associated parallel operations must be proposed and implemented.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Abbreviations\">Abbreviations<\/span><\/h2>\n<p><b>APGAS<\/b>: asynchronous partitioned global address space\n<\/p><p><b>BSP<\/b>: bulk synchronous parallel\n<\/p><p><b>CAF<\/b>: Co-Array Fortran\n<\/p><p><b>DAaaS<\/b>: data analysis as a service\n<\/p><p><b>DAIaaS<\/b>: data analysis infrastructure as a service\n<\/p><p><b>DAPaaS<\/b>: data analysis platform as a service\n<\/p><p><b>DASaaS<\/b>: data analysis software as a service\n<\/p><p><b>DMCF<\/b>: Data Mining Cloud Framework\n<\/p><p><b>ECL<\/b>: Enterprise Control Language\n<\/p><p><b>ESnet<\/b>: Energy Sciences Network\n<\/p><p><b>GA<\/b>: global array\n<\/p><p><b>HPC<\/b>: high-performance computing\n<\/p><p><b>IaaS<\/b>: infrastructure as a service\n<\/p><p><b>JS4Cloud<\/b>: JavaScript for Cloud\n<\/p><p><b>PaaS<\/b>: platform as a service\n<\/p><p><b>PGAS<\/b>: partitioned global address space\n<\/p><p><b>RDD<\/b>: resilient distributed dataset\n<\/p><p><b>SaaS<\/b>: software as a service\n<\/p><p><b>SOA<\/b>: service-oriented architecture\n<\/p><p><b>TBB<\/b>: threading building blocks\n<\/p><p><b>VL4Cloud<\/b>: Visual Language for Cloud\n<\/p><p><b>XaaS<\/b>: everything as a service\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Appendix\">Appendix<\/span><\/h2>\n<h3><span class=\"mw-headline\" id=\"Scalability_in_parallel_systems\">Scalability in parallel systems<\/span><\/h3>\n<p>Parallel computing systems aim at exploiting the capacity of usefully 
employing all their processing elements during application execution. Indeed, no real parallel system can do that fully, because of sequential portions of execution that cannot be parallelized (as Amdahl\u2019s law suggests<sup id=\"rdp-ebb-cite_ref-AmdahlValid67_37-0\" class=\"reference\"><a href=\"#cite_note-AmdahlValid67-37\">[37]<\/a><\/sup>) and due to several sources of overhead such as sequential operations, communication, synchronization, I\/O and memory access, network speed, I\/O system speed, hardware and software failures, problem size, and program input. All these issues related to the ability of parallel systems to fully exploit their resources are referred to as system or program scalability.<sup id=\"rdp-ebb-cite_ref-BaileyTwelve91_38-0\" class=\"reference\"><a href=\"#cite_note-BaileyTwelve91-38\">[38]<\/a><\/sup>\n<\/p><p>The scalability of a parallel computing system is a measure of its capacity to reduce program execution time in proportion to the number of its processing elements. According to this definition, scalable computing refers to the ability of a hardware\/software parallel system to exploit increasing computing resources effectively in the execution of a software application.<sup id=\"rdp-ebb-cite_ref-GramaIntro03_39-0\" class=\"reference\"><a href=\"#cite_note-GramaIntro03-39\">[39]<\/a><\/sup>\n<\/p><p>Despite the difficulties that can be faced in the parallel implementation of an application, a framework, or a programming system, a scalable parallel computation can always be made cost-optimal if the number of processing elements, the size of memory, the network bandwidth, and the size of the problem are chosen appropriately.\n<\/p><p>For evaluating and measuring the scalability of a parallel program, some metrics have been defined and are widely used: parallel runtime <i>T(p)<\/i>, speedup <i>S(p)<\/i>, and efficiency <i>E(p)<\/i>. 
Parallel runtime is the total processing time of the program using <i>p<\/i> processors (with <i>p<\/i>\u2009>\u20091). Speedup is the ratio between the total processing time of the program on one processor and the total processing time on <i>p<\/i> processors: <i>S(p)<\/i>\u2009=\u2009<i>T(1)<\/i>\/<i>T(p)<\/i>. Efficiency is the ratio between the speedup and the total number of used processors: <i>E(p)<\/i>\u2009=\u2009<i>S(p)<\/i>\/<i>p<\/i>.\n<\/p><p>Application scalability is influenced by the available hardware and software resources, their performance and reliability, and by the sources of overhead discussed before. In particular, the scalability of data analysis applications is tightly related to the exploitation of parallelism in data-driven operations and to the overhead generated by data management mechanisms and techniques. Moreover, application scalability also depends on the programmer's ability to design the algorithms, reducing sequential time and exploiting parallel operations. Finally, the instruction designers and the runtime implementers contribute to the exploitation of scalability.<sup id=\"rdp-ebb-cite_ref-GustafsonReeval88_40-0\" class=\"reference\"><a href=\"#cite_note-GustafsonReeval88-40\">[40]<\/a><\/sup> All these arguments mean that for realizing exascale computing in practice, many issues and aspects must be taken into account by considering all the layers of the hardware\/software stack involved in the execution of exascale programs.\n<\/p><p>In addressing parallel system scalability, system dependability must also be tackled. 
As the number of processors and network interconnections increases\u2014and as tasks, threads, and message exchanges increase\u2014the rate of failures and faults increases too.<sup id=\"rdp-ebb-cite_ref-ShiProg12_41-0\" class=\"reference\"><a href=\"#cite_note-ShiProg12-41\">[41]<\/a><\/sup> As discussed in reference<sup id=\"rdp-ebb-cite_ref-SchroederALarge10_42-0\" class=\"reference\"><a href=\"#cite_note-SchroederALarge10-42\">[42]<\/a><\/sup>, the design of scalable parallel systems requires assuring system dependability. Therefore, understanding failure characteristics is a key issue in coupling high performance and reliability in massively parallel systems at exascale size.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Acknowledgements\">Acknowledgements<\/span><\/h2>\n<h3><span class=\"mw-headline\" id=\"Funding\">Funding<\/span><\/h3>\n<p>This work has been partially funded by the ASPIDE Project funded by the European Union\u2019s Horizon 2020 research and innovation programme under grant agreement No 801091.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Availability_of_data_and_materials\">Availability of data and materials<\/span><\/h3>\n<p>Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Authors.E2.80.99_contributions\">Authors\u2019 contributions<\/span><\/h3>\n<p>DT carried out all the work presented in the paper. 
The author read and approved the final manuscript.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Competing_interests\">Competing interests<\/span><\/h3>\n<p>The author declare that he\/she has no competing interests.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"References\">References<\/span><\/h2>\n<div class=\"reflist references-column-width\" style=\"-moz-column-width: 30em; -webkit-column-width: 30em; column-width: 30em; list-style-type: decimal;\">\n<ol class=\"references\">\n<li id=\"cite_note-PetcuOnProc15-1\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-PetcuOnProc15_1-0\">1.0<\/a><\/sup> <sup><a href=\"#cite_ref-PetcuOnProc15_1-1\">1.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Petcu, D.; Iuhasz, G.; Pop, D. et al. (2015). \"On Processing Extreme Data\". <i>Scalable Computing: Practice and Experience<\/i> <b>16<\/b> (4). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.12694%2Fscpe.v16i4.1134\" data-key=\"73236db4cbd5c48fcf61416489862fb4\">10.12694\/scpe.v16i4.1134<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=On+Processing+Extreme+Data&rft.jtitle=Scalable+Computing%3A+Practice+and+Experience&rft.aulast=Petcu%2C+D.%3B+Iuhasz%2C+G.%3B+Pop%2C+D.+et+al.&rft.au=Petcu%2C+D.%3B+Iuhasz%2C+G.%3B+Pop%2C+D.+et+al.&rft.date=2015&rft.volume=16&rft.issue=4&rft_id=info:doi\/10.12694%2Fscpe.v16i4.1134&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-TardieuX10_16-2\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-TardieuX10_16_2-0\">\u2191<\/a><\/span> 
<span class=\"reference-text\"><span class=\"citation Journal\">Tardieu, O.; Herta, B.; Cunningham, D. et al. (2016). \"X10 and APGAS at Petascale\". <i>ACM Transactions on Parallel Computing (TOPC)<\/i> <b>2<\/b> (4): 25. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1145%2F2894746\" data-key=\"2d4350b6777806c6c1cd3ec6aee11538\">10.1145\/2894746<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=X10+and+APGAS+at+Petascale&rft.jtitle=ACM+Transactions+on+Parallel+Computing+%28TOPC%29&rft.aulast=Tardieu%2C+O.%3B+Herta%2C+B.%3B+Cunningham%2C+D.+et+al.&rft.au=Tardieu%2C+O.%3B+Herta%2C+B.%3B+Cunningham%2C+D.+et+al.&rft.date=2016&rft.volume=2&rft.issue=4&rft.pages=25&rft_id=info:doi\/10.1145%2F2894746&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-TaliaMaking15-3\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-TaliaMaking15_3-0\">3.0<\/a><\/sup> <sup><a href=\"#cite_ref-TaliaMaking15_3-1\">3.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Talia, D. (2015). \"Making knowledge discovery services scalable on clouds for big data mining\". <i>Proceedings from the Second IEEE International Conference on Spatial Data Mining and Geographical Knowledge Services (ICSDM)<\/i>: 1\u20134. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FICSDM.2015.7298015\" data-key=\"b03b78d7163c969a967c011476c987d7\">10.1109\/ICSDM.2015.7298015<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Making+knowledge+discovery+services+scalable+on+clouds+for+big+data+mining&rft.jtitle=Proceedings+from+the+Second+IEEE+International+Conference+on+Spatial+Data+Mining+and+Geographical+Knowledge+Services+%28ICSDM%29&rft.aulast=Talia%2C+D.&rft.au=Talia%2C+D.&rft.date=2015&rft.pages=1%E2%80%934&rft_id=info:doi\/10.1109%2FICSDM.2015.7298015&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-AmarasingheExa09-4\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-AmarasingheExa09_4-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Amarasinghe, S.; Campbell, D.; Carlson, W. et al. (14 September 2009). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/citeseerx.ist.psu.edu\/viewdoc\/summary?doi=10.1.1.205.3944\" data-key=\"66f5ac6decf5dd107b457c445548f1e3\">\"ExaScale Software Study: Software Challenges in Extreme Scale Systems\"<\/a>. DARPA IPTO. pp. 153. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1.1.205.3944\" data-key=\"c45f069bb89122cc0c1dcea4a81e5092\">10.1.1.205.3944<\/a><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/citeseerx.ist.psu.edu\/viewdoc\/summary?doi=10.1.1.205.3944\" data-key=\"66f5ac6decf5dd107b457c445548f1e3\">http:\/\/citeseerx.ist.psu.edu\/viewdoc\/summary?doi=10.1.1.205.3944<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=ExaScale+Software+Study%3A+Software+Challenges+in+Extreme+Scale+Systems&rft.atitle=&rft.aulast=Amarasinghe%2C+S.%3B+Campbell%2C+D.%3B+Carlson%2C+W.+et+al.&rft.au=Amarasinghe%2C+S.%3B+Campbell%2C+D.%3B+Carlson%2C+W.+et+al.&rft.date=14+September+2009&rft.pages=pp.+153&rft.pub=DARPA+IPTO&rft_id=info:doi\/10.1.1.205.3944&rft_id=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fsummary%3Fdoi%3D10.1.1.205.3944&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ZahariaApache16-5\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-ZahariaApache16_5-0\">5.0<\/a><\/sup> <sup><a href=\"#cite_ref-ZahariaApache16_5-1\">5.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Zaharia. M.; Xin, R.S.; Wendell, P. et al. (2016). \"Apache Spark: A unified engine for big data processing\". <i>Communications of the ACM<\/i> <b>59<\/b> (11): 56\u201365. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1145%2F2934664\" data-key=\"8de293ee9a58d376f1dcfa46b992c1af\">10.1145\/2934664<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Apache+Spark%3A+A+unified+engine+for+big+data+processing&rft.jtitle=Communications+of+the+ACM&rft.aulast=Zaharia.+M.%3B+Xin%2C+R.S.%3B+Wendell%2C+P.+et+al.&rft.au=Zaharia.+M.%3B+Xin%2C+R.S.%3B+Wendell%2C+P.+et+al.&rft.date=2016&rft.volume=59&rft.issue=11&rft.pages=56%E2%80%9365&rft_id=info:doi\/10.1145%2F2934664&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MaheshwariScientific10-6\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-MaheshwariScientific10_6-0\">6.0<\/a><\/sup> <sup><a href=\"#cite_ref-MaheshwariScientific10_6-1\">6.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Maheshwari, K.; Montagnat, J. (2010). \"Scientific Workflow Development Using Both Visual and Script-Based Representation\". <i>6th World Congress on Services<\/i>: 328\u201335. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FSERVICES.2010.14\" data-key=\"8ae3c6f434af2ab1763860a6c12b23af\">10.1109\/SERVICES.2010.14<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Scientific+Workflow+Development+Using+Both+Visual+and+Script-Based+Representation&rft.jtitle=6th+World+Congress+on+Services&rft.aulast=Maheshwari%2C+K.%3B+Montagnat%2C+J.&rft.au=Maheshwari%2C+K.%3B+Montagnat%2C+J.&rft.date=2010&rft.pages=328%E2%80%9335&rft_id=info:doi\/10.1109%2FSERVICES.2010.14&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ReedExa15-7\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-ReedExa15_7-0\">7.0<\/a><\/sup> <sup><a href=\"#cite_ref-ReedExa15_7-1\">7.1<\/a><\/sup> <sup><a href=\"#cite_ref-ReedExa15_7-2\">7.2<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Reed, D.A.; Dongarra, J. (2015). \"Exascale computing and big data\". <i>Communications of the ACM<\/i> <b>58<\/b> (7): 56\u201368. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1145%2F2699414\" data-key=\"3925fd371d421d862a45bb95a6482452\">10.1145\/2699414<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Exascale+computing+and+big+data&rft.jtitle=Communications+of+the+ACM&rft.aulast=Reed%2C+D.A.%3B+Dongarra%2C+J.&rft.au=Reed%2C+D.A.%3B+Dongarra%2C+J.&rft.date=2015&rft.volume=58&rft.issue=7&rft.pages=56%E2%80%9368&rft_id=info:doi\/10.1145%2F2699414&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ArmbrustAView10-8\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ArmbrustAView10_8-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Armbrust, M.; Fox, A.; Griffith, R. et al. (2010). \"A view of cloud computing\". <i>Communications of the ACM<\/i> <b>53<\/b> (4): 50\u201358. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1145%2F1721654.1721672\" data-key=\"1fd5b22236b4a75e8a63c0390b08bcba\">10.1145\/1721654.1721672<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+view+of+cloud+computing&rft.jtitle=Communications+of+the+ACM&rft.aulast=Armbrust%2C+M.%3B+Fox%2C+A.%3B+Griffith%2C+R.+et+al.&rft.au=Armbrust%2C+M.%3B+Fox%2C+A.%3B+Griffith%2C+R.+et+al.&rft.date=2010&rft.volume=53&rft.issue=4&rft.pages=50%E2%80%9358&rft_id=info:doi\/10.1145%2F1721654.1721672&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-GuSector09-9\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-GuSector09_9-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Gu, Y.; Grossman, R.L. (2009). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3391065\" data-key=\"7726005e92c728797a9871c2bf21653c\">\"Sector and Sphere: The design and implementation of a high-performance data cloud\"<\/a>. <i>Philosophical Transactions, Series A: Mathematical, Physical, and Engineering Sciences<\/i> <b>367<\/b> (1897): 2429\u201345. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1098%2Frsta.2009.0053\" data-key=\"414d192726fc6ce534c75673090f4931\">10.1098\/rsta.2009.0053<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3391065\/\" data-key=\"8d7bbdc76619d44096aaabaa278cd121\">PMC3391065<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/19451100\" data-key=\"ab149cf3bf0593265007414593e2ce46\">19451100<\/a><span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3391065\" data-key=\"7726005e92c728797a9871c2bf21653c\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3391065<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Sector+and+Sphere%3A+The+design+and+implementation+of+a+high-performance+data+cloud&rft.jtitle=Philosophical+Transactions%2C+Series+A%3A+Mathematical%2C+Physical%2C+and+Engineering+Sciences&rft.aulast=Gu%2C+Y.%3B+Grossman%2C+R.L.&rft.au=Gu%2C+Y.%3B+Grossman%2C+R.L.&rft.date=2009&rft.volume=367&rft.issue=1897&rft.pages=2429%E2%80%9345&rft_id=info:doi\/10.1098%2Frsta.2009.0053&rft_id=info:pmc\/PMC3391065&rft_id=info:pmid\/19451100&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3391065&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-TaliaData15-10\"><span class=\"mw-cite-backlink\"><a 
href=\"#cite_ref-TaliaData15_10-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Talia, D.; Trunfio, P.; Marozzo, F. (2015). <i>Data Analysis in the Cloud<\/i>. Elsevier. pp. 150. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" data-key=\"f64947ba21e884434bd70e8d9e60bae6\">ISBN<\/a> 9780128029145.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Data+Analysis+in+the+Cloud&rft.aulast=Talia%2C+D.%3B+Trunfio%2C+P.%3B+Marozzo%2C+F.&rft.au=Talia%2C+D.%3B+Trunfio%2C+P.%3B+Marozzo%2C+F.&rft.date=2015&rft.pages=pp.%26nbsp%3B150&rft.pub=Elsevier&rft.isbn=9780128029145&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HwangCloud17-11\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-HwangCloud17_11-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Hwang, K. (2017). <i>Cloud Computing for Machine Learning and Cognitive Applications<\/i>. MIT Press. pp. 624. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" data-key=\"f64947ba21e884434bd70e8d9e60bae6\">ISBN<\/a> 9780262036412.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Cloud+Computing+for+Machine+Learning+and+Cognitive+Applications&rft.aulast=Hwang%2C+K.&rft.au=Hwang%2C+K.&rft.date=2017&rft.pages=pp.%26nbsp%3B624&rft.pub=MIT+Press&rft.isbn=9780262036412&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MarozzoACloud13-12\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-MarozzoACloud13_12-0\">12.0<\/a><\/sup> <sup><a href=\"#cite_ref-MarozzoACloud13_12-1\">12.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation book\">Marozzo, F.; Talia, D.; Trunfio, P. (2013). \"A Cloud Framework for Big Data Analytics Workflows on Azure\". In Catlett, C., Gentzsch, W., Grandinetti, L. et al.. <i>Cloud Computing and Big Data<\/i>. Advances in Parallel Computing. <b>23<\/b>. pp. 182\u201391. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.3233%2F978-1-61499-322-3-182\" data-key=\"528b8e7e68a3b477ba8ad0c813fd923f\">10.3233\/978-1-61499-322-3-182<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" data-key=\"f64947ba21e884434bd70e8d9e60bae6\">ISBN<\/a> 9781614993223.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=A+Cloud+Framework+for+Big+Data+Analytics+Workflows+on+Azure&rft.atitle=Cloud+Computing+and+Big+Data&rft.aulast=Marozzo%2C+F.%3B+Talia%2C+D.%3B+Trunfio%2C+P.&rft.au=Marozzo%2C+F.%3B+Talia%2C+D.%3B+Trunfio%2C+P.&rft.date=2013&rft.series=Advances+in+Parallel+Computing&rft.volume=23&rft.pages=pp.%26nbsp%3B182%E2%80%9391&rft_id=info:doi\/10.3233%2F978-1-61499-322-3-182&rft.isbn=9781614993223&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MarozzoJS4_15-13\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-MarozzoJS4_15_13-0\">13.0<\/a><\/sup> <sup><a href=\"#cite_ref-MarozzoJS4_15_13-1\">13.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Marozzo, F.; Talia, D.; Trunfio, P. (2015). \"JS4Cloud: script\u2010based workflow programming for scalable data analysis on cloud platforms\". <i>Concurrency and Computation: Practice and Experience<\/i> <b>27<\/b> (17): 5214\u201337. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1002%2Fcpe.3563\" data-key=\"ed68c2ca05b84422a588b045c5d959bd\">10.1002\/cpe.3563<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=JS4Cloud%3A+script%E2%80%90based+workflow+programming+for+scalable+data+analysis+on+cloud+platforms&rft.jtitle=Concurrency+and+Computation%3A+Practice+and+Experience&rft.aulast=Marozzo%2C+F.%3B+Talia%2C+D.%3B+Trunfio%2C+P.&rft.au=Marozzo%2C+F.%3B+Talia%2C+D.%3B+Trunfio%2C+P.&rft.date=2015&rft.volume=27&rft.issue=17&rft.pages=5214%E2%80%9337&rft_id=info:doi\/10.1002%2Fcpe.3563&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-TaliaClouds13-14\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-TaliaClouds13_14-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Talia, D. (2013). \"Clouds for Scalable Big Data Analytics\". <i>Computer<\/i> <b>46<\/b> (5): 98\u2013101. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FMC.2013.162\" data-key=\"280493acc9ce7f8ce39dadeff325b70c\">10.1109\/MC.2013.162<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Clouds+for+Scalable+Big+Data+Analytics&rft.jtitle=Computer&rft.aulast=Talia%2C+D.&rft.au=Talia%2C+D.&rft.date=2013&rft.volume=46&rft.issue=5&rft.pages=98%E2%80%93101&rft_id=info:doi\/10.1109%2FMC.2013.162&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-WozniakLang14-15\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-WozniakLang14_15-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Wozniak, J.M.; Wilde, M.; Foster, I.T. (2014). \"Language Features for Scalable Distributed-Memory Dataflow Computing\". <i>Fourth Workshop on Data-Flow Execution Models for Extreme Scale Computing<\/i>: 50\u201353. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FDFM.2014.17\" data-key=\"74c8f31bdf369402f827a0e4e8ee5dc7\">10.1109\/DFM.2014.17<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Language+Features+for+Scalable+Distributed-Memory+Dataflow+Computing&rft.jtitle=Fourth+Workshop+on+Data-Flow+Execution+Models+for+Extreme+Scale+Computing&rft.aulast=Wozniak%2C+J.M.%3B+Wilde%2C+M.%3B+Foster%2C+I.T.&rft.au=Wozniak%2C+J.M.%3B+Wilde%2C+M.%3B+Foster%2C+I.T.&rft.date=2014&rft.pages=50%E2%80%9353&rft_id=info:doi\/10.1109%2FDFM.2014.17&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-LucasTopTen14-16\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-LucasTopTen14_16-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Lucas, R.; Ang, J.; Bergman, K. et al. (10 February 2014). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/science.energy.gov\/~\/media\/ascr\/ascac\/pdf\/meetings\/20140210\/Top10reportFEB14.pdf\" data-key=\"bca80125254aafabea3ac21845e85093\">\"Top Ten Exascale Research Challenges\"<\/a> (PDF). U.S. Department of Energy. pp. 80<span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/science.energy.gov\/~\/media\/ascr\/ascac\/pdf\/meetings\/20140210\/Top10reportFEB14.pdf\" data-key=\"bca80125254aafabea3ac21845e85093\">https:\/\/science.energy.gov\/~\/media\/ascr\/ascac\/pdf\/meetings\/20140210\/Top10reportFEB14.pdf<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Top+Ten+Exascale+Research+Challenges&rft.atitle=&rft.aulast=Lucas%2C+R.%3B+Ang%2C+J.%3B+Bergman%2C+K.+et+al.&rft.au=Lucas%2C+R.%3B+Ang%2C+J.%3B+Bergman%2C+K.+et+al.&rft.date=10+February+2014&rft.pages=pp.+80&rft.pub=U.S.+Department+of+Energy&rft_id=https%3A%2F%2Fscience.energy.gov%2F%7E%2Fmedia%2Fascr%2Fascac%2Fpdf%2Fmeetings%2F20140210%2FTop10reportFEB14.pdf&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-FeketeTheImp93-17\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-FeketeTheImp93_17-0\">17.0<\/a><\/sup> <sup><a href=\"#cite_ref-FeketeTheImp93_17-1\">17.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Fekete, A.; Lynch, N.; Mansour, Y.; Spinelli, J. (1993). \"The impossibility of implementing reliable communication in the face of crashes\". <i>Journal of the ACM<\/i> <b>40<\/b> (5): 1087\u20131107. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1145%2F174147.169676\" data-key=\"7aca4e22b4851941a81cac72139cba22\">10.1145\/174147.169676<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=The+impossibility+of+implementing+reliable+communication+in+the+face+of+crashes&rft.jtitle=Journal+of+the+ACM&rft.aulast=Fekete%2C+A.%3B+Lynch%2C+N.%3B+Mansour%2C+Y.%3B+Spinelli%2C+J.&rft.au=Fekete%2C+A.%3B+Lynch%2C+N.%3B+Mansour%2C+Y.%3B+Spinelli%2C+J.&rft.date=1993&rft.volume=40&rft.issue=5&rft.pages=1087%E2%80%931107&rft_id=info:doi\/10.1145%2F174147.169676&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-IDCTheDig14-18\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-IDCTheDig14_18-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">IDC (April 2014). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.emc.com\/leadership\/digital-universe\/2014iview\/executive-summary.htm\" data-key=\"0045d1ff58c97f86698c54e5f9029fac\">\"The Digital Universe of Opportunities: Rich Data and the Increasing Value of the Internet of Things\"<\/a>. Dell EMC<span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.emc.com\/leadership\/digital-universe\/2014iview\/executive-summary.htm\" data-key=\"0045d1ff58c97f86698c54e5f9029fac\">https:\/\/www.emc.com\/leadership\/digital-universe\/2014iview\/executive-summary.htm<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=The+Digital+Universe+of+Opportunities%3A+Rich+Data+and+the+Increasing+Value+of+the+Internet+of+Things&rft.atitle=&rft.aulast=IDC&rft.au=IDC&rft.date=April+2014&rft.pub=Dell+EMC&rft_id=https%3A%2F%2Fwww.emc.com%2Fleadership%2Fdigital-universe%2F2014iview%2Fexecutive-summary.htm&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ChenSyn13-19\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-ChenSyn13_19-0\">19.0<\/a><\/sup> <sup><a href=\"#cite_ref-ChenSyn13_19-1\">19.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation web\">Chen, J.; Choudhary, A.; Feldman, S. et al. (March 2013). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.scholars.northwestern.edu\/en\/publications\/synergistic-challenges-in-data-intensive-science-and-exascale-com\" data-key=\"bd7a7da744f43b3b0cb7f64535f3cafe\">\"Synergistic Challenges in Data-Intensive Science and Exascale Computing: DOE ASCAC Data Subcommittee Report\"<\/a>. Department of Energy, Office of Science<span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.scholars.northwestern.edu\/en\/publications\/synergistic-challenges-in-data-intensive-science-and-exascale-com\" data-key=\"bd7a7da744f43b3b0cb7f64535f3cafe\">https:\/\/www.scholars.northwestern.edu\/en\/publications\/synergistic-challenges-in-data-intensive-science-and-exascale-com<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Synergistic+Challenges+in+Data-Intensive+Science+and+Exascale+Computing%3A+DOE+ASCAC+Data+Subcommittee+Report&rft.atitle=&rft.aulast=Chen%2C+J.%3B+Choudhary%2C+A.%3B+Feldman%2C+S.+et+al.&rft.au=Chen%2C+J.%3B+Choudhary%2C+A.%3B+Feldman%2C+S.+et+al.&rft.date=March+2013&rft.pub=Department+of+Energy%2C+Office+of+Science&rft_id=https%3A%2F%2Fwww.scholars.northwestern.edu%2Fen%2Fpublications%2Fsynergistic-challenges-in-data-intensive-science-and-exascale-com&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-IBMWhat13-20\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-IBMWhat13_20-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.ibm.com\/annualreport\/2013\/bin\/assets\/2013_ibm_annual.pdf\" data-key=\"0fefe4f6068c34c3f39acfb707bf2dc9\">\"What will we make of this moment?\"<\/a> (PDF). IBM. 2013. pp. 151<span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.ibm.com\/annualreport\/2013\/bin\/assets\/2013_ibm_annual.pdf\" data-key=\"0fefe4f6068c34c3f39acfb707bf2dc9\">https:\/\/www.ibm.com\/annualreport\/2013\/bin\/assets\/2013_ibm_annual.pdf<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=What+will+we+make+of+this+moment%3F&rft.atitle=&rft.date=2013&rft.pages=pp.+151&rft.pub=IBM&rft_id=https%3A%2F%2Fwww.ibm.com%2Fannualreport%2F2013%2Fbin%2Fassets%2F2013_ibm_annual.pdf&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DiazASurv12-21\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-DiazASurv12_21-0\">21.0<\/a><\/sup> <sup><a href=\"#cite_ref-DiazASurv12_21-1\">21.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Diaz, J.; Mu\u00f1oz-Caro, C.; Ni\u00f1o, A. (2012). \"A Survey of Parallel Programming Models and Tools in the Multi and Many-Core Era\". <i>IEEE Transactions on Parallel and Distributed Systems<\/i> <b>23<\/b> (8): 1369\u201386. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FTPDS.2011.308\" data-key=\"5ff6a76fe160b7783bd17b4d12a9935a\">10.1109\/TPDS.2011.308<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+Survey+of+Parallel+Programming+Models+and+Tools+in+the+Multi+and+Many-Core+Era&rft.jtitle=IEEE+Transactions+on+Parallel+and+Distributed+Systems&rft.aulast=Diaz%2C+J.%3B+Mu%C3%B1oz-Caro%2C+C.%3B+Ni%C3%B1o%2C+A.&rft.au=Diaz%2C+J.%3B+Mu%C3%B1oz-Caro%2C+C.%3B+Ni%C3%B1o%2C+A.&rft.date=2012&rft.volume=23&rft.issue=8&rft.pages=1369%E2%80%9386&rft_id=info:doi\/10.1109%2FTPDS.2011.308&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-GortonData08-22\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-GortonData08_22-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Gorton, I.; Greenfield, P.; Szalay, A.; Willimas, R. (2008). \"Data-Intensive Computing in the 21st Century\". <i>Computer<\/i> <b>41<\/b> (4): 30\u201332. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FMC.2008.122\" data-key=\"3f51ae8b7a4ab2d4fb57b05b35368607\">10.1109\/MC.2008.122<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Data-Intensive+Computing+in+the+21st+Century&rft.jtitle=Computer&rft.aulast=Gorton%2C+I.%3B+Greenfield%2C+P.%3B+Szalay%2C+A.%3B+Willimas%2C+R.&rft.au=Gorton%2C+I.%3B+Greenfield%2C+P.%3B+Szalay%2C+A.%3B+Willimas%2C+R.&rft.date=2008&rft.volume=41&rft.issue=4&rft.pages=30%E2%80%9332&rft_id=info:doi\/10.1109%2FMC.2008.122&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MarkidisTheEPi16-23\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-MarkidisTheEPi16_23-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Markidis, S.; Peng, I.B.; Larsson, J. et al. (2016). \"The EPiGRAM Project: Preparing Parallel Programming Models for Exascale\". <i>High Performance Computing - ISC High Performance 2016<\/i>: 56\u201368. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1007%2F978-3-319-46079-6_5\" data-key=\"e015023dea4d2cf374a4c360ca93d6d9\">10.1007\/978-3-319-46079-6_5<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=The+EPiGRAM+Project%3A+Preparing+Parallel+Programming+Models+for+Exascale&rft.jtitle=High+Performance+Computing+-+ISC+High+Performance+2016&rft.aulast=Markidis%2C+S.%3B+Peng%2C+I.B.%3B+Larsson%2C+J.+et+al.&rft.au=Markidis%2C+S.%3B+Peng%2C+I.B.%3B+Larsson%2C+J.+et+al.&rft.date=2016&rft.pages=56%E2%80%9368&rft_id=info:doi\/10.1007%2F978-3-319-46079-6_5&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-Fern.C3.A1ndezTask14-24\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-Fern.C3.A1ndezTask14_24-0\">24.0<\/a><\/sup> <sup><a href=\"#cite_ref-Fern.C3.A1ndezTask14_24-1\">24.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Fern\u00e1ndez, A.; Beltran, V.; Martorell, X. et al. (2014). \"Task-Based Programming with OmpSs and Its Application\". <i>Euro-Par 2014: Parallel Processing Workshops<\/i>: 601\u201312. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1007%2F978-3-319-14313-2_51\" data-key=\"5dda3a1863368c980d4a1102f254eafd\">10.1007\/978-3-319-14313-2_51<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Task-Based+Programming+with+OmpSs+and+Its+Application&rft.jtitle=Euro-Par+2014%3A+Parallel+Processing+Workshops&rft.aulast=Fern%C3%A1ndez%2C+A.%3B+Beltran%2C+V.%3B+Martorell%2C+X.+et+al.&rft.au=Fern%C3%A1ndez%2C+A.%3B+Beltran%2C+V.%3B+Martorell%2C+X.+et+al.&rft.date=2014&rft.pages=601%E2%80%9312&rft_id=info:doi\/10.1007%2F978-3-319-14313-2_51&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-OlstonPig08-25\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-OlstonPig08_25-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Olston, C.; Reed, B.; Srivastava, U. et al. (2008). \"Pig Latin: A not-so-foreign language for data processing\". <i>Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data<\/i>: 1099\u20131110. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1145%2F1376616.1376726\" data-key=\"a8d5a406f0033f19dbbf6a31cbb6560f\">10.1145\/1376616.1376726<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Pig+Latin%3A+A+not-so-foreign+language+for+data+processing&rft.jtitle=Proceedings+of+the+2008+ACM+SIGMOD+International+Conference+on+Management+of+Data&rft.aulast=Olston%2C+C.%3B+Reed%2C+B.%3B+Srivastava%2C+U.+et+al.&rft.au=Olston%2C+C.%3B+Reed%2C+B.%3B+Srivastava%2C+U.+et+al.&rft.date=2008&rft.pages=1099%E2%80%931110&rft_id=info:doi\/10.1145%2F1376616.1376726&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-GroppProg13-26\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-GroppProg13_26-0\">26.0<\/a><\/sup> <sup><a href=\"#cite_ref-GroppProg13_26-1\">26.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Gropp, W.; Snir, M. (2013). \"Programming for Exascale Computers\". <i>Computing in Science & Engineering<\/i> <b>15<\/b> (6): 27\u201335. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FMCSE.2013.96\" data-key=\"79d20bdd3ccadce8a46a3474e1aed219\">10.1109\/MCSE.2013.96<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Programming+for+Exascale+Computers&rft.jtitle=Computing+in+Science+%26+Engineering&rft.aulast=Gropp%2C+W.%3B+Snir%2C+M.&rft.au=Gropp%2C+W.%3B+Snir%2C+M.&rft.date=2013&rft.volume=15&rft.issue=6&rft.pages=27%E2%80%9335&rft_id=info:doi\/10.1109%2FMCSE.2013.96&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MarozzoP2P12-27\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-MarozzoP2P12_27-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Marozzo, F.; Talia, D.; Trunfio, P. (2012). \"P2P-MapReduce: Parallel data processing in dynamic cloud environments\". <i>Journal of Computer and System Sciences<\/i> <b>78<\/b> (5): 1382\u20131402. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1016%2Fj.jcss.2011.12.021\" data-key=\"46525d8150311c4f7226a9f6fb75a3a5\">10.1016\/j.jcss.2011.12.021<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=P2P-MapReduce%3A+Parallel+data+processing+in+dynamic+cloud+environments&rft.jtitle=Journal+of+Computer+and+System+Sciences&rft.aulast=Marozzo%2C+F.%3B+Talia%2C+D.%3B+Trunfio%2C+P.&rft.au=Marozzo%2C+F.%3B+Talia%2C+D.%3B+Trunfio%2C+P.&rft.date=2012&rft.volume=78&rft.issue=5&rft.pages=1382%E2%80%931402&rft_id=info:doi\/10.1016%2Fj.jcss.2011.12.021&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-TardieuX10_14-28\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-TardieuX10_14_28-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Tardieu, O.; Herta, B.; Cunningham, D. et al. (2014). \"X10 and APGAS at Petascale\". <i>Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming<\/i>: 53\u201366. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1145%2F2555243.2555245\" data-key=\"d7aee3a646864bba7998c7c196a26759\">10.1145\/2555243.2555245<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=X10+and+APGAS+at+Petascale&rft.jtitle=Proceedings+of+the+19th+ACM+SIGPLAN+Symposium+on+Principles+and+Practice+of+Parallel+Programming&rft.aulast=Tardieu%2C+O.%3B+Herta%2C+B.%3B+Cunningham%2C+D.+et+al.&rft.au=Tardieu%2C+O.%3B+Herta%2C+B.%3B+Cunningham%2C+D.+et+al.&rft.date=2014&rft.pages=53%E2%80%9366&rft_id=info:doi\/10.1145%2F2555243.2555245&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-YooEval09-29\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-YooEval09_29-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Yoo, A.; Kaplan, I. (2009). \"Evaluating use of data flow systems for large graph analysis\". <i>Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers<\/i>: 5. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1145%2F1646468.1646473\" data-key=\"b9ece0bfdeb1d0f215512f3d4173d2e4\">10.1145\/1646468.1646473<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Evaluating+use+of+data+flow+systems+for+large+graph+analysis&rft.jtitle=Proceedings+of+the+2nd+Workshop+on+Many-Task+Computing+on+Grids+and+Supercomputers&rft.aulast=Yoo%2C+A.%3B+Kaplan%2C+I.&rft.au=Yoo%2C+A.%3B+Kaplan%2C+I.&rft.date=2009&rft.pages=5&rft_id=info:doi\/10.1145%2F1646468.1646473&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-NishtalaTuning11-30\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-NishtalaTuning11_30-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Nishtala, R.; Zheng, Y.; Hargrove, P.H. et al. (2011). \"Tuning collective communication for Partitioned Global Address Space programming models\". <i>Parallel Computing<\/i> <b>37<\/b> (9): 576\u201391. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1016%2Fj.parco.2011.05.006\" data-key=\"cb5bb65bc725a66f80c16f282891d9f8\">10.1016\/j.parco.2011.05.006<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Tuning+collective+communication+for+Partitioned+Global+Address+Space+programming+models&rft.jtitle=Parallel+Computing&rft.aulast=Nishtala%2C+R.%3B+Zheng%2C+Y.%3B+Hargrove%2C+P.H.+et+al.&rft.au=Nishtala%2C+R.%3B+Zheng%2C+Y.%3B+Hargrove%2C+P.H.+et+al.&rft.date=2011&rft.volume=37&rft.issue=9&rft.pages=576%E2%80%9391&rft_id=info:doi\/10.1016%2Fj.parco.2011.05.006&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BauerLegion12-31\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-BauerLegion12_31-0\">31.0<\/a><\/sup> <sup><a href=\"#cite_ref-BauerLegion12_31-1\">31.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Bauer, M.; Treichler, S.; Slaughter, E.; Aiken, A. (2012). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/dl.acm.org\/citation.cfm?id=2389086\" data-key=\"f15d90bc7d517ae60f34563f9c16d9a7\">\"Legion: Expressing locality and independence with logical regions\"<\/a>. <i>Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis<\/i>: 66<span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/dl.acm.org\/citation.cfm?id=2389086\" data-key=\"f15d90bc7d517ae60f34563f9c16d9a7\">https:\/\/dl.acm.org\/citation.cfm?id=2389086<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Legion%3A+Expressing+locality+and+independence+with+logical+regions&rft.jtitle=Proceedings+of+the+International+Conference+on+High+Performance+Computing%2C+Networking%2C+Storage+and+Analysis&rft.aulast=Bauer%2C+M.%3B+Treichler%2C+S.%3B+Slaughter%2C+E.%3B+Aiken%2C+A.&rft.au=Bauer%2C+M.%3B+Treichler%2C+S.%3B+Slaughter%2C+E.%3B+Aiken%2C+A.&rft.date=2012&rft.pages=66&rft_id=https%3A%2F%2Fdl.acm.org%2Fcitation.cfm%3Fid%3D2389086&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ChamberlainParallel07-32\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-ChamberlainParallel07_32-0\">32.0<\/a><\/sup> <sup><a href=\"#cite_ref-ChamberlainParallel07_32-1\">32.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Chamberlain, B.L.; Callahan, D.; Zima, H.P. (2007). \"Parallel Programmability and the Chapel Language\". <i>The International Journal of High Performance Computing Applications<\/i> <b>21<\/b> (3): 291\u2013312. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1177%2F1094342007078442\" data-key=\"c3fe9c3ce5bc162451a0d2455be4abfc\">10.1177\/1094342007078442<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Parallel+Programmability+and+the+Chapel+Language&rft.jtitle=The+International+Journal+of+High+Performance+Computing+Applications&rft.aulast=Chamberlain%2C+B.L.%3B+Callahan%2C+D.%3B+Zima%2C+H.P.&rft.au=Chamberlain%2C+B.L.%3B+Callahan%2C+D.%3B+Zima%2C+H.P.&rft.date=2007&rft.volume=21&rft.issue=3&rft.pages=291%E2%80%93312&rft_id=info:doi\/10.1177%2F1094342007078442&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-NieplochaAdvances06-33\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-NieplochaAdvances06_33-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Nieplocha, J.; Palmer, B.; Tipparaju, V. et al. (2006). \"Advances, Applications and Performance of the Global Arrays Shared Memory Programming Toolkit\". <i>The International Journal of High Performance Computing Applications<\/i> <b>20<\/b> (2): 203\u201331. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1177%2F1094342006064503\" data-key=\"096bda1090311f199f72a789fb62b631\">10.1177\/1094342006064503<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Advances%2C+Applications+and+Performance+of+the+Global+Arrays+Shared+Memory+Programming+Toolkit&rft.jtitle=The+International+Journal+of+High+Performance+Computing+Applications&rft.aulast=Nieplocha%2C+J.%3B+Palmer%2C+B.%3B+Tipparaju%2C+V.+et+al.&rft.au=Nieplocha%2C+J.%3B+Palmer%2C+B.%3B+Tipparaju%2C+V.+et+al.&rft.date=2006&rft.volume=20&rft.issue=2&rft.pages=203%E2%80%9331&rft_id=info:doi\/10.1177%2F1094342006064503&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MeswaniTools12-34\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-MeswaniTools12_34-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Meswani, M.R.; Carrington, L.; Snavely, A.; Poole, S. (2012). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/cug.org\/proceedings\/attendee_program_cug2012\/by_auth.html\" data-key=\"11756f9b1fcac9219d1527746fc20f2c\">\"Tools for Benchmarking, Tracing, and Simulating SHMEM Applications\"<\/a>. <i>CUG2012 Final Proceedings<\/i>: 1\u20136<span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/cug.org\/proceedings\/attendee_program_cug2012\/by_auth.html\" data-key=\"11756f9b1fcac9219d1527746fc20f2c\">https:\/\/cug.org\/proceedings\/attendee_program_cug2012\/by_auth.html<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Tools+for+Benchmarking%2C+Tracing%2C+and+Simulating+SHMEM+Applications&rft.jtitle=CUG2012+Final+Proceedings&rft.aulast=Meswani%2C+M.R.%3B+Carrington%2C+L.%3B+Snavely%2C+A.%3B+Poole%2C+S.&rft.au=Meswani%2C+M.R.%3B+Carrington%2C+L.%3B+Snavely%2C+A.%3B+Poole%2C+S.&rft.date=2012&rft.pages=1%E2%80%936&rft_id=https%3A%2F%2Fcug.org%2Fproceedings%2Fattendee_program_cug2012%2Fby_auth.html&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-GrassoLibWater13-35\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-GrassoLibWater13_35-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Grasso, I.; Pellegrini, S.; Cosenza, B.; Fahringer, T. (2013). \"LibWater: Heterogeneous distributed computing made easy\". <i>Proceedings of the 27th International ACM Conference on Supercomputing<\/i>: 161\u201372. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1145%2F2464996.2465008\" data-key=\"4f7b57f60352ae1b1facb9324e5044ce\">10.1145\/2464996.2465008<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=LibWater%3A+Heterogeneous+distributed+computing+made+easy&rft.jtitle=Proceedings+of+the+27th+International+ACM+conference+on+Supercomputing&rft.aulast=Crasso%2C+I.%3B+Pellagrini%2C+S.%3B+Cosenza%2C+B.%3B+Fahringer%2C+T.&rft.au=Crasso%2C+I.%3B+Pellagrini%2C+S.%3B+Cosenza%2C+B.%3B+Fahringer%2C+T.&rft.date=2013&rft.pages=161%E2%80%9372&rft_id=info:doi\/10.1145%2F2464996.2465008&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-Sarkar2014_16-36\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-Sarkar2014_16_36-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Sarkar, V.; Budimlic, Z.; Kulkani, M. (19 September 2016). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.osti.gov\/biblio\/1341724-runtime-systems-summit-runtime-systems-report\" data-key=\"c579402d3fb3f59f94015f38a1fc4f29\">\"2014 Runtime Systems Summit. Runtime Systems Report\"<\/a>. U.S. Department of Energy. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.2172%2F1341724\" data-key=\"6d38482bb1c202acf60e1136e760351a\">10.2172\/1341724<\/a><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.osti.gov\/biblio\/1341724-runtime-systems-summit-runtime-systems-report\" data-key=\"c579402d3fb3f59f94015f38a1fc4f29\">https:\/\/www.osti.gov\/biblio\/1341724-runtime-systems-summit-runtime-systems-report<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=2014+Runtime+Systems+Summit.+Runtime+Systems+Report&rft.atitle=&rft.aulast=Sarkar%2C+V.%3B+Budimlic%2C+Z.%3B+Kulkani%2C+M.&rft.au=Sarkar%2C+V.%3B+Budimlic%2C+Z.%3B+Kulkani%2C+M.&rft.date=19+September+2016&rft.pub=U.S.+Department+of+Energy&rft_id=info:doi\/10.2172%2F1341724&rft_id=https%3A%2F%2Fwww.osti.gov%2Fbiblio%2F1341724-runtime-systems-summit-runtime-systems-report&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-AmdahlValid67-37\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-AmdahlValid67_37-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Amdahl, G.M. (1967). \"Validity of single-processor approach to achieving large-scale computing capability\". 
<i>Proceedings of AFIPS Conference<\/i>: 483\u201385.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Validity+of+single-processor+approach+to+achieving+large-scale+computing+capability&rft.jtitle=Proceedings+of+AFIPS+Conference&rft.aulast=Amdahl%2C+G.M.&rft.au=Amdahl%2C+G.M.&rft.date=1967&rft.pages=483%E2%80%9385&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BaileyTwelve91-38\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-BaileyTwelve91_38-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Bailey, D.H. (1991). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/crd-legacy.lbl.gov\/~dhbailey\/dhbpapers\/twelve-ways.pdf\" data-key=\"aeb389b76059c213d24ee3b7167dde17\">\"Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Computers\"<\/a> (PDF). <i>Supercomputing Review<\/i>: 54\u201355<span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/crd-legacy.lbl.gov\/~dhbailey\/dhbpapers\/twelve-ways.pdf\" data-key=\"aeb389b76059c213d24ee3b7167dde17\">https:\/\/crd-legacy.lbl.gov\/~dhbailey\/dhbpapers\/twelve-ways.pdf<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Twelve+Ways+to+Fool+the+Masses+When+Giving+Performance+Results+on+Parallel+Computers&rft.jtitle=Supercomputing+Review&rft.aulast=Bailey%2C+D.H.&rft.au=Bailey%2C+D.H.&rft.date=1991&rft.pages=54%E2%80%9355&rft_id=https%3A%2F%2Fcrd-legacy.lbl.gov%2F%7Edhbailey%2Fdhbpapers%2Ftwelve-ways.pdf&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-GramaIntro03-39\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-GramaIntro03_39-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Grama, A.; Karypis, G.; Kumar, V.; Gupta, A. (2003). <i>Introduction to Parallel Computing<\/i> (2nd ed.). Pearson. pp. 656. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" data-key=\"f64947ba21e884434bd70e8d9e60bae6\">ISBN<\/a> 9780201648652.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Introduction+to+Parallel+Computing&rft.aulast=Grama%2C+A.%3B+Karypis%2C+G.%3B+Kumar%2C+V.%3B+Gupta%2C+A.&rft.au=Grama%2C+A.%3B+Karypis%2C+G.%3B+Kumar%2C+V.%3B+Gupta%2C+A.&rft.date=2003&rft.pages=pp.%26nbsp%3B656&rft.edition=2nd&rft.pub=Pearson&rft.isbn=9780201648652&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-GustafsonReeval88-40\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-GustafsonReeval88_40-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Gustafson, J.L. (1988). \"Reevaluating Amdahl's law\". <i>Communications of the ACM<\/i> <b>31<\/b> (5): 532\u201333. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1145%2F42411.42415\" data-key=\"283f40c824f1f69cb3b3fd2420bbef32\">10.1145\/42411.42415<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Reevaluating+Amdahl%27s+law&rft.jtitle=Communications+of+the+ACM&rft.aulast=Gustafson%2C+J.L.&rft.au=Gustafson%2C+J.L.&rft.date=1988&rft.volume=31&rft.issue=5&rft.pages=532%E2%80%9333&rft_id=info:doi\/10.1145%2F42411.42415&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ShiProg12-41\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ShiProg12_41-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Shi, J.Y.; Taifi, M.; Pradeep, A. et al. (2012). \"Program Scalability Analysis for HPC Cloud: Applying Amdahl's Law to NAS Benchmarks\". <i>2012 SC Companion: High Performance Computing, Networking Storage and Analysis<\/i>: 1215\u20131225. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FSC.Companion.2012.147\" data-key=\"ffaa3833e7138df044fed65d8ef194ce\">10.1109\/SC.Companion.2012.147<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Program+Scalability+Analysis+for+HPC+Cloud%3A+Applying+Amdahl%27s+Law+to+NAS+Benchmarks&rft.jtitle=2012+SC+Companion%3A+High+Performance+Computing%2C+Networking+Storage+and+Analysis&rft.aulast=Shi%2C+J.Y.%3B+Taifi%2C+M.%3B+Pradeep%2C+A.+et+al.&rft.au=Shi%2C+J.Y.%3B+Taifi%2C+M.%3B+Pradeep%2C+A.+et+al.&rft.date=2012&rft.pages=1215%E2%80%931225&rft_id=info:doi\/10.1109%2FSC.Companion.2012.147&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SchroederALarge10-42\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-SchroederALarge10_42-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Schroeder, B.; Gibson, G. (2010). \"A Large-Scale Study of Failures in High-Performance Computing Systems\". <i>IEEE Transactions on Dependable and Secure Computing<\/i> <b>7<\/b> (4): 337\u201350. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FTDSC.2009.4\" data-key=\"5c0902fda68c87aa0c8aabe2ee7fd5c2\">10.1109\/TDSC.2009.4<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+Large-Scale+Study+of+Failures+in+High-Performance+Computing+Systems&rft.jtitle=IEEE+Transactions+on+Dependable+and+Secure+Computing&rft.aulast=Schroeder%2C+B.%3B+Gibson%2C+G.&rft.au=Schroeder%2C+B.%3B+Gibson%2C+G.&rft.date=2010&rft.volume=7&rft.issue=4&rft.pages=337%E2%80%9350&rft_id=info:doi\/10.1109%2FTDSC.2009.4&rfr_id=info:sid\/en.wikipedia.org:Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<\/ol><\/div>\n<h2><span class=\"mw-headline\" id=\"Notes\">Notes<\/span><\/h2>\n<p>This presentation is faithful to the original, with only a few minor changes to presentation. Some grammar and punctuation was cleaned up to improve readability. In some cases important information was missing from the references, and that information was added. The original article lists references alphabetically, but this version\u2014by design\u2014lists them in order of appearance. 
The lone footnote was turned into an inline reference.\n<\/p>\n<\/div><div class=\"printfooter\">Source: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale\">https:\/\/www.limswiki.org\/index.php\/Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale<\/a><\/div>\n\t\t\t\t\t\t\t\t\t\t<!-- end content -->\n\t\t\t\t\t\t\t\t\t\t<div class=\"visualClear\"><\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t\t<!-- end of the left (by default at least) column -->\n\t\t<div 
class=\"visualClear\"><\/div>\n\t\t\t\t\t\n\t\t<\/div>\n\t\t\n\n<\/body>","804be563fdd6e10a6921069440e3e962_images":["https:\/\/www.limswiki.org\/images\/b\/b6\/Fig1_Talia_JOfCloudComp2019_8.png","https:\/\/www.limswiki.org\/images\/4\/41\/Fig2_Talia_JOfCloudComp2019_8.png","https:\/\/www.limswiki.org\/images\/9\/95\/Fig3_Talia_JOfCloudComp2019_8.png"],"804be563fdd6e10a6921069440e3e962_timestamp":1554145012,"86da8ed36fc493b6a573df8d0f7095ac_type":"article","86da8ed36fc493b6a573df8d0f7095ac_title":"What Is health information quality? Ethical dimension and perception by users (Al-Jefri et al. 2018)","86da8ed36fc493b6a573df8d0f7095ac_url":"https:\/\/www.limswiki.org\/index.php\/Journal:What_Is_health_information_quality%3F_Ethical_dimension_and_perception_by_users","86da8ed36fc493b6a573df8d0f7095ac_plaintext":"\n\n\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\n\t\t\t\tJournal:What Is health information quality? Ethical dimension and perception by users\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\tFrom LIMSWiki\n\n\t\t\t\t\t\n\t\t\t\t\tFull article title\n \nWhat Is health information quality? 
Ethical dimension and perception by usersJournal\n \nFrontiers in MedicineAuthor(s)\n \nAl-Jefri, Majed; Evans, Roger; Uchyigit, Gulden; Ghezzi, PietroAuthor affiliation(s)\n \nUniversity of Brighton, Brighton and Sussex Medical SchoolPrimary contact\n \nEmail: pietro dot ghezzi at gmail dot comEditors\n \nSampaio, CristinaYear published\n \n2018Volume and issue\n \n5Page(s)\n \n260DOI\n \n10.3389\/fmed.2018.00260ISSN\n \n2296-858XDistribution license\n \nCreative Commons Attribution 4.0 InternationalWebsite\n \nhttps:\/\/www.frontiersin.org\/articles\/10.3389\/fmed.2018.00260\/fullDownload\n \nhttps:\/\/www.frontiersin.org\/articles\/10.3389\/fmed.2018.00260\/pdf (PDF)\n\nContents\n\n1 Abstract \n2 Introduction \n3 Methods \n4 Results \n\n4.1 Sample characteristics \n4.2 Ranking of IQ criteria \n4.3 Identification of main dimensions of HIQ \n4.4 Subgroup analysis by educational subject, gender, and language \n4.5 Importance of the scientific correctness of the information provided \n\n\n5 Discussion \n6 Supplementary material \n7 Acknowledgements \n\n7.1 Author contributions \n7.2 Conflict of interest statement \n\n\n8 References \n9 Notes \n\n\n\nAbstract \nIntroduction: The popularity of seeking health information online makes information quality (IQ) a public health issue. The present study aims at building a theoretical framework of health information quality (HIQ) that can be applied to websites and defines which IQ criteria are important for a website to be trustworthy and meet users' expectations.\nMethods: We have identified a list of HIQ criteria from existing tools and assessment criteria and elaborated them into a questionnaire that was promoted via social media and, mainly, the university. 
Responses (329) were used to rank the different criteria for their importance in trusting a website and to identify patterns of criteria using hierarchical cluster analysis.\nResults: HIQ criteria were organized in five dimensions based on previous theoretical frameworks, as well as on how they cluster together in the questionnaire responses. We could identify a top-ranking dimension (scientific completeness) that describes what the user is expecting to know from the websites (in particular: description of symptoms, treatments, side effects). Cluster analysis also identified a number of criteria borrowed from existing tools for assessing HIQ that could be subsumed under a broad \u201cethical\u201d dimension (such as conflicts of interest, privacy, advertising policies) and that were, in general, assigned low importance by the participants. Subgroup analysis revealed significant differences in the importance assigned to the various criteria based on gender, language, and whether or not a biomedical educational background was evident.\nConclusions: We identified criteria of HIQ and organized them in dimensions. We observed that ethical criteria, while regarded highly in the academic and medical environment, are not considered highly by the public.\nKeywords: internet, information quality, ethics, online information, public health\n\nIntroduction \nWith the diffusion of the internet, many have been concerned that, due to its unregulated and unfiltered nature, it could misinform or disinform the public. The lack of widely used search engines (Google was founded in 1998) left it entirely up to users to decide which websites to trust among the relatively few (compared to 2018) that were available. 
These concerns led to the development, in the late 1990s, of instruments and organizations to assess the health information quality (HIQ) of websites, including the Journal of the American Medical Association (JAMA) criteria[1], DISCERN[2], and the criteria for meeting the Health On the Net (HON) code of conduct.[3] These instruments were developed for different purposes: the JAMA and DISCERN tools were aimed at providing customers with instruments to assess websites[1][2]; the HON criteria are used by the HON Foundation to certify health websites with the display of the HONcode quality seal, and this was originally aimed at organizations to help them develop websites.[3] The criteria of HIQ considered by these three approaches are listed in Table 1.\n\nTable 1. Established HIQ instruments and criteria\n\nNo data are available on how many information seekers have used these tools to make assessments. On the other hand, the high number of citations in the scientific literature for the JAMA (1100) and DISCERN (600) tools indicates that these are also widely used, particularly the JAMA criteria, in academic research analyzing HIQ. It should be noted, however, that although DISCERN was developed by an expert panel, it was actually tested on only 13 self-help group members.[2]\nAn important issue, and one that is not assessed by the existing HIQ instruments, is whether websites informing the public about therapies mention therapies approved by regulatory agencies or public health authorities, or non-approved ones. Drug approval requires a high level of evidence of efficacy and benefit\/risk ratio, an approach termed \u201cevidence-based medicine\u201d (EBM).[4] In a way, this is related to the reliability of the information. 
For instance, a website describing AIDS as a disease caused by the HIV virus that can be treated with antiretroviral therapy is of higher quality than one stating that AIDS is not due to a virus and should be treated with nutritional supplements.[5]
Health information quality should be seen in the wider context of information quality (IQ) generally. The latter has been extensively studied for its applications in business and manufacturing. Information quality is generally considered a concept with multiple dimensions[6]; depending on an author's philosophical viewpoint, information quality can have different attributes and characteristics.[7][8] Several studies have developed IQ frameworks based on the definition of IQ dimensions.[6] The best known of these frameworks was developed by Wand and Wang[9] and by Wang and Strong[10], based on a survey among 355 Master of Business Administration alumni, aiming to capture aspects of IQ that are important for consumers in the business field. A second study by the same group involved 52 information professionals from the financial, healthcare, and manufacturing sectors.[11] These studies defined 15 IQ criteria, which were grouped into four dimensions[9][10] as shown in Table 2.

Table 2. Dimensions of IQ

It is probably difficult to fit the HIQ criteria from Table 1, which are centered on trustworthiness and scientific correctness, into the theoretical framework of IQ dimensions in Table 2, which is borrowed from other fields. Recent studies have proposed a categorization of HIQ criteria into classical IQ dimensions, focusing on IQ criteria identified through focus groups and on the scientific content of webpages.[12]
We undertook this project to define the IQ criteria and dimensions relevant to HIQ.
To do so, we used a mixed approach: identifying relevant HIQ criteria using a theoretical approach broadly based on the existing criteria (the JAMA score, HONcode, and DISCERN), as well as an empirical approach, based on a questionnaire, to rank the importance of the various criteria to the end user. In particular, our aim was to evaluate user perceptions of HIQ criteria and their relative importance in trusting health-related websites. Criteria of HIQ were then classified into dimensions based on the existing literature and, using cluster analysis, on the ranking by users.

Methods
To design a questionnaire, we first identified relevant IQ criteria. These were based on the existing literature on HIQ, the instruments described above (Table 1), the standard IQ criteria listed in Table 2, and other studies.[10][13][14] General criteria, such as correct spelling and grammar, the presence of multimedia, and the ranking by the search engine, were also included. Other questions related to the content of the webpage, such as whether the webpage explains disease symptoms, therapies, how to take medications and their side effects, and whether respondents are wary of webpages offering quick solutions and miracle cures (we defined this as "hyperbole"). The respondents were also asked to rate the importance of whether the information describes treatments based on evidence-based medicine or complementary medicine, as this question would define a criterion of reliability (from the scientific point of view) of the information.
The full list of HIQ criteria considered is provided in Table 3, which also reports the questionnaire questions aimed at identifying the importance of those criteria in trusting a health-related website. The table also shows which criteria were derived from those in the known HIQ tools (JAMA, HON, DISCERN).
For most of the criteria, the questions were formulated in the form "I trust a health webpage more if…" or "I prefer webpages that…" and were assessed using a 5-point Likert scale (5 = strongly agree, 4 = somewhat agree, 3 = neither agree nor disagree, 2 = somewhat disagree, 1 = strongly disagree). Other questions aimed at defining the demographics of the sample (gender, age, country, education, whether studying a medically related subject or not, and others) or internet usage (time spent, main search engine used, device used, how often they searched for health information, whether searching for symptoms or therapies). The entire questionnaire (42 questions) is available as supplementary online information (Supplementary Table 1).

Table 3. Criteria of HIQ and questions used in the survey

The project was approved on January 26, 2017 by the Research Ethics Panel of the School of Computer Engineering and Mathematics of the University of Brighton. The questionnaire was published online using Google Forms and promoted using social media such as Twitter and Facebook, and via email, including students and staff at the University of Brighton and students at the Brighton and Sussex Medical School. We set Google Forms to limit responses to one per user to avoid duplicates. Eligibility criteria for participation were understanding the English language and being over 18 years of age. A total of 329 anonymous responses were recorded in the period February 1–June 16. We considered this a sufficient number, as previous studies in the field of IQ and its dimensions are based on surveys with numbers of responses ranging from 235 to 355.[10][11][15]
Statistical analysis of the responses was performed using the statistical software package SPSS, and the specific test used is described in the legend of each figure or table.
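As a minimal illustration of the scoring scheme described above, ranking criteria by their mean Likert score might look like the sketch below. The criterion names and response values are hypothetical toy data, not the study's dataset (which covered 27 criteria and 329 respondents), and this is not the authors' SPSS analysis.

```python
# Hypothetical Likert responses (5 = strongly agree ... 1 = strongly
# disagree), one list of scores per criterion. Illustrative data only.
responses = {
    "symptoms":   [5, 5, 4, 5, 4],
    "sources":    [4, 3, 4, 3, 4],
    "multimedia": [2, 3, 2, 1, 3],
}

def rank_by_mean(resp):
    """Return (criterion, mean Likert score) pairs, highest mean first."""
    means = {crit: sum(scores) / len(scores) for crit, scores in resp.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)

for criterion, mean_score in rank_by_mean(responses):
    print(f"{criterion}: {mean_score:.2f}")
```

Ranking by the mean (rather than, say, the median) matches how Table 4 orders the criteria by average Likert score.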
Hierarchical cluster analysis of questionnaire responses (average linkage clustering using the weighted pair group method with arithmetic mean) was performed using GENE-E (Broad Institute, Cambridge, MA) for Windows.

Results
Sample characteristics
We received 329 responses, 66% male and 33.7% female. Age groups were: 18–25 years, 26.4%; 26–40, 52.3%; 41–60, 18.8%; over 60, 1.5%. The responses came from 32 different countries: United Kingdom 41.5%, Yemen 20.4%, Saudi Arabia 13.4%, Germany 5.1%, Canada 3.8%, and various other countries 15.8%. Of the respondents, 49.5% had, or were studying toward, a postgraduate degree, 40.7% another higher education diploma, and 9.8% high school; 26.5% were of a biomedical background (a degree, or studying toward a degree, in medicine, pharmacology, or biomedical sciences). Ten of the 329 participants responded that they do not seek health information online, and these were excluded from the analyses.

Ranking of IQ criteria
Figure 1 shows how all respondents ranked each of the IQ criteria described in Table 3. The full results of the questionnaire (raw data, mean, median) are provided as a supplementary file (Supplementary File 1). All responses had satisfactory internal consistency, with an overall Cronbach's alpha for all 27 questions of 0.882 (for individual questions, Cronbach's alpha ranged between 0.874 and 0.883).

Figure 1. Ranking of HIQ criteria based on questionnaire responses. The horizontal axis indicates the number of responses (total, 319). Criteria are ranked based on the average of the mean Likert scale (right).

The ranking by the average Likert score is shown in Table 4 (first two columns). The median score of all 27 responses listed here was 3.87.
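A Cronbach's alpha figure like the one reported for the 27 questions can in principle be computed from the raw response matrix. The function below is a self-contained sketch of the standard alpha formula, with hypothetical toy data; it is not the authors' SPSS workflow.

```python
def cronbach_alpha(items):
    """Cronbach's alpha for internal consistency.

    `items` is a list of per-question score lists, with respondents in the
    same order in each list. Standard formula:
        alpha = k / (k - 1) * (1 - sum(item variances) / variance(totals))
    """
    k = len(items)            # number of questions
    n = len(items[0])         # number of respondents

    def variance(xs):  # population variance; consistent use is what matters
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / len(xs)

    # Total score per respondent across all questions.
    totals = [sum(item[r] for item in items) for r in range(n)]
    return k / (k - 1) * (1 - sum(variance(it) for it in items) / variance(totals))

# Perfectly correlated toy items give alpha = 1.0 (up to float rounding):
print(round(cronbach_alpha([[1, 2, 3, 4, 5]] * 3), 6))
```

Values above roughly 0.8, like the 0.882 reported here, are conventionally read as good internal consistency.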
It can be seen that a group of criteria that relate to the very specific context of health and disease (symptoms, side effects, treatments, and instructions; in bold italics in Table 4) are ranked high, indicating that users want information that is, above all, relevant and helpful.

Table 4. Ranking of criteria by perceived importance

On the other hand, criteria related to the four JAMA criteria (authorship, currency, sources, financial disclosure) are not considered particularly important and, with the sole exception of "sources," are all ranked below the median value.
Of the eight criteria related to the HONcode principles, only one was slightly above the median (affiliation, termed "authority" in the HON principles), while all the others (complementarity, privacy, attribution/sources, transparency, financial disclosure, advertising policy) were not deemed highly important (one criterion, "justifiability," was not assessed in the questionnaire). With the exception of "sources," a criterion that belongs to the JAMA criteria, all the criteria above could be broadly related to "ethics" and are highlighted in bold in Table 4. Authority, which we define as the affiliation of the website (whether governmental or from an international health organization, for instance), while we define "affiliation" as that of the author, also ranked low.

Identification of main dimensions of HIQ
We attempted to group the various criteria into IQ dimensions. To do so, we used a mixed approach. In part we relied on an ontological/theoretical approach and the existing classification described in Table 2. Then, with an empirical approach, we assessed whether some of these criteria followed a similar pattern in the responses to the questionnaire.
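One way such co-ranking patterns can be detected is agglomerative hierarchical clustering; the Methods section names average linkage with the weighted pair group method with arithmetic mean (WPGMA), run in GENE-E. As an illustration only, with a hypothetical toy distance matrix between three criteria, WPGMA agglomeration can be sketched as follows (this is not the GENE-E implementation):

```python
def wpgma(dist):
    """Agglomerative clustering with WPGMA linkage ("weighted pair group
    method with arithmetic mean").

    `dist` maps (label, label) tuples to pairwise distances.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = sorted({label for pair in dist for label in pair})
    d = {frozenset(pair): v for pair, v in dist.items()}
    merges = []
    while len(clusters) > 1:
        # Find the closest pair among the current clusters.
        a, b = min(
            ((x, y) for i, x in enumerate(clusters) for y in clusters[i + 1:]),
            key=lambda pair: d[frozenset(pair)],
        )
        merges.append((a, b, d[frozenset((a, b))]))
        merged = f"({a},{b})"
        # WPGMA update: the distance from the merged cluster to any other
        # cluster is the plain average of the two old distances.
        for c in clusters:
            if c not in (a, b):
                d[frozenset((merged, c))] = (
                    d[frozenset((a, c))] + d[frozenset((b, c))]
                ) / 2
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
    return merges

# Toy example (hypothetical distances between three criteria):
history = wpgma({("A", "B"): 1.0, ("A", "C"): 4.0, ("B", "C"): 4.0})
print(history)  # A and B merge first, then the pair joins C
```

Applied to one-minus-correlation distances between criteria scores, the merge history yields a dendrogram whose main branches correspond to clusters like A–E in Figure 2.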
For this purpose, we analyzed all individual responses using hierarchical cluster analysis.
As shown in Figure 2, we identified five main clusters. Cluster A includes three of the JAMA criteria (authorship, currency, and sources) and affiliation. Cluster B includes financial disclosure, complementarity, advertising policy, copyright, privacy, and transparency, all criteria that relate in some way to ethical aspects of IQ. Cluster C includes basic features of webpages (number of advertisements, spelling, grammar, and objectivity) as well as hyperbole and payment information. Cluster D includes IQ criteria (conciseness, ranking, and multimedia) that specifically relate to online information, in addition to understandability.

Figure 2. Clusters of HIQ criteria. Hierarchical cluster analysis of the Likert scale score for different criteria among 319 participants.

Cluster E includes criteria that relate to practical usefulness for an information seeker in the specific context of health and disease (focus, symptoms, treatments, side effects of drugs, and information on their usage). This cluster also includes readability; although at first one may think that this is a feature of the text (like spelling or grammar), it probably has a more practical value.
We now propose an organization of criteria of HIQ into dimensions, as outlined in Table 5. A first dimension relates to trustworthiness but could be better defined as "accountability" and includes information that defines basic criteria such as not being anonymous. This dimension includes four of the components of the JAMA score that are present in cluster A. We also included in this dimension "authority," which did not belong to any cluster. In fact, our questionnaire defined authority in terms of features of a website (such as the domain: whether a .com, .edu, or .org), and this is very similar to "affiliation," defined as the affiliation of the individual author.
We also included in this dimension "transparency" because, although in cluster B, it was defined as the presence of contact information for the author or website. The criteria of accountability are all intrinsic aspects of HIQ and would apply equally well to information online and in print, and also to non-health-related information.

Table 5. Proposed criteria and dimensions of HIQ

A second dimension, ethics, defines ethical aspects of trustworthiness and includes all the criteria in cluster B except transparency (see above). We also included here "objectivity," "advertisement," and "payment information," although they clustered elsewhere, as this fits the description of this dimension. These are criteria of HIQ that could also be applied to non-health IQ, with the exception of complementarity (the presence of a statement saying that the information supports, but does not replace, the relationship between patient and physician). Financial disclosure might be important in other types of information, but the issue of funding and conflict of interest is regarded as particularly important in health.
A third dimension defines textual accuracy and includes spelling, grammar, readability, and use of hyperbole or exaggeration. To define this dimension, we started from cluster C.
However, because "hyperbole" can be considered a characteristic of the text, we decided to subsume it under "accuracy." This dimension could apply equally to non-health and print information, with the possible exception of hyperbole or exaggeration, which is more common in news about scientific advancements.
A fourth, "representational" dimension comprises criteria (understandability, conciseness, search engine ranking, and presence of multimedia) that are probably more important for online information (which one wants to access quickly and concisely, so it can be read on a small screen) but would apply to non-health subjects. These criteria correspond exactly to cluster D.
A last dimension defines the much sought-after elements of information that characterize its scientific completeness: the presence of information specific to the medical condition or its treatment, as well as focus. In fact, all these criteria relate to focus. As such, even if these specific criteria relate to health, it would be easy to identify homologous criteria in other fields. This dimension could also apply to printed information, although focus is probably more important when information is accessed online, often on a small mobile device.

Subgroup analysis by educational subject, gender, and language
We first analyzed differences in the rankings given by participants based on whether or not they studied, or had a degree in, a biomedical field. Then we looked at native language (English vs. non-English) and gender.
The results are shown in Table 4, which reports, in columns 3 to 14, the ranking (as mean score) for all subgroups. When comparing biomedical students/graduates with non-biomedical ones, it was clear that biomedical education was associated with giving higher importance to text accuracy (spelling, grammar, sources).
Higher importance given to text accuracy (spelling, grammar, hyperbole) was also evident for English speakers compared with non-English speakers. There were also significant gender differences, with textual accuracy ranked higher by females, while males ranked "instructions" and "understandability" higher.

Importance of the scientific correctness of the information provided
We noted earlier that information about disease diagnosis and treatment is ranked highest in the whole sample (in the top quartile). However, the fact that a webpage describes a treatment for a disease does not mean that the website is scientifically correct. One could come up with a webpage that meets all the criteria in the "completeness" dimension but misinforms the reader.
We recently proposed using the information about the treatment suggested or promoted as a proxy for the scientific soundness of a web page.[16] Therefore, we asked participants whether they prefer websites that provide EBM information, complementary or alternative medicine (CAM) information, or don't care. The results shown in Table 6 indicate that only 6% preferred websites on CAM, 35% preferred EBM, and 37% did not assign this a particular importance. However, the preference for EBM was higher among those with a biomedical education, English speakers, and females, and in these groups there was a lower percentage of participants who did not know whether they prefer EBM or CAM. The association with biomedical education, language, and gender was statistically significant (P = 0.02, P < 0.001, and P = 0.029, respectively, by the Pearson chi-square test). There was no significant association between EBM preference and education level (P = 0.866, data not shown).

Table 6. Preference for EBM- or CAM-based information

Discussion
We propose dimensions and criteria of HIQ based on the importance assigned to them by internet users.
We used an empirical approach, similar to what was done 20 years ago by Wang and Strong[10] and Lee et al.[11] for IQ in the context of industries and organizations, with two major differences: our focus on the health-related content and trustworthiness of the information provided by websites, and our focus on online information. The results were used not only to rank the different criteria in order of perceived importance but also, using cluster analysis, to help classify them into dimensions.
Although the terminology is always ambiguous, we suggest that criteria of HIQ could be subsumed under dimensions as described in Table 5, bearing in mind that there may be areas of overlap. For instance, we assigned the criterion "hyperbole," which in the context of HIQ means presenting a potential treatment as a "miracle drug," to the dimension of textual accuracy, but on theoretical grounds it could also fit the ethical dimension of trustworthiness.
Of the criteria in the dimension "accountability," which includes the four JAMA criteria (authorship, currency, sources, financial disclosure), "sources" is the one that ranks highest, but still only 11th. Authorship (19th) ranked lower than authority (17th) and affiliation (12th), indicating that the link to an institution or a medical degree, or the type of website (for instance, whether a government website or a commercial one), is considered more important than the indication of the name of the author.
The generally low importance given to the JAMA criteria was also observed in a survey by Eysenbach and Köhler[17], who reported that "[c]ontrary to the statements made in the focus groups, in practice we observed that none of the participants actively searched for information on who stood behind the sites or how the information had been compiled."[17]
The ethics dimension of trustworthiness includes aspects that are particularly important in medicine (conflict of interest, data privacy, financial disclosure). Of note, one criterion, "complementarity" (whether information should support, not replace, the doctor-patient relationship), is one of the HONcode principles[3] and is specific to health.
The contextual information that we define as "textual accuracy" is also ranked high; this includes spelling and grammar but also the health-specific criterion "hyperbole," which is very common in health news stories and web pages when authors portray a treatment with an overly positive tone or "spin."[18]
The "completeness" dimension defines contextual information, which is necessary for the information to fulfill its task.[19] It includes both basic IQ criteria as well as some that are specific to health, and we could define it as "scientific completeness," the information that users look for and rank high in our questionnaire.
This is in agreement with a recent study performed in the United States showing that completeness of the information, which the authors defined as "the proportion of a priori-defined elements covered by the website; breadth of information," also ranked high when participants were asked to rate health websites on some IQ criteria.[12] The importance given by participants to criteria related to "completeness/purposeness," as indicated by the high ranking of information on symptoms, side effects, and treatments in Table 4, reflects the main use of the internet when searching for health information. In fact, a survey of 622 patients in the MetroNet practices in the Detroit area reported that, of the topics most often searched online, specific disease conditions and treatments were at the top.[20] To "find out about treatments" was also the top purpose of health-related internet use in a survey of patients of a general practice surgery in semi-rural England.[21]
Of the representational criteria, understandability ranked rather high. On the other hand, representational criteria specific to webpages (ranking by the search engine, presence of multimedia, conciseness) were deemed the least important.
Another aspect highlighted by the present study is that the ranking of criteria of HIQ is not a one-size-fits-all situation, differing depending on education, gender, and linguistic background.
This is not a novel concept; Wang and Strong already suggested that the classification of IQ criteria into dimensions differs between academics and practitioners, in a way an extension of the concept of data being "fit for use."[10] Floridi also noted that IQ should consider purposeness, and that the value of IQ criteria may differ among users.[22]
In this sense, the difference in the ranking of HIQ criteria between subjects with a biomedical degree or biomedical students (pharmacy, biomedical science, medicine) and those in other education areas could be extrapolated to the difference between health professionals and lay persons. Those with a biomedical background give more importance to criteria such as correct spelling and grammar than those with a non-biomedical background. Not surprisingly, "sources" are ranked higher by those with a biomedical background, as identifying and citing references is key to this field. On the other hand, among those with a non-biomedical background, "understandability" is ranked higher. Interestingly, we found no significant difference in the ranking of the "ethical" criteria by subjects with a biomedical background.
Native English speakers also assign more importance to textual accuracy (spelling, grammar), as well as to the ethical criterion of "objectivity." Attention to "hyperbole" is also ranked higher by this group, and we discussed above how this criterion also has an "ethical" value.
A very similar pattern was observed in females compared with males, with the added higher importance assigned to "payment information," suggesting a stronger ethical focus in females.
The differences in ranking identified in the subgroup analysis hint at a limitation of any classification into dimensions based on a questionnaire: the results will vary with the population investigated, and any subsequent analysis (including the cluster analysis used here) will vary accordingly. This suggests that when IQ is defined, the target user should be well defined.
The other aspect of this study concerned which criteria are regarded as important and which are not. The fact that ranking by the search engine is not seen as an indicator of trustworthiness of a website is very interesting, but this does not mean that the user is likely to go through several search engine result pages rather than limiting themselves to the first 10 to 20 results. The significance of this response should be assessed experimentally, for instance using eye-tracking software to validate the importance of the different criteria.
The low ranking of "ethical" trustworthiness criteria is worrying, as it might indicate that users are somewhat vulnerable to information affected by a conflict of interest, such as that from commercial sources promoting potentially ineffective treatments, or to other types of health misinformation. This is probably something that educators, particularly those in the biomedical field, should consider addressing, and males seem to be more "at risk," as they value "ethical" criteria lower than females do.
This difference is supported by a recent study reporting that males are more likely to disseminate fake health information than females.[23] It should be noted that this is at variance with results from the MetroNet study cited above, where patients ranked "endorsement by a government agency or professional organization" and "reliable source/author" as the most important factors influencing their trust ("perceived accuracy") in healthcare websites.[20] Likewise, "reputable/trustworthy organization" was the most important factor in trusting health information in a 2002–2003 survey of 55 participants in United Kingdom health support groups, although this study was not restricted to online information but included information provided by healthcare professionals, brochures, books, TV/radio, and others.[24] It is difficult to say whether these differences are due to the different time periods when those studies were carried out or to differences in the populations: patients exposed to medical research and support groups may have higher health literacy than our sample.
We suggest that our proposed dimensions of HIQ are an attempt to build a more comprehensive theoretical framework than the one that can be derived from the existing studies. For instance, the recent paper by Tao et al.[12] proposing a definition of HIQ dimensions does not take into account some of the criteria that we derived from the HONcode and DISCERN tools, particularly those related to what we call "ethical criteria."[12]
In conclusion, this study describes a possible organization of HIQ criteria into dimensions that identifies dimensions not previously recognized as such in IQ, such as the ethical dimension, which was identified through this ranking approach.
Contrary to our expectations, given that this is a hot topic in the news, we observed that ethical criteria, while regarded highly in the academic and medical environment, are not considered highly by the public.
Clearly, the main limitation of this study, which could affect its external validity, is that the focus mainly on university-level participants may lead to an underestimation of the importance of criteria aimed at the average user. It would be important to extend this study to a more general sample of the public, particularly patients and carers, to see whether there is a different perception of HIQ and whether it goes in the same direction as the comparison between non-biomedical and biomedical educational backgrounds reported here. Another important point to consider when extrapolating the conclusions of this study is that our survey asked generically what users would look for to trust a website when searching for a health topic. It is possible that the factors that account for trust in a webpage with health-related information differ depending on the topic searched, and this may be particularly important for highly controversial topics, such as abortion, vaccines, or genetic modifications.

Supplementary material
The supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmed.2018.00260/full#supplementary-material

Acknowledgements
MA-J was supported by a Ph.D. studentship from the University of Brighton. We thank Audrey Marshall for critical review of the questionnaire.

Author contributions
All authors designed research, analyzed the data, and wrote the paper.
MA-J designed research, performed research, analyzed the data, wrote the paper.\n\nConflict of interest statement \nThe authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.\n\nReferences \n\n\n\u2191 1.0 1.1 Silberg, W.M.; Lundberg, G.D.; Musacchio, R.A. (1997). \"Assessing, controlling, and assuring the quality of medical information on the Internet: Caveant lector et viewor--Let the reader and viewer beware\". JAMA 277 (15): 1244\u20135. doi:10.1001\/jama.1997.03540390074039. PMID 9103351.   \n\n\u2191 2.0 2.1 2.2 Charnock, D.; Shepperd, S.; Needham, G.; Gann, R. (1999). \"DISCERN: An instrument for judging the quality of written consumer health information on treatment choices\". Journal of Epidemiology and Community Health 53 (2): 105\u201311. doi:10.1136\/jech.53.2.105. PMC PMC1756830. PMID 10396471. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC1756830 .   \n\n\u2191 3.0 3.1 3.2 Boyer, C.; Selby, M.; Scherrer, J.R.; Appel, R.D. (1998). \"The Health On the Net Code of Conduct for medical and health Websites\". Computers in Biology and Medicine 28 (5): 603-10. doi:10.1016\/S0010-4825(98)00037-7. PMID 9861515.   \n\n\u2191 Howick, J. (2011). The Philosophy of Evidence-Based Medicine. John Wiley & Sons. ISBN 9781405196673.   \n\n\u2191 Smith, T.C.; Novella, S.P. (2007). \"HIV denial in the Internet era\". PLoS Medicine 4 (8): e256. doi:10.1371\/journal.pmed.0040256. PMC PMC1949841. PMID 17713982. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC1949841 .   \n\n\u2191 6.0 6.1 Illari, P.; Floridi, L. (2014). \"Information Quality, Data and Philosophy\". The Philosophy of Information Quality. 358. Springer. doi:10.1007\/978-3-319-07121-3_2. ISBN 9783319071213.   \n\n\u2191 Klein, B.D. (2001). \"User Perceptions of Data Quality: Internet and Traditional Text Sources\". 
Journal of Computer Information Systems 41 (4): 9\u201315. doi:10.1080\/08874417.2001.11647016.   \n\n\u2191 Knight, S.-A.; Burn, J. (2005). \"Developing a Framework for Assessing Information Quality on the World Wide Web\". Informing Science: The International Journal of an Emerging Transdiscipline 8: 159\u201372. doi:10.28945\/493.   \n\n\u2191 9.0 9.1 Wand, Y.; Wang, R.Y. (1996). \"Anchoring data quality dimensions in ontological foundations\". Communications of the ACM 39 (11): 86-95. doi:10.1145\/240455.240479.   \n\n\u2191 10.0 10.1 10.2 10.3 10.4 10.5 Wang, R.Y.; Strong, D.M. (2015). \"Beyond Accuracy: What Data Quality Means to Data Consumers\". Journal of Management Information Systems 12 (4): 5\u201333. doi:10.1080\/07421222.1996.11518099.   \n\n\u2191 11.0 11.1 11.2 Lee, Y.W.; Strong, D.M.; Kahn, B.K. et al. (2002). \"AIMQ: A methodology for information quality assessment\". Information & Management 40 (2): 133-146. doi:10.1016\/S0378-7206(02)00043-5.   \n\n\u2191 12.0 12.1 12.2 12.3 Tao, D.; LeRouge, C.; Smith, K.J.; De Leo, G. (2017). \"Defining Information Quality Into Health Websites: A Conceptual Framework of Health Website Information Quality for Educated Young Adults\". JMIR Human Factors 4 (4): e25. doi:10.2196\/humanfactors.6455. PMC PMC5650677. PMID 28986336. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5650677 .   \n\n\u2191 Bernstam, E.V.; Shelton, D.M.; Walji, M. et al. (2005). \"Instruments to assess the quality of health information on the World Wide Web: what can our patients actually use?\". International Journal of Medical Informatics 74 (1): 13\u201319. doi:10.1016\/j.ijmedinf.2004.10.001. PMID 15626632.   \n\n\u2191 Zhang, Y.; Sun, Y.; Xie, B. (2015). \"Quality of health information for consumers on the web: A systematic review of indicators, criteria, tools, and evaluation results\". Journal of the Association for Information Science and Technology 66 (10): 2071\u201384. doi:10.1002\/asi.23311.   
\n\n\u2191 Pitt, L.F.; Watson, R.T.; Kavan, C.B. (1995). \"Service Quality: A Measure of Information Systems Effectiveness\". MIS Quarterly 19 (2): 173\u201387. doi:10.2307\/249687.   \n\n\u2191 Yaqub, M.; Ghezzi, P. (2015). \"Adding Dimensions to the Analysis of the Quality of Health Information of Websites Returned by Google: Cluster Analysis Identifies Patterns of Websites According to their Classification and the Type of Intervention Described\". Frontiers in Public Health 3: 204. doi:10.3389\/fpubh.2015.00204. PMC PMC4548082. PMID 26380250. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4548082 .   \n\n\u2191 17.0 17.1 Eysenbach, G.; K\u00f6hler, C. (2002). \"How do consumers search for and appraise health information on the world wide web? Qualitative study using focus groups, usability tests, and in-depth interviews\". BMJ 324 (7337): 573\u20137. doi:10.1136\/bmj.324.7337.573. PMC PMC78994. PMID 11884321. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC78994 .   \n\n\u2191 Walsh-Childers, K.; Braddock, J.; Rabaza, C.; et al. (2018). \"One Step Forward, One Step Back: Changes in News Coverage of Medical Interventions\". Health Communication 33 (2): 174\u201387. doi:10.1080\/10410236.2016.1250706. PMID 27983868.   \n\n\u2191 Sebastian-Coleman, L. (2013). Measuring Data Quality for Ongoing Improvement: A Data Quality Assessment Framework (1st ed.). Morgan Kaufmann. pp. 376. ISBN 9780123970336.   \n\n\u2191 20.0 20.1 Schwartz, K.L.; Roe, T.; Northrup. J. et al. (2006). \"Family medicine patients' use of the Internet for health information: a MetroNet study\". Journal of the American Board of Family Medicine 19 (1): 39\u201345. doi:10.3122\/jabfm.19.1.39. PMID 16492004.   \n\n\u2191 Rose, P.W.; Jenkins, L.; Fuller, A. et al. (2002). \"Doctors' and patients' use of the Internet for healthcare: a study from one general practice\". Health Information and Libraries Journal 19 (4): 233-5. 
doi:10.1046\/j.1471-1842.2002.00402.x. PMID 12485155.   \n\n\u2191 Floridi, L. (2013). \"Information Quality\". Philosophy & Technology 26 (1): 1\u20136. doi:10.1007\/s13347-013-0101-3.   \n\n\u2191 Yuelin, L.; Zhang, X.; Wang, S. (2017). \"Fake vs. real health information in social media in China\". Proceedings of the Association for Information Science and Technology 54 (1): 742\u201343. doi:10.1002\/pra2.2017.14505401139.   \n\n\u2191 Childs, S. (2004). \"Developing health website quality assessment guidelines for the voluntary sector: Outcomes from the Judge Project\". Health Information and Libraries Journal 21 (Suppl. 2): 14\u201326. doi:10.1111\/j.1740-3324.2004.00520.x. PMID 15317572.   \n\n\nNotes \nThis presentation is faithful to the original, with only a few minor changes to presentation.\n\nSource: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:What_Is_health_information_quality%3F_Ethical_dimension_and_perception_by_users\">https:\/\/www.limswiki.org\/index.php\/Journal:What_Is_health_information_quality%3F_Ethical_dimension_and_perception_by_users<\/a>\nCategories: LIMSwiki journal articles (added in 2019) | LIMSwiki journal articles (all) | LIMSwiki journal articles on data quality | LIMSwiki journal articles on information retrieval\n
This page was last modified on 18 March 2019, at 23:31. This page has been accessed 217 times. Content is available under a Creative Commons Attribution-ShareAlike 4.0 International License unless otherwise noted.\n\n","86da8ed36fc493b6a573df8d0f7095ac_html":"<body class=\"mediawiki ltr sitedir-ltr ns-206 ns-subject page-Journal_What_Is_health_information_quality_Ethical_dimension_and_perception_by_users skin-monobook action-view\">\n<div id=\"rdp-ebb-globalWrapper\">\n\t\t<div id=\"rdp-ebb-column-content\">\n\t\t\t<div id=\"rdp-ebb-content\" class=\"mw-body\" role=\"main\">\n\t\t\t\t<a id=\"rdp-ebb-top\"><\/a>\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t<h1 id=\"rdp-ebb-firstHeading\" class=\"firstHeading\" lang=\"en\">Journal:What Is health information quality? 
Ethical dimension and perception by users<\/h1>\n\t\t\t\t\n\t\t\t\t<div id=\"rdp-ebb-bodyContent\" class=\"mw-body-content\">\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\n\t\t\t\t\t<!-- start content -->\n\t\t\t\t\t<div id=\"rdp-ebb-mw-content-text\" lang=\"en\" dir=\"ltr\" class=\"mw-content-ltr\">\n\n\n<h2><span class=\"mw-headline\" id=\"Abstract\">Abstract<\/span><\/h2>\n<p><b>Introduction<\/b>: The popularity of seeking health <a href=\"https:\/\/www.limswiki.org\/index.php\/Information\" title=\"Information\" class=\"wiki-link\" data-key=\"6300a14d9c2776dcca0999b5ed940e7d\">information<\/a> online makes information quality (IQ) a public health issue. The present study aims at building a theoretical framework of health information quality (HIQ) that can be applied to websites and defines which IQ criteria are important for a website to be trustworthy and meet users' expectations.\n<\/p><p><b>Methods<\/b>: We have identified a list of HIQ criteria from existing tools and assessment criteria and elaborated them into a questionnaire that was promoted via social media and, mainly, the university. Responses (329) were used to rank the different criteria for their importance in trusting a website and to identify patterns of criteria using hierarchical cluster analysis.\n<\/p><p><b>Results<\/b>: HIQ criteria were organized into five dimensions based on previous theoretical frameworks, as well as on how they cluster together in the questionnaire responses. We could identify a top-ranking dimension (scientific completeness) that describes what the user expects to learn from the websites (in particular: description of symptoms, treatments, side effects). Cluster analysis also identified a number of criteria borrowed from existing tools for assessing HIQ that could be subsumed under a broad \u201cethical\u201d dimension (such as conflict of interest, privacy, advertising policies) that were, in general, ranked as being of low importance by the participants. 
Subgroup analysis revealed significant differences in the importance assigned to the various criteria based on gender, language, and whether or not the respondent had a biomedical educational background.\n<\/p><p><b>Conclusions<\/b>: We identified criteria of HIQ and organized them into dimensions. We observed that ethical criteria, while regarded highly in the academic and medical environment, are not considered particularly important by the public.\n<\/p><p><b>Keywords<\/b>: internet, information quality, ethics, online information, public health\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Introduction\">Introduction<\/span><\/h2>\n<p>With the diffusion of the internet, many have been concerned that, due to its unregulated and unfiltered nature, it could misinform or disinform the public. The lack of widely used search engines (Google was founded in 1998) left it entirely up to users to decide which websites to trust among the relatively few (compared to 2018) then available. These concerns led to the development, in the late 1990s, of instruments and organizations to assess health information quality (HIQ) of websites, including the Journal of the American Medical Association (JAMA) criteria<sup id=\"rdp-ebb-cite_ref-SilbergAssessing97_1-0\" class=\"reference\"><a href=\"#cite_note-SilbergAssessing97-1\">[1]<\/a><\/sup>, DISCERN<sup id=\"rdp-ebb-cite_ref-CharnockDISCERN99_2-0\" class=\"reference\"><a href=\"#cite_note-CharnockDISCERN99-2\">[2]<\/a><\/sup>, and the criteria for meeting the health-on-the-net (HON) code of conduct.<sup id=\"rdp-ebb-cite_ref-BoyerTheHealth98_3-0\" class=\"reference\"><a href=\"#cite_note-BoyerTheHealth98-3\">[3]<\/a><\/sup> These instruments were developed for different purposes: the JAMA and DISCERN tools were aimed at providing customers with instruments to assess websites<sup id=\"rdp-ebb-cite_ref-SilbergAssessing97_1-1\" class=\"reference\"><a href=\"#cite_note-SilbergAssessing97-1\">[1]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-CharnockDISCERN99_2-1\" 
class=\"reference\"><a href=\"#cite_note-CharnockDISCERN99-2\">[2]<\/a><\/sup>; the HON criteria are used by the HON foundation to certify health websites with the display of the HONCode quality seal, and this was originally aimed at organizations to help them develop websites.<sup id=\"rdp-ebb-cite_ref-BoyerTheHealth98_3-1\" class=\"reference\"><a href=\"#cite_note-BoyerTheHealth98-3\">[3]<\/a><\/sup> The criteria of HIQ considered by these three approaches are listed in Table 1.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Tab1_Al-Jefri_FrontInMedicine2018_5.jpg\" class=\"image wiki-link\" data-key=\"b4823fde7a22152a8f1f48c9c1102c1a\"><img alt=\"Tab1 Al-Jefri FrontInMedicine2018 5.jpg\" src=\"https:\/\/www.limswiki.org\/images\/a\/a5\/Tab1_Al-Jefri_FrontInMedicine2018_5.jpg\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Table 1.<\/b> Established HIQ instruments and criteria<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>No data are available on how many information seekers have used these tools to make assessments. On the other hand, the high number of citations in the scientific literature for the JAMA (1100) and DISCERN (600) tools indicates that these are also widely used, particularly the JAMA criteria, in academic research analyzing HIQ. 
It should be noted, however, that DISCERN was developed by an expert panel, but was then tested on only 13 self-help group members.<sup id=\"rdp-ebb-cite_ref-CharnockDISCERN99_2-2\" class=\"reference\"><a href=\"#cite_note-CharnockDISCERN99-2\">[2]<\/a><\/sup>\n<\/p><p>An important issue, and one that is not assessed by the existing HIQ instruments, is whether websites informing the public on therapies mention therapies approved by regulatory agencies or public health authorities, or non-approved ones. Drug approval requires a high level of evidence of efficacy and benefit\/risk ratio, an approach termed \u201cevidence-based medicine\u201d (EBM).<sup id=\"rdp-ebb-cite_ref-HowickThePhil11_4-0\" class=\"reference\"><a href=\"#cite_note-HowickThePhil11-4\">[4]<\/a><\/sup> In a way, this is related to the reliability of the information. For instance, a website describing AIDS as a disease due to the HIV virus that can be treated with antiretroviral therapy is of higher quality than one stating that AIDS is not due to a virus and should be treated with nutritional supplements.<sup id=\"rdp-ebb-cite_ref-SmithHIV07_5-0\" class=\"reference\"><a href=\"#cite_note-SmithHIV07-5\">[5]<\/a><\/sup>\n<\/p><p>Health information quality should be seen in the wider context of information quality (IQ) generally. The latter has been extensively studied for its applications in business and manufacturing. 
Information quality is generally considered a concept with multiple dimensions<sup id=\"rdp-ebb-cite_ref-IllariInfo14_6-0\" class=\"reference\"><a href=\"#cite_note-IllariInfo14-6\">[6]<\/a><\/sup>; depending on an author's philosophical viewpoint, information quality can have different attributes and characteristics.<sup id=\"rdp-ebb-cite_ref-KleinUser16_7-0\" class=\"reference\"><a href=\"#cite_note-KleinUser16-7\">[7]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-KnightDevelop05_8-0\" class=\"reference\"><a href=\"#cite_note-KnightDevelop05-8\">[8]<\/a><\/sup> Several studies have developed IQ frameworks based on the definition of IQ dimensions.<sup id=\"rdp-ebb-cite_ref-IllariInfo14_6-1\" class=\"reference\"><a href=\"#cite_note-IllariInfo14-6\">[6]<\/a><\/sup> The best known of these frameworks was developed by Wand and Wang<sup id=\"rdp-ebb-cite_ref-WandAnchoring96_9-0\" class=\"reference\"><a href=\"#cite_note-WandAnchoring96-9\">[9]<\/a><\/sup> and by Wang and Strong<sup id=\"rdp-ebb-cite_ref-WangBeyond15_10-0\" class=\"reference\"><a href=\"#cite_note-WangBeyond15-10\">[10]<\/a><\/sup>, based on a survey of 355 Master of Business Administration alumni, aiming to capture aspects of IQ that are important for consumers in the business field. 
A second study by the same group involved 52 information professionals from the financial, healthcare, and manufacturing sectors.<sup id=\"rdp-ebb-cite_ref-LeeAIMQ02_11-0\" class=\"reference\"><a href=\"#cite_note-LeeAIMQ02-11\">[11]<\/a><\/sup> These studies defined 15 IQ criteria, which were grouped into four dimensions<sup id=\"rdp-ebb-cite_ref-WandAnchoring96_9-1\" class=\"reference\"><a href=\"#cite_note-WandAnchoring96-9\">[9]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-WangBeyond15_10-1\" class=\"reference\"><a href=\"#cite_note-WangBeyond15-10\">[10]<\/a><\/sup> as shown in Table 2.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Tab2_Al-Jefri_FrontInMedicine2018_5.jpg\" class=\"image wiki-link\" data-key=\"6f76b886c200fa54ad2e724a5ce4421d\"><img alt=\"Tab2 Al-Jefri FrontInMedicine2018 5.jpg\" src=\"https:\/\/www.limswiki.org\/images\/4\/44\/Tab2_Al-Jefri_FrontInMedicine2018_5.jpg\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Table 2.<\/b> Dimensions of IQ<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>It is probably difficult to fit the HIQ criteria from Table 1\u2014which are centered on trustworthiness and scientific correctness\u2014into the theoretical framework of IQ dimensions in Table 2, which are borrowed from other fields. 
Recent studies have proposed a categorization of HIQ criteria into classical IQ dimensions, focusing on IQ criteria identified through focus groups and on the scientific content of webpages.<sup id=\"rdp-ebb-cite_ref-TaoDefining17_12-0\" class=\"reference\"><a href=\"#cite_note-TaoDefining17-12\">[12]<\/a><\/sup>\n<\/p><p>We undertook this project to define the IQ criteria and dimensions relevant to HIQ. To do so, we used a mixed approach, identifying relevant HIQ criteria using a theoretical approach broadly based on the existing criteria, the JAMA score, HONcode, and DISCERN, as well as an empirical approach, based on a questionnaire, to rank the importance of the various criteria to the end user. In particular, our aim was to evaluate user perceptions of HIQ criteria and their relative importance in trusting health-related websites. Criteria of HIQ were then classified into dimensions based on the existing literature and, using cluster analysis, the ranking by users.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Methods\">Methods<\/span><\/h2>\n<p>To design a questionnaire, we first identified relevant IQ criteria. These were based on the existing literature on HIQ, the instruments described above (Table 1), the standard IQ criteria listed in Table 2, and other studies.<sup id=\"rdp-ebb-cite_ref-WangBeyond15_10-2\" class=\"reference\"><a href=\"#cite_note-WangBeyond15-10\">[10]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-BernstamInstruments05_13-0\" class=\"reference\"><a href=\"#cite_note-BernstamInstruments05-13\">[13]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-ZhangQuality15_14-0\" class=\"reference\"><a href=\"#cite_note-ZhangQuality15-14\">[14]<\/a><\/sup> General criteria, such as correct spelling and grammar, the presence of multimedia, or the ranking assigned by the search engine, were also included. 
Other questions are related to the content of the webpage, such as whether the webpage explains disease symptoms, therapies, how to take medications and their side effects, and whether respondents are wary of webpages offering quick solutions and miracle cures (we defined this as \u201chyperbole\u201d). The respondents were also asked to rate the importance of the information describing treatments based on evidence-based medicine or complementary medicine, as this question defines a criterion of reliability (from the scientific point of view) of the information.\n<\/p><p>The full list of HIQ criteria considered is provided in Table 3, which also reports the questions used in the questionnaire to identify the importance of those criteria in trusting a health-related website. The table also shows which criteria were derived from the ones in the known HIQ tools (JAMA, HON, DISCERN). For most of the criteria, the questions were formulated in the form \u201cI trust a health webpage more if\u2026\u201d or \u201cI prefer webpages that\u2026\u201d and were assessed using a 5-point Likert scale (5 = strongly agree, 4 = somewhat agree, 3 = neither agree nor disagree, 2 = somewhat disagree, 1 = strongly disagree). Other questions aimed at defining the demographics of the sample (gender, age, country, education, whether studying a medically related subject or not, and others) or internet usage (time spent, main search engine used, device used, how often they searched health information, whether searching symptoms or therapies). 
The entire questionnaire (42 questions) is available as supplementary online information (Supplementary Table 1).\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Tab3_Al-Jefri_FrontInMedicine2018_5.jpg\" class=\"image wiki-link\" data-key=\"21a57e3357f8c661fcaac28fa7f36635\"><img alt=\"Tab3 Al-Jefri FrontInMedicine2018 5.jpg\" src=\"https:\/\/www.limswiki.org\/images\/2\/2e\/Tab3_Al-Jefri_FrontInMedicine2018_5.jpg\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Table 3.<\/b> Criteria of HIQ and questions used in the survey<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>The project was approved on January 26, 2017, by the Research Ethics Panel of the School of Computer Engineering and Mathematics of the University of Brighton. The questionnaire was published online using Google Forms and promoted via social media, such as Twitter and Facebook, and via email, including students and staff at the University of Brighton and students at the Brighton and Sussex Medical School. We set Google Forms to limit responses to one per user to avoid duplicates. Eligibility criteria for participation were understanding English and being over 18 years of age. A total of 329 anonymous responses were recorded in the period February 1\u2013June 16. 
We considered this a sufficient number, as previous studies in the field of IQ and its dimensions are based on surveys with a number of responses ranging from 235 to 355.<sup id=\"rdp-ebb-cite_ref-WangBeyond15_10-3\" class=\"reference\"><a href=\"#cite_note-WangBeyond15-10\">[10]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-LeeAIMQ02_11-1\" class=\"reference\"><a href=\"#cite_note-LeeAIMQ02-11\">[11]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-PittService95_15-0\" class=\"reference\"><a href=\"#cite_note-PittService95-15\">[15]<\/a><\/sup>\n<\/p><p>Statistical analysis of the responses was performed using the statistical software package SPSS; the specific test used is described in the legend of each figure or table. Hierarchical cluster analysis of questionnaire responses (average linkage clustering using the weighted pair group method with arithmetic mean) was performed using GENE-E (Broad Institute, Cambridge, MA) for Windows.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Results\">Results<\/span><\/h2>\n<h3><span class=\"mw-headline\" id=\"Sample_characteristics\">Sample characteristics<\/span><\/h3>\n<p>We received 329 responses, 66% male and 33.7% female. Age groups were: 18\u201325 years, 26.4%; 26\u201340, 52.3%; 41\u201360, 18.8%; over 60, 1.5%. The responses came from 32 different countries: United Kingdom 41.5%, Yemen 20.4%, Saudi Arabia 13.4%, Germany 5.1%, Canada 3.8%, and 15.8% various other countries. Of the respondents, 49.5% had, or were studying toward, a postgraduate degree, 40.7% another higher education diploma, and 9.8% high school; 26.5% had a biomedical background (a degree, or studying toward a degree, in medicine, pharmacology, or biomedical sciences). 
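<\/p>\n<p>The hierarchical cluster analysis described in the Methods (average linkage clustering using WPGMA) can be sketched as follows. This is a minimal illustration, not the study's GENE-E analysis: SciPy's \u201cweighted\u201d linkage method implements WPGMA, and the ratings matrix here is simulated rather than taken from the questionnaire.<\/p>

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Simulated ratings: rows are respondents, columns are HIQ criteria (Likert 1-5).
# Three criteria are rated similarly high and three similarly low, so they
# should separate cleanly into two clusters.
rng = np.random.default_rng(0)
high = rng.integers(4, 6, size=(30, 3))
low = rng.integers(1, 3, size=(30, 3))
ratings = np.hstack([high, low])

# Cluster the criteria (columns), so the matrix is transposed;
# SciPy's 'weighted' method is WPGMA average linkage.
Z = linkage(ratings.T, method='weighted', metric='euclidean')

# Cut the dendrogram into two clusters of criteria.
labels = fcluster(Z, t=2, criterion='maxclust')
```

<p>Criteria whose ratings co-vary across respondents end up in the same cluster; this is, in outline, how clusters of criteria such as those in Figure 2 are derived from the Likert responses.<\/p>\n<p>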
Ten out of 329 participants responded that they do not seek health information online, and these were excluded from the analyses.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Ranking_of_IQ_criteria\">Ranking of IQ criteria<\/span><\/h3>\n<p>Figure 1 shows how all respondents ranked each of the IQ criteria described in Table 3. The full results of the questionnaire (raw data, mean, median) are provided as a supplementary file (Supplementary File 1). The responses showed satisfactory internal consistency, with an overall Cronbach's alpha for all 27 questions of 0.882 (for individual questions, Cronbach's alpha ranged between 0.874 and 0.883).\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig1_Al-Jefri_FrontInMedicine2018_5.jpg\" class=\"image wiki-link\" data-key=\"a0ad8c5d7d5c8aeac01895b170ff85bb\"><img alt=\"Fig1 Al-Jefri FrontInMedicine2018 5.jpg\" src=\"https:\/\/www.limswiki.org\/images\/5\/5e\/Fig1_Al-Jefri_FrontInMedicine2018_5.jpg\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 1.<\/b> Ranking of HIQ criteria based on questionnaire responses. The horizontal axis indicates the number of responses (total, 319). Criteria are ranked based on the mean Likert score (right).<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>The ranking by the average Likert score is shown in Table 4 (first two columns). The median of the mean scores for the 27 questions listed here was 3.87. 
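<\/p>\n<p>The two computations reported above (ranking criteria by mean Likert score, and checking consistency with Cronbach's alpha) can be sketched as follows. This is a minimal sketch on made-up ratings, not the study's data; the criterion names are hypothetical.<\/p>

```python
import numpy as np

def cronbach_alpha(scores):
    # scores: respondents x items matrix of Likert ratings.
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1).sum()
    total_variance = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

def rank_criteria(scores, names):
    # Rank criteria by mean Likert score, highest first.
    means = np.asarray(scores, dtype=float).mean(axis=0)
    order = np.argsort(means)[::-1]
    return [(names[i], float(means[i])) for i in order]

# Four respondents rating three hypothetical criteria on a 1-5 scale.
ratings = [[5, 5, 4],
           [4, 4, 3],
           [3, 3, 2],
           [2, 2, 1]]
alpha = cronbach_alpha(ratings)
ranking = rank_criteria(ratings, ['symptoms', 'treatments', 'privacy'])
```

<p>The median of the per-criterion means (3.87 in the study) is then simply the median of the second elements of this ranking.<\/p>\n<p>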
It can be seen that a group of criteria that relate to the very specific context of health and disease (symptoms, side effects, treatments, and instructions; in bold-italics in Table 4) are ranked high, indicating that users want information that is, above all, relevant and helpful.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Tab4_Al-Jefri_FrontInMedicine2018_5.jpg\" class=\"image wiki-link\" data-key=\"34f0ad592475858da5b4c56aa96307b1\"><img alt=\"Tab4 Al-Jefri FrontInMedicine2018 5.jpg\" src=\"https:\/\/www.limswiki.org\/images\/2\/29\/Tab4_Al-Jefri_FrontInMedicine2018_5.jpg\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Table 4.<\/b> Ranking of criteria by perceived importance<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>On the other hand, criteria related to the four JAMA criteria (authorship, currency, sources, financial disclosure) are not considered particularly important and, with the sole exception of \u201csources,\u201d are all ranked below the median value.\n<\/p><p>Of the eight criteria related to the HONcode principles, only one was slightly above the median (affiliation; termed \u201cauthority\u201d in the HON principles), while all the others (complementarity, privacy, attribution\/sources, transparency, financial disclosure, advertising policy) were not deemed highly important (one criterion, \u201cjustifiability,\u201d was not assessed in the questionnaire). With the exception of \u201csources,\u201d a criterion that also appears among the JAMA criteria, all the criteria above could be broadly related to \u201cethics\u201d and are highlighted in bold in Table 4. 
Authority, which we define as the affiliation of the website\u2014whether governmental, from an international health organization, for instance (while we define \"affiliation\" as that of the author)\u2014also ranked low.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Identification_of_main_dimensions_of_HIQ\">Identification of main dimensions of HIQ<\/span><\/h3>\n<p>We attempted to group the various criteria into IQ dimensions. To do so, we used a mixed approach. In part, we relied on an ontological\/theoretical approach and the existing classification described in Table 2. Then, with an empirical approach, we assessed whether some of these criteria followed a similar pattern in the responses to the questionnaire. For this purpose, we analyzed all individual responses using hierarchical cluster analysis.\n<\/p><p>As shown in Figure 2, we identified five main clusters. Cluster A includes three of the JAMA criteria (authorship, currency, and sources) and affiliation. Cluster B includes financial disclosure, complementarity, advertising policy, copyright, privacy, and transparency, all criteria that somewhat relate to ethical aspects of IQ. Cluster C includes basic features of webpages (number of advertisements, spelling, grammar, and objectivity) as well as hyperbole and payment info. 
Cluster D includes IQ criteria (conciseness, ranking, and multimedia) that specifically relate to online information, in addition to understandability.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig2_Al-Jefri_FrontInMedicine2018_5.jpg\" class=\"image wiki-link\" data-key=\"80b40d6637f533dcc104b8e6b9555c1d\"><img alt=\"Fig2 Al-Jefri FrontInMedicine2018 5.jpg\" src=\"https:\/\/www.limswiki.org\/images\/b\/b7\/Fig2_Al-Jefri_FrontInMedicine2018_5.jpg\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 2.<\/b> Clusters of HIQ criteria. Hierarchical cluster analysis of the Likert scale score for different criteria among 319 participants.<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>Cluster E includes criteria that relate to the practical usefulness for an information seeker in the specific context of health and disease (focus, symptoms, treatments, side effects of drugs, and information on their usage). This cluster also includes readability, and although at first one may think that this is a feature of the text (like spelling or grammar), it probably has a more practical value.\n<\/p><p>We now propose an organization of criteria of HIQ into dimensions, as outlined in Table 5. A first dimension relates to trustworthiness but could be better defined as \u201caccountability\u201d and includes information that defines basic criteria such as not being anonymous. This dimension includes four of the components of the JAMA score that are present in cluster A. We also included in this dimension \u201cauthority,\u201d which did not belong to any cluster. 
In fact, our questionnaire defined authority as features of a website (such as the domain, whether a .com, .edu, or .org), and this is very similar to \u201caffiliation,\u201d defined as the affiliation of the individual author. We also included in this dimension \u201ctransparency\u201d because, although in cluster B, it was defined as the presence of contact information for the author or website. The criteria of accountability are all intrinsic dimensions of HIQ and would apply equally well to information online and in print, and would also apply to non-health-related information.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Tab5_Al-Jefri_FrontInMedicine2018_5.jpg\" class=\"image wiki-link\" data-key=\"131a7ccc5cf13b46ab61fe43a2fc2073\"><img alt=\"Tab5 Al-Jefri FrontInMedicine2018 5.jpg\" src=\"https:\/\/www.limswiki.org\/images\/1\/1f\/Tab5_Al-Jefri_FrontInMedicine2018_5.jpg\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Table 5.<\/b> Proposed criteria and dimensions of HIQ<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>A second dimension, ethics, defines ethical aspects of trustworthiness and includes all the criteria in cluster B except transparency (see above). We also included here \u201cobjectivity,\u201d \u201cadvertisement,\u201d and \u201cpayment information,\u201d although they clustered elsewhere, as they fit the description of this dimension. These are criteria of HIQ that could also be applied to non-health IQ, with the exception of complementarity (the presence of a statement saying the information supports, but does not replace, the relationship between patient and physician). 
Financial disclosure might be important in other types of information, but the issue of funding and conflict of interest is regarded as particularly important in health.\n<\/p><p>A third dimension defines textual accuracy and includes spelling, grammar, readability, and use of hyperbole or exaggeration. To define this dimension, we started from cluster C. However, because \u201chyperbole\u201d can be considered a characteristic of the text, we decided to subsume it under \u201caccuracy.\u201d This dimension could apply equally to non-health and print information, with the possible exception of hyperbole or exaggeration, which is more common in news about scientific advancements.\n<\/p><p>A fourth dimension, defined as the \u201crepresentational\u201d dimension, comprises criteria (understandability, conciseness, search engine ranking, and presence of multimedia) that are probably more important for online information (which one wants to access quickly and concisely, so it can be read on a small screen) but would apply to non-health subjects. These criteria correspond exactly to cluster D.\n<\/p><p>A last dimension defines the much sought-after elements of information that characterize its scientific completeness: presence of information specific to the medical condition or its treatment, as well as focus. In fact, all these criteria relate to focus. As such, even if these specific criteria relate to health, it would be easy to identify homologous criteria in other fields. 
This dimension could also apply to printed information, although focus is probably more important when information is accessed online, often on a small mobile device.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Subgroup_analysis_by_educational_subject.2C_gender.2C_and_language\">Subgroup analysis by educational subject, gender, and language<\/span><\/h3>\n<p>We first analyzed differences in the ranking given by participants based on whether they studied, or had a degree in, a biomedical field or not. Then we looked at native language (English vs. non-English) and gender.\n<\/p><p>The results are shown in Table 4, which reports, in columns 3 through 14, the ranking (as mean score) for all subgroups. When comparing biomedical students\/graduates with non-biomedical ones, it was clear that biomedical education was associated with giving higher importance to text accuracy (spelling, grammar, sources). Higher importance to text accuracy (spelling, grammar, hyperbole) was also evident for English speakers, compared to non-English speakers. There were also significant gender differences, with textual accuracy ranked higher by females, while males ranked \u201cinstructions\u201d and \u201cunderstandability\u201d higher.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Importance_of_the_scientific_correctness_of_the_information_provided\">Importance of the scientific correctness of the information provided<\/span><\/h3>\n<p>We noted earlier that information about disease diagnosis and treatment is ranked highest in the whole sample (in the top quartile). However, the fact that a webpage describes a treatment for a disease does not mean that the website is scientifically correct. 
One could come up with a webpage that meets all the criteria in the \u201ccompleteness\u201d dimension but misinforms the reader.\n<\/p><p>We recently proposed to use the information about the treatment suggested or promoted as a proxy for the scientific soundness of a web page.<sup id=\"rdp-ebb-cite_ref-YaqubAdding15_16-0\" class=\"reference\"><a href=\"#cite_note-YaqubAdding15-16\">[16]<\/a><\/sup> Therefore, we asked participants whether they prefer websites that provide EBM information or complementary and alternative medicine (CAM) information, or whether they have no preference. The results shown in Table 6 indicate that only 6% preferred websites on CAM, 35% preferred EBM, and 37% did not assign this a particular importance. However, the preference for EBM was higher with biomedical education, English speakers, and females, and in these groups, there was a lower percentage of participants who did not know whether they prefer EBM or CAM. The association with biomedical education, language, and gender was statistically significant (P = 0.02, P < 0.001, P = 0.029, respectively, by the Pearson Chi-Square test). 
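The Pearson chi-square test of independence reported above is straightforward to reproduce. The sketch below is illustrative only: all counts are invented (the paper's actual contingency data are in Table 6), and it uses the closed-form survival function exp(-x/2), which is exact for two degrees of freedom.

```python
# Illustrative sketch only: a Pearson chi-square test of independence,
# the test the authors used for the association between educational
# background and EBM/CAM preference. All counts below are invented.
import math

# Rows: biomedical vs. non-biomedical background (hypothetical).
# Columns: prefers EBM, prefers CAM, no particular preference.
observed = [
    [60, 5, 35],
    [40, 12, 48],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Pearson statistic: sum of (observed - expected)^2 / expected,
# where expected = row total * column total / grand total.
chi2 = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / grand_total) ** 2
    / (row_totals[i] * col_totals[j] / grand_total)
    for i in range(len(observed))
    for j in range(len(observed[0]))
)
dof = (len(observed) - 1) * (len(observed[0]) - 1)  # (2-1)*(3-1) = 2

# For dof = 2 the chi-square survival function reduces to exp(-x/2).
p_value = math.exp(-chi2 / 2)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")
```

With real counts, `scipy.stats.chi2_contingency` performs the same calculation for arbitrary table sizes and degrees of freedom.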
There was no significant association between EBM preference and education level (P = 0.866, data not shown).\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Tab6_Al-Jefri_FrontInMedicine2018_5.jpg\" class=\"image wiki-link\" data-key=\"e12d69f3decde6711bc13117eacd25c4\"><img alt=\"Tab6 Al-Jefri FrontInMedicine2018 5.jpg\" src=\"https:\/\/www.limswiki.org\/images\/9\/96\/Tab6_Al-Jefri_FrontInMedicine2018_5.jpg\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Table 6.<\/b> Preference for EBM- or CAM-based information<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<h2><span class=\"mw-headline\" id=\"Discussion\">Discussion<\/span><\/h2>\n<p>We propose dimensions and criteria of HIQ based on the importance assigned to them by internet users. We used an empirical approach, similar to what was done 20 years ago by Wang and Strong<sup id=\"rdp-ebb-cite_ref-WangBeyond15_10-4\" class=\"reference\"><a href=\"#cite_note-WangBeyond15-10\">[10]<\/a><\/sup> and Lee<sup id=\"rdp-ebb-cite_ref-LeeAIMQ02_11-2\" class=\"reference\"><a href=\"#cite_note-LeeAIMQ02-11\">[11]<\/a><\/sup> for IQ in the context of industries and organizations, with two major differences: our focus on the health-related content of the information provided by websites and on its trustworthiness, and our focus on online information. 
The results were not only used to rank the different criteria in order of perceived importance but also, using cluster analysis, to help classify them into dimensions.\n<\/p><p>Although the terminology is always ambiguous, we suggest that criteria of HIQ could be subsumed under dimensions as described in Table 5, bearing in mind that there may be areas of overlap. For instance, we assigned the criterion \u201chyperbole,\u201d which in the context of HIQ means presenting a potential treatment as a \u201cmiracle drug,\u201d to the dimension of textual accuracy, but on theoretical grounds it could also fit the ethical dimension of trustworthiness.\n<\/p><p>Of the criteria in the dimension \u201caccountability,\u201d which includes the four JAMA criteria (authorship, currency, sources, financial disclosure), \u201csources\u201d is the one that ranks highest, but still only 11th. Authorship (19th) ranked lower than authority (17th) and affiliation (12th), indicating that a link to an institution or a medical degree, or the type of website (for instance, whether a government website or a commercial one), is considered more important than the indication of the name of the author. The generally low importance given to the JAMA criteria was also observed in a survey by Eysenbach and Kohler<sup id=\"rdp-ebb-cite_ref-EysenbachHowDo02_17-0\" class=\"reference\"><a href=\"#cite_note-EysenbachHowDo02-17\">[17]<\/a><\/sup>, as they reported that \u201c[c]ontrary to the statements made in the focus groups, in practice we observed that none of the participants actively searched for information on who stood behind the sites or how the information had been compiled.\u201d<sup id=\"rdp-ebb-cite_ref-EysenbachHowDo02_17-1\" class=\"reference\"><a href=\"#cite_note-EysenbachHowDo02-17\">[17]<\/a><\/sup>\n<\/p><p>The ethics dimension of trustworthiness includes aspects that are particularly important in medicine (conflict of interest, data privacy, financial disclosure). 
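The cluster-analysis step mentioned above, which groups criteria into dimensions by similarity of their rankings, can be sketched with a simple single-linkage agglomerative procedure. The criterion names, score vectors, and two-cluster stopping point below are invented for illustration; they are not the paper's data or code.

```python
# Illustrative sketch only: single-linkage agglomerative clustering of
# HIQ criteria by similarity of (invented) importance scores, mirroring
# the role of the cluster analysis used to group criteria into dimensions.

scores = {
    "spelling":   [4.1, 4.3, 3.9],
    "grammar":    [4.0, 4.2, 3.8],
    "sources":    [3.2, 3.0, 3.1],
    "authorship": [3.1, 2.9, 3.0],
}

def dist(a, b):
    # Euclidean distance between two score vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# Start with each criterion in its own cluster; repeatedly merge the two
# closest clusters (single linkage) until two clusters remain.
clusters = [[name] for name in scores]
while len(clusters) > 2:
    i, j = min(
        ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
        key=lambda ab: min(
            dist(scores[x], scores[y])
            for x in clusters[ab[0]] for y in clusters[ab[1]]
        ),
    )
    clusters[i] += clusters.pop(j)  # j > i, so index i stays valid

print(clusters)
```

With these invented scores, the textually similar criteria (spelling, grammar) group together, as do the accountability-like ones (sources, authorship), which is the kind of grouping the dimensions in Table 5 formalize.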
Of note, one criterion, \u201ccomplementarity\u201d (whether information should support, not replace, the doctor-patient relationship), is one of the HONcode principles<sup id=\"rdp-ebb-cite_ref-BoyerTheHealth98_3-2\" class=\"reference\"><a href=\"#cite_note-BoyerTheHealth98-3\">[3]<\/a><\/sup> and is specific to health.\n<\/p><p>The criteria that we define as \u201ctextual accuracy\u201d are also ranked high; these include spelling and grammar, as well as the health-specific criterion \u201chyperbole,\u201d which is very common in health news stories and web pages when authors portray a treatment with an overly positive tone or \u201cspin.\u201d<sup id=\"rdp-ebb-cite_ref-Walsh-ChildersOneStep18_18-0\" class=\"reference\"><a href=\"#cite_note-Walsh-ChildersOneStep18-18\">[18]<\/a><\/sup>\n<\/p><p>The \u201ccompleteness\u201d dimension defines contextual information, which is necessary for the information to fulfill its task.<sup id=\"rdp-ebb-cite_ref-Sebastian-ColemanMeasur13_19-0\" class=\"reference\"><a href=\"#cite_note-Sebastian-ColemanMeasur13-19\">[19]<\/a><\/sup> It includes both basic IQ criteria and some that are specific to health, and we could define it as \u201cscientific completeness\u201d: the information that users look for and rank high in our questionnaire. 
This is in agreement with a recent study performed in the United States, in which completeness of the information\u2014which the authors defined as \u201cthe proportion of a priori-defined elements covered by the website; breadth of information\u201d\u2014also ranked high when participants were asked to rank health websites on a set of IQ criteria.<sup id=\"rdp-ebb-cite_ref-TaoDefining17_12-1\" class=\"reference\"><a href=\"#cite_note-TaoDefining17-12\">[12]<\/a><\/sup> The importance given by participants to criteria related to \u201ccompleteness\/purposeness,\u201d as indicated by the high ranking of information on symptoms, side effects, and treatments in Table 4, reflects the main use of the internet when searching for health information. In fact, a survey of 622 patients in the MetroNet practices in the Detroit area reported that of the topics most often searched online, specific disease conditions and treatments were at the top.<sup id=\"rdp-ebb-cite_ref-SchwartzFamily06_20-0\" class=\"reference\"><a href=\"#cite_note-SchwartzFamily06-20\">[20]<\/a><\/sup> To \u201cfind out about treatments\u201d was also the top purpose of health-related internet use in a survey of patients of a general practice surgery in semi-rural England.<sup id=\"rdp-ebb-cite_ref-RoseDoctors02_21-0\" class=\"reference\"><a href=\"#cite_note-RoseDoctors02-21\">[21]<\/a><\/sup>\n<\/p><p>Of the representational criteria, understandability ranked rather high. On the other hand, representational criteria specific to webpages (ranking by the search engine, presence of multimedia, conciseness) are deemed the least important.\n<\/p><p>Another aspect highlighted by the present study is that the ranking of criteria of HIQ is not a one-size-fits-all situation, differing depending on education, gender, and linguistic background. 
This is not a novel concept; Wang and Strong already suggested that the classification of IQ criteria into dimensions differs for academics and practitioners, in a way an extension of the concept of data being \u201cfit-for-use.\u201d<sup id=\"rdp-ebb-cite_ref-WangBeyond15_10-5\" class=\"reference\"><a href=\"#cite_note-WangBeyond15-10\">[10]<\/a><\/sup> Floridi also noted that IQ should consider purposeness, and that the value of IQ criteria may differ among users.<sup id=\"rdp-ebb-cite_ref-FloridiInfo13_22-0\" class=\"reference\"><a href=\"#cite_note-FloridiInfo13-22\">[22]<\/a><\/sup>\n<\/p><p>In this sense, the difference in the ranking of HIQ criteria between subjects with a biomedical degree or biomedical students (pharmacy, biomedical science, medicine) and those in other education areas could be extrapolated to the difference between health professionals and lay persons. Those with a biomedical background give more importance to criteria such as correct spelling and grammar than those with a non-biomedical background. Not surprisingly, \u201csources\u201d are ranked higher by those with a biomedical background, as identifying and citing references is key to this field. On the other hand, those with a non-biomedical background rank \u201cunderstandability\u201d higher. Interestingly, we did not find any significant difference in the ranking of the \u201cethical\u201d criteria by subjects with a biomedical background.\n<\/p><p>Native English speakers also assign more importance to textual accuracy (spelling, grammar), as well as to the ethical criterion of \u201cobjectivity.\u201d Attention to \u201chyperbole\u201d is also ranked higher by this group, and we discussed above how this criterion also has an \u201cethical\u201d value. 
A very similar pattern was observed in females compared with males, with the added higher importance assigned to \u201cpayment information,\u201d suggesting a stronger ethical focus in females.\n<\/p><p>The differences in ranking identified in the subgroup analysis hint at a limitation of any classification into dimensions based on a questionnaire, as the results will vary with the population investigated, and any subsequent analysis (including the cluster analysis used here) will vary accordingly. This suggests that when IQ is defined, the target user should also be well specified.\n<\/p><p>The other aspect of this study concerned which criteria are regarded as important and which are not. The fact that ranking by the search engine is not seen as an indicator of trustworthiness of a website is very interesting, but this does not necessarily mean that users will go through several search engine results pages rather than limiting themselves to the first 10 to 20 results. The significance of this response should be assessed experimentally, for instance using eye-tracking software to validate the importance of the different criteria.\n<\/p><p>The low ranking of \u201cethical\u201d trustworthiness criteria is worrying, as it might indicate that users are somewhat vulnerable to information that has a conflict of interest, such as that from commercial sources promoting potentially ineffective treatments, or to other types of health misinformation. This is probably something that educators, particularly those in the biomedical field, should consider addressing, and males seem to be more \u201cat risk,\u201d as they rank \u201cethical\u201d criteria lower than females do. 
This difference is supported by a recent study reporting that males are more likely to disseminate fake health information than females.<sup id=\"rdp-ebb-cite_ref-YuelinFake17_23-0\" class=\"reference\"><a href=\"#cite_note-YuelinFake17-23\">[23]<\/a><\/sup> It should be noted that this is at variance with results from the MetroNet study cited above, where patients ranked \u201cEndorsement by a government agency or professional organization\u201d and \u201creliable source\/author\u201d as the most important factors influencing their trust (\u201cperceived accuracy\u201d) of healthcare websites.<sup id=\"rdp-ebb-cite_ref-SchwartzFamily06_20-1\" class=\"reference\"><a href=\"#cite_note-SchwartzFamily06-20\">[20]<\/a><\/sup> Likewise, \u201creputable\/trustworthy organization\u201d was the most important factor in trusting health information in a 2002\u20132003 survey of 55 participants in United Kingdom health support groups, although this study was not restricted to online information but included information provided by healthcare professionals, brochures, books, TV\/radio, and others.<sup id=\"rdp-ebb-cite_ref-ChildsDeveloping04_24-0\" class=\"reference\"><a href=\"#cite_note-ChildsDeveloping04-24\">[24]<\/a><\/sup> It is difficult to say whether these differences are due to the different time periods in which those studies were carried out or to differences in the populations; patients exposed to medical research and support groups may have higher health literacy than our sample.\n<\/p><p>We suggest that our proposed dimensions of HIQ are an attempt to build a more comprehensive theoretical framework than the one that can be derived from existing studies. 
For instance, the recent paper by Tao <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-TaoDefining17_12-2\" class=\"reference\"><a href=\"#cite_note-TaoDefining17-12\">[12]<\/a><\/sup> proposing a definition of HIQ dimensions does not take into account some of the criteria that we derived from the HONcode and DISCERN tools, particularly those related to what we call \u201cethical criteria.\u201d<sup id=\"rdp-ebb-cite_ref-TaoDefining17_12-3\" class=\"reference\"><a href=\"#cite_note-TaoDefining17-12\">[12]<\/a><\/sup>\n<\/p><p>In conclusion, this study describes a possible organization of HIQ criteria into dimensions, one that identifies through this ranking approach dimensions not previously recognized as such in IQ, such as the ethical dimension. Contrary to our expectations, given that this is a hot topic in the news, we observed that ethical criteria, while highly regarded in the academic and medical environment, are not valued highly by the public.\n<\/p><p>Clearly, the main limitation of this study, which could affect its external validity, is that the focus mainly on university-level participants may lead to an underestimation of the importance of criteria relevant to the average user. It would be important to extend this study to a more general sample of the public, and particularly patients and carers, to see whether there is a different perception of HIQ and whether it goes in the same direction as the comparison between non-biomedical and biomedical educational backgrounds reported here. Another important point to consider when extrapolating the conclusions of this study is that our survey asked generically what users would look for to trust a website when searching for a health topic. 
It is possible that the factors that account for trust in a webpage with health-related information differ depending on the topic searched, and this may be particularly important for highly controversial topics, such as abortion, vaccines, or genetic modifications.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Supplementary_material\">Supplementary material<\/span><\/h2>\n<p>The supplementary material for this article can be found online at: <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/fmed.2018.00260\/full#supplementary-material\" data-key=\"804fcb050e7095adce74a913af17119d\">https:\/\/www.frontiersin.org\/articles\/10.3389\/fmed.2018.00260\/full#supplementary-material<\/a>\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Acknowledgements\">Acknowledgements<\/span><\/h2>\n<p>MA-J was supported by a Ph.D. studentship from the University of Brighton. We thank Audrey Marshall for critical review of the questionnaire.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Author_contributions\">Author contributions<\/span><\/h3>\n<p>All authors designed the research, analyzed the data, and wrote the paper. 
MA-J designed research, performed research, analyzed the data, wrote the paper.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Conflict_of_interest_statement\">Conflict of interest statement<\/span><\/h3>\n<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"References\">References<\/span><\/h2>\n<div class=\"reflist references-column-width\" style=\"-moz-column-width: 30em; -webkit-column-width: 30em; column-width: 30em; list-style-type: decimal;\">\n<ol class=\"references\">\n<li id=\"cite_note-SilbergAssessing97-1\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-SilbergAssessing97_1-0\">1.0<\/a><\/sup> <sup><a href=\"#cite_ref-SilbergAssessing97_1-1\">1.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Silberg, W.M.; Lundberg, G.D.; Musacchio, R.A. (1997). \"Assessing, controlling, and assuring the quality of medical information on the Internet: Caveant lector et viewor--Let the reader and viewer beware\". <i>JAMA<\/i> <b>277<\/b> (15): 1244\u20135. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1001%2Fjama.1997.03540390074039\" data-key=\"6ce61e6afef8de34d647f394df848891\">10.1001\/jama.1997.03540390074039<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/9103351\" data-key=\"dc52083ce6c769d33401885a0d7c60bb\">9103351<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Assessing%2C+controlling%2C+and+assuring+the+quality+of+medical+information+on+the+Internet%3A+Caveant+lector+et+viewor--Let+the+reader+and+viewer+beware&rft.jtitle=JAMA&rft.aulast=Silberg%2C+W.M.%3B+Lundberg%2C+G.D.%3B+Musacchio%2C+R.A.&rft.au=Silberg%2C+W.M.%3B+Lundberg%2C+G.D.%3B+Musacchio%2C+R.A.&rft.date=1997&rft.volume=277&rft.issue=15&rft.pages=1244%E2%80%935&rft_id=info:doi\/10.1001%2Fjama.1997.03540390074039&rft_id=info:pmid\/9103351&rfr_id=info:sid\/en.wikipedia.org:Journal:What_Is_health_information_quality%3F_Ethical_dimension_and_perception_by_users\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-CharnockDISCERN99-2\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-CharnockDISCERN99_2-0\">2.0<\/a><\/sup> <sup><a href=\"#cite_ref-CharnockDISCERN99_2-1\">2.1<\/a><\/sup> <sup><a href=\"#cite_ref-CharnockDISCERN99_2-2\">2.2<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Charnock, D.; Shepperd, S.; Needham, G.; Gann, R. (1999). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC1756830\" data-key=\"8c5d9e7f3a1f163702b2b5a82166cfbb\">\"DISCERN: An instrument for judging the quality of written consumer health information on treatment choices\"<\/a>. <i>Journal of Epidemiology and Community Health<\/i> <b>53<\/b> (2): 105\u201311. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1136%2Fjech.53.2.105\" data-key=\"11417ecbf73a786f591db9e23550117c\">10.1136\/jech.53.2.105<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC1756830\/\" data-key=\"4961540150e94fc1da5ea438e03fe100\">PMC1756830<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/10396471\" data-key=\"3cc8ee165946b0ad5b33b36c31352ce9\">10396471<\/a><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC1756830\" data-key=\"8c5d9e7f3a1f163702b2b5a82166cfbb\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC1756830<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=DISCERN%3A+An+instrument+for+judging+the+quality+of+written+consumer+health+information+on+treatment+choices&rft.jtitle=Journal+of+Epidemiology+and+Community+Health&rft.aulast=Charnock%2C+D.%3B+Shepperd%2C+S.%3B+Needham%2C+G.%3B+Gann%2C+R.&rft.au=Charnock%2C+D.%3B+Shepperd%2C+S.%3B+Needham%2C+G.%3B+Gann%2C+R.&rft.date=1999&rft.volume=53&rft.issue=2&rft.pages=105%E2%80%9311&rft_id=info:doi\/10.1136%2Fjech.53.2.105&rft_id=info:pmc\/PMC1756830&rft_id=info:pmid\/10396471&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC1756830&rfr_id=info:sid\/en.wikipedia.org:Journal:What_Is_health_information_quality%3F_Ethical_dimension_and_perception_by_users\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BoyerTheHealth98-3\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-BoyerTheHealth98_3-0\">3.0<\/a><\/sup> <sup><a href=\"#cite_ref-BoyerTheHealth98_3-1\">3.1<\/a><\/sup> <sup><a href=\"#cite_ref-BoyerTheHealth98_3-2\">3.2<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Boyer, C.; Selby, M.; Scherrer, J.R.; Appel, R.D. (1998). \"The Health On the Net Code of Conduct for medical and health Websites\". <i>Computers in Biology and Medicine<\/i> <b>28<\/b> (5): 603-10. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1016%2FS0010-4825%2898%2900037-7\" data-key=\"a8e63eb6adba21f03ba81a10a443599d\">10.1016\/S0010-4825(98)00037-7<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/9861515\" data-key=\"88dae91362611ad70d015970e0361ede\">9861515<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=The+Health+On+the+Net+Code+of+Conduct+for+medical+and+health+Websites&rft.jtitle=Computers+in+Biology+and+Medicine&rft.aulast=Boyer%2C+C.%3B+Selby%2C+M.%3B+Scherrer%2C+J.R.%3B+Appel%2C+R.D.&rft.au=Boyer%2C+C.%3B+Selby%2C+M.%3B+Scherrer%2C+J.R.%3B+Appel%2C+R.D.&rft.date=1998&rft.volume=28&rft.issue=5&rft.pages=603-10&rft_id=info:doi\/10.1016%2FS0010-4825%2898%2900037-7&rft_id=info:pmid\/9861515&rfr_id=info:sid\/en.wikipedia.org:Journal:What_Is_health_information_quality%3F_Ethical_dimension_and_perception_by_users\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HowickThePhil11-4\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-HowickThePhil11_4-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Howick, J. (2011). <i>The Philosophy of Evidence-Based Medicine<\/i>. John Wiley & Sons. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" data-key=\"f64947ba21e884434bd70e8d9e60bae6\">ISBN<\/a> 9781405196673.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=The+Philosophy+of+Evidence-Based+Medicine&rft.aulast=Howick%2C+J.&rft.au=Howick%2C+J.&rft.date=2011&rft.pub=John+Wiley+%26+Sons&rft.isbn=9781405196673&rfr_id=info:sid\/en.wikipedia.org:Journal:What_Is_health_information_quality%3F_Ethical_dimension_and_perception_by_users\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SmithHIV07-5\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-SmithHIV07_5-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Smith, T.C.; Novella, S.P. (2007). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC1949841\" data-key=\"29e1eea3ecf7b1009a3f349793f6c672\">\"HIV denial in the Internet era\"<\/a>. <i>PLoS Medicine<\/i> <b>4<\/b> (8): e256. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1371%2Fjournal.pmed.0040256\" data-key=\"f5168016e6a88f0981c68dffc37d4694\">10.1371\/journal.pmed.0040256<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC1949841\/\" data-key=\"5cb997b4995a2b72c2df6e3338565c0e\">PMC1949841<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/17713982\" data-key=\"305ed5788503ced80934b97f2556ff6d\">17713982<\/a><span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC1949841\" data-key=\"29e1eea3ecf7b1009a3f349793f6c672\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC1949841<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=HIV+denial+in+the+Internet+era&rft.jtitle=PLoS+Medicine&rft.aulast=Smith%2C+T.C.%3B+Novella%2C+S.P.&rft.au=Smith%2C+T.C.%3B+Novella%2C+S.P.&rft.date=2007&rft.volume=4&rft.issue=8&rft.pages=e256&rft_id=info:doi\/10.1371%2Fjournal.pmed.0040256&rft_id=info:pmc\/PMC1949841&rft_id=info:pmid\/17713982&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC1949841&rfr_id=info:sid\/en.wikipedia.org:Journal:What_Is_health_information_quality%3F_Ethical_dimension_and_perception_by_users\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-IllariInfo14-6\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-IllariInfo14_6-0\">6.0<\/a><\/sup> <sup><a href=\"#cite_ref-IllariInfo14_6-1\">6.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation book\">Illari, P.; Floridi, L. (2014). \"Information Quality, Data and Philosophy\". <i>The Philosophy of Information Quality<\/i>. <b>358<\/b>. Springer. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1007%2F978-3-319-07121-3_2\" data-key=\"386d09160a512244c08d5ae802183c55\">10.1007\/978-3-319-07121-3_2<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" data-key=\"f64947ba21e884434bd70e8d9e60bae6\">ISBN<\/a> 9783319071213.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Information+Quality%2C+Data+and+Philosophy&rft.atitle=The+Philosophy+of+Information+Quality&rft.aulast=Illari%2C+P.%3B+Floridi%2C+L.&rft.au=Illari%2C+P.%3B+Floridi%2C+L.&rft.date=2014&rft.volume=358&rft.pub=Springer&rft_id=info:doi\/10.1007%2F978-3-319-07121-3_2&rft.isbn=9783319071213&rfr_id=info:sid\/en.wikipedia.org:Journal:What_Is_health_information_quality%3F_Ethical_dimension_and_perception_by_users\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-KleinUser16-7\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-KleinUser16_7-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Klein, B.D. (2001). \"User Perceptions of Data Quality: Internet and Traditional Text Sources\". <i>Journal of Computer Information Systems<\/i> <b>41<\/b> (4): 9\u201315. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1080%2F08874417.2001.11647016\" data-key=\"3cc9d7bebc87b1e3b69e76b9fd785027\">10.1080\/08874417.2001.11647016<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=User+Perceptions+of+Data+Quality%3A+Internet+and+Traditional+Text+Sources&rft.jtitle=Journal+of+Computer+Information+Systems&rft.aulast=Klein%2C+B.D.&rft.au=Klein%2C+B.D.&rft.date=2001&rft.volume=41&rft.issue=4&rft.pages=9%E2%80%9315&rft_id=info:doi\/10.1080%2F08874417.2001.11647016&rfr_id=info:sid\/en.wikipedia.org:Journal:What_Is_health_information_quality%3F_Ethical_dimension_and_perception_by_users\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-KnightDevelop05-8\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-KnightDevelop05_8-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Knight, S.-A.; Burn, J. (2005). \"Developing a Framework for Assessing Information Quality on the World Wide Web\". <i>Informing Science: The International Journal of an Emerging Transdiscipline<\/i> <b>8<\/b>: 159\u201372. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.28945%2F493\" data-key=\"8283d0b46619346ddd169561c068a08c\">10.28945\/493<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Developing+a+Framework+for+Assessing+Information+Quality+on+the+World+Wide+Web&rft.jtitle=Informing+Science%3A+The+International+Journal+of+an+Emerging+Transdiscipline&rft.aulast=Knight%2C+S.-A.%3B+Burn%2C+J.&rft.au=Knight%2C+S.-A.%3B+Burn%2C+J.&rft.date=2005&rft.volume=8&rft.pages=159%E2%80%9372&rft_id=info:doi\/10.28945%2F493&rfr_id=info:sid\/en.wikipedia.org:Journal:What_Is_health_information_quality%3F_Ethical_dimension_and_perception_by_users\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-WandAnchoring96-9\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-WandAnchoring96_9-0\">9.0<\/a><\/sup> <sup><a href=\"#cite_ref-WandAnchoring96_9-1\">9.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Wand, Y.; Wang, R.Y. (1996). \"Anchoring data quality dimensions in ontological foundations\". <i>Communications of the ACM<\/i> <b>39<\/b> (11): 86-95. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1145%2F240455.240479\" data-key=\"7638bc19cdd85edc474d8ff7b42289ac\">10.1145\/240455.240479<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Anchoring+data+quality+dimensions+in+ontological+foundations&rft.jtitle=Communications+of+the+ACM&rft.aulast=Wand%2C+Y.%3B+Wang%2C+R.Y.&rft.au=Wand%2C+Y.%3B+Wang%2C+R.Y.&rft.date=1996&rft.volume=39&rft.issue=11&rft.pages=86-95&rft_id=info:doi\/10.1145%2F240455.240479&rfr_id=info:sid\/en.wikipedia.org:Journal:What_Is_health_information_quality%3F_Ethical_dimension_and_perception_by_users\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-WangBeyond15-10\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-WangBeyond15_10-0\">10.0<\/a><\/sup> <sup><a href=\"#cite_ref-WangBeyond15_10-1\">10.1<\/a><\/sup> <sup><a href=\"#cite_ref-WangBeyond15_10-2\">10.2<\/a><\/sup> <sup><a href=\"#cite_ref-WangBeyond15_10-3\">10.3<\/a><\/sup> <sup><a href=\"#cite_ref-WangBeyond15_10-4\">10.4<\/a><\/sup> <sup><a href=\"#cite_ref-WangBeyond15_10-5\">10.5<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Wang, R.Y.; Strong, D.M. (2015). \"Beyond Accuracy: What Data Quality Means to Data Consumers\". <i>Journal of Management Information Systems<\/i> <b>12<\/b> (4): 5\u201333. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1080%2F07421222.1996.11518099\" data-key=\"fcc8d72504b3f7143642c81d8b926c1a\">10.1080\/07421222.1996.11518099<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Beyond+Accuracy%3A+What+Data+Quality+Means+to+Data+Consumers&rft.jtitle=Journal+of+Management+Information+Systems&rft.aulast=Wang%2C+R.Y.%3B+Strong%2C+D.M.&rft.au=Wang%2C+R.Y.%3B+Strong%2C+D.M.&rft.date=2015&rft.volume=12&rft.issue=4&rft.pages=5%E2%80%9333&rft_id=info:doi\/10.1080%2F07421222.1996.11518099&rfr_id=info:sid\/en.wikipedia.org:Journal:What_Is_health_information_quality%3F_Ethical_dimension_and_perception_by_users\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-LeeAIMQ02-11\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-LeeAIMQ02_11-0\">11.0<\/a><\/sup> <sup><a href=\"#cite_ref-LeeAIMQ02_11-1\">11.1<\/a><\/sup> <sup><a href=\"#cite_ref-LeeAIMQ02_11-2\">11.2<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Lee, Y.W.; Strong, D.M.; Kahn, B.K. et al. (2002). \"AIMQ: A methodology for information quality assessment\". <i>Information & Management<\/i> <b>40<\/b> (2): 133-146. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1016%2FS0378-7206%2802%2900043-5\" data-key=\"6c61c0e5442945c0829cf56b59f2cbe2\">10.1016\/S0378-7206(02)00043-5<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=AIMQ%3A+A+methodology+for+information+quality+assessment&rft.jtitle=Information+%26+Management&rft.aulast=Lee%2C+Y.W.%3B+Strong%2C+D.M.%3B+Kahn%2C+B.K.+et+al.&rft.au=Lee%2C+Y.W.%3B+Strong%2C+D.M.%3B+Kahn%2C+B.K.+et+al.&rft.date=2002&rft.volume=40&rft.issue=2&rft.pages=133-146&rft_id=info:doi\/10.1016%2FS0378-7206%2802%2900043-5&rfr_id=info:sid\/en.wikipedia.org:Journal:What_Is_health_information_quality%3F_Ethical_dimension_and_perception_by_users\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-TaoDefining17-12\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-TaoDefining17_12-0\">12.0<\/a><\/sup> <sup><a href=\"#cite_ref-TaoDefining17_12-1\">12.1<\/a><\/sup> <sup><a href=\"#cite_ref-TaoDefining17_12-2\">12.2<\/a><\/sup> <sup><a href=\"#cite_ref-TaoDefining17_12-3\">12.3<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Tao, D.; LeRouge, C.; Smith, K.J.; De Leo, G. (2017). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5650677\" data-key=\"491c035b6306334df13d935cf3ee0aa4\">\"Defining Information Quality Into Health Websites: A Conceptual Framework of Health Website Information Quality for Educated Young Adults\"<\/a>. <i>JMIR Human Factors<\/i> <b>4<\/b> (4): e25. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.2196%2Fhumanfactors.6455\" data-key=\"cffd7f53263a7038d49b3175fc6fb05a\">10.2196\/humanfactors.6455<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC5650677\/\" data-key=\"696fe42d8c031c871e7672a07c79ab7d\">PMC5650677<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/28986336\" data-key=\"0256fa3cc0c45f2169ca0855df89e58f\">28986336<\/a><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5650677\" data-key=\"491c035b6306334df13d935cf3ee0aa4\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5650677<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Defining+Information+Quality+Into+Health+Websites%3A+A+Conceptual+Framework+of+Health+Website+Information+Quality+for+Educated+Young+Adults&rft.jtitle=JMIR+Human+Factors&rft.aulast=Tao%2C+D.%3B+LeRouge%2C+C.%3B+Smith%2C+K.J.%3B+De+Leo%2C+G.&rft.au=Tao%2C+D.%3B+LeRouge%2C+C.%3B+Smith%2C+K.J.%3B+De+Leo%2C+G.&rft.date=2017&rft.volume=4&rft.issue=4&rft.pages=e25&rft_id=info:doi\/10.2196%2Fhumanfactors.6455&rft_id=info:pmc\/PMC5650677&rft_id=info:pmid\/28986336&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC5650677&rfr_id=info:sid\/en.wikipedia.org:Journal:What_Is_health_information_quality%3F_Ethical_dimension_and_perception_by_users\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BernstamInstruments05-13\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-BernstamInstruments05_13-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Bernstam, E.V.; Shelton, D.M.; Walji, M. et al. (2005). \"Instruments to assess the quality of health information on the World Wide Web: what can our patients actually use?\". <i>International Journal of Medical Informatics<\/i> <b>74<\/b> (1): 13\u201319. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1016%2Fj.ijmedinf.2004.10.001\" data-key=\"db48e1c33e240d4a4363606a8cf3f323\">10.1016\/j.ijmedinf.2004.10.001<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/15626632\" data-key=\"e3bf2786b7b95289439d93a8dc3c6144\">15626632<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Instruments+to+assess+the+quality+of+health+information+on+the+World+Wide+Web%3A+what+can+our+patients+actually+use%3F&rft.jtitle=International+Journal+of+Medical+Informatics&rft.aulast=Bernstam%2C+E.V.%3B+Shelton%2C+D.M.%3B+Walji%2C+M.+et+al.&rft.au=Bernstam%2C+E.V.%3B+Shelton%2C+D.M.%3B+Walji%2C+M.+et+al.&rft.date=2005&rft.volume=74&rft.issue=1&rft.pages=13%E2%80%9319&rft_id=info:doi\/10.1016%2Fj.ijmedinf.2004.10.001&rft_id=info:pmid\/15626632&rfr_id=info:sid\/en.wikipedia.org:Journal:What_Is_health_information_quality%3F_Ethical_dimension_and_perception_by_users\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ZhangQuality15-14\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ZhangQuality15_14-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Zhang, Y.; Sun, Y.; Xie, B. (2015). \"Quality of health information for consumers on the web: A systematic review of indicators, criteria, tools, and evaluation results\". <i>Journal of the Association for Information Science and Technology<\/i> <b>66<\/b> (10): 2071\u201384. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1002%2Fasi.23311\" data-key=\"7f9183934436f689ffa3217dc475bf89\">10.1002\/asi.23311<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Quality+of+health+information+for+consumers+on+the+web%3A+A+systematic+review+of+indicators%2C+criteria%2C+tools%2C+and+evaluation+results&rft.jtitle=Journal+of+the+Association+for+Information+Science+and+Technology&rft.aulast=Zhang%2C+Y.%3B+Sun%2C+Y.%3B+Xie%2C+B.&rft.au=Zhang%2C+Y.%3B+Sun%2C+Y.%3B+Xie%2C+B.&rft.date=2015&rft.volume=66&rft.issue=10&rft.pages=2071%E2%80%9384&rft_id=info:doi\/10.1002%2Fasi.23311&rfr_id=info:sid\/en.wikipedia.org:Journal:What_Is_health_information_quality%3F_Ethical_dimension_and_perception_by_users\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-PittService95-15\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-PittService95_15-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Pitt, L.F.; Watson, R.T.; Kavan, C.B. (1995). \"Service Quality: A Measure of Information Systems Effectiveness\". <i>MIS Quarterly<\/i> <b>19<\/b> (2): 173\u201387. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.2307%2F249687\" data-key=\"44386afc75188e8516e868a1ba7ecd28\">10.2307\/249687<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Service+Quality%3A+A+Measure+of+Information+Systems+Effectiveness&rft.jtitle=MIS+Quarterly&rft.aulast=Pitt%2C+L.F.%3B+Watson%2C+R.T.%3B+Kavan%2C+C.B.&rft.au=Pitt%2C+L.F.%3B+Watson%2C+R.T.%3B+Kavan%2C+C.B.&rft.date=1995&rft.volume=19&rft.issue=2&rft.pages=173%E2%80%9387&rft_id=info:doi\/10.2307%2F249687&rfr_id=info:sid\/en.wikipedia.org:Journal:What_Is_health_information_quality%3F_Ethical_dimension_and_perception_by_users\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-YaqubAdding15-16\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-YaqubAdding15_16-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Yaqub, M.; Ghezzi, P. (2015). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4548082\" data-key=\"05529ded0fe36a13b2c2853280143fe0\">\"Adding Dimensions to the Analysis of the Quality of Health Information of Websites Returned by Google: Cluster Analysis Identifies Patterns of Websites According to their Classification and the Type of Intervention Described\"<\/a>. <i>Frontiers in Public Health<\/i> <b>3<\/b>: 204. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.3389%2Ffpubh.2015.00204\" data-key=\"94fbc269b07f3d20be462869fac7b3d0\">10.3389\/fpubh.2015.00204<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC4548082\/\" data-key=\"4fdca2187191f297577804de6b87542b\">PMC4548082<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/26380250\" data-key=\"015ec528e0874b88ee96556042a79ed1\">26380250<\/a><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4548082\" data-key=\"05529ded0fe36a13b2c2853280143fe0\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4548082<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Adding+Dimensions+to+the+Analysis+of+the+Quality+of+Health+Information+of+Websites+Returned+by+Google%3A+Cluster+Analysis+Identifies+Patterns+of+Websites+According+to+their+Classification+and+the+Type+of+Intervention+Described&rft.jtitle=Frontiers+in+Public+Health&rft.aulast=Yaqub%2C+M.%3B+Ghezzi%2C+P.&rft.au=Yaqub%2C+M.%3B+Ghezzi%2C+P.&rft.date=2015&rft.volume=3&rft.pages=204&rft_id=info:doi\/10.3389%2Ffpubh.2015.00204&rft_id=info:pmc\/PMC4548082&rft_id=info:pmid\/26380250&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC4548082&rfr_id=info:sid\/en.wikipedia.org:Journal:What_Is_health_information_quality%3F_Ethical_dimension_and_perception_by_users\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-EysenbachHowDo02-17\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-EysenbachHowDo02_17-0\">17.0<\/a><\/sup> <sup><a href=\"#cite_ref-EysenbachHowDo02_17-1\">17.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Eysenbach, G.; K\u00f6hler, C. (2002). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC78994\" data-key=\"4f73c1c1801762f93a3f0ee75193288d\">\"How do consumers search for and appraise health information on the world wide web? Qualitative study using focus groups, usability tests, and in-depth interviews\"<\/a>. <i>BMJ<\/i> <b>324<\/b> (7337): 573\u20137. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1136%2Fbmj.324.7337.573\" data-key=\"8d57948959ea604b0a5e36f7e2a85990\">10.1136\/bmj.324.7337.573<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC78994\/\" data-key=\"e2bc49a2eb2b3a5e5715d7ab1dfc49b3\">PMC78994<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/11884321\" data-key=\"7f170f73c7aafed5337097142743a896\">11884321<\/a><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC78994\" data-key=\"4f73c1c1801762f93a3f0ee75193288d\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC78994<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=How+do+consumers+search+for+and+appraise+health+information+on+the+world+wide+web%3F+Qualitative+study+using+focus+groups%2C+usability+tests%2C+and+in-depth+interviews&rft.jtitle=BMJ&rft.aulast=Eysenbach%2C+G.%3B+K%C3%B6hler%2C+C.&rft.au=Eysenbach%2C+G.%3B+K%C3%B6hler%2C+C.&rft.date=2002&rft.volume=324&rft.issue=7337&rft.pages=573%E2%80%937&rft_id=info:doi\/10.1136%2Fbmj.324.7337.573&rft_id=info:pmc\/PMC78994&rft_id=info:pmid\/11884321&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC78994&rfr_id=info:sid\/en.wikipedia.org:Journal:What_Is_health_information_quality%3F_Ethical_dimension_and_perception_by_users\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-Walsh-ChildersOneStep18-18\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-Walsh-ChildersOneStep18_18-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Walsh-Childers, K.; Braddock, J.; Rabaza, C.; et al. (2018). \"One Step Forward, One Step Back: Changes in News Coverage of Medical Interventions\". <i>Health Communication<\/i> <b>33<\/b> (2): 174\u201387. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1080%2F10410236.2016.1250706\" data-key=\"bd42ce4c94026c33d9be2350f0194a33\">10.1080\/10410236.2016.1250706<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/27983868\" data-key=\"775fe3a3e9eded98bfa7ac0561ac7a4f\">27983868<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=One+Step+Forward%2C+One+Step+Back%3A+Changes+in+News+Coverage+of+Medical+Interventions&rft.jtitle=Health+Communication&rft.aulast=Walsh-Childers%2C+K.%3B+Braddock%2C+J.%3B+Rabaza%2C+C.%3B+et+al.&rft.au=Walsh-Childers%2C+K.%3B+Braddock%2C+J.%3B+Rabaza%2C+C.%3B+et+al.&rft.date=2018&rft.volume=33&rft.issue=2&rft.pages=174%E2%80%9387&rft_id=info:doi\/10.1080%2F10410236.2016.1250706&rft_id=info:pmid\/27983868&rfr_id=info:sid\/en.wikipedia.org:Journal:What_Is_health_information_quality%3F_Ethical_dimension_and_perception_by_users\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-Sebastian-ColemanMeasur13-19\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-Sebastian-ColemanMeasur13_19-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Sebastian-Coleman, L. (2013). <i>Measuring Data Quality for Ongoing Improvement: A Data Quality Assessment Framework<\/i> (1st ed.). Morgan Kaufmann. pp. 376. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" data-key=\"f64947ba21e884434bd70e8d9e60bae6\">ISBN<\/a> 9780123970336.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Measuring+Data+Quality+for+Ongoing+Improvement%3A+A+Data+Quality+Assessment+Framework&rft.aulast=Sebastian-Coleman%2C+L.&rft.au=Sebastian-Coleman%2C+L.&rft.date=2013&rft.pages=pp.%26nbsp%3B376&rft.edition=1st&rft.pub=Morgan+Kaufmann&rft.isbn=9780123970336&rfr_id=info:sid\/en.wikipedia.org:Journal:What_Is_health_information_quality%3F_Ethical_dimension_and_perception_by_users\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SchwartzFamily06-20\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-SchwartzFamily06_20-0\">20.0<\/a><\/sup> <sup><a href=\"#cite_ref-SchwartzFamily06_20-1\">20.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Schwartz, K.L.; Roe, T.; Northrup. J. et al. (2006). \"Family medicine patients' use of the Internet for health information: a MetroNet study\". <i>Journal of the American Board of Family Medicine<\/i> <b>19<\/b> (1): 39\u201345. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.3122%2Fjabfm.19.1.39\" data-key=\"b28f84186e1aeb4c7b875b60942ddd33\">10.3122\/jabfm.19.1.39<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/16492004\" data-key=\"d8c071ba934047c43d508ac163259735\">16492004<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Family+medicine+patients%27+use+of+the+Internet+for+health+information%3A+a+MetroNet+study&rft.jtitle=Journal+of+the+American+Board+of+Family+Medicine&rft.aulast=Schwartz%2C+K.L.%3B+Roe%2C+T.%3B+Northrup.+J.+et+al.&rft.au=Schwartz%2C+K.L.%3B+Roe%2C+T.%3B+Northrup.+J.+et+al.&rft.date=2006&rft.volume=19&rft.issue=1&rft.pages=39%E2%80%9345&rft_id=info:doi\/10.3122%2Fjabfm.19.1.39&rft_id=info:pmid\/16492004&rfr_id=info:sid\/en.wikipedia.org:Journal:What_Is_health_information_quality%3F_Ethical_dimension_and_perception_by_users\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-RoseDoctors02-21\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-RoseDoctors02_21-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Rose, P.W.; Jenkins, L.; Fuller, A. et al. (2002). \"Doctors' and patients' use of the Internet for healthcare: a study from one general practice\". <i>Health Information and Libraries Journal<\/i> <b>19<\/b> (4): 233-5. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1046%2Fj.1471-1842.2002.00402.x\" data-key=\"8c33c98f05ff8c3f4e4a7e7e2ad451e8\">10.1046\/j.1471-1842.2002.00402.x<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/12485155\" data-key=\"4e239ece84253ea277a50f6bccd2086f\">12485155<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Doctors%27+and+patients%27+use+of+the+Internet+for+healthcare%3A+a+study+from+one+general+practice&rft.jtitle=Health+Information+and+Libraries+Journal&rft.aulast=Rose%2C+P.W.%3B+Jenkins%2C+L.%3B+Fuller%2C+A.+et+al.&rft.au=Rose%2C+P.W.%3B+Jenkins%2C+L.%3B+Fuller%2C+A.+et+al.&rft.date=2002&rft.volume=19&rft.issue=4&rft.pages=233-5&rft_id=info:doi\/10.1046%2Fj.1471-1842.2002.00402.x&rft_id=info:pmid\/12485155&rfr_id=info:sid\/en.wikipedia.org:Journal:What_Is_health_information_quality%3F_Ethical_dimension_and_perception_by_users\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-FloridiInfo13-22\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-FloridiInfo13_22-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Floridi, L. (2013). \"Information Quality\". <i>Philosophy & Technology<\/i> <b>26<\/b> (1): 1\u20136. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1007%2Fs13347-013-0101-3\" data-key=\"7401a188494915eeab1c965458345625\">10.1007\/s13347-013-0101-3<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Information+Quality&rft.jtitle=Philosophy+%26+Technology&rft.aulast=Floridi%2C+L.&rft.au=Floridi%2C+L.&rft.date=2013&rft.volume=26&rft.issue=1&rft.pages=1%E2%80%936&rft_id=info:doi\/10.1007%2Fs13347-013-0101-3&rfr_id=info:sid\/en.wikipedia.org:Journal:What_Is_health_information_quality%3F_Ethical_dimension_and_perception_by_users\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-YuelinFake17-23\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-YuelinFake17_23-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Yuelin, L.; Zhang, X.; Wang, S. (2017). \"Fake vs. real health information in social media in China\". <i>Proceedings of the Association for Information Science and Technology<\/i> <b>54<\/b> (1): 742\u201343. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1002%2Fpra2.2017.14505401139\" data-key=\"0789feed8f6cd5a4236e4a4f10e36f56\">10.1002\/pra2.2017.14505401139<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Fake+vs.+real+health+information+in+social+media+in+China&rft.jtitle=Proceedings+of+the+Association+for+Information+Science+and+Technology&rft.aulast=Yuelin%2C+L.%3B+Zhang%2C+X.%3B+Wang%2C+S.&rft.au=Yuelin%2C+L.%3B+Zhang%2C+X.%3B+Wang%2C+S.&rft.date=2017&rft.volume=54&rft.issue=1&rft.pages=742%E2%80%9343&rft_id=info:doi\/10.1002%2Fpra2.2017.14505401139&rfr_id=info:sid\/en.wikipedia.org:Journal:What_Is_health_information_quality%3F_Ethical_dimension_and_perception_by_users\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ChildsDeveloping04-24\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ChildsDeveloping04_24-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Childs, S. (2004). \"Developing health website quality assessment guidelines for the voluntary sector: Outcomes from the Judge Project\". <i>Health Information and Libraries Journal<\/i> <b>21<\/b> (Suppl. 2): 14\u201326. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1111%2Fj.1740-3324.2004.00520.x\" data-key=\"42fda7a44374641512f12ed09d824e7c\">10.1111\/j.1740-3324.2004.00520.x<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/15317572\" data-key=\"11fe9a18d71ea6ba7d39a456b75533a9\">15317572<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Developing+health+website+quality+assessment+guidelines+for+the+voluntary+sector%3A+Outcomes+from+the+Judge+Project&rft.jtitle=Health+Information+and+Libraries+Journal&rft.aulast=Childs%2C+S.&rft.au=Childs%2C+S.&rft.date=2004&rft.volume=21&rft.issue=Suppl.+2&rft.pages=14%E2%80%9326&rft_id=info:doi\/10.1111%2Fj.1740-3324.2004.00520.x&rft_id=info:pmid\/15317572&rfr_id=info:sid\/en.wikipedia.org:Journal:What_Is_health_information_quality%3F_Ethical_dimension_and_perception_by_users\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<\/ol><\/div>\n<h2><span class=\"mw-headline\" id=\"Notes\">Notes<\/span><\/h2>\n<p>This presentation is faithful to the original, with only a few minor changes to presentation.\n<\/p>\n<!-- \nNewPP limit report\nCached time: 20190401185652\nCache expiry: 86400\nDynamic content: false\nCPU time usage: 0.670 seconds\nReal time usage: 0.784 seconds\nPreprocessor visited node count: 20161\/1000000\nPreprocessor generated node count: 34447\/1000000\nPost\u2010expand include size: 153779\/2097152 bytes\nTemplate argument size: 48856\/2097152 bytes\nHighest expansion depth: 18\/40\nExpensive parser function count: 0\/100\n-->\n\n<!-- \nTransclusion expansion time report (%,ms,calls,template)\n100.00% 645.528 1 - -total\n 83.87% 541.413 1 - Template:Reflist\n 73.30% 473.148 24 - Template:Citation\/core\n 68.86% 444.499 21 - Template:Cite_journal\n 9.63% 62.148 1 - Template:Infobox_journal_article\n 9.39% 60.591 42 - Template:Citation\/identifier\n 9.23% 59.594 
1 - Template:Infobox\n 8.69% 56.109 3 - Template:Cite_book\n 5.55% 35.824 80 - Template:Infobox\/row\n 3.72% 24.035 25 - Template:Citation\/make_link\n-->\n\n<!-- Saved in parser cache with key limswiki:pcache:idhash:10972-0!*!0!!en!5!* and timestamp 20190401185651 and revision id 35280\n -->\n<\/div><div class=\"printfooter\">Source: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:What_Is_health_information_quality%3F_Ethical_dimension_and_perception_by_users\">https:\/\/www.limswiki.org\/index.php\/Journal:What_Is_health_information_quality%3F_Ethical_dimension_and_perception_by_users<\/a><\/div>\n\t\t\t\t\t\t\t\t\t\t<!-- end content -->\n\t\t\t\t\t\t\t\t\t\t<div class=\"visualClear\"><\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t\t<!-- end of the left (by default at least) column -->\n\t\t<div class=\"visualClear\"><\/div>\n\t\t\t\t\t\n\t\t<\/div>\n\t\t\n\n<\/body>","86da8ed36fc493b6a573df8d0f7095ac_images":["https:\/\/www.limswiki.org\/images\/a\/a5\/Tab1_Al-Jefri_FrontInMedicine2018_5.jpg","https:\/\/www.limswiki.org\/images\/4\/44\/Tab2_Al-Jefri_FrontInMedicine2018_5.jpg","https:\/\/www.limswiki.org\/images\/2\/2e\/Tab3_Al-Jefri_FrontInMedicine2018_5.jpg","https:\/\/www.limswiki.org\/images\/5\/5e\/Fig1_Al-Jefri_FrontInMedicine2018_5.jpg","https:\/\/www.limswiki.org\/images\/2\/29\/Tab4_Al-Jefri_FrontInMedicine2018_5.jpg","https:\/\/www.limswiki.org\/images\/b\/b7\/Fig2_Al-Jefri_FrontInMedicine2018_5.jpg","https:\/\/www.limswiki.org\/images\/1\/1f\/Tab5_Al-Jefri_FrontInMedicine2018_5.jpg","https:\/\/www.limswiki.org\/images\/9\/96\/Tab6_Al-Jefri_FrontInMedicine2018_5.jpg"],"86da8ed36fc493b6a573df8d0f7095ac_timestamp":1554145011,"47e85bcf8a99fb2753262f8d1499e7f0_type":"article","47e85bcf8a99fb2753262f8d1499e7f0_title":"Transferring exome sequencing data from clinical laboratories to healthcare providers: Lessons learned at a pediatric hospital (Swaminathan et al. 
2018)","47e85bcf8a99fb2753262f8d1499e7f0_url":"https:\/\/www.limswiki.org\/index.php\/Journal:Transferring_exome_sequencing_data_from_clinical_laboratories_to_healthcare_providers:_Lessons_learned_at_a_pediatric_hospital","47e85bcf8a99fb2753262f8d1499e7f0_plaintext":"\n\n\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\n\t\t\t\tJournal:Transferring exome sequencing data from clinical laboratories to healthcare providers: Lessons learned at a pediatric hospital\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\tFrom LIMSWiki\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\tJump to: navigation, search\n\n\t\t\t\t\t\n\t\t\t\t\tFull article title\n \nTransferring exome sequencing data from clinical laboratories to healthcare providers: Lessons learned at a pediatric hospitalJournal\n \nFrontiers in GeneticsAuthor(s)\n \nSwaminathan, Rajeswari; Huang, Yungui; Miller, Katherine; Pastore, Matthew;\r\nHashimoto, Sayaka; Jacobson, Theodora; Mouhlas, Danielle; Lin, SimonAuthor affiliation(s)\n \nNationwide Children's HospitalPrimary contact\n \nEmail: simon dot lin at nationwidechildrens dot orgEditors\n \nPatrinos, George P.Year published\n \n2018Volume and issue\n \n9Page(s)\n \n54DOI\n \n10.3389\/fgene.2018.00054ISSN\n \n1664-8021Distribution license\n \nCreative Commons Attribution 4.0 InternationalWebsite\n \nhttps:\/\/www.frontiersin.org\/articles\/10.3389\/fgene.2018.00054\/fullDownload\n \nhttps:\/\/www.frontiersin.org\/articles\/10.3389\/fgene.2018.00054\/pdf (PDF)\n\nContents\n\n1 Abstract \n2 Introduction \n3 Materials and methods \n4 Results \n5 Discussion \n6 Abbreviations \n7 Declarations \n\n7.1 Acknowledgements \n7.2 Author contributions \n7.3 Funding \n7.4 Data \n7.5 Conflict of interest statement \n\n\n8 References \n9 Notes \n\n\n\nAbstract \nThe adoption rate of genome sequencing for clinical diagnostics has been steadily increasing, leading to the possibility of improvement in diagnostic yields. 
Although laboratories generate a summary clinical report, sharing raw genomic data with healthcare providers is equally important, both for secondary research studies as well as for a deeper analysis of the data itself, as seen by the efforts from organizations such as the American College of Medical Genetics and Genomics, as well as the Global Alliance for Genomics and Health. Here, we aim to describe the existing protocol of genomic data sharing between a certified clinical laboratory and a healthcare provider and highlight some of the lessons learned. This study tracked and subsequently evaluated the data transfer workflow for 19 patients, all of whom consented to be part of this research study and visited the genetics clinic at a tertiary pediatric hospital between April 2016 and December 2016. Two of the most noticeable elements observed through this study are the manual validation steps and the discrepancies in patient identifiers used by a clinical lab vs. healthcare provider. Both of these add complexity to the transfer process as well as make it more susceptible to errors. The results from this study highlight some of the critical changes that need to be made in order to improve genomic data sharing workflows between healthcare providers and clinical sequencing laboratories.\nKeywords: genomic data sharing, genomic data transfer, whole exome sequencing, clinical genomics, interoperability, laboratory workflows\n\nIntroduction \nThe rate of genome sequencing is rising sharply, leading to the generation of substantial volumes of data. Despite the surge in data generation, utilizing the wealth of knowledge embedded in that data for the improvement of clinical outcomes is still lagging behind.[1] Additional research is still required in order to better associate genes\/variants with diseases. Currently, clinical laboratories return a summary report back to the ordering physician. 
However, depending on the complexity of the disease\u2014as well as the availability of information within knowledge bases\u2014not every report ends up with a diagnosis. In many cases, when a sequencing test is unable to detect the underlying genetic cause, clinicians may choose to obtain the raw sequencing data (available as FASTQ, VCF, or BAM files) and perform a more detailed research study\/analysis on it, in hopes of untangling some of the complex details associated with the case. However, the underlying decision to share data ultimately rests in the hands of the patient\/participant. Sharing sequencing data directly with the patient themselves can also be beneficial, especially when a researcher does not have adequate resources to return any clinically actionable information back to the patient.[2] Sharing data directly with individuals makes them feel empowered and gives them better control over the further flow of their confidential information.[3] There are currently several initiatives, such as GenomeConnect, My Research Legacy by the American Heart Association, etc., that are involved in sharing biomedical information for research and health purposes.[4] Although there are several challenges associated with patient-controlled sharing of genomic data, they are not within the scope of the current study.\nAt present, clinical laboratories either load the data onto hard drives or Universal Serial Bus (USB) drives and ship them to the providers or directly transfer data over a secure network. There is currently no standard protocol for transferring sequencing data from laboratories to healthcare providers. 
Through this study, we aim to describe the current state of the genomic data transfer process\u2014specifically for data obtained from whole-exome sequencing (WES) studies\u2014between sequencing laboratories and healthcare providers and highlight some of the key lessons learned.\n\nMaterials and methods \nDuring the observation period of this study from April 2016 to December 2016, samples from 122 patients admitted to a tertiary pediatric hospital for whom WES testing was ordered were sent to a genetic laboratory accredited by the College of American Pathologists (CAP) and certified by the Clinical Laboratory Improvement Amendments (CLIA). Since genomic data is considered private and confidential, explicit consent had to be obtained from the patients in order to be able to use their data for research purposes. Nineteen of the 122 patients provided consent to have their WES data transferred from the laboratory to the researchers associated with the provider institution. There are many reasons for not being able to obtain patient consent, ranging from a complete lack of interest in research to concerns about facing discriminatory treatment in the event of being diagnosed with a high-risk disease mutation. The workflow, as shown in Figure 1 below, describes the steps involved from consenting the patient to receiving data back from the laboratory. For all 19 patients, the consent for WES as well as for raw data release were obtained on the same day by the same provider. Turnaround time for WES report release is approximately 12 weeks. Once the report is released, the raw data is independently released by the laboratory.\n\r\n\n\n\n\n\n\n\n\n\n Figure 1. 
This figure shows the different steps and entities involved in the process that starts from a patient consenting for WES and release of sequencing data, to the sequencing being performed in the sequencing lab and finally releasing the test result as well as transferring the raw sequencing data (FASTQ file) to the healthcare provider.\n\n\n\nFor securely transferring large volumes of health data, the laboratory in this study uses a \u201cManaged File Transfer System\u201d (MFTS), a service providing finer-grained access and control features than simple Secure File Transfer Protocol (SFTP) clients.[5] The MFTS service uses both SFTP and HyperText Transfer Protocol Secure (HTTPS) protocols underneath for performing data transfers, and users can download the data through either client. The FASTQ files are deposited on a laboratory server, where they stay for up to 90 days from the date of upload. The laboratory sends a notification to the provider email address listed on the data release consent form. Validation is performed by comparing the identifier on the notification with the identifier listed on the data release form to ensure integrity of the data being downloaded. Healthcare providers are given secure login-based access to a restricted section on the server, containing only the data from their consented patients.\n\nResults \nAs seen in Table 1, the time taken by the laboratory to process each of the data release requests varied considerably. The \u201c\u2013\u201d in some places is due to missing information on some of the WES report release dates. The average turnaround time from the time of test report release to having the raw data ready for download was around 9.7 weeks, with a maximum of 26 weeks, minimum of one week, and standard deviation of 8.5 weeks. The huge difference in processing times in the early cases compared to those toward the end can be attributed to the improvement in process workflow along the course of this study. 
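Turnaround summaries like the one above are straightforward to derive once the report-release and data-ready dates are recorded for each case. A minimal sketch using only Python's standard library, with hypothetical placeholder dates rather than the study's actual Table 1 data:

```python
# Sketch: deriving turnaround statistics from (report released, data ready)
# date pairs. The dates below are hypothetical placeholders, not the
# study's actual cases.
from datetime import date
from statistics import mean, stdev

# (report_released, data_ready) pairs for three hypothetical cases
cases = [
    (date(2016, 5, 2), date(2016, 10, 31)),   # an early, slow case
    (date(2016, 8, 1), date(2016, 9, 12)),
    (date(2016, 11, 7), date(2016, 11, 14)),  # a later, fast case
]

# Elapsed time in weeks for each case
turnaround_weeks = [(ready - released).days / 7 for released, ready in cases]

print(f"mean {mean(turnaround_weeks):.1f}, min {min(turnaround_weeks):.1f}, "
      f"max {max(turnaround_weeks):.1f}, sd {stdev(turnaround_weeks):.1f} weeks")
```

`stdev` here is the sample standard deviation, matching the kind of summary reported for the 19 cases.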
When the study began, there was no standardized process in place for sending files over from the laboratory to the healthcare provider. Further, there were no protocols in place for creating specific users for the healthcare provider to access and download the data. However, as the process was repeatedly applied to subsequent cases, there was an iterative improvement to the workflow, as can be seen by the significant decrease in processing times.\n\r\n\n\n\n\n\n\n\n\n\n Table 1. Time taken from sending consent form to having data ready for download for each of the 19 patients in the study\n\n\n\nPaper-based patient consents obtained by the genetic counselors are physically sent to the genetic laboratory along with the blood or DNA sample, printed medical records, and other appropriate information. We observed challenges in consistently providing all of the required information to the laboratory. Firstly, this manual process brings the possibility of dealing with missing information. There were two cases in the current study where patient consent forms were missing yet the data was available for download. On the other hand, there was a single case of a patient who provided consent, but there was no data available for download. Secondly, each time the data is available for download, a manual notification needs to be sent by the laboratory personnel to the provider, alerting them of the availability of data, which can lead to unnecessary wait times. Thirdly, there are discrepancies between the provider and the laboratory in uniquely identifying a sample. In this study, the consent forms by the provider used patient name and DOB, but the sequencing lab assigned a DNA sample number to uniquely identify each patient in the data download notification. One of the data release forms did mention the DNA sample number, but the others used a combination of patient name and date of birth. 
The email notifications sent by the sequencing lab notifying the healthcare provider that the FASTQ files are ready for download also use the DNA sample number as the identifier. It is necessary to verify that the DNA sample number in the data download notification matches the identifier on the consent forms to make sure only data with appropriate consents are being transferred, thereby introducing an additional mapping step. Although the workflow became more robust and the processing times decreased significantly toward the end of the study, the process is not completely free of manual interference.\n\nDiscussion \nThe results from this study highlight an urgent need to implement automated systems to improve information exchange between healthcare providers and clinical genetic laboratories. As stated by the American College of Medical Genetics and Genomics (ACMG), genomic data sharing is extremely important for the development of new diagnostic techniques and therapeutics that will ultimately lead to an improvement of patient care and understanding of disease.[6] The importance of genomic data and its impact on health outcomes is also entering the minds of patients now. Since the ultimate owners of the data are the patients themselves, it is important that they realize this need in order to provide the required consent. Repeated sessions of genetic counseling and the widespread information available on the internet have helped educate patients to a considerable extent.[7] Keeping manual control over a process that may be used frequently in the future can lead to unwanted errors. Using the electronic health record (EHR) system to store all this data comes with the advantage that triggers could be set in place to validate all of the incoming and outgoing data as well as send automated notifications. 
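A trigger of that kind could also automate the identifier mapping described in the Results: matching the lab's DNA sample number in a download notification against the provider's consent records before any transfer proceeds. A minimal sketch, in which all record structures and identifiers are invented for illustration rather than taken from the systems used in the study:

```python
# Sketch: automating the mapping between the lab's DNA sample number and
# the provider's consent records (patient name + date of birth). All
# structures and identifiers here are hypothetical illustrations.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ConsentRecord:
    patient_name: str
    date_of_birth: str       # as written on the data release form
    dna_sample_number: str   # assigned by the sequencing laboratory

# Provider-side index of signed data-release consents, keyed by the
# laboratory's identifier so notifications can be checked directly.
consents = {
    "DNA-1042": ConsentRecord("Jane Doe", "2009-03-14", "DNA-1042"),
}

def validate_notification(sample_number: str) -> Optional[ConsentRecord]:
    """Return the matching consent record, or None when no consent is on
    file for the notified sample, in which case the download should be
    blocked and staff alerted instead of silently proceeding."""
    return consents.get(sample_number)

assert validate_notification("DNA-1042") is not None   # consented: proceed
assert validate_notification("DNA-9999") is None       # no consent: block
```

The point of the sketch is only that the match happens in code rather than by hand, so a missing or mismatched consent is caught the moment the notification arrives.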
On a related note, since patients often see multiple healthcare providers during their lifetime and have their data shared across multiple provider institutions, an interoperable application programming interface (API) connecting the different systems would also be required in the future. This will eliminate the hassle of writing individual programs for each of the data access requests. In order to access genomic data across multiple systems, existing consortia such as the Global Alliance for Genomics and Health (GA4GH) provide an interoperable genomics framework that can be accessed through an API.[8] Additionally, the Office of the National Coordinator for Health Information Technology (ONC) encourages those involved in health IT to contribute to the development of a defined, shared roadmap leveraging health IT interoperability to ultimately protect and advance healthcare for all.[9] \nSimilar to how all research sequencing data is stored in the centralized repository, dbGaP[10], sequencing laboratories can also deposit all of the clinical sequencing data into a similar centralized location and later provide appropriate access to researchers. The genomic world is also looking into the possibility of using a blockchain framework for the seamless sharing of sensitive genomic information. Instead of sharing data with the healthcare providers, who would eventually pass it on to the research community, the sequencing laboratories can also consider sharing the data directly with the patients themselves, who own that data. This way, even if the data needs to be shared with multiple researchers, it can be taken care of by the patients themselves.\nThe current methods of secure data transfer, mainly by shipping hard drives, can be costly to providers (~150\u2013200 USD). One prospective option is to store data in a centralized cloud and provide access to interested parties in a secure manner. 
Although the concept of the Health Insurance Portability and Accountability Act (HIPAA)-compliant cloud is slowly coming into existence, maintaining security and privacy of genomic data in the cloud still remains an outstanding question for many organizations.\nIn conclusion, there is massive potential to leverage genomic data to advance human health overall. The medical community needs to be able to share genomic data to achieve better and improved patient outcomes. Our study highlights some of the hurdles that can be encountered and some potential ways to address them in order to achieve the path to successful implementation of secure and efficient genomic data transfer and sharing.\n\nAbbreviations \nACMG, American College of Medical Genetics and Genomics\nAPI, application programming interface\nCAP, College of American Pathologists\nCLIA, Clinical Laboratory Improvement Amendments\nEHR, electronic health record\nGA4GH, Global Alliance for Genomics and Health\nHIPAA, Health Insurance Portability and Accountability Act\nHTTPS, HyperText Transfer Protocol Secure\nMFTS, managed file transfer system\nMRN, medical record number\nONC, Office of the National Coordinator for Health Information Technology\nSFTP, Secure File Transfer Protocol\nWES, whole-exome sequencing\n\nDeclarations \nAcknowledgements \nThe authors wish to thank Ashley Kubatko for project management.\n\nAuthor contributions \nRS, YH, and SL conceived and designed the study. MP, SH, TJ, and DM obtained consent from patients and worked on obtaining the required data for the study. RS, YH, KM, and SL drafted the manuscript. All authors read, edited, and approved the final manuscript as written.\n\nFunding \nFunding for this study was provided by SL institutional faculty start-up funding at the Research Institute of Nationwide Children's Hospital.\n\nData \nAll 19 patients whose data has been used as part of this study consented for their data to be used for research studies. 
Since this is just a Quality Improvement (QI) project, there was no requirement to pass through the ethics committee. There was no analysis or manipulation done to data from any of the patients.\n\nConflict of interest statement \nThe authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.\n\nReferences \n\n\n\u2191 Ginsburg, G. (2014). \"Medical genomics: Gather and use genetic data in health care\". Nature 508 (7497): 451\u20133. doi:10.1038\/508451a. PMID 24765668.   \n\n\u2191 Middleton, A.; Wright, C.F.; Morley, K.I. et al. (2015). \"Potential research participants support the return of raw sequence data\". Journal of Medical Genetics 52 (8): 571\u20134. doi:10.1136\/jmedgenet-2015-103119. PMC PMC4518751. PMID 25995218. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4518751 .   \n\n\u2191 Shabani, M.; Vears, D.; Borry, P. (2018). \"Raw Genomic Data: Storage, Access, and Sharing\". Trends in Genetics 34 (1): 8\u201310. doi:10.1016\/j.tig.2017.10.004. PMID 29132689.   \n\n\u2191 Miller. K.E.; Lin, S.M. (2017). \"Addressing a patient-controlled approach for genomic data sharing\". Genetics in Medicine 19 (11): 1280\u20131. doi:10.1038\/gim.2017.36. PMID 28425983.   \n\n\u2191 \"Explanation of the FTP and SFTP protocols\". Know-how - Wise-FTP. AceBIT GmbH. https:\/\/www.wise-ftp.com\/know-how\/ftp_and_sftp.htm .   \n\n\u2191 ACMG Board of Directors (2017). \"Laboratory and clinical genomic data sharing is crucial to improving genetic health care: a position statement of the American College of Medical Genetics and Genomics\". Genetics in Medicine 19 (7): 721-722. doi:10.1038\/gim.2016.196. PMID 28055021.   \n\n\u2191 Morgan, T.; Schmidt, J.; Haakonsen, C. et al. (2014). \"Using the internet to seek information about genetic and rare diseases: A case study comparing data from 2006 and 2011\". 
JMIR Research Protocols 3 (1): e10. doi:10.2196\/resprot.2916. PMC PMC3961701. PMID 24565858. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3961701 .   \n\n\u2191 8.0 8.1 Global Alliance for Genomics and Health (2016). \"GENOMICS. A federated ecosystem for sharing genomic, clinical data\". Science 352 (6291): 1278-80. doi:10.1126\/science.aaf6162. PMID 27284183.   \n\n\u2191 Office of the National Coordinator for Health Information Technology (October 2015). \"Connecting Health and Care for the Nation: A Shared Nationwide Interoperability Roadmap\" (PDF). https:\/\/www.healthit.gov\/sites\/default\/files\/hie-interoperability\/nationwide-interoperability-roadmap-final-version-1.0.pdf .   \n\n\u2191 Tryka, K.A.; Hao, L.; Sturcke, A. et al. (2014). \"NCBI's Database of Genotypes and Phenotypes: dbGaP\". Nucleic Acids Research 42 (DB1): D975\u20139. doi:10.1093\/nar\/gkt1211. PMC PMC3965052. PMID 24297256. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3965052 .   \n\n\nNotes \nThis presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. The original paper listed references alphabetically; this wiki lists them by order of appearance, by design. 
The sole footnote was turned into an inline reference for convenience.\n\n\n\n\n\n\nSource: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:Transferring_exome_sequencing_data_from_clinical_laboratories_to_healthcare_providers:_Lessons_learned_at_a_pediatric_hospital\">https:\/\/www.limswiki.org\/index.php\/Journal:Transferring_exome_sequencing_data_from_clinical_laboratories_to_healthcare_providers:_Lessons_learned_at_a_pediatric_hospital<\/a>\n\t\t\t\t\tCategories: LIMSwiki journal articles (added in 2019)LIMSwiki journal articles (all)LIMSwiki journal articles on data management and sharingLIMSwiki journal articles on genome informaticsLIMSwiki journal articles on health informatics\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\n\t\t\t\n\t\t\n\t\t\n\t\t\tNavigation menu\n\t\t\t\t\t\n\t\t\tViews\n\n\t\t\t\n\t\t\t\t\n\t\t\t\tJournal\n\t\t\t\tDiscussion\n\t\t\t\tView source\n\t\t\t\tHistory\n\t\t\t\t\n\t\t\t\t\t\t\t\n\t\t\n\t\t\t\t\n\t\t\t\tPersonal tools\n\n\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\tLog in\n\t\t\t\t\t\t\t\t\t\t\t\t\tRequest account\n\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\n\t\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\n\t\t\t\t\n\t\tNavigation\n\t\t\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\tMain page\n\t\t\t\t\t\t\t\t\t\t\tRecent changes\n\t\t\t\t\t\t\t\t\t\t\tRandom page\n\t\t\t\t\t\t\t\t\t\t\tHelp\n\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\n\t\t\t\n\t\t\tSearch\n\n\t\t\t\n\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t \n\t\t\t\t\t\t\n\t\t\t\t\n\n\t\t\t\t\t\t\t\n\t\t\n\t\t\t\n\t\t\tTools\n\n\t\t\t\n\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\tWhat links here\n\t\t\t\t\t\t\t\t\t\t\tRelated changes\n\t\t\t\t\t\t\t\t\t\t\tSpecial pages\n\t\t\t\t\t\t\t\t\t\t\tPermanent link\n\t\t\t\t\t\t\t\t\t\t\tPage information\n\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\t\t\n\t\t\n\t\tPrint\/export\n\t\t\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\tCreate a book\n\t\t\t\t\t\t\t\t\t\t\tDownload as PDF\n\t\t\t\t\t\t\t\t\t\t\tDownload as Plain 
text\n\t\t\t\t\t\t\t\t\t\t\tPrintable version\n\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\n\t\t\n\t\tSponsors\n\t\t\n\t\t\t \r\n\n\t\r\n\n\t\r\n\n\t\r\n\n\t\n\t\r\n\n \r\n\n\t\n\t\r\n\n \r\n\n\t\n\t\r\n\n\t\n\t\r\n\n\t\r\n\n\t\r\n\n\t\r\n\t\t\n\t\t\n\t\t\t\n\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t This page was last modified on 19 February 2019, at 00:58.\n\t\t\t\t\t\t\t\t\tThis page has been accessed 44 times.\n\t\t\t\t\t\t\t\t\tContent is available under a Creative Commons Attribution-ShareAlike 4.0 International License unless otherwise noted.\n\t\t\t\t\t\t\t\t\tPrivacy policy\n\t\t\t\t\t\t\t\t\tAbout LIMSWiki\n\t\t\t\t\t\t\t\t\tDisclaimers\n\t\t\t\t\t\t\t\n\t\t\n\t\t\n\t\t\n\n","47e85bcf8a99fb2753262f8d1499e7f0_html":"<body class=\"mediawiki ltr sitedir-ltr ns-206 ns-subject page-Journal_Transferring_exome_sequencing_data_from_clinical_laboratories_to_healthcare_providers_Lessons_learned_at_a_pediatric_hospital skin-monobook action-view\">\n<div id=\"rdp-ebb-globalWrapper\">\n\t\t<div id=\"rdp-ebb-column-content\">\n\t\t\t<div id=\"rdp-ebb-content\" class=\"mw-body\" role=\"main\">\n\t\t\t\t<a id=\"rdp-ebb-top\"><\/a>\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t<h1 id=\"rdp-ebb-firstHeading\" class=\"firstHeading\" lang=\"en\">Journal:Transferring exome sequencing data from clinical laboratories to healthcare providers: Lessons learned at a pediatric hospital<\/h1>\n\t\t\t\t\n\t\t\t\t<div id=\"rdp-ebb-bodyContent\" class=\"mw-body-content\">\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\n\t\t\t\t\t<!-- start content -->\n\t\t\t\t\t<div id=\"rdp-ebb-mw-content-text\" lang=\"en\" dir=\"ltr\" class=\"mw-content-ltr\">\n\n\n<h2><span class=\"mw-headline\" id=\"Abstract\">Abstract<\/span><\/h2>\n<p>The adoption rate of <a href=\"https:\/\/www.limswiki.org\/index.php\/Genomics\" title=\"Genomics\" class=\"wiki-link\" data-key=\"96a82dabf51cf9510dd00c5a03396c44\">genome 
sequencing<\/a> for clinical diagnostics has been steadily increasing, leading to the possibility of improvement in diagnostic yields. Although <a href=\"https:\/\/www.limswiki.org\/index.php\/Laboratory\" title=\"Laboratory\" class=\"wiki-link\" data-key=\"c57fc5aac9e4abf31dccae81df664c33\">laboratories<\/a> generate a summary clinical report, sharing raw genomic data with healthcare providers is equally important, both for secondary research studies as well as for a deeper analysis of the data itself, as seen by the efforts from organizations such as American College of Medical Genetics and Genomics, as well as Global Alliance for Genomics and Health. Here, we aim to describe the existing protocol of genomic data sharing between a certified <a href=\"https:\/\/www.limswiki.org\/index.php\/Clinical_laboratory\" title=\"Clinical laboratory\" class=\"wiki-link\" data-key=\"307bcdf1bdbcd1bb167cee435b7a5463\">clinical laboratory<\/a> and a healthcare provider and highlight some of the lessons learned. This study tracked and subsequently evaluated the data transfer workflow for 19 patients, all of whom consented to be part of this research study and visited the genetics clinic at a tertiary pediatric hospital between April 2016 and December 2016. Two of the most noticeable elements observed through this study are the manual validation steps and the discrepancies in patient identifiers used by a clinical lab vs. healthcare provider. Both of these add complexity to the transfer process as well as make it more susceptible to errors. 
The results from this study highlight some of the critical changes that need to be made in order to improve genomic data sharing workflows between healthcare providers and clinical sequencing laboratories.\n<\/p><p><b>Keywords<\/b>: genomic data sharing, genomic data transfer, whole exome sequencing, clinical genomics, interoperability, laboratory workflows\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Introduction\">Introduction<\/span><\/h2>\n<p>The rate of genome sequencing is rising sharply, leading to the generation of substantial volumes of data. Despite the surge in data generation, utilizing the wealth of knowledge embedded in that data for the improvement of clinical outcomes is still lagging behind.<sup id=\"rdp-ebb-cite_ref-GinsburgMed14_1-0\" class=\"reference\"><a href=\"#cite_note-GinsburgMed14-1\">[1]<\/a><\/sup> Additional research is still required in order to better associate genes\/variants with diseases. Currently, clinical laboratories return a summary report back to the ordering physician. However, depending on the complexity of the disease\u2014as well as the availability of <a href=\"https:\/\/www.limswiki.org\/index.php\/Information\" title=\"Information\" class=\"wiki-link\" data-key=\"6300a14d9c2776dcca0999b5ed940e7d\">information<\/a> within knowledge bases\u2014not every report ends up with a diagnosis. In many cases, when a sequencing test is unable to detect the underlying genetic cause, clinicians may choose to obtain the raw sequencing data (available as FASTQ, VCF, or BAM files) and perform a more detailed research study\/analysis on it, in hopes of untangling some of the complex details associated with the case. However, the underlying decision to share data ultimately rests in the hands of the patient\/participant. 
Sharing sequencing data directly with the patient themselves can also be beneficial, especially when a researcher does not have adequate resources to return any clinically actionable information back to the patient.<sup id=\"rdp-ebb-cite_ref-MiddletonPotential15_2-0\" class=\"reference\"><a href=\"#cite_note-MiddletonPotential15-2\">[2]<\/a><\/sup> Sharing data directly with individuals makes them feel empowered and gives them better control over the further flow of their confidential information.<sup id=\"rdp-ebb-cite_ref-ShabaniRaw18_3-0\" class=\"reference\"><a href=\"#cite_note-ShabaniRaw18-3\">[3]<\/a><\/sup> There are currently several initiatives, such as GenomeConnect, My Research Legacy by the American Heart Association, etc., that are involved in sharing biomedical information for research and health purposes.<sup id=\"rdp-ebb-cite_ref-MillerAddress17_4-0\" class=\"reference\"><a href=\"#cite_note-MillerAddress17-4\">[4]<\/a><\/sup> Although there are several challenges associated with patient-controlled sharing of genomic data, they are not within the scope of the current study.\n<\/p><p>At present, clinical laboratories either load the data onto hard drives or Universal Serial Bus (USB) drives and ship them to the providers or directly transfer data over a secure network. There is currently no standard protocol for transferring sequencing data from laboratories to healthcare providers. 
Through this study, we aim to describe the current state of the genomic data transfer process, specifically, data obtained from whole-exome sequencing (WES) studies between sequencing laboratories and healthcare providers and highlight some of the key lessons learned.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Materials_and_methods\">Materials and methods<\/span><\/h2>\n<p>During the observation period of this study from April 2016 to December 2016, samples from 122 patients admitted to a tertiary pediatric hospital and ordered for WES testing were sent to a genetic laboratory accredited by College of American Pathologists (CAP) and certified by the <a href=\"https:\/\/www.limswiki.org\/index.php\/Clinical_Laboratory_Improvement_Amendments\" title=\"Clinical Laboratory Improvement Amendments\" class=\"wiki-link\" data-key=\"64bdae1dc17c40c28e0c560396a6ae35\">Clinical Laboratory Improvement Amendments<\/a> (CLIA). Since genomic data is considered private and confidential, explicit consent had to be obtained from the patients in order to be able to use their data for research purposes. Nineteen of the 122 patients provided consent to have their WES data transferred from the laboratory to the researchers associated with the provider institution. There are many reasons for not being able to obtain patient consent, starting with participants having a complete lack of interest in research all the way to having to face discriminatory treatment in the event of being diagnosed with a high-risk disease mutation. The workflow, as shown in Figure 1 below, describes the steps involved from consenting the patient to receiving data back from the laboratory. For all 19 patients, the consent for WES as well as for raw data release were obtained on the same day by the same provider. Turnaround time for WES report release is approximately 12 weeks. 
Once the report is released, the raw data is independently released by the laboratory.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig1_Swaminathan_FrontInGenetics2018_9.jpg\" class=\"image wiki-link\" data-key=\"f82545ee4019bc1ca96e385868976067\"><img alt=\"Fig1 Swaminathan FrontInGenetics2018 9.jpg\" src=\"https:\/\/www.limswiki.org\/images\/8\/8d\/Fig1_Swaminathan_FrontInGenetics2018_9.jpg\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 1.<\/b> This figure shows the different steps and entities involved in the process that starts from a patient consenting for WES and release of sequencing data, to the sequencing being performed in the sequencing lab and finally releasing the test result as well as transferring the raw sequencing data (FASTQ file) to the healthcare provider.<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>For securely transferring large volumes of health data, the laboratory in this study uses a \u201cManaged File Transfer System\u201d (MFTS), a service providing fine-grained access and control features over using simple Secure File Transfer Protocol (SFTP) clients.<sup id=\"rdp-ebb-cite_ref-WiseExplan_5-0\" class=\"reference\"><a href=\"#cite_note-WiseExplan-5\">[5]<\/a><\/sup> The MFTS service uses both SFTP and HyperText Transfer Protocol Secure (HTTPS) protocols underneath for performing data transfers, and users can download the data through either client. The FASTQ files are deposited on a laboratory server, where they stay up to 90 days, from the date of upload. The laboratory sends a notification to the provider email address listed on the data release consent form. 
Validation is performed by comparing the identifier on the notification with the identifier listed on the data release form to ensure the integrity of the data being downloaded. Healthcare providers are given secure, login-based access to a restricted section of the server containing only the data from their consented patients.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Results\">Results<\/span><\/h2>\n<p>As seen in Table 1, the time taken by the laboratory to process each of the data release requests varied considerably. The \u201c\u2013\u201d in some places reflects missing information on some of the WES report release dates. The average turnaround time from test report release to having the raw data ready for download was around 9.7 weeks, with a maximum of 26 weeks, a minimum of one week, and a standard deviation of 8.5 weeks. The large difference in processing times between the early cases and those toward the end can be attributed to improvements in the process workflow over the course of this study. When the study began, there was no standardized process in place for sending files from the laboratory to the healthcare provider. Further, there were no protocols in place for creating specific users for the healthcare provider to access and download the data. 
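Summary statistics of this kind are straightforward to reproduce with Python's standard library; the turnaround values below are illustrative only, not the study's actual per-patient times (those appear in Table 1):

```python
import statistics

# Illustrative turnaround times in weeks; NOT the study's actual
# per-patient values (those appear in Table 1).
turnaround_weeks = [26, 18, 15, 12, 10, 8, 6, 4, 3, 2, 1]

print(f"mean  = {statistics.mean(turnaround_weeks):.1f} weeks")
print(f"max   = {max(turnaround_weeks)} weeks")
print(f"min   = {min(turnaround_weeks)} weeks")
print(f"stdev = {statistics.stdev(turnaround_weeks):.1f} weeks")  # sample SD
```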
However, as the process was repeatedly applied to subsequent cases, the workflow improved iteratively, as can be seen in the significant decrease in processing times.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Tab1_Swaminathan_FrontInGenetics2018_9.jpg\" class=\"image wiki-link\" data-key=\"e76a484d9ba3cd29b0ed5c5192afa1c3\"><img alt=\"Tab1 Swaminathan FrontInGenetics2018 9.jpg\" src=\"https:\/\/www.limswiki.org\/images\/a\/a5\/Tab1_Swaminathan_FrontInGenetics2018_9.jpg\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Table 1.<\/b> Time taken from sending the consent form to having data ready for download for each of the 19 patients in the study<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>Paper-based patient consents obtained by the genetic counselors are physically sent to the genetic laboratory along with the blood or DNA sample, printed medical records, and other appropriate information. We observed challenges in consistently providing all of the required information to the laboratory. One challenge of this manual process is the possibility of missing information. There were two cases in the current study where patient consent forms were missing yet the data was available for download. Conversely, there was a single case of a patient who provided consent but for whom no data was available for download. Each time data becomes available for download, laboratory personnel must manually notify the provider of its availability, which can lead to unnecessary wait times. 
Thirdly, there are discrepancies between the provider and the laboratory in uniquely identifying a sample. In this study, the consent forms used by the provider identified patients by name and date of birth (DOB), but the sequencing lab assigned a DNA sample number to uniquely identify each patient in the data download notification. One of the data release forms did mention the DNA sample number, but the others used a combination of patient name and date of birth. The email notifications sent by the sequencing lab to tell the healthcare provider that the FASTQ files are ready for download also use the DNA sample number as the identifier. It is necessary to verify that the DNA sample number in the data download notification matches the identifier on the consent forms to ensure only data with appropriate consents is transferred, thereby introducing an additional mapping step. Although the workflow became more robust and the processing times decreased significantly toward the end of the study, the process is not completely free of manual intervention.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Discussion\">Discussion<\/span><\/h2>\n<p>The results from this study highlight an urgent need to implement automated systems to improve information exchange between healthcare providers and clinical genetic laboratories. As stated by the American College of Medical Genetics and Genomics (ACMG), genomic data sharing is extremely important for the development of new diagnostic techniques and therapeutics that will ultimately lead to improved patient care and a better understanding of disease.<sup id=\"rdp-ebb-cite_ref-ACMGLab17_6-0\" class=\"reference\"><a href=\"#cite_note-ACMGLab17-6\">[6]<\/a><\/sup> The importance of genomic data and its impact on health outcomes is also becoming apparent to patients. Since the ultimate owners of the data are the patients themselves, it is important that they recognize this need in order to provide the required consent. 
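The additional mapping step described earlier, reconciling the lab's DNA sample number with the (name, DOB) pair on the provider's consent form, can be sketched as a simple lookup; all names and identifiers below are hypothetical:

```python
# Hypothetical lookup bridging the lab's DNA sample numbers to the
# (name, DOB) identifiers used on the provider's consent forms.
sample_to_patient = {
    "DNA-1001": ("Jane Doe", "2010-05-17"),
    "DNA-1002": ("John Roe", "2012-11-02"),
}

def consent_on_file(sample_number, consented_patients):
    """True only if the notified sample maps to a patient whose (name, DOB)
    appears on a signed data release consent form."""
    patient = sample_to_patient.get(sample_number)
    return patient is not None and patient in consented_patients

consents = {("Jane Doe", "2010-05-17")}
assert consent_on_file("DNA-1001", consents)      # consent matches
assert not consent_on_file("DNA-1002", consents)  # no consent on file
assert not consent_on_file("DNA-9999", consents)  # unknown sample number
```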
Repeated sessions of genetic counseling and the widespread information available on the internet have helped educate patients to a considerable extent.<sup id=\"rdp-ebb-cite_ref-MorganUsing14_7-0\" class=\"reference\"><a href=\"#cite_note-MorganUsing14-7\">[7]<\/a><\/sup> Keeping manual control over a process that may be used frequently in the future can lead to unwanted errors. Using the <a href=\"https:\/\/www.limswiki.org\/index.php\/Electronic_health_record\" title=\"Electronic health record\" class=\"wiki-link\" data-key=\"f2e31a73217185bb01389404c1fd5255\">electronic health record<\/a> (EHR) system to store all this data comes with the advantage that triggers can be put in place to validate all incoming and outgoing data as well as to send automated notifications. On a related note, since patients often see multiple healthcare providers during their lifetime and have their data shared across multiple provider institutions, an interoperable <a href=\"https:\/\/www.limswiki.org\/index.php\/Application_programming_interface\" title=\"Application programming interface\" class=\"wiki-link\" data-key=\"36fc319869eba4613cb0854b421b0934\">application programming interface<\/a> (API) connecting the different systems would also be required in the future. This would eliminate the need to write individual programs for each data access request. 
In order to access genomic data across multiple systems, existing consortia such as the Global Alliance for Genomics and Health (GA4GH) provide an interoperable genomics framework that can be accessed through an API.<sup id=\"rdp-ebb-cite_ref-GAGHGenomics16_8-0\" class=\"reference\"><a href=\"#cite_note-GAGHGenomics16-8\">[8]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-GAGHGenomics16_8-1\" class=\"reference\"><a href=\"#cite_note-GAGHGenomics16-8\">[8]<\/a><\/sup> Additionally, the Office of the National Coordinator for Health Information Technology (ONC) encourages those involved in <a href=\"https:\/\/www.limswiki.org\/index.php\/Health_information_technology\" title=\"Health information technology\" class=\"wiki-link\" data-key=\"9c8ef822470559f757db89f3fa234cc0\">health IT<\/a> to contribute to the development of a defined, shared roadmap leveraging health IT interoperability to ultimately protect and advance healthcare for all.<sup id=\"rdp-ebb-cite_ref-ONCConn15_9-0\" class=\"reference\"><a href=\"#cite_note-ONCConn15-9\">[9]<\/a><\/sup> \n<\/p><p>Just as research sequencing data is stored in the centralized repository dbGaP<sup id=\"rdp-ebb-cite_ref-TrykaNCBI14_10-0\" class=\"reference\"><a href=\"#cite_note-TrykaNCBI14-10\">[10]<\/a><\/sup>, sequencing laboratories could deposit clinical sequencing data into a similar centralized location and later provide appropriate access to researchers. The genomics community is also exploring the possibility of using a <a href=\"https:\/\/www.limswiki.org\/index.php\/Blockchain\" title=\"Blockchain\" class=\"wiki-link\" data-key=\"ae8b186c311716aca561aaee91944f8e\">blockchain<\/a> framework for the seamless sharing of sensitive genomic information. Instead of sharing data with the healthcare providers, who would eventually pass it on to the research community, the sequencing laboratories could also consider sharing the data directly with the patients themselves, who own that data. 
That way, even if the data needs to be shared with multiple researchers, the sharing can be handled by the patients themselves.\n<\/p><p>The current methods of secure data transfer, mainly by shipping hard drives, can be costly to providers (~150\u2013200 USD). One prospective option is to store data in a centralized cloud and provide access to interested parties in a secure manner. Although the concept of the <a href=\"https:\/\/www.limswiki.org\/index.php\/Health_Insurance_Portability_and_Accountability_Act\" title=\"Health Insurance Portability and Accountability Act\" class=\"wiki-link\" data-key=\"b70673a0117c21576016cb7498867153\">Health Insurance Portability and Accountability Act<\/a> (HIPAA)-compliant cloud is slowly coming into existence, maintaining the security and privacy of genomic data in the cloud remains an outstanding question for many organizations.\n<\/p><p>In conclusion, there is massive potential to leverage genomic data to advance human health overall. The medical community needs to be able to share genomic data to achieve better patient outcomes. 
Our study highlights some of the hurdles that can be encountered and some potential ways to address them on the path to a successful implementation of secure and efficient genomic data transfer and sharing.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Abbreviations\">Abbreviations<\/span><\/h2>\n<p><b>ACMG<\/b>, American College of Medical Genetics and Genomics\n<\/p><p><b>API<\/b>, application programming interface\n<\/p><p><b>CAP<\/b>, College of American Pathologists\n<\/p><p><b>CLIA<\/b>, Clinical Laboratory Improvement Amendments\n<\/p><p><b>EHR<\/b>, electronic health record\n<\/p><p><b>GA4GH<\/b>, Global Alliance for Genomics and Health\n<\/p><p><b>HIPAA<\/b>, Health Insurance Portability and Accountability Act\n<\/p><p><b>HTTPS<\/b>, HyperText Transfer Protocol Secure\n<\/p><p><b>MFTS<\/b>, managed file transfer system\n<\/p><p><b>MRN<\/b>, medical record number\n<\/p><p><b>ONC<\/b>, Office of the National Coordinator for Health Information Technology\n<\/p><p><b>SFTP<\/b>, Secure File Transfer Protocol\n<\/p><p><b>WES<\/b>, whole-exome sequencing\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Declarations\">Declarations<\/span><\/h2>\n<h3><span class=\"mw-headline\" id=\"Acknowledgements\">Acknowledgements<\/span><\/h3>\n<p>The authors wish to thank Ashley Kubatko for project management.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Author_contributions\">Author contributions<\/span><\/h3>\n<p>RS, YH, and SL conceived and designed the study. MP, SH, TJ, and DM obtained consent from patients and worked on obtaining the required data for the study. RS, YH, KM, and SL drafted the manuscript. 
All authors read, edited, and approved the final manuscript as written.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Funding\">Funding<\/span><\/h3>\n<p>Funding for this study was provided by SL's institutional faculty start-up funding at the Research Institute of Nationwide Children's Hospital.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Data\">Data<\/span><\/h3>\n<p>All 19 patients whose data were used in this study consented to their data being used for research. Because this was a quality improvement (QI) project, review by the ethics committee was not required. No analysis or manipulation was performed on any patient's data.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Conflict_of_interest_statement\">Conflict of interest statement<\/span><\/h3>\n<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"References\">References<\/span><\/h2>\n<div class=\"reflist references-column-width\" style=\"-moz-column-width: 30em; -webkit-column-width: 30em; column-width: 30em; list-style-type: decimal;\">\n<ol class=\"references\">\n<li id=\"cite_note-GinsburgMed14-1\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-GinsburgMed14_1-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Ginsburg, G. (2014). \"Medical genomics: Gather and use genetic data in health care\". <i>Nature<\/i> <b>508<\/b> (7497): 451\u20133. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1038%2F508451a\" data-key=\"09c730ef9f84fbc6d57f5f24cac647c9\">10.1038\/508451a<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/24765668\" data-key=\"a097846ba03766e47b3e2b2008537f83\">24765668<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Medical+genomics%3A+Gather+and+use+genetic+data+in+health+care&rft.jtitle=Nature&rft.aulast=Ginsburg%2C+G.&rft.au=Ginsburg%2C+G.&rft.date=2014&rft.volume=508&rft.issue=7497&rft.pages=451%E2%80%933&rft_id=info:doi\/10.1038%2F508451a&rft_id=info:pmid\/24765668&rfr_id=info:sid\/en.wikipedia.org:Journal:Transferring_exome_sequencing_data_from_clinical_laboratories_to_healthcare_providers:_Lessons_learned_at_a_pediatric_hospital\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MiddletonPotential15-2\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-MiddletonPotential15_2-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Middleton, A.; Wright, C.F.; Morley, K.I. et al. (2015). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4518751\" data-key=\"be06a265219bb2ba760d99af255cbe23\">\"Potential research participants support the return of raw sequence data\"<\/a>. <i>Journal of Medical Genetics<\/i> <b>52<\/b> (8): 571\u20134. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1136%2Fjmedgenet-2015-103119\" data-key=\"899a02f3edf4d2026e8e6f4d807360ea\">10.1136\/jmedgenet-2015-103119<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC4518751\/\" data-key=\"1ad3eb8aa2530925c5d7e1b65782a636\">PMC4518751<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/25995218\" data-key=\"94fb0c8d2064f73282152eef547088c7\">25995218<\/a><span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4518751\" data-key=\"be06a265219bb2ba760d99af255cbe23\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4518751<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Potential+research+participants+support+the+return+of+raw+sequence+data&rft.jtitle=Journal+of+Medical+Genetics&rft.aulast=Middleton%2C+A.%3B+Wright%2C+C.F.%3B+Morley%2C+K.I.+et+al.&rft.au=Middleton%2C+A.%3B+Wright%2C+C.F.%3B+Morley%2C+K.I.+et+al.&rft.date=2015&rft.volume=52&rft.issue=8&rft.pages=571%E2%80%934&rft_id=info:doi\/10.1136%2Fjmedgenet-2015-103119&rft_id=info:pmc\/PMC4518751&rft_id=info:pmid\/25995218&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC4518751&rfr_id=info:sid\/en.wikipedia.org:Journal:Transferring_exome_sequencing_data_from_clinical_laboratories_to_healthcare_providers:_Lessons_learned_at_a_pediatric_hospital\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ShabaniRaw18-3\"><span class=\"mw-cite-backlink\"><a 
href=\"#cite_ref-ShabaniRaw18_3-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Shabani, M.; Vears, D.; Borry, P. (2018). \"Raw Genomic Data: Storage, Access, and Sharing\". <i>Trends in Genetics<\/i> <b>34<\/b> (1): 8\u201310. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1016%2Fj.tig.2017.10.004\" data-key=\"156e09320c84eb2eccc17d1672ae0c36\">10.1016\/j.tig.2017.10.004<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/29132689\" data-key=\"56be6daab9bd57008b97dd585f7bbf5d\">29132689<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Raw+Genomic+Data%3A+Storage%2C+Access%2C+and+Sharing&rft.jtitle=Trends+in+Genetics&rft.aulast=Shabani%2C+M.%3B+Vears%2C+D.%3B+Borry%2C+P.&rft.au=Shabani%2C+M.%3B+Vears%2C+D.%3B+Borry%2C+P.&rft.date=2018&rft.volume=34&rft.issue=1&rft.pages=8%E2%80%9310&rft_id=info:doi\/10.1016%2Fj.tig.2017.10.004&rft_id=info:pmid\/29132689&rfr_id=info:sid\/en.wikipedia.org:Journal:Transferring_exome_sequencing_data_from_clinical_laboratories_to_healthcare_providers:_Lessons_learned_at_a_pediatric_hospital\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MillerAddress17-4\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-MillerAddress17_4-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Miller. K.E.; Lin, S.M. (2017). \"Addressing a patient-controlled approach for genomic data sharing\". 
<i>Genetics in Medicine<\/i> <b>19<\/b> (11): 1280\u20131. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1038%2Fgim.2017.36\" data-key=\"b2be369603b7e957e73d8a5d1140d04f\">10.1038\/gim.2017.36<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/28425983\" data-key=\"d2cc43d9fff3ef2fa9b7ea8fd7a6aa44\">28425983<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Addressing+a+patient-controlled+approach+for+genomic+data+sharing&rft.jtitle=Genetics+in+Medicine&rft.aulast=Miller.+K.E.%3B+Lin%2C+S.M.&rft.au=Miller.+K.E.%3B+Lin%2C+S.M.&rft.date=2017&rft.volume=19&rft.issue=11&rft.pages=1280%E2%80%931&rft_id=info:doi\/10.1038%2Fgim.2017.36&rft_id=info:pmid\/28425983&rfr_id=info:sid\/en.wikipedia.org:Journal:Transferring_exome_sequencing_data_from_clinical_laboratories_to_healthcare_providers:_Lessons_learned_at_a_pediatric_hospital\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-WiseExplan-5\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-WiseExplan_5-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.wise-ftp.com\/know-how\/ftp_and_sftp.htm\" data-key=\"2b120f2602dd25ba41a39b1f7de54d67\">\"Explanation of the FTP and SFTP protocols\"<\/a>. <i>Know-how - Wise-FTP<\/i>. AceBIT GmbH<span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.wise-ftp.com\/know-how\/ftp_and_sftp.htm\" data-key=\"2b120f2602dd25ba41a39b1f7de54d67\">https:\/\/www.wise-ftp.com\/know-how\/ftp_and_sftp.htm<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Explanation+of+the+FTP+and+SFTP+protocols&rft.atitle=Know-how+-+Wise-FTP&rft.pub=AceBIT+GmbH&rft_id=https%3A%2F%2Fwww.wise-ftp.com%2Fknow-how%2Fftp_and_sftp.htm&rfr_id=info:sid\/en.wikipedia.org:Journal:Transferring_exome_sequencing_data_from_clinical_laboratories_to_healthcare_providers:_Lessons_learned_at_a_pediatric_hospital\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ACMGLab17-6\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ACMGLab17_6-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">ACMG Board of Directors (2017). \"Laboratory and clinical genomic data sharing is crucial to improving genetic health care: a position statement of the American College of Medical Genetics and Genomics\". <i>Genetics in Medicine<\/i> <b>19<\/b> (7): 721-722. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1038%2Fgim.2016.196\" data-key=\"dc4a0c6ed93170458917187309b43124\">10.1038\/gim.2016.196<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/28055021\" data-key=\"72096511fd29dc53aad55bbde4a6c206\">28055021<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Laboratory+and+clinical+genomic+data+sharing+is+crucial+to+improving+genetic+health+care%3A+a+position+statement+of+the+American+College+of+Medical+Genetics+and+Genomics&rft.jtitle=Genetics+in+Medicine&rft.aulast=ACMG+Board+of+Directors&rft.au=ACMG+Board+of+Directors&rft.date=2017&rft.volume=19&rft.issue=7&rft.pages=721-722&rft_id=info:doi\/10.1038%2Fgim.2016.196&rft_id=info:pmid\/28055021&rfr_id=info:sid\/en.wikipedia.org:Journal:Transferring_exome_sequencing_data_from_clinical_laboratories_to_healthcare_providers:_Lessons_learned_at_a_pediatric_hospital\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MorganUsing14-7\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-MorganUsing14_7-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Morgan, T.; Schmidt, J.; Haakonsen, C. et al. (2014). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3961701\" data-key=\"5541b2376d6cce59ada442d306b10e15\">\"Using the internet to seek information about genetic and rare diseases: A case study comparing data from 2006 and 2011\"<\/a>. <i>JMIR Research Protocols<\/i> <b>3<\/b> (1): e10. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.2196%2Fresprot.2916\" data-key=\"529211b1d1f26454d31d57e33925b9c4\">10.2196\/resprot.2916<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3961701\/\" data-key=\"172e6871a5053168fac7204ebe694b62\">PMC3961701<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/24565858\" data-key=\"36e82b148f21da5b043a130ba31e8e97\">24565858<\/a><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3961701\" data-key=\"5541b2376d6cce59ada442d306b10e15\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3961701<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Using+the+internet+to+seek+information+about+genetic+and+rare+diseases%3A+A+case+study+comparing+data+from+2006+and+2011&rft.jtitle=JMIR+Research+Protocols&rft.aulast=Morgan%2C+T.%3B+Schmidt%2C+J.%3B+Haakonsen%2C+C.+et+al.&rft.au=Morgan%2C+T.%3B+Schmidt%2C+J.%3B+Haakonsen%2C+C.+et+al.&rft.date=2014&rft.volume=3&rft.issue=1&rft.pages=e10&rft_id=info:doi\/10.2196%2Fresprot.2916&rft_id=info:pmc\/PMC3961701&rft_id=info:pmid\/24565858&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3961701&rfr_id=info:sid\/en.wikipedia.org:Journal:Transferring_exome_sequencing_data_from_clinical_laboratories_to_healthcare_providers:_Lessons_learned_at_a_pediatric_hospital\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-GAGHGenomics16-8\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-GAGHGenomics16_8-0\">8.0<\/a><\/sup> <sup><a href=\"#cite_ref-GAGHGenomics16_8-1\">8.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Global Alliance for Genomics and Health (2016). \"GENOMICS. A federated ecosystem for sharing genomic, clinical data\". <i>Science<\/i> <b>352<\/b> (6291): 1278-80. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1126%2Fscience.aaf6162\" data-key=\"248df57e4f773175eaa03aea2d3d14c5\">10.1126\/science.aaf6162<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/27284183\" data-key=\"74ddc2f28b380bbc939590d55e75778c\">27284183<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=GENOMICS.+A+federated+ecosystem+for+sharing+genomic%2C+clinical+data&rft.jtitle=Science&rft.aulast=Global+Alliance+for+Genomics+and+Health&rft.au=Global+Alliance+for+Genomics+and+Health&rft.date=2016&rft.volume=352&rft.issue=6291&rft.pages=1278-80&rft_id=info:doi\/10.1126%2Fscience.aaf6162&rft_id=info:pmid\/27284183&rfr_id=info:sid\/en.wikipedia.org:Journal:Transferring_exome_sequencing_data_from_clinical_laboratories_to_healthcare_providers:_Lessons_learned_at_a_pediatric_hospital\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ONCConn15-9\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ONCConn15_9-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Office of the National Coordinator for Health Information Technology (October 2015). 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.healthit.gov\/sites\/default\/files\/hie-interoperability\/nationwide-interoperability-roadmap-final-version-1.0.pdf\" data-key=\"67c41ff8fd5ee270b57fb745ab944494\">\"Connecting Health and Care for the Nation: A Shared Nationwide Interoperability Roadmap\"<\/a> (PDF)<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.healthit.gov\/sites\/default\/files\/hie-interoperability\/nationwide-interoperability-roadmap-final-version-1.0.pdf\" data-key=\"67c41ff8fd5ee270b57fb745ab944494\">https:\/\/www.healthit.gov\/sites\/default\/files\/hie-interoperability\/nationwide-interoperability-roadmap-final-version-1.0.pdf<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Connecting+Health+and+Care+for+the+Nation%3A+A+Shared+Nationwide+Interoperability+Roadmap&rft.atitle=&rft.aulast=Office+of+the+National+Coordinator+for+Health+Information+Technology&rft.au=Office+of+the+National+Coordinator+for+Health+Information+Technology&rft.date=October+2015&rft_id=https%3A%2F%2Fwww.healthit.gov%2Fsites%2Fdefault%2Ffiles%2Fhie-interoperability%2Fnationwide-interoperability-roadmap-final-version-1.0.pdf&rfr_id=info:sid\/en.wikipedia.org:Journal:Transferring_exome_sequencing_data_from_clinical_laboratories_to_healthcare_providers:_Lessons_learned_at_a_pediatric_hospital\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-TrykaNCBI14-10\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-TrykaNCBI14_10-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Tryka, K.A.; Hao, L.; Sturcke, A. et al. (2014). 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3965052\" data-key=\"457f8399c89949d30aed813442824c01\">\"NCBI's Database of Genotypes and Phenotypes: dbGaP\"<\/a>. <i>Nucleic Acids Research<\/i> <b>42<\/b> (DB1): D975\u20139. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1093%2Fnar%2Fgkt1211\" data-key=\"33f9f55d5577f4781f26dc8bcaf65588\">10.1093\/nar\/gkt1211<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3965052\/\" data-key=\"0b6769404ed6e4fafdc71b16b9d79b15\">PMC3965052<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/24297256\" data-key=\"c9aebac90204d3b750e88a32bd88d864\">24297256<\/a><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3965052\" data-key=\"457f8399c89949d30aed813442824c01\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3965052<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=NCBI%27s+Database+of+Genotypes+and+Phenotypes%3A+dbGaP&rft.jtitle=Nucleic+Acids+Research&rft.aulast=Tryka%2C+K.A.%3B+Hao%2C+L.%3B+Sturcke%2C+A.+et+al.&rft.au=Tryka%2C+K.A.%3B+Hao%2C+L.%3B+Sturcke%2C+A.+et+al.&rft.date=2014&rft.volume=42&rft.issue=DB1&rft.pages=D975%E2%80%939&rft_id=info:doi\/10.1093%2Fnar%2Fgkt1211&rft_id=info:pmc\/PMC3965052&rft_id=info:pmid\/24297256&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3965052&rfr_id=info:sid\/en.wikipedia.org:Journal:Transferring_exome_sequencing_data_from_clinical_laboratories_to_healthcare_providers:_Lessons_learned_at_a_pediatric_hospital\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<\/ol><\/div>\n<h2><span class=\"mw-headline\" id=\"Notes\">Notes<\/span><\/h2>\n<p>This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. The original paper listed references alphabetically; this wiki lists them by order of appearance, by design. 
The sole footnote was turned into an inline reference for convenience.\n<\/p>\n<\/div><div class=\"printfooter\">Source: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:Transferring_exome_sequencing_data_from_clinical_laboratories_to_healthcare_providers:_Lessons_learned_at_a_pediatric_hospital\">https:\/\/www.limswiki.org\/index.php\/Journal:Transferring_exome_sequencing_data_from_clinical_laboratories_to_healthcare_providers:_Lessons_learned_at_a_pediatric_hospital<\/a><\/div>\n<div class=\"visualClear\"><\/div>\n<\/div>\n<\/div>\n<\/div>\n<div 
class=\"visualClear\"><\/div>\n<\/div>\n<\/body>","47e85bcf8a99fb2753262f8d1499e7f0_images":["https:\/\/www.limswiki.org\/images\/8\/8d\/Fig1_Swaminathan_FrontInGenetics2018_9.jpg","https:\/\/www.limswiki.org\/images\/a\/a5\/Tab1_Swaminathan_FrontInGenetics2018_9.jpg"],"47e85bcf8a99fb2753262f8d1499e7f0_timestamp":1554145010,"cb9038099fb8453d3ea802865335a88b_type":"article","cb9038099fb8453d3ea802865335a88b_title":"Adapting data management education to support clinical research projects in an academic medical center (Read 2019)","cb9038099fb8453d3ea802865335a88b_url":"https:\/\/www.limswiki.org\/index.php\/Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center","cb9038099fb8453d3ea802865335a88b_plaintext":"\nJournal:Adapting data management education to support clinical research projects in an academic medical center\n\nFull article title: Adapting data management education to support clinical research projects in an academic medical center\nJournal: Journal of the Medical Library Association\nAuthor(s): Read, Kevin B.\nAuthor affiliation(s): New York University School of Medicine\nPrimary contact: Email: kevin dot read at nyumc dot org\nYear published: 2019\nVolume and issue: 107(1)\nPage(s): 89\u201397\nDOI: 10.5195\/jmla.2019.580\nISSN: 1558-9439\nDistribution license: Creative Commons Attribution 4.0 International\nWebsite: http:\/\/jmla.pitt.edu\/ojs\/jmla\/article\/view\/580\/792\nDownload: http:\/\/jmla.pitt.edu\/ojs\/jmla\/article\/download\/580\/773 (PDF)\n\nContents\n\n1 Abstract \n\n1.1 Background \n1.2 Case presentation \n1.3 Conclusions \n\n\n2 Background \n3 Study purpose \n4 Case presentation \n\n4.1 Workshop development \n\n4.1.1 Gaining skills \n4.1.2 Identifying core
competencies and building workshop content \n\n\n4.2 Workshop implementation \n4.3 Workshop evaluation \n4.4 Workshop results \n\n\n5 Discussion \n6 Data availability statement \n7 Supplemental file \n8 References \n9 Notes \n\n\n\nAbstract \nBackground \nLibrarians and researchers alike have long identified research data management (RDM) training as a need in biomedical research. Despite the wealth of libraries offering RDM education to their communities, clinical research is an area that has not been targeted. Clinical RDM (CRDM) is seen by its community as an essential part of the research process where established guidelines exist, yet educational initiatives in this area are unknown.\n\nCase presentation \nLeveraging my academic library\u2019s experience supporting CRDM through informationist grants and REDCap training in our medical center, I developed a 1.5-hour CRDM workshop. This workshop was designed to use established CRDM guidelines in clinical research and address common questions asked by our community through the library\u2019s existing data support program. The workshop was offered to the entire medical center four times between November 2017 and July 2018. This case study describes the development, implementation, and evaluation of this workshop.\n\nConclusions \nThe four workshops were well attended and well received by the medical center community, with 99% stating that they would recommend the class to others and 98% stating that they would use what they learned in their work. Attendees also articulated how they would incorporate the main competencies they learned from the workshop into their work. 
For the library, the effort to support CRDM has led to the coordination of a larger institutional collaborative training series to educate researchers on best practices with data, as well as the formation of institution-wide policy groups to address researcher challenges with CRDM, data transfer, and data sharing.\n\nBackground \nFor over 10 years, data management training has been identified as a need by the biomedical research community and librarians alike. From the perspective of biomedical researchers, the lack of good quality information management for research data[1][2] and an absence of training for researchers to improve their data management skills are recurring issues cited in the literature and a cause for concern for research overall.[1][3][4] Similarly, librarians practicing data management have identified that researchers generally receive no formal training in data management[5] yet have a desire to learn[6] because they lack confidence in their skills.\nTo address this need, librarians from academic institutions have been working to provide data management education and support to their communities. 
By developing specific approaches to creating data management education, libraries have found successful avenues in implementing stand-alone courses and one-shot workshops[7], integrating research data management into an existing curriculum[8], and offering domain-specific training.[9] Libraries have offered these training programs by providing general data management training to undergraduate and graduate students[10][11][12], doctoral scholars[13], and the general research community[14][15][16][17][18][19][20], whereas domain-specific data management can be seen most prominently in the life sciences[21], earth and environmental sciences[22][23], social sciences[24], and the digital humanities.[25]\nWhile it is clear that libraries have made inroads into domain-specific areas to provide training in data management, the clinical research community\u2014clinical faculty, project, and research coordinators; postdoctoral scholars; medical residents and fellows; data analysts; and medical or doctoral degree (MD\/PhD) students\u2014is one that has not received much attention. Clinical research data management (CRDM), an integral part of the clinical research process, differs from the broader concept of research data management because it involves rigorous procedures for the standardized collection and careful management of patient data to protect patient privacy and ensure quality and accuracy in medical care. 
The clinical research community understands the importance of data standardization[26][27][28][29], data quality[30][31][32][33], and data collection[28][34][35][36] and has established good clinical data management practices (GCDMP)[37] to ensure that CRDM is conducted at the highest level of excellence.\nDespite this community-driven goal toward CRDM excellence, there is a dearth of literature about data management training for clinical research, with the only evidence coming from nursing training programs[35][38], whose research practices are further afield in that they focus on quality improvement rather than clinical investigations. This lack of evidence is surprising considering that the need for CRDM training has been communicated.[1][3][4][6]\nMy library, located in an academic medical center, has supported CRDM through National Library of Medicine informationist projects by collaborating with clinical research teams to improve data management practices[39] and, more recently, by serving as the front line of support for REDCap (an electronic data capture system for storing research data) by offering consultations and comprehensive training.[40] Through REDCap training, I identified a need to expand my knowledge of CRDM to better support the needs of our research community. While REDCap is a tool to help researchers collect data for their studies, the majority of issues that our clinical research community encountered were related to data management. These issues included developing data collection plans, assigning and managing roles and responsibilities throughout the research process, ensuring that the quality of data remains intact throughout the course of the study, and creating data collection instruments. 
As this recurring thread of issues expanded the learning needs of our community beyond those provided via our REDCap training, I decided to expand my knowledge to address the questions that our researchers asked, to develop a curriculum to support CRDM, and to offer and evaluate CRDM training for our community.\n\nStudy purpose \nThis case study will discuss (a) the development and implementation of a 1.5-hour CRDM workshop for the medical center research community, (b) the results and outcomes from teaching the CRDM workshop, and (c) the next steps for the library in this area.\n\nCase presentation \nWorkshop development \nGaining skills \nBeyond the experience I gained from working closely with researchers on their clinical research projects and through REDCap support, I took two particularly valuable training opportunities that improved my skills in CRDM: the \u201cData Management for Clinical Research\u201d Coursera course[41] and \u201cDeveloping Data Management Plans\u201d course[42] offered through the online educational program sponsored by the Society for Clinical Data Management. These two courses provided me with the knowledge that I needed to teach a CRDM workshop but more importantly gave me the confidence to teach it because they provided a depth of knowledge I did not have before. These courses also served to reinforce that the issues and challenges encountered at my own institution were common data management concerns across the broader clinical research community.\n\nIdentifying core competencies and building workshop content \nThe primary focus for developing a 1.5-hour CRDM workshop was to use the GCDMP core guidelines[37] as the baseline structure for the workshop. The core guidelines are separated into chapters in the GCDMP, which were used as the foundation for the core competencies of the workshop. 
Once this baseline structure was established, my goal was to weave in answers to the common questions that our clinical research community has asked through our existing REDCap training. These questions related to how to create codebooks and data dictionaries for research projects, how to structure roles in a research team, how to use best practices for building data collection instruments, how to protect their data in accordance with Health Insurance Portability and Accountability Act (HIPAA) regulations, how to improve the quality of their data throughout a study, and how to best document procedures throughout a study.\nThe goal of the workshop was to tie as many examples back to REDCap as possible, because the use of REDCap was written into institutional policy as the recommended tool for research data collection, which made it essential to highlight its data management capabilities. The core competencies combined with the questions mentioned above served as the foundation for developing the learning objectives and interactive learning activities for the workshop (Table 1).\n\nTable 1. 
Clinical research data management workshop core competencies\n\nCore competency: Data collection planning\nLearning objectives: \u25aa Plan a data collection work flow \u25aa Document tools and resources used for data collection \u25aa Connect study protocol to data collection plan\nInteractive learning: \u25aa Describe study goal \u25aa Write down first five steps of the data collection plan \u25aa Communicate with partner(s)\/team to identify gaps\n\nCore competency: Data collection instrument design\nLearning objectives: \u25aa Describe data collection best practices \u25aa Identify common data collection risks and pitfalls\nInteractive learning: \u25aa Review data collection form and identify errors \u25aa Revise data collection form to collect data according to best practices\n\nCore competency: Data standards utilization\nLearning objectives: \u25aa Define data standards \u25aa Describe the benefits of using data standards for research \u25aa Locate data standards for use in research study \u25aa Navigate the terms of use for specific data standards\nInteractive learning: \u25aa Search for relevant data standards in the REDCap Shared Library, National Library of Medicine, and FAIRsharing.org \u25aa Explain the terms of use for the chosen data standard\n\nCore competency: Data quality maintenance\nLearning objectives: \u25aa Describe the importance of using data quality measures in a clinical research project \u25aa Implement data quality work flows using REDCap\nInteractive learning: \u25aa Develop a data quality plan for an existing or prospective research project \u25aa Implement the Data Resolution Workflow feature in REDCap\n\nCore competency: Data storage, transfer, and analysis best practices\nLearning objectives: \u25aa Identify institutionally supported data storage and transfer software \u25aa Identify the components of a statistical analysis plan \u25aa Describe the documentation needed to perform a successful data transfer\nInteractive learning: \u25aa Select the appropriate tool for data storage and transfer based on different scenarios\n\nCore competency: Role and responsibility management\nLearning objectives: \u25aa Describe methods for ensuring that roles and responsibilities are clearly assigned \u25aa Develop documentation for past, current, and future roles\nInteractive learning: \u25aa Assign roles for different project personnel using REDCap \u25aa Describe methods used to assign roles with partner(s)\/team\n\nThe core competencies and learning objectives were designed to make the workshop as practical as possible. While the theoretical components of CRDM are important and are emphasized in the workshop, the main focus was to consistently incorporate interactive learning throughout so that attendees could both apply and contextualize what they learned to their own research. Another goal of this workshop was to encourage communication between attendees to highlight common CRDM errors and provide avenues for attendees to learn about successful and unsuccessful approaches from their peers. To this end, after each core competency was taught, the workshop was designed to have attendees discuss their own experiences.\nIn addition to the core competencies listed in Table 1, the overarching theme and intention applied across the workshop was the importance of maintaining good documentation throughout a clinical research project (e.g., data collection plan, roles and responsibilities documents, and statistical analysis plan). By stressing the importance of documentation for each competency, I hoped that attendees would understand the value of and be able to develop their own detailed documentation at each stage of the research process. 
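The codebook and data dictionary questions the workshop addressed lend themselves to a concrete illustration. The following is a minimal sketch, assuming a simplified REDCap-style data dictionary kept as CSV; the column layout, variable names, and consistency check here are hypothetical examples, not the workshop's actual materials:

```python
import csv
import io

# Hypothetical rows in a simplified data dictionary: each variable gets a
# field type, a human-readable label, and (for coded fields) its choices.
DICTIONARY_CSV = """variable,field_type,field_label,choices
record_id,text,Participant ID,
sex,radio,Sex at birth,"1, Female | 2, Male | 3, Intersex"
enroll_date,text,Enrollment date,
smoker,yesno,Current smoker,
"""

def load_dictionary(text):
    """Parse the dictionary and flag coded fields that declare no choices."""
    fields, problems = {}, []
    for row in csv.DictReader(io.StringIO(text)):
        fields[row["variable"]] = row
        # Categorical field types must enumerate their codes and labels.
        if row["field_type"] in ("radio", "dropdown", "checkbox") and not row["choices"]:
            problems.append(row["variable"])
    return fields, problems

fields, problems = load_dictionary(DICTIONARY_CSV)
print(sorted(fields))  # the four documented variables
print(problems)        # empty: every coded field declares its codes
```

A check like this makes the codebook itself testable: a coded field added without its value labels is caught before data collection begins, rather than during analysis.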
Developing this workshop\u2014which included reviewing the GCDMP core competencies, outlining commonly asked questions from the research community, establishing learning objectives, building the slide deck, and creating the workshop activities\u2014took between 80 and 100 hours.\n\nWorkshop implementation \nThe CRDM workshop was offered broadly throughout the medical center three separate times in November 2017, January 2018, and February 2018. These workshops were promoted using our library\u2019s email discussion list of attendees from previous data classes and the announcement emails of the Office of Science and Research and the Clinical and Translational Science Institute. Direct outreach was also extended to residency directors and research coordinators, both of whom regularly attend the library\u2019s REDCap training. A fourth workshop was offered in July 2018 as part of the library\u2019s established Data Day to Day series[43], which the library has substantially marketed through posters, write-ups in institutional newsletters, and broadcast emails.\n\nWorkshop evaluation \nThe CRDM workshop evaluation consisted of both quantitative and qualitative methods, using a questionnaire administered at the conclusion of each workshop (see supplemental file \"Appendix\"). This study was deemed exempt by our institutional review board (IRB). Using Likert scales, questions asked attendees to evaluate the difficulty level of the material presented in the workshop, their willingness to recommend the workshop to others, and their intention to use what they had learned in their work. Free-text questions asked attendees to specify how they would use what they learned in their current roles in the institution and what other course topics they would be interested in learning about. 
For the question that asked attendees to describe how they would use what they learned in their current roles, I hand-coded responses in a spreadsheet using the emergent coding technique[44] to identify the competencies that attendees stated as the most applicable to their work.\n\nWorkshop results \nOf the 145 attendees at the four workshops, 113 provided fully or partially completed evaluation forms. Overall registration and attendance for all four workshops were very high, with substantial wait lists accumulating for each class offered (Figure 1). In fact, the workshop offered in February 2018 was a direct result of having 60 people on the wait list from the January session. Wait lists were useful for identifying communities that I had not yet reached through training, as well as for gauging the topic's popularity with the research community. A long wait list signaled an opportunity to offer the workshop again or to reach out to registrants about teaching a smaller class in their departments.\n\nFigure 1. Total attendance, registration, and waitlist numbers for the four clinical research data management (CRDM) workshops\n\nThere was a wide range of attendees at these workshops (Figure 2), as there were no restrictions on who could attend. Project\/research coordinators (n=38), faculty (n=18), and managers (n=13) were prominent attendees at the workshop, and their comments in the evaluation form reflected its value and the importance of someone from the library teaching this material.\n\nFigure 2. 
Roles of attendees of the four CRDM workshops\n\nResearch coordinators and project managers specifically indicated that the CRDM workshop was helpful in multiple ways for their roles, including how to set up the organization of their data collection procedures, how to establish and clarify roles in a research team, and how to develop documentation for both data collection and the roles and responsibilities of their staff. Research coordinators also indicated that no other stakeholders in the institution taught this kind of material and that this type of training was essential for their work.\nFaculty indicated that the workshop was beneficial for developing project management skills, gaining an awareness of the benefits of using REDCap to both collect and manage data, and clarifying the roles and responsibilities of statisticians on their team. They also mentioned the benefits of their study team taking a workshop of this kind at the beginning of a study.\nAttendees more generally described the value of the resources presented in the workshop, specifically stating that using REDCap, locating resources for identifying relevant data collection standards, gaining awareness of institutional data storage options, and using the workshop slide deck to guide their CRDM processes were particularly helpful.\nOverall, the evaluation data indicated positive results, with the majority of those who responded (94%) indicating the level of material was just right, and almost all who responded stating they would recommend the class to others (99%) and would use what they learned in their work (98%). 
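Tallies like these can be derived directly from the hand-coded responses. A minimal sketch of that tally, using hypothetical codes and response data (the category names echo Figure 3, but none of the values below are from the study):

```python
from collections import Counter

# Hypothetical emergent codes assigned to free-text responses; a single
# response may carry several codes, so percentages can sum past 100%.
coded_responses = [
    {"documentation", "workflow"},
    {"documentation"},
    {"redcap", "workflow"},
    {"roles"},
    {"documentation", "redcap"},
]

def code_frequencies(responses):
    """Percentage of respondents whose response carried each code."""
    counts = Counter(code for resp in responses for code in resp)
    total = len(responses)
    return {code: round(100 * n / total) for code, n in counts.items()}

freqs = code_frequencies(coded_responses)
print(freqs["documentation"])  # 60 -> 3 of 5 responses cite documentation
```

Counting unique codes per respondent (a set per response) rather than raw code mentions keeps the denominator at the number of respondents, which is how the reported percentages read.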
Additionally, attendees\u2019 descriptions of how they would apply what they learned to their current roles provided additional context for the benefits of the CRDM workshop (Figure 3), with improving documentation (37%), planning work flows (34%), using REDCap (22%), and assigning roles and responsibilities (17%) being the most prominent applications of the core competencies learned.\n\nFigure 3. How attendees would use what they learned in their current roles\n\nFinally, attendees expressed interest in many additional topics that they would like to see taught in future classes. These topics included statistics, research compliance, the legal implications of data sharing, and IRB best practices for study design. It is important to mention that attendees indicated that they would like to see these additional topics taught in tandem with the CRDM workshop so that they could gain a better understanding of CRDM from the perspective of an established institutional work flow for clinical research projects.\n\nDiscussion \nConsidering that this was the first time that I had offered CRDM training to our research community, the overall attendance, high wait list numbers, and percentage of attendees who said the course content was at the appropriate level validated the educational approach that I used. One major concern during the workshop development phase was that the content would be too rudimentary for our research community; however, the evaluations suggested that this was not the case. 
Furthermore, since one of the central goals of the CRDM workshop was to emphasize the importance of documentation for each core competency, the fact that this was the most commonly cited application of what attendees learned was further validation of the CRDM workshop\u2019s course content.\nWhile my approach was to utilize REDCap as a resource to demonstrate good CRDM practices because it served a direct purpose for our research community, this workshop can be taught without reference to it. The core competencies of this workshop (Table 1) are based on fundamental guidelines of good CRDM practice, and these competencies and skills are applicable to any stakeholder who participates in clinical research, no matter what tool or format they decide to use to collect their data.\nThe positive reviews of the four broadly offered courses led to seven additional CRDM training sessions that were requested by specific departments and research teams, indicating a strong need from our research community for this material. Evaluation forms were not distributed during these seven sessions due to the consult-like nature of these requests. During these sessions, several research coordinators indicated that the CRDM workshop should be required for all clinical research teams before their studies begin. This call for additional training presents an opportunity for our library to incorporate CRDM education into existing institutional initiatives. Specifically, I identified our institutional education and training management system, residency research blocks, and principal investigator training as logical next steps for integrating CRDM education into institutional research work flows.\nThe evaluation data initiated the development of partnerships with other institutional stakeholders to better support clinical research training efforts. 
Our library has begun conversations with stakeholders from research compliance, general counsel, the IRB, the Office of Science and Research, and information technology (IT) to identify ways to better address the needs of clinical researchers. The CRDM workshop highlighted a level of uncertainty on the part of clinical researchers about how best to conduct research in the medical center and whom to contact when faced with certain questions or issues.\nSubsequent discussions with the aforementioned stakeholders have emphasized a need to provide more clarity to our community about the research process. To this end, our library is leading the coordination of these groups to offer a comprehensive clinical data education series with representatives from each major department providing their own training to complement the library\u2019s existing REDCap and CRDM workshops. This training series will likely be offered through our library\u2019s existing \u201cData Day to Day\u201d series so that the research community can take all of the classes within a short time span.\nThe lack of institutional clarity that attendees and the aforementioned stakeholders identified has also led to policy discussions related to data transfer, sharing, and compliance, as our current institutional procedures are unclear and poorly utilized. Through the development of new standard operating procedures and increased educational initiatives, our library is driving awareness of institutional best practices with the hopes of improving clinical research efficiency. Members from our library now sit on institutional policy working groups that are working to improve institutional data transfer and data sharing workflows.\nJust as librarians at the University of Washington carved out a role for themselves in supporting clinical research efforts[45], we seized the opportunity to do the same by offering CRDM education. 
As the first line of defense for teaching researchers, identifying their data management issues, and hearing their concerns, our library is serving as the conduit for ensuring clinical research is conducted according to GCDM practices at our institution. Establishing partnerships with research compliance, general counsel, the Office of Science and Research, and IT provides us with additional knowledge of their institutional roles and subsequently enables us to send researchers in the right direction to receive the necessary expertise and support. As this service model develops, our library plans to monitor and assess referrals to these other departments to demonstrate the value of increasing compliance in the institution and to integrate CRDM education services into any newly developed policy (which we were successful in doing for the new institutional data storage policy and REDCap). With our library serving as the driving force behind the improvement of CRDM support, the ultimate goal is that these new partnerships will result in our research community being better trained, more compliant, and increasingly aware of established institutional workflows for clinical research.\n\nData availability statement \nThe workshop evaluation form, resulting data, and slide deck from the \u201cClinical Research Data Management\u201d workshop are available in Figshare at DOI: http:\/\/dx.doi.org\/10.6084\/m9.figshare.7105817.v1.\n\nSupplemental file \nAppendix: Evaluation form\n\nReferences \n\n\n\u2191 1.0 1.1 1.2 Anderson, N.R.; Lee, E.S.; Brockenbrough, J.S. et al. (2007). \"Issues in biomedical research data management and analysis: Needs and barriers\". JAMIA 14 (4): 478\u201388. doi:10.1197\/jamia.M2114. PMC PMC2244904. PMID 17460139. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2244904 .   \n\n\u2191 Wang, X.; Williams, C.; Liu, Z.H.; Croghan, J. (2019). \"Big data management challenges in health research\u2014A literature review\". 
Briefings in Bioinformatics 20 (1): 156\u201367. doi:10.1093\/bib\/bbx086. PMID 28968677.   \n\n\u2191 3.0 3.1 Barone, L.; Williams, J.; Micklos, D. (2017). \"Unmet needs for analyzing biological big data: A survey of 704 NSF principal investigators\". PLoS Computational Biology 13 (10): e1005755. doi:10.1371\/journal.pcbi.1005755. PMC PMC5654259. PMID 29049281. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5654259 .   \n\n\u2191 4.0 4.1 Johansson, B.; Fogelberg-Dahm, M.; Wadensten, B. (2010). \"Evidence-based practice: The importance of education and leadership\". Journal of Nursing Management 18 (1): 70\u20137. doi:10.1111\/j.1365-2834.2009.01060.x. PMID 20465731.   \n\n\u2191 Federer, L.M.; Lu, Y.L.; Joubert, D.J. (2016). \"Data literacy training needs of biomedical researchers\". Journal of the Medical Library Association 104 (1): 52\u20137. doi:10.3163\/1536-5050.104.1.008. PMC PMC4722643. PMID 26807053. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4722643 .   \n\n\u2191 6.0 6.1 Scaramozzino, J.M.; Ram\u00edrez, M.L.; McGaughey, K.J. (2012). \"A Study of Faculty Data Curation Behaviors and Attitudes at a Teaching-Centered University\". College & Research Libraries 73 (4): 349\u201365. doi:10.5860\/crl-255.   \n\n\u2191 Carlson, J.; Johnston, L.; Westra, B.; Nichols, M. (2013). \"Developing an Approach for Data Management Education: A Report from the Data Information Literacy Project\". International Journal of Digital Curation 8 (1): 204\u201317. doi:10.2218\/ijdc.v8i1.254.   \n\n\u2191 MacMillan, D. (2015). \"Developing data literacy competencies to enhance faculty collaborations\". LIBER Quarterly 24 (3): 140\u201360. doi:10.18352\/lq.9868.   \n\n\u2191 Wittenberg, J.; Elings, M. (2017). \"Building a Research Data Management Service at the University of California, Berkeley: A tale of collaboration\". IFLA Journal 43 (1): 89\u201397. doi:10.1177\/0340035216686982.   
\n\n\u2191 Piorun, M.E.; Kafel, D.; Leger-Hornby, T. et al. (2012). \"Teaching Research Data Management: An Undergraduate\/Graduate Curriculum\". Journal of eScience Librarianship 1 (1): 8. doi:10.7191\/jeslib.2012.1003.   \n\n\u2191 Reisner, B.A.; Vaughan, K.T.L.; Shorish, Y.L. (2014). \"Making Data Management Accessible in the Undergraduate Chemistry Curriculum\". Journal of Chemical Education 91 (11): 1943\u20136. doi:10.1021\/ed500099h.   \n\n\u2191 Adamick, J.; Reznik-Zellen, R.C.; Sheridan, M. (2013). \"Data Management Training for Graduate Students at a Large Research University\". Journal of eScience Librarianship 1 (3): e1022. doi:10.7191\/jeslib.2012.1022.   \n\n\u2191 Fransson, J.; Lagunas, P.T.; Kjellberg, S.; Toit, M.D. (2016). \"Developing integrated research data management support in close relation to doctoral students' research practices\". Proceedings of the Association for Information Science and Technology 53 (1): 1\u20134. doi:10.1002\/pra2.2016.14505301094.   \n\n\u2191 Clement, R.; Blau, A.; Abbaspour, P. et al. (2017). \"Team-based data management instruction at small liberal arts colleges\". IFLA Journal 43 (1): 105\u201318. doi:10.1177\/0340035216678239.   \n\n\u2191 Johnston, L.; Jeffryes, J. (2014). \"Steal this idea: A library instructors\u2019 guide to educating students in data management skills\". College & Research Libraries News 75 (8): 431\u20134. doi:10.5860\/crln.75.8.9175.   \n\n\u2191 Johnston, L.; Lafferty, M.; Petsan, B. (2012). \"Training Researchers on Data Management: A Scalable, Cross-Disciplinary Approach\". Journal of eScience Librarianship 1 (2): 2. doi:10.7191\/jeslib.2012.1012.   \n\n\u2191 Muilenburg, J.; Lebow, M.; Rich, J. (2014). \"Lessons Learned From a Research Data Management Pilot Course at an Academic Library\". Journal of eScience Librarianship 3 (1): 8. doi:10.7191\/jeslib.2014.1058.   \n\n\u2191 Southall, J.; Scutt, C. (2017). 
\"Training for Research Data Management at the Bodleian Libraries: National Contexts and Local Implementation for Researchers and Librarians\". New Review of Academic Librarianship 23 (2\u20133): 303\u201322. doi:10.1080\/13614533.2017.1318766.   \n\n\u2191 Tammaro, A.M.; Casarosa, V. (2014). \"Research Data Management in the Curriculum: An Interdisciplinary Approach\". Procedia Computer Science 38: 138\u201342. doi:10.1016\/j.procs.2014.10.023.   \n\n\u2191 Verbakel, E.; Grootveld, M. (2016). \"\u2018Essentials 4 Data Support\u2019: Five years\u2019 experience with data management training\". IFLA Journal 42 (4): 278\u201383. doi:10.1177\/0340035216674027.   \n\n\u2191 DeBose, K.G.; Haugen, I.; Miller, R.K. (2017). \"Information Literacy Instruction Programs: Supporting the College of Agriculture and Life Sciences Community at Virginia Tech\". Library Trends 65 (3): 316\u201338. doi:10.1353\/lib.2017.0004.   \n\n\u2191 Fong, B.L.; Wang, M. (2015). \"Required Data Management Training for Graduate Students in an Earth and Environmental Sciences Department\". Journal of eScience Librarianship 4 (1): 3. doi:10.7191\/jeslib.2015.1067.   \n\n\u2191 Hou, C.-Y. (2015). \"Meeting the Needs of Data Management Training: The Federation of Earth Science Information Partners (ESIP) Data Management for Scientists Short Course\". Issues in Science & Technology Librarianship Spring 2015 (80). doi:10.5062\/F42805MM.   \n\n\u2191 Thielen, J.; Hess, A.N. (2017). \"Advancing Research Data Management in the Social Sciences: Implementing Instruction for Education Graduate Students Into a Doctoral Curriculum\". Behavioral & Social Sciences Librarian 36 (1). doi:10.1080\/01639269.2017.1387739.   \n\n\u2191 Dressel, W.F. (2017). \"Research Data Management Instruction for Digital Humanities\". Journal of eScience Librarianship 6 (2): 5. doi:10.7191\/jeslib.2017.1115.   \n\n\u2191 Bruland, P.; Breil, B.; Ritz, F.; Duggas, M. (2012). 
\"Interoperability in clinical research: from metadata registries to semantically annotated CDISC ODM\". Studies in Health Technology and Informatics 180: 564\u20138. PMID 22874254.   \n\n\u2191 Gaddale, J.R. (2015). \"Clinical Data Acquisition Standards Harmonization importance and benefits in clinical data management\". Perspectives in Clinical Research 6 (4): 179\u201383. doi:10.4103\/2229-3485.167101. PMC PMC4640009. PMID 26623387. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4640009 .   \n\n\u2191 28.0 28.1 Krishnankutty, B.; Bellary, S.; Kumar, N.B. (2012). \"Data management in clinical research: An overview\". Indian Journal of Pharmacology 44 (2): 168\u201372. doi:10.4103\/0253-7613.93842. PMC PMC3326906. PMID 22529469. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3326906 .   \n\n\u2191 Leroux, H.; Metke-Jimenez, A.; Lawley, M.J. (2017). \"Towards achieving semantic interoperability of clinical study data with FHIR\". Journal of Biomedical Semantics 8 (1): 41. doi:10.1186\/s13326-017-0148-7. PMC PMC5606031. PMID 28927443. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5606031 .   \n\n\u2191 Arthofer, K.; Girardi, D. (2017). \"Data Quality- and Master Data Management - A Hospital Case\". Studies in Health Technology and Informatics 236: 259\u201366. doi:10.3233\/978-1-61499-759-7-259. PMID 28508805.   \n\n\u2191 Callahan, T.; Barnard, J.; Helmkamp, L. et al. (2017). \"Reporting Data Quality Assessment Results: Identifying Individual and Organizational Barriers and Solutions\". EGEMS 5 (1): 16. doi:10.5334\/egems.214. PMC PMC5982990. PMID 29881736. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5982990 .   \n\n\u2191 Houston, L.; Probst, Y.; Yu, P.; Martin, A. (2018). \"Exploring Data Quality Management within Clinical Trials\". Applied Clinical Informatics 9 (1): 72\u201381. doi:10.1055\/s-0037-1621702. PMC PMC5801732. 
PMID 29388180. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5801732 .   \n\n\u2191 Teunenbroek, T.V.; Baker, J.; Dijkzeul, A. (2017). \"Towards a more effective and efficient governance and regulation of nanomaterials\". Particle and Fibre Toxicology 14 (1): 54. doi:10.1186\/s12989-017-0235-z. PMC PMC5735801. PMID 29258600. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5735801 .   \n\n\u2191 Ohmann, C.; Banzi, R.; Canham, S. et al. (2017). \"Sharing and reuse of individual participant data from clinical trials: Principles and recommendations\". BMJ Open 7 (12): e018647. doi:10.1136\/bmjopen-2017-018647. PMC PMC5736032. PMID 29247106. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5736032 .   \n\n\u2191 35.0 35.1 Polancich, S.; James, D.H.; Miltner, R.S. et al. (2018). \"Building DNP Essential Skills in Clinical Data Management and Analysis\". Nurse Educator 43 (1): 37\u201341. doi:10.1097\/NNE.0000000000000411. PMID 28665824.   \n\n\u2191 Sirgo, G.; Esteban, F.; G\u00f3mez, J. et al. (2018). \"Validation of the ICU-DaMa tool for automatically extracting variables for minimum dataset and quality indicators: The importance of data quality assessment\". International Journal of Medical Informatics 112: 166\u201372. doi:10.1016\/j.ijmedinf.2018.02.007. PMID 29500016.   \n\n\u2191 37.0 37.1 \"GCDMP\". Society for Clinical Data Management. 2017. https:\/\/www.scdm.org\/publications\/gcdmp\/ .   \n\n\u2191 Sylvia, M.; Terhaar, M. (2014). \"An approach to clinical data management for the doctor of nursing practice curriculum\". Journal of Professional Nursing 31 (1): 56\u201362. doi:10.1016\/j.profnurs.2013.04.002. PMID 24503316.   \n\n\u2191 Read, K.B.; LaPolla, F.W.; Tolea, M.I. et al. (2017). \"Improving data collection, documentation, and workflow in a dementia screening study\". JMLA 105 (2): 160\u201366. doi:10.5195\/jmla.2017.221. PMC PMC5370608. PMID 28377680. 
http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5370608 .   \n\n\u2191 Read, K.; LaPolla, F.W.Z. (2018). \"A new hat for librarians: Providing REDCap support to establish the library as a central data hub\". JMLA 106 (1): 120\u201326. doi:10.5195\/jmla.2018.327. PMC PMC5764577. PMID 29339942. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5764577 .   \n\n\u2191 Duda, S.; Harris, P. (2017). \"Data Management for Clinical Research\". Coursera, Inc. https:\/\/www.coursera.org\/learn\/clinical-data-management .   \n\n\u2191 Walden, A. (2017). \"Developing Data Management Plans\". Society for Clinical Data Management. http:\/\/portal.scdm.org\/node\/1006 .   \n\n\u2191 Surkis, A.; LaPolla, F.W.; Contaxis, N.; Read, K.B. (2017). \"Data Day to Day: building a community of expertise to address data skills gaps in an academic medical center\". JMLA 105 (2): 185\u201391. doi:10.5195\/jmla.2017.35. PMC PMC5370612. PMID 28377684. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5370612 .   \n\n\u2191 Stuckey, H. (2015). \"The second step in data analysis: Coding qualitative research data\". Journal of Social Health and Diabetes 3 (1): 7. http:\/\/go.galegroup.com\/ps\/anonymous?id=GALE%7CA383423301 .   \n\n\u2191 Bardyn, T.P.; Patridge, E.F.; Moore, M.T.; Koh, J.J. (2018). \"Health Sciences Libraries Advancing Collaborative Clinical Research Data Management in Universities\". Journal of eScience Librarianship 7 (2): e1130. doi:10.7191\/jeslib.2018.1130. PMC PMC6124496. PMID 30197832. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC6124496 .   \n\n\nNotes \nThis presentation is faithful to the original, with only a few minor changes to presentation. 
In some cases important information was missing from the references, and that information was added.\n\nSource: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\">https:\/\/www.limswiki.org\/index.php\/Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center<\/a>\n\t\t\t\t\tCategories: LIMSwiki journal articles (added in 2019)LIMSwiki journal articles (all)LIMSwiki journal articles on clinical informaticsLIMSwiki journal articles on library informaticsLIMSwiki journal articles on research\n\t\t\t\t\tThis page was last modified on 21 January 2019, at 23:38.\n\t\t\t\t\tContent is available under a Creative Commons Attribution-ShareAlike 4.0 International License unless otherwise noted.\n\n","cb9038099fb8453d3ea802865335a88b_html":"<body class=\"mediawiki ltr sitedir-ltr ns-206 ns-subject page-Journal_Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center skin-monobook action-view\">\n<div id=\"rdp-ebb-globalWrapper\">\n\t\t<div id=\"rdp-ebb-column-content\">\n\t\t\t<div id=\"rdp-ebb-content\" class=\"mw-body\" role=\"main\">\n\t\t\t\t<a id=\"rdp-ebb-top\"><\/a>\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t<h1 id=\"rdp-ebb-firstHeading\" class=\"firstHeading\" lang=\"en\">Journal:Adapting data management education to support clinical research projects in an academic medical center<\/h1>\n\t\t\t\t\n\t\t\t\t<div id=\"rdp-ebb-bodyContent\" class=\"mw-body-content\">\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\n\t\t\t\t\t<!-- start content -->\n\t\t\t\t\t<div id=\"rdp-ebb-mw-content-text\" lang=\"en\" dir=\"ltr\" class=\"mw-content-ltr\">\n\n\n<h2><span class=\"mw-headline\" id=\"Abstract\">Abstract<\/span><\/h2>\n<h3><span class=\"mw-headline\" id=\"Background\">Background<\/span><\/h3>\n<p>Librarians and researchers alike have long identified research <a href=\"https:\/\/www.limswiki.org\/index.php\/Information_management\" title=\"Information management\" class=\"wiki-link\" 
data-key=\"f8672d270c0750a858ed940158ca0a73\">data management<\/a> (RDM) training as a need in biomedical <a href=\"https:\/\/www.limswiki.org\/index.php\/Research\" title=\"Research\" class=\"wiki-link\" data-key=\"409634fd90113f119362927fe222f549\">research<\/a>. Despite the wealth of libraries offering RDM education to their communities, clinical research is an area that has not been targeted. Clinical RDM (CRDM) is seen by its community as an essential part of the research process where established guidelines exist, yet educational initiatives in this area are unknown.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Case_presentation\">Case presentation<\/span><\/h3>\n<p>Leveraging my academic library\u2019s experience supporting CRDM through informationist grants and REDCap training in our medical center, I developed a 1.5 hour CRDM workshop. This workshop was designed to use established CRDM guidelines in clinical research and address common questions asked by our community through the library\u2019s existing data support program. The workshop was offered to the entire medical center four times between November 2017 and July 2018. This case study describes the development, implementation, and evaluation of this workshop.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Conclusions\">Conclusions<\/span><\/h3>\n<p>The four workshops were well attended and well received by the medical center community, with 99% stating that they would recommend the class to others and 98% stating that they would use what they learned in their work. Attendees also articulated how they would implement the main competencies they learned from the workshop into their work. 
For the library, the effort to support CRDM has led to the coordination of a larger institutional collaborative training series to educate researchers on best practices with data, as well as the formation of institution-wide policy groups to address researcher challenges with CRDM, data transfer, and data sharing.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Background_2\">Background<\/span><\/h2>\n<p>For over 10 years, data management training has been identified as a need by the biomedical research community and librarians alike. From the perspective of biomedical researchers, the lack of good quality information management for research data<sup id=\"rdp-ebb-cite_ref-AndersonIssues07_1-0\" class=\"reference\"><a href=\"#cite_note-AndersonIssues07-1\">[1]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-WangBigData19_2-0\" class=\"reference\"><a href=\"#cite_note-WangBigData19-2\">[2]<\/a><\/sup> and an absence of training for researchers to improve their data management skills are recurring issues cited in the literature and a cause for concern for research overall.<sup id=\"rdp-ebb-cite_ref-AndersonIssues07_1-1\" class=\"reference\"><a href=\"#cite_note-AndersonIssues07-1\">[1]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-BaroneUnmet17_3-0\" class=\"reference\"><a href=\"#cite_note-BaroneUnmet17-3\">[3]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-JohanssonEvidence10_4-0\" class=\"reference\"><a href=\"#cite_note-JohanssonEvidence10-4\">[4]<\/a><\/sup> Similarly, librarians practicing data management have identified that researchers generally receive no formal training in data management<sup id=\"rdp-ebb-cite_ref-FedererData16_5-0\" class=\"reference\"><a href=\"#cite_note-FedererData16-5\">[5]<\/a><\/sup> yet have a desire to learn<sup id=\"rdp-ebb-cite_ref-ScaramozzinoAStudy12_6-0\" class=\"reference\"><a href=\"#cite_note-ScaramozzinoAStudy12-6\">[6]<\/a><\/sup> because they lack confidence in their skills.\n<\/p><p>To address this need, librarians from academic institutions have 
been working to provide data management education and support to their communities. By developing specific approaches to creating data management education, libraries have found successful avenues in implementing stand-alone courses and one-shot workshops<sup id=\"rdp-ebb-cite_ref-CarlsonDevelop13_7-0\" class=\"reference\"><a href=\"#cite_note-CarlsonDevelop13-7\">[7]<\/a><\/sup>, integrating research data management into an existing curriculum<sup id=\"rdp-ebb-cite_ref-MacMillanDevelop15_8-0\" class=\"reference\"><a href=\"#cite_note-MacMillanDevelop15-8\">[8]<\/a><\/sup>, and offering domain-specific training.<sup id=\"rdp-ebb-cite_ref-WittenbergBuilding17_9-0\" class=\"reference\"><a href=\"#cite_note-WittenbergBuilding17-9\">[9]<\/a><\/sup> Libraries have offered these training programs by providing general data management training to undergraduate and graduate students<sup id=\"rdp-ebb-cite_ref-PiorunTeaching12_10-0\" class=\"reference\"><a href=\"#cite_note-PiorunTeaching12-10\">[10]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-ReisnerMakingData14_11-0\" class=\"reference\"><a href=\"#cite_note-ReisnerMakingData14-11\">[11]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-AdamickData12_12-0\" class=\"reference\"><a href=\"#cite_note-AdamickData12-12\">[12]<\/a><\/sup>, doctoral scholars<sup id=\"rdp-ebb-cite_ref-FranssonDevelop16_13-0\" class=\"reference\"><a href=\"#cite_note-FranssonDevelop16-13\">[13]<\/a><\/sup>, and the general research community<sup id=\"rdp-ebb-cite_ref-ClementTeam17_14-0\" class=\"reference\"><a href=\"#cite_note-ClementTeam17-14\">[14]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-JohnstonSteal14_15-0\" class=\"reference\"><a href=\"#cite_note-JohnstonSteal14-15\">[15]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-JohnstonTrain12_16-0\" class=\"reference\"><a href=\"#cite_note-JohnstonTrain12-16\">[16]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-MuilenburgLessons14_17-0\" class=\"reference\"><a href=\"#cite_note-MuilenburgLessons14-17\">[17]<\/a><\/sup><sup 
id=\"rdp-ebb-cite_ref-SouthallTrain17_18-0\" class=\"reference\"><a href=\"#cite_note-SouthallTrain17-18\">[18]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-TammaroResearch14_19-0\" class=\"reference\"><a href=\"#cite_note-TammaroResearch14-19\">[19]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-VerbakelEssentials16_20-0\" class=\"reference\"><a href=\"#cite_note-VerbakelEssentials16-20\">[20]<\/a><\/sup>, whereas domain-specific data management can be seen most prominently in the life sciences<sup id=\"rdp-ebb-cite_ref-DeBoseInfo17_21-0\" class=\"reference\"><a href=\"#cite_note-DeBoseInfo17-21\">[21]<\/a><\/sup>, earth and environmental sciences<sup id=\"rdp-ebb-cite_ref-FongRequired15_22-0\" class=\"reference\"><a href=\"#cite_note-FongRequired15-22\">[22]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-HouMeet15_23-0\" class=\"reference\"><a href=\"#cite_note-HouMeet15-23\">[23]<\/a><\/sup>, social sciences<sup id=\"rdp-ebb-cite_ref-ThielenAdvancing17_24-0\" class=\"reference\"><a href=\"#cite_note-ThielenAdvancing17-24\">[24]<\/a><\/sup>, and the digital humanities.<sup id=\"rdp-ebb-cite_ref-DresselResearch17_25-0\" class=\"reference\"><a href=\"#cite_note-DresselResearch17-25\">[25]<\/a><\/sup>\n<\/p><p>While it is clear that libraries have made inroads into domain-specific areas to provide training in data management, the clinical research community\u2014clinical faculty, project, and research coordinators; postdoctoral scholars; medical residents and fellows; data analysts; and medical or doctoral degree (MD\/PhD) students\u2014is one that has not received much attention. Clinical research data management (CRDM), an integral part of the clinical research process, differs from the broader concept of research data management because it involves rigorous procedures for the standardized collection and careful management of patient data to protect patient [[Information privacy|privacy] and ensure quality and accuracy in medical care. 
The clinical research community understands the importance of <a href=\"https:\/\/www.limswiki.org\/index.php\/Data_integration\" title=\"Data integration\" class=\"wiki-link\" data-key=\"fd01c635859e1d5b9583e43e31ef6718\">data standardization<\/a><sup id=\"rdp-ebb-cite_ref-BrulandInterop12_26-0\" class=\"reference\"><a href=\"#cite_note-BrulandInterop12-26\">[26]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-GaddaleClinical15_27-0\" class=\"reference\"><a href=\"#cite_note-GaddaleClinical15-27\">[27]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-KrishnankuttyData12_28-0\" class=\"reference\"><a href=\"#cite_note-KrishnankuttyData12-28\">[28]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-LerouxTowards17_29-0\" class=\"reference\"><a href=\"#cite_note-LerouxTowards17-29\">[29]<\/a><\/sup>, <a href=\"https:\/\/www.limswiki.org\/index.php\/Data_integrity\" title=\"Data integrity\" class=\"wiki-link\" data-key=\"382a9bb77ee3e36bb3b37c79ed813167\">data quality<\/a><sup id=\"rdp-ebb-cite_ref-ArthoferData17_30-0\" class=\"reference\"><a href=\"#cite_note-ArthoferData17-30\">[30]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-CallahanRepo17_31-0\" class=\"reference\"><a href=\"#cite_note-CallahanRepo17-31\">[31]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-HoustonExplor18_32-0\" class=\"reference\"><a href=\"#cite_note-HoustonExplor18-32\">[32]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-TeunenbroekTowards17_33-0\" class=\"reference\"><a href=\"#cite_note-TeunenbroekTowards17-33\">[33]<\/a><\/sup>, and data collection<sup id=\"rdp-ebb-cite_ref-KrishnankuttyData12_28-1\" class=\"reference\"><a href=\"#cite_note-KrishnankuttyData12-28\">[28]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-OhmannSharing17_34-0\" class=\"reference\"><a href=\"#cite_note-OhmannSharing17-34\">[34]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-PolancichBuild18_35-0\" class=\"reference\"><a href=\"#cite_note-PolancichBuild18-35\">[35]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-SirgoValid18_36-0\" class=\"reference\"><a 
href=\"#cite_note-SirgoValid18-36\">[36]<\/a><\/sup> and has established good clinical data management practices (GCDMP)<sup id=\"rdp-ebb-cite_ref-SCDM_GCDMP_37-0\" class=\"reference\"><a href=\"#cite_note-SCDM_GCDMP-37\">[37]<\/a><\/sup> to ensure that CRDM is conducted at the highest level of excellence.\n<\/p><p>Despite this community-driven goal toward CRDM excellence, there is a dearth of literature about data management training for clinical research, with the only evidence coming from nursing training programs<sup id=\"rdp-ebb-cite_ref-PolancichBuild18_35-1\" class=\"reference\"><a href=\"#cite_note-PolancichBuild18-35\">[35]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-SylviaAnApproach14_38-0\" class=\"reference\"><a href=\"#cite_note-SylviaAnApproach14-38\">[38]<\/a><\/sup>, whose research practices are further afield in that they focus on quality improvement rather than clinical investigations. This lack of evidence is surprising considering that the need for CRDM training has been communicated.<sup id=\"rdp-ebb-cite_ref-AndersonIssues07_1-2\" class=\"reference\"><a href=\"#cite_note-AndersonIssues07-1\">[1]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-BaroneUnmet17_3-1\" class=\"reference\"><a href=\"#cite_note-BaroneUnmet17-3\">[3]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-JohanssonEvidence10_4-1\" class=\"reference\"><a href=\"#cite_note-JohanssonEvidence10-4\">[4]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-ScaramozzinoAStudy12_6-1\" class=\"reference\"><a href=\"#cite_note-ScaramozzinoAStudy12-6\">[6]<\/a><\/sup>\n<\/p><p>My library, located in an academic medical center, has supported CRDM through <a href=\"https:\/\/www.limswiki.org\/index.php\/United_States_National_Library_of_Medicine\" title=\"United States National Library of Medicine\" class=\"wiki-link\" data-key=\"f769280ef4fdc14f3c6370a7b49bf50f\">National Library of Medicine<\/a> informationist projects by collaborating with clinical research teams to improve data management practices<sup 
id=\"rdp-ebb-cite_ref-ReadImprov17_39-0\" class=\"reference\"><a href=\"#cite_note-ReadImprov17-39\">[39]<\/a><\/sup> and, more recently, by serving as the front line of support for REDCap (an electronic data capture system for storing research data) by offering consultations and comprehensive training.<sup id=\"rdp-ebb-cite_ref-ReadANewHat18_40-0\" class=\"reference\"><a href=\"#cite_note-ReadANewHat18-40\">[40]<\/a><\/sup> Through REDCap training, I identified a need to expand my knowledge of CRDM to better support the needs of our research community. While REDCap is a tool to help researchers collect data for their studies, the majority of issues that our clinical research community encountered were related to data management. These issues included developing data collection plans, assigning and managing roles and responsibilities throughout the research process, ensuring that the quality of data remains intact throughout the course of the study, and creating data collection instruments. 
As this recurring thread of issues expanded the learning needs of our community beyond those provided via our REDCap training, I decided to expand my knowledge to address the questions that our researchers asked, to develop a curriculum to support CRDM, and to offer and evaluate CRDM training for our community.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Study_purpose\">Study purpose<\/span><\/h2>\n<p>This case study will discuss (a) the development and implementation of a 1.5-hour CRDM workshop for the medical center research community, (b) the results and outcomes from teaching the CRDM workshop, and (c) the next steps for the library in this area.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Case_presentation_2\">Case presentation<\/span><\/h2>\n<h3><span class=\"mw-headline\" id=\"Workshop_development\">Workshop development<\/span><\/h3>\n<h4><span class=\"mw-headline\" id=\"Gaining_skills\">Gaining skills<\/span><\/h4>\n<p>Beyond the experience I gained from working closely with researchers on their clinical research projects and through REDCap support, I took two particularly valuable training opportunities that improved my skills in CRDM: the \u201cData Management for Clinical Research\u201d Coursera course<sup id=\"rdp-ebb-cite_ref-DudaData17_41-0\" class=\"reference\"><a href=\"#cite_note-DudaData17-41\">[41]<\/a><\/sup> and \u201cDeveloping Data Management Plans\u201d course<sup id=\"rdp-ebb-cite_ref-WaldenDevelop17_42-0\" class=\"reference\"><a href=\"#cite_note-WaldenDevelop17-42\">[42]<\/a><\/sup> offered through the online educational program sponsored by the Society for Clinical Data Management. These two courses provided me with the knowledge that I needed to teach a CRDM workshop but more importantly gave me the confidence to teach it because they provided a depth of knowledge I did not have before. 
These courses also served to reinforce that the issues and challenges encountered at my own institution were common data management concerns across the broader clinical research community.\n<\/p>\n<h4><span class=\"mw-headline\" id=\"Identifying_core_competencies_and_building_workshop_content\">Identifying core competencies and building workshop content<\/span><\/h4>\n<p>The primary focus for developing a 1.5-hour CRDM workshop was to use the GCDMP core guidelines<sup id=\"rdp-ebb-cite_ref-SCDM_GCDMP_37-1\" class=\"reference\"><a href=\"#cite_note-SCDM_GCDMP-37\">[37]<\/a><\/sup> as the baseline structure for the workshop. The core guidelines are separated into chapters in the GCDMP, which were used as the foundation for the core competencies of the workshop. Once this baseline structure was established, my goal was to weave in answers to the common questions that our clinical research community has asked through our existing REDCap training. These questions related to how to create codebooks and data dictionaries for research projects, how to structure roles in a research team, how to use best practices for building data collection instruments, how to protect their data according to <a href=\"https:\/\/www.limswiki.org\/index.php\/Health_Insurance_Portability_and_Accountability_Act\" title=\"Health Insurance Portability and Accountability Act\" class=\"wiki-link\" data-key=\"b70673a0117c21576016cb7498867153\">Health Insurance Portability and Accountability Act<\/a> (HIPAA) regulations that they should be aware of, how to improve the quality of their data throughout a study, and how to best document procedures throughout a study.\n<\/p><p>The goal of the workshop was to tie as many examples back to REDCap as possible, because the use of REDCap was written into institutional policy as the recommended tool for research data collection, which made it essential to highlight its data management capabilities. 
The core competencies combined with the questions mentioned above served as the foundation for developing the learning objectives and interactive learning activities for the workshop (Table 1).\n<\/p><p><br \/>\n<\/p>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table class=\"wikitable\" border=\"1\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\" colspan=\"3\"><b>Table 1.<\/b> Clinical research data management workshop core competencies\n<\/td><\/tr>\n<tr>\n<th style=\"background-color:#dddddd; padding-left:10px; padding-right:10px;\">Core competency\n<\/th>\n<th style=\"background-color:#dddddd; padding-left:10px; padding-right:10px;\">Learning objectives\n<\/th>\n<th style=\"background-color:#dddddd; padding-left:10px; padding-right:10px;\">Interactive learning\n<\/th><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Data collection planning\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\u25aa Plan a data collection work flow<br \/>\u25aa Document tools and resources used for data collection<br \/>\u25aa Connect study protocol to data collection plan\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\u25aa Describe study goal<br \/>\u25aa Write down first five steps of the data collection plan<br \/>\u25aa Communicate with partner(s)\/team to identify gaps\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Data collection instrument design\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\u25aa Describe data collection best practices<br \/>\u25aa Identify common data collection risks and pitfalls\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\u25aa Review data collection form and identify errors<br \/>\u25aa Revise data 
collection form to collect data according to best practices\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Data standards utilization\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\u25aa Define data standards<br \/>\u25aa Describe the benefits of using data standards for research<br \/>\u25aa Locate data standards for use in research study<br \/>\u25aa Navigate the terms of use for specific data standards\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\u25aa Search for relevant data standards in the REDCap Shared Library, National Library of Medicine, and FAIRsharing.org<br \/>\u25aa Explain the terms of use for the chosen data standard\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Data quality maintenance\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\u25aa Describe the importance of using data quality measures in a clinical research project<br \/>\u25aa Implement data quality work flows using REDCap\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\u25aa Develop a data quality plan for an existing or prospective research project<br \/>\u25aa Implement the Data Resolution Workflow feature in REDCap\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Data storage, transfer, and analysis best practices\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\u25aa Identify institutionally supported data storage and transfer software<br \/>\u25aa Identify the components of a statistical analysis plan<br \/>\u25aa Describe the documentation needed to perform a successful data transfer\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\u25aa Select the appropriate tool for data storage and transfer based 
on different scenarios\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Role and responsibility management\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\u25aa Describe methods for ensuring that roles and responsibilities are clearly assigned<br \/>\u25aa Develop documentation for past, current, and future roles\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\u25aa Assign roles for different project personnel using REDCap<br \/>\u25aa Describe methods used to assign roles with partner(s)\/team\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>The core competencies and learning objectives were designed to make the workshop as practical as possible. While the theoretical components of CRDM are important and are emphasized in the workshop, the main focus was to consistently incorporate interactive learning throughout so that attendees could both apply and contextualize what they learned to their own research. Another goal of this workshop was to encourage communication between attendees to highlight common CRDM errors and provide avenues for attendees to learn about successful and unsuccessful approaches from their peers. To this end, after each core competency was taught, the workshop was designed to have attendees discuss their own experiences.\n<\/p><p>In addition to the core competencies listed in Table 1, the overarching theme and intention applied across the workshop was the importance of maintaining good documentation throughout a clinical research project (e.g., data collection plan, roles and responsibilities documents, and statistical analysis plan). By stressing the importance of documentation for each competency, I hoped that attendees would understand the value of and be able to develop their own detailed documentation at each stage of the research process. 
Developing this workshop\u2014which included reviewing the GCDMP core competencies, outlining commonly asked questions from the research community, establishing learning objectives, building the slide deck, and creating the workshop activities\u2014took between 80 and 100 hours.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Workshop_implementation\">Workshop implementation<\/span><\/h3>\n<p>The CRDM workshop was offered broadly throughout the medical center three times: in November 2017, January 2018, and February 2018. These workshops were promoted using our library\u2019s email discussion list of attendees from previous data classes and the announcement emails of the Office of Science and Research and the Clinical and Translational Science Institute. Direct outreach was also extended to residency directors and research coordinators, both of whom regularly attend the library\u2019s REDCap training. A fourth workshop was offered in July 2018 as part of the library\u2019s established Data Day to Day series<sup id=\"rdp-ebb-cite_ref-SurkisData17_43-0\" class=\"reference\"><a href=\"#cite_note-SurkisData17-43\">[43]<\/a><\/sup>, which the library has marketed substantially through posters, write-ups in institutional newsletters, and broadcast emails.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Workshop_evaluation\">Workshop evaluation<\/span><\/h3>\n<p>The CRDM workshop evaluation used both quantitative and qualitative methods via a questionnaire administered at the conclusion of each workshop (see supplemental file \"Appendix\"). This study was deemed exempt by our institutional review board (IRB). Using Likert scales, questions asked attendees to rate the difficulty level of the material presented in the workshop, their willingness to recommend the workshop to others, and their intention to use what they had learned in their work. 
Free-text questions asked attendees to specify how they would use what they learned in their current roles in the institution and what other course topics they would be interested in learning about. For the question about how they would use what they learned, I hand-coded responses in a spreadsheet using the emergent coding technique<sup id=\"rdp-ebb-cite_ref-StuckeyTheSecond15_44-0\" class=\"reference\"><a href=\"#cite_note-StuckeyTheSecond15-44\">[44]<\/a><\/sup> to identify the competencies attendees considered most applicable to their work.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Workshop_results\">Workshop results<\/span><\/h3>\n<p>Of the 145 attendees at the four workshops, 113 provided fully or partially completed evaluation forms. Registration and attendance for all four workshops were very high, with substantial wait lists accumulating for each class offered (Figure 1). In fact, the workshop offered in February 2018 was a direct result of having 60 people on the wait list from the January session. Wait lists were useful for identifying communities that I had not yet reached through training as well as for gauging the popularity of the topic among the research community. 
A long wait list signaled an opportunity to offer the workshop again or to contact those on the list about teaching a smaller class in their departments.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig1_Read_JMedLibAssoc2019_107-1.gif\" class=\"image wiki-link\" data-key=\"f82794a8168e67356fd00572ffae208e\"><img alt=\"Fig1 Read JMedLibAssoc2019 107-1.gif\" src=\"https:\/\/www.limswiki.org\/images\/8\/8f\/Fig1_Read_JMedLibAssoc2019_107-1.gif\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 1.<\/b> Total attendance, registration, and waitlist numbers for the four clinical research data management (CRDM) workshops<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>There was a wide range of attendees at these workshops (Figure 2), as there were no restrictions on who could attend. 
Project\/research coordinators (<i>n<\/i>=38), faculty (<i>n<\/i>=18), and managers (<i>n<\/i>=13) were prominent attendees at the workshop, and their comments in the evaluation form reflected the workshop\u2019s value and the importance of someone from the library teaching this material.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig2_Read_JMedLibAssoc2019_107-1.gif\" class=\"image wiki-link\" data-key=\"1bcfb9864dec13d1cb6495468477647b\"><img alt=\"Fig2 Read JMedLibAssoc2019 107-1.gif\" src=\"https:\/\/www.limswiki.org\/images\/c\/ca\/Fig2_Read_JMedLibAssoc2019_107-1.gif\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 2.<\/b> Roles of attendees of the four CRDM workshops<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>Research coordinators and project managers specifically indicated that the CRDM workshop was helpful in multiple ways for their roles, including how to organize their data collection procedures, how to establish and clarify roles in a research team, and how to develop documentation for both data collection and the roles and responsibilities of their staff. Research coordinators also indicated that no other stakeholders in the institution taught this kind of material and that this type of training was essential for their work.\n<\/p><p>Faculty indicated that the workshop was beneficial for developing project management skills, gaining an awareness of the benefits of using REDCap to both collect and manage data, and clarifying the roles and responsibilities of statisticians on their team. 
They also mentioned the benefits of their study team taking a workshop of this kind at the beginning of a study.\n<\/p><p>Attendees more generally described the value of the resources presented in the workshop, specifically stating that using REDCap, locating resources for identifying relevant data collection standards, gaining awareness of institutional data storage options, and using the workshop slide deck to guide their CRDM processes were particularly helpful.\n<\/p><p>Overall, the evaluation results were positive: most respondents (94%) reported that the level of the material was just right, and nearly all stated that they would recommend the class to others (99%) and would use what they learned in their work (98%). Additionally, responses from attendees describing how they would apply what they learned to their current roles provided further context for the benefits of the CRDM workshop (Figure 3), with improving documentation (37%), planning work flows (34%), using REDCap (22%), and assigning roles and responsibilities (17%) being the most prominent applications of the core competencies learned.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig3_Read_JMedLibAssoc2019_107-1.gif\" class=\"image wiki-link\" data-key=\"d85dd10f203c6bacdcd6eaee1cbdf3c4\"><img alt=\"Fig3 Read JMedLibAssoc2019 107-1.gif\" src=\"https:\/\/www.limswiki.org\/images\/d\/d0\/Fig3_Read_JMedLibAssoc2019_107-1.gif\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 3.<\/b> How attendees would use what they learned in their current roles<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>Finally, 
attendees expressed interest in many additional topics for future classes. These topics included statistics, research compliance, the legal implications of data sharing, and IRB best practices for study design. Notably, attendees wanted these additional topics taught in tandem with the CRDM workshop so that they could gain a better understanding of CRDM from the perspective of an established institutional work flow for clinical research projects.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Discussion\">Discussion<\/span><\/h2>\n<p>Considering that this was the first time that I had offered CRDM training to our research community, the overall attendance, high wait list numbers, and percentage of attendees who said the course content was at the appropriate level validated the educational approach that I used. One major concern during the workshop development phase was that the content would be too rudimentary for our research community; however, the evaluations suggested that this was not the case. Furthermore, since one of the central goals of the CRDM workshop was to emphasize the importance of documentation for each core competency, the fact that this was the most commonly cited application of what attendees learned was further validation of the CRDM workshop\u2019s course content.\n<\/p><p>While my approach was to utilize REDCap as a resource to demonstrate good CRDM practices because it served a direct purpose for our research community, this workshop can be taught without reference to it. 
The core competencies of this workshop (Table 1) are based on fundamental guidelines of good CRDM practice, and these competencies and skills are applicable to any stakeholder who participates in clinical research, no matter what tool or format they decide to use to collect their data.\n<\/p><p>The positive reviews of the four broadly offered courses led to seven additional CRDM training sessions that were requested by specific departments and research teams, indicating a strong need from our research community for this material. Evaluation forms were not distributed during these seven sessions due to the consult-like nature of these requests. During these sessions, several research coordinators indicated that the CRDM workshop should be required for all clinical research teams before their studies begin. This call for additional training presents an opportunity for our library to incorporate CRDM education into existing institutional initiatives. Specifically, I identified our institutional education and training management system, residency research blocks, and principal investigator training as logical next steps for integrating CRDM education into institutional research work flows.\n<\/p><p>The evaluation data initiated the development of partnerships with other institutional stakeholders to better support clinical research training efforts. Our library has begun conversations with stakeholders from research compliance, general counsel, the IRB, the Office of Science and Research, and information technology (IT) to identify ways to better address the needs of clinical researchers. The CRDM workshop highlighted a level of uncertainty on the part of clinical researchers about how best to conduct research in the medical center and whom to contact when faced with certain questions or issues.\n<\/p><p>Subsequent discussions with the aforementioned stakeholders have emphasized a need to provide more clarity to our community about the research process. 
To this end, our library is leading the coordination of these groups to offer a comprehensive clinical data education series with representatives from each major department providing their own training to complement the library\u2019s existing REDCap and CRDM workshops. This training series will likely be offered through our library\u2019s existing \u201cData Day to Day\u201d series so that the research community can take all of the classes within a short time span.\n<\/p><p>The lack of institutional clarity that attendees and the aforementioned stakeholders identified has also led to policy discussions related to data transfer, sharing, and <a href=\"https:\/\/www.limswiki.org\/index.php\/Regulatory_compliance\" title=\"Regulatory compliance\" class=\"wiki-link\" data-key=\"7dbc9be278a8efda25a4b592ee6ef0ca\">compliance<\/a>, as our current institutional procedures are unclear and poorly utilized. Through the development of new standard operating procedures and increased educational initiatives, our library is driving awareness of institutional best practices in the hope of improving clinical research efficiency. Members from our library now sit on institutional policy working groups tasked with improving data transfer and data sharing workflows.\n<\/p><p>Just as librarians at the University of Washington carved out a role for themselves in supporting clinical research efforts<sup id=\"rdp-ebb-cite_ref-BardynHealth18_45-0\" class=\"reference\"><a href=\"#cite_note-BardynHealth18-45\">[45]<\/a><\/sup>, we seized the opportunity to do the same by offering CRDM education. As the first line of defense for teaching researchers, identifying their data management issues, and hearing their concerns, our library is serving as the conduit for ensuring clinical research is conducted according to GCDM practices at our institution. 
Establishing partnerships with research compliance, general counsel, the Office of Science and Research, and IT provides us with additional knowledge of their institutional roles and in turn enables us to send researchers in the right direction to receive the necessary expertise and support. As this service model develops, our library plans to monitor and assess referrals to these other departments to demonstrate the value of increasing compliance in the institution and to integrate CRDM education services into any newly developed policy (as we have already done for the new institutional data storage policy and REDCap). With our library serving as the driving force behind the improvement of CRDM support, the ultimate goal is that these new partnerships will result in our research community being better trained, more compliant, and increasingly aware of established institutional workflows for clinical research.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Data_availability_statement\">Data availability statement<\/span><\/h2>\n<p>The workshop evaluation form, resulting data, and slide deck from the \u201cClinical Research Data Management\u201d workshop are available in Figshare at DOI: <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/dx.doi.org\/10.6084\/m9.figshare.7105817.v1\" data-key=\"c1b8d3e0410bf80e7b26efb415712df8\">http:\/\/dx.doi.org\/10.6084\/m9.figshare.7105817.v1<\/a>.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Supplemental_file\">Supplemental file<\/span><\/h2>\n<p><i>Appendix<\/i>: <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/jmla.mlanet.org\/ojs\/jmla\/article\/downloadSuppFile\/580\/819\" data-key=\"78f561518d0242df1e1de5f7887377e4\">Evaluation form<\/a>\n<\/p>\n<h2><span class=\"mw-headline\" id=\"References\">References<\/span><\/h2>\n<div class=\"reflist references-column-width\" style=\"-moz-column-width: 30em; -webkit-column-width: 30em; column-width: 30em; list-style-type: 
decimal;\">\n<ol class=\"references\">\n<li id=\"cite_note-AndersonIssues07-1\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-AndersonIssues07_1-0\">1.0<\/a><\/sup> <sup><a href=\"#cite_ref-AndersonIssues07_1-1\">1.1<\/a><\/sup> <sup><a href=\"#cite_ref-AndersonIssues07_1-2\">1.2<\/a><\/sup><\/span> <span class=\"reference-text\">Anderson, N.R.; Lee, E.S.; Brockenbrough, J.S. et al. (2007). <a rel=\"nofollow\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2244904\">\"Issues in biomedical research data management and analysis: Needs and barriers\"<\/a>. <i>JAMIA<\/i> <b>14<\/b> (4): 478\u201388. doi:<a rel=\"nofollow\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1197%2Fjamia.M2114\">10.1197\/jamia.M2114<\/a>. PMC <a rel=\"nofollow\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC2244904\/\">PMC2244904<\/a>. PMID <a rel=\"nofollow\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/17460139\">17460139<\/a>.<\/span>\n<\/li>\n<li id=\"cite_note-WangBigData19-2\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-WangBigData19_2-0\">\u2191<\/a><\/span> <span class=\"reference-text\">Wang, X.; Williams, C.; Liu, Z.H.; Croghan, J. (2019). \"Big data management challenges in health research\u2014A literature review\". <i>Briefings in Bioinformatics<\/i> <b>20<\/b> (1): 156\u201367. doi:<a rel=\"nofollow\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1093%2Fbib%2Fbbx086\">10.1093\/bib\/bbx086<\/a>. PMID <a rel=\"nofollow\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/28968677\">28968677<\/a>.<\/span>\n<\/li>\n<li id=\"cite_note-BaroneUnmet17-3\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-BaroneUnmet17_3-0\">3.0<\/a><\/sup> <sup><a href=\"#cite_ref-BaroneUnmet17_3-1\">3.1<\/a><\/sup><\/span> <span class=\"reference-text\">Barone, L.; Williams, J.; Micklos, D. (2017). <a rel=\"nofollow\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5654259\">\"Unmet needs for analyzing biological big data: A survey of 704 NSF principal investigators\"<\/a>. <i>PLoS Computational Biology<\/i> <b>13<\/b> (10): e1005755. doi:<a rel=\"nofollow\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1371%2Fjournal.pcbi.1005755\">10.1371\/journal.pcbi.1005755<\/a>. PMC <a rel=\"nofollow\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC5654259\/\">PMC5654259<\/a>. PMID <a rel=\"nofollow\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/29049281\">29049281<\/a>.<\/span>\n<\/li>\n<li id=\"cite_note-JohanssonEvidence10-4\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-JohanssonEvidence10_4-0\">4.0<\/a><\/sup> <sup><a href=\"#cite_ref-JohanssonEvidence10_4-1\">4.1<\/a><\/sup><\/span> <span class=\"reference-text\">Johansson, B.; Fogelberg-Dahm, M.; Wadensten, B. (2010). \"Evidence-based practice: The importance of education and leadership\". <i>Journal of Nursing Management<\/i> <b>18<\/b> (1): 70\u20137. doi:<a rel=\"nofollow\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1111%2Fj.1365-2834.2009.01060.x\">10.1111\/j.1365-2834.2009.01060.x<\/a>. PMID <a rel=\"nofollow\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/20465731\">20465731<\/a>.<\/span>\n<\/li>\n<li id=\"cite_note-FedererData16-5\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-FedererData16_5-0\">\u2191<\/a><\/span> <span class=\"reference-text\">Federer, L.M.; Lu, Y.L.; Joubert, D.J. (2016). <a rel=\"nofollow\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4722643\">\"Data literacy training needs of biomedical researchers\"<\/a>. <i>Journal of the Medical Library Association<\/i> <b>104<\/b> (1): 52\u20137. doi:<a rel=\"nofollow\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.3163%2F1536-5050.104.1.008\">10.3163\/1536-5050.104.1.008<\/a>. PMC <a rel=\"nofollow\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC4722643\/\">PMC4722643<\/a>. PMID <a rel=\"nofollow\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/26807053\">26807053<\/a>.<\/span>\n<\/li>\n<li id=\"cite_note-ScaramozzinoAStudy12-6\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-ScaramozzinoAStudy12_6-0\">6.0<\/a><\/sup> <sup><a href=\"#cite_ref-ScaramozzinoAStudy12_6-1\">6.1<\/a><\/sup><\/span> <span class=\"reference-text\">Scaramozzino, J.M.; Ram\u00edrez, M.L.; McGaughey, K.J. (2012). \"A Study of Faculty Data Curation Behaviors and Attitudes at a Teaching-Centered University\". <i>College & Research Libraries<\/i> <b>73<\/b> (4): 349\u201365. doi:<a rel=\"nofollow\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.5860%2Fcrl-255\">10.5860\/crl-255<\/a>.<\/span>\n<\/li>\n<li id=\"cite_note-CarlsonDevelop13-7\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-CarlsonDevelop13_7-0\">\u2191<\/a><\/span> <span class=\"reference-text\">Carlson, J.; Johnston, L.; Westra, B.; Nichols, M. (2013). \"Developing an Approach for Data Management Education: A Report from the Data Information Literacy Project\". <i>International Journal of Digital Curation<\/i> <b>8<\/b> (1): 204\u201317. doi:<a rel=\"nofollow\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.2218%2Fijdc.v8i1.254\">10.2218\/ijdc.v8i1.254<\/a>.<\/span>\n<\/li>\n<li id=\"cite_note-MacMillanDevelop15-8\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-MacMillanDevelop15_8-0\">\u2191<\/a><\/span> <span class=\"reference-text\">MacMillan, D. (2015). \"Developing data literacy competencies to enhance faculty collaborations\". <i>LIBER Quarterly<\/i> <b>24<\/b> (3): 140\u201360. doi:<a rel=\"nofollow\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.18352%2Flq.9868\">10.18352\/lq.9868<\/a>.<\/span>\n<\/li>\n<li id=\"cite_note-WittenbergBuilding17-9\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-WittenbergBuilding17_9-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Wittenberg, J.; Elings, M. (2017). \"Building a Research Data Management Service at the University of California, Berkeley: A tale of collaboration\". <i>IFLA Journal<\/i> <b>43<\/b> (1): 89\u201397. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1177%2F0340035216686982\" data-key=\"15bbb0e3039958d8efcbaa456e8fd3b6\">10.1177\/0340035216686982<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Building+a+Research+Data+Management+Service+at+the+University+of+California%2C+Berkeley%3A+A+tale+of+collaboration&rft.jtitle=IFLA+Journal&rft.aulast=Wittenberg%2C+J.%3B+Elings%2C+M.&rft.au=Wittenberg%2C+J.%3B+Elings%2C+M.&rft.date=2017&rft.volume=43&rft.issue=1&rft.pages=89%E2%80%9397&rft_id=info:doi\/10.1177%2F0340035216686982&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-PiorunTeaching12-10\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-PiorunTeaching12_10-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Piorun, M.E.; Kafel, D.; Leger-Hornby, T. et al. (2012). \"Teaching Research Data Management: An Undergraduate\/Graduate Curriculum\". <i>Journal of eScience Librarianship<\/i> <b>1<\/b> (1): 8. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.7191%2Fjeslib.2012.1003\" data-key=\"8f09fcbd91e1afc2d1e0d6521e6c1528\">10.7191\/jeslib.2012.1003<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Teaching+Research+Data+Management%3A+An+Undergraduate%2FGraduate+Curriculum&rft.jtitle=Journal+of+eScience+Librarianship&rft.aulast=Piorun%2C+M.E.%3B+Kafel%2C+D.%3B+Leger-Hornby%2C+T.+et+al.&rft.au=Piorun%2C+M.E.%3B+Kafel%2C+D.%3B+Leger-Hornby%2C+T.+et+al.&rft.date=2012&rft.volume=1&rft.issue=1&rft.pages=8&rft_id=info:doi\/10.7191%2Fjeslib.2012.1003&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ReisnerMakingData14-11\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ReisnerMakingData14_11-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Reisner, B.A.; Vaughan, K.T.L.; Shorish, Y.L. (2014). \"Making Data Management Accessible in the Undergraduate Chemistry Curriculum\". <i>Journal of Chemical Education<\/i> <b>91<\/b> (11): 1943\u20136. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1021%2Fed500099h\" data-key=\"b2a4623eda9b7a39aa8a5e762baec90a\">10.1021\/ed500099h<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Making+Data+Management+Accessible+in+the+Undergraduate+Chemistry+Curriculum&rft.jtitle=Journal+of+Chemical+Education&rft.aulast=Reisner%2C+B.A.%3B+Vaughan%2C+K.T.L.%3B+Shorish%2C+Y.L.&rft.au=Reisner%2C+B.A.%3B+Vaughan%2C+K.T.L.%3B+Shorish%2C+Y.L.&rft.date=2014&rft.volume=91&rft.issue=11&rft.pages=1943%E2%80%936&rft_id=info:doi\/10.1021%2Fed500099h&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-AdamickData12-12\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-AdamickData12_12-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Adamick, J.; Reznik-Zellen, R.C.; Sheridan, M. (2013). \"Data Management Training for Graduate Students at a Large Research University\". <i>Journal of eScience Librarianship<\/i> <b>1<\/b> (3): e1022. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.7191%2Fjeslib.2012.1022\" data-key=\"36d3702a415d8f3a6791196ee361029c\">10.7191\/jeslib.2012.1022<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Data+Management+Training+for+Graduate+Students+at+a+Large+Research+University&rft.jtitle=Journal+of+eScience+Librarianship&rft.aulast=Adamick%2C+J.%3B+Reznik-Zellen%2C+R.C.%3B+Sheridan%2C+M.&rft.au=Adamick%2C+J.%3B+Reznik-Zellen%2C+R.C.%3B+Sheridan%2C+M.&rft.date=2013&rft.volume=1&rft.issue=3&rft.pages=e1022&rft_id=info:doi\/10.7191%2Fjeslib.2012.1022&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-FranssonDevelop16-13\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-FranssonDevelop16_13-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Fransson, J.; Lagunas, P.T.; Kjellberg, S.; Toit, M.D. (2016). \"Developing integrated research data management support in close relation to doctoral students' research practices\". <i>Proceedings of the Association for Information Science and Technology<\/i> <b>53<\/b> (1): 1\u20134. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1002%2Fpra2.2016.14505301094\" data-key=\"daf22eb7f24d20dd3d96ee5d12db2361\">10.1002\/pra2.2016.14505301094<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Developing+integrated+research+data+management+support+in+close+relation+to+doctoral+students%27+research+practices&rft.jtitle=Proceedings+of+the+Association+for+Information+Science+and+Technology&rft.aulast=Fransson%2C+J.%3B+Lagunas%2C+P.T.%3B+Kjellberg%2C+S.%3B+Toit%2C+M.D.&rft.au=Fransson%2C+J.%3B+Lagunas%2C+P.T.%3B+Kjellberg%2C+S.%3B+Toit%2C+M.D.&rft.date=2016&rft.volume=53&rft.issue=1&rft.pages=1%E2%80%934&rft_id=info:doi\/10.1002%2Fpra2.2016.14505301094&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ClementTeam17-14\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ClementTeam17_14-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Clement, R.; Blau, A.; Abbaspour, P. et al. (2017). \"Team-based data management instruction at small liberal arts colleges\". <i>IFLA Journal<\/i> <b>43<\/b> (1): 105\u201318. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1177%2F0340035216678239\" data-key=\"fcc4e9d5bb50807862eb53a509175051\">10.1177\/0340035216678239<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Team-based+data+management+instruction+at+small+liberal+arts+colleges&rft.jtitle=IFLA+Journal&rft.aulast=Clement%2C+R.%3B+Blau%2C+A.%3B+Abbaspour%2C+P.+et+al.&rft.au=Clement%2C+R.%3B+Blau%2C+A.%3B+Abbaspour%2C+P.+et+al.&rft.date=2017&rft.volume=43&rft.issue=1&rft.pages=105%E2%80%9318&rft_id=info:doi\/10.1177%2F0340035216678239&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-JohnstonSteal14-15\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-JohnstonSteal14_15-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Johnston, L.; Jeffryes, J. (2014). \"Steal this idea: A library instructors\u2019 guide to educating students in data management skills\". <i>College & Research Libraries News<\/i> <b>75<\/b> (8): 431\u20134. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.5860%2Fcrln.75.8.9175\" data-key=\"630f7d4498f8aeafbe27a27fa165dec3\">10.5860\/crln.75.8.9175<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Steal+this+idea%3A+A+library+instructors%E2%80%99+guide+to+educating+students+in+data+management+skills&rft.jtitle=College+%26+Research+Libraries+News&rft.aulast=Johnston%2C+L.%3B+Jeffryes%2C+J.&rft.au=Johnston%2C+L.%3B+Jeffryes%2C+J.&rft.date=2014&rft.volume=75&rft.issue=8&rft.pages=431%E2%80%934&rft_id=info:doi\/10.5860%2Fcrln.75.8.9175&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-JohnstonTrain12-16\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-JohnstonTrain12_16-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Johnston, L.; Lafferty, M.; Petsan, B. (2012). \"Training Researchers on Data Management: A Scalable, Cross-Disciplinary Approach\". <i>Journal of eScience Librarianship<\/i> <b>1<\/b> (2): 2. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.7191%2Fjeslib.2012.1012\" data-key=\"95ee72ec2b0ac3690caefdc1d0dec6b6\">10.7191\/jeslib.2012.1012<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Training+Researchers+on+Data+Management%3A+A+Scalable%2C+Cross-Disciplinary+Approach&rft.jtitle=Journal+of+eScience+Librarianship&rft.aulast=Johnston%2C+L.%3B+Lafferty%2C+M.%3B+Petsan%2C+B.&rft.au=Johnston%2C+L.%3B+Lafferty%2C+M.%3B+Petsan%2C+B.&rft.date=2012&rft.volume=1&rft.issue=2&rft.pages=2&rft_id=info:doi\/10.7191%2Fjeslib.2012.1012&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MuilenburgLessons14-17\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-MuilenburgLessons14_17-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Muilenburg, J.; Lebow, M.; Rich, J. (2014). \"Lessons Learned From a Research Data Management Pilot Course at an Academic Library\". <i>Journal of eScience Librarianship<\/i> <b>3<\/b> (1): 8. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.7191%2Fjeslib.2014.1058\" data-key=\"2224e6fb1ca970a25b804bd21df3b2e7\">10.7191\/jeslib.2014.1058<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Lessons+Learned+From+a+Research+Data+Management+Pilot+Course+at+an+Academic+Library&rft.jtitle=Journal+of+eScience+Librarianship&rft.aulast=Muilenburg%2C+J.%3B+Lebow%2C+M.%3B+Rich%2C+J.&rft.au=Muilenburg%2C+J.%3B+Lebow%2C+M.%3B+Rich%2C+J.&rft.date=2014&rft.volume=3&rft.issue=1&rft.pages=8&rft_id=info:doi\/10.7191%2Fjeslib.2014.1058&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SouthallTrain17-18\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-SouthallTrain17_18-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Southall, J.; Scutt, C. (2017). \"Training for Research Data Management at the Bodleian Libraries: National Contexts and Local Implementation for Researchers and Librarians\". <i>New Review of Academic Librarianship<\/i> <b>23<\/b> (2\u20133): 303\u201322.
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1080%2F13614533.2017.1318766\" data-key=\"ee9e3ae82379ce70a24f4eadf537b7cf\">10.1080\/13614533.2017.1318766<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Training+for+Research+Data+Management+at+the+Bodleian+Libraries%3A+National+Contexts+and+Local+Implementation+for+Researchers+and+Librarians&rft.jtitle=New+Review+of+Academic+Librarianship&rft.aulast=Southall%2C+J.+Scutt%2C+C.&rft.au=Southall%2C+J.+Scutt%2C+C.&rft.date=2017&rft.volume=23&rft.issue=2%E2%80%933&rft.pages=303%E2%80%9322&rft_id=info:doi\/10.1080%2F13614533.2017.1318766&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-TammaroResearch14-19\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-TammaroResearch14_19-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Tammaro, A.M.; Casarosa, V. (2014). \"Research Data Management in the Curriculum: An Interdisciplinary Approach\". <i>Procedia Computer Science<\/i> <b>38<\/b>: 138\u201342. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1016%2Fj.procs.2014.10.023\" data-key=\"1ac47d09d6c0d85a44b2a3b28086e76d\">10.1016\/j.procs.2014.10.023<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Research+Data+Management+in+the+Curriculum%3A+An+Interdisciplinary+Approach&rft.jtitle=Procedia+Computer+Science&rft.aulast=Tammaro%2C+A.M.%3B+Casarosa%2C+V.&rft.au=Tammaro%2C+A.M.%3B+Casarosa%2C+V.&rft.date=2014&rft.volume=38&rft.pages=138%E2%80%9342&rft_id=info:doi\/10.1016%2Fj.procs.2014.10.023&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-VerbakelEssentials16-20\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-VerbakelEssentials16_20-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Verbakel, E.; Grootveld, M. (2016). \"\u2018Essentials 4 Data Support\u2019: Five years\u2019 experience with data management training\". <i>IFLA Journal<\/i> <b>42<\/b> (4): 278\u201383. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1177%2F0340035216674027\" data-key=\"cdd177843daf468b9303554085cd2ef8\">10.1177\/0340035216674027<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=%E2%80%98Essentials+4+Data+Support%E2%80%99%3A+Five+years%E2%80%99+experience+with+data+management+training&rft.jtitle=IFLA+Journal&rft.aulast=Verbakel%2C+E.%3B+Grootveld%2C+M.&rft.au=Verbakel%2C+E.%3B+Grootveld%2C+M.&rft.date=2016&rft.volume=42&rft.issue=4&rft.pages=278%E2%80%9383&rft_id=info:doi\/10.1177%2F0340035216674027&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DeBoseInfo17-21\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-DeBoseInfo17_21-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">DeBose, K.G.; Haugen, I.; Miller, R.K. (2017). \"Information Literacy Instruction Programs: Supporting the College of Agriculture and Life Sciences Community at Virginia Tech\". <i>Library Trends<\/i> <b>65<\/b> (3): 316\u201338. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1353%2Flib.2017.0004\" data-key=\"85f3ce457d7525bfd364f581a837bcb5\">10.1353\/lib.2017.0004<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Information+Literacy+Instruction+Programs%3A+Supporting+the+College+of+Agriculture+and+Life+Sciences+Community+at+Virginia+Tech&rft.jtitle=Library+Trends&rft.aulast=DeBose%2C+K.G.%3B+Haugen%2C+I.%3B+Miller%2C+R.K.&rft.au=DeBose%2C+K.G.%3B+Haugen%2C+I.%3B+Miller%2C+R.K.&rft.date=2017&rft.volume=65&rft.issue=3&rft.pages=316%E2%80%9338&rft_id=info:doi\/10.1353%2Flib.2017.0004&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-FongRequired15-22\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-FongRequired15_22-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Fong, B.L.; Wang, M. (2015). \"Required Data Management Training for Graduate Students in an Earth and Environmental Sciences Department\". <i>Journal of eScience Librarianship<\/i> <b>4<\/b> (1): 3. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.7191%2Fjeslib.2015.1067\" data-key=\"16022e53f256c9e2da57a4ed6fd684e7\">10.7191\/jeslib.2015.1067<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Required+Data+Management+Training+for+Graduate+Students+in+an+Earth+and+Environmental+Sciences+Department&rft.jtitle=Journal+of+eScience+Librarianship&rft.aulast=Fong%2C+B.L.%3B+Wang%2C+M.&rft.au=Fong%2C+B.L.%3B+Wang%2C+M.&rft.date=2015&rft.volume=4&rft.issue=1&rft.pages=3&rft_id=info:doi\/10.7191%2Fjeslib.2015.1067&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HouMeet15-23\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-HouMeet15_23-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Hou, C.-Y. (2015). \"Meeting the Needs of Data Management Training: The Federation of Earth Science Information Partners (ESIP) Data Management for Scientists Short Course\". <i>Issues in Science & Technology Librarianship<\/i> <b>Spring 2015<\/b> (80). 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.5062%2FF42805MM\" data-key=\"8ca34abcbb2bff220d25f262133c649f\">10.5062\/F42805MM<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Meeting+the+Needs+of+Data+Management+Training%3A+The+Federation+of+Earth+Science+Information+Partners+%28ESIP%29+Data+Management+for+Scientists+Short+Course&rft.jtitle=Issues+in+Science+%26+Technology+Librarianship&rft.aulast=Hou%2C+C.-Y.&rft.au=Hou%2C+C.-Y.&rft.date=2015&rft.volume=Spring+2015&rft.issue=80&rft_id=info:doi\/10.5062%2FF42805MM&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ThielenAdvancing17-24\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ThielenAdvancing17_24-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Thielen, J.; Hess, A.N. (2017). \"Advancing Research Data Management in the Social Sciences: Implementing Instruction for Education Graduate Students Into a Doctoral Curriculum\". <i>Behavioral & Social Sciences Librarian<\/i> <b>36<\/b> (1). 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1080%2F01639269.2017.1387739\" data-key=\"30e3aa2fa60b6031738db5428e241407\">10.1080\/01639269.2017.1387739<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Advancing+Research+Data+Management+in+the+Social+Sciences%3A+Implementing+Instruction+for+Education+Graduate+Students+Into+a+Doctoral+Curriculum&rft.jtitle=Behavioral+%26+Social+Sciences+Librarian&rft.aulast=Thielen%2C+J.%3B+Hess%2C+A.N.&rft.au=Thielen%2C+J.%3B+Hess%2C+A.N.&rft.date=2017&rft.volume=36&rft.issue=1&rft_id=info:doi\/10.1080%2F01639269.2017.1387739&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DresselResearch17-25\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-DresselResearch17_25-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Dressel, W.F. (2017). \"Research Data Management Instruction for Digital Humanities\". <i>Journal of eScience Librarianship<\/i> <b>6<\/b> (2): 5. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.7191%2Fjeslib.2017.1115\" data-key=\"c1d7853a637f6a6711ff2b9c5a1b2594\">10.7191\/jeslib.2017.1115<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Research+Data+Management+Instruction+for+Digital+Humanities&rft.jtitle=Journal+of+eScience+Librarianship&rft.aulast=Dressel%2C+W.F.&rft.au=Dressel%2C+W.F.&rft.date=2017&rft.volume=6&rft.issue=2&rft.pages=5&rft_id=info:doi\/10.7191%2Fjeslib.2017.1115&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BrulandInterop12-26\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-BrulandInterop12_26-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Bruland, P.; Breil, B.; Ritz, F.; Dugas, M. (2012). \"Interoperability in clinical research: from metadata registries to semantically annotated CDISC ODM\". <i>Studies in Health Technology and Informatics<\/i> <b>180<\/b>: 564\u20138.
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/22874254\" data-key=\"e2f2fc16f7149aaab8ea8613e360cb29\">22874254<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Interoperability+in+clinical+research%3A+from+metadata+registries+to+semantically+annotated+CDISC+ODM&rft.jtitle=Studies+in+Health+Technology+and+Informatics&rft.aulast=Bruland%2C+P.%3B+Breil%2C+B.%3B+Ritz%2C+F.%3B+Duggas%2C+M.&rft.au=Bruland%2C+P.%3B+Breil%2C+B.%3B+Ritz%2C+F.%3B+Duggas%2C+M.&rft.date=2012&rft.volume=180&rft.pages=564%E2%80%938&rft_id=info:pmid\/22874254&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-GaddaleClinical15-27\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-GaddaleClinical15_27-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Gaddale, J.R. (2015). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4640009\" data-key=\"4b8b79e53d650dc3202a6e4cea8e0b56\">\"Clinical Data Acquisition Standards Harmonization importance and benefits in clinical data management\"<\/a>. <i>Perspectives in Clinical Research<\/i> <b>6<\/b> (4): 179\u201383. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.4103%2F2229-3485.167101\" data-key=\"7fa4b33a4d8ed85044befd2b088d4c3b\">10.4103\/2229-3485.167101<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC4640009\/\" data-key=\"e5c5d3bde76dd2e7152a0fea844da20b\">PMC4640009<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/26623387\" data-key=\"4392e6e0b74f4198d0c381d283d0076a\">26623387<\/a><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4640009\" data-key=\"4b8b79e53d650dc3202a6e4cea8e0b56\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4640009<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Clinical+Data+Acquisition+Standards+Harmonization+importance+and+benefits+in+clinical+data+management&rft.jtitle=Perspectives+in+Clinical+Research&rft.aulast=Gaddale%2C+J.R.&rft.au=Gaddale%2C+J.R.&rft.date=2015&rft.volume=6&rft.issue=4&rft.pages=179%E2%80%9383&rft_id=info:doi\/10.4103%2F2229-3485.167101&rft_id=info:pmc\/PMC4640009&rft_id=info:pmid\/26623387&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC4640009&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-KrishnankuttyData12-28\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-KrishnankuttyData12_28-0\">28.0<\/a><\/sup> <sup><a href=\"#cite_ref-KrishnankuttyData12_28-1\">28.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Krishnankutty, B.; Bellary, S.; Kumar, N.B. (2012). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3326906\" data-key=\"f6f03e7cf1d24f5d88ec65b208e7a681\">\"Data management in clinical research: An overview\"<\/a>. <i>Indian Journal of Pharmacology<\/i> <b>44<\/b> (2): 168\u201372. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.4103%2F0253-7613.93842\" data-key=\"1d4528b2b5107a5988ab33fc715379c2\">10.4103\/0253-7613.93842<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3326906\/\" data-key=\"9ed0c0941bef1d87ae6aef02b90df106\">PMC3326906<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/22529469\" data-key=\"3d2e41dcbf1c884787ce79a5557a071f\">22529469<\/a><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3326906\" data-key=\"f6f03e7cf1d24f5d88ec65b208e7a681\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3326906<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Data+management+in+clinical+research%3A+An+overview&rft.jtitle=Indian+Journal+of+Pharmacology&rft.aulast=Krishnankutty%2C+B.%3B+Bellary%2C+S.%3B+Kumar%2C+N.B.&rft.au=Krishnankutty%2C+B.%3B+Bellary%2C+S.%3B+Kumar%2C+N.B.&rft.date=2012&rft.volume=44&rft.issue=2&rft.pages=168%E2%80%9372&rft_id=info:doi\/10.4103%2F0253-7613.93842&rft_id=info:pmc\/PMC3326906&rft_id=info:pmid\/22529469&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3326906&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-LerouxTowards17-29\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-LerouxTowards17_29-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Leroux, H.; Metke-Jimenez, A.; Lawley, M.J. (2017). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5606031\" data-key=\"b336f2ce41b384a55e9c8622faa04f16\">\"Towards achieving semantic interoperability of clinical study data with FHIR\"<\/a>. <i>Journal of Biomedical Semantics<\/i> <b>8<\/b> (1): 41. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1186%2Fs13326-017-0148-7\" data-key=\"c3e752578427d9d3d487fb2d705c0e07\">10.1186\/s13326-017-0148-7<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC5606031\/\" data-key=\"b47166a993add252bd8dba381d4bde50\">PMC5606031<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/28927443\" data-key=\"92f611693117e06f156e1496a7162588\">28927443<\/a><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5606031\" data-key=\"b336f2ce41b384a55e9c8622faa04f16\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5606031<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Towards+achieving+semantic+interoperability+of+clinical+study+data+with+FHIR&rft.jtitle=Journal+of+Biomedical+Semantics&rft.aulast=Leroux%2C+H.%3B+Metke-Jimenez%2C+A.%3B+Lawley%2C+M.J.&rft.au=Leroux%2C+H.%3B+Metke-Jimenez%2C+A.%3B+Lawley%2C+M.J.&rft.date=2017&rft.volume=8&rft.issue=1&rft.pages=41&rft_id=info:doi\/10.1186%2Fs13326-017-0148-7&rft_id=info:pmc\/PMC5606031&rft_id=info:pmid\/28927443&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC5606031&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ArthoferData17-30\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ArthoferData17_30-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Arthofer, K.; Girardi, D. (2017). \"Data Quality- and Master Data Management - A Hospital Case\". <i>Studies in Health Technology and Informatics<\/i> <b>236<\/b>: 259\u201366. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.3233%2F978-1-61499-759-7-259\" data-key=\"f60548fd70c37115ae101237224ed991\">10.3233\/978-1-61499-759-7-259<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/28508805\" data-key=\"8feb309e0bf3d8f213435fd36dfbf460\">28508805<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Data+Quality-+and+Master+Data+Management+-+A+Hospital+Case&rft.jtitle=Studies+in+Health+Technology+and+Informatics&rft.aulast=Arthofer%2C+K.%3B+Girardi%2C+D.&rft.au=Arthofer%2C+K.%3B+Girardi%2C+D.&rft.date=2017&rft.volume=236&rft.pages=259%E2%80%9366&rft_id=info:doi\/10.3233%2F978-1-61499-759-7-259&rft_id=info:pmid\/28508805&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-CallahanRepo17-31\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-CallahanRepo17_31-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Callahan, T.; Barnard, J.; Helmkamp, L. et al. (2017). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5982990\" data-key=\"1e9e50e35c523b00c89ddeec3e7c9107\">\"Reporting Data Quality Assessment Results: Identifying Individual and Organizational Barriers and Solutions\"<\/a>. <i>EGEMS<\/i> <b>5<\/b> (1): 16. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.5334%2Fegems.214\" data-key=\"26227e7a200127b484166951b735812f\">10.5334\/egems.214<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC5982990\/\" data-key=\"99854d6ef72cabf32d5de36120a1ff27\">PMC5982990<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/29881736\" data-key=\"5a929805b134fb97dc67958ad112f579\">29881736<\/a><span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5982990\" data-key=\"1e9e50e35c523b00c89ddeec3e7c9107\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5982990<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Reporting+Data+Quality+Assessment+Results%3A+Identifying+Individual+and+Organizational+Barriers+and+Solutions&rft.jtitle=EGEMS&rft.aulast=Callahan%2C+T.%3B+Barnard%2C+J.%3B+Helmkamp%2C+L.+et+al.&rft.au=Callahan%2C+T.%3B+Barnard%2C+J.%3B+Helmkamp%2C+L.+et+al.&rft.date=2017&rft.volume=5&rft.issue=1&rft.pages=16&rft_id=info:doi\/10.5334%2Fegems.214&rft_id=info:pmc\/PMC5982990&rft_id=info:pmid\/29881736&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC5982990&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HoustonExplor18-32\"><span class=\"mw-cite-backlink\"><a 
href=\"#cite_ref-HoustonExplor18_32-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Houston, L.; Probst, Y.; Yu, P.; Martin, A. (2018). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5801732\" data-key=\"22f00acffa9ca5be0183dbc8e45c8e19\">\"Exploring Data Quality Management within Clinical Trials\"<\/a>. <i>Applied Clinical Informatics<\/i> <b>9<\/b> (1): 72\u201381. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1055%2Fs-0037-1621702\" data-key=\"507d394998f0f94cd4fca92aef57c01e\">10.1055\/s-0037-1621702<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC5801732\/\" data-key=\"31fb74efd6950cc765c88cbd77e3f4f1\">PMC5801732<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/29388180\" data-key=\"1857cda91c86e36e94ff9d2cddec9425\">29388180<\/a><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5801732\" data-key=\"22f00acffa9ca5be0183dbc8e45c8e19\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5801732<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Exploring+Data+Quality+Management+within+Clinical+Trials&rft.jtitle=Applied+Clinical+Informatics&rft.aulast=Houston%2C+L.%3B+Probst%2C+Y.%3B+Yu%2C+P.%3B+Martin%2C+A.&rft.au=Houston%2C+L.%3B+Probst%2C+Y.%3B+Yu%2C+P.%3B+Martin%2C+A.&rft.date=2018&rft.volume=9&rft.issue=1&rft.pages=72%E2%80%9381&rft_id=info:doi\/10.1055%2Fs-0037-1621702&rft_id=info:pmc\/PMC5801732&rft_id=info:pmid\/29388180&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC5801732&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-TeunenbroekTowards17-33\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-TeunenbroekTowards17_33-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Teunenbroek, T.V.; Baker, J.; Dijkzeul, A. (2017). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5735801\" data-key=\"41967d4fd305589c6fc17055dafaefb7\">\"Towards a more effective and efficient governance and regulation of nanomaterials\"<\/a>. <i>Particle and Fibre Toxicology<\/i> <b>14<\/b> (1): 54. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1186%2Fs12989-017-0235-z\" data-key=\"532919582d1220ac5011c418e5b67d23\">10.1186\/s12989-017-0235-z<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC5735801\/\" data-key=\"c39f44c07d9d88eb43de1861b406f97a\">PMC5735801<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/29258600\" data-key=\"2f058aa104d75cbc0bf4776cb621992c\">29258600<\/a><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5735801\" data-key=\"41967d4fd305589c6fc17055dafaefb7\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5735801<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Towards+a+more+effective+and+efficient+governance+and+regulation+of+nanomaterials&rft.jtitle=Particle+and+Fibre+Toxicology&rft.aulast=Teunenbroek%2C+T.V.%3B+Baker%2C+J.%3B+Dijkzeul%2C+A.&rft.au=Teunenbroek%2C+T.V.%3B+Baker%2C+J.%3B+Dijkzeul%2C+A.&rft.date=2017&rft.volume=14&rft.issue=1&rft.pages=54&rft_id=info:doi\/10.1186%2Fs12989-017-0235-z&rft_id=info:pmc\/PMC5735801&rft_id=info:pmid\/29258600&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC5735801&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-OhmannSharing17-34\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-OhmannSharing17_34-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Ohmann, C.; Banzi, R.; Canham, S. et al. (2017). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5736032\" data-key=\"d83bee5ebe09d43024072cdaa93df3fd\">\"Sharing and reuse of individual participant data from clinical trials: Principles and recommendations\"<\/a>. <i>BMJ Open<\/i> <b>7<\/b> (12): e018647. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1136%2Fbmjopen-2017-018647\" data-key=\"4946b47e8e9b5646519db77a3a59cd89\">10.1136\/bmjopen-2017-018647<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC5736032\/\" data-key=\"1ad902883ee93ddb2c5896a3cf8db0e5\">PMC5736032<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/29247106\" data-key=\"b12a5595a735195a22f7f14a0a42aaa8\">29247106<\/a><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5736032\" data-key=\"d83bee5ebe09d43024072cdaa93df3fd\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5736032<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Sharing+and+reuse+of+individual+participant+data+from+clinical+trials%3A+Principles+and+recommendations&rft.jtitle=BMJ+Open&rft.aulast=Ohmann%2C+C.%3B+Banzi%2C+R.%3B+Canham%2C+S.+et+al.&rft.au=Ohmann%2C+C.%3B+Banzi%2C+R.%3B+Canham%2C+S.+et+al.&rft.date=2017&rft.volume=7&rft.issue=12&rft.pages=e018647&rft_id=info:doi\/10.1136%2Fbmjopen-2017-018647&rft_id=info:pmc\/PMC5736032&rft_id=info:pmid\/29247106&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC5736032&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-PolancichBuild18-35\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-PolancichBuild18_35-0\">35.0<\/a><\/sup> <sup><a href=\"#cite_ref-PolancichBuild18_35-1\">35.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Polancich, S.; James, D.H.; Miltner, R.S. et al. (2018). \"Building DNP Essential Skills in Clinical Data Management and Analysis\". <i>Nurse Educator<\/i> <b>43<\/b> (1): 37\u201341. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1097%2FNNE.0000000000000411\" data-key=\"e878f3be6aae79aa674a7c09d1caecde\">10.1097\/NNE.0000000000000411<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/28665824\" data-key=\"a2d1e62ded1bba10248eecb67908af14\">28665824<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Building+DNP+Essential+Skills+in+Clinical+Data+Management+and+Analysis&rft.jtitle=Nurse+Educator&rft.aulast=Polancich%2C+S.%3B+James%2C+D.H.%3B+Miltner%2C+R.S.+et+al.&rft.au=Polancich%2C+S.%3B+James%2C+D.H.%3B+Miltner%2C+R.S.+et+al.&rft.date=2018&rft.volume=43&rft.issue=1&rft.pages=37%E2%80%9341&rft_id=info:doi\/10.1097%2FNNE.0000000000000411&rft_id=info:pmid\/28665824&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SirgoValid18-36\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-SirgoValid18_36-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Sirgo, G.; Esteban, F.; G\u00f3mez, J. et al. (2018). \"Validation of the ICU-DaMa tool for automatically extracting variables for minimum dataset and quality indicators: The importance of data quality assessment\". <i>International Journal of Medical Informatics<\/i> <b>112<\/b>: 166\u201372. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1016%2Fj.ijmedinf.2018.02.007\" data-key=\"f792f9a242cf758c97b882cb48f56da4\">10.1016\/j.ijmedinf.2018.02.007<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/29500016\" data-key=\"8bf39ece1315889416bdf0d2e07fb650\">29500016<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Validation+of+the+ICU-DaMa+tool+for+automatically+extracting+variables+for+minimum+dataset+and+quality+indicators%3A+The+importance+of+data+quality+assessment&rft.jtitle=International+Journal+of+Medical+Informatics&rft.aulast=Sirgo%2C+G.%3B+Esteban%2C+F.%3B+G%C3%B3mez%2C+J.+et+al.&rft.au=Sirgo%2C+G.%3B+Esteban%2C+F.%3B+G%C3%B3mez%2C+J.+et+al.&rft.date=2018&rft.volume=112&rft.pages=166%E2%80%9372&rft_id=info:doi\/10.1016%2Fj.ijmedinf.2018.02.007&rft_id=info:pmid\/29500016&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SCDM_GCDMP-37\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-SCDM_GCDMP_37-0\">37.0<\/a><\/sup> <sup><a href=\"#cite_ref-SCDM_GCDMP_37-1\">37.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.scdm.org\/publications\/gcdmp\/\" data-key=\"2387d37b806a1106d35cb0d15baef1be\">\"GCDMP\"<\/a>. 
Society for Clinical Data Management. 2017<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.scdm.org\/publications\/gcdmp\/\" data-key=\"2387d37b806a1106d35cb0d15baef1be\">https:\/\/www.scdm.org\/publications\/gcdmp\/<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=GCDMP&rft.atitle=&rft.date=2017&rft.pub=Society+for+Clinical+Data+Management&rft_id=https%3A%2F%2Fwww.scdm.org%2Fpublications%2Fgcdmp%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SylviaAnApproach14-38\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-SylviaAnApproach14_38-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Sylvia, M.; Terhaar, M. (2014). \"An approach to clinical data management for the doctor of nursing practice curriculum\". <i>Journal of Professional Nursing<\/i> <b>31<\/b> (1): 56-62. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1016%2Fj.profnurs.2013.04.002\" data-key=\"266fe729fd673a9413dcb176043b28b3\">10.1016\/j.profnurs.2013.04.002<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/24503316\" data-key=\"da3f6abc4a9f0f50a567e707773ddd4b\">24503316<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=An+approach+to+clinical+data+management+for+the+doctor+of+nursing+practice+curriculum&rft.jtitle=Journal+of+Professional+Nursing&rft.aulast=Sylvia%2C+M.%3B+Terhaar%2C+M.&rft.au=Sylvia%2C+M.%3B+Terhaar%2C+M.&rft.date=2014&rft.volume=31&rft.issue=1&rft.pages=56-62&rft_id=info:doi\/10.1016%2Fj.profnurs.2013.04.002&rft_id=info:pmid\/24503316&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ReadImprov17-39\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ReadImprov17_39-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Read, K.B.; LaPolla, F.W.; Tolea, M.I. et al. (2017). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5370608\" data-key=\"9b84da132f371498ef314f4a1b4805eb\">\"Improving data collection, documentation, and workflow in a dementia screening study\"<\/a>. <i>JMLA<\/i> <b>105<\/b> (2): 160\u201366. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.5195%2Fjmla.2017.221\" data-key=\"2be7c91c3bb740ac5aadd19b9e195f9a\">10.5195\/jmla.2017.221<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC5370608\/\" data-key=\"7f0bec9255005fcb92d97712251f2a1a\">PMC5370608<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/28377680\" data-key=\"992677a7f1e78a6e35eda803280f5039\">28377680<\/a><span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5370608\" data-key=\"9b84da132f371498ef314f4a1b4805eb\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5370608<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Improving+data+collection%2C+documentation%2C+and+workflow+in+a+dementia+screening+study&rft.jtitle=JMLA&rft.aulast=Read%2C+K.B.%3B+LaPolla%2C+F.W.%3B+Tolea%2C+M.I.+et+al.&rft.au=Read%2C+K.B.%3B+LaPolla%2C+F.W.%3B+Tolea%2C+M.I.+et+al.&rft.date=2017&rft.volume=105&rft.issue=2&rft.pages=160%E2%80%9366&rft_id=info:doi\/10.5195%2Fjmla.2017.221&rft_id=info:pmc\/PMC5370608&rft_id=info:pmid\/28377680&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC5370608&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ReadANewHat18-40\"><span class=\"mw-cite-backlink\"><a 
href=\"#cite_ref-ReadANewHat18_40-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Read, K.; LaPolla, F.W.Z. (2018). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5764577\" data-key=\"8ebf804dd867741265e06a939a803933\">\"A new hat for librarians: Providing REDCap support to establish the library as a central data hub\"<\/a>. <i>JMLA<\/i> <b>106<\/b> (1): 120\u201326. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.5195%2Fjmla.2018.327\" data-key=\"7f670a1bffb7ca411dcc6c23552559eb\">10.5195\/jmla.2018.327<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC5764577\/\" data-key=\"56f05e9a33cb9932464dcb8e139d828b\">PMC5764577<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/29339942\" data-key=\"ca5afe8b17dc3225a4d7ce522dc7b2ab\">29339942<\/a><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5764577\" data-key=\"8ebf804dd867741265e06a939a803933\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5764577<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+new+hat+for+librarians%3A+Providing+REDCap+support+to+establish+the+library+as+a+central+data+hub&rft.jtitle=JMLA&rft.aulast=Read%2C+K.%3B+LaPolla%2C+F.W.Z.&rft.au=Read%2C+K.%3B+LaPolla%2C+F.W.Z.&rft.date=2018&rft.volume=106&rft.issue=1&rft.pages=120%E2%80%9326&rft_id=info:doi\/10.5195%2Fjmla.2018.327&rft_id=info:pmc\/PMC5764577&rft_id=info:pmid\/29339942&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC5764577&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DudaData17-41\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-DudaData17_41-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Duda, S.; Harris, P. (2017). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.coursera.org\/learn\/clinical-data-management\" data-key=\"d183b7e1a72944dffdd95c52a8f96314\">\"Data Management for Clinical Research\"<\/a>. Coursera, Inc<span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.coursera.org\/learn\/clinical-data-management\" data-key=\"d183b7e1a72944dffdd95c52a8f96314\">https:\/\/www.coursera.org\/learn\/clinical-data-management<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Data+Management+for+Clinical+Research&rft.atitle=&rft.aulast=Duda%2C+S.%3B+Harris%2C+P.&rft.au=Duda%2C+S.%3B+Harris%2C+P.&rft.date=2017&rft.pub=Coursera%2C+Inc&rft_id=https%3A%2F%2Fwww.coursera.org%2Flearn%2Fclinical-data-management&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-WaldenDevelop17-42\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-WaldenDevelop17_42-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Walden, A. (2017). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/portal.scdm.org\/node\/1006\" data-key=\"e9db10f9290240fbbfe7080b1d4864ec\">\"Developing Data Management Plans\"<\/a>. Society for Clinical Data Management<span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/portal.scdm.org\/node\/1006\" data-key=\"e9db10f9290240fbbfe7080b1d4864ec\">http:\/\/portal.scdm.org\/node\/1006<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Developing+Data+Management+Plans&rft.atitle=&rft.aulast=Walden%2C+A.&rft.au=Walden%2C+A.&rft.date=2017&rft.pub=Society+for+Clinical+Data+Management&rft_id=http%3A%2F%2Fportal.scdm.org%2Fnode%2F1006&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SurkisData17-43\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-SurkisData17_43-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Surkis, A.; LaPolla, F.W.; Contaxis, N.; Read, K.B. (2017). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5370612\" data-key=\"913ae3268bf7c0dc4c0dc706f3816fb7\">\"Data Day to Day: building a community of expertise to address data skills gaps in an academic medical center\"<\/a>. <i>JMLA<\/i> <b>105<\/b> (2): 185\u201391. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.5195%2Fjmla.2017.35\" data-key=\"763cbe58e46d38d966f955999461ffe1\">10.5195\/jmla.2017.35<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC5370612\/\" data-key=\"93e1b9cf1608821ecc96488f819398f9\">PMC5370612<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/28377684\" data-key=\"0f11618a891ea2421040df0f1c158aad\">28377684<\/a><span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5370612\" data-key=\"913ae3268bf7c0dc4c0dc706f3816fb7\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5370612<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Data+Day+to+Day%3A+building+a+community+of+expertise+to+address+data+skills+gaps+in+an+academic+medical+center&rft.jtitle=JMLA&rft.aulast=Surkis%2C+A.%3B+LaPolla%2C+F.W.%3B+Contaxis%2C+N.%3B+Read%2C+K.B.&rft.au=Surkis%2C+A.%3B+LaPolla%2C+F.W.%3B+Contaxis%2C+N.%3B+Read%2C+K.B.&rft.date=2017&rft.volume=105&rft.issue=2&rft.pages=185%E2%80%9391&rft_id=info:doi\/10.5195%2Fjmla.2017.35&rft_id=info:pmc\/PMC5370612&rft_id=info:pmid\/28377684&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC5370612&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-StuckeyTheSecond15-44\"><span class=\"mw-cite-backlink\"><a 
href=\"#cite_ref-StuckeyTheSecond15_44-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Stuckey, H. (2015). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/go.galegroup.com\/ps\/anonymous?id=GALE%7CA383423301\" data-key=\"eb1c5a1d1f36f39929a7c53cd63f7e3b\">\"The second step in data analysis: Coding qualitative research data\"<\/a>. <i>Journal of Social Health and Diabetes<\/i> <b>3<\/b> (1): 7<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/go.galegroup.com\/ps\/anonymous?id=GALE%7CA383423301\" data-key=\"eb1c5a1d1f36f39929a7c53cd63f7e3b\">http:\/\/go.galegroup.com\/ps\/anonymous?id=GALE%7CA383423301<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=The+second+step+in+data+analysis%3A+Coding+qualitative+research+data&rft.jtitle=Journal+of+Social+Health+and+Diabetes&rft.aulast=Stuckey%2C+H.&rft.au=Stuckey%2C+H.&rft.date=2015&rft.volume=3&rft.issue=1&rft.pages=7&rft_id=http%3A%2F%2Fgo.galegroup.com%2Fps%2Fanonymous%3Fid%3DGALE%257CA383423301&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BardynHealth18-45\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-BardynHealth18_45-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Bardyn, T.P.; Patridge, E.F.; Moore, M.T.; Koh, J.J. (2018). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC6124496\" data-key=\"fe5c20b91850e64d34afd36f987a7f1a\">\"Health Sciences Libraries Advancing Collaborative Clinical Research Data Management in Universities\"<\/a>. 
<i>Journal of eScience Librarianship<\/i> <b>7<\/b> (2): e1130. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.7191%2Fjeslib.2018.1130\" data-key=\"991ce0ddde63f377b1394d268f82e6c0\">10.7191\/jeslib.2018.1130<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC6124496\/\" data-key=\"d3df240c9b368e5255c351c98b0278a6\">PMC6124496<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/30197832\" data-key=\"f2300332de793249552b689417581056\">30197832<\/a><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC6124496\" data-key=\"fe5c20b91850e64d34afd36f987a7f1a\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC6124496<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Health+Sciences+Libraries+Advancing+Collaborative+Clinical+Research+Data+Management+in+Universities&rft.jtitle=Journal+of+eScience+Librarianship&rft.aulast=Bardyn%2C+T.P.%3B+Patridge%2C+E.F.%3B+Moore%2C+M.T.%3B+Koh%2C+J.J.&rft.au=Bardyn%2C+T.P.%3B+Patridge%2C+E.F.%3B+Moore%2C+M.T.%3B+Koh%2C+J.J.&rft.date=2018&rft.volume=7&rft.issue=2&rft.pages=e1130&rft_id=info:doi\/10.7191%2Fjeslib.2018.1130&rft_id=info:pmc\/PMC6124496&rft_id=info:pmid\/30197832&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC6124496&rfr_id=info:sid\/en.wikipedia.org:Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<\/ol><\/div>\n<h2><span class=\"mw-headline\" id=\"Notes\">Notes<\/span><\/h2>\n<p>This presentation is faithful to the original, with only a few minor changes to presentation. 
In some cases important information was missing from the references, and that information was added.\n<\/p>\n<\/div><div class=\"printfooter\">Source: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center\">https:\/\/www.limswiki.org\/index.php\/Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center<\/a><\/div>\n\t\t\t\t\t\t\t\t\t\t<!-- end content -->\n\t\t\t\t\t\t\t\t\t\t<div class=\"visualClear\"><\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t\t<!-- end of the left (by default at least) column -->\n\t\t<div 
class=\"visualClear\"><\/div>\n\t\t\t\t\t\n\t\t<\/div>\n\t\t\n\n<\/body>","cb9038099fb8453d3ea802865335a88b_images":["https:\/\/www.limswiki.org\/images\/8\/8f\/Fig1_Read_JMedLibAssoc2019_107-1.gif","https:\/\/www.limswiki.org\/images\/c\/ca\/Fig2_Read_JMedLibAssoc2019_107-1.gif","https:\/\/www.limswiki.org\/images\/d\/d0\/Fig3_Read_JMedLibAssoc2019_107-1.gif"],"cb9038099fb8453d3ea802865335a88b_timestamp":1554145009,"ab125d6daef2f763e588fcd5432c1b66_type":"article","ab125d6daef2f763e588fcd5432c1b66_title":"Building a newborn screening information management system from theory to practice (Pluscauskas et al. 2019)","ab125d6daef2f763e588fcd5432c1b66_url":"https:\/\/www.limswiki.org\/index.php\/Journal:Building_a_newborn_screening_information_management_system_from_theory_to_practice","ab125d6daef2f763e588fcd5432c1b66_plaintext":"\n\n\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\n\t\t\t\tJournal:Building a newborn screening information management system from theory to practice\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\tFrom LIMSWiki\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\tJump to: navigation, search\n\n\t\t\t\t\t\n\t\t\t\t\tFull article title\n \nBuilding a newborn screening information management system from theory to practiceJournal\n \nInternational Journal of Neonatal ScreeningAuthor(s)\n \nPluscauskas, Michael; Henderson, Matthew; Milburn, Jennifer; Chakraborty, PraneshAuthor affiliation(s)\n \nChildren\u2019s Hospital of Eastern Ontario, University of OttawaPrimary contact\n \nEmail: mplus at cheo dot on dot caYear published\n \n2019Volume and issue\n \n5 (1)Page(s)\n \n9DOI\n \n10.3390\/ijns5010009ISSN\n \n2409-515XDistribution license\n \nCreative Commons Attribution 4.0 InternationalWebsite\n \nhttps:\/\/www.mdpi.com\/2409-515X\/5\/1\/9\/htmDownload\n \nhttps:\/\/www.mdpi.com\/2409-515X\/5\/1\/9\/pdf (PDF)\n\nContents\n\n1 Abstract \n2 Introduction \n3 Theory \n\n3.1 Outgrowing NSO's current SIMS \n3.2 Designing an \"ideal\" SIMS for NSO \n\n\n4 Practice \n\n4.1 Technical 
considerations \n4.2 User considerations \n4.3 Procurement considerations \n4.4 Implementation considerations \n4.5 Organizational and jurisdictional considerations \n4.6 Project benefits and impacts \n\n\n5 Conclusions \n6 Supplementary materials \n7 Acknowledgements \n\n7.1 Author contributions \n7.2 Funding \n7.3 Conflicts of interest \n\n\n8 References \n9 Notes \n\n\n\nAbstract \nInformation management systems are the central process management and communication hub for many newborn screening programs. In late 2014, Newborn Screening Ontario (NSO) undertook an end-to-end assessment of its information management needs, which resulted in a project to develop a flexible information systems (IS) ecosystem and related process changes. This enabled NSO to better manage its current and future workflow and communication needs. An idealized vision of a screening information management system (SIMS) was developed that was refined into enterprise and functional architectures. This was followed by the development of technical specifications, user requirements, and procurement. In undertaking a holistic full product lifecycle redesign approach, a number of change management challenges were faced by NSO across the entire program. Strong leadership support and full program engagement were key for overall project success. It is anticipated that improvements in program flexibility and the ability to innovate will outweigh the efforts and costs.\nKeywords: newborn screening, neonatal screening, laboratory information management system, laboratory information system, LIMS, LIS, screening information management\n\nIntroduction \nDating back to the early days of computers, a variety of software programs have been utilized to automate numerous laboratory workflows, processes, and related activities. 
In newborn screening labs, a screening information management system (SIMS)\u2014a set of integrated software components, not necessarily from a single vendor, that can be used to automate screening workflows\u2014has become a useful automation tool. In addition, a SIMS can be used to manage broader aspects of laboratory and program management, such as case reporting and follow-up.\nPopulation-based newborn screening began appearing in many North American and European countries over 50 years ago. Due to the low number of tests per patient, many programs were initially able to handle their testing and reporting functions using manual workflows, paper-based lab logs, and patient reports. Labs initially employed simple computer functions such as using word processing to create mailing lists\/labels and basic reports. Eventually, usage evolved into managing basic lab workflows and quality metrics using standard software such as spreadsheets and databases.\nIn the late 1990s and early 2000s, the shift to complex equipment such as tandem mass spectrometers (MS\/MS), which could test for dozens of target diseases and related follow-ups, necessitated the development of a more complex information technology infrastructure to manage newborn screening programs. This shift required the development of new, often integrated, software programs to manage the challenges that these changes created. Also, a number of new target disorders required very quick turnaround times (TATs) in order to locate infants who were at risk of early decompensation that could cause severe morbidity or even death. 
This meant that neonatal screening labs required information systems that could process large batches of results quickly in order to ensure that infants who tested positive for these disorders could be identified in a timely manner.\nThe current shift into new paradigms for newborn screening, including molecular (DNA-based) screening, is pushing the limits of the original software architectural approaches of these integrated packages. The ability of decision support tools such as Collaborative Laboratory Integrated Reports (CLIR)[1] to be linked, potentially in real time via application program interfaces (APIs)[2], also stretches the limits of current SIMS functionality. In addition, the potential of new point-of-care technologies and the possibility of screening for certain time-critical disorders prenatally have stretched these new paradigms even further. As these needs evolve, a more comprehensive information ecosystem approach to newborn screening SIMS architectures will be required to support the expanding needs of newborn screening.\nIn their paper \u201cThe Ideal Laboratory Information System,\u201d Sepulveda and Young[3] described a set of processes and technology modules that they considered key for the development of an \u201cideal\u201d comprehensive clinical laboratory information system (LIS). An approach to developing an \u201cideal\u201d SIMS in a newborn screening context is described in this paper. The experiences of Newborn Screening Ontario (NSO) are used to illustrate the various technical and administrative approaches that are involved in undertaking such a project. The potential benefits, impacts, and risks of taking this approach are discussed and key lessons are highlighted.\n\nTheory \nOutgrowing NSO's current SIMS \nNSO is located in the Canadian province of Ontario. Canada has a publicly funded health care system that is managed by its thirteen provinces and territories. 
Ontario is Canada\u2019s largest province, with a population of approximately 14 million people and an annual public healthcare budget of over $CDN 50 billion in 2017. NSO is a fully integrated program that manages all aspects of newborn screening and follow-up activities for children born in the province of Ontario (approximately 140,000 births per year). With the expansion of provincial newborn screening in 2006, the lab and program management was moved from the provincial public health branch to the Children's Hospital of Eastern Ontario (CHEO), a leading academic pediatric research hospital that is located in the city of Ottawa. NSO receives ongoing annual base funding from the province as well as project-based funding to implement new programs as needed.\nNSO performs blood-spot-based screening for metabolic and endocrine disorders as well as for hemoglobinopathies, cystic fibrosis (CF), and severe combined immune deficiencies (SCID). Multiple tier testing strategies are also used for a number of these diseases. In addition, NSO offers diagnostic and monitoring testing for many of the aforementioned disorders. Further, NSO coordinates and administers the provincial critical congenital heart disease (CCHD) point-of-care screening program.[4] NSO is also working with Ontario\u2019s Infant Hearing Program (IHP) to phase in blood spot testing for hearing loss risk factors including cytomegalovirus (CMV) DNA and certain key genetic risk factors.[5] Figure 1 shows a timeline of newborn screening in Ontario from the beginning of province-wide screening for phenylketonuria (PKU) to the near future.\n\r\n\n\n\n\n\n\n\n\n\n\n Figure 1. Newborn Screening Ontario timeline\n\n\n\nLike many other newborn screening programs, NSO coordinates and manages programmatic elements in addition to testing. 
These include pre-screening education, distribution of collection cards and communications, case management for special screening circumstances such as transfused or premature babies, short term follow-up of screen-positive babies, and analysis\/reporting of key performance indicators. Finally, NSO has an academic mandate and coordinates regular meetings of treatment center physicians and health care providers, performs research, and participates in research collaborations, including work that is aimed at understanding the long-term outcomes of screening. It therefore requires an information infrastructure to support these diverse yet closely interrelated set of screening activities.\nNewborn screening is a rapidly evolving field. An illustration of the rate of change in the last 15 years can be found in the adoption of the Recommended Universal Screening Panel (RUSP)[6] in the United States. In 2002, the American College of Medical Genetics reviewed 81 conditions and placed 29 of them in a core screening panel, which made up the original RUSP. At that time, the majority of U.S. states screened for only six disorders. Today, all states screen for at least 29 conditions.[7] The RUSP was, and continues to be, influential internationally.\nA broader screening panel resulted in more test results per child screened, the need for novel analytical techniques, and the need for second\/multiple tier testing to improve specificity. Like many other newborn screening labs, the NSO laboratory uses immunoassays, enzyme assays, chromatography, liquid chromatography\u2013mass spectrometry (LC-MS\/MS), flow injection analysis tandem mass spectrometry (FIA-MS\/MS), quantitative real-time PCR (qPCR), and primer extension and next generation sequencing techniques for screening. 
Despite the breadth of techniques used, it is sometimes necessary to reflex first-tier screen-positive samples to second-tier testing methods in order to achieve an acceptable positive predictive value for a positive screening result. An example of tiered testing strategies and complex screening logic is congenital adrenal hyperplasia (CAH). First tier CAH screening relies on the measurement of 17-hydroxyprogesterone. The results are interpreted based on gestational age and birth-weight-specific screening thresholds. A panel of steroids is measured by LC-MS\/MS in first-tier positive samples, and another set of screening logic is applied to this panel of results.\nAs the complexity of testing in newborn screening has increased, so has the capacity and expertise in the laboratory. As a result, newborn screening labs are well equipped to provide follow-up diagnostic and monitoring tests for screen-positive newborns and those affected by target diseases. In the case of NSO, we offer monitoring for patients with phenylketonuria, tyrosinemia, and glutaric aciduria type 1, along with molecular DNA testing for all disease targets of newborn screening. Finally, the pace of change in newborn screening requires that labs continue to develop and implement new screening, monitoring, and diagnostic approaches on an ongoing basis. NSO engages in small- to large-scale research studies that require data to be rapidly and readily available, including access to program, pre-analytical, testing, and follow-up data.\nA single LIS solution for overall lab and program management was procured by NSO when it was established in 2006. This system was critical to the initial launch and subsequent development of the program, as it provided \u201ctightly-coupled\u201d support for the newborn screening laboratory information flows between equipment, quality control (QC), and case management. 
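The tiered CAH screening logic described above can be sketched as a simple rule set. This is a minimal illustrative sketch only: the thresholds, units, and the second-tier steroid ratio cutoff below are hypothetical examples, not NSO's actual screening cutoffs.

```python
# Illustrative sketch of tiered CAH screening logic: first-tier 17-OHP is
# interpreted against gestational-age- and birth-weight-specific thresholds,
# and first-tier positives reflex to a second-tier LC-MS/MS steroid panel.
# All numeric cutoffs here are hypothetical, NOT real program values.

def first_tier_cah(ohp17_nmol_l: float, gestational_age_weeks: float,
                   birth_weight_g: float) -> bool:
    """Return True if 17-OHP exceeds the stratum-specific threshold."""
    if gestational_age_weeks < 33 or birth_weight_g < 1500:
        threshold = 90.0   # preterm/low-birth-weight infants run higher
    elif gestational_age_weeks < 37 or birth_weight_g < 2500:
        threshold = 60.0   # intermediate stratum
    else:
        threshold = 30.0   # term, normal-birth-weight cutoff
    return ohp17_nmol_l > threshold

def screen_cah(sample: dict) -> str:
    """Reflex first-tier positives to a second-tier steroid panel rule."""
    if not first_tier_cah(sample["17ohp"], sample["ga_weeks"], sample["bw_g"]):
        return "screen negative"
    # Second tier: a steroid ratio such as (17-OHP + androstenedione)/cortisol
    # is a commonly described discriminator (illustrative cutoff).
    ratio = (sample["17ohp"] + sample["a4"]) / sample["cortisol"]
    return "screen positive" if ratio > 2.5 else "screen negative"
```

The point of the sketch is that the screening logic itself, not just the assay, is configuration the SIMS must hold and version as cutoffs and tiers change.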
This approach worked well for ensuring the efficient management of lab workflows and the timely movement of information to follow-up teams, especially in relation to the standard blood spot testing for metabolic disorders. As the complexity of NSO\u2019s mandate increased, it was determined that a broader approach was required to allow for the development of new areas. These areas included the expansion of molecular screening, diagnostic testing, complex screening algorithms, point-of-care screening, short- and long-term follow-up, and sample lifecycle management (see below). As noted above, it was determined that a broader newborn screening information ecosystem needed to be developed.\n\nDesigning an \"ideal\" SIMS for NSO \nIn their book, McCudden and Henderson noted that \u201cthe present and future of health care relies on electronic information.\u201d[8] The complexity involved in capturing detailed demographics, managing high sample volumes in a timely manner, and integrating multiple test platforms has strained many programs\u2019 abilities to move forward with these innovations. Newborn screening programs require a SIMS that is capable of managing all data coming into and going out of the program. There must also be facilities to customize the SIMS to the needs of the program. The functions of an ideal SIMS for newborn screening programs are outlined in Figure 2.\n\nFigure 2. Concepts and functionality in a SIMS (screening information management system). Each secondary node represents a concept that is addressed by a SIMS. Tertiary nodes show specific examples of SIMS functionality that fall within the linked concept. Colors are used to emphasize the link between concepts and functions.\n\nScreening for CF provides an illustrative example of the requirements of a SIMS; colors in the text are a reference to the concepts\/functions that are illustrated in Figure 2. 
Generally, the first point at which information for a sample enters the SIMS is when samples arrive in the lab and demographics are entered or retrieved from hospital or jurisdictional registries (blue). Samples are punched for analysis (blue) in parallel with demographic entry to reduce turnaround time. The status of all samples from time of receipt in the lab to generation of positive or negative mailers is tracked using real-time dashboards that are located in the lab and administrative areas (purple). In many screening labs, an IRT-DNA CF screening strategy is used, with immunoreactive trypsinogen (IRT) serving as a first-tier biomarker (blue). Testing is performed in accordance with relevant procedures (red). After review and acceptance of quality control results (red), a sample with elevated IRT will generate a request (orange) for CFTR genotyping (cyan). If screened positive for CF (orange), a risk letter will be generated (green). The risk letter incorporates both the IRT (blue) and CF genotype (cyan) results. Many jurisdictions have also implemented or are considering implementing third-tier sequencing that will further add to the complexity of this test. The program will refer the screen-positive infant to the appropriate care provider and this referral will be documented (green). The NBS program will be notified when the newborn is retrieved and this will be documented (green). Diagnostic testing will be performed and the NBS program will be informed of the final diagnosis (green). Periodically, the program will evaluate CF screening performance. This evaluation requires information from all aspects of the screening program (purple) such as, but not limited to, test turnaround time (blue), time to referral and retrieval (green), final or working diagnosis, and positive and negative predictive value (green). 
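The IRT-DNA walkthrough above can be sketched as a set of explicit status transitions that a SIMS might track for each sample. This is a minimal sketch under stated assumptions: the IRT cutoff, variant list, status names, and the one-variant-positive rule are all hypothetical simplifications, not the actual NSO protocol.

```python
# Minimal sketch of an IRT-DNA CF screening workflow, modeled as the ordered
# statuses a SIMS might record for a sample. All values are illustrative.

IRT_CUTOFF = 60.0                       # ng/mL, hypothetical first-tier cutoff
CF_CAUSING = {"F508del", "G551D"}       # tiny illustrative variant panel

def cf_workflow(irt_ng_ml: float, cftr_variants: list) -> list:
    """Return the ordered statuses a sample passes through."""
    trail = ["received", "demographics_entered", "irt_tested"]
    if irt_ng_ml <= IRT_CUTOFF:
        trail.append("negative_mailer")            # first tier negative
        return trail
    trail.append("cftr_genotyping_requested")      # reflex request (orange)
    hits = [v for v in cftr_variants if v in CF_CAUSING]
    if hits:                                       # simplified positive rule
        trail += ["screen_positive", "risk_letter_generated",
                  "referral_documented"]           # follow-up steps (green)
    else:
        trail.append("negative_mailer")
    return trail
```

Modeling each step as a recorded status is what makes the real-time dashboards and turnaround-time reporting described above possible.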
Babies and families with CF, CF variants, or false positive results may be invited to participate in further program evaluation and research, and the SIMS must facilitate this (green).\nWhile not exhaustive, the CF screen-positive scenario provides an overview of the requirements of a SIMS for NBS. Not described above is the need to make changes to the system as disorders, methods, tests, and screening logic changes. Once changes are made, an approach to testing the system to ensure that it functions as intended will be required.\nIn order to realize this vision, NSO launched a large-scale project in late 2014 to develop a comprehensive service oriented architecture (SOA)[9] solution type in which information can flow seamlessly through the key areas of the program. The solution consisted of seven key functional modules (services) that, when combined, can achieve this vision, including:\n\n1. Patient record management: This module is expected to handle most of the pre-analytical aspects of NSO. The module will be used to receive samples in the laboratory and to enter demographic data for these samples. It consists of a number of processes including but not limited to sample reception, demographic\/clinical indication data entry and validation, test ordering (batch and custom), linking multiple samples to a patient; triggering workflows for samples that are unsatisfactory for testing, and ensuring electronic transmission receipt of key information (e.g., demographics in, results out) via HL7 and related protocols.\n2. Laboratory information system\/quality control: These modules will be expected to handle most of the analytical and QC data, including workflows and dataflows within the NSO laboratory environment.\n3. Clinical\/medical review: The review and releasing of results is a critical component of laboratory information workflow between the technologists and medical\/scientific staff. 
This module is designed to apply pre-programmed logic to distill critical lab information in order to produce actionable results in an efficient manner. The clinical\/medical review module will consist of a configurable rules-based system that integrates information from multiple data sources, applies disorder logic, and streamlines workflows to drive decisions. It will also include a rules-based expert system, including a web-based graphical user interface (GUI) to support the review and reporting functions, an administrative interface for creating and managing rules, and a user-configurable knowledge base that is \u201chuman readable.\u201d In its first iteration, this functionality will be embedded in the core SIMS. The possibility of using an independent rules engine service that can be made available to multiple systems in the longer term will also be explored.\n4. Case management: This module will be expected to handle most of the post-analytical aspects of NSO, including follow-up with submitters and treatment centers.\n5. Sample lifecycle management: The need to track blood collection cards throughout their lifecycle, from distribution through transport to storage, is a key and sometimes overlooked process within newborn screening program management. A local vendor has worked with NSO to develop a system that helps manage sample card inventory management, track the transport of samples and card usage, and monitor expiry dates for filter paper; the vendor is also working towards tracking in-lab samples, off-site storage, and destruction.\n6. Reporting and analytics: In order to support the data intensive nature of newborn screening results, NSO has developed a data warehouse to enable user-controlled data access and to manage automated and ad-hoc reporting on all types of NSO data.\n7. 
Decision support: Third-party products can be linked in to provide decision support that assists key decision makers in making timely, effective decisions.\nThese functional modules were combined to produce an enterprise architecture to help guide the project. Figure 3a shows the \u201ctightly coupled\u201d enterprise architecture of the original NSO information system infrastructure, and Figure 3b shows the desired final state.\n\nFigure 3. (a) NSO Enterprise Architecture\u2014Initial State; (b) NSO (Newborn Screening Ontario) Enterprise Architecture\u2014Desired Final State\n\nThe NSO SIMS components are shown within the large blue box, including instruments\/interfaces, LISs, and related systems such as data warehouses. The ovals outside the large box show NSO\u2019s data consumers (entities), which include key groups such as hospitals that submit samples (submitters), treatment centers that retrieve screen-positive newborns, and parents. They also show data-holding entities such as health information exchanges (HIEs) and data registries. Dataflows are shown via directional arrows.\nIn Figure 3a, the blue color represents the fact that all functions were handled by a single \u201ctightly coupled\u201d LIS. The new enterprise architecture is more \u201cservices\u201d focused, with different systems playing different roles. The colors in Figure 3b call out which components are responsible for each function in the new NSO data architecture, where green is the main LIS (OMNILab), gray represents a hybrid of the main LIS and other in-house systems, and orange represents other off-the-shelf components.\nOnce the desired enterprise architecture was developed, the team moved on to ensure that key functional considerations for the system were taken into account by developing a functional architecture. 
The flow from an enterprise architecture to a functional one is analogous to the link between a building\u2019s architecture and its building plans. The enterprise architecture provides a high-level, idealized view of what can be accomplished across the organization, whereas the functional views (with related user requirements) provide detail about what needs to be built. Figure 4 shows the key conceptual information flows that were developed by the team to guide the project. The modules and phases of implementation are called out on the right side of the diagram.\n\nFigure 4. NSO Functional Architecture. The numbers (1\u20135) provide a reference point to follow the NSO dataflows from entry of the samples into the lab (1) through the sending of reports to submitters (5).\n\nPractice \nTechnical considerations \nOnce the enterprise and functional architectures were solidified, NSO faced a number of choices. An early choice was whether to take a buy, build, or hybrid approach to development. When developing any large-scale IT infrastructure project, especially one like NSO\u2019s where there is no \u201cone size fits all\u201d model available, a decision needs to be made whether to buy a commercial off-the-shelf (COTS) solution or to build the software from scratch. Each of these approaches has pros and cons, as discussed below.\nCOTS systems are supported by a vendor. This can be critically important, especially in areas like newborn screening where access to IT human resources can be severely limited. However, in general, COTS systems can be less flexible in meeting less-common customer requirements. This can lead to the need to either adjust lab workflows to \u201cconform\u201d to the software or to create manual workarounds that are tracked outside of the system. 
In addition, requests for system enhancements and\/or system fixes generally wait in a queue which, depending on the complexity and urgency of the requirement, can create frustration and potential risks.\nCustomized software can allow labs to focus on organizational needs and unique user requirements. However, building and supporting software can be very expensive, and recruiting and managing qualified software human resources (e.g., software engineers, programmers, QA) is not an area of expertise that is associated with laboratory program management. Therefore, undertaking this type of approach exclusively is usually not feasible for most lab programs.\nGiven the complexity of the architecture involved, NSO chose a hybrid approach, using COTS systems wherever possible and then adding in custom solutions to fill in any gaps. Once this choice was made, NSO needed to determine whether to use a single-source system or to integrate more than one system to achieve the end goal for its COTS needs. This is often referred to as the choice between integrated single-source (one vendor) and best-of-breed[10] solutions. Best-of-breed entails integrating two or more independent solutions that can be from multiple vendors and\/or can also be custom built. The main risk of best-of-breed usually revolves around the integration points. The risk of integration usually resides with the client and can lead to complexity and possibly even conflict if multiple vendors are involved. In addition, a best-of-breed solution will often require more human resources on the part of the client than working with one vendor.\nIn order to limit risk and complexity, NSO chose to pursue a best-of-breed strategy and limit the system to the smallest number of components needed to meet its needs. Ultimately, the decision of whether to pursue single-vendor versus best-of-breed needs to be based on organizational capacity, budget, risk profile, and the maturity of the software options available. 
That being said, as the needs of the newborn screening community become more complex, the viability of a \u201cone size fits all\u201d model of software delivery will become increasingly difficult to maintain.\nAnother set of parameters that was considered was the configurability versus customization of the system. Configurability refers to the ability of trained, usually internal users to adjust key parameters of the software via a user interface. This is in contrast to \u201ccustomization,\u201d which involves having vendors or custom software developers directly adjust the code of their software to enable adjustment of these parameters. A system was chosen that enabled NSO to configure items such as analyte cut-off values, the order of items on a puncher list, the contents of key screen-positive results, reports\/letters, and patient follow-up workflows.\nThe more configurable a piece of software, the more individual newborn screening programs can adjust it to meet their specific needs. This can give programs greater control over timelines, as they do not have to wait on vendors to customize their software to meet less common end-user needs. As the degree of internal configurability increases, it becomes incumbent on programs to ensure that they can implement and test their configuration changes, confirming that any changes have the desired effects and no undesirable effects. This approach necessitated a change in human resources at NSO, with the creation of new roles for lab subject matter experts to assist the project and the reassignment of lab and follow-up personnel to configure the software. 
Although these HR needs will be higher during the initial build-out phase, it is expected that some of the roles will remain in place on a permanent basis once the initial build is over.\nThe need to balance the ability to test a large volume of time-sensitive samples versus enabling a patient-centric view of individual orders was a key driver in determining NSO\u2019s approach. At its core, this dilemma comes down to the ability of a lab system to apply a standard set of tests to a series of samples quickly (known as a batch or \"sample-centric\" model) versus the ability to track individuals through the system from accessioning through resulting (known as a \"patient-centric\" model). In a screening model where there are hundreds of samples a day, speed is important in order to receive timely results on key sets of time-sensitive tests. This need is best served by a sample-centric model. However, most standard LISs are based on a patient-centric model where the details about the individual to be tested are entered into the system and tests and later results are assigned to the patient file.\nThe majority of requisitions received at NSO are hand-filled test requisitions with attached blood filter papers. In order to facilitate timely testing, the filter papers and requisitions are separated and identical pre-printed screening labels are attached to each. These requisitions are sent to data entry clerks to enter the data over the course of the day while the blood spots are immediately punched and tested in the lab. The information is linked in the LIS by the common screening accession number. The issue that can arise is that in \u201csample-centric mode,\u201d each sample is usually \u201cpreassigned\u201d a specific set of tests in order to ensure that they can flow quickly through the lab processes that are needed to produce the standard newborn screening results. 
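The accession-number link just described, where lab results and clerk-entered demographics arrive independently and are joined later, can be sketched as follows (a hypothetical illustration; the field names and accession format are invented, not NSO's actual schema):

```python
# Hypothetical sketch of linking hand-entered requisition data with in-lab
# blood spot results via a shared accession number. Field names are invented.
requisitions = {}   # accession number -> demographics, entered by data entry clerks
lab_results = {}    # accession number -> analyte results, produced in the lab

def enter_requisition(accession, demographics):
    requisitions[accession] = demographics

def post_result(accession, analyte, value):
    lab_results.setdefault(accession, {})[analyte] = value

def merged_record(accession):
    """Join the two independent streams once both have arrived."""
    return {"demographics": requisitions.get(accession),
            "results": lab_results.get(accession, {})}

# Testing proceeds as soon as the spot is punched; data entry catches up later.
post_result("A-000123", "TSH", 22.5)
enter_requisition("A-000123", {"name": "Baby X", "dob": "2019-01-01"})
print(merged_record("A-000123"))
```

The point of the sketch is simply that neither stream blocks the other: the shared accession number lets the merge happen whenever both sides are complete.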
The sample-centric system can become particularly unwieldy when there is a need to order different tests on a sample at accessioning (as is the case with diagnostic, monitoring, or partial sample panels). To address this, a hybrid set of processes was created, enabling high-volume sample-centric workflows for standard newborn screening samples while also allowing for patient-centric test ordering.\n\nUser considerations \nOnce architecture and technical considerations had been developed, user needs and requirements were gathered and tracked. A program-wide exercise was carried out to determine desired functionality at a high level, as well as improvements that were necessary to support specific workflows. This \u201cwish list\u201d was documented in order to facilitate the evaluation of overall project success.\nWish list items fell into a number of broad categories, including the need for better program data management and workflows; functionality to support\/automate processes not currently supported in the LIS that could affect efficiency and safety; the ability to efficiently implement new lab paradigms such as molecular DNA screening and diagnostics; a desire to integrate both diagnostic (patient-centric) and screening (batch-centric) paradigms; the ability to integrate best-of-breed equipment; case management and follow-up procedure flexibility; and open access to SIMS data for program and laboratory analytics.\nConsolidated lists were tracked as part of the project to ensure that the system matched user expectations. Figure S1 User Needs Tracking (see \"Supplementary materials\" section) shows a graphical representation of these consolidated user needs, categorized by functional area.\n\nProcurement considerations \nThe user needs and functionality statements were used to develop a set of formal business and functional requirements that were utilized in the tendering process. 
Requirements management software (Enterprise Architect from Sparx Systems) was used to ensure requirement traceability back to the key architecture and functional areas described above. Each business requirement was coupled with a number of related functional requirements. These business requirements (BRs) and functional requirements (FRs) were used to create an in-depth request for proposal (RFP) for a SIMS. Table 1 shows an example of a BR and related FRs describing part of the clinical\/medical review module that was described earlier in this paper.\n\nTable 1. Business and Functional Requirements for tendering\n\nThe evaluation of the RFP was divided into two phases. The first phase consisted of written responses from qualified vendors, evaluated against two sets of criteria. The first criterion was a pass\/fail gate that was applied to all mandatory requirements. All vendors that met these criteria were then scored on the non-mandatory (desirable) requirements, and the results were consolidated and averaged. Scoring was weighted based upon an agreed-upon relative importance of the requirements. Figure S2 Example of Phase 1 Scoring (see \"Supplementary materials\" section) shows a generic example of phase one scoring that was applied for all vendors that met the pass\/fail criteria.\nThe top vendors were then invited to participate in the second phase of the RFP process. In the second phase of the procurement, the vendors were brought in for in-depth meetings to describe their proposals and were then asked to provide demonstrations of how various aspects of their systems met NSO\u2019s needs.\nOnce phase two was complete, NSO used a technique known as decision matrix analysis[11] to determine the best option for moving forward. 
A decision matrix is a table (usually set up in a spreadsheet) where the options you wish to analyze are divided into rows and the factors that influence the decision are divided into columns. Each factor was assigned a weight reflecting its relative importance, usually determined by surveying the key decision makers (e.g., 1 is a low impact and 10 is a high impact). The individual scores for each area were calculated by multiplying the raw score by the assigned impact factor, and then the scores were added together to get the final score for each option.\nEight key factors were identified as critical to the overall decision, including the internal and external costs (including personnel and consultants), software and hardware costs, and project costs associated with the build for this option (higher score means lower costs); vendor support capabilities; the ability to deliver a product in a timely fashion; the ability to allow NSO to innovate; technical integration risks, including the technical underpinnings of the software and how easy or hard it will be to integrate with other systems (higher score means lower risk); and secondary benefits (which includes the vendor fixing our current issues without creating new ones). Figure 5 shows an anonymized mock-up of the decision matrix tool that was used to make the final vendor choice.\n\nFigure 5. NSO\u2019s decision matrix analysis tool that was used to make final vendor choices\n\nA COTS solution, OMNI-Lab NBS from Integrated Software Solutions (ISS) Pty. Ltd., was chosen to fulfill a number of key pieces of the architecture. Following the award of the procurement, ISS and NSO worked on the development of a mutually acceptable contract. 
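The weighted scoring at the heart of decision matrix analysis, multiply each raw score by its factor weight and sum, can be sketched as follows (the factor names, weights, and vendor scores here are invented for illustration, not NSO's actual evaluation):

```python
# Sketch of decision matrix analysis: weighted sum of raw scores per option.
# All factor names, weights, and scores are illustrative, not NSO's real data.
weights = {"total cost": 8, "vendor support": 9, "timeliness": 7,
           "ability to innovate": 6, "integration risk": 9}

options = {
    "Vendor A": {"total cost": 7, "vendor support": 8, "timeliness": 6,
                 "ability to innovate": 9, "integration risk": 7},
    "Vendor B": {"total cost": 8, "vendor support": 6, "timeliness": 8,
                 "ability to innovate": 5, "integration risk": 6},
}

def weighted_score(raw, weights):
    """Multiply each raw score by its factor weight and sum the results."""
    return sum(raw[f] * weights[f] for f in weights)

# Rank options by total weighted score, highest first.
ranked = sorted(options, key=lambda o: weighted_score(options[o], weights),
                reverse=True)
for name in ranked:
    print(name, weighted_score(options[name], weights))
# Vendor A scores 287, Vendor B scores 258, so Vendor A ranks first.
```

In a real evaluation the weights would be gathered by surveying the key decision makers, as the text describes, rather than hard-coded.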
Although many non-procurement personnel usually view contract negotiations as primarily about money, there are in fact a number of other key factors that need to be resolved in addition to the initial and ongoing costs of the software, including the type of contract payment mechanisms (such as time-and-materials versus deliverable-based contracts). In the case of NSO, a deliverable-based contract was established where contract payments were linked to key milestones. This contractual structure has enabled NSO and the vendor to take a team approach during the design and delivery of the software. Although the upfront effort in defining milestones and working towards mutually agreeable sign-offs of project phases can be relatively high, the long-term benefits of ensuring that both sides are on the same page during the implementation phase have made the effort worthwhile.\n\nImplementation considerations \nA decision as to how to move forward with system implementation created a challenge, as NSO currently has a newborn screening laboratory information management system (LIMS) in place. A staged\/agile approach that divided the project into smaller, manageable pieces of work, known as sprints, was put in place to mitigate this challenge. The approach enabled the team to focus on key areas of implementation while managing risk. Higher-priority areas with lower change management needs were staged earlier than those requiring more change management or where there was currently a system in place to handle day-to-day operations.\nApplying this approach, NSO launched its CCHD screening implementation using the SIMS solution in mid-2017, and blood spot CMV DNA testing of babies not passing their hearing screen was moved into the new system in mid-2018. DBS screening, which requires the most change management, is currently being deployed. To maximize efficiency, sprints also overlap so that development in one area is started while others are still in progress. 
This approach has a number of benefits, including enabling less risky product acceptance and module configuration across testing and production environments which, in turn, reduces the risk for the more complex deployments.\nSeveral processes to ensure overall build quality have been initiated. The system is deployed across three environments (development, test, production) in order to ensure that all new features can be scoped and configured in the development environment before being tested and finally put into production. This enabled NSO to ensure that new features and configurations were not put into the working environment until they were fully tested and, therefore, did not cause any issues with the production system. Table 2 describes how each environment is utilized in order to ensure full production quality for software, data, and configurations.\n\nTable 2. NSO environment strategy\n\nA number of internal subject matter experts were consulted to develop and deploy test cases for all new system areas in order to ensure that the implications of these changes are fully understood and tested before deployment. All software bugs and requested features are tracked using issue tracking software (JIRA from Atlassian) and, where possible, system testing is automated using a standard testing tool (Ranorex) in order to be able to easily repeat key end-to-end system tests as new features are deployed. The application of these processes, in addition to the staged\/agile approach to deployment and implementation, is helping to mitigate project deployment risks, especially as the project moves into later and more complex stages.\n\nOrganizational and jurisdictional considerations \nNewborn screening is delivered via a variety of organizational structures around the world, and therefore each program\u2019s overall needs will be different. 
In spite of this, there are still a number of key areas where specific organizational strategies can be used to ensure the successful delivery of an IS implementation. One critical factor for success is the overall vision and support of senior management, which has been critical to the overall success of the project to date. Key leadership personnel from NSO, as well as CHEO executive and IS, sit on the project steering committee, which provides guidance and oversight for the project. The project is also represented at the NSO leadership committee in order to ensure that the project aligns with overall NSO strategic direction and goals. Regular progress reports are provided to external stakeholders via the NSO Advisory Council.[12]\nEnsuring that costs and budgets align with overall program priorities is also key. Many newborn screening programs exist within budget-constrained environments where day-to-day operational priorities take precedence over longer-term endeavors such as updating IT systems. Beyond ongoing program support for the IS rebuild, NSO has partially mitigated this issue by treating its IT system as a depreciable asset like other capital equipment and infrastructure and including IT system replacement costs in the overall capital budgeting process. There can also be other novel approaches to working through resourcing issues; for example, other newborn screening programs have begun to pool resources across jurisdictions to share infrastructure in order to keep build and operating costs down.\nAnother critical area that is often overlooked is the need for strong support from the related information and communications technology (ICT) groups that support the newborn screening organization. The modern demands of IT can sometimes outstrip the capabilities of some of the older IT systems within a public health setting. 
In addition, IT and network infrastructures are often procured and managed via different entities or departments than those that are responsible for operating the newborn screening lab and\/or program. The emergence of cloud-based technologies will likely help to mitigate some of these pressures in the future, as the reliance on related third parties to manage the \u201cnuts and bolts\u201d of infrastructure will likely decrease.\nBeyond infrastructure support, jurisdictional considerations can also have a strong influence on a program\u2019s choices of SIMS options. Some jurisdictions are procuring region-, state-, or province-wide integrated electronic records solutions, such as those from Epic Systems Corporation (Verona, WI, USA)[13] and Cerner Corporation (North Kansas City, MO, USA).[14] Newborn screening labs that fall within these jurisdictions are looking to understand whether they can utilize these systems for their SIMS needs, and in a few early cases they have already done so. In addition, some jurisdictions are outsourcing entire newborn screening lab functions, such as MS\/MS and related biochemistry, along with the related IT requirements. Although these cross-jurisdictional integrated systems can have a definite impact on a newborn screening lab\u2019s SIMS choices, they need not necessarily be a constraint to meeting a lab\u2019s overall needs.\nUnderstanding its key architectural and functional requirements can greatly assist a lab in performing newborn screening functions within an integrated electronic record solution or outsourced service environment. 
These shared infrastructure solutions may also ease some of the infrastructure pressures and allow labs to focus on their core needs while sharing some of the IS infrastructure costs across multiple functional areas throughout their institutions.\n\nProject benefits and impacts \nA critical part of any long-term project (or group of projects) is to identify and measure its benefits in order to assess ongoing value. In formal project management terminology, this is often called \u201cbenefits realization.\u201d Benefits realization is broadly defined as \u201cthe process of organizing and managing, such that the potential benefits arising from the use of IT are actually realized.\u201d[15] In more practical terms, this can be considered a synthesis of user needs identification, as discussed earlier in the paper, combined with broader program goals. The project\u2019s key success metrics were determined by the project leadership group at project initiation and are being tracked throughout the project lifecycle. Figure S3 Benefits Realization Tracking (see \"Supplementary materials\" section) shows the benefits realization tracking for key in-lab benefits across two different dimensions, namely the impact of the benefit (direction of the arrow) and program area(s) impacted (color of the arrow). In addition, once the SIMS is implemented, NSO will track the impact of the changes on critical clinical program measures such as positive predictive values (PPVs) and false positive rates (FPRs). Analysis will be undertaken utilizing both the relevant LISs and the NSO Data Warehouse. These clinical metrics are reported as part of NSO\u2019s clinical communication to its treatment groups and, more broadly, as part of the NSO annual report. It is expected that the more granular control NSO will be able to achieve over many of its processes will lead to better PPVs and decreased FPRs. 
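For readers unfamiliar with the two clinical metrics mentioned above, their standard definitions are straightforward to compute; the counts below are invented purely for illustration:

```python
# Standard definitions of the two clinical metrics; example counts are invented.
def ppv(tp, fp):
    """Positive predictive value: fraction of screen positives that are true positives."""
    return tp / (tp + fp)

def fpr(fp, tn):
    """False positive rate: fraction of unaffected babies who screen positive."""
    return fp / (fp + tn)

# e.g., 40 true positives, 360 false positives, 139,600 true negatives
print(round(ppv(40, 360), 3))       # 0.1
print(round(fpr(360, 139600), 5))
```

Tighter, more granular cut-offs reduce false positives, which raises PPV and lowers FPR, which is exactly the improvement the program expects to track.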
Post-launch, NSO will continue to monitor these measures in order to ensure continuous quality improvement for its clinical metrics.\n\nConclusions \nA number of critical lessons in regard to implementing a flexible SIMS architecture were learned through the stages that were described in this paper. One of the key lessons learned was that developing, implementing, and deploying a SIMS is about much more than the technology. The SIMS has become the central analytical, process management, and communication hub for many newborn screening programs. In order to realize the full benefit of establishing an \u201cideal\u201d SIMS, it is necessary to engage the full team to ensure that the promise of the new technologies can be achieved. At its most basic level, this requires strong program leadership, vision, and strategy. Although the journey can be quite challenging, the benefits of taking a holistic approach to SIMS development, in terms of the ability to be flexible and innovate, far outweigh the efforts and costs in the long run.\n\nSupplementary materials \nSupplementary materials can be found at https:\/\/www.mdpi.com\/2409-515X\/5\/1\/9\/s1. (.zip file)\n\nAcknowledgements \nThe authors would like to acknowledge a number of groups and individuals without whose diligence and hard work this project and paper would not have been possible. The entire Newborn Screening Ontario team spent many hours providing invaluable input into defining their needs and providing a vision of what an \u201cideal\u201d newborn screening information system would look like. Steve Conrad led a team of dedicated NSO staff and students to shape these \u201cideals\u201d into formal business and technical requirements. These were used to craft the enterprise and functional architectures that are presented in the paper and were used by the team to create the request for information and proposal (RFI\/RFP) documents that formed the basis for the procurement of the new off-the-shelf systems. 
The Project Lancet (the internal NSO name for the rebuild project) team has spent many hours bringing the vision and requirements to reality. The core Lancet implementation team (in alphabetical order: Marlene Elliott, Mike Kowalski, Shannon McClelland, Nate McIntosh, Chloe O\u2019Sullivan, Megan Sayer); the subject matter experts (Sarah Foster, Janet Marcadier, Larry Fisher, Alison Evans); and the project management group (Marina Hebert and Jim Bottomley) have continued to impress with their ability to process complex issues and move them forward through the product and project lifecycle continuum. NSO operations staff continue to dedicate many hours of their time to ensuring both project and product success, especially David Lawrie and Christine McRoberts, who are ensuring that both the operational implementation and quality continue to meet NSO\u2019s high standards. Medical\/scientific input at Project Lancet\u2019s demo sessions (a.k.a. \u201ctracers\u201d) from Dennis Bulman, Kristin Kernohan, Nathalie Lepage, and others has been critical in ensuring that the medical decisions made utilizing the SIMS will be scientifically valid. Leadership support and vision are critical to ensure the success of any project of this magnitude; as such, the authors would like to thank Mari Teitelbaum for her diligent efforts to provide a bridge between NSO and CHEO provincial programs as chair of the Project Lancet steering committee and for providing both timely and sage advice to ensure that the project stays on track. Finally, the authors would also like to thank everyone who helped provide assistance and input in composing the figures and tables for the paper, in particular Lauren Gallagher (Figure 1), Steve Conrad (Figure 3a,b and Figure 4), Shannon McClelland (Figure S1), and Marlene Elliott (Table 2).\n\nAuthor contributions \nM.P. and M.H. wrote and revised the original manuscript and figures (with the input of those discussed below). J.M. and P.C. 
provided input into the overall manuscript content and manuscript revisions. P.C. was the primary mentor on the paper and provided oversight for content and details. All authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.\n\nFunding \nThis research received no external funding.\n\nConflicts of interest \nThe authors declare no conflict of interest.\n\nReferences \n\n\n\u2191 \"About CLIR\". Mayo Foundation for Medical Education and Research. https:\/\/clir.mayo.edu\/Home\/About . Retrieved 26 November 2018.   \n\n\u2191 Pantanowitz, L.; Henricks, W.H.; Beckwith, B.A. (2007). \"Medical laboratory informatics\". Clinics in Laboratory Medicine 27 (4): 823\u201343. doi:10.1016\/j.cll.2007.07.011. PMID 17950900.   \n\n\u2191 Sepulveda, J.L.; Young, D.S. (2013). \"The ideal laboratory information system\". Archives of Pathology & Laboratory Medicine 137 (8): 1129\u201340. doi:10.5858\/arpa.2012-0362-RA. PMID 23216205.   \n\n\u2191 \"CCHD Screening\". Newborn Screening Ontario. https:\/\/www.newbornscreening.on.ca\/en\/health-care-providers\/submitters\/cchd-screening-implementation . Retrieved 26 November 2018.   \n\n\u2191 \"Expanded Hearing Screenings - Overview\". Newborn Screening Ontario. https:\/\/www.newbornscreening.on.ca\/en\/page\/overview . Retrieved 26 November 2018.   \n\n\u2191 \"Recommended Uniform Screening Panel\". Health Resources & Services Administration. July 2018. https:\/\/www.hrsa.gov\/advisory-committees\/heritable-disorders\/rusp\/index.html . Retrieved 26 November 2018.   \n\n\u2191 Secretary\u2019s Advisory Committee on Heritable Disorders in Newborns and Children (2011). \"2011 Annual Report to Congress\" (PDF). Health Resources & Services Administration. https:\/\/www.hrsa.gov\/sites\/default\/files\/hrsa\/advisory-committees\/heritable-disorders\/reports-recommendations\/reports\/2011-annual-report.pdf . Retrieved 20 February 2018.   \n\n\u2191 McCudden, C.R.; Henderson, M.P.A. (2016). \"Laboratory Information Systems\". 
In Clarke, W. (ed.). Contemporary Practice in Clinical Chemistry (3rd ed.). pp. 263\u201376. ISBN 9781594251894.   \n\n\u2191 Huhns, M.S.; Singh, M.P. (2005). \"Service-oriented computing: key concepts and principles\". IEEE Internet Computing 9 (1): 75\u201381. doi:10.1109\/MIC.2005.21.   \n\n\u2191 Hermann, S.A. (2010). \"Best-of-breed versus integrated systems\". American Journal of Health-System Pharmacy 67 (17): 1406, 1408, 1410. doi:10.2146\/ajhp100061. PMID 20720237.   \n\n\u2191 Milder, P. (2018). \"Decision Matrix Analysis\". ToolsHero. https:\/\/www.toolshero.com\/decision-making\/decision-matrix-analysis\/ . Retrieved 27 November 2018.   \n\n\u2191 \"Advisory Council\". Newborn Screening Ontario. https:\/\/www.newbornscreening.on.ca\/en\/advisory-council . Retrieved 26 November 2018.   \n\n\u2191 \"Epic\". Epic Systems Corporation. https:\/\/www.epic.com\/ . Retrieved 26 November 2018.   \n\n\u2191 \"Cerner\". Cerner Corporation. https:\/\/www.cerner.com\/ . Retrieved 26 November 2018.   \n\n\u2191 Ward, J.; Elvin, R. (1999). \"A new framework for managing IT\u2010enabled business change\". Information Systems Journal 9 (3): 197\u2013221. doi:10.1046\/j.1365-2575.1999.00059.x.   
\n\n\nNotes \nThis presentation is faithful to the original, with only a few minor changes to presentation.\n\n\nSource: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:Building_a_newborn_screening_information_management_system_from_theory_to_practice\">https:\/\/www.limswiki.org\/index.php\/Journal:Building_a_newborn_screening_information_management_system_from_theory_to_practice<\/a>\nCategories: LIMSwiki journal articles (added in 2019); LIMSwiki journal articles (all); LIMSwiki journal articles on health informatics\n
This page was last modified on 29 January 2019, at 00:42. Content is available under a Creative Commons Attribution-ShareAlike 4.0 International License unless otherwise noted.\n\n","ab125d6daef2f763e588fcd5432c1b66_html":"<body class=\"mediawiki ltr sitedir-ltr ns-206 ns-subject page-Journal_Building_a_newborn_screening_information_management_system_from_theory_to_practice skin-monobook action-view\">\n<div id=\"rdp-ebb-globalWrapper\">\n\t\t<div id=\"rdp-ebb-column-content\">\n\t\t\t<div id=\"rdp-ebb-content\" class=\"mw-body\" role=\"main\">\n\t\t\t\t<a id=\"rdp-ebb-top\"><\/a>\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t<h1 id=\"rdp-ebb-firstHeading\" class=\"firstHeading\" lang=\"en\">Journal:Building a newborn screening information management system from theory to practice<\/h1>\n\t\t\t\t\n\t\t\t\t<div id=\"rdp-ebb-bodyContent\" class=\"mw-body-content\">\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\n\t\t\t\t\t<!-- start content -->\n\t\t\t\t\t<div id=\"rdp-ebb-mw-content-text\" lang=\"en\" dir=\"ltr\" class=\"mw-content-ltr\">\n\n\n<h2><span class=\"mw-headline\" id=\"Abstract\">Abstract<\/span><\/h2>\n<p>Information management systems are the central process management and communication hub for many newborn screening programs. 
In late 2014, Newborn Screening Ontario (NSO) undertook an end-to-end assessment of its <a href=\"https:\/\/www.limswiki.org\/index.php\/Information_management\" title=\"Information management\" class=\"wiki-link\" data-key=\"f8672d270c0750a858ed940158ca0a73\">information management<\/a> needs, which resulted in a project to develop a flexible information systems (IS) ecosystem and related process changes. This enabled NSO to better manage its current and future <a href=\"https:\/\/www.limswiki.org\/index.php\/Workflow\" title=\"Workflow\" class=\"wiki-link\" data-key=\"92bd8748272e20d891008dcb8243e8a8\">workflow<\/a> and communication needs. An idealized vision of a screening information management system (SIMS) was developed that was refined into enterprise and functional architectures. This was followed by the development of technical specifications, user requirements, and procurement. In undertaking a holistic full product lifecycle redesign approach, a number of change management challenges were faced by NSO across the entire program. Strong leadership support and full program engagement were key for overall project success. It is anticipated that improvements in program flexibility and the ability to innovate will outweigh the efforts and costs.\n<\/p><p><b>Keywords<\/b>: newborn screening, neonatal screening, laboratory information management system, laboratory information system, LIMS, LIS, screening information management\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Introduction\">Introduction<\/span><\/h2>\n<p>Dating back to the early days of computers, a variety of software programs have been utilized to automate numerous <a href=\"https:\/\/www.limswiki.org\/index.php\/Laboratory\" title=\"Laboratory\" class=\"wiki-link\" data-key=\"c57fc5aac9e4abf31dccae81df664c33\">laboratory<\/a> workflows, processes, and related activities. 
In newborn screening labs, a screening information management system (SIMS)\u2014a set of integrated software components, which may or may not come from a single vendor, used to automate screening workflows\u2014has become a useful tool. In addition, a SIMS can be used to manage other aspects of overall laboratory operations and program administration, such as case reporting and follow-up.\n<\/p><p>Population-based newborn screening began appearing in many North American and European countries over 50 years ago. Due to the low number of tests per patient, many programs were initially able to handle their testing and reporting functions using manual workflows, paper-based lab logs, and patient reports. Labs initially employed simple computer functions such as using word processing to create mailing lists\/labels and basic reports. Eventually, usage evolved into managing basic lab workflows and quality metrics using standard software such as spreadsheets and databases.\n<\/p><p>In the late 1990s and early 2000s, the shift to complex equipment such as <a href=\"https:\/\/www.limswiki.org\/index.php\/Tandem_mass_spectrometry\" title=\"Tandem mass spectrometry\" class=\"wiki-link\" data-key=\"55f167a11d8b5037392ba845986bf6bf\">tandem mass spectrometers<\/a> (MS\/MS) that could test for dozens of target diseases and related follow-ups necessitated the development of a more complex information technology infrastructure to manage newborn screening programs. This shift required the development of new, often integrated, software programs to manage the new challenges that were created by these changes. Also, a number of new target disorders required very quick turnaround times (TATs) in order to locate infants who were at risk of early decompensation that could cause severe morbidity or even death. 
This meant that neonatal screening labs required information systems that could process large batches of results quickly in order to ensure that infants who tested positive for these disorders could be identified in a timely manner.\n<\/p><p>The current shift into new paradigms for newborn screening, including molecular (DNA-based) screening, is pushing the limits of the original software architectural approaches of these integrated packages. The ability of <a href=\"https:\/\/www.limswiki.org\/index.php\/Clinical_decision_support_system\" title=\"Clinical decision support system\" class=\"wiki-link\" data-key=\"095141425468d057aa977016869ca37d\">decision support tools<\/a> such as Collaborative Laboratory Integrated Reports (CLIR)<sup id=\"rdp-ebb-cite_ref-MayoAboutCLIR_1-0\" class=\"reference\"><a href=\"#cite_note-MayoAboutCLIR-1\">[1]<\/a><\/sup> to be linked, potentially in real time via application programming interfaces (APIs)<sup id=\"rdp-ebb-cite_ref-PantanowitzMedical07_2-0\" class=\"reference\"><a href=\"#cite_note-PantanowitzMedical07-2\">[2]<\/a><\/sup>, also stretches the limits of current SIMS functionality. In addition, the potential of new point-of-care technologies and the possibility of screening for certain time-critical disorders prenatally have stretched these new paradigms even further. 
As these needs evolve, a more comprehensive information ecosystem approach to newborn screening SIMS architectures will be required to support the expanding needs of newborn screening.\n<\/p><p>In their paper \u201cThe Ideal Laboratory Information System,\u201d Sepulveda and Young<sup id=\"rdp-ebb-cite_ref-SepulvedaTheIdeal13_3-0\" class=\"reference\"><a href=\"#cite_note-SepulvedaTheIdeal13-3\">[3]<\/a><\/sup> described a set of processes and technology modules that they considered key for the development of an \u201cideal\u201d comprehensive clinical <a href=\"https:\/\/www.limswiki.org\/index.php\/Laboratory_information_system\" title=\"Laboratory information system\" class=\"wiki-link\" data-key=\"37add65b4d1c678b382a7d4817a9cf64\">laboratory information system<\/a> (LIS). An approach to developing an \u201cideal\u201d SIMS in a newborn screening context is described in this paper. The experiences of Newborn Screening Ontario (NSO) are used to illustrate the various technical and administrative approaches that are involved in undertaking such a project. The potential benefits, impacts, and risks of taking this approach are discussed and key lessons are highlighted.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Theory\">Theory<\/span><\/h2>\n<h3><span class=\"mw-headline\" id=\"Outgrowing_NSO.27s_current_SIMS\">Outgrowing NSO's current SIMS<\/span><\/h3>\n<p>NSO is located in the Canadian province of Ontario. Canada has a publicly funded health care system that is managed by its thirteen provinces and territories. Ontario is Canada\u2019s largest province, with a population of approximately 14 million people and an annual public healthcare budget of over $CDN 50 billion in 2017. NSO is a fully integrated program that manages all aspects of newborn screening and follow-up activities for children born in the province of Ontario (approximately 140,000 births per year). 
With the expansion of provincial newborn screening in 2006, the lab and program management was moved from the provincial public health branch to the Children's Hospital of Eastern Ontario (CHEO), a leading academic pediatric research <a href=\"https:\/\/www.limswiki.org\/index.php\/Hospital\" title=\"Hospital\" class=\"wiki-link\" data-key=\"b8f070c66d8123fe91063594befebdff\">hospital<\/a> that is located in the city of Ottawa. NSO receives ongoing annual base funding from the province as well as project-based funding to implement new programs as needed.\n<\/p><p>NSO performs blood-spot-based screening for metabolic and endocrine disorders as well as for hemoglobinopathies, cystic fibrosis (CF), and severe combined immune deficiencies (SCID). Multiple tier testing strategies are also used for a number of these diseases. In addition, NSO offers diagnostic and monitoring testing for many of the aforementioned disorders. Further, NSO coordinates and administers the provincial critical congenital heart disease (CCHD) point-of-care screening program.<sup id=\"rdp-ebb-cite_ref-NSO_CCHD_4-0\" class=\"reference\"><a href=\"#cite_note-NSO_CCHD-4\">[4]<\/a><\/sup> NSO is also working with Ontario\u2019s Infant Hearing Program (IHP) to phase in blood spot testing for hearing loss risk factors including cytomegalovirus (CMV) DNA and certain key genetic risk factors.<sup id=\"rdp-ebb-cite_ref-NSO_EHS_5-0\" class=\"reference\"><a href=\"#cite_note-NSO_EHS-5\">[5]<\/a><\/sup> Figure 1 shows a timeline of newborn screening in Ontario from the beginning of province-wide screening for phenylketonuria (PKU) to the near future.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig1_Pluscauskas_IntJOfNeoScreen2019_5-1.png\" class=\"image wiki-link\" data-key=\"5ef0d90f5b4f6c2c86de3688450bd788\"><img alt=\"Fig1 Pluscauskas IntJOfNeoScreen2019 5-1.png\" src=\"https:\/\/www.limswiki.org\/images\/2\/2a\/Fig1_Pluscauskas_IntJOfNeoScreen2019_5-1.png\" style=\"width: 
100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 1.<\/b> Newborn Screening Ontario timeline<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>Like many other newborn screening programs, NSO coordinates and manages programmatic elements in addition to testing. These include pre-screening education, distribution of collection cards and communications, case management for special screening circumstances such as transfused or premature babies, short term follow-up of screen-positive babies, and analysis\/reporting of key performance indicators. Finally, NSO has an academic mandate and coordinates regular meetings of treatment center physicians and health care providers, performs research, and participates in research collaborations, including work that is aimed at understanding the long-term outcomes of screening. It therefore requires an information infrastructure to support these diverse yet closely interrelated set of screening activities.\n<\/p><p>Newborn screening is a rapidly evolving field. An illustration of the rate of change in the last 15 years can be found in the adoption of the Recommended Universal Screening Panel (RUSP)<sup id=\"rdp-ebb-cite_ref-HRSARecommended18_6-0\" class=\"reference\"><a href=\"#cite_note-HRSARecommended18-6\">[6]<\/a><\/sup> in the United States. In 2002, the American College of Medical Genetics reviewed 81 conditions and placed 29 of them in a core screening panel, which made up the original RUSP. At that time, the majority of U.S. states screened for only six disorders. 
Today, all states screen for at least 29 conditions.<sup id=\"rdp-ebb-cite_ref-SAC2011Annual11_7-0\" class=\"reference\"><a href=\"#cite_note-SAC2011Annual11-7\">[7]<\/a><\/sup> The RUSP was, and continues to be, influential internationally.\n<\/p><p>A broader screening panel resulted in more test results per child screened, the need for novel analytical techniques, and the need for second\/multiple tier testing to improve specificity. Like many other newborn screening labs, the NSO laboratory uses immunoassays, enzyme assays, <a href=\"https:\/\/www.limswiki.org\/index.php\/Chromatography\" title=\"Chromatography\" class=\"wiki-link\" data-key=\"2615535d1f14c6cffdfad7285999ad9d\">chromatography<\/a>, <a href=\"https:\/\/www.limswiki.org\/index.php\/Liquid_chromatography%E2%80%93mass_spectrometry\" title=\"Liquid chromatography\u2013mass spectrometry\" class=\"wiki-link\" data-key=\"d171745b38c8d2ed7d274d2cc13fa1f3\">liquid chromatography\u2013mass spectrometry<\/a> (LC-MS\/MS), <a href=\"https:\/\/www.limswiki.org\/index.php\/Flow_injection_analysis\" title=\"Flow injection analysis\" class=\"wiki-link\" data-key=\"4c6880d74b681ef0ed1e317b2bac3647\">flow injection analysis<\/a> tandem mass spectrometry (FIA-MS\/MS), quantitative real-time PCR (qPCR), and primer extension and next generation sequencing techniques for screening. Despite the breadth of techniques used, it is sometimes necessary to reflex first-tier screen-positive samples to second-tier testing methods in order to achieve an acceptable positive predictive value for a positive screening result. An example of tiered testing strategies and complex screening logic is congenital adrenal hyperplasia (CAH). First tier CAH screening relies on the measurement of 17-hydroxyprogesterone. The results are interpreted based on gestational age and birth-weight-specific screening thresholds. 
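The gestational-age- and birth-weight-adjusted interpretation of first-tier 17-hydroxyprogesterone results described here can be illustrated with a minimal sketch. The cut-off values, demographic brackets, and function names below are hypothetical and for illustration only; they are not NSO's actual screening rules, which each program derives and validates itself.

```python
# Hypothetical sketch of gestational-age- and birth-weight-adjusted
# first-tier CAH screening. The 17-OHP cut-offs below are invented for
# illustration; real programs derive and validate their own thresholds.

OHP_CUTOFFS_NMOL_L = [
    # (max gestational age in weeks, max birth weight in grams, cut-off)
    (33, 1500, 90.0),   # very premature / low birth weight: higher cut-off
    (36, 2500, 60.0),   # moderately premature
    (99, 99999, 30.0),  # term infants: lowest cut-off
]

def first_tier_cah(ohp_nmol_l, gest_age_weeks, birth_weight_g):
    """Return 'screen_negative' or 'refer_second_tier' for a 17-OHP result."""
    for max_ga, max_bw, cutoff in OHP_CUTOFFS_NMOL_L:
        if gest_age_weeks <= max_ga and birth_weight_g <= max_bw:
            return "refer_second_tier" if ohp_nmol_l > cutoff else "screen_negative"
    return "refer_second_tier"  # fail safe: unmatched demographics get review
```

Samples flagged by such a rule would then reflex to the second-tier steroid panel, where a further layer of screening logic is applied.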
A panel of steroids is measured by LC-MS\/MS in first-tier positive samples, and another set of screening logic is applied to this panel of results.\n<\/p><p>As the complexity of testing in newborn screening has increased, so has the capacity and expertise in the laboratory. As a result, newborn screening labs are well equipped to provide follow-up diagnostic and monitoring tests for screen-positive newborns and those affected by target diseases. In the case of NSO, we offer monitoring for patients with phenylketonuria, tyrosinemia, and glutaric aciduria type 1, along with molecular DNA testing for all disease targets of newborn screening. Finally, the pace of change in newborn screening requires that labs continue to develop and implement new screening, monitoring, and diagnostic approaches on an ongoing basis. NSO engages in small- to large-scale research studies that require data to be rapidly and readily available, including access to program, pre-analytical, testing, and follow-up data.\n<\/p><p>A single LIS solution for overall lab and program management was procured by NSO when it was established in 2006. This system was critical to the initial launch and subsequent development of the program, as it provided \u201ctightly-coupled\u201d support for the newborn screening laboratory information flows between equipment, quality control (QC), and case management. This approach worked well for ensuring the efficient management of lab workflows and the timely movement of <a href=\"https:\/\/www.limswiki.org\/index.php\/Information\" title=\"Information\" class=\"wiki-link\" data-key=\"6300a14d9c2776dcca0999b5ed940e7d\">information<\/a> to follow-up teams, especially in relation to the standard blood spot testing for metabolic disorders. As the complexity of NSO\u2019s mandate increased, it was determined that a broader approach was required to allow for the development of new areas. 
These areas included the expansion of molecular screening, diagnostic testing, complex screening algorithms, point-of-care screening, short- and long-term follow-up, and sample lifecycle management (see below). As noted above, it was determined that a broader newborn screening information ecosystem needed to be developed.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Designing_an_.22ideal.22_SIMS_for_NSO\">Designing an \"ideal\" SIMS for NSO<\/span><\/h3>\n<p>In their book, McCudden and Henderson noted \u201cthe present and future of health care relies on electronic information.\u201d<sup id=\"rdp-ebb-cite_ref-McCuddenLab16_8-0\" class=\"reference\"><a href=\"#cite_note-McCuddenLab16-8\">[8]<\/a><\/sup> The complexities of capturing complex demographics, managing high sample volumes in a timely manner, and integrating multiple test platforms have strained many programs\u2019 abilities to move forward with these innovations. Newborn screening programs require a SIMS that is capable of managing all data coming into and going out of the program. There must also be facilities to customize the SIMS to the needs of the program. 
The functions of an ideal SIMS for newborn screening programs are outlined in Figure 2.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig2_Pluscauskas_IntJOfNeoScreen2019_5-1.png\" class=\"image wiki-link\" data-key=\"06fa7a55191ed1662ac367042cfe5dca\"><img alt=\"Fig2 Pluscauskas IntJOfNeoScreen2019 5-1.png\" src=\"https:\/\/www.limswiki.org\/images\/c\/cd\/Fig2_Pluscauskas_IntJOfNeoScreen2019_5-1.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 2.<\/b> Concepts and functionality in a SIMS (screening information management system). Each secondary node represents a concept that is addressed by a SIMS. Tertiary nodes show specific examples of SIMS functionality that fall within the linked concept. Colors are used to emphasize the link between concepts and functions.<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>Screening for CF provides an illustrative example of the requirements of a SIMS; colors in the text are a reference to the concepts\/functions that are illustrated in Figure 2. Generally, the first point at which information for a sample enters the SIMS is when samples arrive in the lab and demographics are entered or retrieved from hospital or jurisdictional registries (blue). Samples are punched for analysis (blue) in parallel with demographic entry to reduce turnaround time. The status of all samples from time of receipt in the lab to generation of positive or negative mailers is tracked using real-time dashboards that are located in the lab and administrative areas (purple). 
In many screening labs, an IRT-DNA CF screening strategy is used, with immunoreactive trypsinogen (IRT) serving as the first-tier biomarker (blue). Testing is performed in accordance with relevant procedures (red). After review and acceptance of quality control results (red), a sample with elevated IRT will generate a request (orange) for CFTR genotyping (cyan). If screened positive for CF (orange), a risk letter will be generated (green). The risk letter incorporates both the IRT (blue) and CF genotype (cyan) results. Many jurisdictions have also implemented or are considering implementing third-tier sequencing that will further add to the complexity of this test. The program will refer the screen-positive infant to the appropriate care provider and this referral will be documented (green). The NBS program will be notified when the newborn is retrieved and this will be documented (green). Diagnostic testing will be performed and the NBS program will be informed of the final diagnosis (green). Periodically, the program will evaluate CF screening performance. This evaluation requires information from all aspects of the screening program (purple), such as, but not limited to, test turnaround time (blue), time to referral and retrieval (green), final or working diagnosis, and positive and negative predictive value (green). Babies and families with CF, CF variants, or false positive results may be invited to participate in further program evaluation and research, and the SIMS must facilitate this (green).\n<\/p><p>While not exhaustive, the CF screen-positive scenario provides an overview of the requirements of a SIMS for NBS. Not described above is the need to make changes to the system as disorders, methods, tests, and screening logic change. 
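The reflex step at the heart of the CF screen-positive scenario above can be sketched in a few lines. This is a simplified illustration only: the IRT cut-off, return values, and function name are hypothetical, and real IRT-DNA algorithms (including handling of elevated-IRT/no-variant samples and third-tier sequencing) are considerably more nuanced.

```python
# Hypothetical sketch of an IRT-DNA reflex rule for CF screening.
# The IRT cut-off and the treatment of variant counts are illustrative only.

IRT_CUTOFF_NG_ML = 60.0  # invented cut-off; programs set their own

def cf_screen(irt_ng_ml, cftr_variants_found=None):
    """Return the next action for a CF screening sample.

    cftr_variants_found is None until second-tier genotyping has run;
    afterwards it is the number of panel variants detected.
    """
    if irt_ng_ml <= IRT_CUTOFF_NG_ML:
        return "screen_negative"
    if cftr_variants_found is None:
        return "request_cftr_genotyping"   # reflex to second tier
    if cftr_variants_found >= 1:
        return "screen_positive_refer"     # generate risk letter and referral
    return "screen_negative"               # elevated IRT, no panel variants
```

Even in this toy form, the rule makes clear why such logic must be reviewable and testable: each branch drives a different downstream workflow (genotyping request, risk letter, or negative mailer), and every change to a cut-off or branch needs verification before release.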
Once changes are made, an approach to testing the system to ensure that it functions as intended will be required.\n<\/p><p>In order to realize this vision, NSO launched a large-scale project in late 2014 to develop a comprehensive service-oriented architecture (SOA)<sup id=\"rdp-ebb-cite_ref-HuhnsService05_9-0\" class=\"reference\"><a href=\"#cite_note-HuhnsService05-9\">[9]<\/a><\/sup> solution in which information can flow seamlessly through the key areas of the program. The solution consisted of seven key functional modules (services) that, when combined, can achieve this vision, including:\n<\/p>\n<dl><dd>1. Patient record management: This module is expected to handle most of the pre-analytical aspects of NSO. The module will be used to receive samples in the laboratory and to enter demographic data for these samples. It consists of a number of processes including but not limited to sample reception, demographic\/clinical indication data entry and validation, test ordering (batch and custom), linking multiple samples to a patient, triggering workflows for samples that are unsatisfactory for testing, and ensuring electronic transmission and receipt of key information (e.g., demographics in, results out) via <a href=\"https:\/\/www.limswiki.org\/index.php\/HL7\" title=\"HL7\" class=\"mw-redirect wiki-link\" data-key=\"944ec30acac5b7c05ef9ce3c1b4c22dc\">HL7<\/a> and related protocols.<\/dd><\/dl>\n<dl><dd>2. Laboratory information system\/quality control: These modules will be expected to handle most of the analytical and QC data, including workflows and dataflows within the NSO laboratory environment.<\/dd><\/dl>\n<dl><dd>3. Clinical\/medical review: The review and releasing of results is a critical component of laboratory information workflow between the technologists and medical\/scientific staff. This module is designed to apply pre-programmed logic to distill critical lab information in order to produce actionable results in an efficient manner. 
The clinical\/medical review module will consist of a configurable rules-based system that integrates information from multiple data sources, applies disorder logic, and streamlines workflows to drive decisions. It will also include a rules-based expert system, including a web-based graphical user interface (GUI) to support the review and reporting functions, an administrative interface for creating and managing rules, and a user-configurable knowledge base that is \u201chuman readable.\u201d In its first iteration, this functionality will be embedded in the core SIMS. The possibility of using an independent rules engine service that can be made available to multiple systems in the longer term will also be explored.<\/dd><\/dl>\n<dl><dd>4. Case management: This module will be expected to handle most of the post-analytical aspects of NSO, including follow-up with submitters and treatment centers.<\/dd><\/dl>\n<dl><dd>5. Sample lifecycle management: The need to track blood collection cards throughout their lifecycle, from distribution through transport to storage, is a key and sometimes overlooked process within newborn screening program management. A local vendor has worked with NSO to develop a system that helps manage sample card inventory, track the transport of samples and card usage, and monitor expiry dates for filter paper; the vendor is also working towards tracking in-lab samples, off-site storage, and destruction.<\/dd><\/dl>\n<dl><dd>6. Reporting and analytics: In order to support the data-intensive nature of newborn screening results, NSO has developed a data warehouse to enable user-controlled data access and to manage automated and <i>ad-hoc<\/i> reporting on all types of NSO data.<\/dd><\/dl>\n<dl><dd>7. 
Decision support: Third-party products can be linked in to assist key decision makers in making timely, effective decisions.<\/dd><\/dl>\n<p>These functional modules were combined to produce an enterprise architecture to help guide the project. Figure 3a shows the \u201ctightly coupled\u201d enterprise architecture of the original NSO information system infrastructure, and Figure 3b shows the desired final state.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig3_Pluscauskas_IntJOfNeoScreen2019_5-1.png\" class=\"image wiki-link\" data-key=\"15789c1d5caee1f14ff08b7880ecb8bf\"><img alt=\"Fig3 Pluscauskas IntJOfNeoScreen2019 5-1.png\" src=\"https:\/\/www.limswiki.org\/images\/5\/58\/Fig3_Pluscauskas_IntJOfNeoScreen2019_5-1.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 3.<\/b> (<b>a<\/b>) NSO Enterprise Architecture\u2014Initial State; (<b>b<\/b>) NSO (Newborn Screening Ontario) Enterprise Architecture\u2014Desired Final State<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>The NSO SIMS components are shown within the large blue box, including instruments\/interfaces, LISs, and related systems such as data warehouses. The ovals outside the large box show NSO\u2019s data consumers (entities), which include key groups such as hospitals that submit samples (submitters), treatment centers that retrieve screen-positive newborns, and parents. They also show data-holding entities such as health information exchanges (HIEs) and data registries. Dataflows are shown via directional arrows.\n<\/p><p>In Figure 3a, the blue color represents the fact that all functions were handled by a single \u201ctightly coupled\u201d LIS. 
The new enterprise architecture is more \u201cservices\u201d focused, with different systems playing different roles. The colors in Figure 3b call out which components are responsible for each function in the new NSO data architecture, where green is the main LIS (OMNILab), gray represents a hybrid of the main LIS and other in-house systems, and orange represents other off-the-shelf components.\n<\/p><p>Once the desired enterprise architecture was developed, the team moved on to ensure that key functional considerations for the system were taken into account by developing a functional architecture. The flow from an enterprise architecture to a functional one is analogous to the link between a building\u2019s architectural design and its building plans. The enterprise architecture provides a high-level, idealized view of what can be accomplished across the organization, whereas the functional views (with related user requirements) provide detail about what needs to be built. Figure 4 shows the key conceptual information flows that were developed by the team to guide the project. The modules and phases of implementation are called out on the right side of the diagram.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig4_Pluscauskas_IntJOfNeoScreen2019_5-1.png\" class=\"image wiki-link\" data-key=\"80e50601d4c2beda725ab38175a164cc\"><img alt=\"Fig4 Pluscauskas IntJOfNeoScreen2019 5-1.png\" src=\"https:\/\/www.limswiki.org\/images\/1\/1c\/Fig4_Pluscauskas_IntJOfNeoScreen2019_5-1.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 4.<\/b> NSO Functional Architecture. 
The numbers (1\u20135) provide a reference point to follow the NSO dataflows from entry of the samples into the lab (1) through the sending of reports to submitters (5).<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<h2><span class=\"mw-headline\" id=\"Practice\">Practice<\/span><\/h2>\n<h3><span class=\"mw-headline\" id=\"Technical_considerations\">Technical considerations<\/span><\/h3>\n<p>Once the enterprise and functional architectures were solidified, NSO faced a number of choices. An early choice was whether to take a buy, build, or hybrid approach to development. When developing any large-scale IT infrastructure project, especially one like NSO\u2019s where there is no \u201cone size fits all\u201d model available, a decision needs to be made whether to buy a commercial off-the-shelf (COTS) solution or to build the software from scratch. Each of these approaches has pros and cons as discussed below.\n<\/p><p>COTS systems are supported by a vendor. This can be critically important, especially in areas like newborn screening where access to IT human resources can be severely limited. However, in general, COTS systems can be less flexible in their ability to meet less-common customer requirements. This can lead to the need to either adjust lab workflows to \u201cconform\u201d to the software or to create manual workarounds that are tracked outside of the system. In addition, requests for system enhancements and\/or system fixes generally wait in a queue which, depending on the complexity and urgency of the requirement, can create frustration and potential risks.\n<\/p><p>Customized software can allow labs to focus on organizational needs and unique user requirements. However, building and supporting software can be very expensive, and recruiting and managing qualified software human resources (e.g., software engineers, programmers, QA) is not an area of expertise that is associated with laboratory program management. 
Therefore, undertaking this type of approach exclusively is not feasible for most lab programs.\n<\/p><p>Given the complexity of the architecture involved, NSO chose a hybrid approach using COTS systems wherever possible and then adding in custom solutions to fill in any gaps. Once this choice was made, NSO needed to determine whether to use a single-source system or to integrate more than one system to achieve the end goal for its COTS needs. This is often referred to as the choice between integrated single-source (one vendor) versus best-of-breed<sup id=\"rdp-ebb-cite_ref-HermannBest10_10-0\" class=\"reference\"><a href=\"#cite_note-HermannBest10-10\">[10]<\/a><\/sup> solutions. Best-of-breed entails integrating two or more independent solutions that can be from multiple vendors and\/or can also be custom built. The main risk of best-of-breed usually revolves around the integration points. The risk of integration usually resides with the client and can lead to complexity and possibly even conflict if multiple vendors are involved. In addition, a best-of-breed solution will often require more human resources on the part of the client than working with one vendor.\n<\/p><p>In order to limit risk and complexity, NSO chose to pursue a best-of-breed strategy and limit the system to the smallest number of components to meet its needs. Ultimately, the decision of whether to pursue single-vendor versus best-of-breed needs to be based on organizational capacity, budget, risk profile, and the maturity of the software options available. That being said, as the needs of the newborn screening community become more complex, the viability of a \u201cone size fits all\u201d model of software delivery will become increasingly difficult to maintain.\n<\/p><p>Another set of parameters that was considered was the configurability versus customization of the system. 
Configurability refers to the ability of trained, usually internal users to adjust key parameters of the software via a user interface. This is in contrast to \u201ccustomization,\u201d which involves having vendors or custom software developers directly adjust the code of their software to enable adjustment of these parameters. A system was chosen that enabled NSO to configure items such as analyte cut-off values, the order of items on a puncher list, the contents of key screen-positive results, reports\/letters, and patient follow-up workflows.\n<\/p><p>The more configurable a piece of software, the more individual newborn screening programs can adjust it to meet their specific needs. This can give programs greater control over timelines as they do not have to wait on vendors to customize their software to meet less-common end-user needs. As the degree of internal configurability increases, it becomes incumbent on programs to ensure that they have the ability to program and test their configuration changes, verifying that any changes made have the desired effects and no undesirable ones. This approach necessitated a change in human resources at NSO with the creation of new roles for lab subject matter experts to assist the project and the reassignment of lab and follow-up personnel to configure the software. Although these HR needs will be higher during the initial build-out phase, it is expected that some of the roles will remain in place on a permanent basis once the initial build is over.\n<\/p><p>The need to balance the ability to test a large volume of time-sensitive samples versus enabling a patient-centric view of individual orders was a key driver in determining NSO\u2019s approach. 
At its core, this dilemma comes down to the ability of a lab system to apply a standard set of tests to a series of samples quickly (known as a batch or \"sample-centric\" model) versus the ability to track individuals through the system from accessioning through resulting (known as a \"patient-centric\" model). In a screening model where there are hundreds of samples a day, speed is important in order to receive timely results on key sets of time-sensitive tests. This need is best served by a sample-centric model. However, most standard LIS systems are based on a patient-centric model, where the details about the individual to be tested are entered into the system and tests, and later results, are assigned to the patient file.\n<\/p><p>The majority of requisitions received at NSO are hand-filled test requisitions with attached blood filter papers. In order to facilitate timely testing, the filter papers and requisitions are separated and identical pre-printed screening labels are attached to each. These requisitions are sent to data entry clerks to enter the data over the course of the day, while the blood spots are immediately punched and tested in the lab. The information is linked in the LIS by the common screening accession number. The issue that can arise is that in \u201csample-centric mode,\u201d each sample is usually \u201cpreassigned\u201d a specific set of tests in order to ensure that they can flow quickly through the lab processes that are needed to produce the standard newborn screening results. The sample-centric system can become particularly unwieldy when there is a need to order different tests on a sample at accessioning (as is the case with diagnostic, monitoring, or partial sample panels). 
To address this, a hybrid set of processes was created, enabling high-volume sample-centric workflows for standard newborn screening samples while also allowing for patient-centric test ordering.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"User_considerations\">User considerations<\/span><\/h3>\n<p>Once architecture and technical considerations had been developed, user needs and requirements were gathered and tracked. A program-wide exercise was carried out to determine desired functionality at a high level, as well as improvements that were necessary to support specific workflows. This \u201cwish list\u201d was documented in order to facilitate the evaluation of overall project success.\n<\/p><p>Wish list items fell into a number of broad categories, including the need for better program data management and workflows; functionality to support\/automate processes not currently supported in the LIS that could affect efficiency and safety; the ability to efficiently implement new lab paradigms such as molecular DNA screening and diagnostics; a desire to integrate both diagnostic (patient-centric) and screening (batch-centric) paradigms; the ability to integrate best-of-breed equipment; case management and follow-up procedure flexibility; and open access to SIMS data for program and laboratory analytics.\n<\/p><p>Consolidated lists were tracked as part of the project to ensure that the system matched user expectations. Figure S1 User Needs Tracking (see \"Supplementary materials\" section) shows a graphical representation of these consolidated user needs, categorized by functional area.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Procurement_considerations\">Procurement considerations<\/span><\/h3>\n<p>The user needs and functionality statements were used to develop a set of formal business and functional requirements that were utilized in the tendering process. 
Requirements management software (Enterprise Architect from Sparx Systems) was used to ensure requirement traceability back to the key architecture and functional areas described above. Each business requirement was coupled with a number of related functional requirements. These business requirements (BRs) and functional requirements (FRs) were used to create an in-depth request for proposal (RFP) for a SIMS. Table 1 shows an example of a BR and related FRs describing part of the clinical\/medical review module that was described earlier in this paper.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Tab1_Pluscauskas_IntJOfNeoScreen2019_5-1.png\" class=\"image wiki-link\" data-key=\"03223e1293b98bd1c71de35bae940cfe\"><img alt=\"Tab1 Pluscauskas IntJOfNeoScreen2019 5-1.png\" src=\"https:\/\/www.limswiki.org\/images\/a\/ad\/Tab1_Pluscauskas_IntJOfNeoScreen2019_5-1.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Table 1.<\/b> Business and Functional Requirements for tendering<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>The evaluation of the RFP was divided into two phases. The first phase consisted of a written response from qualified vendors, evaluated against two sets of criteria. The first was a pass\/fail gate that was applied to all mandatory requirements. All vendors that met these criteria were then scored on the non-mandatory (desirable) requirements, and the results were consolidated and averaged. Scoring was weighted based upon an agreed-upon relative importance of the requirements. 
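The two-step phase-one evaluation described above (a pass\/fail gate on mandatory requirements, then weighted, averaged scoring of desirable ones) might be sketched as follows; the vendor names, requirements, evaluator scores, and weights are hypothetical placeholders, not the actual tender data:

```python
# Sketch of a two-step RFP evaluation: a pass/fail gate on mandatory
# requirements, then weighted scoring of desirable requirements.
# All vendors, requirements, scores, and weights are hypothetical.
from statistics import mean

def passes_gate(vendor):
    """A vendor is eliminated if any mandatory requirement is unmet."""
    return all(vendor["mandatory"].values())

def weighted_score(vendor, weights):
    """Average each requirement's evaluator scores, then apply its weight."""
    return sum(weight * mean(vendor["desirable"][req])
               for req, weight in weights.items())

weights = {"reporting": 3, "interfacing": 5}   # agreed-upon relative importance

vendors = {
    "Vendor A": {"mandatory": {"audit_trail": True},
                 "desirable": {"reporting": [7, 9], "interfacing": [6, 8]}},
    "Vendor B": {"mandatory": {"audit_trail": False},   # fails the gate
                 "desirable": {"reporting": [9, 9], "interfacing": [9, 9]}},
}

# Only vendors passing all mandatory requirements receive a weighted score.
qualified = {name: weighted_score(v, weights)
             for name, v in vendors.items() if passes_gate(v)}
```

Note how the gate is absolute: a vendor with otherwise perfect desirable scores (Vendor B here) is removed before any weighting occurs.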
Figure S2 Example of Phase 1 Scoring (see \"Supplementary materials\" section) shows a generic example of phase one scoring that was applied for all vendors that met the pass\/fail criteria.\n<\/p><p>The top vendors were then invited to participate in the second phase of the RFP process. In this second phase of the procurement, the vendors were brought in for in-depth meetings to describe their proposals and were then asked to provide demonstrations of how various aspects of their systems met NSO\u2019s needs.\n<\/p><p>Once phase two was complete, NSO used a technique known as decision matrix analysis<sup id=\"rdp-ebb-cite_ref-MulderDecision18_11-0\" class=\"reference\"><a href=\"#cite_note-MulderDecision18-11\">[11]<\/a><\/sup> to determine the best option for moving forward. A decision matrix is a table (usually set up in a spreadsheet) where the options under consideration occupy the rows and the factors that influence the decision occupy the columns. The relative importance of each factor was assigned a weight, usually determined by surveying the key decision makers (e.g., 1 is low impact and 10 is high impact). 
The individual scores for each area were calculated by multiplying the raw score by the assigned impact factor, and then the scores were added together to get the final score for each option.\n<\/p><p>Eight key factors were identified as critical to the overall decision, including internal and external costs (including personnel and consultants), software and hardware costs, and project costs associated with the build for this option (higher score means lower costs); vendor support capabilities; the ability to deliver a product in a timely fashion; the ability to allow NSO to innovate; technical integration risks, including the technical underpinnings of the software and how easy or hard it will be to integrate with other systems (higher score means lower risk); and secondary benefits (which include the vendor fixing NSO\u2019s current issues without creating new ones). Figure 5 shows an anonymized mock-up of the decision matrix tool that was used to make the final vendor choice.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig5_Pluscauskas_IntJOfNeoScreen2019_5-1.png\" class=\"image wiki-link\" data-key=\"d4f797cd90867cdcb823b6c12878fd4a\"><img alt=\"Fig5 Pluscauskas IntJOfNeoScreen2019 5-1.png\" src=\"https:\/\/www.limswiki.org\/images\/9\/9a\/Fig5_Pluscauskas_IntJOfNeoScreen2019_5-1.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 5.<\/b> NSO\u2019s decision matrix analysis tool that was used to make final vendor choices<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>A COTS solution, OMNI-Lab NBS from Integrated Software Solutions (ISS) Pty. Ltd., was chosen to fulfill a number of key pieces of the architecture. 
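The weighted-scoring arithmetic of the decision matrix described above can be sketched as follows; the options, factors, weights, and raw scores are hypothetical stand-ins rather than NSO's actual values:

```python
# Sketch of decision matrix analysis: options are rows, factors are columns,
# each factor carries a 1-10 impact weight. All values are hypothetical.

weights = {"cost": 8, "support": 6, "integration_risk": 9}   # factor -> weight

# Raw scores per option per factor (higher is better; for cost and risk,
# a higher score means lower cost/risk).
options = {
    "Option 1": {"cost": 7, "support": 5, "integration_risk": 6},
    "Option 2": {"cost": 4, "support": 8, "integration_risk": 7},
}

def matrix_score(raw, weights):
    """Multiply each raw score by its factor's weight and sum the results."""
    return sum(weights[factor] * raw[factor] for factor in weights)

totals = {name: matrix_score(raw, weights) for name, raw in options.items()}
best = max(totals, key=totals.get)   # option with the highest weighted total
```

In a spreadsheet this is simply a SUMPRODUCT of each row against the weight row; the value of formalizing it is that the weights, surveyed from key decision makers, make the trade-offs between options explicit and auditable.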
Following the award of the procurement, ISS and NSO worked on the development of a mutually acceptable contract. Although many non-procurement personnel usually view contract negotiations as primarily about money, there are in fact a number of other key factors that need to be resolved in addition to the initial and ongoing costs of the software, including the type of contract payment mechanisms (such as time-and-materials versus deliverable-based contracts). In the case of NSO, a deliverable-based contract was established where contract payments were linked to key milestones. This contractual structure has enabled NSO and the vendor to take a team approach during the design and delivery of the software. Although the upfront effort in defining milestones and working towards mutually agreeable sign-offs of project phases can be relatively high, the long-term benefits of ensuring that both sides are on the same page during the implementation phase have made the effort worthwhile.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Implementation_considerations\">Implementation considerations<\/span><\/h3>\n<p>A decision as to how to move forward with system implementation created a challenge, as NSO currently has a newborn screening <a href=\"https:\/\/www.limswiki.org\/index.php\/Laboratory_information_management_system\" title=\"Laboratory information management system\" class=\"wiki-link\" data-key=\"8ff56a51d34c9b1806fcebdcde634d00\">laboratory information management system<\/a> (LIMS) in place. A staged\/agile approach that divided the project into smaller, manageable pieces of work, known as sprints, was put in place to mitigate this challenge. The approach enabled the team to focus on key areas of implementation while managing risk. 
Higher-priority areas with lower change management needs were staged earlier than those requiring more change management or where there was currently a system in place to handle day-to-day operations.\n<\/p><p>Applying this approach, NSO launched its CCHD screening implementation using the SIMS solution in mid-2017, and blood spot CMV DNA testing of babies not passing their hearing screen was moved into the new system in mid-2018. DBS screening, which requires the most change management, is currently being deployed. To maximize efficiency, sprints also overlap so that development in one area is started while others are still in progress. This approach has a number of benefits, including enabling less risky product acceptance and module configuration across testing and production environments, which, in turn, reduces the risk for the more complex deployments.\n<\/p><p>Several processes to ensure overall build quality have been initiated. The system is deployed across three environments (development, test, production) in order to ensure that all new features can be scoped and configured in the development environment before being tested and finally put into production. This enables NSO to ensure that new features and configurations are not put into the working environment until they are fully tested and, therefore, do not cause any issues with the production system. 
Table 2 describes how each environment is utilized in order to ensure full production quality for software, data, and configurations.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Tab2_Pluscauskas_IntJOfNeoScreen2019_5-1.png\" class=\"image wiki-link\" data-key=\"8439f366701a12e7f8323c4ff9547bff\"><img alt=\"Tab2 Pluscauskas IntJOfNeoScreen2019 5-1.png\" src=\"https:\/\/www.limswiki.org\/images\/0\/06\/Tab2_Pluscauskas_IntJOfNeoScreen2019_5-1.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Table 2.<\/b> NSO environment strategy<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>A number of internal subject matter experts were consulted to develop and deploy test cases for all new system areas in order to ensure that the implications of these changes are fully understood and tested before deployment. All software bugs and requested features are tracked using issue tracking software (JIRA from Atlassian) and, where possible, system testing is automated using a standard testing tool (Ranorex) in order to easily repeat key end-to-end system tests as new features are deployed. The application of these processes, in addition to the staged\/agile approach to deployment and implementation, is helping to mitigate project deployment risks, especially as the project moves into later and more complex stages.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Organizational_and_jurisdictional_considerations\">Organizational and jurisdictional considerations<\/span><\/h3>\n<p>Newborn screening is delivered via a variety of organizational structures around the world, and therefore each program\u2019s overall needs will be different. 
In spite of this, there are still a number of key areas where specific organizational strategies can be used to ensure the successful delivery of an IS implementation. One key factor is the overall vision and support of senior management, which has been critical to the success of the project to date. Key leadership personnel from NSO, as well as CHEO executive and IS, sit on the project steering committee, which provides guidance and oversight for the project. The project is also represented at the NSO leadership committee in order to ensure that the project aligns with overall NSO strategic direction and goals. Regular progress reports are provided to external stakeholders via the NSO Advisory Council.<sup id=\"rdp-ebb-cite_ref-NSOAdvisory_12-0\" class=\"reference\"><a href=\"#cite_note-NSOAdvisory-12\">[12]<\/a><\/sup>\n<\/p><p>Ensuring that costs and budgets align with overall program priorities is also key. Many newborn screening programs exist within budget-constrained environments where day-to-day operational priorities take precedence over longer-term endeavors such as updating IT systems. Beyond ongoing program support for the IS rebuild, NSO has partially mitigated this issue by treating its IT system as a depreciable asset like other capital equipment and infrastructure and including IT system replacement costs in the overall capital budgeting process. There can also be other novel approaches to working through resourcing issues; for example, other newborn screening programs have begun to pool resources across jurisdictions to share infrastructure in order to keep build and operating costs down.\n<\/p><p>Another critical area that is often overlooked is the need for strong support from the related information and communications technology (ICT) groups that support the newborn screening organization. 
The modern demands of IT can outstrip the capabilities of some of the older IT systems within a public health setting. In addition, IT and network infrastructures are often procured and managed by different entities or departments than those responsible for operating the newborn screening lab and\/or program. The emergence of cloud-based technologies will likely help to mitigate some of these pressures in the future, as the reliance on related third parties to manage the \u201cnuts and bolts\u201d of infrastructure decreases.\n<\/p><p>Beyond infrastructure support, jurisdictional considerations can also have a strong influence on a program\u2019s choice of SIMS options. Some jurisdictions are procuring region-, state-, or province-wide integrated electronic records solutions, such as those from Epic Systems Corporation (Verona, WI, USA)<sup id=\"rdp-ebb-cite_ref-EpicHome_13-0\" class=\"reference\"><a href=\"#cite_note-EpicHome-13\">[13]<\/a><\/sup> and <a href=\"https:\/\/www.limswiki.org\/index.php\/Cerner_Corporation\" title=\"Cerner Corporation\" class=\"wiki-link\" data-key=\"44b952d5fb439af88c84f5ad453fee3f\">Cerner Corporation<\/a> (North Kansas City, MO, USA).<sup id=\"rdp-ebb-cite_ref-CernerHome_14-0\" class=\"reference\"><a href=\"#cite_note-CernerHome-14\">[14]<\/a><\/sup> Newborn screening labs that fall within these jurisdictions are looking to understand whether they can utilize these systems for their SIMS needs, and in a few early cases they have already done so. In addition, some jurisdictions are outsourcing entire newborn screening lab functions, such as MSMS and related biochemistry, along with the related IT requirements. 
Although these cross-jurisdictional integrated systems can have a definite impact on a newborn screening lab\u2019s SIMS choices, they need not be a constraint to meeting a lab\u2019s overall needs.\n<\/p><p>A clear understanding of its key architectural and functional requirements can greatly assist a lab in performing newborn screening functions within an integrated electronic record solution or outsourced service environment. These shared infrastructure solutions may also ease some of the infrastructure pressures and allow labs to focus on their core needs while sharing some of the IS infrastructure costs across multiple functional areas throughout their institutions.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Project_benefits_and_impacts\">Project benefits and impacts<\/span><\/h3>\n<p>A critical part of any long-term project (or group of projects) is to identify and measure its benefits in order to assess ongoing value. In formal project management terminology, this is often called \u201cbenefits realization.\u201d Benefits realization is broadly defined as \u201cthe process of organizing and managing, such that the potential benefits arising from the use of IT are actually realized.\u201d<sup id=\"rdp-ebb-cite_ref-WardANew99_15-0\" class=\"reference\"><a href=\"#cite_note-WardANew99-15\">[15]<\/a><\/sup> In more practical terms, this can be considered a synthesis of user needs identification, as discussed earlier in the paper, combined with broader program goals. The project\u2019s key success metrics were determined by the project leadership group at project initiation and are being tracked throughout the project lifecycle. Figure S3 Benefits Realization Tracking (see \"Supplementary materials\" section) shows benefits realization tracking for key in-lab benefits across two dimensions, namely the impact of the benefit (direction of the arrow) and the program area(s) impacted (color of the arrow). 
In addition, once the SIMS is implemented, NSO will track the impact of the changes on critical clinical program measures such as positive predictive values (PPVs) and false positive rates (FPRs). Analysis will be undertaken utilizing both the relevant LIS systems and the NSO Data Warehouse. These clinical metrics are reported as part of NSO\u2019s clinical communication to its treatment groups and, more broadly, as part of the NSO annual report. Given that NSO will be able to achieve more granular control over many of its processes, it is expected that this will lead to better PPVs and decreased FPRs. Post-launch, NSO will continue to monitor these measures in order to ensure continuous quality improvement for its clinical metrics.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Conclusions\">Conclusions<\/span><\/h2>\n<p>A number of critical lessons regarding the implementation of a flexible SIMS architecture were learned through the stages described in this paper. One of the key lessons learned was that developing, implementing, and deploying a SIMS is about much more than the technology. The SIMS has become the central analytical, process management, and communication hub for many newborn screening programs. In order to realize the full benefit of establishing an \u201cideal\u201d SIMS, it is necessary to engage the full team to ensure that the promise of the new technologies can be achieved. At its most basic level, this requires strong program leadership, vision, and strategy. 
Although the journey can be quite challenging, the benefits of taking a holistic approach to SIMS development, in terms of the ability to be flexible and innovate, far outweigh the efforts and costs in the long run.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Supplementary_materials\">Supplementary materials<\/span><\/h2>\n<p>Supplementary materials can be found at <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.mdpi.com\/2409-515X\/5\/1\/9\/s1\" data-key=\"3eccc1b08ddae4656de1606717565741\">https:\/\/www.mdpi.com\/2409-515X\/5\/1\/9\/s1<\/a>. (.zip file)\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Acknowledgements\">Acknowledgements<\/span><\/h2>\n<p>The authors would like to acknowledge a number of groups and individuals without whose diligence and hard work this project and paper would not have been possible. The entire Newborn Screening Ontario team spent many hours providing invaluable input into defining their needs and providing a vision of what an \u201cideal\u201d newborn screening information system would look like. Steve Conrad led a team of dedicated NSO staff and students to shape these \u201cideals\u201d into formal business and technical requirements. These were used to craft the enterprise and functional architectures that are presented in the paper and to create the request for information and proposal (RFI\/RFP) documents that formed the basis for the procurement of the new off-the-shelf systems. The Project Lancet (the internal NSO name for the rebuild project) team has spent many hours bringing the vision and requirements to reality. 
The core Lancet implementation team (in alphabetical order: Marlene Elliott, Mike Kowalski, Shannon McClelland, Nate McIntosh, Chloe O\u2019Sullivan, Megan Sayer); the subject matter experts (Sarah Foster, Janet Marcadier, Larry Fisher, Alison Evans); and the project management group (Marina Hebert and Jim Bottomley) have continued to impress with their ability to process complex issues and move them forward through the product and project lifecycle continuum. NSO operations staff continue to dedicate many hours to ensuring both project and product success, especially David Lawrie and Christine McRoberts, who are ensuring that both the operational implementation and quality continue to meet NSO\u2019s high standards. Medical\/scientific input at Project Lancet\u2019s demo sessions (a.k.a. \u201ctracers\u201d) from Dennis Bulman, Kristin Kernohan, Nathalie Lepage, and others has been critical in ensuring the scientific validity of the medical decisions made utilizing the SIMS. Leadership support and vision are critical to the success of any project of this magnitude; as such, the authors would like to thank Mari Teitelbaum for her diligent efforts to provide a bridge between NSO and CHEO provincial programs as chair of the Project Lancet steering committee and for providing both timely and sage advice to ensure that the project stays on track. Finally, the authors would also like to thank everyone who helped provide assistance and input in composing the figures and tables for the paper, in particular Lauren Gallagher (Figure 1), Steve Conrad (Figure 3a,b and Figure 4), Shannon McClelland (Figure S1), and Marlene Elliott (Table 2).\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Author_contributions\">Author contributions<\/span><\/h3>\n<p>M.P. and M.H. wrote and revised the original manuscript and figures (with the input of those discussed below). J.M. and P.C. 
provided input into the overall manuscript content and manuscript revisions. P.C. was the primary mentor on the paper and provided oversight for content and details. All authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Funding\">Funding<\/span><\/h3>\n<p>This research received no external funding.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Conflicts_of_interest\">Conflicts of interest<\/span><\/h3>\n<p>The authors declare no conflict of interest.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"References\">References<\/span><\/h2>\n<div class=\"reflist references-column-width\" style=\"-moz-column-width: 30em; -webkit-column-width: 30em; column-width: 30em; list-style-type: decimal;\">\n<ol class=\"references\">\n<li id=\"cite_note-MayoAboutCLIR-1\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-MayoAboutCLIR_1-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/clir.mayo.edu\/Home\/About\" data-key=\"656759c2663998f60dc5c9c94b2811e5\">\"About CLIR\"<\/a>. Mayo Foundation for Medical Education and Research<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/clir.mayo.edu\/Home\/About\" data-key=\"656759c2663998f60dc5c9c94b2811e5\">https:\/\/clir.mayo.edu\/Home\/About<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 26 November 2018<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=About+CLIR&rft.atitle=&rft.pub=Mayo+Foundation+for+Medical+Education+and+Research&rft_id=https%3A%2F%2Fclir.mayo.edu%2FHome%2FAbout&rfr_id=info:sid\/en.wikipedia.org:Journal:Building_a_newborn_screening_information_management_system_from_theory_to_practice\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-PantanowitzMedical07-2\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-PantanowitzMedical07_2-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Pantanowitz, L.; Henricks, W.H.; Beckwith, B.A. (2007). \"Medical laboratory informatics\". <i>Clinics in Laboratory Medicine<\/i> <b>27<\/b> (4): 823\u201343. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1016%2Fj.cll.2007.07.011\" data-key=\"1dd25a4920e417c8f65d105cb2603314\">10.1016\/j.cll.2007.07.011<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/17950900\" data-key=\"2b86d31263a9441bad2d57da24ee2f54\">17950900<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Medical+laboratory+informatics&rft.jtitle=Clinics+in+Laboratory+Medicine&rft.aulast=Pantanowitz%2C+L.%3B+Henricks%2C+W.H.%3B+Beckwith%2C+B.A.&rft.au=Pantanowitz%2C+L.%3B+Henricks%2C+W.H.%3B+Beckwith%2C+B.A.&rft.date=2007&rft.volume=27&rft.issue=4&rft.pages=823%E2%80%9343&rft_id=info:doi\/10.1016%2Fj.cll.2007.07.011&rft_id=info:pmid\/17950900&rfr_id=info:sid\/en.wikipedia.org:Journal:Building_a_newborn_screening_information_management_system_from_theory_to_practice\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SepulvedaTheIdeal13-3\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-SepulvedaTheIdeal13_3-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Sepulveda, J.L;. Young, D.S. (2013). \"The ideal laboratory information system\". pp. 1129\u201340. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.5858%2Farpa.2012-0362-RA\" data-key=\"e589c006f48a01227d3bdcf5876acd4c\">10.5858\/arpa.2012-0362-RA<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/23216205\" data-key=\"b8dd92deb7f4826d914e471d8b9a29f5\">23216205<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=The+ideal+laboratory+information+system&rft.atitle=&rft.aulast=Sepulveda%2C+J.L%3B.+Young%2C+D.S.&rft.au=Sepulveda%2C+J.L%3B.+Young%2C+D.S.&rft.date=2013&rft.pages=pp.+1129%E2%80%9340&rft_id=info:doi\/10.5858%2Farpa.2012-0362-RA&rft_id=info:pmid\/23216205&rfr_id=info:sid\/en.wikipedia.org:Journal:Building_a_newborn_screening_information_management_system_from_theory_to_practice\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-NSO_CCHD-4\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-NSO_CCHD_4-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.newbornscreening.on.ca\/en\/health-care-providers\/submitters\/cchd-screening-implementation\" data-key=\"a301dc95caa37094543413317914a906\">\"CCHD Screening\"<\/a>. Newborn Screening Ontario<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.newbornscreening.on.ca\/en\/health-care-providers\/submitters\/cchd-screening-implementation\" data-key=\"a301dc95caa37094543413317914a906\">https:\/\/www.newbornscreening.on.ca\/en\/health-care-providers\/submitters\/cchd-screening-implementation<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 26 November 2018<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=CCHD+Screening&rft.atitle=&rft.pub=Newborn+Screening+Ontario&rft_id=https%3A%2F%2Fwww.newbornscreening.on.ca%2Fen%2Fhealth-care-providers%2Fsubmitters%2Fcchd-screening-implementation&rfr_id=info:sid\/en.wikipedia.org:Journal:Building_a_newborn_screening_information_management_system_from_theory_to_practice\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-NSO_EHS-5\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-NSO_EHS_5-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.newbornscreening.on.ca\/en\/page\/overview\" data-key=\"1d2ea066edc39e9a1747f03c9f31e870\">\"Expanded Hearing Screenings - Overview\"<\/a>. Newborn Screening Ontario<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.newbornscreening.on.ca\/en\/page\/overview\" data-key=\"1d2ea066edc39e9a1747f03c9f31e870\">https:\/\/www.newbornscreening.on.ca\/en\/page\/overview<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 26 November 2018<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Expanded+Hearing+Screenings+-+Overview&rft.atitle=&rft.pub=Newborn+Screening+Ontario&rft_id=https%3A%2F%2Fwww.newbornscreening.on.ca%2Fen%2Fpage%2Foverview&rfr_id=info:sid\/en.wikipedia.org:Journal:Building_a_newborn_screening_information_management_system_from_theory_to_practice\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HRSARecommended18-6\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-HRSARecommended18_6-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.hrsa.gov\/advisory-committees\/heritable-disorders\/rusp\/index.html\" data-key=\"1132c83301fece3ae2f946d317b803fd\">\"Recommended Uniform Screening Panel\"<\/a>. Health Resources & Services Administration. July 2018<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.hrsa.gov\/advisory-committees\/heritable-disorders\/rusp\/index.html\" data-key=\"1132c83301fece3ae2f946d317b803fd\">https:\/\/www.hrsa.gov\/advisory-committees\/heritable-disorders\/rusp\/index.html<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 26 November 2018<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Recommended+Uniform+Screening+Panel&rft.atitle=&rft.date=July+2018&rft.pub=Health+Resources+%26+Services+Administration&rft_id=https%3A%2F%2Fwww.hrsa.gov%2Fadvisory-committees%2Fheritable-disorders%2Frusp%2Findex.html&rfr_id=info:sid\/en.wikipedia.org:Journal:Building_a_newborn_screening_information_management_system_from_theory_to_practice\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SAC2011Annual11-7\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-SAC2011Annual11_7-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Secretary\u2019s Advisory Committee on Heritable Disorders in Newborns and Children (2011). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.hrsa.gov\/sites\/default\/files\/hrsa\/advisory-committees\/heritable-disorders\/reports-recommendations\/reports\/2011-annual-report.pdf\" data-key=\"acddd25a5d90d74269cb5377bc1aaa1d\">\"2011 Annual Report to Congress\"<\/a> (PDF). Health Resources & Services Administration<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.hrsa.gov\/sites\/default\/files\/hrsa\/advisory-committees\/heritable-disorders\/reports-recommendations\/reports\/2011-annual-report.pdf\" data-key=\"acddd25a5d90d74269cb5377bc1aaa1d\">https:\/\/www.hrsa.gov\/sites\/default\/files\/hrsa\/advisory-committees\/heritable-disorders\/reports-recommendations\/reports\/2011-annual-report.pdf<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 20 February 2018<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=2011+Annual+Report+to+Congress&rft.atitle=&rft.aulast=Secretary%E2%80%99s+Advisory+Committee+on+Heritable+Disorders+in+Newborns+and+Children&rft.au=Secretary%E2%80%99s+Advisory+Committee+on+Heritable+Disorders+in+Newborns+and+Children&rft.date=2011&rft.pub=Health+Resources+%26+Services+Administration&rft_id=https%3A%2F%2Fwww.hrsa.gov%2Fsites%2Fdefault%2Ffiles%2Fhrsa%2Fadvisory-committees%2Fheritable-disorders%2Freports-recommendations%2Freports%2F2011-annual-report.pdf&rfr_id=info:sid\/en.wikipedia.org:Journal:Building_a_newborn_screening_information_management_system_from_theory_to_practice\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-McCuddenLab16-8\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-McCuddenLab16_8-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">McCudden, C.R.; Henderson, M.P.A. (2016). \"Laboratory Information Systems\". In Clarke, W.. <i>Contemporary Practice in Clinical Chemistry<\/i> (3rd ed.). pp. 263\u201376. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" data-key=\"f64947ba21e884434bd70e8d9e60bae6\">ISBN<\/a> 9781594251894.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Laboratory+Information+Systems&rft.atitle=Contemporary+Practice+in+Clinical+Chemistry&rft.aulast=McCudden%2C+C.R.%3B+Henderson%2C+M.P.A.&rft.au=McCudden%2C+C.R.%3B+Henderson%2C+M.P.A.&rft.date=2016&rft.pages=pp.%26nbsp%3B263%E2%80%9376&rft.edition=3rd&rft.isbn=9781594251894&rfr_id=info:sid\/en.wikipedia.org:Journal:Building_a_newborn_screening_information_management_system_from_theory_to_practice\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HuhnsService05-9\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-HuhnsService05_9-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Huhns, M.S.; Singh, M.P. (2005). \"Service-oriented computing: key concepts and principles\". <i>IEEE Internet Computing<\/i> <b>9<\/b> (1): 75\u201381. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FMIC.2005.21\" data-key=\"d064b9a83127a7694f16049ad37803ec\">10.1109\/MIC.2005.21<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Service-oriented+computing%3A+key+concepts+and+principles&rft.jtitle=IEEE+Internet+Computing&rft.aulast=Huhns%2C+M.S.%3B+Singh%2C+M.P.&rft.au=Huhns%2C+M.S.%3B+Singh%2C+M.P.&rft.date=2005&rft.volume=9&rft.issue=1&rft.pages=75%E2%80%9381&rft_id=info:doi\/10.1109%2FMIC.2005.21&rfr_id=info:sid\/en.wikipedia.org:Journal:Building_a_newborn_screening_information_management_system_from_theory_to_practice\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HermannBest10-10\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-HermannBest10_10-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Hermann, S.A. (2010). \"Best-of-breed verses integrated systems\". <i>American Journal of Health-system Pharmacy<\/i> <b>67<\/b> (17): 1406, 1408, 1410. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.2146%2Fajhp100061\" data-key=\"cd62af994c1e573ef38bea47ae4c7614\">10.2146\/ajhp100061<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/20720237\" data-key=\"0c4c703b2d25d478a7a230b0a751e8bf\">20720237<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Best-of-breed+verses+integrated+systems&rft.jtitle=American+Journal+of+Health-system+Pharmacy&rft.aulast=Hermann%2C+S.A.&rft.au=Hermann%2C+S.A.&rft.date=2010&rft.volume=67&rft.issue=17&rft.pages=1406%2C+1408%2C+1410&rft_id=info:doi\/10.2146%2Fajhp100061&rft_id=info:pmid\/20720237&rfr_id=info:sid\/en.wikipedia.org:Journal:Building_a_newborn_screening_information_management_system_from_theory_to_practice\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MulderDecision18-11\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-MulderDecision18_11-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Milder, P. (2018). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.toolshero.com\/decision-making\/decision-matrix-analysis\/\" data-key=\"e0df1f510c0feff86471acd1b437fd23\">\"Decision Matrix Analysis\"<\/a>. <i>ToolsHero<\/i><span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.toolshero.com\/decision-making\/decision-matrix-analysis\/\" data-key=\"e0df1f510c0feff86471acd1b437fd23\">https:\/\/www.toolshero.com\/decision-making\/decision-matrix-analysis\/<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 27 November 2018<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Decision+Matrix+Analysis&rft.atitle=ToolsHero&rft.aulast=Milder%2C+P.&rft.au=Milder%2C+P.&rft.date=2018&rft_id=https%3A%2F%2Fwww.toolshero.com%2Fdecision-making%2Fdecision-matrix-analysis%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:Building_a_newborn_screening_information_management_system_from_theory_to_practice\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-NSOAdvisory-12\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-NSOAdvisory_12-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.newbornscreening.on.ca\/en\/advisory-council\" data-key=\"3f3a8381bb8c3f2b8c55db9c6052c79d\">\"Advisory Council\"<\/a>. Newborn Screening Ontario<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.newbornscreening.on.ca\/en\/advisory-council\" data-key=\"3f3a8381bb8c3f2b8c55db9c6052c79d\">https:\/\/www.newbornscreening.on.ca\/en\/advisory-council<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 26 November 2018<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Advisory+Council&rft.atitle=&rft.pub=Newborn+Screening+Ontario&rft_id=https%3A%2F%2Fwww.newbornscreening.on.ca%2Fen%2Fadvisory-council&rfr_id=info:sid\/en.wikipedia.org:Journal:Building_a_newborn_screening_information_management_system_from_theory_to_practice\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-EpicHome-13\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-EpicHome_13-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.epic.com\/\" data-key=\"fd096fb57422272d7079f55a394a99d0\">\"Epic\"<\/a>. Epic Systems Corporation<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.epic.com\/\" data-key=\"fd096fb57422272d7079f55a394a99d0\">https:\/\/www.epic.com\/<\/a><\/span><span class=\"reference-accessdate\">. Retrieved 26 November 2018<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Epic&rft.atitle=&rft.pub=Epic+Systems+Corporation&rft_id=https%3A%2F%2Fwww.epic.com%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:Building_a_newborn_screening_information_management_system_from_theory_to_practice\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-CernerHome-14\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-CernerHome_14-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.cerner.com\/\" data-key=\"cf89cab81bf835368eb5ad881ec611ec\">\"Cerner\"<\/a>. Cerner Corporation<span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.cerner.com\/\" data-key=\"cf89cab81bf835368eb5ad881ec611ec\">https:\/\/www.cerner.com\/<\/a><\/span><span class=\"reference-accessdate\">. Retrieved 26 November 2018<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Cerner&rft.atitle=&rft.pub=Cerner+Corporation&rft_id=https%3A%2F%2Fwww.cerner.com%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:Building_a_newborn_screening_information_management_system_from_theory_to_practice\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-WardANew99-15\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-WardANew99_15-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Ward, J.; Elvin, R. (1999). \"A new framework for managing IT\u2010enabled business change\". <i>Information Systems Journal<\/i> <b>9<\/b> (3): 197\u2013221. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1046%2Fj.1365-2575.1999.00059.x\" data-key=\"34b951e787dab638daceaf6457ee1421\">10.1046\/j.1365-2575.1999.00059.x<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+new+framework+for+managing+IT%E2%80%90enabled+business+change&rft.jtitle=Information+Systems+Journal&rft.aulast=Ward%2C+J.%3B+Elvin%2C+R.&rft.au=Ward%2C+J.%3B+Elvin%2C+R.&rft.date=1999&rft.volume=9&rft.issue=3&rft.pages=197%E2%80%93221&rft_id=info:doi\/10.1046%2Fj.1365-2575.1999.00059.x&rfr_id=info:sid\/en.wikipedia.org:Journal:Building_a_newborn_screening_information_management_system_from_theory_to_practice\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<\/ol><\/div>\n<h2><span class=\"mw-headline\" id=\"Notes\">Notes<\/span><\/h2>\n<p>This presentation is faithful to the original, with only a few minor changes to presentation.\n<\/p>\n<!-- \nNewPP limit report\nCached time: 20190401185649\nCache expiry: 86400\nDynamic content: false\nCPU time usage: 0.394 seconds\nReal time usage: 0.439 seconds\nPreprocessor visited node count: 11576\/1000000\nPreprocessor generated node count: 33612\/1000000\nPost\u2010expand include size: 69364\/2097152 bytes\nTemplate argument size: 21642\/2097152 bytes\nHighest expansion depth: 15\/40\nExpensive parser function count: 0\/100\n-->\n\n<!-- \nTransclusion expansion time report (%,ms,calls,template)\n100.00% 376.433 1 - -total\n 78.40% 295.107 1 - Template:Reflist\n 65.45% 246.364 15 - Template:Citation\/core\n 43.30% 162.992 10 - Template:Cite_web\n 22.76% 85.672 4 - Template:Cite_journal\n 16.57% 62.386 1 - Template:Infobox_journal_article\n 15.94% 59.986 1 - Template:Infobox\n 9.49% 35.724 80 - 
Template:Infobox\/row\n 6.03% 22.714 1 - Template:Cite_book\n 4.82% 18.151 9 - Template:Citation\/identifier\n-->\n\n<!-- Saved in parser cache with key limswiki:pcache:idhash:10866-0!*!0!!en!5!* and timestamp 20190401185649 and revision id 34868\n -->\n<\/div><div class=\"printfooter\">Source: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:Building_a_newborn_screening_information_management_system_from_theory_to_practice\">https:\/\/www.limswiki.org\/index.php\/Journal:Building_a_newborn_screening_information_management_system_from_theory_to_practice<\/a><\/div>\n\t\t\t\t\t\t\t\t\t\t<!-- end content -->\n\t\t\t\t\t\t\t\t\t\t<div class=\"visualClear\"><\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t\t<!-- end of the left (by default at least) column -->\n\t\t<div class=\"visualClear\"><\/div>\n\t\t\t\t\t\n\t\t<\/div>\n\t\t\n\n<\/body>","ab125d6daef2f763e588fcd5432c1b66_images":["https:\/\/www.limswiki.org\/images\/2\/2a\/Fig1_Pluscauskas_IntJOfNeoScreen2019_5-1.png","https:\/\/www.limswiki.org\/images\/c\/cd\/Fig2_Pluscauskas_IntJOfNeoScreen2019_5-1.png","https:\/\/www.limswiki.org\/images\/5\/58\/Fig3_Pluscauskas_IntJOfNeoScreen2019_5-1.png","https:\/\/www.limswiki.org\/images\/1\/1c\/Fig4_Pluscauskas_IntJOfNeoScreen2019_5-1.png","https:\/\/www.limswiki.org\/images\/a\/ad\/Tab1_Pluscauskas_IntJOfNeoScreen2019_5-1.png","https:\/\/www.limswiki.org\/images\/9\/9a\/Fig5_Pluscauskas_IntJOfNeoScreen2019_5-1.png","https:\/\/www.limswiki.org\/images\/0\/06\/Tab2_Pluscauskas_IntJOfNeoScreen2019_5-1.png"],"ab125d6daef2f763e588fcd5432c1b66_timestamp":1554145009,"8d21eded7dba3fec86203cded8451b7e_type":"article","8d21eded7dba3fec86203cded8451b7e_title":"Data to diagnosis in global health: A 3P approach (Pathinarupothi et al. 
2018)","8d21eded7dba3fec86203cded8451b7e_url":"https:\/\/www.limswiki.org\/index.php\/Journal:Data_to_diagnosis_in_global_health:_A_3P_approach","8d21eded7dba3fec86203cded8451b7e_plaintext":"\n\nJournal:Data to diagnosis in global health: A 3P approach\n\nFull article title: Data to diagnosis in global health: A 3P approach\nJournal: BMC Medical Informatics and Decision Making\nAuthor(s): Pathinarupothi, Rahul Krishnan; Durga, P.; Rangan, Ekanath Srihari\nAuthor affiliation(s): Amrita School of Engineering, Amrita Institute of Medical Science\nPrimary contact: Email: rahulkrishnan @ am dot amrita dot edu\nYear published: 2018\nVolume and issue: 18\nPage(s): 78\nDOI: 10.1186\/s12911-018-0658-y\nISSN: 1472-6947\nDistribution license: Creative Commons Attribution 4.0 International\nWebsite: https:\/\/bmcmedinformdecismak.biomedcentral.com\/articles\/10.1186\/s12911-018-0658-y\nDownload: https:\/\/bmcmedinformdecismak.biomedcentral.com\/track\/pdf\/10.1186\/s12911-018-0658-y (PDF)\n
\n\n\nContents\n\n1 Abstract \n2 Background \n\n2.1 Approach \n2.2 Related work \n\n\n3 Methods \n4 Personalization PAF \n\n4.1 Adaptive quantization \n4.2 Personalization \n\n\n5 Precision PAF \n6 Prevention PAF \n7 Clinical relevance and validation \n\n7.1 Cardiology \n7.2 Pulmonology \n7.3 Neurology \n\n\n8 Results and discussion \n\n8.1 Dataset \n8.2 Evaluating precision hypothesis \n\n8.2.1 Significant results \n\n\n8.3 Evaluating prevention hypothesis \n\n8.3.1 Significant results \n\n\n8.4 Evaluating personalization hypothesis \n8.5 Global health deployment \n\n8.5.1 Challenges and drawbacks \n\n\n9 Conclusion \n10 Abbreviations \n11 Additional files \n12 Declarations \n\n12.1 Acknowledgements \n12.2 Funding \n12.3 Availability of data and materials \n12.4 Authors\u2019 contributions \n12.5 Ethics approval and consent to participate \n12.6 Competing interests \n\n\n13 References \n14 Notes \n\n\n\nAbstract \nBackground: With connected medical devices fast becoming ubiquitous in healthcare monitoring, there is a deluge of data coming from multiple body-attached sensors. Transforming this flood of data into effective and efficient diagnosis is a major challenge.\nMethods: To address this challenge, we present a \"3P\" approach: personalized patient monitoring, precision diagnostics, and preventive criticality alerts. In a collaborative work with doctors, we present the design, development, and testing of a healthcare data analytics and communication framework that we call RASPRO (Rapid Active Summarization for effective PROgnosis). The heart of RASPRO is \"physician assist filters\" (PAF) that (1) transform unwieldy multi-sensor time series data into summarized patient\/disease-specific trends in steps of progressive precision as demanded by the doctor for a patient\u2019s personalized condition, and (2) help in identifying and subsequently predictively alerting the onset of critical conditions. 
The output of PAFs is a clinically useful, yet extremely succinct summary of a patient\u2019s medical condition, represented as a motif, which could be sent to remote doctors even over SMS, reducing the need for high data bandwidth. We evaluate the clinical validity of these techniques using support-vector machine (SVM) learning models, measuring both predictive power and the ability to classify disease conditions. We used more than 16,000 minutes of patient data (N=70) from the openly available MIMIC II database for conducting these experiments. Furthermore, we also report the clinical utility of the system through doctor feedback from a large super-speciality hospital in India.\nResults: The results show that the RASPRO motifs perform as well as (and in many cases better than) raw time series data. In addition, we also see improvement in diagnostic performance using optimized sensor severity threshold ranges set using the personalization PAF severity quantizer.\nConclusion: The RASPRO-PAF system and the associated techniques are found to be useful in many healthcare applications, especially in remote patient monitoring. The personalization, precision, and prevention PAFs presented in the paper successfully show remarkable performance in satisfying the goals of the 3Ps, thereby providing the advantages of \"3As\": availability, affordability, and accessibility in the global health scenario.\nKeywords: precision medicine, medical informatics, personalized healthcare, motif summarization\n\nBackground \nPrecision medicine and personalized healthcare are quickly gaining wide research interest as well as initial acceptance among the medical community. This is facilitated by the availability of ubiquitous data sources such as wearable sensors, smartphones, and internet of things (IoT) devices, along with machine learning and large-scale data analytics tools, resulting in promising outcomes in some niche medical domains. 
Our research particularly focuses on introducing the three Ps: precision, personalization, and preventive diagnosis in remote healthcare monitoring of patients, especially in a global health scenario. In our system, patients in remote areas use wearable devices to capture vital parameters such as blood pressure (BP), blood glucose, oxygen saturation (SpO2), and electrocardiograms (ECG), and transmit them to doctors in tertiary care hospitals, who in turn are expected to suggest suitable and timely interventions. While deploying our system in the highly populous region of southern India, we found that although this promises to provide hitherto unavailable healthcare services to a critically ill and aging population, particularly in the developing world, there are significant roadblocks to our expectation that doctors would embrace this new paradigm in handling patients. The doctors, who are already overloaded, feel even more overwhelmed by the voluminous data flooding in from remote patients\u2019 sensors. Furthermore, interpreting such incoming multi-parameter data simultaneously from a multitude of remote patients is time-consuming and soon transforms into an unmanageable deluge.\n\nApproach \nIn this paper, we propose novel approaches to transform data into diagnosis. As a collaborative work between our researchers and clinicians in one of the largest super-specialty hospitals in India (Amrita Institute of Medical Sciences - AIMS), we developed physician assist filters (PAFs) that are designed to transform unwieldy time series sensor data into summarized patient\/disease-specific trends in steps of progressive precision as demanded by the doctor for the patient\u2019s personalized condition at hand, and to help in identifying and subsequently predictively alerting the onset of critical conditions. 
Together with the communication network and data transmission architecture, this new framework that we have designed, developed, and successfully deployed is called RASPRO (Rapid Active Summarization for effective PROgnosis) and was first introduced at 2016 IEEE Wireless Health.[1]\n\nRelated work \nWe begin by analyzing existing systems that simply generate alerts every time one or more sensors cross abnormality thresholds. Due to the sheer volume of such alerts, they are difficult to manage even in hospital in-patient settings, let alone for a much larger number of remotely monitored patients. From some of the initial attempts reported by Anliker et al.[2] to more recent works from various researchers[3][4][5][6], severity detection and alert generation are typically based either on predefined thresholds or on thresholds trained using machine learning, followed by online classification of multi-sensor data. Very similar machine learning techniques have also been used in fall detection.[7][8] Hristoskova et al.[9] propose another system wherein patient conditions are mapped to medical conditions using ontology-driven methods, and alerts are generated based on corresponding risk stratification.\nEven though there has been noticeable success in the detection and diagnosis of specific disease conditions, most of these works have not explored the opportunity for personalized and precision diagnosis. In an extensive review of big data for health, Andreu-Perez et al.[10] specifically emphasize the opportunity for stratified patient management and personalized health diagnostics, citing examples of customized blood pressure management.[11] More specifically, Bates et al.[12] discuss the utility of using analytics to predict adverse events, which could reduce the associated morbidity and mortality rates. 
The authors further argue that patient data analytics based on early information supplied to the hospital prior to admission can result in better management of staffing and other hospital resources.[12] One of the recent works in personalized criticality detection is reported by Sung et al.[13], who propose an analytical unit in which the Improved Particle Swarm Optimization (IPSO) algorithm is used to arrive at patient-specific threat ranges. \nTo improve precision in diagnosis we also need to arrive at a balance between a completely automated system on one hand, and physician assist systems on the other. Celler et al.[14] propose a balanced approach wherein sophisticated analytics are presented to physicians, who in turn identify the changes and decide on the diagnosis. This is also supported by many results, including those reported by Skubic et al.[6], wherein domain knowledge-based methods performed as well as other trained machine learning models. These arguments and results provide further impetus for personalized, precision, and preventive diagnostic techniques that are amenable to physician interventions.\n\nMethods \nThe first significant improvement that we applied is the quantization of every remotely sensed parameter based on its own customized severity boundaries. Sequential time windows of such quantized values are examined for dominant appearances of normal results or abnormalities, as the case may be, and motifs corresponding to them are extracted. Using factors set by doctors, the system then transforms these motifs by generating interventional time alerts as per clinically prescribed protocols. Both the alerts and motifs are amenable to rapid transmission to doctors, even as SMS messages on bare-minimum, bandwidth-starved wide area wireless networks. 
This results in the generation of more clinically relevant critical information, along with a drastic reduction in the reporting of minor aberrations that may not be indicative of any serious condition. The system does not stop here. The attending doctors, when they view the alerts and\/or motifs, can request detailed data on demand (dubbed \"DD-on-D\"), upon which the next level of detail in the data is transmitted. This level of detail could be a straightforward frequency map of normal and abnormal values, or much more intelligent machine learning classifications in the case of proven disease conditions. The heart of our system is a framework called RASPRO (see Fig. 1), consisting of physician assist filters (PAFs) that, in going from data to diagnosis, implement the three Ps: precision, personalization, and prevention. In the following sections we describe each of these three concepts in detail.\n\nFig. 1 RASPRO-PAF framework. The architecture shows the RASPRO-PAF framework, which progressively converts the raw multi-sensor data into quantized symbols, helpful motifs, diagnostic predictions, and critical alerts.\n\nPersonalization PAF \nDue to the distributed data gathering and processing architecture, there is an opportunity to enhance personalization in diagnosis and treatment. The first component in the RASPRO framework, the Personalization PAF, takes the form of a patient- and disease-condition-specific severity quantizer that converts raw sensor values to a series of clinically relevant severity symbols.\n\nAdaptive quantization \nIn general, let us consider N body sensors, S1,S2,\u2026,SN with varying sensing frequencies f1,f2,\u2026,fN. The raw time series values from these sensors are converted to discrete severity level symbols by the quantizer. The number of severity levels Li for a sensor Si can be set based on the sensor and many other factors. 
We assume that different vital parameter sensors have different numbers of severity levels; hence L1, the number of severity levels for, say, a blood pressure sensor, could be five, whereas L2 (say, oxygen saturation) could be seven. In our symbolic notation, clinically accepted normal values are assigned the symbol \"A,\" above-normal values are assigned progressive degrees of severity (\"A+,\" \"A++,\" etc.), and sub-normal values are assigned \"A-,\" \"A--,\" etc.; the number of \u201c+\u201d and \u201c-\u201d symbols represents the degree of above-normal and sub-normal severity, respectively. Figure 2 depicts how various severity levels are arrived at in the Personalization PAF severity quantizer.\n\nFig. 2 Personalized Quantization. Quantization of sensor data is based on multiple severity categorization criteria, resulting in the generation of patient- and disease-specific quantized values.\n\nThe quantized severity symbols are arranged into a patient-specific matrix (PSM) of N rows and W columns, where N is the total number of sensors being observed, and W is the time window in which the data is summarized. 
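As a concrete illustration, such a per-sensor severity quantizer and one PSM row can be sketched as follows. This is a minimal sketch only: the boundary values, the `normal_band` parameter, and the function names are illustrative assumptions, not taken from the RASPRO implementation.

```python
# Minimal sketch of a Personalization PAF-style severity quantizer.
# Boundary values and the normal-band index are illustrative assumptions;
# in RASPRO they are set per patient and per sensor (L_i levels per sensor S_i).

def make_quantizer(boundaries, normal_band):
    """boundaries: ascending cut-points defining len(boundaries)+1 severity bands.
    normal_band: index of the band mapped to the normal symbol "A"."""
    def quantize(value):
        band = sum(value >= b for b in boundaries)  # band the raw value falls into
        delta = band - normal_band
        if delta == 0:
            return "A"
        # "+" per level above normal, "-" per level below normal
        return "A" + ("+" if delta > 0 else "-") * abs(delta)
    return quantize

# e.g., a five-level quantizer for systolic blood pressure (mmHg)
bp = make_quantizer([90, 120, 140, 160], normal_band=1)

# one row of a patient-specific matrix (PSM): W quantized samples for one sensor
window = [118, 124, 151, 86, 165]
psm_row = [bp(v) for v in window]
print(psm_row)  # ['A', 'A+', 'A++', 'A-', 'A+++']
```

Stacking one such quantized row per sensor (N rows, W columns) yields the PSM described above.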
The value of W can be set by a physician or automatically derived based on the risk perception for that particular patient.\n\nPersonalization \nThe quantization boundaries are decided by doctors based on the patient profile (or history); the doctor\u2019s diagnostic interest (for instance, a cardiologist may assign severity ranges differently from a nephrologist); severity ranges suggested by analytics on a local hospital information system (HIS); and population analytics across multiple HIS spanning multiple hospitals, or even from publicly available databases such as PhysioNet.[15] Together, this approach gives ample flexibility in achieving customization in inter-patient, inter-disease, intra-patient, and inter-specialty diagnosis from multi-sensor data.\n\nPrecision PAF \nWhereas in most other applications precision directly translates into great detail in data, in remote health monitoring precision cannot come at the cost of voluminous data presentation to the doctor. Compactness has to be retained. We have developed a step-wise refinement process for precision, which is delivered on demand to the attending doctor. Step 1 is \u201cconsensus motifs (CM)\u201d; step 2 is a collection of statistical parameters, including severity frequency maps (SFMs); and step 3 is machine learning (ML). In the first step, motifs corresponding to commonly seen normal results and abnormalities in the severity symbol series are extracted. The outcome is two severity summaries: (1) the most frequent trend in sensor data, which we call the consensus normal motif (CNM), and (2) the most frequently occurring abnormality, which we term the consensus abnormality motif (CAM). 
The construction of this involves the following building blocks:\n\r\n\n\n Candidate symbol: \u03b1[p] is the p-th quantized severity symbol in a row of the PSM, \u03b1[1],\u03b1[2],\u2026,\u03b1[p],\u2026,\u03b1[W].\n\r\n\n\n Normal symbol: \u03b1NORM is a candidate symbol that represents the normal level, and its value is equal to \u201cA\u201d for every sensor.\nNow, let the set Cn denote all the candidate symbols in a W-long observation window, corresponding to n-th sensor in the PSM. However, we have dropped the subscript n for better clarity of discussion.\n\n \n \n \n C\n =\n {\n α\n [\n 1\n ]\n ,\n α\n [\n 2\n ]\n ,\n …\n ,\n α\n [\n p\n ]\n ,\n …\n ,\n α\n [\n W\n ]\n }\n \n \n {\\displaystyle C=\\{\\alpha \\lbrack 1\\rbrack ,\\alpha \\lbrack 2\\rbrack ,\\ldots ,\\alpha \\lbrack p\\rbrack ,\\ldots ,\\alpha \\lbrack W\\rbrack \\}}\n \n \nLet \u03c3[p] denote the sum of hamming distances of \u03b1[p] from all other candidate symbols in C such that:\n\n \n \n \n σ\n [\n p\n ]\n =\n \n Σ\n \n i\n =\n 1\n \n \n W\n \n \n D\n (\n α\n [\n p\n ]\n ,\n α\n [\n i\n ]\n )\n \n \n {\\displaystyle \\sigma \\lbrack p\\rbrack =\\Sigma _{i=1}^{W}D(\\alpha \\lbrack p\\rbrack ,\\alpha \\lbrack i\\rbrack )}\n \n \nwhere, D(\u03b1[p],\u03b1[i]) is the hamming distance of \u03b1[p] from \u03b1[i]. Here, we assume that the hamming distance between neighboring severity levels (say, A and A+) is 1. 
We define the set H of all σ's such that:

H = {σ[1], σ[2], …, σ[p], …, σ[W]}.

Consensus normal symbol: αCNS[C] is defined as the candidate symbol among all the symbols in C that satisfies the following two conditions: (1) its Hamming distance from the normal symbol, denoted D(αCNS[C], αNORM), is less than a sensor-specific near-normal severity threshold S[n]THRESH, and (2) its sum of Hamming distances from all other candidate symbols in C is the minimum. This is formulated as:

αCNS[C] = {α[p] : D(α[p], αNORM) < S[n]THRESH and σ[p] is the lowest such candidate in H}.

Consensus abnormality symbol: αCAS[C] is defined as a candidate symbol in C that satisfies the following two conditions: (1) its Hamming distance from the normal symbol, D(αCAS[C], αNORM), is greater than or equal to the sensor-specific near-normal severity threshold S[n]THRESH, and (2) its sum of Hamming distances from all other candidate symbols in C is the minimum.
This is formulated as:

αCAS[C] = {α[p] : D(α[p], αNORM) ≥ S[n]THRESH and σ[p] is the lowest such candidate in H}.

Consensus normal motif: μCNM[P] is an ordered sequence of consensus normal symbols belonging to the N rows in the PSM of a patient P, represented as <αCNS[C1], αCNS[C2], …, αCNS[CN]>. The n-th consensus normal symbol αCNS[Cn] in μCNM[P] can be indexed as μCNM[P][n].

Consensus abnormality motif: μCAM[P] is an ordered sequence of consensus abnormality symbols belonging to the N rows in the PSM of patient P, represented as <αCAS[C1], αCAS[C2], …, αCAS[CN]>. The n-th consensus abnormality symbol αCAS[Cn] in μCAM[P] can be indexed as μCAM[P][n].

To reiterate the above formulation: each row of a PSM is treated as an observation window set C (corresponding to a summarization time window W), from which the consensus symbols αCNS[C] and αCAS[C] are found. The sequences of these symbols over the N rows of the PSM form the column-vector motifs μCNM[P] and μCAM[P] (refer to Fig. 3).

Fig. 3 RASPRO severity detection, summarization, and AMI calculated using CAMs and a sensor-specific severity weight matrix.
It also shows an AMI-based patient prioritization table that can help physicians attend to the neediest patients.

In subsequent steps of the Precision PAF, the system generates a frequency map showing how frequently different multi-sensor parameters have crossed the personalized severity thresholds. Finally, the motif time series is used as input to proven deep learning (DL) and machine learning (ML) techniques, such as long short-term memory (LSTM) recurrent neural networks (RNNs)[16] or support vector machines (SVMs)[17], that can assist the doctors in diagnosis. In the next section, we use the above consensus motifs for alert generation to aid in criticality prevention.

Prevention PAF
The Prevention PAF is implemented as an alert generation technique that uses simple or complex mathematical models to calculate the amount of time available to physicians for effective intervention; it is amenable to changes based on patient, disease, and physician diagnostic interest. Its output is an alert measure index (AMI), which is used to prioritize patients by their urgency for the physician's interventional attention.

Each severity symbol in a motif also communicates how much time the doctor has to decide on an intervention (if any is needed). Hence, for each sensor S1, S2, …, SN and its corresponding severity symbol α in μCNM[P] and μCAM[P] (where α could be A, A+, A-, etc.), we associate a corresponding medically accepted intervention time δ[Sn][α].
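The consensus symbol selection defined above (candidate symbols, Hamming distances, and the CNS/CAS conditions) can be sketched as follows. The integer grade encoding and helper names are assumptions for illustration; the threshold argument stands in for the sensor-specific S[n]THRESH:

```python
def grade(symbol):
    """Numeric level of a severity symbol: "A" -> 0, "A+" -> 1, "A--" -> -2, etc."""
    return symbol.count("+") - symbol.count("-")

def distance(a, b):
    """Hamming-style distance: neighboring severity levels differ by 1."""
    return abs(grade(a) - grade(b))

def consensus_symbols(window, normal="A", thresh=1):
    """Return (CNS, CAS) for one observation window (one PSM row).

    CNS: a near-normal candidate (distance from the normal symbol
    below `thresh`) with the smallest summed distance to all other
    candidates; CAS: an abnormal candidate (distance >= `thresh`)
    chosen the same way. Either may be None if no candidate qualifies.
    """
    sigma = {p: sum(distance(a, b) for b in window)
             for p, a in enumerate(window)}
    normals = [p for p, a in enumerate(window) if distance(a, normal) < thresh]
    abnormals = [p for p, a in enumerate(window) if distance(a, normal) >= thresh]
    cns = window[min(normals, key=sigma.get)] if normals else None
    cas = window[min(abnormals, key=sigma.get)] if abnormals else None
    return cns, cas
```

Applying this to every row of a PSM and collecting the results column-wise would yield the μCNM[P] and μCAM[P] motifs described above.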
Across the different sensors Sn for a patient P, let θ[Sn][α] be a sensor- and severity-symbol-indexed matrix of weights derived from the intervention time using the following relationship:

θ[Sn][α] = KP / δ[Sn][α]

In the above equation, the constant KP can be set by the physician considering the context of a patient's health condition (including historical medical records and the specific sensitivities and vulnerabilities documented therein) or derived through machine learning techniques. The equation may be replaced by more complex ones for progressively complicated disease conditions.

At the end of each observation time window W, for every patient P, we also define an aggregate criticality alert score, the alert measure index (AMI), calculated as:

AMI = Σ_{n=1}^{N} θ[Sn][μCAM[P][n]] * num(μCAM[P][n])

wherein each severity-quantized symbol in the μCAM[P] of the n-th sensor is converted into a numerical value (e.g., A± is assigned 1; A++ or A−− is assigned 2) by num(μCAM[P][n]),
and scaled up by the sensor-severity-specific weight θ[Sn][α] (as defined just prior). The resulting AMI indicates the immediacy of the patient's priority for the physician's consultative attention. The process of motif detection, AMI calculation, and patient prioritization is summarized in Fig. 3. The data used to arrive at the AMI scores could also be other statistical parameters (such as frequency maps) or machine learning prediction scores, and the technique for calculating the score may likewise be based on predefined simple mathematical models or complex machine learning algorithms.

Clinical relevance and validation
In October 2016, the RASPRO framework was introduced to doctors in multiple specialties in our super-specialty hospital, where they validated its clinical deployment applications. We present some of the specific clinical scenarios that emerged from this pilot study.

Cardiology
The electrocardiogram is a potential indicator of cardiac events and can be exploited for personalized and precision diagnosis by varying the parametric thresholds and summarization window based on the patient profile, disease condition, and associated factors. For instance, taking the disease condition into account, a 3 mm depression in the ST segment would be graded A++ for an active patient with exertion-related chest pain, indicating cardiac ischemia, whereas the same depression occurring in a patient at rest would be graded A+++ with a limited intervention time (30 minutes), indicating cardiac muscle death. To extend the spectrum of diseases that ST-segment depression covers, a chronic hypertensive patient with left ventricular hypertrophy (and no chest pain) would also presumably show a continuous 3 mm ST-segment dip that requires no interventional attention, and hence would be graded A/A+ (near normal) by the severity quantizer.
Next, taking the patient profile into account: in sedentary workers aged above 45 with a smoking habit, high cholesterol levels, and other associated risks, the thresholds would be low (A+, A++, and A+++ assigned to 1–2 mm, 2–3 mm, and above 3 mm of ST depression, respectively), while in highly active patients younger than 45 with no previous associated history, the levels would be high (A+, A++, and A+++ correspondingly assigned to 2–3 mm, 3–3.5 mm, and above 3.5 mm, respectively). Also, in the former case the summarization window W (capturing how long the ST depression sustains) would be 3–4 minutes (more critical), whereas in the latter it would be 7–9 minutes.

Pulmonology
Simple but vital parameters such as oxygen saturation (SpO2), blood pressure (BP), heart rate variability (HRV), and respiratory rate variability (RRV), present in unique combinations, would facilitate differentiating between benign diseases such as interstitial lung disease or sleep apnea, for which the alert thresholds (set through the interventional time constant KP) would be fairly high, and emergencies such as pulmonary edema or pulmonary embolism (a blood clot in an artery in the lung), for which the thresholds would be kept low if any predisposing factors such as left heart failure, pulmonary hypertension, prolonged immobilization, or pregnancy are present. Hence, the physician would preset these combinations of vitals to be looked for as sequences of symbols in the CAM. Since only a few parameters can be picked up to indicate disease, it is pertinent that step-wise precision techniques such as machine learning algorithms be used to distinguish between closely mimicking conditions. Obstructive sleep apnea and chronic obstructive pulmonary disease provide solid examples; both would show similar trends in SpO2, BP, HRV, and RRV.
In a trial conducted at our hospital, we achieved 99% precision in diagnosing sleep apnea from HRV using a deep learning algorithm, long short-term memory recurrent neural networks (LSTM-RNN), as reported in one of our previous works.[16] The algorithm was evaluated using multi-sensor patient data from the PhysioNet Challenge 2000[18], which contained annotated data from 35 patients who underwent an overnight sleep study.

Neurology
One of the early markers of autonomic neuropathy in epileptic patients is a discrepancy between the patient's BP and pulse rate. In this scenario, the severity levels of BP and pulse rate would be set accordingly (as a combination) to alert the practitioner. Suppose S1 and S2 are the BP and heart rate sensors, respectively. Say that for patient P1, μCAM[P1] = <A−−, A>, and for patient P2, μCAM[P2] = <A−, A+>. The diagnosis, alert level, and treatment differ between the two cases because P1 has a BP decline with no change in heart rate (critical), while P2 has a compensatory increase in heart rate, which indicates good autonomic function.

Though these are representative clinical scenarios, we found wide agreement among doctors from other specialties as well that the personalization, step-wise precision, and prevention introduced through the RASPRO framework are of high utility in remote monitoring and critical alert generation.

Results and discussion
In order to quantitatively evaluate the effectiveness of RASPRO, we measure both the diagnostic ability and the preventive predictive power of the technique.
We formulate three hypotheses and evaluate how well RASPRO satisfies them:

Precision hypothesis: RASPRO consensus motif time series can replace raw sensor data time series for the task of identifying/classifying specific disease conditions.
Prevention hypothesis: RASPRO-based consensus motifs can predict a future disease condition with as much accuracy as raw sensor data time series.
Personalization hypothesis: There exists inter-patient variability in severity levels and summarization frequencies which, if optimized individually, can yield better accuracy in predicting/classifying a specific disease condition.

By assessing the validity of the first hypothesis, we aim to evaluate the extent to which RASPRO motifs can provide precision in diagnostics. The second hypothesis evaluates the utility of RASPRO as a tool for predictive analytics in critical conditions, while the third helps us understand whether there is a case for personalization in disease discovery and prediction.

Dataset
The first step in evaluating these hypotheses is to identify datasets that are extensive, long-term, and critically significant. We used a large time series dataset from the MIMIC II database[19], which contains multiple body sensor values from over 20,000 ICU patients. The dataset includes ECG, arterial blood pressure (ABP), heart rate (HR), non-obtrusive BP (NBP), SpO2, mean arterial BP (MAP), and other vital signs. From this, we selected a curated set of patient and control group data containing a long time series followed by a critical event. We selected patients with acute hypotensive episodes (AHE), a potentially fatal condition that is quite common in ICUs and can also be caused by postural hypotension. An AHE event is analytically identified when MAP measurements remain below 60 mmHg for more than 30 minutes; such an event requires immediate intervention.
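The AHE criterion just stated reduces to a run-length check on the MAP series. A minimal sketch, assuming samples arrive once per minute (the function name and argument names are ours):

```python
def has_ahe(map_series, threshold=60.0, min_minutes=30):
    """Detect an acute hypotensive episode in a MAP series sampled once
    per minute: True if MAP stays below `threshold` mmHg for more than
    `min_minutes` consecutive samples, following the analytical AHE
    definition used in the text."""
    run = 0  # length of the current consecutive sub-threshold stretch
    for value in map_series:
        run = run + 1 if value < threshold else 0
        if run > min_minutes:
            return True
    return False
```

For example, a series with 31 consecutive minutes below 60 mmHg triggers a detection, while exactly 30 minutes does not, matching the "more than 30 minutes" wording.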
We also ensured that the dataset provides an uninterrupted MAP signal, with a minimum sampling rate of one per minute, over at least three hours for both the event patients and the control group. We selected a group of 35 patients (group H) who had an AHE at some time during their ICU stay, and another 35 patients (group G) who did not. This dataset was drawn from the PhysioNet[15] Challenge 2009.[20] The H dataset also had a time marker t0, after which AHE occurred in that patient within a one-hour window. Since the data was obtained from publicly available sources, this work did not require prior IRB approval.

Evaluating the precision hypothesis
The first task is to measure the replaceability of the original time series data with the quantized symbols and consensus motifs. To evaluate this, the H and G group time series of mean arterial pressure (MAP), of length 60 minutes after t0, are modeled as feature vectors of length 60. These vectors are called the original time series (OTS) and are used to train an SVM model for classifying the data as AHE or not; vectors belonging to AHE patients were labeled H, and G otherwise. After using the OTS, we generate quantized time series (QTS) vectors with different quantization breadths. The quantization breadth (denoted B) is varied as 5, 10, 15, and 20. For instance, when B=10, all OTS MAP values between 60 mmHg and 50 mmHg are quantized into the same severity symbol, say "A-," whereas for B=5 the symbol "A-" quantizes all OTS MAP values between 60 mmHg and 55 mmHg. These vectors are used in a similar manner to first train and then test the SVM model. Finally, we generate the corresponding motif time series (MTS) for each of the QTS, varying the summarization time window W as 5, 10, and 15.
The value of W corresponds to the time window in which all the severity symbols in the QTS are converted to a single consensus symbol. OTS, QTS, and MTS are compared using a standard statistical measure of binary classification, the F-score. The F-score (also called the F1-score) is calculated as:

F1 = 2 * (Precision * Recall) / (Precision + Recall)

Significant results
The F1-scores for the SVM models are summarized in Fig. 4. The OTS-based SVM model gave an F1-score of 0.76, which is the gold standard against which we compare the other models. The QTS- and MTS-based SVM models performed as well as the OTS in most cases. Furthermore, MTS with (B=10, W=15) and (B=20, W=5, 10, 15) performed better than the OTS on the classification problem; in fact, these MTS models showed more than 12% better F1-scores than the OTS. These results support the precision hypothesis that motif time series can replace original time series data for the task of identifying/classifying specific disease conditions, in this case AHE.

Fig. 4 AHE classification F1-scores. The F1-scores of SVM models trained and tested for classifying the given 60 minutes of data as AHE or not, using OTS, QTS (B=5, 10, 15, 20), and MTS (W=5, 10, 15). QTS and MTS with different B and W values classify the AHE signal with F1-scores better than that of the OTS SVM model.

Evaluating the prevention hypothesis
The next evaluation asks whether an a priori motif series can predict a future disease condition and thereby aid preventive intervention.
For this, the H and G group time series of mean arterial pressure (MAP), of length T minutes prior to t0, are modeled as T-long feature vectors (OTS). These vectors are used for training (on 70% of the data, with five-fold cross-validation) and testing (on 30%) an SVM model that predicts AHE or not, where patients in group H are annotated as having AHE and group G patients otherwise. In effect, we try to classify sensor data prior to an AHE event as a predictor of an ensuing AHE condition. Since the G group data did not have a time marker t0, we selected a random but continuous time series of length T from each G group patient. SVM was selected for its widely accepted performance in classification problems involving multiple features, although comparable results might be obtained with other classification techniques.

The backward offset time T (from t0) is varied as 30, 60, 90, 120, 150, and 180 minutes as an expanding window. In the next step, the raw feature vectors are quantized using the severity quantizer to form a quantized time series (QTS); once again, the quantization breadth B is varied as 5, 10, 15, and 20. In the third step, the QTS are summarized and motifs extracted to form a RASPRO motif time series (MTS), with observation time window sizes W of 5, 10, and 15 minutes. The QTS and MTS are then given as input to train and test SVM models (one for QTS and another for MTS) for predicting AHE before its onset.

Significant results
From the comparative analysis of OTS and QTS (Fig. 5), we observe that QTS with B=15 has a better F1-score than the OTS at all time offsets T, although the root mean square error (RMSE) between the two series is an insignificant 0.001, indicating that the OTS could be replaced with the QTS. We select this QTS (B=15) and compare it with MTS of varying time windows in Fig. 6. We observe from Fig.
6 that the QTS has a higher F1-score than the best MTS (W=10). However, the RMSE between the QTS and MTS (W=10 and W=15) is a statistically insignificant 0.01, which implies that MTS with W=10 and 15 performs as well as QTS on average across the different time windows. We further compare the OTS against the best-performing B and W values of the QTS and MTS, respectively; the results are plotted in Fig. 7, with these data points marked QTSmax and MTSmax. In Fig. 7, QTSmax and MTSmax show closely similar F1-scores, with an RMSE of 0.018, which can be considered statistically insignificant.

Fig. 5 Expanding window: OTS vs. QTS. Comparison of the F1-scores of OTS and QTS for classification of AHE using expanding time windows shows better performance of QTS with B=15.

Fig. 6 Expanding window: QTS vs. MTS. Comparison of the F1-scores of QTS (B=15) and MTS (varying W) for classification of AHE using expanding time windows.

Fig. 7 Expanding window: QTSmax vs. MTSmax. Comparison of the F1-score of OTS with QTSmax and MTSmax, corresponding to the best-performing B and W values respectively, for classifying AHE using expanding time windows.

Going further, we used data from a moving time window of 30 minutes, instead of an expanding window. This simulates the situation in which we obtain only 30 minutes of data and are required to classify it as a predictor of AHE; here we do not have the luxury of data up to t0, as the 30-minute slice could come from anywhere up to three hours before t0. Fig. 8 shows the comparative analysis of OTS against the best B and W values of QTS and MTS in the moving window experiment. The results show that MTS and QTS perform better than OTS in most of the time intervals, while the RMSE between MTS and QTS is 0.018 on average.
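The F1 and RMSE measures used throughout these comparisons can be computed as in this minimal sketch; the "H"/"G" label convention follows the grouping above, and the helper names are ours:

```python
import math

def f1_score(y_true, y_pred, positive="H"):
    """F1 = 2*Precision*Recall/(Precision+Recall) from binary labels
    (here, AHE group "H" as the positive class vs. control group "G")."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def rmse(xs, ys):
    """Root mean square error between two equal-length score series,
    as used above to compare QTS and MTS F1 curves across windows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))
```

In practice a library routine (e.g., scikit-learn's `f1_score`) would be used, but the arithmetic is exactly the formula given earlier.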
The results comparing QTS with different B values against MTS with different W values are given in Additional files 1–5 (at the end of this paper).

Fig. 8 Moving window: QTSmax vs. MTSmax. Comparison of the F1-score of OTS with QTSmax and MTSmax, corresponding to the best-performing B and W values respectively, for classifying AHE using a moving window of 30 minutes' duration.

From these results, we conclude that quantized symbols, as well as summarized motifs, are as good as (or, in many cases, better than) raw time series in identifying predictors of AHE, in both expanding and moving windows, thereby supporting our prevention hypothesis.

Evaluating the personalization hypothesis
The third hypothesis asks whether there are patient-specific custom severity levels and summarization frequencies which, if optimized, could lead to better diagnostic accuracy. For this, we further analyze our earlier results. We observe from Figs. 7 and 8 that by selecting different severity quantization breadths (B) and varying the summarization window size (W), we are able to predict the onset of AHE with a higher F1-score. This supports an argument for using disease- and time-specific B and W values to achieve better accuracy in classification problems. We observe very similar results in Fig. 4, which shows that by choosing optimized W and B values, the machine learning models can perform better in classification problems too. These results support our third hypothesis: there exists an opportunity for personalization, at least at the disease-specific and time-specific level.
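Selecting the best-performing B and W values, as done for QTSmax and MTSmax above, amounts to a small grid search. In this sketch, `evaluate` is a hypothetical callback that would train and score a model (e.g., cross-validate an SVM on the motif time series) for a given (B, W) pair:

```python
def best_b_w(evaluate, breadths=(5, 10, 15, 20), windows=(5, 10, 15)):
    """Grid-search the quantization breadth B and summarization window W
    over the values used in the experiments above. `evaluate(b, w)` is a
    caller-supplied function returning an F1-score for that configuration;
    the pair with the highest score is returned."""
    return max(((b, w) for b in breadths for w in windows),
               key=lambda bw: evaluate(*bw))
```

Per-patient personalization would simply rerun this search with an `evaluate` function scoped to that patient's data.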
Though the above experiments using AHE are only representative of how step-wise precision, personalization, and prevention can be achieved using RASPRO, the practitioners broadly agree that, in wide-ranging scenarios, patient-, sensor-, disease-, and time-specific severity levels need to be defined in a way that is both practical for managing alerts and effective in identifying emergencies.

Global health deployment
The medical benefits of the RASPRO framework contribute directly to the primary goals of remote health monitoring in a global health scenario. We call these benefits the "3As": availability, accessibility, and affordability.

Availability: By enabling doctors to prioritize their time based on the AMI, we effectively increase the availability of doctors for the neediest remote patients.
Accessibility: A patient's summarized health status, represented by the consensus motifs, can be sent over even bare-minimum communication networks (for instance, as SMS). The clinically validated RASPRO motifs then enable the doctors to use them, instead of voluminous raw sensor data, to arrive at a timely diagnosis. In addition, by providing step-wise precision through detailed data-on-demand (DD-on-D), the doctors can choose to get more data if needed. Together, these techniques, as illustrated in Fig. 9, increase patients' access to quality and critical remote healthcare services.

Fig. 9 Detailed data-on-demand.
The DD-on-D technique as implemented in the RASPRO-PAF framework enables a patient's multi-sensor data to be sent, even over SMS, to remote doctors, who can then initiate emergency intervention through telemedicine units stationed near the patient's location.

Affordability: Remote health monitoring combined with timely criticality detection can substantially reduce healthcare costs by cutting the number of unnecessary hospital visits and smartly managing the available time of doctors, who can focus on the neediest patients. For instance, in a developing country a patient may spend $4–5 travelling to the nearest hospital. Combined with the loss of daily wages from taking a break from work, the cost to the patient of a hospital visit can be around $10–20 per day, not including consultation charges (which range between $5–10 per visit). An initial survey of the patients visiting the cardiology department in our hospital showed that a majority of patients do not, at the end of examination, have a cardiac disease. These patients could well have been diagnosed as such through remote monitoring of their vital parameters, avoiding unnecessary hospital visits. Likewise, for a majority of revisiting patients, the visits could have been avoided using remote monitoring.

These advantages would help bring quality healthcare to millions of people who are currently under-served in the global health scenario. We are readying a large-scale deployment of the RASPRO framework, including the "3P" RASPRO-PAF analytical tools, using a network of more than 45 telemedicine nodes (as shown in Fig. 10) and remote health centers across the Indian subcontinent, all connected to the AIMS hospital.

Fig. 10 Global health deployment.
The RASPRO-PAF system is being readied for deployment using the telemedicine network of the AIMS hospital, which has more than 45 remote nodes spread across India and Africa, connected through a satellite network.

Practitioner education is one of the key challenges in the global deployment of any new data analytics technique. To ensure usability of the system, we have involved doctors from the design and conceptual phase of the RASPRO-PAF system. In order to provide hands-on training, acceptability, and experience in the use of the data analytics techniques, we also aim to introduce them to all practitioners as part of the annual continuing medical education (CME) program.

Challenges and drawbacks
One of the major drawbacks of any severity detection and summarization technique is the risk of missing important data. In the RASPRO technique, we mitigate some of this risk by providing a graded information flow from the multiple sensors to the doctors. The alerts are calculated based on patient- and disease-specific quantization and threshold levels, so the chance of generating unnecessary alerts is low. On the other hand, upon receiving these alerts the doctors can request detailed data on demand (DD-on-D), through which they can see actual sensor values, the calculated motifs, the frequency maps, and any other machine-learning-based assistive diagnosis. This gives doctors and emergency responders the flexibility to obtain a complete view of the patient's condition before deciding upon any intervention. However, any such system is also fraught with the danger of system failures that could jeopardize the patient's life, though this can be overcome to a large extent by developing robust hardware and fail-safe firmware. We are also aware that a thorough cost-risk-benefit analysis needs to be carried out before any wide-scale deployment.
Apart from these, in developing countries there are implementation gaps that need to be addressed, including: (a) intermittent and unreliable mobile connectivity in rural regions; (b) capture and transmission of data while the patient is mobile; (c) power management in edge devices such as mobile phones to ensure timely processing and transmission of data; (d) whether to perform the RASPRO-PAF processing at the edge or in the cloud; and (e) efficient management of remote patient monitoring through educating the support staff in hospitals.

Conclusion
In this paper, we have reported on the design, development, and deployment of a set of "3P" tools for healthcare data analytics, called RASPRO-PAFs, that transform voluminous physiological sensor data into meaningful motifs using personalized disease severity levels. These motifs have been found to be as effective as, or in many cases better than, the raw sensor data in the identification and prediction of critical conditions in patients. Through a step-wise precision process, doctors can gain further insight into the medical condition of the patient, progressively using quantized symbols, motifs, frequency maps, and machine learning. Furthermore, the criticality of a patient is analyzed from these motifs using a novel interventional time relationship that helps doctors prioritize their time more efficiently. Together, the 3P PAFs help in the personalized, precise, and preventive diagnosis of patients. We have also clinically validated the efficacy of the system using both doctor feedback from the hospital and machine learning techniques.
Given the initial acceptance of this tool among the medical community, we are preparing for testing and evaluation in other medical domains, as well as large-scale field deployment in a global health scenario.

Abbreviations
3P: Precision, personalization, and prevention
ABP: Arterial blood pressure
AHE: Acute hypotensive episode
AIMS: Amrita Institute of Medical Sciences
AMI: Alert measure index
BP: Blood pressure
CM: Consensus motif
CAM: Consensus abnormality motif
CNM: Consensus normal motif
DD-on-D: Detailed data on demand
DL: Deep learning
ECG: Electrocardiogram
HIS: Hospital information system
HR: Heart rate
HRV: Heart rate variability
ICU: Intensive care unit
IoT: Internet of things
IPSO: Improved particle swarm optimization
LSTM: Long short-term memory
MAP: Mean arterial pressure
MIMIC: Multiparameter Intelligent Monitoring in Intensive Care
ML: Machine learning
MTS: Motif time series
NBP: Non-obtrusive blood pressure
OTS: Original time series
PAF: Physician assist filter
PSM: Patient specific matrix
QTS: Quantized time series
RASPRO: Rapid active summarization for effective PROgnosis
RMSE: Root mean squared error
RRV: Respiratory rate variability
SFM: Severity frequency map
SpO2: Peripheral capillary oxygen saturation
SVM: Support vector machine

Additional files
Additional file 1: Moving window OTS vs. QTS. The figure shows the F1-score while using a moving window of size 30 minutes with varying backward offset from t0. The results show that QTS is always better than OTS in classifying a given window as a predictor of AHE or not. (PNG 28 kb)
Additional file 2: Moving window QTS (B=5) vs. MTS. The figure shows the F1-score comparison of QTS with B=5 and MTS while using a moving window of size 30 minutes with varying backward offset from t0. The results show that MTS is better than QTS except in two time slots. (PNG 21 kb)
Additional file 3: Moving window QTS (B=10) vs. MTS.
The figure shows the F1-score comparison of QTS with B=10 and MTS while using a moving window of size 30 mins with varying backward offset from t0. The results show that MTS is better than QTS except in two time slots, and also W=10 and W=15 are better summarization windows. (PNG 22 kb)\nAdditional file 4: Moving Window QTS (B=15) Vs. MTS. The figure shows the F1-score comparison of QTS with B=15 and MTS while using a moving window of size 30 mins with varying backward offset from t0. The results show that MTS is better than QTS except in two time slots, and also W=10 and W=15 are better summarization windows. (PNG 21 kb)\nAdditional file 5: Moving Window QTS (B=20) Vs. MTS. The figure shows the F1-score comparison of QTS with B=20 and MTS while using a moving window of size 30 mins with varying backward offset from t0. The results show that QTS is marginally better than MTS in four time slots. (PNG 22 kb)\n\nDeclarations \nAcknowledgements \nWe are deeply grateful for the support and inspiration provided by the Chancellor of Amrita University, Mata Amritanandamayi Devi (known as \u201cAmma\u201d). This project has materialized due to her constant guidance. We also thank Dr. P Venkat Rangan, who contributed to the idea of the 3P platform and gave important input on the manuscript. We thank the doctors at AIMS hospital, who have helped us in the initial clinical use case analysis of the system.\n\nFunding \nThis work was not supported by any funding organization; hence, funding details are not applicable.\n\nAvailability of data and materials \nThe datasets analyzed during the current study are available in the MIMIC II Waveform repository: https:\/\/physionet.org\/physiobank\/database\/mimic2db\/.\n\nAuthors\u2019 contributions \nRKP and ESR designed the RASPRO-PAF 3P architecture. RKP analyzed and interpreted the results, and was also a major contributor in writing the manuscript. PD conducted the experiments and analyzed the results. 
ESR interpreted and applied the algorithms on clinical cases, and was also a major contributor in writing the manuscript. All authors read and approved the final manuscript.\n\nEthics approval and consent to participate \nThe data used in this study was obtained from a publicly available anonymized dataset, the MIMIC II Waveform repository, and hence did not require any independent\/separate ethics approval or consent to participate from any of the patients.\n\nCompeting interests \nThe authors declare that they have no competing interests.\n\nReferences \n\n\n\u2191 Pathinarupothi, R.K.; Rangan, E.S.; Alangot, B. et al. (2016). \"RASPRO: Rapid summarization for effective prognosis in wireless remote health monitoring\". 2016 IEEE Wireless Health: 1\u20136. doi:10.1109\/WH.2016.7764566.   \n\n\u2191 Anliker, U.; Ward, J.A.; Lukowicz, P. (2004). \"AMON: A wearable multiparameter medical monitoring and alert system\". IEEE Transactions on Information Technology in Biomedicine 8 (4): 415\u201327. PMID 15615032.   \n\n\u2191 Baig, M.M.; GholamHosseini, H.; Connolly, M.J. et al. (2014). \"Real-time vital signs monitoring and interpretation system for early detection of multiple physical signs in older adults\". Proceedings from the IEEE-EMBS International Conference on Biomedical and Health Informatics: 355\u20138. doi:10.1109\/BHI.2014.6864376.   \n\n\u2191 Rajevenceltha, J.; Kumar, C.S.; Kumar, A.A. (2016). \"Improving the performance of multi-parameter patient monitors using feature mapping and decision fusion\". Proceedings from the 2016 IEEE Region 10 Conference: 1515\u20138. doi:10.1109\/TENCON.2016.7848268.   \n\n\u2191 Sreejith, S.; Rahul, S.; Jisha, R.C. (2016). \"A Real Time Patient Monitoring System for Heart Disease Prediction Using Random Forest Algorithm\". Advances in Signal Processing and Intelligent Recognition Systems 425: 485\u2013500. doi:10.1007\/978-3-319-28658-7_41.   \n\n\u2191 6.0 6.1 Skubic, M.; Guevara, R.D.; Rantz, M. (2015). 
\"Automated Health Alerts Using In-Home Sensor Data for Embedded Health Assessment\". IEEE Journal of Translational Engineering in Health and Medicine 3: 1\u201311. doi:10.1109\/JTEHM.2015.2421499.   \n\n\u2191 Lopes. I.C.; Vaidya, B.; Rodrigues, J.J.P.C. (2013). \"Towards an autonomous fall detection and alerting system on a mobile and pervasive environment\". Telecommunications Systems 52 (4): 2299\u2013310. doi:10.1007\/s11235-011-9534-0.   \n\n\u2191 Balasubramanian, A.; Wang, J.; Prabhakaran, B. (2016). \"Discovering Multidimensional Motifs in Physiological Signals for Personalized Healthcare\". EEE Journal of Selected Topics in Signal Processing 10 (5): 832\u201341. doi:10.1109\/JSTSP.2016.2543679.   \n\n\u2191 Hristoskova, A.; Sakkalis, V.; Zacharioudakis, G. et al. (2014). \"Ontology-driven monitoring of patient's vital signs enabling personalized medical detection and alert\". Sensors 14 (1): 1598-628. doi:10.3390\/s140101598. PMC PMC3926628. PMID 24445411. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3926628 .   \n\n\u2191 Andreu-Perez, J.; Poon, C.C.; Merrifield, R.D. et al. (2015). \"Big data for health\". IEEE Journal of Biomedical and Health Informatics 19 (4): 1193-208. doi:10.1109\/JBHI.2015.2450362. PMID 26173222.   \n\n\u2191 Liu, Q.; Yan, B.P.; Yu, C.M. et al. (2014). \"Attenuation of systolic blood pressure and pulse transit time hysteresis during exercise and recovery in cardiovascular patients\". IEEE Transactions on Bio-medical engineering 61 (2): 346\u201352. doi:10.1109\/TBME.2013.2286998. PMID 24158470.   \n\n\u2191 12.0 12.1 Bates, D.W.; Saria, S.; Ohno-Machado, L. et al. (2014). \"Big data in health care: using analytics to identify and manage high-risk and high-cost patients\". Health Affairs 33 (7): 1123-31. doi:10.1377\/hlthaff.2014.0041. PMID 25006137.   \n\n\u2191 Sung, W.-T.; Chen, J.-H.; Chang, K.-W. (2014). 
\"Mobile Physiological Measurement Platform With Cloud and Analysis Functions Implemented via IPSO\". IEEE Sensors Journal 14 (1): 111\u201323. doi:10.1109\/JSEN.2013.2280398.   \n\n\u2191 Celler, B.G.; Sparks, R.S. (2015). \"Home telemonitoring of vital signs--technical challenges and future directions\". IEEE Journal of Biomedical and Health Informatics 19 (1): 82\u201391. doi:10.1109\/JBHI.2014.2351413. PMID 25163076.   \n\n\u2191 15.0 15.1 Goldberger, A.L.; Amaral, L.A.; Glass, L. (2000). \"PhysioBank, PhysioToolkit, and PhysioNet\". Circulation 101 (23): e215\u2013e220. doi:10.1161\/01.CIR.101.23.e215.   \n\n\u2191 16.0 16.1 Pathinarupothi, R.K.; Vinaykumar, R.; Rangan, E. et al. (2017). \"Instantaneous heart rate as a robust feature for sleep apnea severity detection using deep learning\". Proceedings from the 2017 IEEE EMBS International Conference on Biomedical & Health Informatics: 293\u20136. doi:10.1109\/BHI.2017.7897263.   \n\n\u2191 Arunan, A.; Pathinarupothi, R.K.; Ramesh, M.V. (2016). \"A real-time detection and warning of cardiovascular disease LAHB for a wearable wireless ECG device\". Proceedings from the 2016 IEEE-EMBS International Conference on Biomedical and Health Informatics: 98\u2013101. doi:10.1109\/BHI.2016.7455844.   \n\n\u2191 Penzel, T.; Moody, G.B.; Mark, R.G. et al. (2000). \"The Apnea-ECG Database\". Proceedings from Computers in Cardiology 2000 27: 255\u201358. doi:10.1109\/CIC.2000.898505.   \n\n\u2191 Saeed, M.; Villarroel, M.; Reisner, A.T. et al. (2011). \"Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II): A public-access intensive care unit database\". Critical Care Medicine 39 (5): 952\u201360. doi:10.1097\/CCM.0b013e31820a92c6. PMC PMC3124312. PMID 21283005. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3124312 .   \n\n\u2191 Moody, G.B.; Lehman, L.H. (2009). \"Predicting Acute Hypotensive Episodes: The 10th Annual PhysioNet\/Computers in Cardiology Challenge\". 
Computers in Cardiology 36 (5445351): 541\u2013544. PMC PMC2937253. PMID 20842209. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2937253 .   \n\n\nNotes \nThis presentation is faithful to the original, with only a few minor changes to presentation. Grammar and punctuation were edited to American English, and in some cases additional context was added to the text when necessary. In some cases important information was missing from the references, and that information was added.\n\n\nSource: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:Data_to_diagnosis_in_global_health:_A_3P_approach\">https:\/\/www.limswiki.org\/index.php\/Journal:Data_to_diagnosis_in_global_health:_A_3P_approach<\/a>\n
This page was last modified on 1 April 2019, at 17:51.\nContent is available under a Creative Commons Attribution-ShareAlike 4.0 International License unless otherwise noted.\n","8d21eded7dba3fec86203cded8451b7e_html":"<body class=\"mediawiki ltr sitedir-ltr ns-206 ns-subject page-Journal_Data_to_diagnosis_in_global_health_A_3P_approach skin-monobook action-view\">\n<div id=\"rdp-ebb-globalWrapper\">\n\t\t<div id=\"rdp-ebb-column-content\">\n\t\t\t<div id=\"rdp-ebb-content\" class=\"mw-body\" role=\"main\">\n\t\t\t\t<a id=\"rdp-ebb-top\"><\/a>\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t<h1 id=\"rdp-ebb-firstHeading\" class=\"firstHeading\" lang=\"en\">Journal:Data to diagnosis in global health: A 3P approach<\/h1>\n\t\t\t\t\n\t\t\t\t<div id=\"rdp-ebb-bodyContent\" class=\"mw-body-content\">\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\n\t\t\t\t\t<!-- start content -->\n\t\t\t\t\t<div id=\"rdp-ebb-mw-content-text\" lang=\"en\" dir=\"ltr\" class=\"mw-content-ltr\">\n\n\n\n<h2><span class=\"mw-headline\" id=\"Abstract\">Abstract<\/span><\/h2>\n<p><b>Background<\/b>: With connected medical devices 
fast becoming ubiquitous in healthcare monitoring, there is a deluge of data coming from multiple body-attached sensors. Transforming this flood of data into effective and efficient diagnosis is a major challenge.\n<\/p><p><b>Methods<\/b>: To address this challenge, we present a \"3P\" approach: personalized patient monitoring, precision diagnostics, and preventive criticality alerts. In a collaborative work with doctors, we present the design, development, and testing of a healthcare data analytics and communication framework that we call RASPRO (Rapid Active Summarization for effective PROgnosis). The heart of RASPRO is \"physician assist filters\" (PAFs) that (1) transform unwieldy multi-sensor time series data into summarized patient\/disease-specific trends in steps of progressive precision as demanded by the doctor for a patient\u2019s personalized condition, and (2) help in identifying and subsequently predictively alerting the onset of critical conditions. The output of PAFs is a clinically useful, yet extremely succinct summary of a patient\u2019s medical condition, represented as a motif, which could be sent to remote doctors even over SMS, reducing the need for high data bandwidth. We evaluate the clinical validity of these techniques using support-vector machine (SVM) learning models, measuring both the predictive power and the ability to classify disease conditions. We used more than 16,000 minutes of patient data (N=70) from the openly available MIMIC II database for conducting these experiments. Furthermore, we also report the clinical utility of the system through doctor feedback from a large super-specialty hospital in India.\n<\/p><p><b>Results<\/b>: The results show that the RASPRO motifs perform as well as (and in many cases better than) raw time series data. 
In addition, we also see improvement in diagnostic performance using optimized sensor severity threshold ranges set using the personalization PAF severity quantizer.\n<\/p><p><b>Conclusion<\/b>: The RASPRO-PAF system and the associated techniques are found to be useful in many healthcare applications, especially in remote patient monitoring. The personalization, precision, and prevention PAFs presented in the paper successfully show remarkable performance in satisfying the goals of the 3Ps, thereby providing the advantages of \"3As\": availability, affordability, and accessibility in the global health scenario.\n<\/p><p><b>Keywords<\/b>: precision medicine, medical informatics, personalized healthcare, motif summarization\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Background\">Background<\/span><\/h2>\n<p>Precision medicine and personalized healthcare are quickly gaining wide research interest as well as initial acceptance among the medical community. This is facilitated by the availability of ubiquitous data sources such as wearable sensors, smartphones, and <a href=\"https:\/\/www.limswiki.org\/index.php\/Internet_of_things\" title=\"Internet of things\" class=\"wiki-link\" data-key=\"13e0b826fa1770fe4bea72e3cb942f0f\">internet of things<\/a> (IoT) devices, along with machine learning and large-scale data analytics tools, resulting in promising outcomes in some niche medical domains. Our research particularly focuses on introducing the three Ps: precision, personalization, and preventive diagnosis in remote healthcare monitoring of patients, especially in a global health scenario. 
In our system, patients in remote areas use wearable devices to capture their vital parameters such as blood pressure (BP), blood glucose, oxygen saturation (SpO2), electrocardiograms (ECG), etc., and transmit them to doctors in tertiary care <a href=\"https:\/\/www.limswiki.org\/index.php\/Hospital\" title=\"Hospital\" class=\"wiki-link\" data-key=\"b8f070c66d8123fe91063594befebdff\">hospitals<\/a>, who in turn are expected to suggest suitable and timely interventions. While deploying our system in the highly populous region of southern India, we found that although this promises to provide hitherto unavailable healthcare services to a critically ill and aging population, particularly in the developing world, there are significant roadblocks in our expectation that doctors embrace this new paradigm in handling patients. The doctors, who are already overloaded, feel even more overwhelmed by the voluminous data flooding in from remote patients\u2019 sensors. Furthermore, interpreting such incoming multi-parameter data simultaneously from a multitude of remote patients is time-consuming and soon transforms into an unmanageable deluge.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Approach\">Approach<\/span><\/h3>\n<p>In this paper, we propose novel approaches to transform data into diagnosis. As a collaborative work between our researchers and clinicians in one of the largest super-specialty hospitals in India (Amrita Institute of Medical Sciences - AIMS), we developed physician assist filters (PAFs) that are designed to transform unwieldy time series sensor data into summarized patient\/disease-specific trends in steps of progressive precision as demanded by the doctor for the patient\u2019s personalized condition at hand, and help in identifying and subsequently predictively alerting the onset of critical conditions. 
Together with the communication network and data transmission architecture, this new framework that we have designed, developed, and successfully deployed is called RASPRO (Rapid Active Summarization for effective PROgnosis) and was first introduced in <i>2016 IEEE Wireless Health<\/i>.<sup id=\"rdp-ebb-cite_ref-PathinarupothiRASPRO16_1-0\" class=\"reference\"><a href=\"#cite_note-PathinarupothiRASPRO16-1\">[1]<\/a><\/sup>\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Related_work\">Related work<\/span><\/h3>\n<p>We begin by analyzing the existing systems that simply generate alerts every time one or more sensors cross the abnormality thresholds. Due to the sheer volume of such alerts, they are difficult to manage, even in the case of hospital in-patient settings, let alone for a much larger number of remotely monitored patients. Starting from some of the initial attempts reported by Anliker <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-AnlikerAMON04_2-0\" class=\"reference\"><a href=\"#cite_note-AnlikerAMON04-2\">[2]<\/a><\/sup>, to more recent works from various researchers<sup id=\"rdp-ebb-cite_ref-BaigReal14_3-0\" class=\"reference\"><a href=\"#cite_note-BaigReal14-3\">[3]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-RajevencelthaImprov16_4-0\" class=\"reference\"><a href=\"#cite_note-RajevencelthaImprov16-4\">[4]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-SreejithAReal15_5-0\" class=\"reference\"><a href=\"#cite_note-SreejithAReal15-5\">[5]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-SkubicAuto15_6-0\" class=\"reference\"><a href=\"#cite_note-SkubicAuto15-6\">[6]<\/a><\/sup>, severity detection and alert generation are typically based either on predefined thresholds or on thresholds trained using machine learning, followed by online classification of multi-sensor data. 
Very similar techniques of machine learning have also been used in fall detection.<sup id=\"rdp-ebb-cite_ref-LopesTowards13_7-0\" class=\"reference\"><a href=\"#cite_note-LopesTowards13-7\">[7]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-BalasubramanianDisco16_8-0\" class=\"reference\"><a href=\"#cite_note-BalasubramanianDisco16-8\">[8]<\/a><\/sup> Hristoskova <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-HristoskovaOnto14_9-0\" class=\"reference\"><a href=\"#cite_note-HristoskovaOnto14-9\">[9]<\/a><\/sup> propose another system wherein patient conditions are mapped to medical conditions using ontology-driven methods, and alerts are generated based on corresponding risk stratification. \n<\/p><p>Even though there has been noticeable success in detection and diagnosis of specific disease conditions, most of these works have not explored the opportunity for personalized and precision diagnosis. In an extensive review of Big Data for Health, Andreu-Perez <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-Andreu-PerezBigData15_10-0\" class=\"reference\"><a href=\"#cite_note-Andreu-PerezBigData15-10\">[10]<\/a><\/sup> specifically emphasize the opportunity for stratified patient management and personalized health diagnostics, citing examples of customized blood pressure management.<sup id=\"rdp-ebb-cite_ref-LiuAttenu14_11-0\" class=\"reference\"><a href=\"#cite_note-LiuAttenu14-11\">[11]<\/a><\/sup> More specifically, Bates <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-BatesBigData14_12-0\" class=\"reference\"><a href=\"#cite_note-BatesBigData14-12\">[12]<\/a><\/sup> discuss the utility of using analytics to predict adverse events, which could reduce the associated morbidity and mortality rates. 
The authors further argue that patient data analytics based on early <a href=\"https:\/\/www.limswiki.org\/index.php\/Information\" title=\"Information\" class=\"wiki-link\" data-key=\"6300a14d9c2776dcca0999b5ed940e7d\">information<\/a> supplied to the hospital prior to admission can result in better management of staffing and other hospital resources.<sup id=\"rdp-ebb-cite_ref-BatesBigData14_12-1\" class=\"reference\"><a href=\"#cite_note-BatesBigData14-12\">[12]<\/a><\/sup> One of the recent works in personalized criticality detection is reported by Sung <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-SungMobile14_13-0\" class=\"reference\"><a href=\"#cite_note-SungMobile14-13\">[13]<\/a><\/sup>, who propose an analytical unit in which the Improved Particle Swarm Optimization (IPSO) algorithm is used to arrive at patient-specific threat ranges. \n<\/p><p>To improve precision in diagnosis we also need to arrive at a balance between a completely automated system on one hand, and physician assist systems on the other. Celler <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-CellerHome15_14-0\" class=\"reference\"><a href=\"#cite_note-CellerHome15-14\">[14]<\/a><\/sup> propose a balanced approach wherein sophisticated analytics are presented to physicians, who in turn identify the changes and decide on the diagnosis. This is also supported by many results, including those reported by Skubic <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-SkubicAuto15_6-1\" class=\"reference\"><a href=\"#cite_note-SkubicAuto15-6\">[6]<\/a><\/sup>, wherein domain knowledge-based methods performed as well as other trained machine learning models. 
These arguments and results provide further impetus for personalized, precision, and preventive diagnostic techniques that are amenable to physician interventions.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Methods\">Methods<\/span><\/h2>\n<p>The first significant improvement that we applied is the quantization of every remotely sensed parameter based on its own customized severity boundaries. Sequential time windows of such quantized values are examined for dominant appearances of normal results or abnormalities, as the case may be, and motifs corresponding to them are extracted. Using factors set by doctors, the system then transforms these motifs by generating interventional time alerts as per clinically prescribed protocols. Both the alerts and motifs are amenable to rapid transmission to doctors, even as SMS messages on bare-minimum, bandwidth-starved wide area wireless networks. This results in the generation of more clinically relevant critical information, along with a drastic reduction in the reporting of minor data aberrations that may not be indicative of any serious condition. The system does not stop here. The attending doctors, when they view the alerts and\/or motifs, have the luxury to request detailed data on demand (dubbed \"DD-on-D\"), upon which the next level of detail in the data is transmitted. This level of detail could be a straightforward frequency map of normal and abnormal values, or much more intelligent machine learning classifications in the case of proven disease conditions. The heart of our system is a framework called RASPRO (see Fig. 1), consisting of physician assist filters (PAFs) that, in going from data to diagnosis, implement the three Ps: precision, personalization, and prevention. 
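The quantize-then-summarize step described above can be sketched as follows. This is a minimal illustration, not RASPRO's implementation: the severity boundaries, symbol alphabet, and window size shown here are hypothetical stand-ins for the clinically tuned, per-patient values the system actually uses.

```python
from bisect import bisect

# Hypothetical severity boundaries for one sensor (e.g., systolic BP, mmHg);
# in RASPRO these are personalized per patient and disease condition.
BOUNDARIES = [80, 110, 130, 150]           # 4 cut points -> 5 severity levels
SYMBOLS = ["A--", "A-", "A", "A+", "A++"]  # sub-normal ... normal ... above-normal

def quantize(value, boundaries=BOUNDARIES, symbols=SYMBOLS):
    """Map one raw sensor reading to a discrete severity symbol."""
    return symbols[bisect(boundaries, value)]

def summarize_windows(readings, w=10):
    """Quantize a raw time series and keep only the dominant severity symbol
    of each w-sample window -- a compact, SMS-friendly summary (ties broken
    arbitrarily)."""
    qts = [quantize(v) for v in readings]
    summary = []
    for i in range(0, len(qts), w):
        win = qts[i:i + w]
        summary.append(max(set(win), key=win.count))
    return summary
```

For example, `summarize_windows([120]*7 + [145]*3, w=10)` collapses ten raw readings into the single dominant symbol `"A"`, which is the kind of drastic volume reduction the text describes.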
In the following sections, we describe each of these three concepts in detail.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig1_Pathinarupothi_BMCMedInfoDecMak2018_18.png\" class=\"image wiki-link\" data-key=\"849b6db52d44646675cb269ff2e0b2ce\"><img alt=\"Fig1 Pathinarupothi BMCMedInfoDecMak2018 18.png\" src=\"https:\/\/www.limswiki.org\/images\/8\/84\/Fig1_Pathinarupothi_BMCMedInfoDecMak2018_18.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 1<\/b> RASPRO-PAF framework. The architecture shows the RASPRO-PAF framework, which progressively converts the raw multi-sensor data into quantized symbols, helpful motifs, diagnostic predictions, and critical alerts.<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<h2><span class=\"mw-headline\" id=\"Personalization_PAF\">Personalization PAF<\/span><\/h2>\n<p>Due to the distributed data gathering and processing architecture, there is an opportunity to enhance personalization in diagnosis and treatment. The first component in the RASPRO framework, the Personalization PAF, takes the form of a patient- and disease-condition-specific severity quantizer that converts raw sensor values to a series of clinically relevant severity symbols.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Adaptive_qauntization\">Adaptive quantization<\/span><\/h3>\n<p>In general, let us consider <i>N<\/i> body sensors, <i>S<\/i><sub>1<\/sub>,<i>S<\/i><sub>2<\/sub>,\u2026,<i>S<\/i><sub><i>N<\/i><\/sub> with varying sensing frequencies <i>f<\/i><sub>1<\/sub>,<i>f<\/i><sub>2<\/sub>,\u2026,<i>f<sub>N<\/sub><\/i>. The raw time series values from these sensors are converted to discrete severity level symbols by the quantizer. 
The number of severity levels <i>L<sub>i<\/sub><\/i> for a sensor <i>S<sub>i<\/sub><\/i> can be set based on the sensor and many other factors. We assume that different vital parameter sensors have a different number of severity levels, and hence <i>L<\/i><sub>1<\/sub>, say the number of severity levels for a blood pressure sensor, could be equal to five, whereas <i>L<\/i><sub>2<\/sub> (say oxygen saturation levels) could be equal to seven. In our symbolic notation, the clinically accepted normal values are assigned the symbol \"A,\" while above-normal values are assigned progressive degrees of severity as \"A+,\" \"A++,\" etc., and sub-normal values are assigned \"A-,\" \"A\u2212\u2212,\" etc., with the number of \u201c+\u201d and \u201c-\u201d symbols representing the degree of above-normal and sub-normal severity, respectively. Figure 2 depicts how various severity levels are arrived at in the Personalization PAF severity quantizer.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig2_Pathinarupothi_BMCMedInfoDecMak2018_18.png\" class=\"image wiki-link\" data-key=\"eb97e54389a475a724e2f8d60d1a484a\"><img alt=\"Fig2 Pathinarupothi BMCMedInfoDecMak2018 18.png\" src=\"https:\/\/www.limswiki.org\/images\/1\/10\/Fig2_Pathinarupothi_BMCMedInfoDecMak2018_18.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 2<\/b> Personalized Quantization. 
Quantization of sensor data is based on multiple severity categorization criteria, resulting in the generation of patient- and disease-specific quantized values.<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>The quantized severity symbols are arranged into a patient-specific matrix (PSM) of <i>N<\/i> rows and <i>W<\/i> columns, where <i>N<\/i> is the total number of sensors being observed, and <i>W<\/i> is a time window in which the data is summarized. The value of <i>W<\/i> can be set by a physician or automatically derived based on the risk perception of that particular patient.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Personalization\">Personalization<\/span><\/h3>\n<p>The quantization breadths are decided by doctors based on the patient profile (or history), the doctor\u2019s diagnostic interest (for instance, a cardiologist may assign severity ranges differently from that of a nephrologist), severity ranges as suggested by using analytics on a local <a href=\"https:\/\/www.limswiki.org\/index.php\/Hospital_information_system\" title=\"Hospital information system\" class=\"wiki-link\" data-key=\"d8385de7b1f39a39d793f8ce349b448d\">hospital information system<\/a> (HIS), and also based on population analytics across multiple HIS spanning multiple hospitals or even from publicly available databases such as PhysioNet.<sup id=\"rdp-ebb-cite_ref-GoldbergerPhysio00_15-0\" class=\"reference\"><a href=\"#cite_note-GoldbergerPhysio00-15\">[15]<\/a><\/sup> Together, this approach gives ample flexibility in achieving customization in inter-patient, inter-disease, intra-patient, and inter-specialty diagnosis from multi-sensor data.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Precision_PAF\">Precision PAF<\/span><\/h2>\n<p>Whereas in most other applications precision directly translates into great detail in data, in remote health monitoring, precision cannot come at the cost of voluminous data presentation to the doctor. Compactness has to be retained. 
We have developed a step-wise refinement process for precision, which is delivered on-demand to the attending doctor. Step 1 is \u201cConsensus Motifs (CM)\u201d; step 2 is a collection of statistical parameters, including severity frequency maps (SFMs); and step 3 is machine learning (ML). In the first step, motifs corresponding to commonly seen normal results and abnormalities in the severity symbols series are extracted. The outcome of this is two severity summaries: (1) the most frequent trend in sensor data that we call the consensus normal motif (CNM), and (2) the most frequently occurring abnormality that we term the consensus abnormality motif (CAM). The construction of this involves the following building blocks:\n<\/p><p><br \/>\n<\/p>\n<ul><li> <b>Candidate symbol<\/b>: <i>\u03b1<\/i>[<i>p<\/i>] is the <i>p<\/i>-th quantized severity symbol in a row of the PSM, <i>\u03b1<\/i>[1],<i>\u03b1<\/i>[2],\u2026,<i>\u03b1<\/i>[<i>p<\/i>],\u2026,<i>\u03b1<\/i>[<i>W<\/i>].<\/li><\/ul>\n<p><br \/>\n<\/p>\n<ul><li> <b>Normal symbol<\/b>: <i>\u03b1<\/i><sub><i>NORM<\/i><\/sub> is a candidate symbol that represents the normal level, and its value is equal to \u201cA\u201d for every sensor.<\/li><\/ul>\n<dl><dd>Now, let the set <i>C<sub>n<\/sub><\/i> denote all the candidate symbols in a <i>W<\/i>-long observation window, corresponding to the <i>n<\/i>-th sensor in the PSM. 
However, we have dropped the subscript <i>n<\/i> for better clarity of discussion.<\/dd><\/dl>\n<dl><dd><i>C<\/i> = {<i>\u03b1<\/i>[1], <i>\u03b1<\/i>[2], \u2026, <i>\u03b1<\/i>[<i>p<\/i>], \u2026, <i>\u03b1<\/i>[<i>W<\/i>]}<\/dd><\/dl>\n<dl><dd>Let <i>\u03c3<\/i>[<i>p<\/i>] denote the sum of hamming distances of <i>\u03b1<\/i>[<i>p<\/i>] from all other candidate symbols in <i>C<\/i> such that:<\/dd><\/dl>\n<dl><dd><i>\u03c3<\/i>[<i>p<\/i>] = \u2211<sub><i>i<\/i>=1,\u2026,<i>W<\/i><\/sub> <i>D<\/i>(<i>\u03b1<\/i>[<i>p<\/i>],<i>\u03b1<\/i>[<i>i<\/i>])<\/dd><\/dl>\n<dl><dd>where <i>D<\/i>(<i>\u03b1<\/i>[<i>p<\/i>],<i>\u03b1<\/i>[<i>i<\/i>]) is the hamming distance of <i>\u03b1<\/i>[<i>p<\/i>] from <i>\u03b1<\/i>[<i>i<\/i>]. Here, we assume that the hamming distance between neighboring severity levels (say, A and A+) is 1. 
We define the set <i>H<\/i> of all <i>\u03c3<\/i>\u2019s such that:<\/dd><\/dl>\n<dl><dd><i>H<\/i> = {<i>\u03c3<\/i>[1],<i>\u03c3<\/i>[2],\u2026,<i>\u03c3<\/i>[<i>p<\/i>],\u2026,<i>\u03c3<\/i>[<i>W<\/i>]}.<\/dd><\/dl>\n<p><br \/>\n<\/p>\n<ul><li> <b>Consensus normal symbol<\/b>: <i>\u03b1<sub>CNS<\/sub><\/i>[<i>C<\/i>] is defined as a candidate symbol among all the symbols in <i>C<\/i> that satisfies the following two conditions: (1) its Hamming distance from the normal symbol, denoted as <i>D<\/i>(<i>\u03b1<sub>CNS<\/sub><\/i>[<i>C<\/i>],<i>\u03b1<\/i><sub><i>NORM<\/i><\/sub>), is less than a sensor-specific near-normal severity threshold <i>S<\/i>[<i>n<\/i>]<i><sub>THRESH<\/sub><\/i>, and (2) its sum of Hamming distances from all other candidate symbols in <i>C<\/i> is the minimum. 
This is formulated as:<\/li><\/ul>\n<dl><dd><i>\u03b1<sub>CNS<\/sub><\/i>[<i>C<\/i>] = argmin<sub><i>\u03b1<\/i>[<i>p<\/i>]\u2208<i>C<\/i><\/sub> <i>\u03c3<\/i>[<i>p<\/i>], subject to <i>D<\/i>(<i>\u03b1<\/i>[<i>p<\/i>],<i>\u03b1<\/i><sub><i>NORM<\/i><\/sub>) < <i>S<\/i>[<i>n<\/i>]<i><sub>THRESH<\/sub><\/i><\/dd><\/dl>\n<p><br \/>\n<\/p>\n<ul><li> <b>Consensus abnormality symbol<\/b>: <i>\u03b1<sub>CAS<\/sub><\/i>[<i>C<\/i>] is defined as a candidate symbol in <i>C<\/i> that satisfies the following two conditions: (1) its Hamming distance from the normal symbol, <i>D<\/i>(<i>\u03b1<sub>CAS<\/sub><\/i>[<i>C<\/i>],<i>\u03b1<\/i><sub><i>NORM<\/i><\/sub>), is greater than or equal to a sensor-specific near-normal severity threshold <i>S<\/i>[<i>n<\/i>]<i><sub>THRESH<\/sub><\/i>, and (2) the sum of its Hamming distances from all other candidate symbols in <i>C<\/i> is the minimum. This is formulated as:<\/li><\/ul>\n<dl><dd><i>\u03b1<sub>CAS<\/sub><\/i>[<i>C<\/i>] = argmin<sub><i>\u03b1<\/i>[<i>p<\/i>]\u2208<i>C<\/i><\/sub> <i>\u03c3<\/i>[<i>p<\/i>], subject to <i>D<\/i>(<i>\u03b1<\/i>[<i>p<\/i>],<i>\u03b1<\/i><sub><i>NORM<\/i><\/sub>) \u2265 <i>S<\/i>[<i>n<\/i>]<i><sub>THRESH<\/sub><\/i><\/dd><\/dl>\n<p><br \/>\n<\/p>\n<ul><li> <b>Consensus normal motif<\/b>: <i>\u03bc<sub>CNM<\/sub><\/i>[<i>P<\/i>] is an ordered sequence of consensus normal symbols belonging to the <i>N<\/i> rows in the PSM of a patient <i>P<\/i>, and is represented as <<i>\u03b1<sub>CNS<\/sub><\/i>[<i>C<\/i><sub>1<\/sub>],<i>\u03b1<sub>CNS<\/sub><\/i>[<i>C<\/i><sub>2<\/sub>],\u2026,<i>\u03b1<sub>CNS<\/sub><\/i>[<i>C<\/i><sub><i>N<\/i><\/sub>]>. 
The <i>n<\/i>-th consensus normal symbol <i>\u03b1<sub>CNS<\/sub><\/i>[<i>C<\/i><sub><i>n<\/i><\/sub>] in <i>\u03bc<sub>CNM<\/sub><\/i>[<i>P<\/i>] can be indexed as <i>\u03bc<sub>CNM<\/sub><\/i>[<i>P<\/i>][<i>n<\/i>].<\/li><\/ul>\n<p><br \/>\n<\/p>\n<ul><li> <b>Consensus abnormality motif<\/b>: <i>\u03bc<sub>CAM<\/sub><\/i>[<i>P<\/i>] is an ordered sequence of consensus abnormality symbols belonging to the <i>N<\/i> rows in the PSM of patient <i>P<\/i>, which is represented as <<i>\u03b1<sub>CAS<\/sub><\/i>[<i>C<\/i><sub>1<\/sub>],<i>\u03b1<sub>CAS<\/sub><\/i>[<i>C<\/i><sub>2<\/sub>],\u2026,<i>\u03b1<sub>CAS<\/sub><\/i>[<i>C<\/i><sub><i>N<\/i><\/sub>]>. The <i>n<\/i>-th consensus abnormality symbol <i>\u03b1<sub>CAS<\/sub><\/i>[<i>C<\/i><sub><i>n<\/i><\/sub>] in <i>\u03bc<sub>CAM<\/sub><\/i>[<i>P<\/i>] can be indexed as <i>\u03bc<sub>CAM<\/sub><\/i>[<i>P<\/i>][<i>n<\/i>].<\/li><\/ul>\n<dl><dd>To reiterate the above formulation: each row of a PSM is considered as an observation window set <i>C<\/i> (corresponding to a summarization time window <i>W<\/i>) to find the corresponding consensus symbols, <i>\u03b1<sub>CNS<\/sub><\/i>[<i>C<\/i>] and <i>\u03b1<sub>CAS<\/sub><\/i>[<i>C<\/i>]. The sequence of these symbols over the <i>N<\/i> rows in a PSM forms the column vector motifs <i>\u03bc<sub>CNM<\/sub><\/i>[<i>P<\/i>] and <i>\u03bc<sub>CAM<\/sub><\/i>[<i>P<\/i>] (refer to Fig. 
3).<\/dd><\/dl>\n<p><br \/>\n<\/p>\n<dl><dd><a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig3_Pathinarupothi_BMCMedInfoDecMak2018_18.png\" class=\"image wiki-link\" data-key=\"3ee27e1edaf67a7cb46de35ff951ce90\"><img alt=\"Fig3 Pathinarupothi BMCMedInfoDecMak2018 18.png\" src=\"https:\/\/www.limswiki.org\/images\/d\/dd\/Fig3_Pathinarupothi_BMCMedInfoDecMak2018_18.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a><\/dd><\/dl>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 3<\/b> RASPRO severity detection, summarization, and AMI calculated using CAMs and sensor specific severity weight matrix. It also shows an AMI-based patient prioritization table that can help physicians in attending to the neediest patient.<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>In subsequent steps of Precision PAF, the system generates a frequency map that shows how frequently different multi-sensor parameters have crossed the personalized severity thresholds. Finally, the motif time series is further used as input to proven deep learning (DL) and machine learning (ML) techniques such as long short-term memory (LSTM) recurrent neural networks (RNN)<sup id=\"rdp-ebb-cite_ref-PathinarupothInstant17_16-0\" class=\"reference\"><a href=\"#cite_note-PathinarupothInstant17-16\">[16]<\/a><\/sup> or support vector machines (SVM)<sup id=\"rdp-ebb-cite_ref-ArunanAReal16_17-0\" class=\"reference\"><a href=\"#cite_note-ArunanAReal16-17\">[17]<\/a><\/sup> that could help the doctors in diagnosis. 
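<\/p>\n<p>The step 1 consensus extraction above can be sketched in a few lines of code. The following is a minimal illustration, not the authors' implementation: the ordered severity alphabet, the helper names, and the tie-breaking are our assumptions, and the Hamming distance between neighboring severity levels is taken as 1, as assumed above.<\/p>

```python
# Minimal sketch of consensus-symbol extraction for one PSM row.
# Assumptions (ours, not the authors'): severity symbols are ordered by
# level so that neighboring levels differ by a Hamming distance of 1,
# and ties are broken by the first qualifying candidate.

LEVELS = ["A--", "A-", "A", "A+", "A++"]  # ordered severity alphabet
NORMAL = "A"                              # normal symbol, same for every sensor

def hamming(a, b):
    """Distance between two severity symbols = gap between their levels."""
    return abs(LEVELS.index(a) - LEVELS.index(b))

def consensus_symbols(window, thresh):
    """Return (CNS, CAS) for a W-long window of candidate symbols.

    CNS: candidate whose distance from NORMAL is below `thresh` and whose
    summed distance to all other candidates (sigma) is minimal; CAS: the
    same, but with distance from NORMAL at or above `thresh`. Either can
    be None when no candidate qualifies."""
    sigma = [sum(hamming(a, b) for b in window) for a in window]
    near = [p for p, a in enumerate(window) if hamming(a, NORMAL) < thresh]
    far = [p for p, a in enumerate(window) if hamming(a, NORMAL) >= thresh]
    cns = window[min(near, key=lambda p: sigma[p])] if near else None
    cas = window[min(far, key=lambda p: sigma[p])] if far else None
    return cns, cas

def consensus_motifs(psm, thresh):
    """CNM and CAM column vectors for an N-row patient severity matrix."""
    pairs = [consensus_symbols(row, thresh) for row in psm]
    return [c for c, _ in pairs], [a for _, a in pairs]
```

<p>Applied row by row over the <i>N<\/i> rows of a PSM, the two returned symbols form the CNM and CAM column vectors.<\/p>\n<p>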
In the next section, we use the above consensus motifs for alert generation to aid in criticality prevention.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Prevention_PAF\">Prevention PAF<\/span><\/h2>\n<p>The Prevention PAF is implemented as an alert generation technique that uses simple or complex mathematical models to calculate the amount of time available to physicians for effective intervention, and it is amenable to changes based on patient, disease, and physician diagnostic interest. The output of the Prevention PAF is an alert measure index (AMI) that is used to prioritize patients based on their urgency for physicians\u2019 interventional attention.\n<\/p><p>Each severity symbol in a motif also communicates how much time the doctor has for deciding an intervention (if any is needed). Hence, for each sensor <i>S<\/i><sub>1<\/sub>, <i>S<\/i><sub>2<\/sub>, \u2026, <i>S<sub>N<\/sub><\/i> and its corresponding severity symbol <i>\u03b1<\/i> in <i>\u03bc<sub>CNM<\/sub><\/i>[<i>P<\/i>] and <i>\u03bc<sub>CAM<\/sub><\/i>[<i>P<\/i>] (where <i>\u03b1<\/i> could be A, A+, A-, etc.), we associate with it a corresponding medically accepted intervention time <i>\u03b4<\/i>[<i>S<sub>n<\/sub><\/i>][<i>\u03b1<\/i>]. Across different sensors <i>S<sub>n<\/sub><\/i> for a patient <i>P<\/i>, let us consider <i>\u03b8<\/i>[<i>S<sub>n<\/sub><\/i>][<i>\u03b1<\/i>] as a sensor- and severity-symbol-indexed matrix of weights derived from the interventional time using the following relationship:\n<\/p><p><i>\u03b8<\/i>[<i>S<sub>n<\/sub><\/i>][<i>\u03b1<\/i>] = <i>K<sub>P<\/sub><\/i> \/ <i>\u03b4<\/i>[<i>S<sub>n<\/sub><\/i>][<i>\u03b1<\/i>]\n<\/p><p>In the above equation, the constant <i>K<sub>P<\/sub><\/i> can be set by the physician considering the context of a patient\u2019s health condition (including historical medical records and specific sensitivities and vulnerabilities documented therein) or derived through machine learning techniques. 
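<\/p>\n<p>A compact sketch of this weighting, together with the AMI aggregation defined next in this section, is shown below. The inverse relationship between weight and intervention time, the num() mapping, and the data structures are our assumptions for illustration only:<\/p>

```python
# Illustrative sketch (our assumptions, not the authors' code): the
# weight theta[S_n][alpha] is taken as K_P divided by the intervention
# time delta[S_n][alpha], so shorter intervention times yield larger
# weights; num() maps a symbol to its severity magnitude
# (A -> 0, A+ or A- -> 1, A++ or A-- -> 2).

def num(symbol):
    """Count the +/- marks after the base level 'A'."""
    return len(symbol) - 1

def ami(cam, delta, k_p=1.0):
    """Alert Measure Index for one patient at the end of a window W.

    cam   : consensus abnormality motif, one symbol per sensor
    delta : per-sensor dicts mapping symbol -> intervention time (min)
    """
    total = 0.0
    for n, symbol in enumerate(cam):
        theta = k_p / delta[n][symbol]   # assumed theta = K_P / delta
        total += theta * num(symbol)
    return total

def prioritize(ami_by_patient):
    """Patient IDs sorted so the most urgent (highest AMI) comes first."""
    return sorted(ami_by_patient, key=ami_by_patient.get, reverse=True)
```

<p>Sorting patients by this score yields the AMI-based prioritization table shown in Fig. 3.<\/p>\n<p>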
The above equation may be substituted by more complex equations for progressively complicated disease conditions.\n<\/p><p>At the end of each observation time window <i>W<\/i>, for every patient <i>P<\/i>, we also define an aggregate criticality alert score, called the Alert Measure Index (AMI), which is calculated as:\n<\/p><p>AMI(<i>P<\/i>) = \u2211<sub><i>n<\/i>=1,\u2026,<i>N<\/i><\/sub> <i>\u03b8<\/i>[<i>S<sub>n<\/sub><\/i>][<i>\u03bc<sub>CAM<\/sub><\/i>[<i>P<\/i>][<i>n<\/i>]] \u00d7 <i>num<\/i>(<i>\u03bc<sub>CAM<\/sub><\/i>[<i>P<\/i>][<i>n<\/i>])\n<\/p><p>wherein each severity-quantized symbol in the <i>\u03bc<sub>CAM<\/sub><\/i>[<i>P<\/i>] of the <i>n<\/i>-th sensor is converted into a numerical value (e.g., A\u00b1 is assigned 1, A++ or A\u2212\u2212 is assigned 2) using <i>num<\/i>(<i>\u03bc<sub>CAM<\/sub><\/i>[<i>P<\/i>][<i>n<\/i>]) and scaled up by the sensor-severity-specific weight <i>\u03b8<\/i>[<i>S<sub>n<\/sub><\/i>][<i>\u03b1<\/i>] (as defined just prior). The resulting AMI is indicative of the immediacy of patient priority for the physician\u2019s consultative attention. The process of motif detection, AMI calculation, and patient prioritization is summarized in Fig. 3. The data used to arrive at the AMI scores could also be other statistical parameters (such as frequency maps) or machine learning prediction scores, and the technique for calculating the score may likewise be based on predefined simple mathematical models or on complex machine learning algorithms.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Clinical_relevance_and_validation\">Clinical relevance and validation<\/span><\/h2>\n<p>In October 2016, the RASPRO framework was introduced to doctors in multiple specialties in our super-specialty hospital, wherein they validated its clinical deployment applications. We present some of the specific clinical scenarios that emerged from this pilot study.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Cardiology\">Cardiology<\/span><\/h3>\n<p>The electrocardiogram is a potential indicator of cardiac events and can be exploited for personalized and precision diagnosis by varying the parametric thresholds and summarization window based on the patient profile, disease condition, and associated factors. For instance, taking into account the disease condition, a 3mm depression in the ST segment would be graded as A++ for an active patient having exertion-related chest pain, indicating cardiac ischemia, whereas the same depression occurring in a patient at rest would be graded as A+++ with a limited time of intervention (30 min), indicating cardiac muscle death. To extend the spectrum of diseases that ST segment depression would cover, a chronic hypertensive with left ventricular hypertrophy of the heart (and no chest pain) would also presumably have a continuous 3mm dip in the ST segment that does not require any interventional attention, and hence would be graded as A\/A+ (near normal) by the severity quantizer. 
Next, taking into account the patient profile, in sedentary workers aged above 45 with a smoking habit, high cholesterol levels, and other associated risks, the thresholds will be low (A+, A++, and A+++ would be assigned to 1\u20132mm, 2\u20133mm, and above 3mm ST depression, respectively), while in highly active, low-risk patients aged below 45 with no previous associated history, the levels will be high (A+, A++, and A+++ would correspondingly be assigned to 2\u20133mm, 3\u20133.5mm, and above 3.5mm, respectively). Also, in the former case the summarization window <i>W<\/i> (capturing how long the ST depression sustains) would be 3\u20134 minutes (more critical), whereas in the latter it would be 7\u20139 minutes.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Pulmonology\">Pulmonology<\/span><\/h3>\n<p>Simple but vital parameters such as oxygen saturation (SpO2), blood pressure (BP), heart rate variability (HRV), and respiratory rate variability (RRV), present in unique combinations, would facilitate differentiating between benign diseases such as interstitial lung disease or sleep apnea, for which the thresholds for alert (set through the interventional time constant <i>K<sub>P<\/sub><\/i>) will be fairly high, and emergencies such as pulmonary edema or pulmonary embolism (a blood clot in an artery in the lung), for which the thresholds will be kept low if any predisposing factors such as left heart failure, pulmonary hypertension, prolonged immobilization, or pregnancy are present. Hence, the physician would preset these combinations of vitals to be looked for as a sequence of symbols in the CAM. Since only a few parameters can be picked up to indicate a disease, it is pertinent that step-wise precision techniques such as machine learning algorithms be used to distinguish between closely mimicking conditions. 
Obstructive sleep apnea and chronic obstructive pulmonary disease provide solid examples: both would show similar trends in SpO2, BP, HRV, and RRV. In a trial conducted at our hospital, we were able to achieve 99% precision in diagnosing sleep apnea from HRV using a deep learning algorithm called long short-term memory recurrent neural networks (LSTM-RNN), as reported in one of our previous works.<sup id=\"rdp-ebb-cite_ref-PathinarupothInstant17_16-1\" class=\"reference\"><a href=\"#cite_note-PathinarupothInstant17-16\">[16]<\/a><\/sup> The algorithm was evaluated using the multi-sensor patient data from the PhysioNet Challenge 2000<sup id=\"rdp-ebb-cite_ref-PenzelTheAp00_18-0\" class=\"reference\"><a href=\"#cite_note-PenzelTheAp00-18\">[18]<\/a><\/sup>, which contained annotated data from 35 patients who underwent an overnight sleep study.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Neurology\">Neurology<\/span><\/h3>\n<p>One of the early markers of autonomic neuropathy in epileptic patients is a discrepancy between the BP and the pulse rate of the patient. In this scenario, the severity levels of BP and pulse rate would be set accordingly (as a combination) to alert the practitioner. Suppose S1 is the BP sensor and S2 the heart rate sensor. Let us say that for patient P1, <i>\u03bc<sub>CAM<\/sub><\/i>[<i>P<\/i>1]=<<i>A\u2212\u2212<\/i>,<i>A<\/i>>, and for patient P2, <i>\u03bc<sub>CAM<\/sub><\/i>[<i>P<\/i>2]=<<i>A\u2212<\/i>,<i>A+<\/i>>. 
In these two cases, the diagnosis, alert level, and treatment differ because P1 has a BP decline with no change in heart rate (critical), while P2 has a compensatory increase in heart rate, which indicates good autonomic function.\n<\/p><p>Though these are representative clinical scenarios, we found wide agreement among the doctors from other specialties too that the personalization, step-wise precision, and prevention introduced through the RASPRO framework are of high utility in remote monitoring and critical alert generation.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Results_and_discussion\">Results and discussion<\/span><\/h2>\n<p>In order to quantitatively evaluate the effectiveness of RASPRO, we measure both the diagnostic ability and the preventive predictive power of the technique. We formulate three hypotheses and evaluate the effectiveness of RASPRO in satisfying them:\n<\/p>\n<ul><li> Precision hypothesis: RASPRO consensus motif time series can replace raw sensor data time series for the task of identification\/classification of specific disease conditions.<\/li><\/ul>\n<ul><li> Prevention hypothesis: RASPRO-based consensus motifs can predict a future disease condition with as much accuracy as raw sensor data time series.<\/li><\/ul>\n<ul><li> Personalization hypothesis: There exists an inter-patient variability in severity levels and summarization frequencies, which if optimized individually can result in better accuracy in predicting\/classifying a specific disease condition.<\/li><\/ul>\n<p>By assessing the validity of the first hypothesis, we aim to evaluate the extent to which RASPRO motifs can provide precision in diagnostics. 
The second hypothesis evaluates the utility of RASPRO as a tool for predictive analytics in critical conditions, while the third hypothesis helps us understand whether there exists a case for personalization in disease discovery and prediction.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Dataset\">Dataset<\/span><\/h3>\n<p>The first step in evaluating these hypotheses is to identify datasets that are extensive, long-term, and critically significant. We used a large time series dataset from the MIMIC II database<sup id=\"rdp-ebb-cite_ref-SaeedMulti11_19-0\" class=\"reference\"><a href=\"#cite_note-SaeedMulti11-19\">[19]<\/a><\/sup>, which contains multiple body sensor values from over 20,000 ICU patients. This dataset consists of ECG, arterial blood pressure (ABP), heart rate (HR), non-obtrusive BP (NBP), SpO2, mean arterial BP (MAP), and other vital signs. From this, we selected a curated set of patient and control group data that contained a long time series followed by a critical event. We selected patients with acute hypotensive episodes (AHE), a potentially fatal condition that is quite common in ICUs and can also be caused by postural hypotension. An AHE event is analytically identified when MAP measurements remain below 60 mmHg for more than 30 minutes; such an event requires immediate intervention. We also made sure that the dataset provides an uninterrupted MAP signal, with a minimum sampling rate of one per minute, over at least three hours for both the event patients and the control group. We selected a group of 35 patients (called group H) who had an AHE at some time during their ICU stay, and another 35 patients (called group G) who did not. 
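<\/p>\n<p>The AHE labeling criterion above is straightforward to express in code. This is a minimal sketch under the stated one-sample-per-minute assumption, not the challenge tooling itself:<\/p>

```python
# Minimal sketch of the AHE criterion used to label the dataset: MAP
# below 60 mmHg sustained for more than 30 minutes, with samples taken
# once per minute (so 30 minutes corresponds to 30 consecutive samples).

def has_ahe(map_series_mmhg, threshold=60, duration_min=30):
    """True if MAP stays below `threshold` for more than `duration_min`
    consecutive one-per-minute samples."""
    run = 0
    for value in map_series_mmhg:
        run = run + 1 if value < threshold else 0
        if run > duration_min:
            return True
    return False
```

<p>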
This dataset was selected from the PhysioNet<sup id=\"rdp-ebb-cite_ref-GoldbergerPhysio00_15-1\" class=\"reference\"><a href=\"#cite_note-GoldbergerPhysio00-15\">[15]<\/a><\/sup> Challenge 2009.<sup id=\"rdp-ebb-cite_ref-MoodyPredict09_20-0\" class=\"reference\"><a href=\"#cite_note-MoodyPredict09-20\">[20]<\/a><\/sup> The H dataset also had a time marker <i>t<\/i><sub>0<\/sub>, after which AHE occurred in that patient within a one-hour window. Since the data was obtained from publicly available sources, prior institutional review board (IRB) approval was not required for this work.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Evaluating_precision_hypothesis\">Evaluating precision hypothesis<\/span><\/h3>\n<p>The first task is to measure the replaceability of the original time series data with the quantized symbols and consensus motifs. To evaluate this, the H and G group time series data, comprising mean arterial pressure (MAP) of length 60 minutes after <i>t<\/i><sub>0<\/sub>, are modeled as feature vectors of length 60. These vectors are called the original time series (OTS) and are used for training an SVM model for classifying the data as having AHE or not. The vectors belonging to AHE were labelled as H, and G otherwise. After using OTS, we then generate quantized time series (QTS) vectors with different quantization breadths. The quantization breadth (denoted by B) is varied as 5, 10, 15, and 20. For instance, when B=10, all of the OTS MAP values between 60 mmHg and 50 mmHg are quantized into the same severity symbol, say \u201cA-,\u201d whereas for B=5, the symbol \u201cA-\u201d quantizes all OTS MAP values between 60 mmHg and 55 mmHg. These vectors are used in a similar manner to first train and then test the SVM model. Finally, we generate the corresponding motif time series (MTS) for each of the QTS, varying the summarization time window W as 5, 10, and 15. 
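<\/p>\n<p>The severity quantization described here can be sketched as follows; the exact symbol alphabet and the handling of bin boundaries are our assumptions, chosen to match the B=10 and B=5 examples above:<\/p>

```python
# Sketch of a MAP severity quantizer (assumed scheme): values of
# 60 mmHg or above map to the normal symbol "A"; below 60 mmHg, each
# further B mmHg adds one "-" mark, so with B=10 the range 50-60 mmHg
# becomes "A-" and 40-50 mmHg becomes "A--".

def quantize_map(value_mmhg, b):
    """Map one MAP sample to a severity symbol with breadth b (mmHg)."""
    if value_mmhg >= 60:
        return "A"
    depth = int((60 - value_mmhg) // b) + 1  # number of bins below normal
    return "A" + "-" * depth

def qts(map_series, b):
    """Quantized time series (QTS) for a raw MAP series (the OTS)."""
    return [quantize_map(v, b) for v in map_series]
```

<p>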
The value of W corresponds to the time window in which all the severity symbols in the QTS are converted to a single consensus symbol. A comparison of OTS, QTS, and MTS is done using the statistical measure of binary classification, the F-score. The F-score (also called the <i>F<\/i><sub>1<\/sub>-score) is calculated as:\n<\/p><p><br \/>\n<i>F<\/i><sub>1<\/sub> = 2 \u00d7 (precision \u00d7 recall) \/ (precision + recall)\n<\/p>\n<h4><span class=\"mw-headline\" id=\"Significant_results\">Significant results<\/span><\/h4>\n<p>The F1-scores for the SVM models are summarized in Fig. 4. The OTS-based SVM model gave an F1-score of 0.76, which is the gold standard against which we compare the other models. The QTS- and MTS-based SVM models were able to perform as well as OTS in most cases. Furthermore, MTS with (B=10 and W=15) and (B=20 and W=5,10,15) performed better than OTS on the classification problem. In fact, these MTS models showed more than 12% better F1-scores compared to OTS. 
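<\/p>\n<p>For reference, the F1-score used throughout this comparison can be computed from binary predictions as follows (this is the standard definition; library implementations such as scikit-learn's f1_score give the same value):<\/p>

```python
# F1-score from binary labels and predictions, computed via precision
# and recall (tp: true positives, fp: false positives, fn: false
# negatives). Returns 0.0 when there are no true positives.

def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

<p>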
These results support the precision hypothesis that motif time series can replace original time series data for the task of identification\/classification of specific disease conditions, in this case AHE.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig4_Pathinarupothi_BMCMedInfoDecMak2018_18.png\" class=\"image wiki-link\" data-key=\"b61dfbaed52be631b039501fef50e60d\"><img alt=\"Fig4 Pathinarupothi BMCMedInfoDecMak2018 18.png\" src=\"https:\/\/www.limswiki.org\/images\/9\/9c\/Fig4_Pathinarupothi_BMCMedInfoDecMak2018_18.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 4<\/b> AHE classification F1-scores. The F1-scores of SVM models trained and tested for classifying the given 60 minutes of data as AHE or not using OTS, QTS (B=5,10,15,20), and MTS (W=5,10,15). It shows that QTS and MTS with different B and W values are able to classify the AHE signal with F1-scores that are better than the one obtained using the OTS SVM model.<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<h3><span class=\"mw-headline\" id=\"Evaluating_prevention_hypothesis\">Evaluating prevention hypothesis<\/span><\/h3>\n<p>The next evaluation of RASPRO is to identify whether <i>a priori<\/i> motif series could predict a future disease condition and thereby aid in preventive intervention. For this, the H and G group time series data, comprising mean arterial pressure (MAP) of length T minutes prior to <i>t<\/i><sub>0<\/sub>, are modeled as T-long feature vectors (OTS). 
These vectors are used for training (using 70% of the data, with five-fold cross validation) and testing (using the remaining 30%) an SVM model for predicting AHE or not, where patients belonging to the H group are annotated as having AHE and G group patients are annotated otherwise. In effect, we try to classify sensor data prior to an AHE event as a predictor of an ensuing AHE condition. Since the G group data did not have a time marker <i>t<\/i><sub>0<\/sub>, we selected a random but continuous time series of length T from each of the G group patients. SVM was selected due to its widely accepted performance in classification problems involving multiple features, although comparable results might be obtained using other classification techniques too.\n<\/p><p>The backward offset time T (from <i>t<\/i><sub>0<\/sub>) is varied as 30, 60, 90, 120, 150, and 180 minutes as an expanding window. In the next step, the raw feature vectors are quantized using the severity quantizer to form a quantized time series (QTS). Once again, the quantization breadth B is varied as 5, 10, 15, and 20. In the third step, the QTS are summarized and motifs extracted to form a RASPRO motif time series (MTS), with varying observation time window sizes W: 5, 10, and 15 minutes. The QTS and MTS are then given as input to train and test the SVM model (one for QTS and another for MTS) for predicting AHE before its onset.\n<\/p>\n<h4><span class=\"mw-headline\" id=\"Significant_results_2\">Significant results<\/span><\/h4>\n<p>From the comparative analysis of OTS and QTS (Fig. 5), we observe that QTS with B=15 has a better F1-score than OTS at all time-offsets T, although the root mean square error (RMSE) between these two series is an insignificant 0.001, pointing to the fact that OTS could be replaced with QTS. We select this QTS (B=15) and then compare it with MTS of varying time windows in Fig. 6. We observe from Fig. 6 that QTS has a higher F1-score compared to the best MTS (W=10). 
However, RMSE between QTS and MTS (W=10 and W=15) is a statistically insignificant value of 0.01, which implies that MTS using W=10 and 15 performs as well as QTS on an average across different time windows. Now, we further compare the OTS against the best performing B and W values corresponding to QTS and MTS respectively, and the results are plotted in Fig. 7. These data points are marked as QTSmax and MTSmax respectively. In Fig. 7, QTSmax and MTSmax show closely similar F1-score with the RMSE as 0.018, which could be considered statistically insignificant.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig5_Pathinarupothi_BMCMedInfoDecMak2018_18.png\" class=\"image wiki-link\" data-key=\"c85696badf57620cb23b33343752171e\"><img alt=\"Fig5 Pathinarupothi BMCMedInfoDecMak2018 18.png\" src=\"https:\/\/www.limswiki.org\/images\/2\/23\/Fig5_Pathinarupothi_BMCMedInfoDecMak2018_18.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 5<\/b> Expanding Window: OTS Vs. 
QTS Comparison of F1-score of OTS and QTS for classification of AHE using expanding time windows shows better performance of QTS with B=15.<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p><a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig6_Pathinarupothi_BMCMedInfoDecMak2018_18.png\" class=\"image wiki-link\" data-key=\"93199055d62c312c9c6ad6a5c474dec2\"><img alt=\"Fig6 Pathinarupothi BMCMedInfoDecMak2018 18.png\" src=\"https:\/\/www.limswiki.org\/images\/d\/d7\/Fig6_Pathinarupothi_BMCMedInfoDecMak2018_18.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 6<\/b> Expanding Window: QTS Vs. MTS. Comparison of F1-score of QTS (B=15) and MTS (varying W) for classification of AHE using expanding time windows.<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p><a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig7_Pathinarupothi_BMCMedInfoDecMak2018_18.png\" class=\"image wiki-link\" data-key=\"4d3264a597e820ee6441c01fd1290c59\"><img alt=\"Fig7 Pathinarupothi BMCMedInfoDecMak2018 18.png\" src=\"https:\/\/www.limswiki.org\/images\/b\/b8\/Fig7_Pathinarupothi_BMCMedInfoDecMak2018_18.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 7<\/b> Expanding Window: QTSmax Vs. MTSmax. 
Comparison of F1-score of OTS with QTSmax and MTSmax corresponding to best performing B and W values respectively for classifying AHE using expanding time windows.<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>Going further, we used data from a moving time window of 30 minutes each, instead of an expanding window. This simulates the situation when we obtain data for 30 minutes alone and are required to classify it as a predictor for AHE. Here, we do not have the luxury of having data till <i>t<\/i><sub>0<\/sub>, as the 30 minutes slice of data could be from anywhere up to three hours before <i>t<\/i><sub>0<\/sub>. We show in Fig. 8 the comparative analysis of OTS against the best B and W values corresponding to QTS and MTS in the moving window experiment. The results plotted in Fig. 8 show that MTS and QTS perform better than OTS in most of the time intervals, while the RMSE between MTS and QTS is 0.018 on an average. The results comparing QTS with different B values against MTS with different W values are given in Additional files 1\u20135 (at the end of this paper).\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig8_Pathinarupothi_BMCMedInfoDecMak2018_18.png\" class=\"image wiki-link\" data-key=\"b4998085a8348748693b0e944a1a907b\"><img alt=\"Fig8 Pathinarupothi BMCMedInfoDecMak2018 18.png\" src=\"https:\/\/www.limswiki.org\/images\/7\/79\/Fig8_Pathinarupothi_BMCMedInfoDecMak2018_18.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 8<\/b> Moving Window: QTSmax Vs. MTSmax. 
Comparison of F1-scores of OTS with QTSmax and MTSmax, corresponding to the best performing B and W values respectively, for classifying AHE using a moving window of 30 minutes duration.<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>From these results, we can conclude that quantized symbols, as well as summarized motifs, are as good as (and in many cases better than) raw time series in identifying predictors of AHE, in both expanding and moving windows, thereby supporting our prevention hypothesis.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Evaluating_personalization_hypothesis\">Evaluating personalization hypothesis<\/span><\/h3>\n<p>The third hypothesis aims to find out if there are patient-specific custom severity levels and summarization frequencies which, if optimized, could lead to better accuracy in diagnosis. For this, we further analyze our earlier results. We observe from Figs. 7 and 8 that by selecting different severity quantization breadths (B) and varying the summarization window size (W), we are able to predict the onset of AHE with a higher F1-score. This supports an argument for using disease- and time-specific B and W values to achieve better accuracy in classification problems. We observe very similar results in Fig. 4, which shows that by choosing optimized W and B values, the machine learning models can perform better in classification problems too. These results further support our third hypothesis, that there exists an opportunity for personalization, at least at the disease-specific and time-specific level. 
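<\/p>\n<p>The tuning implied by the personalization hypothesis amounts to a small grid search over B and W. The sketch below uses hypothetical build_mts and evaluate_f1 callables standing in for the QTS\/MTS construction and SVM evaluation described above:<\/p>

```python
# Grid search over quantization breadth B and summarization window W,
# keeping the pair with the best F1-score. `build_mts` and `evaluate_f1`
# are hypothetical stand-ins (our assumption) for the MTS construction
# and SVM evaluation used in the experiments.

def best_b_w(raw_series, labels, build_mts, evaluate_f1,
             b_grid=(5, 10, 15, 20), w_grid=(5, 10, 15)):
    best = None
    for b in b_grid:
        for w in w_grid:
            # Build one motif-time-series feature vector per patient.
            features = [build_mts(series, b, w) for series in raw_series]
            score = evaluate_f1(features, labels)
            if best is None or score > best[0]:
                best = (score, b, w)
    return best  # (f1, b, w) for the best-performing setting
```

<p>The same loop, run per disease (or per patient), yields the individually optimized B and W values the hypothesis calls for.<\/p>\n<p>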
Though the above experiments using AHE are only representative of how step-wise precision, personalization, and prevention can be achieved using RASPRO, practitioners broadly agree that, in wide-ranging scenarios, patient-, sensor-, disease-, and time-specific severity levels need to be defined that are both practical for managing alerts and effective in identifying emergencies.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Global_health_deployment\">Global health deployment<\/span><\/h3>\n<p>These medical benefits of the RASPRO framework would contribute directly to fulfilling the primary goals of remote health monitoring in a global health scenario. We call these benefits the \"3As\": availability, accessibility, and affordability.\n<\/p>\n<ul><li> Availability: By enabling doctors to prioritize their time based on the AMI, we effectively increase the availability of doctors for the neediest of remote patients.<\/li><\/ul>\n<ul><li> Accessibility: A patient\u2019s summarized health status represented by the consensus motifs could be sent over even bare-minimum communication networks (for instance, in the form of SMS). The clinically validated RASPRO motifs would then enable doctors to use them instead of voluminous raw sensor data for arriving at a timely diagnosis. In addition, by providing step-wise precision through detailed data-on-demand (DD-on-D), the doctors can choose to get more data if needed. Together, these techniques, as illustrated in Fig. 
9, increase the accessibility of patients to quality and critical remote healthcare services.<\/li><\/ul>\n<p><br \/>\n<\/p>\n<dl><dd><a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig9_Pathinarupothi_BMCMedInfoDecMak2018_18.png\" class=\"image wiki-link\" data-key=\"963c3dc39176645eab7a92453aeb5e27\"><img alt=\"Fig9 Pathinarupothi BMCMedInfoDecMak2018 18.png\" src=\"https:\/\/www.limswiki.org\/images\/e\/ed\/Fig9_Pathinarupothi_BMCMedInfoDecMak2018_18.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a><\/dd><\/dl>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 9<\/b> Detailed Data-on-Demand. The DD-on-D technique as implemented in the RASPRO-PAF framework enables a patient\u2019s multi-sensor data to be sent over even SMS to remote doctors, who then initiate emergency intervention through telemedicine units stationed near the patient\u2019s location.<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<ul><li> Affordability: Remote health monitoring combined with timely criticality detection can substantially reduce healthcare costs by reducing the number of unnecessary hospital visits and smartly managing the available time of doctors, who could focus on the neediest of patients. For instance, in a developing country, patients could spend anywhere between $4 and $5 travelling to the nearest hospital. Combined with the loss of their daily wages due to taking a break from their work, the cost to the patient for a hospital visit could be around $10-20 per day, not including the consultation charges (which range between $5 and $10 per visit). 
In an initial survey of patients visiting the cardiology department in our hospital, it was observed that a majority of the patients do not, at the end of examination, have a cardiac disease. These patients could well have been diagnosed as such using remote monitoring of their vital parameters, thus avoiding unnecessary hospital visits. Also, for a majority of revisiting patients, the visits could have been avoided using remote monitoring.<\/li><\/ul>\n<p>These advantages would help bring quality healthcare to millions of people who are currently under-served in the global health scenario. We are readying for large-scale deployment of the RASPRO framework, including the \"3P\" RASPRO-PAF analytical tools, using a network of more than 45 <a href=\"https:\/\/www.limswiki.org\/index.php\/Telemedicine\" title=\"Telemedicine\" class=\"wiki-link\" data-key=\"d2cc9ab69dbfb679bcee20472b08fe93\">telemedicine<\/a> nodes (as shown in Fig. 10) and remote health centers across the Indian sub-continent, which are connected to the AIMS hospital.\n<\/p><p><br \/>\n<\/p>\n<dl><dd><a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig10_Pathinarupothi_BMCMedInfoDecMak2018_18.png\" class=\"image wiki-link\" data-key=\"98a14697e61d7138a4a478b481876722\"><img alt=\"Fig10 Pathinarupothi BMCMedInfoDecMak2018 18.png\" src=\"https:\/\/www.limswiki.org\/images\/8\/8f\/Fig10_Pathinarupothi_BMCMedInfoDecMak2018_18.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a><\/dd><\/dl>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 10<\/b> Global health deployment. 
The RASPRO-PAF system is being readied for deployment using the telemedicine network of AIMS hospital, which has more than 45 remote nodes spread across India and Africa, connected through a satellite network.<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>Practitioner education is one of the key challenges in the global deployment of any new data analytics technique. To ensure usability of the system, we have involved doctors from the design and conceptual phases of the RASPRO-PAF system. In order to provide hands-on training, acceptability, and experience in the use of the data analytics techniques, we also aim to introduce these to all the practitioners as part of the annual continuing medical education (CME) program.\n<\/p>\n<h4><span class=\"mw-headline\" id=\"Challenges_and_drawbacks\">Challenges and drawbacks<\/span><\/h4>\n<p>One of the major drawbacks of any severity detection and summarization technique is the risk of missing important data. In the RASPRO technique, we try to mitigate some of these risks by providing a graded information flow from the multiple sensors to the doctors. The alerts are calculated based on patient- and disease-specific quantization and threshold levels. Hence, the chances of generating unnecessary alerts are low. On the other hand, upon receiving these alerts, doctors can further request detailed data on demand (DD-on-D), through which they can see actual sensor values, the calculated motifs, the frequency maps, as well as any other machine-learning-based assistive diagnosis. This provides the flexibility to doctors and emergency responders to obtain a complete view of the patient\u2019s condition before deciding upon any intervention. However, any such system is also fraught with the danger of system failures that could jeopardize the patient\u2019s life, though this could be overcome to a large extent by developing robust hardware and fail-safe firmware. 
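The graded information flow described above can be sketched as a simple escalation over increasingly detailed views. This is an illustrative sketch only; the level names are hypothetical and not part of the actual RASPRO-PAF interface:

```python
# Hypothetical DD-on-D detail levels, ordered from most summarized
# to raw sensor data; each repeated request by a doctor steps rightward.
DDOND_LEVELS = ["consensus_motifs", "severity_frequency_maps", "raw_sensor_data"]

def next_detail(current):
    """Return the next, more detailed view available on demand,
    or None once raw sensor data has already been served."""
    idx = DDOND_LEVELS.index(current)
    return DDOND_LEVELS[idx + 1] if idx + 1 < len(DDOND_LEVELS) else None
```

On receiving an alert, a doctor would start from the consensus motifs and escalate only if the summarized view is inconclusive, keeping bandwidth use minimal on constrained networks.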
We are also aware that a thorough cost-risk-benefit analysis needs to be carried out before any wide-scale deployment. Apart from these, in developing countries there are implementation gaps that need to be addressed, including: (a) intermittent and unreliable mobile connectivity in rural regions, (b) capture and transmission of data while the patient is mobile, (c) power management in edge devices such as mobile phones to ensure timely processing and transmission of data, (d) whether to do the RASPRO-PAF processing at the edge or in the cloud, and (e) efficient management of remote patient monitoring through educating the support staff in hospitals.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Conclusion\">Conclusion<\/span><\/h2>\n<p>In this paper, we have reported on the successful design, development, and deployment of a set of \"3P\" tools for healthcare data analytics, called RASPRO-PAFs, which transform voluminous physiological sensor data into meaningful motifs using personalized disease severity levels. These motifs have been found to be as effective as, or in many cases better than, the raw sensor data in the identification and prediction of critical conditions in patients. Through a step-wise precision process, the doctors can gain further insight into the medical condition of the patient, progressively using quantized symbols, motifs, frequency maps, and machine learning. Furthermore, the criticality of a patient is analyzed from these motifs using a novel interventional time relationship that helps doctors prioritize their time more efficiently. Together, the 3P PAFs help in personalized, precision, and preventive diagnosis of the patients. We have also clinically validated the efficacy of the system using both doctor feedback from the hospital and machine learning techniques. 
Given the initial acceptance of this tool among the medical community, we are preparing for testing and evaluation in other medical domains, as well as large-scale field deployment in a global health scenario.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Abbreviations\">Abbreviations<\/span><\/h2>\n<p><b>3P<\/b>: Precision, personalization, and prevention\n<\/p><p><b>ABP<\/b>: Arterial blood pressure\n<\/p><p><b>AHE<\/b>: Acute hypotensive episode\n<\/p><p><b>AIMS<\/b>: Amrita Institute of Medical Sciences\n<\/p><p><b>AMI<\/b>: Alert measure index\n<\/p><p><b>BP<\/b>: Blood pressure\n<\/p><p><b>CM<\/b>: Consensus motifs\n<\/p><p><b>CAM<\/b>: Consensus abnormality motif\n<\/p><p><b>CNM<\/b>: Consensus normality motif\n<\/p><p><b>DD-on-D<\/b>: Detailed data on demand\n<\/p><p><b>DL<\/b>: Deep learning\n<\/p><p><b>ECG<\/b>: Electrocardiogram\n<\/p><p><b>HIS<\/b>: Hospital information system\n<\/p><p><b>HR<\/b>: Heart rate\n<\/p><p><b>HRV<\/b>: Heart rate variability\n<\/p><p><b>ICU<\/b>: Intensive care unit\n<\/p><p><b>IoT<\/b>: Internet of Things\n<\/p><p><b>IPSO<\/b>: Improved particle swarm optimization\n<\/p><p><b>LSTM<\/b>: Long short-term memory\n<\/p><p><b>MAP<\/b>: Mean arterial pressure\n<\/p><p><b>MIMIC<\/b>: Multiparameter intelligent monitoring in intensive care\n<\/p><p><b>ML<\/b>: Machine learning\n<\/p><p><b>MTS<\/b>: Motif time series\n<\/p><p><b>NBP<\/b>: Non-obtrusive blood pressure\n<\/p><p><b>OTS<\/b>: Original time series\n<\/p><p><b>PAF<\/b>: Physician assist filter\n<\/p><p><b>PSM<\/b>: Patient specific matrix\n<\/p><p><b>QTS<\/b>: Quantized time series\n<\/p><p><b>RASPRO<\/b>: Rapid active summarization for effective PROgnosis\n<\/p><p><b>RMSE<\/b>: Root mean squared error\n<\/p><p><b>RRV<\/b>: Respiratory rate variability\n<\/p><p><b>SFM<\/b>: Severity frequency maps\n<\/p><p><b>SpO2<\/b>: Peripheral capillary oxygen saturation\n<\/p><p><b>SVM<\/b>: Support vector machine\n<\/p>\n<h2><span class=\"mw-headline\" 
id=\"Additional_files\">Additional files<\/span><\/h2>\n<p><a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/static-content.springer.com\/esm\/art%3A10.1186%2Fs12911-018-0658-y\/MediaObjects\/12911_2018_658_MOESM1_ESM.png\" data-key=\"92e2bc5db680885c952e68f4925572e4\">Additional file 1<\/a>: Moving Window OTS Vs. QTS. The figure shows the F1-score while using a moving window of size 30 mins with varying backward offset from t0. The results show that QTS is always better than OTS in classifying a given window as predictor for AHE or not. (PNG 28 kb)\n<\/p><p><a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/static-content.springer.com\/esm\/art%3A10.1186%2Fs12911-018-0658-y\/MediaObjects\/12911_2018_658_MOESM2_ESM.png\" data-key=\"a266cc34d316f759e46c186d005d8349\">Additional file 2<\/a>: Moving Window QTS (B=5) Vs. MTS. The figure shows the F1-score comparison of QTS with B=5 and MTS while using a moving window of size 30 mins with varying backward offset from t0. The results show that MTS is better than QTS except in two time slots. (PNG 21 kb)\n<\/p><p><a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/static-content.springer.com\/esm\/art%3A10.1186%2Fs12911-018-0658-y\/MediaObjects\/12911_2018_658_MOESM3_ESM.png\" data-key=\"2b66493769c4aac276a01fbd0122daad\">Additional file 3<\/a>: Moving Window QTS (B=10) Vs. MTS. The figure shows the F1-score comparison of QTS with B=10 and MTS while using a moving window of size 30 mins with varying backward offset from t0. The results show that MTS is better than QTS except in two time slots, and also W=10 and W=15 are better summarization windows. (PNG 22 kb)\n<\/p><p><a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/static-content.springer.com\/esm\/art%3A10.1186%2Fs12911-018-0658-y\/MediaObjects\/12911_2018_658_MOESM4_ESM.png\" data-key=\"d008ddc3db38da29af4c0ba173e3f9ca\">Additional file 4<\/a>: Moving Window QTS (B=15) Vs. MTS. 
The figure shows the F1-score comparison of QTS with B=15 and MTS while using a moving window of size 30 mins with varying backward offset from t0. The results show that MTS is better than QTS except in two time slots, and also W=10 and W=15 are better summarization windows. (PNG 21 kb)\n<\/p><p><a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/static-content.springer.com\/esm\/art%3A10.1186%2Fs12911-018-0658-y\/MediaObjects\/12911_2018_658_MOESM5_ESM.png\" data-key=\"b1805a07c5935baf86f4ea53f8ea2937\">Additional file 5<\/a>: Moving Window QTS (B=20) Vs. MTS. The figure shows the F1-score comparison of QTS with B=20 and MTS while using a moving window of size 30 mins with varying backward offset from t0. The results show that QTS is marginally better than MTS in four time slots. (PNG 22 kb)\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Declarations\">Declarations<\/span><\/h2>\n<h3><span class=\"mw-headline\" id=\"Acknowledgements\">Acknowledgements<\/span><\/h3>\n<p>We deeply thank the support and inspiration provided by the Chancellor of Amrita University, Mata Amritanandamayi Devi (known as \u201cAmma\u201d). This project has materialized due to her constant guidance. We also thank Dr. P Venkat Rangan, who has contributed to the idea of 3P platform as well as gave important inputs to the manuscript. 
We thank the doctors at AIMS hospital, who have helped us in the initial clinical use case analysis of the system.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Funding\">Funding<\/span><\/h3>\n<p>This work was not supported by any funding organization; hence, this section is not applicable.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Availability_of_data_and_materials\">Availability of data and materials<\/span><\/h3>\n<p>The datasets analysed during the current study are available in the MIMIC II Waveform repository: <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/physionet.org\/physiobank\/database\/mimic2db\/\" data-key=\"ddb3b6d7cb5fce79e226c5add171ce0e\">https:\/\/physionet.org\/physiobank\/database\/mimic2db\/<\/a>.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Authors.E2.80.99_contributions\">Authors\u2019 contributions<\/span><\/h3>\n<p>RKP and ESR designed the RASPRO-PAF 3P architecture. RKP analyzed and interpreted the results, and was also a major contributor in writing the manuscript. PD conducted the experiments and analysed the results. ESR interpreted and applied the algorithms on clinical cases, and was also a major contributor in writing the manuscript. 
All authors read and approved the final manuscript.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Ethics_approval_and_consent_to_participate\">Ethics approval and consent to participate<\/span><\/h3>\n<p>The data used in this study was obtained from a publicly available anonymized dataset, the MIMIC II Waveform repository, and hence did not require any independent\/separate ethics approval or consent to participate from any of the patients.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Competing_interests\">Competing interests<\/span><\/h3>\n<p>The authors declare that they have no competing interests.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"References\">References<\/span><\/h2>\n<div class=\"reflist references-column-width\" style=\"-moz-column-width: 30em; -webkit-column-width: 30em; column-width: 30em; list-style-type: decimal;\">\n<ol class=\"references\">\n<li id=\"cite_note-PathinarupothiRASPRO16-1\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-PathinarupothiRASPRO16_1-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Pathinarupothi, R.K.; Rangan, E.S.; Alangot, B. et al. (2016). \"RASPRO: Rapid summarization for effective prognosis in wireless remote health monitoring\". <i>2016 IEEE Wireless Health<\/i>: 1\u20136. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FWH.2016.7764566\" data-key=\"2604138155301401280a4eb7291d615c\">10.1109\/WH.2016.7764566<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=RASPRO%3A+Rapid+summarization+for+effective+prognosis+in+wireless+remote+health+monitoring&rft.jtitle=2016+IEEE+Wireless+Health&rft.aulast=Pathinarupothi%2C+R.K.%3B+Rangan%2C+E.S.%3B+Alangot%2C+B.+et+al.&rft.au=Pathinarupothi%2C+R.K.%3B+Rangan%2C+E.S.%3B+Alangot%2C+B.+et+al.&rft.date=2016&rft.pages=1%E2%80%936&rft_id=info:doi\/10.1109%2FWH.2016.7764566&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_to_diagnosis_in_global_health:_A_3P_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-AnlikerAMON04-2\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-AnlikerAMON04_2-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Anliker, U.; Ward, J.A.; Lukowicz, P. (2004). \"AMON: A wearable multiparameter medical monitoring and alert system\". <i>IEEE Transactions on Information Technology in Biomedicine<\/i> <b>8<\/b> (4): 415\u201327. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/15615032\" data-key=\"7d8f487e65961643920ed390d5ed305a\">15615032<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=AMON%3A+A+wearable+multiparameter+medical+monitoring+and+alert+system&rft.jtitle=IEEE+Transactions+on+Information+Technology+in+Biomedicine&rft.aulast=Anliker%2C+U.%3B+Ward%2C+J.A.%3B+Lukowicz%2C+P.&rft.au=Anliker%2C+U.%3B+Ward%2C+J.A.%3B+Lukowicz%2C+P.&rft.date=2004&rft.volume=8&rft.issue=4&rft.pages=415%E2%80%9327&rft_id=info:pmid\/15615032&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_to_diagnosis_in_global_health:_A_3P_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BaigReal14-3\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-BaigReal14_3-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Baig, M.M.; GholamHosseini, H.; Connolly, M.J. et al. (2014). \"Real-time vital signs monitoring and interpretation system for early detection of multiple physical signs in older adults\". <i>Proceeding from the IEEE-EMBS International Conference on Biomedical and Health Informatics<\/i>: 355\u20138. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FBHI.2014.6864376\" data-key=\"bcb761acbd44e283f0c4dae83953cb47\">10.1109\/BHI.2014.6864376<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Real-time+vital+signs+monitoring+and+interpretation+system+for+early+detection+of+multiple+physical+signs+in+older+adults&rft.jtitle=Proceeding+from+the+IEEE-EMBS+International+Conference+on+Biomedical+and+Health+Informatics&rft.aulast=Baig%2C+M.M.%3B+GholamHosseini%2C+H.%3B+Connolly%2C+M.J.+et+al.&rft.au=Baig%2C+M.M.%3B+GholamHosseini%2C+H.%3B+Connolly%2C+M.J.+et+al.&rft.date=2014&rft.pages=355%E2%80%938&rft_id=info:doi\/10.1109%2FBHI.2014.6864376&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_to_diagnosis_in_global_health:_A_3P_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-RajevencelthaImprov16-4\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-RajevencelthaImprov16_4-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Rajevenceltha, J.; Kumar, C.S.; Kimar, A.A. (2016). \"Improving the performance of multi-parameter patient monitors using feature mapping and decision fusion\". <i>Proceedings from the 2016 IEEE Region 10 Conference<\/i>: 1515\u20138. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FTENCON.2016.7848268\" data-key=\"35ca051f271e2e857ea92e76d99b6112\">10.1109\/TENCON.2016.7848268<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Improving+the+performance+of+multi-parameter+patient+monitors+using+feature+mapping+and+decision+fusion&rft.jtitle=Proceedings+from+the+2016+IEEE+Region+10+Conference&rft.aulast=Rajevenceltha%2C+J.%3B+Kumar%2C+C.S.%3B+Kimar%2C+A.A.&rft.au=Rajevenceltha%2C+J.%3B+Kumar%2C+C.S.%3B+Kimar%2C+A.A.&rft.date=2016&rft.pages=1515%E2%80%938&rft_id=info:doi\/10.1109%2FTENCON.2016.7848268&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_to_diagnosis_in_global_health:_A_3P_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SreejithAReal15-5\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-SreejithAReal15_5-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Sreejith, S.; Rahul, S.; Jisha, R.C. (2016). \"A Real Time Patient Monitoring System for Heart Disease Prediction Using Random Forest Algorithm\". <i>Advances in Signal Processing and Intelligent Recognition Systems<\/i> <b>425<\/b>: 485\u2013500. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1007%2F978-3-319-28658-7_41\" data-key=\"1ec34b18862656cc717923e87d823d9b\">10.1007\/978-3-319-28658-7_41<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+Real+Time+Patient+Monitoring+System+for+Heart+Disease+Prediction+Using+Random+Forest+Algorithm&rft.jtitle=Advances+in+Signal+Processing+and+Intelligent+Recognition+Systems&rft.aulast=Sreejith%2C+S.%3B+Rahul%2C+S.%3B+Jisha%2C+R.C.&rft.au=Sreejith%2C+S.%3B+Rahul%2C+S.%3B+Jisha%2C+R.C.&rft.date=2016&rft.volume=425&rft.pages=485%E2%80%93500&rft_id=info:doi\/10.1007%2F978-3-319-28658-7_41&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_to_diagnosis_in_global_health:_A_3P_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SkubicAuto15-6\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-SkubicAuto15_6-0\">6.0<\/a><\/sup> <sup><a href=\"#cite_ref-SkubicAuto15_6-1\">6.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Skubic, M.; Guevara, R.D.; Rantz, M. (2015). \"Automated Health Alerts Using In-Home Sensor Data for Embedded Health Assessment\". <i>IEEE Journal of Translational Engineering in Health and Medicine<\/i> <b>3<\/b>: 1\u201311. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FJTEHM.2015.2421499\" data-key=\"1de1ef1393bbd7c24a208fc853ef650e\">10.1109\/JTEHM.2015.2421499<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Automated+Health+Alerts+Using+In-Home+Sensor+Data+for+Embedded+Health+Assessment&rft.jtitle=IEEE+Journal+of+Translational+Engineering+in+Health+and+Medicine&rft.aulast=Skubic%2C+M.%3B+Guevara%2C+R.D.%3B+Rantz%2C+M.&rft.au=Skubic%2C+M.%3B+Guevara%2C+R.D.%3B+Rantz%2C+M.&rft.date=2015&rft.volume=3&rft.pages=1%E2%80%9311&rft_id=info:doi\/10.1109%2FJTEHM.2015.2421499&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_to_diagnosis_in_global_health:_A_3P_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-LopesTowards13-7\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-LopesTowards13_7-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Lopes. I.C.; Vaidya, B.; Rodrigues, J.J.P.C. (2013). \"Towards an autonomous fall detection and alerting system on a mobile and pervasive environment\". <i>Telecommunications Systems<\/i> <b>52<\/b> (4): 2299\u2013310. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1007%2Fs11235-011-9534-0\" data-key=\"5316d773a491af5ee6173d8b143123c1\">10.1007\/s11235-011-9534-0<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Towards+an+autonomous+fall+detection+and+alerting+system+on+a+mobile+and+pervasive+environment&rft.jtitle=Telecommunications+Systems&rft.aulast=Lopes.+I.C.%3B+Vaidya%2C+B.%3B+Rodrigues%2C+J.J.P.C.&rft.au=Lopes.+I.C.%3B+Vaidya%2C+B.%3B+Rodrigues%2C+J.J.P.C.&rft.date=2013&rft.volume=52&rft.issue=4&rft.pages=2299%E2%80%93310&rft_id=info:doi\/10.1007%2Fs11235-011-9534-0&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_to_diagnosis_in_global_health:_A_3P_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BalasubramanianDisco16-8\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-BalasubramanianDisco16_8-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Balasubramanian, A.; Wang, J.; Prabhakaran, B. (2016). \"Discovering Multidimensional Motifs in Physiological Signals for Personalized Healthcare\". <i>EEE Journal of Selected Topics in Signal Processing<\/i> <b>10<\/b> (5): 832\u201341. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FJSTSP.2016.2543679\" data-key=\"f7f96d148a2e48ff6652c96dc7fbeefb\">10.1109\/JSTSP.2016.2543679<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Discovering+Multidimensional+Motifs+in+Physiological+Signals+for+Personalized+Healthcare&rft.jtitle=EEE+Journal+of+Selected+Topics+in+Signal+Processing&rft.aulast=Balasubramanian%2C+A.%3B+Wang%2C+J.%3B+Prabhakaran%2C+B.&rft.au=Balasubramanian%2C+A.%3B+Wang%2C+J.%3B+Prabhakaran%2C+B.&rft.date=2016&rft.volume=10&rft.issue=5&rft.pages=832%E2%80%9341&rft_id=info:doi\/10.1109%2FJSTSP.2016.2543679&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_to_diagnosis_in_global_health:_A_3P_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HristoskovaOnto14-9\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-HristoskovaOnto14_9-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Hristoskova, A.; Sakkalis, V.; Zacharioudakis, G. et al. (2014). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3926628\" data-key=\"369b9952287870bc40da6cea1d9b567b\">\"Ontology-driven monitoring of patient's vital signs enabling personalized medical detection and alert\"<\/a>. <i>Sensors<\/i> <b>14<\/b> (1): 1598-628. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.3390%2Fs140101598\" data-key=\"dd4cc78b53fc7aa5105332d537874324\">10.3390\/s140101598<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3926628\/\" data-key=\"f0fd3831873bc2227b7919713530d7fc\">PMC3926628<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/24445411\" data-key=\"08344d19157e2d01c0a326cd4d402950\">24445411<\/a><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3926628\" data-key=\"369b9952287870bc40da6cea1d9b567b\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3926628<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Ontology-driven+monitoring+of+patient%27s+vital+signs+enabling+personalized+medical+detection+and+alert&rft.jtitle=Sensors&rft.aulast=Hristoskova%2C+A.%3B+Sakkalis%2C+V.%3B+Zacharioudakis%2C+G.+et+al.&rft.au=Hristoskova%2C+A.%3B+Sakkalis%2C+V.%3B+Zacharioudakis%2C+G.+et+al.&rft.date=2014&rft.volume=14&rft.issue=1&rft.pages=1598-628&rft_id=info:doi\/10.3390%2Fs140101598&rft_id=info:pmc\/PMC3926628&rft_id=info:pmid\/24445411&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3926628&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_to_diagnosis_in_global_health:_A_3P_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-Andreu-PerezBigData15-10\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-Andreu-PerezBigData15_10-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Andreu-Perez, J.; Poon, C.C.; Merrifield, R.D. et al. (2015). \"Big data for health\". <i>IEEE Journal of Biomedical and Health Informatics<\/i> <b>19<\/b> (4): 1193-208. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FJBHI.2015.2450362\" data-key=\"fc18b9fe81bab7da0cc5174fbbb392f4\">10.1109\/JBHI.2015.2450362<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/26173222\" data-key=\"3e08b96a3b713e22820f589be926e159\">26173222<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Big+data+for+health&rft.jtitle=IEEE+Journal+of+Biomedical+and+Health+Informatics&rft.aulast=Andreu-Perez%2C+J.%3B+Poon%2C+C.C.%3B+Merrifield%2C+R.D.+et+al.&rft.au=Andreu-Perez%2C+J.%3B+Poon%2C+C.C.%3B+Merrifield%2C+R.D.+et+al.&rft.date=2015&rft.volume=19&rft.issue=4&rft.pages=1193-208&rft_id=info:doi\/10.1109%2FJBHI.2015.2450362&rft_id=info:pmid\/26173222&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_to_diagnosis_in_global_health:_A_3P_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-LiuAttenu14-11\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-LiuAttenu14_11-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Liu, Q.; Yan, B.P.; Yu, C.M. et al. (2014). \"Attenuation of systolic blood pressure and pulse transit time hysteresis during exercise and recovery in cardiovascular patients\". <i>IEEE Transactions on Bio-medical engineering<\/i> <b>61<\/b> (2): 346\u201352. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FTBME.2013.2286998\" data-key=\"56768fcde8740bc65201cbedcd9b3c18\">10.1109\/TBME.2013.2286998<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/24158470\" data-key=\"3faab628a377acf49a8e802c91672ef7\">24158470<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Attenuation+of+systolic+blood+pressure+and+pulse+transit+time+hysteresis+during+exercise+and+recovery+in+cardiovascular+patients&rft.jtitle=IEEE+Transactions+on+Bio-medical+engineering&rft.aulast=Liu%2C+Q.%3B+Yan%2C+B.P.%3B+Yu%2C+C.M.+et+al.&rft.au=Liu%2C+Q.%3B+Yan%2C+B.P.%3B+Yu%2C+C.M.+et+al.&rft.date=2014&rft.volume=61&rft.issue=2&rft.pages=346%E2%80%9352&rft_id=info:doi\/10.1109%2FTBME.2013.2286998&rft_id=info:pmid\/24158470&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_to_diagnosis_in_global_health:_A_3P_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BatesBigData14-12\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-BatesBigData14_12-0\">12.0<\/a><\/sup> <sup><a href=\"#cite_ref-BatesBigData14_12-1\">12.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Bates, D.W.; Saria, S.; Ohno-Machado, L. et al. (2014). \"Big data in health care: using analytics to identify and manage high-risk and high-cost patients\". <i>Health Affairs<\/i> <b>33<\/b> (7): 1123-31. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1377%2Fhlthaff.2014.0041\" data-key=\"fb7dc706f605938054e0b7de409997a0\">10.1377\/hlthaff.2014.0041<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/25006137\" data-key=\"8607ec02cedec31e5ec52535e72dc012\">25006137<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Big+data+in+health+care%3A+using+analytics+to+identify+and+manage+high-risk+and+high-cost+patients&rft.jtitle=Health+Affairs&rft.aulast=Bates%2C+D.W.%3B+Saria%2C+S.%3B+Ohno-Machado%2C+L.+et+al.&rft.au=Bates%2C+D.W.%3B+Saria%2C+S.%3B+Ohno-Machado%2C+L.+et+al.&rft.date=2014&rft.volume=33&rft.issue=7&rft.pages=1123-31&rft_id=info:doi\/10.1377%2Fhlthaff.2014.0041&rft_id=info:pmid\/25006137&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_to_diagnosis_in_global_health:_A_3P_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SungMobile14-13\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-SungMobile14_13-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Sung, W.-T.; Chen, J.-H.; Chang, K.-W. (2014). \"Mobile Physiological Measurement Platform With Cloud and Analysis Functions Implemented via IPSO\". <i>IEEE Sensors Journal<\/i> <b>14<\/b> (1): 111\u201323. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FJSEN.2013.2280398\" data-key=\"a671f05baa01f9537d909f7cca6a2a66\">10.1109\/JSEN.2013.2280398<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Mobile+Physiological+Measurement+Platform+With+Cloud+and+Analysis+Functions+Implemented+via+IPSO&rft.jtitle=IEEE+Sensors+Journal&rft.aulast=Sung%2C+W.-T.%3B+Chen%2C+J.-H.%3B+Chang%2C+K.-W.&rft.au=Sung%2C+W.-T.%3B+Chen%2C+J.-H.%3B+Chang%2C+K.-W.&rft.date=2014&rft.volume=14&rft.issue=1&rft.pages=111%E2%80%9323&rft_id=info:doi\/10.1109%2FJSEN.2013.2280398&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_to_diagnosis_in_global_health:_A_3P_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-CellerHome15-14\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-CellerHome15_14-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Celler, B.G.; Sparks, R.S. (2015). \"Home telemonitoring of vital signs--technical challenges and future directions\". <i>IEEE Journal of Biomedical and Health Informatics<\/i> <b>19<\/b> (1): 82\u201391. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FJBHI.2014.2351413\" data-key=\"9c118ea57294d0ab62181da7ba49f791\">10.1109\/JBHI.2014.2351413<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/25163076\" data-key=\"1c125ad9d9774ca9a5d62f590d3a48b9\">25163076<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Home+telemonitoring+of+vital+signs--technical+challenges+and+future+directions&rft.jtitle=IEEE+Journal+of+Biomedical+and+Health+Informatics&rft.aulast=Celler%2C+B.G.%3B+Sparks%2C+R.S.&rft.au=Celler%2C+B.G.%3B+Sparks%2C+R.S.&rft.date=2015&rft.volume=19&rft.issue=1&rft.pages=82%E2%80%9391&rft_id=info:doi\/10.1109%2FJBHI.2014.2351413&rft_id=info:pmid\/25163076&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_to_diagnosis_in_global_health:_A_3P_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-GoldbergerPhysio00-15\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-GoldbergerPhysio00_15-0\">15.0<\/a><\/sup> <sup><a href=\"#cite_ref-GoldbergerPhysio00_15-1\">15.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Goldberger, A.L.; Amaral, L.A.; Glass, L. (2000). \"PhysioBank, PhysioToolkit, and PhysioNet\". <i>Circulation<\/i> <b>101<\/b> (23): e215\u2013e220. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1161%2F01.CIR.101.23.e215\" data-key=\"e3a3da4754ff5c8c72a6714e5df5e8a4\">10.1161\/01.CIR.101.23.e215<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=PhysioBank%2C+PhysioToolkit%2C+and+PhysioNet&rft.jtitle=Circulation&rft.aulast=Goldberger%2C+A.L.%3B+Amaral%2C+L.A.%3B+Glass%2C+L.&rft.au=Goldberger%2C+A.L.%3B+Amaral%2C+L.A.%3B+Glass%2C+L.&rft.date=2000&rft.volume=101&rft.issue=23&rft.pages=e215%E2%80%93e220&rft_id=info:doi\/10.1161%2F01.CIR.101.23.e215&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_to_diagnosis_in_global_health:_A_3P_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-PathinarupothInstant17-16\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-PathinarupothInstant17_16-0\">16.0<\/a><\/sup> <sup><a href=\"#cite_ref-PathinarupothInstant17_16-1\">16.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Pathinarupothi, R.K.; Vinaykumar, R.; Rangan, E. et al. (2017). \"Instantaneous heart rate as a robust feature for sleep apnea severity detection using deep learning\". <i>Proceedings from the 2017 IEEE EMBS International Conference on Biomedical & Health Informatics<\/i>: 293\u20136. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FBHI.2017.7897263\" data-key=\"ecf86dd17a088c4367cb4b3c9d4ca0e0\">10.1109\/BHI.2017.7897263<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Instantaneous+heart+rate+as+a+robust+feature+for+sleep+apnea+severity+detection+using+deep+learning&rft.jtitle=Proceedings+from+the+2017+IEEE+EMBS+International+Conference+on+Biomedical+%26+Health+Informatics&rft.aulast=Pathinarupothi%2C+R.K.%3B+Vinaykumar%2C+R.%3B+Rangan%2C+E.+et+al.&rft.au=Pathinarupothi%2C+R.K.%3B+Vinaykumar%2C+R.%3B+Rangan%2C+E.+et+al.&rft.date=2017&rft.pages=293%E2%80%936&rft_id=info:doi\/10.1109%2FBHI.2017.7897263&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_to_diagnosis_in_global_health:_A_3P_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ArunanAReal16-17\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ArunanAReal16_17-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Arunan, A.; Pathinarupothi, R.K.; Ramesh, M.V. (2016). \"A real-time detection and warning of cardiovascular disease LAHB for a wearable wireless ECG device\". <i>Proceedings from the 2016 IEEE-EMBS International Conference on Biomedical and Health Informatics<\/i>: 98\u2013101. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FBHI.2016.7455844\" data-key=\"8a0c99ce0a2b680ab19a32c948b720d0\">10.1109\/BHI.2016.7455844<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+real-time+detection+and+warning+of+cardiovascular+disease+LAHB+for+a+wearable+wireless+ECG+device&rft.jtitle=Proceedings+from+the+2016+IEEE-EMBS+International+Conference+on+Biomedical+and+Health+Informatics&rft.aulast=Arunan%2C+A.%3B+Pathinarupothi%2C+R.K.%3B+Ramesh%2C+M.V.&rft.au=Arunan%2C+A.%3B+Pathinarupothi%2C+R.K.%3B+Ramesh%2C+M.V.&rft.date=2016&rft.pages=98%E2%80%93101&rft_id=info:doi\/10.1109%2FBHI.2016.7455844&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_to_diagnosis_in_global_health:_A_3P_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-PenzelTheAp00-18\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-PenzelTheAp00_18-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Penzel, T.; Moody, G.B.; Mark, R.G. et al. (2000). \"The Apnea-ECG Database\". <i>Proceedings from Computers in Cardiology 2000<\/i> <b>27<\/b>: 255\u201358. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FCIC.2000.898505\" data-key=\"10f83630e7ac0bf6333ce4a39c92f423\">10.1109\/CIC.2000.898505<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=The+Apnea-ECG+Database&rft.jtitle=Proceedings+from+Computers+in+Cardiology+2000&rft.aulast=Penzel%2C+T.%3B+Moody%2C+G.B.%3B+Mark%2C+R.G.+et+al.&rft.au=Penzel%2C+T.%3B+Moody%2C+G.B.%3B+Mark%2C+R.G.+et+al.&rft.date=2000&rft.volume=27&rft.pages=255%E2%80%9358&rft_id=info:doi\/10.1109%2FCIC.2000.898505&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_to_diagnosis_in_global_health:_A_3P_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SaeedMulti11-19\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-SaeedMulti11_19-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Saeed, M.; Villarroel, M.; Reisner, A.T. et al. (2011). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3124312\" data-key=\"4d03df790e83dc7d25550ea54252b8ce\">\"Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II): A public-access intensive care unit database\"<\/a>. <i>Critical Care Medicine<\/i> <b>39<\/b> (5): 952\u201360. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1097%2FCCM.0b013e31820a92c6\" data-key=\"08b51f66760f0029b1babdbbafd3ef7c\">10.1097\/CCM.0b013e31820a92c6<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3124312\/\" data-key=\"b206f3c197baac91afbcb7aa7b07c21b\">PMC3124312<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/21283005\" data-key=\"39316f5e5278ef3b4669aee757360ec9\">21283005<\/a><span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3124312\" data-key=\"4d03df790e83dc7d25550ea54252b8ce\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3124312<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Multiparameter+Intelligent+Monitoring+in+Intensive+Care+II+%28MIMIC-II%29%3A+A+public-access+intensive+care+unit+database&rft.jtitle=Critical+Care+Medicine&rft.aulast=Saeed%2C+M.%3B+Villarroel%2C+M.%3B+Reisner%2C+A.T.+et+al.&rft.au=Saeed%2C+M.%3B+Villarroel%2C+M.%3B+Reisner%2C+A.T.+et+al.&rft.date=2011&rft.volume=39&rft.issue=5&rft.pages=952%E2%80%9360&rft_id=info:doi\/10.1097%2FCCM.0b013e31820a92c6&rft_id=info:pmc\/PMC3124312&rft_id=info:pmid\/21283005&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3124312&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_to_diagnosis_in_global_health:_A_3P_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MoodyPredict09-20\"><span class=\"mw-cite-backlink\"><a 
href=\"#cite_ref-MoodyPredict09_20-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Moody, G.B.; Lehman, L.H. (2009). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2937253\" data-key=\"41d8f15af096c47a477c6f1c0e593e53\">\"Predicting Acute Hypotensive Episodes: The 10th Annual PhysioNet\/Computers in Cardiology Challenge\"<\/a>. <i>Computers in Cardiology<\/i> <b>36<\/b> (5445351): 541\u2013544. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC2937253\/\" data-key=\"4ff07c2f72cbe50b028aa1d63f1a3dea\">PMC2937253<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/20842209\" data-key=\"d59d768ec504ef4909a8e9fb3987ff39\">20842209<\/a><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2937253\" data-key=\"41d8f15af096c47a477c6f1c0e593e53\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2937253<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Predicting+Acute+Hypotensive+Episodes%3A+The+10th+Annual+PhysioNet%2FComputers+in+Cardiology+Challenge&rft.jtitle=Computers+in+Cardiology&rft.aulast=Moody%2C+G.B.%3B+Lehman%2C+L.H.&rft.au=Moody%2C+G.B.%3B+Lehman%2C+L.H.&rft.date=2009&rft.volume=36&rft.issue=5445351&rft.pages=541%E2%80%93544&rft_id=info:pmc\/PMC2937253&rft_id=info:pmid\/20842209&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC2937253&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_to_diagnosis_in_global_health:_A_3P_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<\/ol><\/div>\n<h2><span class=\"mw-headline\" id=\"Notes\">Notes<\/span><\/h2>\n<p>This presentation is faithful to the original, with only a few minor changes to presentation. Grammar and punctuation was edited to American English, and in some cases additional context was added to text when necessary. 
In some cases important information was missing from the references, and that information was added.\n<\/p>\n<\/div><div class=\"printfooter\">Source: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:Data_to_diagnosis_in_global_health:_A_3P_approach\">https:\/\/www.limswiki.org\/index.php\/Journal:Data_to_diagnosis_in_global_health:_A_3P_approach<\/a><\/div>\n\t\t\t\t\t\t\t\t\t\t<!-- end content -->\n\t\t\t\t\t\t\t\t\t\t<div class=\"visualClear\"><\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t\t<!-- end of the left (by default at least) column -->\n\t\t<div 
class=\"visualClear\"><\/div>\n\t\t\t\t\t\n\t\t<\/div>\n\t\t\n\n<\/body>","8d21eded7dba3fec86203cded8451b7e_images":["https:\/\/www.limswiki.org\/images\/8\/84\/Fig1_Pathinarupothi_BMCMedInfoDecMak2018_18.png","https:\/\/www.limswiki.org\/images\/1\/10\/Fig2_Pathinarupothi_BMCMedInfoDecMak2018_18.png","https:\/\/www.limswiki.org\/images\/d\/dd\/Fig3_Pathinarupothi_BMCMedInfoDecMak2018_18.png","https:\/\/www.limswiki.org\/images\/9\/9c\/Fig4_Pathinarupothi_BMCMedInfoDecMak2018_18.png","https:\/\/www.limswiki.org\/images\/2\/23\/Fig5_Pathinarupothi_BMCMedInfoDecMak2018_18.png","https:\/\/www.limswiki.org\/images\/d\/d7\/Fig6_Pathinarupothi_BMCMedInfoDecMak2018_18.png","https:\/\/www.limswiki.org\/images\/b\/b8\/Fig7_Pathinarupothi_BMCMedInfoDecMak2018_18.png","https:\/\/www.limswiki.org\/images\/7\/79\/Fig8_Pathinarupothi_BMCMedInfoDecMak2018_18.png","https:\/\/www.limswiki.org\/images\/e\/ed\/Fig9_Pathinarupothi_BMCMedInfoDecMak2018_18.png","https:\/\/www.limswiki.org\/images\/8\/8f\/Fig10_Pathinarupothi_BMCMedInfoDecMak2018_18.png"],"8d21eded7dba3fec86203cded8451b7e_timestamp":1554145007,"625b72cffd2a8d803eb5cb58c6ef954e_type":"article","625b72cffd2a8d803eb5cb58c6ef954e_title":"Development of an electronic information system for the management of laboratory data of tuberculosis and atypical mycobacteria at the Pasteur Institute in C\u00f4te d\u2019Ivoire (Kon\u00e9 et al. 
2019)","625b72cffd2a8d803eb5cb58c6ef954e_url":"https:\/\/www.limswiki.org\/index.php\/Journal:Development_of_an_electronic_information_system_for_the_management_of_laboratory_data_of_tuberculosis_and_atypical_mycobacteria_at_the_Pasteur_Institute_in_C%C3%B4te_d%E2%80%99Ivoire","625b72cffd2a8d803eb5cb58c6ef954e_plaintext":"\n\n\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\n\t\t\t\tJournal:Development of an electronic information system for the management of laboratory data of tuberculosis and atypical mycobacteria at the Pasteur Institute in C\u00f4te d\u2019Ivoire\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\tFrom LIMSWiki\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\tJump to: navigation, search\n\n\t\t\t\t\t\n\t\t\t\t\tFull article title\n \nDevelopment of an electronic information system for the management of laboratory data of tuberculosis and atypical mycobacteria at the Pasteur Institute in C\u00f4te d\u2019IvoireJournal\n \nJournal of Health Management and InformaticsAuthor(s)\n \nKon\u00e9, Constant J.; Tour\u00e9, Assata; N\u2019Dri, Mathias K.; Nguessan, Raymond; Soumahoro, Man-KoumbaAuthor affiliation(s)\n \nPasteur Institute of C\u00f4te d\u2019IvoirePrimary contact\n \nEmail: koneconstant at pasteur dot ciYear published\n \n2019Volume and issue\n \n6(1)Page(s)\n \n1\u20136DOI\n \nNoneISSN\n \n2423-5857Distribution license\n \nCreative Commons Attribution 3.0 UnportedWebsite\n \nhttp:\/\/jhmi.sums.ac.ir\/index.php\/JHMI\/article\/view\/513\/160Download\n \nhttp:\/\/jhmi.sums.ac.ir\/index.php\/JHMI\/article\/download\/513\/160 (PDF)\n\nContents\n\n1 Abstract \n2 Introduction \n3 Methods \n\n3.1 Design \n3.2 Architecture and system features \n3.3 On-site system installation and user training \n3.4 Qualitative assessment \n\n\n4 Results \n\n4.1 Results of the system launch \n4.2 Perception and usability of the system \n\n\n5 Discussion \n6 Conclusions \n7 Additional material \n8 Acknowledgements \n\n8.1 Funding \n8.2 Contributions of the authors \n8.3 Conflict of interest \n\n\n9 
References \n10 Notes \n\n\n\nAbstract \nIntroduction: Tuberculosis remains a public health problem despite all the efforts made to eradicate it. To strengthen the surveillance system for this condition, it is necessary to have a good data management system. Indeed, the use of electronic information systems in data management can improve the quality of data. The objective of this project was to set up a laboratory-specific electronic information system for tuberculosis and atypical mycobacteria.\nMethods: The design of this laboratory information system required a general understanding of the workflow and the implementation processes in order to generate a realistic model. For the implementation of the system, Java technology was used to develop a web application compatible with the intranet of the company. The impact and the acceptability of the use of the system on the running of the laboratory were evaluated using the Likert scale.\nResults: The system in place has been in operation for about 12 months, in conjunction with the paper registers. Since then, 4,811 requests for examinations concerning 6,083 samples have been registered. The results of analysis of 3,892 patients were printed from the laboratory information system. In order to produce tuberculosis drug resistance reports and laboratory performance reports, dashboards have been developed.\nConclusion: The system has been adopted by the staff because of the time and efficiency gained in managing laboratory data. However, an optimized tool will only be obtained through a cycle of sustained improvement.\nKeywords: clinical laboratory information systems, public health, tuberculosis\n\nIntroduction \nTuberculosis remains a public health problem despite all efforts to eradicate it. 
According to the World Health Organization (WHO), the number of tuberculosis cases in the world was estimated at 9.6 million in 2014.[1] To strengthen the surveillance system for this disease, we need to have a good data management system. \nThe use of electronic information systems can improve the quality of clinical data[2] and thus the management of patients. Studies have shown that the implementation of computer tools is a performance factor for laboratory activities.[3] The results of laboratory analyses provide important data for clinical decision-making and treatment. Effective management of this data is essential for strengthening the health care system as a whole. \nLaboratory information systems have been developed and used for about half a century.[3][4] Like all laboratory information systems, the microbiology laboratory information system must be secure, user-friendly, and able to interact with other information systems. However, microbiology has several unique features not found in other clinical laboratories, including tracking multiple drifts, laboratory electronic notes, reporting results, and taking the preliminary and final results into account.[3]\nA strategic plan for the development of e-health has been defined and is being implemented.[5] All health facilities have to implement an electronic information system for better management of the data generated by their activities. \nThe purpose of this project was to set up a laboratory information system (LIS) at the Pasteur Institute of C\u00f4te d\u2019Ivoire to collect specific data for Mycobacterium tuberculosis. This LIS will allow the recording of the data resulting from the activities of the laboratory. It should also allow for the generation of results that are ultimately given to the patient, as well as the generation of reports on the activities of the laboratory. Finally, an analysis of the impact of this new system on the laboratory routine was carried out. 
\n\nMethods \nDesign \nDesigning a laboratory information system requires an understanding of the workflow and the processes implemented. The data collection and recording sheet was the starting point for this work as the first objective, which was to build a system capable of recording the data and results from the laboratory\u2019s activities. This data sheet is a key element in the activities of the laboratory since all patient information such as socio-demographic data, information on the samples, the preliminary and final results, and the notes of the various examinations are notified. This sheet was previously reorganized by the epidemiology unit. Then, from this new form and a series of interviews with biologists, some improvements were made. It was noted that in the paper recording system used before, the biological monitoring of a patient throughout his\/her treatment was not possible. This feature was thus integrated into the new system. After analysis of all data, an application with conceptual, logic, and physical characteristics was designed using the Merise method in the open source tool MySQL Workbench.[6][7]\n\nArchitecture and system features \nWe opted for a web-based application used in the company\u2019s intranet. It should be noted that the project was entirely based on Java technology. The design of the application was based on the Apache Struts 2.0 framework[8], which defined a controller-view-template three-layer design. The database connection layer uses Hibernate[9], which is an object-relational mapping (ORM) open-source solution that facilitates the development of the persistence layer; this database connection layer is also based on Java Database Connectivity (JDBC) for some modules of the application. 
The production of PDF reports was possible by the use of the JasperReports Library.[10] The reports in the Excel format required the libraries of the Apache POI project.[11] The Java Enterprise Edition of Glassfish 4.1[12] deployment server in its free version was used to host the application, and the MySQL 5.1 database management system (DBMS) was used to manage the database. The MySQL DBMS and Glassfish were installed on an HP ProLiant server running on a UNIX CentOS 6.7 OS. All the client machines were PC-type computers. All web browsers could be used as web clients for the application, though with a preference for Google Chrome; browsers such as Internet Explorer, Safari, and Firefox do not fully support the \u201cdate\u201d and \u201ctime\u201d input types.[13] The data stored in the database benefited from automatic backups to an external hard drive, and replication of this external hard drive was done at the end of the week. The source code of the application as well as the different steps of the implementation was documented to allow future maintenance. The user interface was modeled to be fairly similar to the original paper form. System access management was also taken into account in the implementation. The system modules are accessible according to the rights assigned by the administrator. \n\nOn-site system installation and user training \nThe server housing the executable files of the information system and the database was installed in a room which was secure and protected from the weather. A high capacity UPS was connected to this server to prevent the system from shutting down in the event of a power outage. The client computers comprised the ordinary service computers. The application was available on the corporate intranet from a web browser. \nTraining of six members of the laboratory staff was conducted in half a day. The assimilation took place quite quickly because the members of the laboratory all had good notions of the computer tool. 
Users often sought consultations when they had forgotten certain aspects of the system. After each update, a short information session was organized to share the new features, after which use of the software continued without interruption. Biologists' use of the LIS often resulted in the discovery of bugs, which were later corrected; users also offered suggestions and comments to improve the information system.\n\nQualitative assessment \nThe management of patients with tuberculosis involves regular biological tests. Monitoring the analyses over time for a patient becomes problematic, as does searching for information on a sample, completing activity reports, and delivering results. The information system put in place should be able to improve the realization of these tasks on a daily basis. \nAfter 12 months of usage, we evaluated the acceptability of this information system and the impact of its implementation on the functioning of the laboratory, as judged by different users of the system, namely a data manager, two technicians, two engineers, and a biologist who had to use the software continuously. This evaluation was based on the Likert scale.[14] To do this, we administered a questionnaire to the selected staff members. The interviewees were to choose one response from the five possible responses given for each of the questions: Strongly Agree (5), Agree (4), Neither Agree nor Disagree (3), Disagree (2), and Strongly Disagree (1). Summary statistics were generated from the responses obtained. \n\nResults \nResults of the system launch \nThe system in place has been in use since January 1, 2017. By the end of the year, 3,892 patients were registered in the LIS, with 4,811 visits and 6,083 collected samples recorded. Of the registered patients, 805 made at least two visits to the laboratory. 
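The five-point coding used in the qualitative assessment above (Strongly Agree = 5 through Strongly Disagree = 1) lends itself to simple summary statistics such as a per-question mean score. The following is a minimal, illustrative Java sketch (Java being the technology the LIS itself was built with); the `LikertSummary` class, its method name, and the sample responses are our assumptions for illustration, not the authors' code:

```java
import java.util.List;

// Illustrative sketch (not the authors' code): summary statistics for a
// five-point Likert evaluation, with responses coded
// Strongly Disagree = 1 ... Strongly Agree = 5.
public class LikertSummary {

    // Mean of the coded responses for one question; 0.0 if no responses.
    public static double meanScore(List<Integer> responses) {
        return responses.stream()
                .mapToInt(Integer::intValue)
                .average()
                .orElse(0.0);
    }

    public static void main(String[] args) {
        // Six hypothetical respondents: five "Strongly Agree" (5) and one "Disagree" (2).
        List<Integer> saveTime = List.of(5, 5, 5, 5, 5, 2);
        System.out.println(LikertSummary.meanScore(saveTime)); // mean of the six codes
    }
}
```

A mean near 5 indicates broad agreement with the statement; aggregating such scores per question is one common way to summarize Likert responses alongside the raw counts.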
In this information system, 93.9% (185\/197) of the entry fields on the patient and visit cards are drop-down lists whose values are pre-recorded in the administration module. These mechanisms were put in place to avoid the errors inherent in filling free-text fields. The LIS allowed the creation of patient folders as well as visit cards related to examination requests. These visit sheets were linked to the same record for a patient, which helps to monitor his\/her condition (Figures 1, 2, 3). \n\n Figure 1. Page for creating a new patient record\n\n Figure 2. Patient card\n\n Figure 3. Visit card\n\nThe printing of the results intended for the patients requires preliminary validation by the biologist responsible for test results; otherwise, the print request is not executed. Six types of laboratory-delivered results could be printed: microscopy, Genexpert, solid and\/or fluid culture, genotypic test, standard antibiogram, and extended antibiogram. In the interest of the laboratory\u2019s quality assurance, a second printing of the same result carries a duplicate mark. During this period, 6,075 printed results were obtained, comprising 3,415 microscopy examinations; 1,591 Genexpert and LPA tests; 35 culture samples; 90 classical antibiograms; and five expanded antibiograms. \nA dashboard system allows the calculation of indicators for monitoring laboratory activities and also provides indicators on the state of patients\u2019 health. These dashboards generate indicators for activities, including microscopy, Genexpert, culture in liquid and solid media, LPA, and antibiotic sensitivity (antibiogram), for a user-defined time interval. The system can also generate an Excel spreadsheet listing the samples with associated information such as sample number, methods, and test results (Figures 4, 5). 
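As a rough illustration of how such an activity indicator can be computed, the sketch below counts samples per test method over a chosen date interval. This is a minimal sketch, not the actual LIS code: the `Sample` record and its field names are assumptions for illustration, and the JasperReports/Apache POI report rendering the article describes is omitted.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch of a dashboard indicator: count samples per test
// method within a user-supplied reporting interval. The Sample record
// and its fields are hypothetical, not the real LIS schema.
public class DashboardSketch {
    public record Sample(String number, String method,
                         LocalDate received, String result) {}

    // Count samples per method for the period [from, to], inclusive.
    public static Map<String, Long> indicatorsByMethod(List<Sample> samples,
                                                       LocalDate from,
                                                       LocalDate to) {
        return samples.stream()
                .filter(s -> !s.received().isBefore(from)
                          && !s.received().isAfter(to))
                .collect(Collectors.groupingBy(Sample::method,
                                               Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Sample> samples = List.of(
            new Sample("S-001", "Microscopy", LocalDate.of(2017, 3, 1), "Positive"),
            new Sample("S-002", "Genexpert",  LocalDate.of(2017, 3, 2), "MTB detected"),
            new Sample("S-003", "Microscopy", LocalDate.of(2017, 6, 9), "Negative"));
        System.out.println(indicatorsByMethod(samples,
            LocalDate.of(2017, 1, 1), LocalDate.of(2017, 3, 31)));
    }
}
```

The same per-method counts would then feed the dashboard tables and the Excel export, with one aggregation per activity (microscopy, Genexpert, culture, LPA, antibiogram).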
\n\n Figure 4. Dashboard of microscopy\n\n Figure 5. Dashboard of LPA\n\nPerception and usability of the system \nThe questionnaire was administered to the laboratory staff. Table 1 details the results obtained for the five questions. Regarding the assertions \u201cthe user feels comfortable with the LIS,\u201d \u201cthe LIS allows secure and quick access to information,\u201d \u201cthe LIS improves the quality of the work in the laboratory,\u201d and \u201cthe user would consider switching to exclusive electronic management of the laboratory data,\u201d 100% (6\/6) responded favorably. One in six found that the LIS did not save time in performing data management tasks.\n\nTable 1. Results of the evaluation of the acceptability and impact of the implementation of the information system on the functioning of the laboratory\n\nStatement | Strongly Agree | Agree | Neither Agree nor Disagree | Disagree | Strongly Disagree\nI am completely comfortable with LIS | 5 (25) | 1 (4) | 0 | 0 | 0\nLIS saves time in completing tasks | 5 (25) | 0 | 0 | 1 (2) | 0\nThe LIS allows access to information quickly and securely | 6 (30) | 0 | 0 | 0 | 0\nLIS improves the overall quality of work | 5 (25) | 1 (4) | 0 | 0 | 0\nI plan to move to the exclusive use of the LIS for laboratory data management | 5 (25) | 1 (4) | 0 | 0 | 0\n\nDiscussion \nWe have developed an electronic laboratory information system that takes into account the specifics of a microbiology laboratory focused on mycobacteria. This LIS, based on the existing paper collection form, has been implemented to manage the data resulting from the activities of this laboratory, except for the management of laboratory inputs. \nThe workflow begins with the arrival of the sample in the laboratory, accompanied by the examination request form coupled to billing. 
The invoicing forms attest that the office of entries has approved the laboratory\u2019s processing of the request. From the information on the examination request, a search is made on the last name and first name of the patient in the LIS, using the search field of the \u201cCreate new patient record\u201d page. If a name is ambiguous (e.g., several patients share the same first and last names), other parameters such as date of birth, date of registration, telephone number, or city of residence can be used to discriminate between the patients and find the one that corresponds to the examination form. If there is no match, a new folder is created, and the identification number used for this electronic file is generated manually from the laboratory register. This step of the process is often a bottleneck because people can have the same first and last names, and the search can then become lengthy. A unique identification number generated by the laboratory\u2019s office of entries would solve this problem, but it would require re-engineering the procedures of that office. \nAfter sample processing, the corresponding patient record previously created is retrieved by sample number or name and then opened. The visit form is created, and all the results obtained are recorded. Thus, for the same patient, the LIS makes it possible to follow the evolution of his\/her biological parameters across the visit forms. The LIS has saved time in the process of generating results; previously, patients\u2019 results were entered manually on a pre-printed form made in Microsoft Word. That process required at least five minutes per result; the generation time is less than a minute with the LIS.[4]\nTo this day, all the results of LMTA come from the LIS, which helps improve the completeness of the recorded data. 
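The name search and tie-breaking workflow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the LIS code: the `Patient` record, its fields, and the matching rules are hypothetical.

```java
import java.time.LocalDate;
import java.util.List;

// Minimal sketch of the patient lookup described in the text: search the
// registry by last and first name, then narrow an ambiguous match with a
// secondary attribute (here, date of birth). Field names are assumptions.
public class PatientLookupSketch {
    public record Patient(String id, String lastName, String firstName,
                          LocalDate birthDate, String phone) {}

    // All registered patients whose names match the examination form.
    public static List<Patient> findByName(List<Patient> registry,
                                           String last, String first) {
        return registry.stream()
                .filter(p -> p.lastName().equalsIgnoreCase(last)
                          && p.firstName().equalsIgnoreCase(first))
                .toList();
    }

    // Discriminate between identically named patients by date of birth.
    public static List<Patient> narrowByBirthDate(List<Patient> candidates,
                                                  LocalDate birthDate) {
        return candidates.stream()
                .filter(p -> birthDate.equals(p.birthDate()))
                .toList();
    }
}
```

If `narrowByBirthDate` still returns more than one candidate, further fields (registration date, telephone number, city of residence) would be applied the same way; an empty result triggers the creation of a new patient folder.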
Pre-recorded drop-down lists, formatted fields such as the date field, and controls on free fields such as the sample number field reduce the recording of erroneous data and consequently improve data quality. However, some user errors remain beyond the system\u2019s control, such as a registration error in a patient\u2019s name or age. \nDashboards play an important role in monitoring laboratory activities and checking the evolution of the population\u2019s health status. They also help monitor the laboratory\u2019s performance indicators and the drug resistance shown by MDR-TB and XDR-TB cases. \nOverall, the LIS was very well accepted, and all the staff of the laboratory found it comfortable to use, as it is now routinely employed in the laboratory\u2019s data management activities. The figures from our survey are in line with those of a study carried out in South Africa, which showed a positive perception of computer systems in the management of health structures.[15] A lab technician disagreed with the statement that the system saves time in performing data management tasks. After investigation, some bugs were found in the generation of the smear distribution table, which depends on the result, the type of sample, the origin of the biological product, and the profile of the sample; these bugs motivated that response. This particular malfunction has been corrected. Unlike the study conducted by Barbara Castelnuovo et al.[2], which showed users reluctant to adopt electronic information systems, our study demonstrated the laboratory staff\u2019s total approval of upgrading to the LIS, as shown by the results of the user evaluation. \n\nConclusions \nThe introduction of the information system for the microbiology laboratory has necessitated the understanding and modeling of the various processes used, from the arrival of the examination requests to the printing of the results. 
The system, currently in use, has been readily adopted by the laboratory staff, who see it as a tool that facilitates their work. For laboratory technicians, the LIS reduces the workload by automating the generation of results and reports. For the biologists in charge of the laboratory, the LIS\u2019s dashboards provide real-time indicators on sample follow-up, the activity carried out in the laboratory, and the state of resistance to antituberculosis treatments. The LIS has a positive impact on laboratory activities, but a perfect tool can only be obtained through a cycle of sustained improvement. \n\nAdditional material \n\nAppendix A. Questionnaire Instrument (Likert Scale Structured Questions)\r\nPlease answer the following questions by indicating which answer most accurately represents the extent to which you agree or disagree with the statement on the left. There can only be one answer per statement.\n\nResponses: Strongly Agree | Agree | Neither Agree nor Disagree | Disagree | Strongly Disagree\n\n1. I am completely comfortable with LIS\n2. LIS saves time in completing tasks\n3. The LIS allows access to information quickly and securely\n4. LIS improves the overall quality of work\n5. I plan to move to the exclusive use of the LIS for laboratory data management\n\nAcknowledgements \nThe authors would like to thank all the members of the Epidemiology and Laboratory Unit for Tuberculous and Atypical Mycobacteria. Special thanks to Anatole Mian for the English translation. \n\nFunding \nThis work did not receive any internal funding for its design. This is a project of the Epidemiology Unit to improve the data management of the National Reference Center for Tuberculosis in C\u00f4te d\u2019Ivoire. 
No external funding was received for this project either.\n\nContributions of the authors \nCJK performed the modeling and design, coded the application, and wrote the manuscript; TA conceived and modeled the fact sheet; MKD read and validated the manuscript; NR conceived and validated the graphical interfaces and dashboards; SMK conceived and modeled the epidemiological record and read and validated the manuscript. All authors read and approved the final manuscript.\n\nConflict of interest \nNone declared.\n\nReferences \n\n\n\u2191 World Health Organization (2016). Global Tuberculosis Report 2016. World Health Organization. pp. 214. ISBN 9789241565394. http:\/\/apps.who.int\/medicinedocs\/en\/d\/Js23098en\/ . Retrieved 04 October 2017 .   \n\n\u2191 2.0 2.1 Castelnuovo, B.; Kiragga, A.; Afayo, V. et al. (2012). \"Implementation of provider-based electronic medical records and improvement of the quality of data in a large HIV program in Sub-Saharan Africa\". PLoS One 7 (12): e51631. doi:10.1371\/journal.pone.0051631. PMC PMC3524185. PMID 23284728. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3524185 .   \n\n\u2191 3.0 3.1 3.2 Rhoads, D.D.; Sintchenko, V.; Rauch, C.A. et al. (2014). \"Clinical microbiology informatics\". Clinical Microbiology Reviews 27 (4): 1025-47. doi:10.1128\/CMR.00049-14. PMC PMC4187636. PMID 25278581. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4187636 .   \n\n\u2191 4.0 4.1 El-Kareh, R.; Roy, C.; Williams, D.H. et al. (2012). \"Impact of automated alerts on follow-up of post-discharge microbiology results: a cluster randomized controlled trial\". Journal of General Internal Medicine 27 (10): 1243-50. doi:10.1007\/s11606-012-1986-8. PMC PMC3445692. PMID 22278302. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3445692 .   \n\n\u2191 Ordre National Des Medecins de C\u00f4te d\u2019Ivoire (2017). 
\"Plan Strategique de Cybersante\". http:\/\/doczz.fr\/doc\/3118472\/pdncs-ci---order-national-des-des-decins-de-ivoire-divoire . Retrieved 15 January 2018 .   \n\n\u2191 Matheron, J.P. (2003). Comprendre Merise: Outils conceptuels et organisationnels (10th ed.). Eyrolles. ISBN 9782212075021.   \n\n\u2191 \"MySQL Workbench\". Oracle Corporation. https:\/\/www.mysql.com\/fr\/products\/workbench\/ . Retrieved 04 October 2017 .   \n\n\u2191 \"Apache Struts\". Apache Software Foundation. https:\/\/struts.apache.org\/ . Retrieved 04 October 2017 .   \n\n\u2191 \"Hibernate\". Red Hat, Inc. 04 May 2018. http:\/\/hibernate.org\/ .   \n\n\u2191 \"JasperReports Library\". TIBCO Software, Inc. https:\/\/community.jaspersoft.com\/project\/jasperreports-library . Retrieved 16 January 2018 .   \n\n\u2191 \"Apache POI\". Apache Software Foundation.   \n\n\u2191 Oracle Corporation. \"GlassFish: The Open Source Java EE Reference Implementation\". Github. https:\/\/javaee.github.io\/glassfish . Retrieved 14 January 2019 .   \n\n\u2191 \"Date and time input types\". Can I use. Patreon. https:\/\/caniuse.com\/#feat=input-datetime . Retrieved 14 January 2019 .   \n\n\u2191 Demeuse, M. (2008). \"Chapter 5.3 Echelles de Likert ou m\u00e9thode des classements additionn\u00e9s\" (PDF). Introduction aux th\u00e9ories et aux m\u00e9thodes de la mesure en sciences psychologiques et en sciences de l'\u00e9ducation. pp. 213\u201318. http:\/\/iredu.u-bourgogne.fr\/images\/stories\/Documents\/Cours_disponibles\/Demeuse\/Cours\/p5.3.pdf . Retrieved 04 January 2018 .   \n\n\u2191 Cline, G.B.; Luiz, J.M. (2013). \"Information technology systems in public sector health facilities in developing countries: the case of South Africa\". BMC Medical Informatics and Decision Making 13: 13. doi:10.1186\/1472-6947-13-13. PMC PMC3570341. PMID 23347433. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3570341 .   
\n\n\nNotes \nThis presentation is faithful to the original, with only a few minor changes to presentation, spelling, and grammar. We also added PMCID and DOI when they were missing from the original reference. Under \"Architecture and system features,\" the original article had a poorly punctuated and confusing sentence about browser support for date and time input types; it has been updated to be more correct, with a citation added. The URL to GlassFish was updated to show the 4.x versions. The original had reference 13 in the References section, but it was never referenced in-line; it was omitted in this version.\n\nSource: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:Development_of_an_electronic_information_system_for_the_management_of_laboratory_data_of_tuberculosis_and_atypical_mycobacteria_at_the_Pasteur_Institute_in_C%C3%B4te_d%E2%80%99Ivoire\">https:\/\/www.limswiki.org\/index.php\/Journal:Development_of_an_electronic_information_system_for_the_management_of_laboratory_data_of_tuberculosis_and_atypical_mycobacteria_at_the_Pasteur_Institute_in_C%C3%B4te_d%E2%80%99Ivoire<\/a>\n\nThis page was last modified on 14 January 2019, at 23:31.\nContent is available under a Creative Commons Attribution-ShareAlike 4.0 International License unless otherwise noted.\n\n","625b72cffd2a8d803eb5cb58c6ef954e_html":"<body class=\"mediawiki ltr sitedir-ltr ns-206 ns-subject page-Journal_Development_of_an_electronic_information_system_for_the_management_of_laboratory_data_of_tuberculosis_and_atypical_mycobacteria_at_the_Pasteur_Institute_in_C\u00f4te_d\u2019Ivoire skin-monobook action-view\">\n<div id=\"rdp-ebb-globalWrapper\">\n\t\t<div id=\"rdp-ebb-column-content\">\n\t\t\t<div id=\"rdp-ebb-content\" class=\"mw-body\" role=\"main\">\n\t\t\t\t<a id=\"rdp-ebb-top\"><\/a>\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t<h1 id=\"rdp-ebb-firstHeading\" 
class=\"firstHeading\" lang=\"en\">Journal:Development of an electronic information system for the management of laboratory data of tuberculosis and atypical mycobacteria at the Pasteur Institute in C\u00f4te d\u2019Ivoire<\/h1>\n\t\t\t\t\n\t\t\t\t<div id=\"rdp-ebb-bodyContent\" class=\"mw-body-content\">\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\n\t\t\t\t\t<!-- start content -->\n\t\t\t\t\t<div id=\"rdp-ebb-mw-content-text\" lang=\"en\" dir=\"ltr\" class=\"mw-content-ltr\">\n\n\n<h2><span class=\"mw-headline\" id=\"Abstract\">Abstract<\/span><\/h2>\n<p><b>Introduction<\/b>: Tuberculosis remains a public health problem despite all the efforts made to eradicate it. To strengthen the surveillance system for this condition, it is necessary to have a good data management system. Indeed, the use of electronic information systems in <a href=\"https:\/\/www.limswiki.org\/index.php\/Information_management\" title=\"Information management\" class=\"wiki-link\" data-key=\"f8672d270c0750a858ed940158ca0a73\">data management<\/a> can improve the quality of data. The objective of this project was to set up a laboratory-specific electronic information system for tuberculosis and atypical mycobacteria.\n<\/p><p><b>Methods<\/b>: The design of this <a href=\"https:\/\/www.limswiki.org\/index.php\/Laboratory_information_system\" title=\"Laboratory information system\" class=\"wiki-link\" data-key=\"37add65b4d1c678b382a7d4817a9cf64\">laboratory information system<\/a> required a general understanding of the workflow and the implementation processes in order to generate a realistic model. For the implementation of the system, Java technology was used to develop a web application compatible with the intranet of the company. 
The impact and the acceptability of the use of the system on the running of the <a href=\"https:\/\/www.limswiki.org\/index.php\/Laboratory\" title=\"Laboratory\" class=\"wiki-link\" data-key=\"c57fc5aac9e4abf31dccae81df664c33\">laboratory<\/a> were evaluated using the Likert scale.\n<\/p><p><b>Results<\/b>: The system in place has been in operation for about 12 months, in conjunction with the paper registers. During this period, 4811 requests for examinations concerning 6083 samples have been registered. The results of analysis of 3892 patients were printed from the laboratory information system. In order to produce tuberculosis drug resistance reports and laboratory performance reports, dashboards have been developed.\n<\/p><p><b>Conclusion<\/b>: The system has been adopted by the staff because of the time and efficiency gained in managing laboratory data. However, an optimized tool will only be obtained through a cycle of sustained improvement.\n<\/p><p><b>Keywords<\/b>: clinical laboratory information systems, public health, tuberculosis\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Introduction\">Introduction<\/span><\/h2>\n<p>Tuberculosis remains a public health problem despite all efforts to eradicate it. According to the World Health Organization (WHO), the number of tuberculosis cases in the world was estimated at 9.6 million in 2014.<sup id=\"rdp-ebb-cite_ref-WHOWorld16_1-0\" class=\"reference\"><a href=\"#cite_note-WHOWorld16-1\">[1]<\/a><\/sup> To strengthen the surveillance system for this disease, we need to have a good data management system. \n<\/p><p>The use of electronic information systems can improve the quality of clinical data<sup id=\"rdp-ebb-cite_ref-CastelnuovoImp12_2-0\" class=\"reference\"><a href=\"#cite_note-CastelnuovoImp12-2\">[2]<\/a><\/sup> and thus the management of patients. 
Studies have shown that the implementation of computer tools is a performance factor for laboratory activities.<sup id=\"rdp-ebb-cite_ref-RhoadsClin14_3-0\" class=\"reference\"><a href=\"#cite_note-RhoadsClin14-3\">[3]<\/a><\/sup> The results of laboratory analyses provide important data for clinical decision-making and treatment. Effective management of this data is essential for strengthening the health care system as a whole. \n<\/p><p>Laboratory information systems have been developed and used for about half a century.<sup id=\"rdp-ebb-cite_ref-RhoadsClin14_3-1\" class=\"reference\"><a href=\"#cite_note-RhoadsClin14-3\">[3]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-El-KarehImpact12_4-0\" class=\"reference\"><a href=\"#cite_note-El-KarehImpact12-4\">[4]<\/a><\/sup> Like all laboratory information systems, the microbiology laboratory information system must be secure, user-friendly, and able to interact with other information systems. However, there are several unique features of microbiology that are not used in other <a href=\"https:\/\/www.limswiki.org\/index.php\/Clinical_laboratory\" title=\"Clinical laboratory\" class=\"wiki-link\" data-key=\"307bcdf1bdbcd1bb167cee435b7a5463\">clinical laboratories<\/a>, i.e., tracking multiple drifts, laboratory electronic notes, reporting results, and taking the preliminary and final results into account.<sup id=\"rdp-ebb-cite_ref-RhoadsClin14_3-2\" class=\"reference\"><a href=\"#cite_note-RhoadsClin14-3\">[3]<\/a><\/sup>\n<\/p><p>The strategic plan of development of e-health has been defined and is being implemented.<sup id=\"rdp-ebb-cite_ref-OrdrePlan17_5-0\" class=\"reference\"><a href=\"#cite_note-OrdrePlan17-5\">[5]<\/a><\/sup> All health facilities have to implement an electronic information system for better management of data generated by their activities. 
\n<\/p><p>The purpose of this project was to set up a laboratory information system (LIS) at Institute Pasteur of C\u00f4te d\u2019Ivoire to collect specific data for <i>Mycobacterium tuberculosis<\/i>. This LIS will allow the recording of the data resulting from the activities of the laboratory. It should also allow for the generation of results that are ultimately given to the patient as well as the generation of reports on the activities of the laboratory. Finally, an analysis of the impact of this new system in the laboratory routine was carried out. \n<\/p>\n<h2><span class=\"mw-headline\" id=\"Methods\">Methods<\/span><\/h2>\n<h3><span class=\"mw-headline\" id=\"Design\">Design<\/span><\/h3>\n<p>Designing a laboratory information system requires an understanding of the <a href=\"https:\/\/www.limswiki.org\/index.php\/Workflow\" title=\"Workflow\" class=\"wiki-link\" data-key=\"92bd8748272e20d891008dcb8243e8a8\">workflow<\/a> and the processes implemented. The data collection and recording sheet was the starting point for this work as the first objective, which was to build a system capable of recording the data and results from the laboratory\u2019s activities. This data sheet is a key element in the activities of the laboratory since all patient <a href=\"https:\/\/www.limswiki.org\/index.php\/Information\" title=\"Information\" class=\"wiki-link\" data-key=\"6300a14d9c2776dcca0999b5ed940e7d\">information<\/a> such as socio-demographic data, information on the samples, the preliminary and final results, and the notes of the various examinations are notified. This sheet was previously reorganized by the epidemiology unit. Then, from this new form and a series of interviews with biologists, some improvements were made. It was noted that in the paper recording system used before, the biological monitoring of a patient throughout his\/her treatment was not possible. This feature was thus integrated into the new system. 
After analysis of all data, the conceptual, logical, and physical models of the application were designed using the Merise method in the open-source tool MySQL Workbench.<sup id=\"rdp-ebb-cite_ref-MatheronComprendre03_6-0\" class=\"reference\"><a href=\"#cite_note-MatheronComprendre03-6\">[6]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-MySQLWorkbench_7-0\" class=\"reference\"><a href=\"#cite_note-MySQLWorkbench-7\">[7]<\/a><\/sup>\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Architecture_and_system_features\">Architecture and system features<\/span><\/h3>\n<p>We opted for a web-based application used on the company\u2019s intranet. It should be noted that the project was entirely based on Java technology. The design of the application was based on the Apache Struts 2.0 framework<sup id=\"rdp-ebb-cite_ref-ApacheStruts_8-0\" class=\"reference\"><a href=\"#cite_note-ApacheStruts-8\">[8]<\/a><\/sup>, which defines a model-view-controller (MVC) three-layer design. The database connection layer uses Hibernate<sup id=\"rdp-ebb-cite_ref-Hibernate_9-0\" class=\"reference\"><a href=\"#cite_note-Hibernate-9\">[9]<\/a><\/sup>, which is an object-relational mapping (ORM) open-source solution that facilitates the development of the persistence layer; this database connection layer is also based on Java Database Connectivity (JDBC) for some modules of the application. 
The production of PDF reports was possible by the use of the JasperReports Library.<sup id=\"rdp-ebb-cite_ref-JRLib_10-0\" class=\"reference\"><a href=\"#cite_note-JRLib-10\">[10]<\/a><\/sup> The reports in the Excel format required the libraries of the Apache POI project.<sup id=\"rdp-ebb-cite_ref-ApachePOI_11-0\" class=\"reference\"><a href=\"#cite_note-ApachePOI-11\">[11]<\/a><\/sup> The Java Enterprise Edition of Glassfish 4.1<sup id=\"rdp-ebb-cite_ref-GlassFish_12-0\" class=\"reference\"><a href=\"#cite_note-GlassFish-12\">[12]<\/a><\/sup> deployment server in its free version was used to host the application, and the <a href=\"https:\/\/www.limswiki.org\/index.php\/MySQL\" title=\"MySQL\" class=\"wiki-link\" data-key=\"35005451bfcd508bce47c58e72260128\">MySQL<\/a> 5.1 database management system (DBMS) was used to manage the database. The MySQL DBMS and Glassfish were installed on an HP ProLiant server running on a UNIX CentOS 6.7 OS. All the client machines were PC-type computers. All web browsers could be used as web clients for the application, though with a preference for Google Chrome; browsers such as Internet Explorer, Safari, and Firefox do not fully support the \u201cdate\u201d and \u201ctime\u201d input types.<sup id=\"rdp-ebb-cite_ref-PatreonDateAndTime_13-0\" class=\"reference\"><a href=\"#cite_note-PatreonDateAndTime-13\">[13]<\/a><\/sup> The data stored in the database benefited from automatic backups to an external hard drive, and replication of this external hard drive was done at the end of the week. The source code of the application as well as the different steps of the implementation was documented to allow future maintenance. The user interface was modeled to be fairly similar to the original paper form. System access management was also taken into account in the implementation. The system modules are accessible according to the rights assigned by the administrator. 
\n<\/p>\n<h3><span class=\"mw-headline\" id=\"On-site_system_installation_and_user_training\">On-site system installation and user training<\/span><\/h3>\n<p>The server housing the executable files of the information system and the database was installed in a secure room protected from the weather. A high-capacity UPS was connected to this server to prevent the system from shutting down in the event of a power outage. The client computers were the ordinary service computers. The application was available on the corporate intranet from a web browser. \n<\/p><p>Training of six members of the laboratory staff was conducted in half a day. Assimilation took place quite quickly because the laboratory members all had a good working knowledge of computers. Users often sought consultations when they had forgotten certain aspects of using the system. After each update, a short information session was organized to share the new features, after which use of the software continued without interruption. Biologists' use of the LIS often resulted in the discovery of bugs, which were later corrected; users also often made suggestions and comments to improve the information system.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Qualitative_assessment\">Qualitative assessment<\/span><\/h3>\n<p>The management of patients with tuberculosis involves regular biological tests. With paper records, monitoring a patient\u2019s analyses over time becomes problematic, as do searching for information on a sample, completing activity reports, and delivering results. The information system put in place should improve the performance of these daily tasks. 
\n<\/p><p>After 12 months of usage, we evaluated the acceptability of this information system and the impact of implementation on the functioning of the laboratory with different users of the system, namely a data manager, two technicians, two engineers, and a biologist, all of whom had to use the software continuously. This evaluation was based on the Likert scale.<sup id=\"rdp-ebb-cite_ref-DemeuseEchelles08_14-0\" class=\"reference\"><a href=\"#cite_note-DemeuseEchelles08-14\">[14]<\/a><\/sup> To do this, we administered a questionnaire to the selected staff members. The interviewees were to choose one of the five possible responses for each question: Strongly Agree (5), Agree (4), Neither Agree nor Disagree (3), Disagree (2), and Strongly Disagree (1). Summary statistics were generated from the responses obtained. \n<\/p>\n<h2><span class=\"mw-headline\" id=\"Results\">Results<\/span><\/h2>\n<h3><span class=\"mw-headline\" id=\"Results_of_the_system_launch\">Results of the system launch<\/span><\/h3>\n<p>The system in place has been in use since January 1, 2017. By the end of the year, 3,892 patients were registered in the LIS, with 4,811 visits and 6,083 collected samples recorded. Of the registered patients, 805 made at least two visits to the laboratory. In this information system, 93.9% (185\/197) of the entry fields on the patient and visit cards are drop-down lists whose values are pre-recorded in the administration module. These mechanisms were put in place to avoid the errors inherent in filling free-text fields. The LIS allowed the creation of patient folders as well as visit cards related to examination requests. These visit sheets were linked to the same record for a patient, which helps to monitor his\/her condition (Figures 1, 2, 3). 
\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig1_Kon%C3%A9_JofHlthManInfo2019_6-1.png\" class=\"image wiki-link\" data-key=\"c025d0afcdf575d95a10deef9e5f812e\"><img alt=\"Fig1 Kon\u00e9 JofHlthManInfo2019 6-1.png\" src=\"https:\/\/www.limswiki.org\/images\/a\/a2\/Fig1_Kon%C3%A9_JofHlthManInfo2019_6-1.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 1.<\/b> This page for creating a new patient record<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p><a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig2_Kon%C3%A9_JofHlthManInfo2019_6-1.png\" class=\"image wiki-link\" data-key=\"5013a89151974a14541faa3a1b9a7a29\"><img alt=\"Fig2 Kon\u00e9 JofHlthManInfo2019 6-1.png\" src=\"https:\/\/www.limswiki.org\/images\/8\/8f\/Fig2_Kon%C3%A9_JofHlthManInfo2019_6-1.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 2.<\/b> Patient card<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p><a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig3_Kon%C3%A9_JofHlthManInfo2019_6-1.png\" class=\"image wiki-link\" data-key=\"de27027858f4c54aec9bf14d8d848473\"><img alt=\"Fig3 Kon\u00e9 JofHlthManInfo2019 6-1.png\" src=\"https:\/\/www.limswiki.org\/images\/f\/fe\/Fig3_Kon%C3%A9_JofHlthManInfo2019_6-1.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table 
style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 3.<\/b> Visit card<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>The printing of the results intended for the patients requires preliminary validation by rhe biologist responsible for test results; otherwise, the request for printing is not executed. Six laboratory-delivered results could be printed, including microscopy, Genexpert, solid and\/or fluid culture, genotypic test, standard antibiogram, and extended antibiogram. In the interests of quality assurance of the laboratory, the second impression for the same result carries the duplicate mark, so during this period, 6,075 printed results were obtained, which included 3,415 microscopy examinations, 1,591 Genexpert and LPA tests, 35 culture samples, 90 classical antibiogram, and five expanded. \n<\/p><p>A dashboard system allows the calculation of indicators for the monitoring of laboratory activities and also provides indicators on the state of the health of the patients. These dashboards generate indicators for activities, including microscopy, Genexpert, culture in liquids and solids media, LPA, and antibiotic sensitivity (antibiogram) for a period defined by a time interval to complete. It also allows you to generate an Excel spreadsheet, the list of samples with associated information such as sample number, methods, and test results (Figures 4, 5). 
\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig4_Kon%C3%A9_JofHlthManInfo2019_6-1.png\" class=\"image wiki-link\" data-key=\"3acdac0d17ad2978f2eca34791dc2fc3\"><img alt=\"Fig4 Kon\u00e9 JofHlthManInfo2019 6-1.png\" src=\"https:\/\/www.limswiki.org\/images\/1\/1f\/Fig4_Kon%C3%A9_JofHlthManInfo2019_6-1.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 4.<\/b> Dashboard of microscopy<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p><a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig5_Kon%C3%A9_JofHlthManInfo2019_6-1.png\" class=\"image wiki-link\" data-key=\"e45f03ae219d698d9ad63c66f1591e27\"><img alt=\"Fig5 Kon\u00e9 JofHlthManInfo2019 6-1.png\" src=\"https:\/\/www.limswiki.org\/images\/5\/57\/Fig5_Kon%C3%A9_JofHlthManInfo2019_6-1.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 5.<\/b> Dashboard of LPA<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<h3><span class=\"mw-headline\" id=\"Perception_and_usability_of_the_system\">Perception and usability of the system<\/span><\/h3>\n<p>The questionnaire was administered to the laboratory staff. Table 1 details the results obtained for these five questions. 
Regarding the assertions \u201cthe user feels comfortable with the LIS,\u201d \u201cthe LIS allows secure and quick access to information,\u201d \u201cthe LIS improves the quality of the work in the laboratory,\u201d and \u201cthe user intends to switch to exclusive use of electronic management of the laboratory data,\u201d 100% (6\/6) responded favorably. One respondent in six found that the LIS did not save time in performing data management tasks.\n<\/p>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table class=\"wikitable\" border=\"1\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\" colspan=\"6\"><b>Table 1.<\/b> Results of the evaluation of the acceptability and impact of the implementation of the information system on the functioning of the laboratory\n<\/td><\/tr>\n<tr>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Strongly Agree\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Agree\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Neither Agree<br \/>Nor Disagree\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Disagree\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Strongly Disagree\n<\/th><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">I am completely comfortable with LIS\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">5 (25)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1 (4)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0\n<\/td>\n<td style=\"background-color:white; padding-left:10px; 
padding-right:10px;\">0\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">LIS saves time in completing tasks\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">5 (25)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1 (2)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">The LIS allows access to information quickly and securely\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">6 (30)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">LIS improves the overall quality of work\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">5 (25)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1 (4)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">I plan to move to 
the exclusive use of the LIS for laboratory data management\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">5 (25)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1 (4)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<h2><span class=\"mw-headline\" id=\"Discussion\">Discussion<\/span><\/h2>\n<p>We have developed an electronic laboratory information system that takes into account the specifics of a microbiology laboratory focused on mycobacteria. This LIS, based on the existing paper collection form, has been implemented to manage the data resulting from the activities of this laboratory, except for the management of laboratory inputs. \n<\/p><p>The workflow begins with the arrival of the sample in the laboratory, accompanied by the examination request form coupled to an invoice. The invoicing form attests that the office of entries has approved the processing of the request by the laboratory. From the information on the examination request, a search is made on the last name and first name of the patient in the LIS, in the search field of the page \u201cCreate new patient record.\u201d If there is ambiguity for a name (e.g., several identical first and last names), other parameters such as date of birth, date of registration, telephone number, or residential city can be used to discriminate between patients and find the one who corresponds to the patient on the examination form. If there is no match, a new folder is created, and the identification number used for this electronic file is generated manually from the laboratory register. 
This step of the process is often a bottleneck because people can have the same last and first names, and the search can then become lengthy. A unique identification number generated by the laboratory\u2019s office of entries would solve this problem, but it would require re-engineering the procedures of the office of entries. \n<\/p><p>After sample processing, the corresponding patient record previously created is searched for by sample number or name and then opened. The visit form is created, and all the results obtained are recorded. Thus, for the same patient, the LIS makes it possible to follow the evolution of their biological parameters across their visit forms. The LIS has saved time in the process of generating results; previously, patient results were entered manually on a pre-printed form made in Microsoft Word. This process required at least five minutes per result, whereas generation time is less than a minute with the LIS.<sup id=\"rdp-ebb-cite_ref-El-KarehImpact12_4-1\" class=\"reference\"><a href=\"#cite_note-El-KarehImpact12-4\">[4]<\/a><\/sup>\n<\/p><p>To this day, all the results of the LMTA come from the LIS, which helps improve the completeness of the recorded data. Pre-recorded drop-down lists, formatted fields such as the date field, and controls set up on free fields such as the sample number field reduce the recording of erroneous data and consequently improve data quality, but there remain user errors the system cannot control, such as mis-entering a patient\u2019s name or age. \n<\/p><p>Dashboards play an important role in monitoring laboratory activities and tracking the evolution of the population\u2019s health status. They also help monitor the laboratory\u2019s performance indicators and the drug resistance shown by MDR-TB and XDR-TB. 
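The homonym-disambiguation step described above can be sketched as follows. This is an illustrative Python fragment with hypothetical field names, not the actual search code (the LIS performs this lookup in its Java web interface): it matches on last and first name, narrows homonyms with secondary attributes such as date of birth or city, and signals that a new folder must be created when no single record matches.

```python
# Illustrative sketch (hypothetical field names, not the LIS's Java code):
# find a patient by last/first name, discriminating homonyms with
# secondary attributes; None means a new folder must be created
# (or the match is still ambiguous).
def find_patient(records, last, first, **extra):
    """Return the single matching record, or None if absent/ambiguous."""
    matches = [r for r in records if r["last"] == last and r["first"] == first]
    # Homonyms: narrow down with date of birth, phone, city, etc.
    for key, value in extra.items():
        if len(matches) <= 1:
            break
        matches = [r for r in matches if r.get(key) == value]
    return matches[0] if len(matches) == 1 else None

# Hypothetical records with two homonymous patients
records = [
    {"id": 1, "last": "Kone", "first": "Awa", "birth": "1980-05-01", "city": "Abidjan"},
    {"id": 2, "last": "Kone", "first": "Awa", "birth": "1975-11-23", "city": "Bouake"},
]
print(find_patient(records, "Kone", "Awa", birth="1975-11-23")["id"])  # 2
```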
\n<\/p><p>Overall, the LIS was very well accepted, and all the laboratory staff found it comfortable to use, as it is now routinely part of the laboratory\u2019s data management activities. The figures from our survey are in line with those of a study carried out in South Africa, which showed a positive perception of computer systems in the management of health structures.<sup id=\"rdp-ebb-cite_ref-ClineInform13_15-0\" class=\"reference\"><a href=\"#cite_note-ClineInform13-15\">[15]<\/a><\/sup> A lab technician disagreed with the statement that the system saves time in performing data management tasks. After investigation, some bugs were found in the generation of the smear distribution table depending on the result, type of sample, origin of the biological product, and profile of the sample, thus motivating this response. This particular malfunction has been corrected. Unlike the study conducted by Barbara Castelnuovo <i>et al.<\/i>,<sup id=\"rdp-ebb-cite_ref-CastelnuovoImp12_2-1\" class=\"reference\"><a href=\"#cite_note-CastelnuovoImp12-2\">[2]<\/a><\/sup> which reported reluctance among users adopting electronic information systems, our study demonstrated the total approval of the laboratory staff regarding the upgrade to the LIS, as shown by the results of the LIS evaluation by users. \n<\/p>\n<h2><span class=\"mw-headline\" id=\"Conclusions\">Conclusions<\/span><\/h2>\n<p>The introduction of the information system for the microbiology laboratory has necessitated the understanding and modeling of the various processes used, from the arrival of the examination requests to the printing of the results. The system, currently in use, has been readily adopted by the laboratory staff, who see in it a tool that facilitates their work. For laboratory technicians, the LIS reduces the workload by automating the generation of results and reports. 
For the biologists in charge of the laboratory, the LIS, through its dashboards, provides real-time indicators on the follow-up of samples, the activity carried out in the laboratory, and the state of resistance to antituberculosis treatments. The LIS has a positive impact on laboratory activities, but a perfected tool emerges only through a cycle of sustained improvement. \n<\/p>\n<h2><span class=\"mw-headline\" id=\"Additional_material\">Additional material<\/span><\/h2>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table class=\"wikitable\" border=\"1\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\" colspan=\"6\"><b>Appendix A.<\/b> Questionnaire Instrument (Likert Scale Structured Questions)<br \/>Please answer the following questions by indicating which answer most accurately represents the extent to which you agree or disagree with the statement on the left. There can only be one answer per statement.\n<\/td><\/tr>\n<tr>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Strongly Agree\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Agree\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Neither Agree<br \/>Nor Disagree\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Disagree\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Strongly Disagree\n<\/th><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1. 
I am completely comfortable with LIS\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">2. LIS saves time in completing tasks\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">3. The LIS allows access to information quickly and securely\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">4. 
LIS improves the overall quality of work\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">5. I plan to move to the exclusive use of the LIS for laboratory data management\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<h2><span class=\"mw-headline\" id=\"Acknowledgements\">Acknowledgements<\/span><\/h2>\n<p>The authors would like to thank all the members of the Epidemiology and Laboratory Unit for Tuberculous and Atypical Mycobacteria. Special thanks to Anatole Mian for the English translation. \n<\/p>\n<h3><span class=\"mw-headline\" id=\"Funding\">Funding<\/span><\/h3>\n<p>This work did not receive any internal funding for its design. This is a project of the Epidemiology Unit to improve the data management of the National Reference Center for Tuberculosis in C\u00f4te d\u2019Ivoire. 
Nor was any external funding received for this project.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Contributions_of_the_authors\">Contributions of the authors<\/span><\/h3>\n<p>CJK performed the modeling and design, coded the application, and wrote the manuscript; TA conceived and modeled the fact sheet; MKD read and validated the manuscript; NR conceived and validated the graphical interfaces and dashboards; SMK conceived and modeled the epidemiological record and read and validated the manuscript. All authors read and approved the final manuscript.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Conflict_of_interest\">Conflict of interest<\/span><\/h3>\n<p>None declared.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"References\">References<\/span><\/h2>\n<div class=\"reflist references-column-width\" style=\"-moz-column-width: 30em; -webkit-column-width: 30em; column-width: 30em; list-style-type: decimal;\">\n<ol class=\"references\">\n<li id=\"cite_note-WHOWorld16-1\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-WHOWorld16_1-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">World Health Organization (2016). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/apps.who.int\/medicinedocs\/en\/d\/Js23098en\/\" data-key=\"30f89d9d686d7e75540d090b0ecb6a57\"><i>Global Tuberculosis Report 2016<\/i><\/a>. World Health Organization. pp. 214. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" data-key=\"f64947ba21e884434bd70e8d9e60bae6\">ISBN<\/a> 9789241565394<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/apps.who.int\/medicinedocs\/en\/d\/Js23098en\/\" data-key=\"30f89d9d686d7e75540d090b0ecb6a57\">http:\/\/apps.who.int\/medicinedocs\/en\/d\/Js23098en\/<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 04 October 2017<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Global+Tuberculosis+Report+2016&rft.aulast=Wourld+Health+organization&rft.au=Wourld+Health+organization&rft.date=2016&rft.pages=pp.%26nbsp%3B214&rft.pub=World+Health+Organization&rft.isbn=9789241565394&rft_id=http%3A%2F%2Fapps.who.int%2Fmedicinedocs%2Fen%2Fd%2FJs23098en%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:Development_of_an_electronic_information_system_for_the_management_of_laboratory_data_of_tuberculosis_and_atypical_mycobacteria_at_the_Pasteur_Institute_in_C%C3%B4te_d%E2%80%99Ivoire\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-CastelnuovoImp12-2\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-CastelnuovoImp12_2-0\">2.0<\/a><\/sup> <sup><a href=\"#cite_ref-CastelnuovoImp12_2-1\">2.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Castelnuovo, B.; Kiragga, A.; Afayo, V. et al. (2012). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3524185\" data-key=\"e939186a9ea13152c8f9a4c2e5f6df0f\">\"Implementation of provider-based electronic medical records and improvement of the quality of data in a large HIV program in Sub-Saharan Africa\"<\/a>. <i>PLoS One<\/i> <b>7<\/b> (12): e51631. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1371%2Fjournal.pone.0051631\" data-key=\"57882244746e7c01d7817fd08a518add\">10.1371\/journal.pone.0051631<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3524185\/\" data-key=\"e0c223b6cf424f0c0d80a3289fc4e97c\">PMC3524185<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/23284728\" data-key=\"b01caf97fa10f7619a99a794165c9311\">23284728<\/a><span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3524185\" data-key=\"e939186a9ea13152c8f9a4c2e5f6df0f\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3524185<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Implementation+of+provider-based+electronic+medical+records+and+improvement+of+the+quality+of+data+in+a+large+HIV+program+in+Sub-Saharan+Africa&rft.jtitle=PLoS+One&rft.aulast=Castelnuovo%2C+B.%3B+Kiragga%2C+A.%3B+Afayo%2C+V.+et+al.&rft.au=Castelnuovo%2C+B.%3B+Kiragga%2C+A.%3B+Afayo%2C+V.+et+al.&rft.date=2012&rft.volume=7&rft.issue=12&rft.pages=e51631&rft_id=info:doi\/10.1371%2Fjournal.pone.0051631&rft_id=info:pmc\/PMC3524185&rft_id=info:pmid\/23284728&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3524185&rfr_id=info:sid\/en.wikipedia.org:Journal:Development_of_an_electronic_information_system_for_the_management_of_laboratory_data_of_tuberculosis_and_atypical_mycobacteria_at_the_Pasteur_Institute_in_C%C3%B4te_d%E2%80%99Ivoire\"><span style=\"display: none;\"> 
<\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-RhoadsClin14-3\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-RhoadsClin14_3-0\">3.0<\/a><\/sup> <sup><a href=\"#cite_ref-RhoadsClin14_3-1\">3.1<\/a><\/sup> <sup><a href=\"#cite_ref-RhoadsClin14_3-2\">3.2<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Rhoads, D.D.; Sintchenko, V.; Rauch, C.A. et al. (2014). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4187636\" data-key=\"87bff39e3d49474246b4d31b522e4242\">\"Clinical microbiology informatics\"<\/a>. <i>Clinical Microbiology Reviews<\/i> <b>27<\/b> (4): 1025-47. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1128%2FCMR.00049-14\" data-key=\"e2373a15def839a98e67e8389797bded\">10.1128\/CMR.00049-14<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC4187636\/\" data-key=\"6f7de6cd61ca18224386e2c3a3b64afe\">PMC4187636<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/25278581\" data-key=\"ce08845fe6acb9854f3210714e10432e\">25278581<\/a><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4187636\" data-key=\"87bff39e3d49474246b4d31b522e4242\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4187636<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Clinical+microbiology+informatics&rft.jtitle=Clinical+Microbiology+Reviews&rft.aulast=Rhoads%2C+D.D.%3B+Sintchenko%2C+V.%3B+Rauch%2C+C.A.+et+al.&rft.au=Rhoads%2C+D.D.%3B+Sintchenko%2C+V.%3B+Rauch%2C+C.A.+et+al.&rft.date=2014&rft.volume=27&rft.issue=4&rft.pages=1025-47&rft_id=info:doi\/10.1128%2FCMR.00049-14&rft_id=info:pmc\/PMC4187636&rft_id=info:pmid\/25278581&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC4187636&rfr_id=info:sid\/en.wikipedia.org:Journal:Development_of_an_electronic_information_system_for_the_management_of_laboratory_data_of_tuberculosis_and_atypical_mycobacteria_at_the_Pasteur_Institute_in_C%C3%B4te_d%E2%80%99Ivoire\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-El-KarehImpact12-4\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-El-KarehImpact12_4-0\">4.0<\/a><\/sup> <sup><a href=\"#cite_ref-El-KarehImpact12_4-1\">4.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">El-Kareh, R.; Roy, C.; Williams, D.H. et al. (2012). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3445692\" data-key=\"7002516d13cd8edfcc87965d05f312b6\">\"Impact of automated alerts on follow-up of post-discharge microbiology results: a cluster randomized controlled trial\"<\/a>. <i>Journal of General Internal Medicine<\/i> <b>27<\/b> (10): 1243-50. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1007%2Fs11606-012-1986-8\" data-key=\"552fd5172fbcff0998a795eaae665e64\">10.1007\/s11606-012-1986-8<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3445692\/\" data-key=\"beb973a3492f32371746d3d72b10d165\">PMC3445692<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/22278302\" data-key=\"6e3c92d3a9d6b2dd9d86c903a8ad07bc\">22278302<\/a><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3445692\" data-key=\"7002516d13cd8edfcc87965d05f312b6\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3445692<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Impact+of+automated+alerts+on+follow-up+of+post-discharge+microbiology+results%3A+a+cluster+randomized+controlled+trial&rft.jtitle=Journal+of+General+Internal+Medicine&rft.aulast=El-Kareh%2C+R.%3B+Roy%2C+C.%3B+Williams%2C+D.H.+et+al.&rft.au=El-Kareh%2C+R.%3B+Roy%2C+C.%3B+Williams%2C+D.H.+et+al.&rft.date=2012&rft.volume=27&rft.issue=10&rft.pages=1243-50&rft_id=info:doi\/10.1007%2Fs11606-012-1986-8&rft_id=info:pmc\/PMC3445692&rft_id=info:pmid\/22278302&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3445692&rfr_id=info:sid\/en.wikipedia.org:Journal:Development_of_an_electronic_information_system_for_the_management_of_laboratory_data_of_tuberculosis_and_atypical_mycobacteria_at_the_Pasteur_Institute_in_C%C3%B4te_d%E2%80%99Ivoire\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-OrdrePlan17-5\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-OrdrePlan17_5-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Ordre National Des Medecins de C\u00f4te d\u2019Ivoire (2017). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/doczz.fr\/doc\/3118472\/pdncs-ci---order-national-des-des-decins-de-ivoire-divoire\" data-key=\"2b23af07c7dd3384c4fc5f34f0969d0e\">\"Plan Strategique de Cybersante\"<\/a><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/doczz.fr\/doc\/3118472\/pdncs-ci---order-national-des-des-decins-de-ivoire-divoire\" data-key=\"2b23af07c7dd3384c4fc5f34f0969d0e\">http:\/\/doczz.fr\/doc\/3118472\/pdncs-ci---order-national-des-des-decins-de-ivoire-divoire<\/a><\/span><span class=\"reference-accessdate\">. Retrieved 15 January 2018<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Plan+Strategique+de+Cybersante&rft.atitle=&rft.aulast=Ordre+National+Des+Medecins+de+C%C3%B4te+d%E2%80%99Ivoire&rft.au=Ordre+National+Des+Medecins+de+C%C3%B4te+d%E2%80%99Ivoire&rft.date=2017&rft_id=http%3A%2F%2Fdoczz.fr%2Fdoc%2F3118472%2Fpdncs-ci---order-national-des-des-decins-de-ivoire-divoire&rfr_id=info:sid\/en.wikipedia.org:Journal:Development_of_an_electronic_information_system_for_the_management_of_laboratory_data_of_tuberculosis_and_atypical_mycobacteria_at_the_Pasteur_Institute_in_C%C3%B4te_d%E2%80%99Ivoire\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MatheronComprendre03-6\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-MatheronComprendre03_6-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Matheron, J.P. (2003). <i>Comprendre Merise: Outils conceptuels et organisationnels<\/i> (10th ed.). Eyrolles. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" data-key=\"f64947ba21e884434bd70e8d9e60bae6\">ISBN<\/a> 9782212075021.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Comprendre+Merise%3A+Outils+conceptuels+et+organisationnels&rft.aulast=Matheron%2C+J.P.&rft.au=Matheron%2C+J.P.&rft.date=2003&rft.edition=10th&rft.pub=Eyrolles&rft.isbn=9782212075021&rfr_id=info:sid\/en.wikipedia.org:Journal:Development_of_an_electronic_information_system_for_the_management_of_laboratory_data_of_tuberculosis_and_atypical_mycobacteria_at_the_Pasteur_Institute_in_C%C3%B4te_d%E2%80%99Ivoire\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MySQLWorkbench-7\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-MySQLWorkbench_7-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.mysql.com\/fr\/products\/workbench\/\" data-key=\"fb061142187fc7d2bcc8a2164969a633\">\"MySQL Workbench\"<\/a>. Oracle Corporation<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.mysql.com\/fr\/products\/workbench\/\" data-key=\"fb061142187fc7d2bcc8a2164969a633\">https:\/\/www.mysql.com\/fr\/products\/workbench\/<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 04 October 2017<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=MySQL+Workbench&rft.atitle=&rft.pub=Oracle+Corporation&rft_id=https%3A%2F%2Fwww.mysql.com%2Ffr%2Fproducts%2Fworkbench%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:Development_of_an_electronic_information_system_for_the_management_of_laboratory_data_of_tuberculosis_and_atypical_mycobacteria_at_the_Pasteur_Institute_in_C%C3%B4te_d%E2%80%99Ivoire\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ApacheStruts-8\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ApacheStruts_8-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/struts.apache.org\/\" data-key=\"0f08fc106a0adb887de68f0cde71c7ba\">\"Apache Struts\"<\/a>. Apache Software Foundation<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/struts.apache.org\/\" data-key=\"0f08fc106a0adb887de68f0cde71c7ba\">https:\/\/struts.apache.org\/<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 04 October 2017<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Apache+Struts&rft.atitle=&rft.pub=Apache+Software+Foundation&rft_id=https%3A%2F%2Fstruts.apache.org%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:Development_of_an_electronic_information_system_for_the_management_of_laboratory_data_of_tuberculosis_and_atypical_mycobacteria_at_the_Pasteur_Institute_in_C%C3%B4te_d%E2%80%99Ivoire\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-Hibernate-9\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-Hibernate_9-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/hibernate.org\/\" data-key=\"97a1c7efc1bf8260fffbfc82b5194b3c\">\"Hibernate\"<\/a>. Red Hat, Inc. 04 May 2018<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/hibernate.org\/\" data-key=\"97a1c7efc1bf8260fffbfc82b5194b3c\">http:\/\/hibernate.org\/<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Hibernate&rft.atitle=&rft.date=04+May+2018&rft.pub=Red+Hat%2C+Inc&rft_id=http%3A%2F%2Fhibernate.org%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:Development_of_an_electronic_information_system_for_the_management_of_laboratory_data_of_tuberculosis_and_atypical_mycobacteria_at_the_Pasteur_Institute_in_C%C3%B4te_d%E2%80%99Ivoire\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-JRLib-10\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-JRLib_10-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/community.jaspersoft.com\/project\/jasperreports-library\" 
data-key=\"6ed78dab288f2338866f4f1d53a6a2bb\">\"JasperReports Library\"<\/a>. TIBCO Software, Inc<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/community.jaspersoft.com\/project\/jasperreports-library\" data-key=\"6ed78dab288f2338866f4f1d53a6a2bb\">https:\/\/community.jaspersoft.com\/project\/jasperreports-library<\/a><\/span><span class=\"reference-accessdate\">. Retrieved 16 January 2018<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=JasperReports+Library&rft.atitle=&rft.pub=TIBCO+Software%2C+Inc&rft_id=https%3A%2F%2Fcommunity.jaspersoft.com%2Fproject%2Fjasperreports-library&rfr_id=info:sid\/en.wikipedia.org:Journal:Development_of_an_electronic_information_system_for_the_management_of_laboratory_data_of_tuberculosis_and_atypical_mycobacteria_at_the_Pasteur_Institute_in_C%C3%B4te_d%E2%80%99Ivoire\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ApachePOI-11\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ApachePOI_11-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">\"Apache POI\". Apache Software Foundation.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Apache+POI&rft.atitle=&rft.pub=Apache+Software+Foundation&rfr_id=info:sid\/en.wikipedia.org:Journal:Development_of_an_electronic_information_system_for_the_management_of_laboratory_data_of_tuberculosis_and_atypical_mycobacteria_at_the_Pasteur_Institute_in_C%C3%B4te_d%E2%80%99Ivoire\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-GlassFish-12\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-GlassFish_12-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Oracle Corporation. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/javaee.github.io\/glassfish\" data-key=\"6ee889ef7b744b1366937e6e76da1b80\">\"GlassFish: The Open Source Java EE Reference Implementation\"<\/a>. <i>Github<\/i><span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/javaee.github.io\/glassfish\" data-key=\"6ee889ef7b744b1366937e6e76da1b80\">https:\/\/javaee.github.io\/glassfish<\/a><\/span><span class=\"reference-accessdate\">. Retrieved 14 January 2019<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=GlassFish%3A+The+Open+Source+Java+EE+Reference+Implementation&rft.atitle=Github&rft.aulast=Oracle+Corporation&rft.au=Oracle+Corporation&rft_id=https%3A%2F%2Fjavaee.github.io%2Fglassfish&rfr_id=info:sid\/en.wikipedia.org:Journal:Development_of_an_electronic_information_system_for_the_management_of_laboratory_data_of_tuberculosis_and_atypical_mycobacteria_at_the_Pasteur_Institute_in_C%C3%B4te_d%E2%80%99Ivoire\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-PatreonDateAndTime-13\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-PatreonDateAndTime_13-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"nofollow\" class=\"external text\" href=\"#feat=input-datetime\">\"Date and time input types\"<\/a>. <i>Can I use<\/i>. Patreon<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free\" href=\"#feat=input-datetime\">https:\/\/caniuse.com\/#feat=input-datetime<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 14 January 2019<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Date+and+time+input+types&rft.atitle=Can+I+use&rft.pub=Patreon&rft_id=https%3A%2F%2Fcaniuse.com%2F%23feat%3Dinput-datetime&rfr_id=info:sid\/en.wikipedia.org:Journal:Development_of_an_electronic_information_system_for_the_management_of_laboratory_data_of_tuberculosis_and_atypical_mycobacteria_at_the_Pasteur_Institute_in_C%C3%B4te_d%E2%80%99Ivoire\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DemeuseEchelles08-14\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-DemeuseEchelles08_14-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Demeuse, M. (2008). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/iredu.u-bourgogne.fr\/images\/stories\/Documents\/Cours_disponibles\/Demeuse\/Cours\/p5.3.pdf\" data-key=\"d1dd64aa2af59c71818d91cb799f5140\">\"Chapter 5.3 Echelles de Likert ou m\u00e9thode des classements additionn\u00e9s\"<\/a> (PDF). <i>Introduction aux th\u00e9ories et aux m\u00e9thodes de la mesure en sciences psychologiques et en sciences de l'\u00e9ducation<\/i>. pp. 213\u201318<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/iredu.u-bourgogne.fr\/images\/stories\/Documents\/Cours_disponibles\/Demeuse\/Cours\/p5.3.pdf\" data-key=\"d1dd64aa2af59c71818d91cb799f5140\">http:\/\/iredu.u-bourgogne.fr\/images\/stories\/Documents\/Cours_disponibles\/Demeuse\/Cours\/p5.3.pdf<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 04 January 2018<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Chapter+5.3+Echelles+de+Likert+ou+m%C3%A9thode+des+classements+additionn%C3%A9s&rft.atitle=Introduction+aux+th%C3%A9ories+et+aux+m%C3%A9thodes+de+la+mesure+en+sciences+psychologiques+et+en+sciences+de+l%27%C3%A9ducation&rft.aulast=Demeuse%2C+M.&rft.au=Demeuse%2C+M.&rft.date=2008&rft.pages=pp.%26nbsp%3B213%E2%80%9318&rft_id=http%3A%2F%2Firedu.u-bourgogne.fr%2Fimages%2Fstories%2FDocuments%2FCours_disponibles%2FDemeuse%2FCours%2Fp5.3.pdf&rfr_id=info:sid\/en.wikipedia.org:Journal:Development_of_an_electronic_information_system_for_the_management_of_laboratory_data_of_tuberculosis_and_atypical_mycobacteria_at_the_Pasteur_Institute_in_C%C3%B4te_d%E2%80%99Ivoire\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ClineInform13-15\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ClineInform13_15-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Cline, G.B.; Luiz, J.M. (2013). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3570341\" data-key=\"fe9da7f3be466304ec5c93581c0600f5\">\"Information technology systems in public sector health facilities in developing countries: the case of South Africa\"<\/a>. <i>BMC Medical Informatics and Decision Making<\/i> <b>13<\/b>: 13. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1186%2F1472-6947-13-13\" data-key=\"1233d17e1a257d26e11b458d652efbd5\">10.1186\/1472-6947-13-13<\/a>. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3570341\/\" data-key=\"40e861c582cabe515490b2b7d1a7ad52\">PMC3570341<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/23347433\" data-key=\"3b33fb22f4f8d4bb7c53bdfe290b94c6\">23347433<\/a><span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3570341\" data-key=\"fe9da7f3be466304ec5c93581c0600f5\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3570341<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Information+technology+systems+in+public+sector+health+facilities+in+developing+countries%3A+the+case+of+South+Africa&rft.jtitle=BMC+Medical+Informatics+and+Decision+Making&rft.aulast=Cline%2C+G.B.%3B+Luiz%2C+J.M.&rft.au=Cline%2C+G.B.%3B+Luiz%2C+J.M.&rft.date=2013&rft.volume=13&rft.pages=13&rft_id=info:doi\/10.1186%2F1472-6947-13-13&rft_id=info:pmc\/PMC3570341&rft_id=info:pmid\/23347433&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3570341&rfr_id=info:sid\/en.wikipedia.org:Journal:Development_of_an_electronic_information_system_for_the_management_of_laboratory_data_of_tuberculosis_and_atypical_mycobacteria_at_the_Pasteur_Institute_in_C%C3%B4te_d%E2%80%99Ivoire\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<\/ol><\/div>\n<h2><span class=\"mw-headline\" 
id=\"Notes\">Notes<\/span><\/h2>\n<p>This presentation is faithful to the original, with only a few minor changes to presentation, spelling, and grammar. We also added PMCID and DOI when they were missing from the original reference. Under \"Architecture and system features,\" the original article had a poorly punctuated and confusing sentence about browser support for date and time input types; it has been updated to be more correct, with a citation added. The URL to GlassFish was updated to show the 4.x versions. The original had reference 13 in the References section, but it was never referenced in-line; it was omitted in this version.\n<\/p>\n<\/div><div class=\"printfooter\">Source: <a rel=\"external_link\" class=\"external\" 
href=\"https:\/\/www.limswiki.org\/index.php\/Journal:Development_of_an_electronic_information_system_for_the_management_of_laboratory_data_of_tuberculosis_and_atypical_mycobacteria_at_the_Pasteur_Institute_in_C%C3%B4te_d%E2%80%99Ivoire\">https:\/\/www.limswiki.org\/index.php\/Journal:Development_of_an_electronic_information_system_for_the_management_of_laboratory_data_of_tuberculosis_and_atypical_mycobacteria_at_the_Pasteur_Institute_in_C%C3%B4te_d%E2%80%99Ivoire<\/a><\/div>\n\t\t\t\t\t\t\t\t\t\t<!-- end content -->\n\t\t\t\t\t\t\t\t\t\t<div class=\"visualClear\"><\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t\t<!-- end of the left (by default at least) column -->\n\t\t<div class=\"visualClear\"><\/div>\n\t\t\t\t\t\n\t\t<\/div>\n\t\t\n\n<\/body>","625b72cffd2a8d803eb5cb58c6ef954e_images":["https:\/\/www.limswiki.org\/images\/a\/a2\/Fig1_Kon%C3%A9_JofHlthManInfo2019_6-1.png","https:\/\/www.limswiki.org\/images\/8\/8f\/Fig2_Kon%C3%A9_JofHlthManInfo2019_6-1.png","https:\/\/www.limswiki.org\/images\/f\/fe\/Fig3_Kon%C3%A9_JofHlthManInfo2019_6-1.png","https:\/\/www.limswiki.org\/images\/1\/1f\/Fig4_Kon%C3%A9_JofHlthManInfo2019_6-1.png","https:\/\/www.limswiki.org\/images\/5\/57\/Fig5_Kon%C3%A9_JofHlthManInfo2019_6-1.png"],"625b72cffd2a8d803eb5cb58c6ef954e_timestamp":1554145006,"945e3454ada339aaa7a7668d339d588c_type":"article","945e3454ada339aaa7a7668d339d588c_title":"Codesign of the Population Health Information Management System to measure reach and practice change of childhood obesity programs (Green et al. 
2018)","945e3454ada339aaa7a7668d339d588c_url":"https:\/\/www.limswiki.org\/index.php\/Journal:Codesign_of_the_Population_Health_Information_Management_System_to_measure_reach_and_practice_change_of_childhood_obesity_programs","945e3454ada339aaa7a7668d339d588c_plaintext":"\n\n\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\n\t\t\t\tJournal:Codesign of the Population Health Information Management System to measure reach and practice change of childhood obesity programs\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\tFrom LIMSWiki\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\tJump to: navigation, search\n\n\t\t\t\t\t\n\t\t\t\t\tFull article title\n \nCodesign of the Population Health Information Management System to measure reach and practice change of childhood obesity programsJournal\n \nPublic Health Research & PracticeAuthor(s)\n \nGreen, Amanda M.; Innes-Hughes, Christine; Rissel, Chris; Mitchell, Jo; Milat, Andrew J.;\r\nWilliams, Mandy; Persson, Lina; Thackway, Sarah; Lewis, Nicola; Wiggers, JohnAuthor affiliation(s)\n \nNSW Ministry of Health, University of Sydney, South Western Sydney Local Health District,\r\nHunter New England Local Health District, University of NewcastlePrimary contact\n \nEmail: Amanda dot Green at health dot nsw dot gov dot auYear published\n \n2018Volume and issue\n \n28(3)Page(s)\n \ne2831822DOI\n \n10.17061\/phrp2831822ISSN\n \n2204-2091Distribution license\n \nCreative Commons Attribution-NonCommercial-ShareAlike 4.0 InternationalWebsite\n \nhttp:\/\/www.phrp.com.au\/issues\/september-2018-volume-28-issue-3\/Download\n \nhttp:\/\/www.phrp.com.au\/wp-content\/uploads\/2018\/09\/PHRP2831822.pdf (PDF)\n\nContents\n\n1 Abstract \n2 Introduction \n3 Design and development \n\n3.1 Governance \n3.2 Development process \n3.3 Technical requirements and specifications \n3.4 User acceptance testing \n3.5 Training and deployment \n3.6 User interface and reporting \n\n\n4 Use of PHIMS \n\n4.1 System analytics \n4.2 Monitoring data \n4.3 Challenges \n4.4 Benefits \n\n\n5 
Conclusion \n6 Acknowledgements \n\n6.1 Author contributions \n6.2 Competing interests \n6.3 Peer review and provenance \n\n\n7 References \n8 Notes \n\n\n\nAbstract \nIntroduction: Childhood obesity prevalence is an issue of international public health concern, and governments have a significant role to play in its reduction. The Healthy Children Initiative (HCI) has been delivered in New South Wales (NSW), Australia, since 2011 to support implementation of childhood obesity prevention programs at scale. Consequently, a system to support local implementation and data collection, analysis, and reporting at local and state levels was necessary. The Population Health Information Management System (PHIMS) was developed to meet this need.\nDesign and development: A collaborative and iterative process was applied to the design and development of the system. The process comprised identifying technical requirements, building system infrastructure, delivering training, deploying the system, and implementing quality measures.\nUse of PHIMS: Implementation of PHIMS resulted in rapid data retrieval and reporting against agreed performance measures for the HCI. The system has 150 users who account for the monitoring and reporting of more than 6000 HCI intervention sites (early childhood services and primary schools).\nLessons learned: Developing and implementing PHIMS presented a number of complexities including: applying an information technology (IT) development methodology to a traditional health promotion setting; data access and confidentiality issues; and managing system development and deployment to intended timelines and budget. 
PHIMS was successfully codesigned as a flexible, scalable, and sustainable IT solution that supports state-wide HCI program implementation, monitoring, and reporting.\n\nIntroduction \nChildhood overweight and obesity is of international public health concern, and governments have a significant role to play in addressing the issue.[1] In New South Wales (NSW), Australia, the prevalence of childhood overweight and obesity remains high, at 21 percent.[2]\nFrom 2011 to 2014, the Australian Government implemented the National Partnership Agreement on Preventive Health, which provided a historic increase in funding to prevent chronic disease. In NSW, this coordinated prevention effort for children was delivered through the Healthy Children Initiative (HCI) by the NSW Ministry of Health (the Ministry). The HCI involves the implementation of primary and secondary obesity prevention programs across the state in settings attended by children, for example, early childhood services and primary schools.[3]\nA well-established health promotion workforce existed in NSW that had designed and implemented programs in these settings and had the potential to achieve state-wide population-level reach and outcomes. However, a significant scaling up of delivery and monitoring of these programs was required to effect population-level change. To facilitate this, enhanced funding was provided to all 15 NSW Government Local Health District (LHD) health promotion services to support local implementation of these programs.[4]\nTwo programs\u2014Munch & Move and Live Life Well @ School (LLW@S)\u2014were identified from pilot programs to be delivered at scale across NSW as part of the HCI. Munch & Move had a potential reach of more than 3,500 center-based early childhood services and more than 190,000 children aged 0\u20135 years. 
LLW@S had a potential reach of more than 2,400 primary schools with more than 675,000 students.\nInitial implementation of both Munch & Move and LLW@S involved training of educators and teachers to embed the promotion of healthy behaviors in their organizational policy and routine practice. To ensure the successful translation into routine practice, educators and teachers were supported by their local LHD through regular visits or phone calls, and they were monitored through a set of program adoption indicators referred to as \"practices.\" These evidence-based practices refer to organizational policies and practices related to nutrition, physical activity, and sedentary behavior (see Tables 1 and 2).[5][6]\n\n\n\n\n\n\n\nTable 1. Munch & Move practices\n\n\nEncouraging healthy eating\n\nLunchboxes monitored daily\n\n\nFruit and vegetables at least once per day\n\n\nOnly healthy snacks on the menu\n\n\nWater or age-appropriate drinks every day\n\n\nHealthy eating learning experiences at least twice per week\n\n\nDaily physical activity\n\nTummy time for babies every day\n\n\nPhysical activity for at least 25% of opening hours (ages 1\u20135 years)\n\n\nFundamental movement skills every day (ages 3\u20135 years)\n\n\nAppropriate use of small-screen recreation (ages 3\u20135 years)\n\n\nPolicies in place\n\nWritten nutrition policy\n\n\nWritten physical activity policy\n\n\nWritten policy restricting small-screen recreation\n\n\nProfessional development and monitoring\n\nHealth information provided to families\n\n\nNutrition and physical activity training for staff\n\n\nAnnual monitoring and reporting\n\n\n\n\n\n\n\n\n\nTable 2. 
Live Life Well @ School (LLW@S) practices\n\n\nCurriculum\n\nHealthy eating and physical activity learning experiences\n\n\nPersonal development, health and physical education includes fundamental movement skills\n\n\nEncouraging healthy eating and physical activity\n\nFruit, vegetables and water breaks\n\n\nPhysical activity during breaks\n\n\nSupportive environment for healthy eating\n\n\nCommunication with families\n\n\nProfessional development and monitoring\n\nProfessional development of staff\n\n\nSchool team supports LLW@S\n\n\nSchool plans incorporate LLW@S strategies\n\n\nAnnual monitoring and reporting\n\n\n\nConsistent with World Health Organization recommendations, a comprehensive HCI monitoring framework was developed to guide the review of program implementation in early childhood services and primary schools. This framework included HCI measures in the annual service agreements between the Ministry (the funder) and LHDs (the providers). Achievement against the measure was reviewed quarterly.\nContemporary and effective delivery of population-level health interventions requires innovative technology and fresh approaches to monitoring and reporting. Conte et al.[7] described the lack of evidence about whether an e-monitoring system improved the implementation of evidence-based preventive programs. However, because performance measures were included in the LHD service agreements, a system was needed to support the implementation of the HCI at the local level, and the collection, recording, analysis, and reporting of this data at both local and state levels. To achieve this, an information technology (IT) system called the Population Health Information Management System (PHIMS) was developed to perform these functions for both LHD and Ministry staff.\n\nDesign and development \nGovernance \nIn July 2011, a project board was formed with representatives from each of the project stakeholder groups across the Ministry and LHDs. 
The purpose of the board was to facilitate collaboration and guide the development of an overarching performance monitoring framework. A dedicated business analyst was engaged to consult with future users and to develop the business requirements document.\n\nDevelopment process \nCodesign was undertaken between the Ministry and LHDs, who contributed to the development of the business requirements and monitored the implementation of the new system. A third-party vendor was contracted to undertake the build, user acceptance testing, and deployment support.\nDevelopment involved an iterative process. The system\u2019s dual purposes (local and state-level application) were first clarified, with subsequent identification of related needs and operational priorities to be included in the business requirements document. This was completed in May 2012 and used by the solution architect and system developers to guide the design of the functional architecture. A commercial off-the-shelf solution was adapted to build a \"fit-for-purpose\" system that met the needs of stakeholders and could be integrated into an existing organizational system.\nUser acceptance testing, training of LHD staff, and deployment followed, with the system going live in August 2014. Project initiation to deployment took just over three years; the stages of development are depicted in Figure 1.\n\n Figure 1. 
Timeline Gantt chart\n\n\n\nTechnical requirements and specifications \nThe business analysis stage of the project identified two key requirements of the system: to support local delivery of HCI programs across NSW by providing contact management and scheduling capabilities for health promotion officers (HPOs); and to support the Ministry performance management framework.[8] \nThe system is a Windows-based application built on Microsoft Dynamics Customer Relationship Management (CRM), with Select Survey as the questionnaire component, SQL Server Reporting Services (SSRS) for reporting, and a state-wide login service.\n\nUser acceptance testing \nUser input and feedback were obtained from state-level program managers and through demonstration visits with LHD health promotion services. This ensured user needs were met and questions about the new system were addressed.\n\nTraining and deployment \nDeployment of PHIMS was supported by representatives from each of the 15 LHDs (referred to as \"champions\") and 11 staff within the Ministry. The business analyst conducted a two-day training course with the LHD champions and Ministry staff. Then, prior to system deployment, webinar training was conducted for each LHD, consisting of 40 sessions (lasting one to two hours each) with approximately 100 attendees. A set of tip sheets was also developed.\nThe system was deployed in six stages from August 2014, starting with a pilot in one LHD and subsequent sequential roll-out to the remaining 14 LHDs, completed in September 2014.\nData cleaning and migration took more than 12 months, running in parallel with the build, user acceptance testing, and deployment stages. Identifying and addressing data migration issues was a key factor in the successful deployment of PHIMS. Various interim data systems were in place in LHDs and needed to be integrated during the data migration process. 
Most issues were noncritical and occurred after data migration or within the first three months after deployment. Most of the system issues (550\/566, 97%) had been resolved within 12 months of deployment.\n\nUser interface and reporting \nThe user views a dashboard screen that contains a bulletin board for system announcements, website links, and real-time program adoption graphs for the Munch & Move and LLW@S programs. The system is made up of a hierarchy of forms (windows) that present information and enable the user to perform various actions.\nPHIMS has the following key features that are used by HPOs to support local implementation of HCI programs:\n1. Contact management: supports HPO workflows for managing interactions with sites (i.e., early childhood services and primary schools) and allows the user to record details in the one location. This has improved record keeping and retrieval and is a well-used function of the system with more than 54,000 entries.\n2. Capacity to record each site\u2019s training: demonstrates program reach (i.e., sites formally trained in Munch & Move or LLW@S)\n3. Scheduled follow-up alerts and recording of program adoption: alerts an HPO to when a site visit is due at one-, six- and 12-month intervals. During the visit, the HPO collects data on the program practices and enters it into the scheduled follow-up summary form (which links to the questionnaire [Select Survey] component). This data is then used for the performance monitoring reports. Other details from the visits are captured in the contact notes.\n4. Operational reporting: allows HPOs to see which sites need additional support. 
Examples of reports are summaries of training entries, scheduled follow-ups, practice achievement, and program adoption over time.\nPHIMS has the capability to support reporting at state level by monitoring program reach (e.g., number of sites trained), practice achievement, and program adoption.\nThe reports are generated in real time and allow the user to search and display data by specific criteria, for example, NSW, LHD, or local government area, or sites in disadvantaged areas.\n\nUse of PHIMS \nSystem analytics \nThe system currently has 150 users across the 15 LHDs and the Ministry. They represent the workforce in each LHD and all state-level program and performance monitoring staff, and collectively are responsible for the monitoring and reporting of more than 6000 sites.\nEducators and teachers have attended more than 24,000 training events (i.e., workshops, conferences, webinars).\nThere are 70 reports available for day-to-day operational use by LHDs and to monitor performance by the Ministry.\n\nMonitoring data \nPHIMS has provided a mechanism for reporting changes in program implementation in targeted children\u2019s settings over time. Box 1 describes the reach and adoption of the Munch & Move and LLW@S programs.\n\n\n\n\n\n\n\nBox 1. Example of how PHIMS data can be used for program monitoring and reporting\n\n\nFrom 2008 to June 2015, 89% (3288\/3691) of early childhood services across NSW participated in Munch & Move training. Since state-wide monitoring started in 2012, there has been steady growth in the number of early childhood services that have adopted Munch & Move. Adoption of the program is reported with reference to the number of services achieving 70% (or more) of the practices that are relevant for that particular service. There has been a statistically significant increase in the proportion of early childhood services that have adopted the program. 
In 2012, the total for NSW was 36% and, by 2015, this increased to 78% (p < 0.001).[5]\nFrom 2008 to June 2015, 84% (2039\/2440) of primary schools across NSW participated in LLW@S training. The proportion of trained primary schools that have achieved the practices has significantly increased from 32% in 2012 to 77% in 2015 (p < 0.001).[6]\n\n\n\n\nPHIMS has delivered an innovative IT solution[9] in a health promotion setting. Engaging in a best-practice \"agile\"[10] and collaborative development process was a significant contributor to the effectiveness and high uptake of the system.\n\nChallenges \nThere were several complexities experienced in developing and implementing PHIMS. For example, LHDs requested confidentiality around operational data. Developing data governance and reporting protocols in response to the various needs of stakeholders was a key feature.\nThe decision to modify an off-the-shelf solution offered a value-for-money, sustainable, and flexible final product. The final centralized cost was close to AU$1 million.\nNegotiating the local and state health IT environments to support important functionality in differing operating environments and to achieve single sign-on was a major undertaking.\n\nBenefits \nPHIMS appears to have multiple benefits. These include the ability to rapidly retrieve data for both operational and monitoring purposes. This has revolutionized the implementation of the HCI programs and is an important factor in the high degree of user acceptance and uptake. This high degree of acceptability contributes to the sustainability of the system.\nImportantly, PHIMS has provided an opportunity for health promotion initiatives and staff to have a \"seat at the table\" with agents of authority (such as LHD chief executives and performance management executives) by providing real-time data on progress against performance measures. 
PHIMS also provides valuable performance monitoring and feedback to those who directly implement the program.\n\nConclusion \nPHIMS was successfully codesigned to be a flexible, scalable, and sustainable IT solution that supports HCI program implementation and provides data for state-level monitoring and reporting against agreed LHD performance measures. PHIMS could potentially be expanded to include other health promotion programs such as food provision in health facilities and tobacco retailer compliance.\n\nAcknowledgements \nWe acknowledge the contributions of Liz King, Neil Orr, Louise Farrell, Bev Lloyd, Andy Bravo, Masela Draper, Deni Fukunishi, Andy Lui, Rita Lagaluga, Evan Freeman, Elena Ouspenskaia, Claudine Lyons, Kym Buffett, Rhonda Matthews, Project Advisory Board Members, Steering Committee Members, Transition Group and Reference Group Members and the Directors and staff of Local Health District Health Promotion Services across NSW.\nThis paper was developed as part of a program of research on monitoring health promotion practice within The Australian Prevention Partnership Centre. It was funded through the National Health and Medical Research Council Partnership Centre Grants Scheme (ID GNT 9100001) with the Australian Government Department of Health, the NSW Ministry of Health, ACT Health and the HCF Research Foundation.\n\nAuthor contributions \nAG and CI-H drafted the manuscript. CR, JM, AM, MW, LP, ST, NL and JW reviewed and contributed to editing the manuscript. All authors read, revised and approved the final manuscript. All authors contributed to the development and\/or implementation of PHIMS.\n\nCompeting interests \nNone declared.\n\nPeer review and provenance \nExternally peer reviewed, commissioned.\n\nReferences \n\n\n\u2191 Commission on Ending Childhood Obesity (2016). Report on the Commission on Ending Childhood Obesity. World Health Organization. pp. 50. ISBN 9789241510066. 
http:\/\/apps.who.int\/iris\/bitstream\/handle\/10665\/204176\/9789241510066_eng.pdf;jsessionid=4EA7FECA778E7E3B8C7BA4420E1B2412?sequence=1 . Retrieved 07 February 2018 .   \n\n\u2191 \"Overweight and obesity in children aged 5\u201316 years, NSW 2007 to 2017\". HealthStats NSW. NSW Government. 08 May 2018. http:\/\/www.healthstats.nsw.gov.au\/Indicator\/beh_bmikid_cat . Retrieved 10 July 2018 .   \n\n\u2191 Innes-Hughes, C.; Bravo, A.; Buffett, K. et al. (2017). NSW Healthy Children Initiative: The first five years July 2011 \u2013 June 2016. NSW Ministry of Health. pp. 43. ISBN 9781760007263. https:\/\/www.health.nsw.gov.au\/heal\/Publications\/HCI-report.pdf . Retrieved 08 February 2018 .   \n\n\u2191 \"Implementation Plan for the Healthy Children Initiative\" (PDF). Commonwealth of Australia. December 2012. http:\/\/www.federalfinancialrelations.gov.au\/content\/npa\/health\/_archive\/healthy_workers\/healthy_children\/NSW_IP_2013.pdf . Retrieved 08 February 2018 .   \n\n\u2191 5.0 5.1 Lockeridge, A.; Innes-Hughes, C.; O'Hara, B.J. et al. (2015). Munch & Move: Evidence and Evaluation Summary. NSW Ministry of Health. pp. 26. ISBN 9781760003029. https:\/\/www.health.nsw.gov.au\/heal\/Publications\/Munch-Move-Evaluation-Summary.pdf . Retrieved 08 February 2018 .   \n\n\u2191 6.0 6.1 Bravo, A.; Innes-Hughes, C.; O'Hara, B.J. et al. (2016). Live Life Well @ School: Evidence and Evaluation Summary 2008-2015. NSW Ministry of Health. pp. 31. ISBN 9781760004750. https:\/\/www.health.nsw.gov.au\/heal\/Publications\/Munch-Move-Evaluation-Summary.pdf . Retrieved 08 February 2018 .   \n\n\u2191 Conte, K.P.; Groen, S.; Loblay, V. et al. (2017). \"Dynamics behind the scale up of evidence-based obesity prevention: protocol for a multi-site case study of an electronic implementation monitoring system in health promotion practice\". Implementation Science 12 (1): 146. doi:10.1186\/s13012-017-0686-5. PMC PMC5718021. PMID 29208000. 
http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5718021 .   \n\n\u2191 Farrell, L.; Lloyd, B.; Matthews, R. et al. (2014). \"Applying a performance monitoring framework to increase reach and adoption of children's healthy eating and physical activity programs\". Public Health Research & Practice 25 (1): e2511408. doi:10.17061\/phrp2511408. PMID 25828447.   \n\n\u2191 \"2015 New South Wales iAwards Winners & Merit Recipients\". iAwards. Australian Information Industry Association. 2015. https:\/\/www.iawards.com.au\/hidden-pages\/2015winners\/winners\/nsw . Retrieved 08 February 2018 .   \n\n\u2191 Beck, K.; Beedle, M.; van Bennekum, A. et al. (2001). \"Manifesto for Agile Software Development\". Ward Cunningham. http:\/\/agilemanifesto.org\/ . Retrieved 08 February 2018 .   \n\n\nNotes \nThis presentation is faithful to the original, with only a few minor changes to presentation, spelling, and grammar. We also added PMCID and DOI when they were missing from the original reference.\n\nSource: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:Codesign_of_the_Population_Health_Information_Management_System_to_measure_reach_and_practice_change_of_childhood_obesity_programs\">https:\/\/www.limswiki.org\/index.php\/Journal:Codesign_of_the_Population_Health_Information_Management_System_to_measure_reach_and_practice_change_of_childhood_obesity_programs<\/a>\nCategories: LIMSwiki journal articles (added in 2019)LIMSwiki journal articles (all)LIMSwiki journal articles on health informaticsLIMSwiki journal articles on public health informaticsLIMSwiki journal articles on software\nThis page was last modified on 7 January 2019, at 21:59.\nThis page has been accessed 132 times.\nContent is available under a Creative Commons Attribution-ShareAlike 4.0 International License unless otherwise noted.\n\n","945e3454ada339aaa7a7668d339d588c_html":"<body class=\"mediawiki ltr sitedir-ltr ns-206 ns-subject page-Journal_Codesign_of_the_Population_Health_Information_Management_System_to_measure_reach_and_practice_change_of_childhood_obesity_programs 
skin-monobook action-view\">\n<div id=\"rdp-ebb-globalWrapper\">\n\t\t<div id=\"rdp-ebb-column-content\">\n\t\t\t<div id=\"rdp-ebb-content\" class=\"mw-body\" role=\"main\">\n\t\t\t\t<a id=\"rdp-ebb-top\"><\/a>\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t<h1 id=\"rdp-ebb-firstHeading\" class=\"firstHeading\" lang=\"en\">Journal:Codesign of the Population Health Information Management System to measure reach and practice change of childhood obesity programs<\/h1>\n\t\t\t\t\n\t\t\t\t<div id=\"rdp-ebb-bodyContent\" class=\"mw-body-content\">\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\n\t\t\t\t\t<!-- start content -->\n\t\t\t\t\t<div id=\"rdp-ebb-mw-content-text\" lang=\"en\" dir=\"ltr\" class=\"mw-content-ltr\">\n\n\n<h2><span class=\"mw-headline\" id=\"Abstract\">Abstract<\/span><\/h2>\n<p><b>Introduction<\/b>: Childhood obesity prevalence is an issue of international public health concern, and governments have a significant role to play in its reduction. The Healthy Children Initiative (HCI) has been delivered in New South Wales (NSW), Australia, since 2011 to support implementation of childhood obesity prevention programs at scale. Consequently, a system to support local implementation and data collection, analysis, and reporting at local and state levels was necessary. The Population Health Information Management System (PHIMS) was developed to meet this need.\n<\/p><p><b>Design and development<\/b>: A collaborative and iterative process was applied to the design and development of the system. The process comprised identifying technical requirements, building system infrastructure, delivering training, deploying the system, and implementing quality measures.\n<\/p><p><b>Use of PHIMS<\/b>: Implementation of PHIMS resulted in rapid data retrieval and reporting against agreed performance measures for the HCI. 
The system has 150 users who account for the monitoring and reporting of more than 6000 HCI intervention sites (early childhood services and primary schools).\n<\/p><p><b>Lessons learned<\/b>: Developing and implementing PHIMS presented a number of complexities including: applying an information technology (IT) development methodology to a traditional health promotion setting; <a href=\"https:\/\/www.limswiki.org\/index.php\/Information_security_management\" title=\"Information security management\" class=\"wiki-link\" data-key=\"153292309f5cd4eddf76eeb79c7f51a8\">data access and confidentiality issues<\/a>; and managing <a href=\"https:\/\/www.limswiki.org\/index.php\/Systems_development_life_cycle\" title=\"Systems development life cycle\" class=\"wiki-link\" data-key=\"b96939e19621960ee123770c13fa1a84\">system development and deployment<\/a> to intended timelines and budget. PHIMS was successfully codesigned as a flexible, scalable, and sustainable IT solution that supports state-wide HCI program implementation, monitoring, and reporting.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Introduction\">Introduction<\/span><\/h2>\n<p>Childhood overweight and obesity is of international public health concern, and governments have a significant role to play in addressing the issue.<sup id=\"rdp-ebb-cite_ref-WHOReport16_1-0\" class=\"reference\"><a href=\"#cite_note-WHOReport16-1\">[1]<\/a><\/sup> In New South Wales (NSW), Australia, the prevalence of childhood overweight and obesity remains high, at 21 percent.<sup id=\"rdp-ebb-cite_ref-NSWOver18_2-0\" class=\"reference\"><a href=\"#cite_note-NSWOver18-2\">[2]<\/a><\/sup>\n<\/p><p>From 2011 to 2014, the Australian Government implemented the National Partnership Agreement on Preventive Health, which provided a historic increase in funding to prevent chronic disease. 
In NSW, this coordinated prevention effort for children was delivered through the Healthy Children Initiative (HCI) by the NSW Ministry of Health (the Ministry). The HCI involves the implementation of primary and secondary obesity prevention programs across the state in settings attended by children, for example, early childhood services and primary schools.<sup id=\"rdp-ebb-cite_ref-Innes-HughesNSWHealthy17_3-0\" class=\"reference\"><a href=\"#cite_note-Innes-HughesNSWHealthy17-3\">[3]<\/a><\/sup>\n<\/p><p>A well-established health promotion workforce existed in NSW that had designed and implemented programs in these settings and had the potential to achieve state-wide population-level reach and outcomes. However, a significant scaling up of delivery and monitoring of these programs was required to effect population-level change. To facilitate this, enhanced funding was provided to all 15 NSW Government Local Health District (LHD) health promotion services to support local implementation of these programs.<sup id=\"rdp-ebb-cite_ref-AGImplement12_4-0\" class=\"reference\"><a href=\"#cite_note-AGImplement12-4\">[4]<\/a><\/sup>\n<\/p><p>Two programs\u2014<i>Munch & Move<\/i> and <i>Live Life Well @ School<\/i> (<i>LLW@S<\/i>)\u2014were identified from pilot programs to be delivered at scale across NSW as part of the HCI. <i>Munch & Move<\/i> had a potential reach of more than 3,500 center-based early childhood services and more than 190,000 children aged 0\u20135 years. <i>LLW@S<\/i> had a potential reach of more than 2,400 primary schools with more than 675,000 students.\n<\/p><p>Initial implementation of both <i>Munch & Move<\/i> and <i>LLW@S<\/i> involved training of educators and teachers to embed the promotion of healthy behaviors in their organizational policy and routine practice. 
To ensure the successful translation into routine practice, educators and teachers were supported by their local LHD through regular visits or phone calls, and they were monitored through a set of program adoption indicators referred to as \"practices.\" These evidence based practices refer to organizational policies and practices related to nutrition, physical activity, and sedentary behavior (see Tables 1 and 2).<sup id=\"rdp-ebb-cite_ref-LockeridgeMunch15_5-0\" class=\"reference\"><a href=\"#cite_note-LockeridgeMunch15-5\">[5]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-BravoLive16_6-0\" class=\"reference\"><a href=\"#cite_note-BravoLive16-6\">[6]<\/a><\/sup>\n<\/p>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table class=\"wikitable\" border=\"1\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\" colspan=\"2\"><b>Table 1.<\/b> <i>Munch & Move<\/i> practices\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\" rowspan=\"5\"><b>Encouraging healthy eating<\/b>\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Lunchboxes monitored daily\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Fruit and vegetables at least once per day\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Only healthy snacks on the menu\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Water or age-appropriate drinks every day\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Healthy eating learning experiences at least twice per week\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\" rowspan=\"4\"><b>Daily physical activity<\/b>\n<\/td>\n<td style=\"background-color:white; 
padding-left:10px; padding-right:10px;\">Tummy time for babies every day\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Physical activity for at least 25% of opening hours (ages 1\u22125 years)\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Fundamental movement skills every day (ages 3\u22125 years)\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Appropriate use of small-screen recreation (ages 3\u20135 years)\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\" rowspan=\"3\"><b>Policies in place<\/b>\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Written nutrition policy\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Written physical activity policy\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Written policy restricting small-screen recreation\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\" rowspan=\"3\"><b>Professional development and monitoring<\/b>\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Health information provided to families\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Nutrition and physical activity training for staff\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Annual monitoring and reporting\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table class=\"wikitable\" border=\"1\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\" colspan=\"2\"><b>Table 2.<\/b> <i>Live Life Well @ 
School<\/i> (<i>LLW@S<\/i>) practices\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\" rowspan=\"2\"><b>Curriculum<\/b>\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Healthy eating and physical activity learning experiences\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Personal development, health and physical education includes fundamental movement skills\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\" rowspan=\"4\"><b>Encouraging healthy eating and physical activity<\/b>\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Fruit, vegetables and water breaks\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Physical activity during breaks\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Supportive environment for healthy eating\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Communication with families\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\" rowspan=\"4\"><b>Professional development and monitoring<\/b>\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Professional development of staff\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">School team supports <i>LLW@S<\/i>\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">School plans incorporate <i>LLW@S<\/i> strategies\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Annual monitoring and reporting\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>Consistent with World Health Organization recommendations, a 
comprehensive HCI monitoring framework was developed to guide the review of program implementation in early childhood services and primary schools. This framework included HCI measures in the annual service agreements between the Ministry (the funder) and LHDs (the providers). Achievement against the measure was reviewed quarterly.\n<\/p><p>Contemporary and effective delivery of population-level health interventions requires innovative technology and fresh approaches to monitoring and reporting. Conte <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-ConteDynamics17_7-0\" class=\"reference\"><a href=\"#cite_note-ConteDynamics17-7\">[7]<\/a><\/sup> described the lack of evidence about whether an e-monitoring system improved the implementation of evidence-based preventive programs. However, because performance measures were included in the LHD service agreements, a system was needed to support the implementation of the HCI at the local level, and the collection, recording, analysis, and reporting of this data at both local and state levels. To achieve this, an information technology (IT) system called the Population Health Information Management System (PHIMS) was developed to perform these functions for both LHD and Ministry staff.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Design_and_development\">Design and development<\/span><\/h2>\n<h3><span class=\"mw-headline\" id=\"Governance\">Governance<\/span><\/h3>\n<p>In July 2011, a project board was formed with representatives from each of the project stakeholder groups across the Ministry and LHDs. The purpose of the board was to facilitate collaboration and guide the development of an overarching performance monitoring framework. 
A dedicated business analyst was engaged to consult with future users and to develop the business requirements document.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Development_process\">Development process<\/span><\/h3>\n<p>Codesign was undertaken between the Ministry and LHDs, who contributed to the development of the business requirements and monitored the implementation of the new system. A third-party vendor was contracted to undertake the build, user acceptance testing, and deployment support.\n<\/p><p>Development involved an iterative process. The system\u2019s dual purposes (local and state-level application) were first clarified, with subsequent identification of related needs and operational priorities to be included in the business requirements document. This was completed in May 2012 and used by the solution architect and system developers to guide the design of the functional architecture. A commercial off-the-shelf solution was adapted to build a \"fit-for-purpose\" system that met the needs of stakeholders and could be integrated into an existing organizational system.\n<\/p><p>User acceptance testing, training of LHD staff, and deployment followed, with the system going live in August 2014. 
Project initiation to deployment took just over three years; the stages of development are depicted in Figure 1.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig1_Green_PubHlthRsPract2018_28-3.jpg\" class=\"image wiki-link\" data-key=\"54e992ea8260f6f79cf1727278087ef3\"><img alt=\"Fig1 Green PubHlthRsPract2018 28-3.jpg\" src=\"https:\/\/www.limswiki.org\/images\/1\/1a\/Fig1_Green_PubHlthRsPract2018_28-3.jpg\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 1.<\/b> Timeline GANTT chart<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<h3><span class=\"mw-headline\" id=\"Technical_requirements_and_specifications\">Technical requirements and specifications<\/span><\/h3>\n<p>The business analysis stage of the project identified two key requirements of the system: to support local delivery of HCI programs across NSW by providing contact management and scheduling capabilities for health promotion officers (HPOs); and to support the Ministry performance management framework.<sup id=\"rdp-ebb-cite_ref-FarrellApplying14_8-0\" class=\"reference\"><a href=\"#cite_note-FarrellApplying14-8\">[8]<\/a><\/sup> \n<\/p><p>The system uses a Windows-based application that uses Microsoft Dynamics Customer Relationship Management (CRM), Select Survey (questionnaire component), SSRS (SQL Server Reporting Services) for reporting, and a state-wide login service.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"User_acceptance_testing\">User acceptance testing<\/span><\/h3>\n<p>User input and feedback was obtained from state-level program managers and through demonstration visits with LHD health promotion services. 
This ensured user needs were met and questions about the new system were addressed.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Training_and_deployment\">Training and deployment<\/span><\/h3>\n<p>Deployment of PHIMS was supported by representatives from each of the 15 LHDs (referred to as \"champions\") and 11 staff within the Ministry. The business analyst conducted a two-day training course with the LHD champions and Ministry staff. Then, prior to system deployment, webinar training was conducted for each LHD, consisting of 40 sessions (lasting one to two hours each) with approximately 100 attendees. A set of tip sheets was also developed.\n<\/p><p>The system was deployed in six stages from August 2014, starting with a pilot in one LHD and subsequent sequential roll-out to the remaining 14 LHDs, completed in September 2014.\n<\/p><p>Data cleaning and migration took more than 12 months, running in parallel with the build, user acceptance testing, and deployment stages. Identifying and addressing data migration issues was a key factor in the successful deployment of PHIMS. Various interim data systems were in place in LHDs and needed to be integrated during the data migration process. Most issues were noncritical and occurred after data migration or within the first three months after deployment. Most of the system issues (550\/566, 97%) had been resolved within 12 months of deployment.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"User_interface_and_reporting\">User interface and reporting<\/span><\/h3>\n<p>The user views a dashboard screen that contains a bulletin board for system announcements, website links, and real-time program adoption graphs for the <i>Munch & Move<\/i> and <i>LLW@S<\/i> programs. The system is made up of a hierarchy of forms (windows) that present information and enable the user to perform various actions.\n<\/p><p>PHIMS has the following key features that are used by HPOs to support local implementation of HCI programs:\n<\/p><p>1. 
<b>Contact management<\/b>: supports HPO workflows for managing interactions with sites (i.e., early childhood services and primary schools) and allows the user to record details in the one location. This has improved record keeping and retrieval and is a well-used function of the system with more than 54,000 entries.\n<\/p><p>2. <b>Capacity to record each site\u2019s training<\/b>: demonstrates program reach (i.e., sites formally trained in <i>Munch & Move<\/i> or <i>LLW@S<\/i>)\n<\/p><p>3. <b>Scheduled follow-up alerts and recording of program adoption<\/b>: alerts an HPO to when a site visit is due at one-, six- and 12-month intervals. During the visit, the HPO collects data on the program practices and enters it into the scheduled follow-up summary form (which links to the questionnaire [Select Survey] component). This data is then used for the performance monitoring reports. Other details from the visits are captured in the contact notes.\n<\/p><p>4. <b>Operational reporting<\/b>: allows HPOs to see which sites need additional support. Examples of reports are summaries of training entries, scheduled follow-ups, practice achievement, and program adoption over time.\n<\/p><p>PHIMS has the capability to support reporting at state level by monitoring program reach (e.g., number of sites trained), practice achievement, and program adoption.\n<\/p><p>The reports are generated in real time and allow the user to search and display data by specific criteria, for example, NSW, LHD, or local government area, or sites in disadvantaged areas.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Use_of_PHIMS\">Use of PHIMS<\/span><\/h2>\n<h3><span class=\"mw-headline\" id=\"System_analytics\">System analytics<\/span><\/h3>\n<p>The system currently has 150 users across the 15 LHDs and the Ministry. 
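The scheduled follow-up feature described above alerts an HPO when a site visit falls due at one-, six- and 12-month intervals after training. The interval logic comes from that feature description; the sketch below is a hypothetical illustration of the date arithmetic only, not the actual Dynamics CRM workflow, and every name in it is an assumption.

```python
# Hedged sketch of the PHIMS follow-up schedule: visits fall due one, six,
# and twelve months after a site's training date. Names and structure are
# illustrative, not the real CRM implementation.
import calendar
from datetime import date

FOLLOW_UP_MONTHS = (1, 6, 12)

def add_months(start: date, months: int) -> date:
    """Shift a date forward by whole months, clamping to the month's end."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]  # days in target month
    return date(year, month, min(start.day, last_day))

def follow_up_dates(trained_on: date) -> list[date]:
    """Due dates for the one-, six- and twelve-month follow-up visits."""
    return [add_months(trained_on, m) for m in FOLLOW_UP_MONTHS]

# A site trained at the end of August 2014 (when the system went live):
print(follow_up_dates(date(2014, 8, 31)))
# [datetime.date(2014, 9, 30), datetime.date(2015, 2, 28), datetime.date(2015, 8, 31)]
```

Clamping to the end of the month handles training dates such as the 31st, where a naive "same day next month" calculation would produce an invalid date.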
They represent the workforce in each LHD and all state-level program and performance monitoring staff, and collectively are responsible for the monitoring and reporting of more than 6000 sites.\n<\/p><p>Educators and teachers have attended more than 24,000 training events (i.e., workshops, conferences, webinars).\n<\/p><p>There are 70 reports available for day-to-day operational use by LHDs and to monitor performance by the Ministry.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Monitoring_data\">Monitoring data<\/span><\/h3>\n<p>PHIMS has provided a mechanism for reporting changes in program implementation in targeted children\u2019s settings over time. Box 1 describes the reach and adoption of the <i>Munch & Move<\/i> and <i>LLW@S<\/i> programs.\n<\/p>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table class=\"wikitable\" border=\"1\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"><b>Box 1.<\/b> Example of how PHIMS data can be used for program monitoring and reporting\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">From 2008 to June 2015, 89% (3288\/3691) of early childhood services across NSW participated in <i>Munch & Move<\/i> training. Since state-wide monitoring started in 2012, there has been steady growth in the number of early childhood services that have adopted <i>Munch & Move<\/i>. Adoption of the program is reported with reference to the number of services achieving 70% (or more) of the practices that are relevant for that particular service. There has been a statistically significant increase in the proportion of early childhood services that have adopted the program. 
In 2012, the total for NSW was 36% and, by 2015, this increased to 78% (p < 0.001).<sup id=\"rdp-ebb-cite_ref-LockeridgeMunch15_5-1\" class=\"reference\"><a href=\"#cite_note-LockeridgeMunch15-5\">[5]<\/a><\/sup>\n<p>From 2008 to June 2015, 84% (2039\/2440) of primary schools across NSW participated in <i>LLW@S<\/i> training. The proportion of trained primary schools that have achieved the practices has significantly increased from 32% in 2012 to 77% in 2015 (p < 0.001).<sup id=\"rdp-ebb-cite_ref-BravoLive16_6-1\" class=\"reference\"><a href=\"#cite_note-BravoLive16-6\">[6]<\/a><\/sup>\n<\/p>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>PHIMS has delivered an innovative IT solution<sup id=\"rdp-ebb-cite_ref-IAwards2015NewSouth_9-0\" class=\"reference\"><a href=\"#cite_note-IAwards2015NewSouth-9\">[9]<\/a><\/sup> in a health promotion setting. Engaging in a best-practice \"agile\"<sup id=\"rdp-ebb-cite_ref-AgileMani_10-0\" class=\"reference\"><a href=\"#cite_note-AgileMani-10\">[10]<\/a><\/sup> and collaborative development process was a significant contributor to the effectiveness and high uptake of the system.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Challenges\">Challenges<\/span><\/h3>\n<p>There were several complexities experienced in developing and implementing PHIMS. For example, LHDs requested confidentiality around operational data. Developing <a href=\"https:\/\/www.limswiki.org\/index.php\/Corporate_Governance_of_ICT\" title=\"Corporate Governance of ICT\" class=\"wiki-link\" data-key=\"0f34555fc2417b2aec15121328fe2860\">data governance<\/a> and reporting protocols in response to the various needs of stakeholders was a key feature.\n<\/p><p>The decision to modify an off-the-shelf solution offered a value-for-money, sustainable, and flexible final product. 
The final centralized cost was close to AU$1 million.\n<\/p><p>Negotiating the local and state health IT environments to support important functionality in differing operating environments and to achieve single sign-on was a major undertaking.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Benefits\">Benefits<\/span><\/h3>\n<p>PHIMS appears to have multiple benefits. These include the ability to rapidly retrieve data for both operational and monitoring purposes. This has revolutionized the implementation of the HCI programs and is an important factor in the high degree of user acceptance and uptake. This high degree of acceptability contributes to the sustainability of the system.\n<\/p><p>Importantly, PHIMS has provided an opportunity for health promotion initiatives and staff to have a \"seat at the table\" with agents of authority (such as LHD chief executives and performance management executives) by providing real-time data on progress against performance measures. PHIMS also provides valuable performance monitoring and feedback to those who directly implement the program.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Conclusion\">Conclusion<\/span><\/h2>\n<p>PHIMS was successfully codesigned to be a flexible, scalable, and sustainable IT solution that supports HCI program implementation and provides data for state-level monitoring and reporting against agreed LHD performance measures. 
PHIMS could potentially be expanded to include other health promotion programs such as food provision in health facilities and tobacco retailer compliance.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Acknowledgements\">Acknowledgements<\/span><\/h2>\n<p>We acknowledge the contributions of Liz King, Neil Orr, Louise Farrell, Bev Lloyd, Andy Bravo, Masela Draper, Deni Fukunishi, Andy Lui, Rita Lagaluga, Evan Freeman, Elena Ouspenskaia, Claudine Lyons, Kym Buffett, Rhonda Matthews, Project Advisory Board Members, Steering Committee Members, Transition Group and Reference Group Members and the Directors and staff of Local Health District Health Promotion Services across NSW.\n<\/p><p>This paper was developed as part of a program of research on monitoring health promotion practice within The Australian Prevention Partnership Centre. It was funded through the National Health and Medical Research Council Partnership Centre Grants Scheme (ID GNT 9100001) with the Australian Government Department of Health, the NSW Ministry of Health, ACT Health and the HCF Research Foundation.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Author_contributions\">Author contributions<\/span><\/h3>\n<p>AG and CI-H drafted the manuscript. CR, JM, AM, MW, LP, ST, NL and JW reviewed and contributed to editing the manuscript. All authors read, revised and approved the final manuscript. 
All authors contributed to the development and\/or implementation of PHIMS.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Competing_interests\">Competing interests<\/span><\/h3>\n<p>None declared.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Peer_review_and_provenance\">Peer review and provenance<\/span><\/h3>\n<p>Externally peer reviewed, commissioned.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"References\">References<\/span><\/h2>\n<div class=\"reflist references-column-width\" style=\"-moz-column-width: 30em; -webkit-column-width: 30em; column-width: 30em; list-style-type: decimal;\">\n<ol class=\"references\">\n<li id=\"cite_note-WHOReport16-1\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-WHOReport16_1-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Commission on Ending Childhood Obesity (2016). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/apps.who.int\/iris\/bitstream\/handle\/10665\/204176\/9789241510066_eng.pdf;jsessionid=4EA7FECA778E7E3B8C7BA4420E1B2412?sequence=1\" data-key=\"fd2eeda0936a3073300b314f853d33f3\"><i>Report on the Commission on Ending Childhood Obesity<\/i><\/a>. World Health Organization. pp. 50. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" data-key=\"f64947ba21e884434bd70e8d9e60bae6\">ISBN<\/a> 9789241510066<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/apps.who.int\/iris\/bitstream\/handle\/10665\/204176\/9789241510066_eng.pdf;jsessionid=4EA7FECA778E7E3B8C7BA4420E1B2412?sequence=1\" data-key=\"fd2eeda0936a3073300b314f853d33f3\">http:\/\/apps.who.int\/iris\/bitstream\/handle\/10665\/204176\/9789241510066_eng.pdf;jsessionid=4EA7FECA778E7E3B8C7BA4420E1B2412?sequence=1<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 07 February 2018<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Report+on+the+Commission+on+Ending+Childhood+Obesity&rft.aulast=Commission+on+Ending+Childhood+Obesity&rft.au=Commission+on+Ending+Childhood+Obesity&rft.date=2016&rft.pages=pp.%26nbsp%3B50&rft.pub=World+Health+Organization&rft.isbn=9789241510066&rft_id=http%3A%2F%2Fapps.who.int%2Firis%2Fbitstream%2Fhandle%2F10665%2F204176%2F9789241510066_eng.pdf%3Bjsessionid%3D4EA7FECA778E7E3B8C7BA4420E1B2412%3Fsequence%3D1&rfr_id=info:sid\/en.wikipedia.org:Journal:Codesign_of_the_Population_Health_Information_Management_System_to_measure_reach_and_practice_change_of_childhood_obesity_programs\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-NSWOver18-2\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-NSWOver18_2-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.healthstats.nsw.gov.au\/Indicator\/beh_bmikid_cat\" data-key=\"0cb469ffcba4797d9d5b42bb2bacfb9c\">\"Overweight and obesity in children aged 5\u201316 years, NSW 2007 to 2017\"<\/a>. <i>HealthStats NSW<\/i>. NSW Government. 08 May 2018<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.healthstats.nsw.gov.au\/Indicator\/beh_bmikid_cat\" data-key=\"0cb469ffcba4797d9d5b42bb2bacfb9c\">http:\/\/www.healthstats.nsw.gov.au\/Indicator\/beh_bmikid_cat<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 10 July 2018<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Overweight+and+obesity+in+children+aged+5%E2%80%9316+years%2C+NSW+2007+to+2017&rft.atitle=HealthStats+NSW&rft.date=08+May+2018&rft.pub=NSW+Government&rft_id=http%3A%2F%2Fwww.healthstats.nsw.gov.au%2FIndicator%2Fbeh_bmikid_cat&rfr_id=info:sid\/en.wikipedia.org:Journal:Codesign_of_the_Population_Health_Information_Management_System_to_measure_reach_and_practice_change_of_childhood_obesity_programs\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-Innes-HughesNSWHealthy17-3\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-Innes-HughesNSWHealthy17_3-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Innes-Hughes, C.; Bravo, A. Buffett, K. et al. (2017). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.health.nsw.gov.au\/heal\/Publications\/HCI-report.pdf\" data-key=\"bff340c38128ee60d34934dc33f574b2\"><i>NSW Healthy Children Initiative: The first five years July 2011 \u2013 June 2016<\/i><\/a>. NSW Ministry of Health. pp. 43. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" data-key=\"f64947ba21e884434bd70e8d9e60bae6\">ISBN<\/a> 9781760007263<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.health.nsw.gov.au\/heal\/Publications\/HCI-report.pdf\" data-key=\"bff340c38128ee60d34934dc33f574b2\">https:\/\/www.health.nsw.gov.au\/heal\/Publications\/HCI-report.pdf<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 08 February 2018<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=NSW+Healthy+Children+Initiative%3A+The+first+five+years+July+2011+%E2%80%93+June+2016&rft.aulast=Innes-Hughes%2C+C.%3B+Bravo%2C+A.+Buffett%2C+K.+et+al.&rft.au=Innes-Hughes%2C+C.%3B+Bravo%2C+A.+Buffett%2C+K.+et+al.&rft.date=2017&rft.pages=pp.%26nbsp%3B43&rft.pub=NSW+Ministry+of+Health&rft.isbn=9781760007263&rft_id=https%3A%2F%2Fwww.health.nsw.gov.au%2Fheal%2FPublications%2FHCI-report.pdf&rfr_id=info:sid\/en.wikipedia.org:Journal:Codesign_of_the_Population_Health_Information_Management_System_to_measure_reach_and_practice_change_of_childhood_obesity_programs\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-AGImplement12-4\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-AGImplement12_4-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.federalfinancialrelations.gov.au\/content\/npa\/health\/_archive\/healthy_workers\/healthy_children\/NSW_IP_2013.pdf\" data-key=\"b35300ad07dac503c7cc14928302fa0e\">\"Implementation Plan for The Health Children Initiative\"<\/a> (PDF). Commonwealth of Australia. December 2012<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.federalfinancialrelations.gov.au\/content\/npa\/health\/_archive\/healthy_workers\/healthy_children\/NSW_IP_2013.pdf\" data-key=\"b35300ad07dac503c7cc14928302fa0e\">http:\/\/www.federalfinancialrelations.gov.au\/content\/npa\/health\/_archive\/healthy_workers\/healthy_children\/NSW_IP_2013.pdf<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 08 February 2018<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Implementation+Plan+for+The+Health+Children+Initiative&rft.atitle=&rft.date=December+2012&rft.pub=Commonwealth+of+Australia&rft_id=http%3A%2F%2Fwww.federalfinancialrelations.gov.au%2Fcontent%2Fnpa%2Fhealth%2F_archive%2Fhealthy_workers%2Fhealthy_children%2FNSW_IP_2013.pdf&rfr_id=info:sid\/en.wikipedia.org:Journal:Codesign_of_the_Population_Health_Information_Management_System_to_measure_reach_and_practice_change_of_childhood_obesity_programs\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-LockeridgeMunch15-5\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-LockeridgeMunch15_5-0\">5.0<\/a><\/sup> <sup><a href=\"#cite_ref-LockeridgeMunch15_5-1\">5.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation book\">Lockeridge, A.; Innes-Hughes, C.; O'Hara, B.J. et al. (2015). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.health.nsw.gov.au\/heal\/Publications\/Munch-Move-Evaluation-Summary.pdf\" data-key=\"bc77260171f4d04a9bddced1107bdd31\">Munch & Move<i>: Evidence and Evaluation Summary<\/i><\/a>. NSW Ministry of Health. pp. 26. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" data-key=\"f64947ba21e884434bd70e8d9e60bae6\">ISBN<\/a> 9781760003029<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.health.nsw.gov.au\/heal\/Publications\/Munch-Move-Evaluation-Summary.pdf\" data-key=\"bc77260171f4d04a9bddced1107bdd31\">https:\/\/www.health.nsw.gov.au\/heal\/Publications\/Munch-Move-Evaluation-Summary.pdf<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 08 February 2018<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=%27%27Munch+%26+Move%27%27%3A+Evidence+and+Evaluation+Summary&rft.aulast=Lockeridge%2C+A.%3B+Innes-Hughes%2C+C.%3B+O%27Hara%2C+B.J.+et+al.&rft.au=Lockeridge%2C+A.%3B+Innes-Hughes%2C+C.%3B+O%27Hara%2C+B.J.+et+al.&rft.date=2015&rft.pages=pp.%26nbsp%3B26&rft.pub=NSW+Ministry+of+Health&rft.isbn=9781760003029&rft_id=https%3A%2F%2Fwww.health.nsw.gov.au%2Fheal%2FPublications%2FMunch-Move-Evaluation-Summary.pdf&rfr_id=info:sid\/en.wikipedia.org:Journal:Codesign_of_the_Population_Health_Information_Management_System_to_measure_reach_and_practice_change_of_childhood_obesity_programs\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BravoLive16-6\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-BravoLive16_6-0\">6.0<\/a><\/sup> <sup><a href=\"#cite_ref-BravoLive16_6-1\">6.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation book\">Bravo, A.; Innes-Hughes, C.; O'Hara, B.J. et al. (2016). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.health.nsw.gov.au\/heal\/Publications\/Munch-Move-Evaluation-Summary.pdf\" data-key=\"bc77260171f4d04a9bddced1107bdd31\">Live Life Well @ School<i>: Evidence and Evaluation Summary 2008-2015<\/i><\/a>. NSW Ministry of Health. pp. 31. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" data-key=\"f64947ba21e884434bd70e8d9e60bae6\">ISBN<\/a> 99781760004750<span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.health.nsw.gov.au\/heal\/Publications\/Munch-Move-Evaluation-Summary.pdf\" data-key=\"bc77260171f4d04a9bddced1107bdd31\">https:\/\/www.health.nsw.gov.au\/heal\/Publications\/Munch-Move-Evaluation-Summary.pdf<\/a><\/span><span class=\"reference-accessdate\">. Retrieved 08 February 2018<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=%27%27Live+Life+Well+%40+School%27%27%3A+Evidence+and+Evaluation+Summary+2008-2015&rft.aulast=Bravo%2C+A.%3B+Innes-Hughes%2C+C.%3B+O%27Hara%2C+B.J.+et+al.&rft.au=Bravo%2C+A.%3B+Innes-Hughes%2C+C.%3B+O%27Hara%2C+B.J.+et+al.&rft.date=2016&rft.pages=pp.%26nbsp%3B31&rft.pub=NSW+Ministry+of+Health&rft.isbn=99781760004750&rft_id=https%3A%2F%2Fwww.health.nsw.gov.au%2Fheal%2FPublications%2FMunch-Move-Evaluation-Summary.pdf&rfr_id=info:sid\/en.wikipedia.org:Journal:Codesign_of_the_Population_Health_Information_Management_System_to_measure_reach_and_practice_change_of_childhood_obesity_programs\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ConteDynamics17-7\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ConteDynamics17_7-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Conte, K.P.; Groen, S.; Loblay, V. et al. (2017). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5718021\" data-key=\"f8a5ac9d696dab201cdd9f5165708265\">\"Dynamics behind the scale up of evidence-based obesity prevention: protocol for a multi-site case study of an electronic implementation monitoring system in health promotion practice\"<\/a>. <i>Implementation Science<\/i> <b>12<\/b> (1): 146. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1186%2Fs13012-017-0686-5\" data-key=\"aa3bc403850c6b773f28e70d270a5240\">10.1186\/s13012-017-0686-5<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" data-key=\"c85bdffd69dd30e02024b9cc3d7679e2\">PMC<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC5718021\/\" data-key=\"425ba8695ad5a2bb5059cf0a704d9f4d\">PMC5718021<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/29208000\" data-key=\"65d8560261452eb344271b95ec3f5d38\">29208000<\/a><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5718021\" data-key=\"f8a5ac9d696dab201cdd9f5165708265\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5718021<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Dynamics+behind+the+scale+up+of+evidence-based+obesity+prevention%3A+protocol+for+a+multi-site+case+study+of+an+electronic+implementation+monitoring+system+in+health+promotion+practice&rft.jtitle=Implementation+Science&rft.aulast=Conte%2C+K.P.%3B+Groen%2C+S.%3B+Loblay%2C+V.+et+al.&rft.au=Conte%2C+K.P.%3B+Groen%2C+S.%3B+Loblay%2C+V.+et+al.&rft.date=2017&rft.volume=12&rft.issue=1&rft.pages=146&rft_id=info:doi\/10.1186%2Fs13012-017-0686-5&rft_id=info:pmc\/PMC5718021&rft_id=info:pmid\/29208000&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC5718021&rfr_id=info:sid\/en.wikipedia.org:Journal:Codesign_of_the_Population_Health_Information_Management_System_to_measure_reach_and_practice_change_of_childhood_obesity_programs\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-FarrellApplying14-8\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-FarrellApplying14_8-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Farrell, L.; Lloyd, B.; Matthews, R. et al. (2014). \"Applying a performance monitoring framework to increase reach and adoption of children's healthy eating and physical activity programs\". <i>Public Health Research & Practice<\/i> <b>25<\/b> (1): e2511408. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.17061%2Fphrp2511408\" data-key=\"c08fa86d723129bebde8f0b59aee3758\">10.17061\/phrp2511408<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" data-key=\"1d34e999f13d8801964a6b3e9d7b4e30\">PMID<\/a> <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/25828447\" data-key=\"fd9a25c8905c68f405e167b9d6fb9dff\">25828447<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Applying+a+performance+monitoring+framework+to+increase+reach+and+adoption+of+children%27s+healthy+eating+and+physical+activity+programs&rft.jtitle=Public+Health+Research+%26+Practice&rft.aulast=Farrell%2C+L.%3B+Lloyd%2C+B.%3B+Matthews%2C+R.+et+al.&rft.au=Farrell%2C+L.%3B+Lloyd%2C+B.%3B+Matthews%2C+R.+et+al.&rft.date=2014&rft.volume=25&rft.issue=1&rft.pages=e2511408&rft_id=info:doi\/10.17061%2Fphrp2511408&rft_id=info:pmid\/25828447&rfr_id=info:sid\/en.wikipedia.org:Journal:Codesign_of_the_Population_Health_Information_Management_System_to_measure_reach_and_practice_change_of_childhood_obesity_programs\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-IAwards2015NewSouth-9\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-IAwards2015NewSouth_9-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.iawards.com.au\/hidden-pages\/2015winners\/winners\/nsw\" data-key=\"7df73d9e793692b4d853f34ca0edf126\">\"2015 New South Wales iAwards Winners & Merit Recipients\"<\/a>. <i>iAwards<\/i>. 
Australian Information Industry Association. 2015<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.iawards.com.au\/hidden-pages\/2015winners\/winners\/nsw\" data-key=\"7df73d9e793692b4d853f34ca0edf126\">https:\/\/www.iawards.com.au\/hidden-pages\/2015winners\/winners\/nsw<\/a><\/span><span class=\"reference-accessdate\">. Retrieved 08 February 2018<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=2015+New+South+Wales+iAwards+Winners+%26+Merit+Recipients&rft.atitle=iAwards&rft.date=2015&rft.pub=Australian+Information+Industry+Association&rft_id=https%3A%2F%2Fwww.iawards.com.au%2Fhidden-pages%2F2015winners%2Fwinners%2Fnsw&rfr_id=info:sid\/en.wikipedia.org:Journal:Codesign_of_the_Population_Health_Information_Management_System_to_measure_reach_and_practice_change_of_childhood_obesity_programs\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-AgileMani-10\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-AgileMani_10-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Beck, K.; Beedle, M.; van Bennekum, A. et al. (2001). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/agilemanifesto.org\/\" data-key=\"6c409224d6fea45e4f0379356d757e7c\">\"Manifesto for Agile Software Development\"<\/a>. Ward Cunningham<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/agilemanifesto.org\/\" data-key=\"6c409224d6fea45e4f0379356d757e7c\">http:\/\/agilemanifesto.org\/<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 08 February 2018<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Manifesto+for+Agile+Software+Development&rft.atitle=&rft.aulast=Beck%2C+K.%3B+Beedle%2C+M.%3B+van+Bennekum%2C+A.+et+al.&rft.au=Beck%2C+K.%3B+Beedle%2C+M.%3B+van+Bennekum%2C+A.+et+al.&rft.date=2001&rft.pub=Ward+Cunningham&rft_id=http%3A%2F%2Fagilemanifesto.org%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:Codesign_of_the_Population_Health_Information_Management_System_to_measure_reach_and_practice_change_of_childhood_obesity_programs\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<\/ol><\/div>\n<h2><span class=\"mw-headline\" id=\"Notes\">Notes<\/span><\/h2>\n<p>This presentation is faithful to the original, with only a few minor changes to presentation, spelling, and grammar. We also added PMCID and DOI when they were missing from the original reference.\n<\/p>\n<\/div><div class=\"printfooter\">Source: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:Codesign_of_the_Population_Health_Information_Management_System_to_measure_reach_and_practice_change_of_childhood_obesity_programs\">https:\/\/www.limswiki.org\/index.php\/Journal:Codesign_of_the_Population_Health_Information_Management_System_to_measure_reach_and_practice_change_of_childhood_obesity_programs<\/a><\/div>\n\t\t\t\t\t\t\t\t\t\t<!-- end content -->\n\t\t\t\t\t\t\t\t\t\t<div class=\"visualClear\"><\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t\t<!-- end of the left (by default at least) column -->\n\t\t<div class=\"visualClear\"><\/div>\n\t\t\t\t\t\n\t\t<\/div>\n\t\t\n\n<\/body>","945e3454ada339aaa7a7668d339d588c_images":["https:\/\/www.limswiki.org\/images\/1\/1a\/Fig1_Green_PubHlthRsPract2018_28-3.jpg"],"945e3454ada339aaa7a7668d339d588c_timestamp":1554145005,"d400aae80e71d72278a98ceb5a2237dd_type":"article","d400aae80e71d72278a98ceb5a2237dd_title":"SCADA system testbed for cybersecurity research using machine learning approach (Teixeira et al. 
2018)","d400aae80e71d72278a98ceb5a2237dd_url":"https:\/\/www.limswiki.org\/index.php\/Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach","d400aae80e71d72278a98ceb5a2237dd_plaintext":"\n\n\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\n\t\t\t\tJournal:SCADA system testbed for cybersecurity research using machine learning approach\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\tFrom LIMSWiki\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\tJump to: navigation, search\n\n\t\t\t\t\t\n\t\t\t\t\tFull article title\n \nSCADA system testbed for cybersecurity research using machine learning approachJournal\n \nFuture InternetAuthor(s)\n \nTeixeira, Marcio Andrey; Salman, Tara; Zolanvari, Maede;\r\nJain, Raj; Meskin, Nader; Samaka, MohammedAuthor affiliation(s)\n \nFederal Institute of Education, Science, and Technology of Sao Paulo,\r\nWashington University in Saint Louis, Qatar UniversityPrimary contact\n \nEmail: marcio dot andrey at ifsp dot edu dot brYear published\n \n2018Volume and issue\n \n10(8)Page(s)\n \n76DOI\n \n10.3390\/fi10080076ISSN\n \n1999-5903Distribution license\n \nCreative Commons Attribution 4.0 InternationalWebsite\n \nhttps:\/\/www.mdpi.com\/1999-5903\/10\/8\/76\/htmDownload\n \nhttps:\/\/www.mdpi.com\/1999-5903\/10\/8\/76\/pdf (PDF)\n\n\n\n\n \n This article contains rendered mathematical formulae. You may require the TeX All the Things plugin for Chrome or the Native MathML add-on and fonts for Firefox if they don't render properly for you. 
\n\n\nContents\n\n1 Abstract \n2 Introduction \n3 Background \n\n3.1 ICS reference model \n3.2 The SCADA communication protocol \n3.3 Related works \n\n\n4 The SCADA system testbed \n5 Machine learning algorithms and performance measurements \n\n5.1 Machine learning algorithms \n5.2 Performance measurements \n\n\n6 Attack scenarios, features selection, and evaluation scenarios \n\n6.1 Attack scenarios \n6.2 Features selection \n6.3 Evaluation scenario \n\n\n7 Numerical results \n8 Conclusions \n9 Acknowledgements \n\n9.1 Author contributions \n9.2 Funding \n9.3 Conflicts of interest \n\n\n10 References \n11 Notes \n\n\n\nAbstract \nThis paper presents the development of a supervisory control and data acquisition (SCADA) system testbed used for cybersecurity research. The testbed consists of a water storage tank\u2019s control system, which is a stage in the process of water treatment and distribution. Sophisticated cyber-attacks were conducted against the testbed. During the attacks, the network traffic was captured, and features were extracted from the traffic to build a dataset for training and testing different machine learning algorithms. Five traditional machine learning algorithms were trained to detect the attacks: Random Forest, Decision Tree, Logistic Regression, Na\u00efve Bayes, and KNN. The trained models were then deployed in the network, where new tests were made using online network traffic. The performance obtained during the training and testing of the machine learning models was compared to the performance obtained during the online deployment of these models in the network. The results show the efficiency of the machine learning models in detecting the attacks in real time. 
The testbed provides a good understanding of the effects and consequences of attacks on real SCADA environments.\nKeywords: cybersecurity, machine learning, SCADA system, network security\n\nIntroduction \nSupervisory control and data acquisition (SCADA) systems are industrial control systems (ICS) widely used by industries to monitor and control different processes such as oil and gas pipelines, water distribution systems, electrical power grids, etc. These systems provide automated control and remote monitoring of services being used in daily life. For example, state and municipal governments use SCADA systems to monitor and regulate water levels in reservoirs, pipe pressure, and water distribution.\nA typical SCADA system includes components like computer workstations, a human-machine interface (HMI), programmable logic controllers (PLCs), sensors, and actuators.[1] Historically, these systems had private and dedicated networks. However, due to the widespread deployment of remote management, open IP networks (e.g., the internet) are now used for SCADA system communication.[2] This exposes SCADA systems to cyberspace and makes them vulnerable to internet-based cyber-attacks.\nMachine learning (ML) and artificial intelligence techniques have been widely used to build intelligent and efficient intrusion detection systems (IDS) dedicated to ICS. However, researchers generally develop and train their ML-based security systems using network traces obtained from publicly available datasets. Due to malware evolution and changes in attack strategies, models trained on these datasets fail to detect new types of attacks, and consequently, the benchmark datasets should be updated periodically.\nThis paper presents the deployment of a SCADA system testbed for cybersecurity research and investigates the feasibility of using ML algorithms to detect cyber-attacks in real time. The testbed was built using equipment deployed in real industrial settings. 
Sophisticated attacks were conducted on the testbed to develop a better understanding of the attacks and their consequences in SCADA environments. The network traffic was captured, including both abnormal and normal traffic. The behavior of both types of traffic was analyzed, and features were extracted to build a new SCADA-IDS dataset. This dataset was then used for training and testing ML models, which were further deployed in the network. The performance of an ML model depends heavily on the available datasets. One of the main contributions of this paper is building a new dataset updated with recent and more sophisticated attacks. We argue that an IDS using ML models trained with a dataset generated at the process control level could be more efficient, less complicated, and more cost-effective than traditional protection techniques. Five traditional machine learning algorithms were trained to detect the attacks: Random Forest, Decision Tree, Logistic Regression, Na\u00efve Bayes, and KNN. Once trained and tested, the ML models were deployed in the network, where real network traffic was used to analyze the effectiveness and efficiency of the ML models in a real-time environment. We compared the performance obtained during the training and test phases of the ML models with the performance obtained during the online deployment of these models in the network. The online deployment is another contribution of this paper, since most published papers report only the performance obtained during the training and test phases. We conducted this research to build IDS software based on ML models to be deployed in ICS\/SCADA systems.\nThe remainder of this paper is organized as follows. The next section presents a brief background of the ICS-SCADA system reference model and related works. 
Afterwards, we describe the developed SCADA system testbed, followed by the ML algorithms and the performance measurements used in this work. The last three sections present the conducted attack scenarios and the main features of the dataset used to train the algorithms, the results and the interpretations behind them, and a summary of the main points and outcomes.\n\nBackground \nIn this section, we briefly present a description of the ICS-SCADA reference model and some related works in the domain of ML algorithms for SCADA system security.\n\nICS reference model \n\"ICS\" is a general term that covers numerous control systems, including SCADA systems, distributed control systems, and other control system configurations.[3] An ICS consists of combinations of control components (e.g., electrical, mechanical, hydraulic, pneumatic) that are used to achieve various industrial objectives (e.g., manufacturing, transportation of matter or energy). Figure 1 shows an example of an ICS reference model.[4]\n\nFig. 1 Industrial control systems (ICS) reference model[4]\n\nAs can be seen from Figure 1, the ICS model is divided into four levels, from 3 to 0. Level 3 (the corporate network) consists of traditional information technology, including the general deployment of services and systems, such as file transfer, websites, mail servers, resource planning, and office automation systems. Level 2 (the supervisory control local area network) includes the functions involved in monitoring and controlling the physical processes and the general deployment of systems such as HMIs, engineering workstations, and history logs. Level 1 (the control network) includes the functions involved in sensing and manipulating physical processes, e.g., receiving the information, processing the data, and triggering outputs, which are all done in PLCs. 
Level 0 (the I/O network) consists of devices (sensors/actuators) that are directly connected to the physical process.

As shown in Figure 1, Level 3 is composed of the traditional IT infrastructure system (internet access service, file transfer protocol server, virtual private network (VPN) remote access, etc.). Levels 2, 1, and 0 represent a typical SCADA system, which is composed of the following components:

 HMI: Used to observe the status of the system or to adjust the system parameters for process control and management purposes
 Engineering workstation: Used by engineers for programming the control functions of the HMI
 History logs: Used to collect the data in real time from the automation processes for current or later analysis
 PLCs: Slave stations in the SCADA architecture that are connected to sensors or actuators

The SCADA communication protocol
There are several communication protocols developed for use in SCADA systems. These protocols define the standard message format for all inter-device communications in the network. One popular protocol, which is widely used in SCADA system environments, is the Modbus protocol.[5] Modbus is an application-layer messaging protocol that provides client/server communications between devices connected to an Ethernet network and offers services specified by function codes. The function codes tell the server what action to take. For example, a client can read the status of the discrete outputs or the values of digital inputs from the PLC, or it can read/write the data contents of a group of registers inside the PLC. Figure 2 illustrates an example of Modbus client/server communication.

Fig. 2 Modbus client/server communication example

The Modbus register address type consists of four data reference types[5][6] which are summarized in Table 1.
The “xxxx” following a leading digit represents a four-digit address location in the user data memory.

Table 1. Data reference types[6][7]
 Reference | Range       | Description
 0xxxx     | 00001–09999 | Read/Write Discrete Outputs or Coils
 1xxxx     | 10001–19999 | Read Discrete Inputs
 3xxxx     | 30001–39999 | Read Input Registers
 4xxxx     | 40001–49999 | Read/Write Output or Holding Registers

Related works
Cyber-attacks are continuously evolving and changing behavior to bypass security mechanisms. Thus, the utilization of advanced security mechanisms is essential to identify and prevent new attacks. In this sense, the development of real testbeds advances the research in this area.

Morris et al.[7] describe four datasets to be used for cybersecurity research. The datasets include network traffic, process control, and process measurement features from a set of attacks against testbeds which use the Modbus application-layer protocol. The authors argue that several datasets have been developed to train and validate IDSs for traditional information technology systems, but in the SCADA security area there is a lack of availability of and access to SCADA network traffic. In our work, a new dataset with new types of attacks was created. Once our dataset is available, it will provide a resource that researchers can use to train and validate their models and to compare their results against other datasets.

In order to investigate the security of the Modbus/TCP protocol, Miciolino et al.[8] explored a complex cyber-physical testbed, conceived for the control and monitoring of a water system. The analysis of their experimental results highlights the critical characteristics of Modbus/TCP as a popular communication protocol in ICS environments.
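To make the addressing convention of Table 1 concrete, the sketch below (our own illustrative helper, not code from the paper) maps a Modbus data-reference number such as 40001 to its register table and the zero-based address used on the wire:

```python
# Map a Modbus data-reference number (Table 1 convention) to its
# register table and the zero-based address used in protocol messages.
# Illustrative helper only; the names are our own.

TABLES = {
    0: "Coils (read/write discrete outputs)",
    1: "Discrete Inputs (read-only)",
    3: "Input Registers (read-only)",
    4: "Holding Registers (read/write)",
}

def decode_reference(ref: int):
    """Return (table_name, zero_based_address) for a reference like 40001."""
    leading, offset = divmod(ref, 10000)   # e.g., 40001 -> (4, 1)
    if leading not in TABLES or not (1 <= offset <= 9999):
        raise ValueError(f"invalid Modbus data reference: {ref}")
    return TABLES[leading], offset - 1     # wire addresses start at 0

print(decode_reference(40001))  # Holding Registers, wire address 0
print(decode_reference(10016))  # Discrete Inputs, wire address 15
```

This mirrors how a client request such as "read holding register 40001" is translated into function code plus a zero-based register address on the wire.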
Miciolino et al. concluded that, by obtaining sufficient knowledge of the system, an attacker is able to change the commands of the actuators or the sensor readings in order to achieve their malicious objectives. Obtaining knowledge of the system is the first step in attacking it; this is also known as a reconnaissance attack. Hence, in our work, our ML models are trained to recognize this kind of attack.

Rosa et al.[9] describe some practical cyber-attacks using an electricity grid testbed. This testbed consists of a hybrid environment of SCADA assets (e.g., PLCs, HMIs, process control servers) controlling an emulated power grid. The work explains their attacks and discusses some of the challenges faced by an attacker in implementing them. One of the attacks is the network reconnaissance attack. The authors argue that this kind of attack can be used not only to discover devices and types of services but also to perform fingerprinting and discover PLCs behind gateways. Hence, in our work, advanced reconnaissance attacks were carried out, and ML algorithms were used to detect them.

Keliris et al.[10] developed a process-aware supervised learning defense strategy that considers the operational behavior of an ICS to detect attacks in real time. They used a benchmark chemical process and considered several categories of attack vectors on their hardware controllers. They used their trained SVM model to detect abnormalities in real time and to distinguish between disturbances and malicious behavior as well. In our work, we used five ML algorithms to identify abnormal behavior in real time and evaluated their detection performance.

Tomin et al.[11] presented a semi-automated method for online security assessment using ML techniques. They outline their experience, obtained at the Melentiev Energy Systems Institute, Russia, in developing ML-based approaches for detecting potentially dangerous states in power systems.
Multiple ML algorithms were trained offline using a resampling cross-validation method. Then, the best model among the ML algorithms was selected based on performance and was used online. They argue that the use of ML techniques provides reliable and robust solutions that can resolve the challenges in planning and operating future industrial systems with an acceptable level of security.

Cherdantseva et al.[12] reviewed the state of the art in cybersecurity risk assessment of SCADA systems. This review indicates that, despite the popularity of machine learning techniques, research groups in ICS security have reported a lack of standard datasets for training and testing machine learning algorithms. The lack of standard datasets has resulted in an inability to develop robust ML models to detect anomalies in ICSs. Using the testbed proposed in this paper, we built a new dataset for training and testing ML algorithms.

The SCADA system testbed
In this section, we describe the configuration of our SCADA system testbed for cybersecurity research.

The purpose of our testbed is to emulate real-world industrial systems as closely as possible without replicating an entire plant or assembly system.[13] The utilization of a testbed allows us to carry out real cyber-attacks. Our testbed is dedicated to controlling a water storage tank, which is a part of the process of water treatment and distribution. The components used in our testbed are commonly used in real SCADA environments. Figure 3 shows the SCADA testbed framework for our targeted application, and Table 2 gives a brief description of the equipment used to build the testbed.

Fig. 3 The testbed framework

Table 2. Description of the devices used in the testbed
 On button: Turns on the level control process of the water storage tank
 Off button: Turns off the level control process of the water storage tank
 Light indicator: Indicates whether the system is on or off
 Level sensor 1 (LS1): Monitors the maximum water level in the tank; when the water reaches the maximum level, the sensor sends a signal to the PLC
 Level sensor 2 (LS2): Monitors the minimum water level in the tank; when the water reaches the minimum level, the sensor sends a signal to the PLC
 Valve: Controls the water level in the tank. When the water reaches the maximum level, the valve opens, and when the water reaches the minimum level, the valve closes. This logic is implemented in the PLC using the ladder language.
 Water pump 1: Fills up the water tank
 Water pump 2: Draws water from the tank when the valve is open
 PLC: Controls the physical process. The logic of the water control system is in the PLC, which receives signals from the input devices (buttons, sensors), executes the program, and sends signals to the output devices (water pumps and valve).
 HMI: Used by the administrator to monitor and control the water storage system in real time. The administrator can also display the devices' state and interact with the system through this interface.
 Data history: Used to store logs and events of the SCADA system.

As shown in Figure 3, the storage tank has two level sensors, Level Sensor 1 (LS1) and Level Sensor 2 (LS2), that monitor the water level in the tank. When the water reaches the maximum level defined in the system, LS1 sends a signal to the PLC. The PLC turns off Water Pump 1, which is used to fill up the tank, opens the valve, and turns on Water Pump 2 to draw the water from the tank.
When the water reaches the minimum level defined in the system, LS2 sends a signal to the PLC, which closes the valve, turns off Water Pump 2, and turns on Water Pump 1 to fill up the tank. This process starts over when the water level reaches LS1. The SCADA system gets data from the PLC using the Modbus communication protocol and displays them to the system operator through the HMI interface.

There are other ICS protocols which could be used instead of Modbus in our testbed. For example, DNP3 is an ICS protocol that provides some security mechanisms.[14][15] However, in recent research, Li et al.[16] reported finding 17,546 devices spread all over the world connected to the internet using the Modbus protocol; they did not count equipment that is not directly connected to the internet. Although there are other ICS protocols, many industries still use SCADA systems with the Modbus protocol because their equipment does not support other protocols. In this case, solutions that detect attacks can be cheaper than alternatives such as replacing the devices.

A Schneider PLC, model M241CE40[17], is used in our testbed to control the process of the water storage tank. The logic programming of the PLC is done using the LADDER programming language[18] (not covered in this paper). The sensors described in Table 2 are connected to the digital inputs of the PLC. The pumps and valves are connected to the outputs of the PLC.

Machine learning algorithms and performance measurements
In this section, we describe the ML algorithms used in our work as well as the measurements used to evaluate their performances.

Machine learning algorithms
ML algorithms can be classified as supervised, unsupervised, and semi-supervised. Each class has its own characteristics and applicability. The discussion of all algorithms is beyond the scope of this paper.
However, we refer the reader to Mantere et al.[19] and Ng and Jordan[20] for detailed technical discussions of these algorithms. In this paper, we use traditional ML algorithms to detect the attacks. Our target is to build supervised machine learning models, and we chose the following algorithms for attack detection and classification:

 Logistic Regression[20]
 Random Forest[21]
 Naïve Bayes[22]
 Support Vector Machine (SVM)[23]
 KNN[24]

The performance of these algorithms is discussed in the "Numerical results" section.

Performance measurements
Traditionally, the performance of ML algorithms is measured by metrics which are derived from the confusion matrix.[25] Table 3 shows the confusion matrix in the IDS context.

Table 3. Confusion matrix in the intrusion detection system (IDS) context
 Data class | Classified as normal | Classified as abnormal
 Normal     | True negative (TN)   | False positive (FP)
 Abnormal   | False negative (FN)  | True positive (TP)

In the IDS context, the following parameters are used to create the confusion matrix:

 TN: Represents the number of normal flows correctly classified as normal (e.g., normal traffic);
 TP: Represents the number of abnormal flows (attacks) correctly classified as abnormal (e.g., attack traffic);
 FP: Represents the number of normal flows incorrectly classified as abnormal; and
 FN: Represents the number of abnormal flows incorrectly classified as normal.

Next, we present several evaluation metrics and their respective formulas, which are derived from the confusion matrix parameters:

 Accuracy: The percentage of correctly predicted flows considering the total number of predictions:

  Accuracy % = (TP + TN) / (TP + TN + FP + FN) × 100

 False Alarm Rate (FAR): The percentage of the normal flows misclassified as abnormal flows (attack) by the model:

  FAR % = FP / (FP + TN) × 100

 Un-Detection Rate (UND): The fraction of the abnormal flows (attack) which are misclassified as normal flows by the model:

  UND % = FN / (FN + TP) × 100

Accuracy (as shown in the first equation) is the most frequently used metric for evaluating the performance of learning models in classification problems. However, this metric is not very reliable for evaluating ML performance in scenarios with imbalanced classes.[26] In such scenarios, one class is dominant, having far more samples than the other class. For example, in IDS scenarios, the proportion of normal flows to attack flows is very high in any realistic dataset. That is, the number of samples in the dataset which represent normal flows is enormous compared to the number of samples which represent attack flows. This problem is prevalent in scenarios where anomaly detection is crucial, such as detecting fraudulent transactions in banks, identifying rare diseases, and identifying cyber-attacks on critical infrastructure. New metrics have been developed to avoid a biased analysis.[27] So, in addition to accuracy, we also used the FAR and UND metrics.

Attack scenarios, features selection, and evaluation scenarios
In this section, we describe the attacks carried out in our testbed and the features used to build our dataset.
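For reference, the accuracy, FAR, and UND formulas translate directly into code; a minimal sketch (our own illustration, with made-up example counts):

```python
# Compute Accuracy, FAR, and UND (as percentages) from the
# confusion-matrix counts defined in Table 3. Illustrative only.

def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Correct predictions over all predictions."""
    return (tp + tn) / (tp + tn + fp + fn) * 100

def far(fp: int, tn: int) -> float:
    """False Alarm Rate: normal flows misclassified as attacks."""
    return fp / (fp + tn) * 100

def und(fn: int, tp: int) -> float:
    """Un-Detection Rate: attack flows misclassified as normal."""
    return fn / (fn + tp) * 100

# Hypothetical example: 940 normal flows (930 classified correctly)
# and 60 attack flows (57 detected).
tp, tn, fp, fn = 57, 930, 10, 3
print(round(accuracy(tp, tn, fp, fn), 2))  # 98.7
print(round(far(fp, tn), 2))               # 1.06
print(round(und(fn, tp), 2))               # 5.0
```

Note how the example illustrates the imbalance problem discussed above: accuracy looks excellent (98.7%) even though 5% of attacks go undetected.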
The resulting dataset was used for training and testing the ML algorithms, as described in the following section on numerical results.

Attack scenarios
Network attacks on SCADA systems can be divided into three categories: reconnaissance, command injection, and denial of service (DoS).[7] Our focus in this paper is on reconnaissance attacks, where the network is scanned for possible vulnerabilities to be used for later attacks. A reconnaissance attack is the first stage of any attack on a networked system. In this stage, hackers use scanning tools to inspect the topology of the victim network and identify the devices in the network as well as their vulnerabilities. Figure 4 shows our testbed attack scenario, where the dashed rectangles highlight the vulnerable spots and possible attack targets in the system.

Fig. 4 Attack scenario

Some reconnaissance attacks can be easily detected. For example, there are scanning tools which send a large number of packets per second over Modbus/TCP to the targeted device and wait for acknowledgment of the packets. If a response is received, the host (i.e., the device) is active. This attack generates a considerable variation in traffic behavior, which can be easily detected by a traditional IDS or even a traditional firewall or rule-based mechanism. Figure 5 shows an example of the traffic behavior when a scanning tool was used in our testbed.

Fig. 5 Network traffic behavior under easy-to-detect attacks

On the other hand, there are some sophisticated reconnaissance attacks which are more difficult to detect. For example, some exploits can be used to map the network in a way that results in attack behavior very similar to normal traffic. Figure 6 illustrates the network traffic behavior during such exploit attacks. As can be seen, the change in the traffic behavior is negligible under the attack. Thus, it is difficult to detect the attack.
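A simple rule-based detector for the noisy scans above could just threshold the packet rate per source. The sketch below (our own illustration; the 50 packets/second threshold is a hypothetical value) shows both why such rules catch aggressive scans and why they miss low-rate exploit traffic that looks normal:

```python
# Flag sources whose packet rate exceeds a threshold over a time window.
# Illustrative sketch; the threshold value is hypothetical.
from collections import defaultdict

RATE_THRESHOLD = 50.0  # packets per second (illustrative)

def flag_scanners(packets, window_s: float):
    """packets: iterable of (timestamp, src_ip). Returns flagged sources."""
    counts = defaultdict(int)
    for _ts, src in packets:
        counts[src] += 1
    return {src for src, n in counts.items() if n / window_s > RATE_THRESHOLD}

# A 10-second window: one aggressive scanner, one stealthy source.
window = 10.0
traffic = [(t * 0.01, "10.0.0.9") for t in range(1000)]   # ~100 pkt/s scan
traffic += [(t * 2.0, "10.0.0.5") for t in range(5)]      # ~0.5 pkt/s exploit
print(flag_scanners(traffic, window))  # {'10.0.0.9'} -- stealthy source slips through
```

The stealthy source stays far below any plausible rate threshold, which is exactly the gap the ML-based approach is meant to close.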
The use of rule-based mechanisms would fail because the signatures of the Modbus and TCP traffic do not change, and the language used to express the detection rules may not be expressive enough. On the other hand, the use of ML can improve the detection rate, as ML algorithms can be trained to detect these attack scenarios.

Fig. 6 Network traffic behavior under difficult-to-detect attacks

We conducted the reconnaissance and exploit attacks specific to the ICS environment that are described in Table 4. Details of the commands used to perform the attacks can be found in works by Calderon[28] and Mnemon et al.[29] During the attacks, the network traffic was captured to be analyzed. We used Wireshark[30] and Argus[31] to analyze the captured traffic. The captured traffic included unencrypted control information of the devices (valve, pumps, sensors) as well as information regarding their type (function codes, type of data). Table 5 presents statistical information about the captured traffic.

Table 4. Reconnaissance attacks carried out against our testbed[29][30]
 Port scanner[29]: This attack is used to identify common SCADA protocols on the network. Using the Nmap tool, packets are sent to the target at intervals which vary from one to three seconds. The TCP connection is not fully established, so the attack is difficult to detect by rules.
 Address scan attack[29]: This attack is used to scan network addresses and identify the Modbus server address. Each system has only one Modbus server, and disabling this device would collapse the whole SCADA system. Thus, this attack tries to find the unique address of the Modbus server so that it can be used for further attacks.
 Device identification attack[29]: This attack is used to enumerate the SCADA Modbus slave IDs on the network and to collect additional information, such as vendor and firmware, from the first slave ID found.
 Device identification attack (aggressive mode)[29]: This attack is similar to the previous attack. However, the scanning uses an aggressive mode, which means that the additional information is collected for all slave IDs found in the system.
 Exploit[30]: An exploit is used to read the coil values of the SCADA devices. The coils represent the ON/OFF status of the devices controlled by the PLC, such as motors, valves, and sensors.[29]

Table 5. Statistical information on the captured traffic
 Measurement                                                  | Value
 Duration of capture (h)                                      | 25
 Dataset length (GB)                                          | 1.27
 Number of observations                                       | 7,049,989
 Average data rate (kbit/s)                                   | 419
 Average packet size (bytes)                                  | 76.75
 Percentage of scanner attack                                 | 3 × 10⁻⁴
 Percentage of address scan attack                            | 75 × 10⁻⁴
 Percentage of device identification attack                   | 1 × 10⁻⁴
 Percentage of device identification attack (aggressive mode) | 4.93
 Percentage of exploit attack                                 | 1.13
 Percentage of all attacks (total)                            | 6.07
 Percentage of normal traffic                                 | 93.93

Features selection
Once the network traffic is captured, the next step is to select potential features which can distinguish the anomalous traffic from the normal traffic. Mantere et al.[19] selected 12 useful features for ML-based network security monitoring in ICS networks.
Some of those researchers later went on to further study those potential features.[32] In our work, we analyzed how the candidate features varied between normal and attack traffic and discarded those that did not vary. Based on these prior works and our studies, Table 6 shows the features selected for our dataset.

Table 6. Features selected to create the dataset
 Feature                       | Description
 Total Packets (TotPkts)       | Total transaction packet count
 Total Bytes (TotBytes)        | Total transaction bytes
 Source Packets (SrcPkts)      | Source-to-destination packet count
 Destination Packets (DstPkts) | Destination-to-source packet count
 Source Bytes (SrcBytes)       | Source-to-destination transaction bytes
 Source Port (Sport)           | Port number of the source

Evaluation scenario
After defining the dataset, the features were extracted as discussed in the previous subsection. Then, the data was labeled either as normal traffic or attack traffic. Following that, the dataset was split into training and test datasets. The training dataset was composed of 80 percent of the total data, and it was used to train our ML models. The test dataset consists of the remaining 20 percent of the data, and it was used to evaluate the performance of our trained ML models. We call this training and test phase the “offline evaluation” because the ML models were trained and tested offline. Figure 7 shows our evaluation scenario.

Fig. 7 Model evaluation

After training and testing, the trained ML models were deployed in the network, and their performance was analyzed using real network traffic. This phase was called the “online evaluation.” We compared the results obtained from the two phases (offline and online). This is described next.

Numerical results
In this section, we present the numerical results of the attacks described previously.
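As context for these results, the offline evaluation just described (label, split 80/20, train, test) can be sketched end to end. The toy flow records and the 1-nearest-neighbor classifier below are our own stand-ins for the real dataset and the five ML models:

```python
# Offline evaluation sketch: label flows, split 80/20, train, test.
# Toy data and a 1-nearest-neighbor stand-in for the real ML models.
import random

random.seed(0)

# Toy flow records: (TotPkts, TotBytes) with label 0 = normal, 1 = attack.
normal = [((random.gauss(40, 5), random.gauss(3000, 300)), 0) for _ in range(400)]
attack = [((random.gauss(120, 10), random.gauss(9000, 500)), 1) for _ in range(25)]
data = normal + attack
random.shuffle(data)

split = int(0.8 * len(data))            # 80% train / 20% test
train, test = data[:split], data[split:]

def predict(x):
    """1-nearest-neighbor over the training set (squared Euclidean)."""
    _, label = min(train, key=lambda rec: (rec[0][0] - x[0]) ** 2 +
                                          (rec[0][1] - x[1]) ** 2)
    return label

correct = sum(predict(x) == y for x, y in test)
print(f"offline accuracy: {100 * correct / len(test):.1f}%")
```

Note that the toy data mimics the imbalance of Table 5 (attacks are a small fraction of all flows), which is why accuracy alone is reported here only as a starting point.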
Figure 8 shows the accuracy results for the ML algorithms that were used.

Fig. 8 Accuracy results

As shown previously, accuracy represents the total number of correct predictions divided by the total number of samples. Considering the offline evaluations, Figure 8 shows that Decision Tree and KNN have the best accuracy (100%) compared to the other ML models. However, the difference in accuracy is small among all trained models. In other words, all chosen ML algorithms performed well in terms of accuracy during the offline phase. During the online phase, Decision Tree, Random Forest, Naïve Bayes, and Logistic Regression show only a small difference; hence, the performance of these algorithms is similar in both phases (offline and online). The same does not apply to the KNN model: there was a significant difference between the online and offline phases, which indicates that in practice KNN does not provide good accuracy.

As shown in Table 5, our dataset is unbalanced. Therefore, accuracy is not the ideal measure to evaluate performance.[33] Other metrics are needed to compare the performance of the ML algorithms. Figure 9 shows the false alarm rate (FAR) results. The FAR metric is the percentage of the regular traffic which has been misclassified as anomalous by the model.

Fig. 9 False alarm rate results

Regarding the offline and online evaluations, as shown in Figure 9, the Random Forest and Decision Tree models performed best, followed by the KNN model. These three models had the lowest false alarm percentages, followed by Logistic Regression and Naïve Bayes. These low percentages mean that Random Forest, Decision Tree, and KNN perform better at detecting normal traffic. In our dataset, normal traffic is the dominant traffic; therefore, a low FAR value is expected. This low FAR value could be due to the models' bias toward estimating the normal traffic perfectly, which is common with unbalanced datasets. Further, the clustering done in the Random Forest, Decision Tree, and KNN models can be helpful, especially when dealing with two types of data having different network features.

Figure 10 shows the results of the un-detection rate metric. The UND (as shown in the third equation, prior) represents the percentage of the traffic which is anomalous but is misclassified as normal (the opposite of the FAR). The traffic represented by this metric is more critical than the traffic represented by the FAR metric because, in this case, an attack can happen without being detected. Further, in our unbalanced dataset, the models are biased toward normal traffic, and this metric shows how biased the models are.

Fig. 10 Un-detected rate results

As shown in Figure 10, considering the offline performance results, the percentage of UND is small for the Naïve Bayes, Logistic Regression, and KNN models, and zero for the Decision Tree and Random Forest models. That is, all algorithms show excellent performance on this critical metric. However, considering the online performances, the KNN model had the worst performance, which was markedly different from its offline evaluation. The same did not happen for the other models, whose online performances were very close to their offline performances. This excellent performance also shows that the features selected in this work are very good, as they were able to detect attacks even in an unbalanced dataset.

Conclusions
This paper presents the development of a SCADA system testbed to be used in cybersecurity research. The testbed was dedicated to controlling a water storage tank, which is one of several stages in the process of water treatment and distribution. The testbed was used to analyze the effects of attacks on SCADA systems.
Using the captured network traffic, a new dataset was developed for use by researchers to train machine learning algorithms as well as to validate and compare their results with other available datasets.

Five reconnaissance attacks specific to the ICS environment were conducted against the testbed. During the attacks, the network traffic, with information about the devices (valves, pumps, sensors), was captured. Using the Wireshark and Argus network tools, features were extracted to build a dataset for training and testing machine learning algorithms.

Once the dataset was generated, five traditional machine learning algorithms were used to detect the attacks: Random Forest, Decision Tree, Logistic Regression, Naïve Bayes, and KNN. These algorithms were evaluated in two phases: during the training and testing of the machine learning models (offline), and during the deployment of these models in the network (online). The performance obtained during the online phase was compared to the performance obtained during the offline phase.

Three metrics were used to evaluate the performance of the algorithms: accuracy, FAR, and UND. Regarding the accuracy metric, in the offline phase all ML algorithms showed excellent performance. In the online phase, almost all the algorithms performed very close to their offline results; the KNN algorithm was the only one which did not perform well. Moreover, considering the unbalanced dataset and analyzing the FAR and UND metrics, we concluded that the Random Forest and Decision Tree models performed best in both phases compared to the other models.

The results show the feasibility of detecting reconnaissance attacks in ICS environments. Our future plans include generating more attacks and checking the models' feasibility and performance in different environments. Moreover, experiments using unsupervised algorithms will be conducted.

Acknowledgements
The statements made herein are solely the responsibility of the authors.
The authors would like to thank the Instituto Federal de Educação, Ciência e Tecnologia de São Paulo (IFSP), Washington University in Saint Louis, and Qatar University.

Author contributions
M.A.T. built the testbed and performed the experiments. T.S. and M.Z. assisted with revisions and improvements. The work was done under the supervision and guidance of R.J., N.M., and M.S., who also formulated the problem.

Funding
This work has been supported under grant ID NPRP 10-901-2-370, funded by the Qatar National Research Fund (QNRF), and grant #2017/01055-4 of the São Paulo Research Foundation (FAPESP).

Conflicts of interest
The authors declare no conflicts of interest.

References

1. Aragó, A.S.; Martínez, E.R.; Clares, S.S. (2014). "SCADA Laboratory and Test-bed as a Service for Critical Infrastructure Protection". Proceedings of the 2nd International Symposium on ICS & SCADA Cyber Security Research 2014: 25–9. doi:10.14236/ewic/ics-csr2014.4.
2. Communication Technologies, Inc. (October 2004). "Supervisory Control and Data Acquisition (SCADA) Systems". Technical Information Bulletin 04-1. National Communications System. https://www.cedengineering.com/userfiles/SCADA%20Systems.pdf. Retrieved 08 August 2018.
3. Filkins, B. (02 February 2016). "IT Security Spending Trends". SANS Analyst Papers. SANS Institute. https://www.sans.org/reading-room/whitepapers/analyst/membership/36697. Retrieved 05 June 2018.
4. Stouffer, K.; Pilitteri, V.; Lightman, S. et al. (May 2015). "Guide to Industrial Control Systems (ICS) Security". NIST Special Publication 800-82 Revision 2. National Institute of Standards and Technology. doi:10.6028/NIST.SP.800-82r2. https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-82r2.pdf. Retrieved 05 June 2018.
5. "Modbus Technical Resources". Modbus Organization, Inc. http://www.modbus.org/tech.php. Retrieved 05 December 2017.
6. "Modbus Application Protocol Specification V1.1b3". Modbus Organization, Inc. 26 April 2012. http://www.modbus.org/docs/Modbus_Application_Protocol_V1_1b3.pdf. Retrieved 08 August 2018.
7. Morris, T.; Wei, G. (2014). "Industrial Control System Traffic Data Sets for Intrusion Detection Research". Proceedings from the International Conference on Critical Infrastructure Protection VIII: 65–78. doi:10.1007/978-3-662-45355-1_5.
8. Miciolino, E.E.; Bernieri, G.; Pascucci, F.; Setola, R. (2015). "Communications network analysis in a SCADA system testbed under cyber-attacks". Proceedings of the 23rd Telecommunications Forum TELFOR: 341–344. doi:10.1109/TELFOR.2015.7377479.
9. Rosa, L.; Cruz, T.; Simões, P. et al. (2017). "Attacking SCADA systems: A practical perspective". IFIP/IEEE Symposium on Integrated Network and Service Management: 741–746. doi:10.23919/INM.2017.7987369.
10. Keliris, A.; Salehghaffari, H.; Cairl, B. et al. (2016). "Machine learning-based defense against process-aware attacks on Industrial Control Systems". Proceedings from the 2016 IEEE International Test Conference: 1–10. doi:10.1109/TEST.2016.7805855.
11. Tomin, N.V.; Kurbatsky, V.G.; Sidorov, D.N. et al. (2016). "Machine Learning Techniques for Power System Security Assessment". IFAC-PapersOnLine 49 (27): 445–50. doi:10.1016/j.ifacol.2016.10.773.
12. Cherdantseva, Y.; Burnap, P.; Blyth, A. et al. (2016). "A review of cyber security risk assessment methods for SCADA systems". Computers & Security 56: 1–27. doi:10.1016/j.cose.2015.09.009.
13. Candell, R.; Zimmerman, T.; Stouffer, K. (November 2015). "An Industrial Control System Cybersecurity Performance Testbed". NISTIR 8089. National Institute of Standards and Technology. doi:10.6028/NIST.IR.8089. https://nvlpubs.nist.gov/nistpubs/ir/2015/NIST.IR.8089.pdf. Retrieved 03 June 2018.
14. "Overview of the DNP3 Protocol". DNP User Group. https://www.dnp.org/Pages/AboutDefault.aspx. Retrieved 03 June 2018.
15. Darwish, I.; Igbe, O.; Saadawi et al. (2015). "Experimental and theoretical modeling of DNP3 attacks in smart grids". Proceedings from the 36th IEEE Sarnoff Symposium: 155–60. doi:10.1109/SARNOF.2015.7324661.
16. Li, Q.; Feng, X.; Wang, H. et al. (2018). "Understanding the Usage of Industrial Control System Devices on the Internet". IEEE Internet of Things Journal 5 (3): 2178–89. doi:10.1109/JIOT.2018.2826558.
17. "Modicon M241 Micro PLC - TM241CE40R". Schneider Electric. https://www.schneider-electric.us/en/product/TM241CE40R/controller-m241-40-io-relay-ethernet/. Retrieved 08 August 2018.
18. Erickson, K.T. (2011). Programmable Logic Controllers: An Emphasis on Design and Application (2nd ed.). Dogwood Valley Press. ISBN 9780976625902.
19. Mantere, M.; Uusitalo, I.; Sailio, M. et al. (2012). "Challenges of Machine Learning Based Monitoring for Industrial Control System Networks". Proceedings from the 26th International Conference on Advanced Information Networking and Applications Workshops: 968–972. doi:10.1109/WAINA.2012.135.
20. Ng, A.Y.; Jordan, M.I. (2001). "On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes". Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic: 841–48.
21. Zhang, J.; Zulkernine, M.; Haque, A. (2008). "Random-Forests-Based Network Intrusion Detection Systems". IEEE Transactions on Systems, Man, and Cybernetics, Part C 38 (5): 649–59. doi:10.1109/TSMCC.2008.923876.
22. Amor, N.B.; Benferhat, S.; Elouedi, Z. (2004). "Naive Bayes vs decision trees in intrusion detection systems". Proceedings of the 2004 ACM Symposium on Applied Computing: 420–24. doi:10.1145/967900.967989.
23. Chen, W.-H.; Hsu, S.-H.; Shen, H.-P. (2005). "Application of SVM and ANN for intrusion detection". Computers & Operations Research 32 (10): 2617–34. doi:10.1016/j.cor.2004.03.019.
24. Zhang, H.; Berg, A.C.; Maire, M. et al. (2006). "SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition". Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition: 2126–2136. doi:10.1109/CVPR.2006.301.
25. Sokolova, M.; Lapalme, G. (2009). "A systematic analysis of performance measures for classification tasks". Information Processing & Management 45 (4): 427–37. doi:10.1016/j.ipm.2009.03.002.
26. Buda, M.; Maki, A.; Mazurowski, M.A. "A systematic study of the class imbalance problem in convolutional neural networks". Neural Networks 106: 249–59. doi:10.1016/j.neunet.2018.07.011.
27. He, H.; Garcia, E.A. (2009). "Learning from Imbalanced Data". IEEE Transactions on Knowledge and Data Engineering 21 (9): 1263–84. doi:10.1109/TKDE.2008.239.
28. Calderon, P. (2017). Nmap: Network Exploration and Security Auditing Cookbook (2nd Revised ed.). Packt Publishing. ISBN 9781786467454.
29. Mnemon, E.; Soullie, A.; Torrents, A. et al. "Vulnerability & Exploit Database". Rapid7 LLC. https://www.rapid7.com/db/modules/auxiliary/scanner/scada/modbusclient. Retrieved 30 January 2017.
30. "Wireshark". Wireshark Foundation. https://www.wireshark.org/. Retrieved 20 October 2017.
31. "Argus". QoSient, LLC. https://qosient.com/argus/. Retrieved 10 November 2017.
32. Mantere, M.; Sailio, M.; Noponen, S. (2013).
"Network Traffic Features for Anomaly Detection in Specific Industrial Control System Network\". Future Internet 5 (4): 460\u201373. doi:10.3390\/fi5040460.   \n\n\u2191 Salman, T.; Bhamare, D.; Erbad, A. et al. \"Machine Learning for Anomaly Detection and Categorization in Multi-Cloud Environments\". Proceedings from the 2017 IEEE 4th International Conference on Cyber Security and Cloud Computing: 97\u2013103. doi:10.1109\/CSCloud.2017.15.   \n\n\nNotes \nThis presentation is faithful to the original, with only a few minor changes to presentation, grammar, and punctuation. In some cases important information was missing from the references, and that information was added. The Buda et al. article cited in the original has since been published fully, and the citation here is updated to reflect that.\n\nSource: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\">https:\/\/www.limswiki.org\/index.php\/Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach<\/a>\n\nCategories: LIMSwiki journal articles (added in 2019); LIMSwiki journal articles (all); LIMSwiki journal articles (with rendered math); LIMSwiki journal articles on cybersecurity; LIMSwiki journal articles on sensor networks\n\nThis page was last modified on 12 March 2019, at 18:01.\nContent is available under a Creative Commons Attribution-ShareAlike 4.0 International License unless otherwise noted.\n\n","d400aae80e71d72278a98ceb5a2237dd_html":"<body class=\"mediawiki ltr sitedir-ltr ns-206 ns-subject page-Journal_SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach skin-monobook action-view\">\n<div id=\"rdp-ebb-globalWrapper\">\n\t\t<div id=\"rdp-ebb-column-content\">\n\t\t\t<div id=\"rdp-ebb-content\" class=\"mw-body\" role=\"main\">\n\t\t\t\t<a id=\"rdp-ebb-top\"><\/a>\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t<h1 id=\"rdp-ebb-firstHeading\" class=\"firstHeading\" lang=\"en\">Journal:SCADA system testbed for cybersecurity research using machine 
learning approach<\/h1>\n\t\t\t\t\n\t\t\t\t<div id=\"rdp-ebb-bodyContent\" class=\"mw-body-content\">\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\n\t\t\t\t\t<!-- start content -->\n\t\t\t\t\t<div id=\"rdp-ebb-mw-content-text\" lang=\"en\" dir=\"ltr\" class=\"mw-content-ltr\">\n\n\n\n<h2><span class=\"mw-headline\" id=\"Abstract\">Abstract<\/span><\/h2>\n<p>This paper presents the development of a <a href=\"https:\/\/www.limswiki.org\/index.php\/Supervisory_control_and_data_acquisition\" title=\"Supervisory control and data acquisition\" class=\"mw-redirect wiki-link\" data-key=\"15a9cf66d1585180cc0c2afeb1a0f817\">supervisory control and data acquisition<\/a> (SCADA) system testbed used for <a href=\"https:\/\/www.limswiki.org\/index.php\/Cybersecurity\" title=\"Cybersecurity\" class=\"mw-redirect wiki-link\" data-key=\"ba653dc2a1384e5f9f6ac9dc1a740109\">cybersecurity<\/a> research. The testbed consists of a water storage tank\u2019s control system, which is a stage in the process of water treatment and distribution. Sophisticated cyber-attacks were conducted against the testbed. During the attacks, the network traffic was captured, and features were extracted from the traffic to build a dataset for training and testing different machine learning algorithms. Five traditional machine learning algorithms were trained to detect the attacks: Random Forest, Decision Tree, Logistic Regression, Na\u00efve Bayes, and KNN. Then, the trained machine learning models were deployed in the network, where new tests were made using online network traffic. The performance obtained during the training and testing of the machine learning models was compared to the performance obtained during the online deployment of these models in the network. The results show the efficiency of the machine learning models in detecting the attacks in real time. 
The testbed provides a good understanding of the effects and consequences of attacks on real SCADA environments.\n<\/p><p><b>Keywords<\/b>: cybersecurity, machine learning, SCADA system, network security\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Introduction\">Introduction<\/span><\/h2>\n<p>Supervisory control and data acquisition (SCADA) systems are industrial control systems (ICS) widely used by industries to monitor and control different processes such as oil and gas pipelines, water distribution systems, electrical power grids, etc. These systems provide automated control and remote monitoring of services being used in daily life. For example, state and municipal governments use SCADA systems to monitor and regulate water levels in reservoirs, pipe pressure, and water distribution.\n<\/p><p>A typical SCADA system includes components like computer workstations, a human-machine interface (HMI), programmable logic controllers (PLCs), sensors, and actuators.<sup id=\"rdp-ebb-cite_ref-Arag.C3.B3SCADA14_1-0\" class=\"reference\"><a href=\"#cite_note-Arag.C3.B3SCADA14-1\">[1]<\/a><\/sup> Historically, these systems had <a href=\"https:\/\/www.limswiki.org\/index.php\/Network_security\" title=\"Network security\" class=\"wiki-link\" data-key=\"8cb5e340f8617180886fad5cf3c252f3\">private and dedicated networks<\/a>. 
However, due to the wide-ranging deployment of remote management, open IP networks (e.g., the internet) are now used for SCADA system communication.<sup id=\"rdp-ebb-cite_ref-NCSSuper04_2-0\" class=\"reference\"><a href=\"#cite_note-NCSSuper04-2\">[2]<\/a><\/sup> This exposes SCADA systems to cyberspace and makes them vulnerable to cyber-attacks via the internet.\n<\/p><p>Machine learning (ML) and <a href=\"https:\/\/www.limswiki.org\/index.php\/Artificial_intelligence\" title=\"Artificial intelligence\" class=\"wiki-link\" data-key=\"0c45a597361ca47e1cd8112af676276e\">artificial intelligence<\/a> techniques have been widely used to build intelligent and efficient intrusion detection systems (IDS) dedicated to ICS. However, researchers generally develop and train their ML-based security systems using network traces obtained from publicly available datasets. Due to malware evolution and changes in attack strategies, models trained on these datasets may fail to detect new types of attacks; consequently, the benchmark datasets should be updated periodically.\n<\/p><p>This paper presents the deployment of a SCADA system testbed for cybersecurity research and investigates the feasibility of using ML algorithms to detect cyber-attacks in real time. The testbed was built using equipment deployed in real industrial settings. Sophisticated attacks were conducted on the testbed to develop a better understanding of the attacks and their consequences in SCADA environments. The network traffic was captured, including both abnormal and normal traffic. The behavior of both types of traffic was analyzed, and features were extracted to build a new SCADA-IDS dataset. This dataset was then used for training and testing ML models, which were subsequently deployed in the network. The performance of an ML model depends heavily on the available datasets. 
One of the main contributions of this paper is the construction of a new dataset updated with recent and more sophisticated attacks. We argue that an IDS using ML models trained with a dataset generated at the process control level could be more efficient, less complicated, and more cost-effective compared with traditional protection techniques. Five traditional machine learning algorithms were trained to detect the attacks: Random Forest, Decision Tree, Logistic Regression, Na\u00efve Bayes, and KNN. Once trained and tested, the ML models were deployed in the network, where real network traffic was used to analyze their effectiveness and efficiency in a real-time environment. We compared the performance obtained during the training and testing phases of the ML models with the performance obtained during the online deployment of these models in the network. The online deployment is another contribution of this paper, since most published papers report only the performance obtained during the training and testing phases. We conducted this research to build IDS software based on ML models to be deployed in ICS\/SCADA systems.\n<\/p><p>The remainder of this paper is organized as follows. The next section presents a brief background of the ICS-SCADA system reference model and related works. Afterwards, we describe the developed SCADA system testbed, and then we describe the ML algorithms and the performance measurements used in this work. 
The last three sections present the conducted attack scenarios and the main features of the dataset used to train the algorithms, the results and the interpretations behind them, and a summary of the main points and outcomes.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Background\">Background<\/span><\/h2>\n<p>In this section, we briefly present a description of the ICS-SCADA reference model and some related works in the domain of ML algorithms for SCADA system security.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"ICS_reference_model\">ICS reference model<\/span><\/h3>\n<p>\"ICS\" is a general term that covers numerous control systems, including SCADA systems, distributed control systems, and other control system configurations.<sup id=\"rdp-ebb-cite_ref-FilkinsITSec16_3-0\" class=\"reference\"><a href=\"#cite_note-FilkinsITSec16-3\">[3]<\/a><\/sup> An ICS consists of combinations of control components (e.g., electrical, mechanical, hydraulic, pneumatic) that are used to achieve various industrial objectives (e.g., manufacturing, transportation of matter or energy). Figure 1 shows an example of an ICS reference model.<sup id=\"rdp-ebb-cite_ref-StoufferGuide15_4-0\" class=\"reference\"><a href=\"#cite_note-StoufferGuide15-4\">[4]<\/a><\/sup>\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig1_Teixeira_FutureInternet2018_10-8.png\" class=\"image wiki-link\" data-key=\"23e3dc66342cc34fa8893644d701ea6f\"><img alt=\"Fig1 Teixeira FutureInternet2018 10-8.png\" src=\"https:\/\/www.limswiki.org\/images\/c\/c3\/Fig1_Teixeira_FutureInternet2018_10-8.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 
1<\/b> Industrial control systems (ICS) reference model<sup id=\"rdp-ebb-cite_ref-StoufferGuide15_4-1\" class=\"reference\"><a href=\"#cite_note-StoufferGuide15-4\">[4]<\/a><\/sup><\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>As can be seen from Figure 1, the ICS model is divided into four levels, from 3 to 0. Level 3 (the corporate network) consists of traditional information technology, including the general deployment of services and systems, such as file transfer, websites, mail servers, resource planning, and office automation systems. Level 2 (the supervisory control local area network) includes the functions involved in monitoring and controlling the physical processes and the general deployment of systems such as HMIs, engineering workstations, and history logs. Level 1 (the control network) includes the functions involved in sensing and manipulating physical processes, e.g., receiving the information, processing the data, and triggering outputs, which are all done in PLCs. Level 0 (the I\/O network) consists of devices (sensors\/actuators) that are directly connected to the physical process.\n<\/p><p>As shown in Figure 1, Level 3 is composed of the traditional IT infrastructure system (internet access service, file transfer protocol server, virtual private network (VPN) remote access, etc.). 
Levels 2, 1, and 0 represent a typical SCADA system, which is composed of the following components:\n<\/p>\n<ul><li> HMI: Used to observe the status of the system or to adjust the system parameters for process control and management purposes<\/li>\n<li> Engineering workstation: Used by engineers for programming the control functions of the HMI<\/li>\n<li> History logs: Used to collect the data in real time from the automation processes for current or later analysis<\/li>\n<li> PLCs: Slave stations in the SCADA architecture that are connected to sensors or actuators<\/li><\/ul>\n<h3><span class=\"mw-headline\" id=\"The_SCADA_communication_protocol\">The SCADA communication protocol<\/span><\/h3>\n<p>There are several communication protocols developed for use in SCADA systems. These protocols define the standard message format for all inter-device communications in the network. One popular protocol, which is widely used in SCADA system environments, is the Modbus protocol.<sup id=\"rdp-ebb-cite_ref-Modbus_5-0\" class=\"reference\"><a href=\"#cite_note-Modbus-5\">[5]<\/a><\/sup> Modbus is an application-layer messaging protocol that provides client\/server communication between devices connected to an Ethernet network and offers services specified by function codes. The function codes tell the server what action to take. For example, a client can read the status of the discrete outputs or the values of digital inputs from the PLC, or it can read\/write the data contents of a group of registers inside the PLC. 
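The request framing behind these function codes can be made concrete with a short sketch. The following Python fragment is illustrative only (it is not part of the original paper or the testbed software): it assembles a Modbus\/TCP request for function code 0x03 (Read Holding Registers), with the field layout taken from the Modbus specification cited above and the 4xxxx reference-to-address mapping from Table 1.

```python
import struct

def build_read_holding_registers(transaction_id: int, unit_id: int,
                                 start_address: int, quantity: int) -> bytes:
    """Assemble a Modbus/TCP ADU for function code 0x03 (Read Holding Registers).

    The MBAP header carries a transaction ID, a protocol ID (always 0 for
    Modbus), the count of remaining bytes, and the unit (slave) ID; the PDU
    is the function code followed by its data fields.
    """
    function_code = 0x03
    # Length field = unit ID (1 byte) + PDU (1-byte function + 4 bytes data)
    length = 6
    return struct.pack(
        ">HHHBBHH",        # big-endian, per the Modbus specification
        transaction_id,    # MBAP: transaction identifier
        0x0000,            # MBAP: protocol identifier (0 = Modbus)
        length,            # MBAP: number of remaining bytes
        unit_id,           # MBAP: unit identifier
        function_code,     # PDU: function code 0x03
        start_address,     # PDU: starting register address
        quantity,          # PDU: number of registers to read
    )

# A reference of the form 4xxxx (holding registers, Table 1) maps to
# protocol address (xxxx - 1): reference 40001 -> address 0.
frame = build_read_holding_registers(transaction_id=1, unit_id=1,
                                     start_address=0, quantity=2)
print(frame.hex())  # -> 000100000006010300000002
```

On the testbed network, such a frame would be sent to the PLC over TCP port 502; the sketch stops at frame construction so that the bytes can be inspected offline.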
Figure 2 illustrates an example of Modbus client\/server communication.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig2_Teixeira_FutureInternet2018_10-8.png\" class=\"image wiki-link\" data-key=\"9f942cf6078aa95dd89ccac0af923284\"><img alt=\"Fig2 Teixeira FutureInternet2018 10-8.png\" src=\"https:\/\/www.limswiki.org\/images\/8\/84\/Fig2_Teixeira_FutureInternet2018_10-8.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 2<\/b> Modbus client\/server communication example<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>The Modbus register address type consists of four data reference types<sup id=\"rdp-ebb-cite_ref-Modbus_5-1\" class=\"reference\"><a href=\"#cite_note-Modbus-5\">[5]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-MBProt_6-0\" class=\"reference\"><a href=\"#cite_note-MBProt-6\">[6]<\/a><\/sup> which are summarized in Table 1. 
The \u201cxxxx\u201d following a leading digit represents a four-digit address location in the user data memory.\n<\/p>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table class=\"wikitable\" border=\"1\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\" colspan=\"3\"><b>Table 1.<\/b> Data reference types<sup id=\"rdp-ebb-cite_ref-MBProt_6-1\" class=\"reference\"><a href=\"#cite_note-MBProt-6\">[6]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-MorrisIndust14_7-0\" class=\"reference\"><a href=\"#cite_note-MorrisIndust14-7\">[7]<\/a><\/sup>\n<\/td><\/tr>\n\n<tr>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Reference\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Range\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Description\n<\/th><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0xxxx\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">00001\u201309999\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Read\/Write Discrete Outputs or Coils\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1xxxx\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">10001\u201319999\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Read Discrete Inputs\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">3xxxx\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">30001\u201339999\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Read Input Registers\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; 
padding-right:10px;\">4xxxx\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">40001\u201349999\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Read\/Write-Output or Holding Registers\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<h3><span class=\"mw-headline\" id=\"Related_works\">Related works<\/span><\/h3>\n<p>Cyber-attacks are continuously evolving and changing their behavior to bypass security mechanisms. Thus, the utilization of advanced security mechanisms is essential to identify and prevent new attacks. In this sense, the development of real testbeds advances the research in this area.\n<\/p><p>Morris <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-MorrisIndust14_7-1\" class=\"reference\"><a href=\"#cite_note-MorrisIndust14-7\">[7]<\/a><\/sup> describe four datasets to be used for cybersecurity research. The datasets include network traffic, process control, and process measurement features from a set of attacks against testbeds that use the Modbus application-layer protocol. The authors argue there are several datasets developed to train and validate IDS associated with traditional information technology systems, but in the SCADA security area there is a lack of availability of and access to SCADA network traffic. In our work, a new dataset with new types of attacks was created. By making our dataset available, we provide a resource that researchers can use to train and validate their models and to compare their results with those from other datasets.\n<\/p><p>In order to investigate the security of the Modbus\/TCP protocol, Miciolino <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-MiciolinoComm15_8-0\" class=\"reference\"><a href=\"#cite_note-MiciolinoComm15-8\">[8]<\/a><\/sup> explored a complex cyber-physical testbed, conceived for the control and monitoring of a water system. 
The analysis of the experimental results highlights the critical characteristics of Modbus\/TCP as a popular communication protocol in ICS environments. They concluded that by obtaining sufficient knowledge of the system, an attacker is able to change the commands of the actuators or the sensor readings in order to achieve their malicious objectives. Obtaining such knowledge is the first step in attacking a system and is known as a reconnaissance attack. Hence, in our work, our ML models are trained to recognize this kind of attack.\n<\/p><p>Rosa <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-RosaAttack17_9-0\" class=\"reference\"><a href=\"#cite_note-RosaAttack17-9\">[9]<\/a><\/sup> describe some practical cyber-attacks using an electricity grid testbed. This testbed consists of a hybrid environment of SCADA assets (e.g., PLCs, HMIs, process control servers) controlling an emulated power grid. The work explains their attacks and discusses some of the challenges faced by an attacker in implementing them. One of the attacks is a network reconnaissance attack. The authors argue that this kind of attack can be used not only to discover devices and types of services but also to perform fingerprinting and discover PLCs behind the gateways. Hence, in our work, advanced reconnaissance attacks were carried out, and ML algorithms were used to detect them.\n<\/p><p>Keliris <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-KelirisMach16_10-0\" class=\"reference\"><a href=\"#cite_note-KelirisMach16-10\">[10]<\/a><\/sup> developed a process-aware supervised learning defense strategy that considers the operational behavior of an ICS to detect attacks in real time. They used a benchmark chemical process and considered several categories of attack vectors on their hardware controllers. They used their trained SVM model to detect abnormalities in real time and to distinguish between disturbances and malicious behavior. 
In our work, we used five ML algorithms to identify abnormal behavior in real time and evaluated their detection performance.\n<\/p><p>Tomin <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-TominMach16_11-0\" class=\"reference\"><a href=\"#cite_note-TominMach16-11\">[11]<\/a><\/sup> presented a semi-automated method for online security assessment using ML techniques. They outline their experience at the Melentiev Energy Systems Institute, Russia, in developing ML-based approaches for detecting potentially dangerous states in power systems. Multiple ML algorithms were trained offline using a resampling cross-validation method. Then, the best model among the ML algorithms was selected based on performance and was used online. They argue that the use of ML techniques provides reliable and robust solutions that can resolve the challenges in planning and operating future industrial systems with an acceptable level of security.\n<\/p><p>Cherdantseva <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-CherdantsevaARev16_12-0\" class=\"reference\"><a href=\"#cite_note-CherdantsevaARev16-12\">[12]<\/a><\/sup> reviewed the state of the art in cybersecurity risk assessment of SCADA systems. This review indicates that despite the popularity of machine learning techniques, research groups in ICS security have reported a lack of standard datasets for training and testing machine learning algorithms. The lack of standard datasets has resulted in an inability to develop robust ML models to detect anomalies in ICS. 
Using the testbed proposed in this paper, we built a new dataset for training and testing ML algorithms.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"The_SCADA_system_testbed\">The SCADA system testbed<\/span><\/h2>\n<p>In this section, we describe the configuration of our SCADA system testbed for cybersecurity research.\n<\/p><p>The purpose of our testbed is to emulate real-world industrial systems as closely as possible without replicating an entire plant or assembly system.<sup id=\"rdp-ebb-cite_ref-CandellAnInd15_13-0\" class=\"reference\"><a href=\"#cite_note-CandellAnInd15-13\">[13]<\/a><\/sup> The utilization of a testbed allows us to carry out real cyber-attacks. Our testbed is dedicated to controlling a water storage tank, which is a part of the process of water treatment and distribution. The components used in our testbed are commonly used in real SCADA environments. Figure 3 shows the SCADA testbed framework for our targeted application and Table 2 shows a brief description of the equipment used to build the testbed.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig3_Teixeira_FutureInternet2018_10-8.png\" class=\"image wiki-link\" data-key=\"5b035bfeca991acb42fb2f6d6d2283f4\"><img alt=\"Fig3 Teixeira FutureInternet2018 10-8.png\" src=\"https:\/\/www.limswiki.org\/images\/f\/f4\/Fig3_Teixeira_FutureInternet2018_10-8.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 
3<\/b> The testbed framework<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table class=\"wikitable\" border=\"1\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\" colspan=\"2\"><b>Table 2.<\/b> Description of the devices used in the testbed\n<\/td><\/tr>\n\n<tr>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Device\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Description\n<\/th><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">On button\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Turns on the level control process of the water storage tank\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Off button\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Turns off the level control process of the water storage tank\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Light indicator\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Indicates whether the system is on or off\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Level sensor 1 (LS1)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Monitors the maximum water level in the tank; when the water reaches the maximum level, the sensor sends a signal to the PLC\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Level sensor 2 (LS2)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Monitors the minimum water level in the tank; when the water reaches the minimum level, the sensor 
sends a signal to the PLC\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Valve\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Controls the water level in the tank. When the water reaches the maximum level, the valve opens, and when the water reaches the minimum level, the valve closes. This logic is implemented in the PLC using the ladder language.\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Water pump 1\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Fills up the water tank\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Water pump 2\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Draws water from the tank when the valve is open\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">PLC\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Controls the physical process. The logic of the water control system is in the PLC, which receives signals from the input devices (buttons, sensors), executes the program, and sends signals to the output devices (water pumps and valve).\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">HMI\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Used by the administrator to monitor and control the water storage system in real time. 
The administrator can also display the devices\u2019 state and interact with the system through this interface.\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Data history\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Used to store logs and events of the SCADA system.\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>As shown in Figure 3, the storage tank has two level sensors, Level Sensor 1 (LS1) and Level Sensor 2 (LS2), that monitor the water level in the tank. When the water reaches the maximum level defined in the system, LS1 sends a signal to the PLC. The PLC turns off Water Pump 1, used to fill up the tank, opens the valve, and turns on Water Pump 2 to draw the water from the tank. When the water reaches the minimum level defined in the system, LS2 sends a signal to the PLC, which closes the valve, turns off Water Pump 2, and turns on Water Pump 1 to fill up the tank. This process starts over when the water level reaches LS1. The SCADA system gets data from the PLC using the Modbus communication protocol and displays the data to the system operator through the HMI.\n<\/p><p>There are other ICS protocols which could be used instead of Modbus in our testbed. For example, DNP3 is an ICS protocol that provides some security mechanisms.<sup id=\"rdp-ebb-cite_ref-DNP3_14-0\" class=\"reference\"><a href=\"#cite_note-DNP3-14\">[14]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-DarwishExperi15_15-0\" class=\"reference\"><a href=\"#cite_note-DarwishExperi15-15\">[15]<\/a><\/sup> However, in recent research, Li <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-LiUnder18_16-0\" class=\"reference\"><a href=\"#cite_note-LiUnder18-16\">[16]<\/a><\/sup> reported finding 17,546 devices worldwide connected to the internet using the Modbus protocol. They did not count equipment not directly connected to the internet. 
Although there are other ICS protocols, many industries still use SCADA systems with the Modbus protocol because their equipment does not support other protocols. In this case, attack-detection solutions can be cheaper than alternatives such as replacing the devices.\n<\/p><p>A Schneider PLC, model M241CE40,<sup id=\"rdp-ebb-cite_ref-SchneiderModicon_17-0\" class=\"reference\"><a href=\"#cite_note-SchneiderModicon-17\">[17]<\/a><\/sup> is used in our testbed to control the process of the water storage tank. The logic programming of the PLC is done using the LADDER programming language<sup id=\"rdp-ebb-cite_ref-EricksonProg11_18-0\" class=\"reference\"><a href=\"#cite_note-EricksonProg11-18\">[18]<\/a><\/sup> (not covered in this paper). The sensors described in Table 2 are connected to the digital inputs of the PLC. The pumps and valves are connected to the outputs of the PLC.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Machine_learning_algorithms_and_performance_measurements\">Machine learning algorithms and performance measurements<\/span><\/h2>\n<p>In this section, we describe the ML algorithms used in our work as well as the measurements used to evaluate their performance.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Machine_learning_algorithms\">Machine learning algorithms<\/span><\/h3>\n<p>ML algorithms can be classified as supervised, unsupervised, and semi-supervised. Each class has its own characteristics and applicability. A discussion of all algorithms is beyond the scope of this paper; however, we refer the reader to Mantere <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-MantereChall12_19-0\" class=\"reference\"><a href=\"#cite_note-MantereChall12-19\">[19]<\/a><\/sup> and Ng and Jordan<sup id=\"rdp-ebb-cite_ref-NgOnDiscrim01_20-0\" class=\"reference\"><a href=\"#cite_note-NgOnDiscrim01-20\">[20]<\/a><\/sup> for detailed technical discussions of these algorithms. In this paper, we use traditional ML algorithms to detect the attacks. 
Our target is to build supervised machine learning models, and we chose the following algorithms for attack detection and classification:\n<\/p>\n<ul><li> Logistic Regression<sup id=\"rdp-ebb-cite_ref-NgOnDiscrim01_20-1\" class=\"reference\"><a href=\"#cite_note-NgOnDiscrim01-20\">[20]<\/a><\/sup><\/li>\n<li> Random Forest<sup id=\"rdp-ebb-cite_ref-ZhangRandom08_21-0\" class=\"reference\"><a href=\"#cite_note-ZhangRandom08-21\">[21]<\/a><\/sup><\/li>\n<li> Na\u00efve Bayes<sup id=\"rdp-ebb-cite_ref-AmorNaive04_22-0\" class=\"reference\"><a href=\"#cite_note-AmorNaive04-22\">[22]<\/a><\/sup><\/li>\n<li> Support Vector Machine (SVM)<sup id=\"rdp-ebb-cite_ref-ChenApp05_23-0\" class=\"reference\"><a href=\"#cite_note-ChenApp05-23\">[23]<\/a><\/sup><\/li>\n<li> k-nearest neighbors (KNN)<sup id=\"rdp-ebb-cite_ref-ZhangSVMKNN06_24-0\" class=\"reference\"><a href=\"#cite_note-ZhangSVMKNN06-24\">[24]<\/a><\/sup><\/li><\/ul>\n<p>The performance of these algorithms is discussed in the \"Numerical results\" section.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Performance_measurements\">Performance measurements<\/span><\/h3>\n<p>Traditionally, the performance of ML algorithms is measured by metrics derived from the confusion matrix.<sup id=\"rdp-ebb-cite_ref-SokolovaASys09_25-0\" class=\"reference\"><a href=\"#cite_note-SokolovaASys09-25\">[25]<\/a><\/sup> Table 3 shows the confusion matrix in the IDS context.\n<\/p>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table class=\"wikitable\" border=\"1\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\" colspan=\"3\"><b>Table 3.<\/b> Confusion matrix in the intrusion detection system (IDS) context\n<\/td><\/tr>\n\n<tr>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Data class\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Classified as normal\n<\/th>\n<th 
style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Classified as abnormal\n<\/th><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Normal\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">True negative (TN)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">False positive (FP)\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Abnormal\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">False negative (FN)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">True positive (TP)\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>In the IDS context, the following parameters are used to create the confusion matrix:\n<\/p>\n<ul><li> TN: Represents the number of normal flows correctly classified as normal (e.g., normal traffic);<\/li>\n<li> TP: Represents the number of abnormal flows (attacks) correctly classified as abnormal (e.g., attack traffic);<\/li>\n<li> FP: Represents the number of normal flows incorrectly classified as abnormal; and<\/li>\n<li> FN: Represents the number of abnormal flows incorrectly classified as normal.<\/li><\/ul>\n<p>Next, we present several evaluation metrics and their respective formulas, which are derived from the confusion matrix parameters:\n<\/p>\n<ul><li> Accuracy: The percentage of correctly predicted flows out of the total number of predictions:<br \/>Accuracy = (TP + TN) \/ (TP + TN + FP + FN)<\/li><\/ul>\n<ul><li> False Alarm Rate (FAR): The percentage of normal flows misclassified as abnormal flows (attacks) by the model:<br \/>FAR = FP \/ (FP + TN)<\/li><\/ul>\n<ul><li> Un-Detection Rate (UND): The fraction of abnormal flows (attacks) misclassified as normal flows by the model:<br \/>UND = FN \/ (FN + TP)<\/li><\/ul>\n<p>Accuracy (as shown in the first equation) is the most frequently used metric for evaluating the performance of learning models in classification problems. However, this metric is not very reliable in scenarios with imbalanced classes.<sup id=\"rdp-ebb-cite_ref-BudaASyst18_26-0\" class=\"reference\"><a href=\"#cite_note-BudaASyst18-26\">[26]<\/a><\/sup> In such scenarios, one class dominates, having far more samples than the other. For example, in IDS scenarios, the proportion of normal flows to attack flows is very high in any realistic dataset; the number of samples representing normal flows is enormous compared to the number representing attack flows. 
This problem is prevalent in scenarios where anomaly detection is crucial, such as fraudulent transactions in banks, identification of rare diseases, and in the identification of cyber-attacks in critical infrastructure. New metrics have been developed to avoid a biased analysis.<sup id=\"rdp-ebb-cite_ref-HeLearning09_27-0\" class=\"reference\"><a href=\"#cite_note-HeLearning09-27\">[27]<\/a><\/sup> So, in addition to the accuracy, we also used the FAR and UND metrics.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Attack_scenarios.2C_features_selection.2C_and_evaluation_scenarios\">Attack scenarios, features selection, and evaluation scenarios<\/span><\/h2>\n<p>In this section, we describe the attacks carried out in our testbed and the features used to build our dataset. This dataset was used for training and testing the ML algorithms, as described in the following section on numerical results.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Attack_scenarios\">Attack scenarios<\/span><\/h3>\n<p>Network attacks on SCADA systems can be divided into three categories: reconnaissance, command injection, and denial of service (DoS).<sup id=\"rdp-ebb-cite_ref-MorrisIndust14_7-2\" class=\"reference\"><a href=\"#cite_note-MorrisIndust14-7\">[7]<\/a><\/sup> Our focus in this paper is on the reconnaissance attacks where the network is scanned for possible vulnerabilities to be used for later attacks. A reconnaissance attack is the first stage of any attack on a networking system. In this stage, hackers use scan tools to inspect the topology of the victim network and identify the devices in the network as well as their vulnerabilities. 
Figure 4 shows our testbed attack scenario where the dashed rectangles highlight the vulnerable spots and possible attack targets in the system.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig4_Teixeira_FutureInternet2018_10-8.png\" class=\"image wiki-link\" data-key=\"b5f56a6c86da188b2ddbdf4a97a17106\"><img alt=\"Fig4 Teixeira FutureInternet2018 10-8.png\" src=\"https:\/\/www.limswiki.org\/images\/9\/9f\/Fig4_Teixeira_FutureInternet2018_10-8.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 4<\/b> Attack scenario<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>Some reconnaissance attacks can be easily detected. For example, there are scanning tools which send a large number of packets per second under Modbus\/TCP to the targeted device and wait for acknowledgment of the packets from them. If a response is received, the host (i.e., the device) is active. This attack generates a considerable variation in the traffic behavior which can be easily detected by the traditional IDS or even the traditional firewall or rule-based mechanisms. 
Figure 5 shows an example of the traffic behavior when a scanning tool was used in our testbed.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig5_Teixeira_FutureInternet2018_10-8.png\" class=\"image wiki-link\" data-key=\"e98d99770d859b9a1aa3a1d304be0cca\"><img alt=\"Fig5 Teixeira FutureInternet2018 10-8.png\" src=\"https:\/\/www.limswiki.org\/images\/2\/2e\/Fig5_Teixeira_FutureInternet2018_10-8.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 5<\/b> Network traffic behavior under easy-to-detect attacks<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>However, some sophisticated reconnaissance attacks are more difficult to detect. For example, some exploits can be used to map the network, which results in attack behavior very similar to normal traffic. Figure 6 illustrates the network traffic behavior during such exploit attacks. As can be seen, the change in traffic behavior under the attack is negligible, making the attack difficult to detect. The use of rule-based mechanisms would fail because the signatures of the Modbus and TCP traffic do not change, and the language used to express the detection rules may not be expressive enough. 
On the other hand, the use of ML can improve the detection rate as ML algorithms can be trained to detect these attack scenarios.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig6_Teixeira_FutureInternet2018_10-8.png\" class=\"image wiki-link\" data-key=\"7b6784ca90b4bde638062de54f70e081\"><img alt=\"Fig6 Teixeira FutureInternet2018 10-8.png\" src=\"https:\/\/www.limswiki.org\/images\/6\/6a\/Fig6_Teixeira_FutureInternet2018_10-8.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 6<\/b> Network traffic behavior under difficult to detect attacks<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>We conducted the following reconnaissance and exploit attacks specific to the ICS environment described in Table 4. Details of the commands used to perform the attacks can be found in works by Calderon<sup id=\"rdp-ebb-cite_ref-CalderonNmap17_28-0\" class=\"reference\"><a href=\"#cite_note-CalderonNmap17-28\">[28]<\/a><\/sup> and Mnemon <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-MnemonVuln_29-0\" class=\"reference\"><a href=\"#cite_note-MnemonVuln-29\">[29]<\/a><\/sup> During the attacks, the network traffic was captured to be analyzed. We used Wireshark<sup id=\"rdp-ebb-cite_ref-Wireshark_30-0\" class=\"reference\"><a href=\"#cite_note-Wireshark-30\">[30]<\/a><\/sup> and Argus<sup id=\"rdp-ebb-cite_ref-Argus_31-0\" class=\"reference\"><a href=\"#cite_note-Argus-31\">[31]<\/a><\/sup> to analyze the captured traffic. The captured traffic included unencrypted control information of the devices (valve, pumps, sensors) as well as information regarding their type (function codes, type of data). 
Table 5 presents statistical information about the captured traffic.\n<\/p>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table class=\"wikitable\" border=\"1\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\" colspan=\"2\"><b>Table 4.<\/b> Reconnaissance attacks carried out against our testbed<sup id=\"rdp-ebb-cite_ref-MnemonVuln_29-1\" class=\"reference\"><a href=\"#cite_note-MnemonVuln-29\">[29]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-Wireshark_30-1\" class=\"reference\"><a href=\"#cite_note-Wireshark-30\">[30]<\/a><\/sup>\n<\/td><\/tr>\n\n<tr>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Attack name\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Attack description\n<\/th><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Port scanner<sup id=\"rdp-ebb-cite_ref-MnemonVuln_29-2\" class=\"reference\"><a href=\"#cite_note-MnemonVuln-29\">[29]<\/a><\/sup>\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">This attack is used to identify common SCADA protocols on the network. Using Nmap tool, packets are sent to the target at intervals which vary from one to three seconds. The TCP connection is not fully established so that the attack is difficult to detect by rules.\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Address scan attack<sup id=\"rdp-ebb-cite_ref-MnemonVuln_29-3\" class=\"reference\"><a href=\"#cite_note-MnemonVuln-29\">[29]<\/a><\/sup>\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">This attack is used to scan network addresses and identify the Modbus server address. Each system has only one Modbus server, and disabling this device would collapse the whole SCADA system. 
Thus, this attack tries to find the unique address of the Modbus server so that it can be used for further attacks.\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Device identification attack<sup id=\"rdp-ebb-cite_ref-MnemonVuln_29-4\" class=\"reference\"><a href=\"#cite_note-MnemonVuln-29\">[29]<\/a><\/sup>\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">This attack is used to enumerate the SCADA Modbus slave IDs on the network and to collect additional information such as vendor and firmware from the first slave ID found.\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Device identification attack (aggressive mode)<sup id=\"rdp-ebb-cite_ref-MnemonVuln_29-5\" class=\"reference\"><a href=\"#cite_note-MnemonVuln-29\">[29]<\/a><\/sup>\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">This attack is similar to the previous attack. However, the scanning uses an aggressive mode, which means that the additional information about all slave IDs found in the system is collected.\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Exploit<sup id=\"rdp-ebb-cite_ref-Wireshark_30-2\" class=\"reference\"><a href=\"#cite_note-Wireshark-30\">[30]<\/a><\/sup>\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Exploit is used to read the coil values of the SCADA devices. 
The coils represent the ON\/OFF status of the devices controlled by the PLC, such as motors, valves, and sensors.<sup id=\"rdp-ebb-cite_ref-MnemonVuln_29-6\" class=\"reference\"><a href=\"#cite_note-MnemonVuln-29\">[29]<\/a><\/sup>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table class=\"wikitable\" border=\"1\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\" colspan=\"2\"><b>Table 5.<\/b> Statistical information on the captured traffic\n<\/td><\/tr>\n\n<tr>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Measurement\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Value\n<\/th><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Duration of capture (h)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">25\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Dataset length (GB)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.27\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Number of observations\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">7,049,989\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Average data rate (kbit\/s)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">419\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Average packet size (bytes)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">76.75\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Percentage of scanner 
attack\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">3 \u00d7 10<sup>\u22124<\/sup>\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Percentage of address scan attack\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">75 \u00d7 10<sup>\u22124<\/sup>\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Percentage of device identification attack\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1 \u00d7 10<sup>\u22124<\/sup>\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Percentage of device identification attack (aggressive mode)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">4.93\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Percentage of exploit attack\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.13\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Percentage of all attacks (total)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">6.07\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Percentage of normal traffic\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">93.93\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<h3><span class=\"mw-headline\" id=\"Features_selection\">Features selection<\/span><\/h3>\n<p>Once the network traffic is captured, the next step is to select potential features which can distinguish the anomalous traffic from the normal traffic. 
Mantere <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-MantereChall12_19-1\" class=\"reference\"><a href=\"#cite_note-MantereChall12-19\">[19]<\/a><\/sup> selected 12 useful features for ML-based network security monitoring in ICS networks. Some of those researchers later went on to further study those potential features.<sup id=\"rdp-ebb-cite_ref-MantereNetwork13_32-0\" class=\"reference\"><a href=\"#cite_note-MantereNetwork13-32\">[32]<\/a><\/sup> In our work, we analyzed how the features varied between normal and attack traffic, and we excluded those features that did not vary between the two. Based on these prior works and our studies, Table 6 shows the features selected for our dataset.\n<\/p>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table class=\"wikitable\" border=\"1\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\" colspan=\"2\"><b>Table 6.<\/b> Features selected to create the dataset\n<\/td><\/tr>\n\n<tr>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Features\n<\/th>\n<th style=\"background-color:#e2e2e2; padding-left:10px; padding-right:10px;\">Descriptions\n<\/th><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Total Packets (TotPkts)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Total transaction packet count\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Total Bytes (TotBytes)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Total transaction bytes\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Source packets (SrcPkts)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Source\/Destination packet count\n<\/td><\/tr>\n<tr>\n<td 
style=\"background-color:white; padding-left:10px; padding-right:10px;\">Destination Packets (DstPkts)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Destination\/Source packet count\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Source Bytes (SrcBytes)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Source\/Destination transaction bytes\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Source Port (Sport)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Port number of the source\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<h3><span class=\"mw-headline\" id=\"Evaluation_scenario\">Evaluation scenario<\/span><\/h3>\n<p>After defining the dataset, the features were extracted as discussed in the previous subsection. Then, the data was labeled either as normal traffic or attack traffic. Following that, the dataset was split into training and test datasets. The training dataset was composed of 80 percent of the total data, and it was used to train our ML model. The test dataset consists of the remaining 20 percent of the data, and it was used to evaluate the performance of our trained ML model. We call this training and test phase \u201coffline evaluation\u201d because the ML models were trained and tested offline. 
Figure 7 shows our evaluation scenario.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig7_Teixeira_FutureInternet2018_10-8.png\" class=\"image wiki-link\" data-key=\"7f536eb6612605077d99405b6195e07d\"><img alt=\"Fig7 Teixeira FutureInternet2018 10-8.png\" src=\"https:\/\/www.limswiki.org\/images\/6\/63\/Fig7_Teixeira_FutureInternet2018_10-8.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 7<\/b> Model evaluation<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>After training and testing, the trained ML models were created and deployed in the network. Then, their performance was analyzed using real network traffic. This phase was called \u201conline evaluation.\u201d We compared the results obtained from the two phases (offline and online). This is described next.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Numerical_results\">Numerical results<\/span><\/h2>\n<p>In this section, we present the numerical results of the attacks described previously. 
Figure 8 shows the results for the accuracy of the ML algorithms that were used.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig8_Teixeira_FutureInternet2018_10-8.png\" class=\"image wiki-link\" data-key=\"40248ae79ea4980a940aa6e8769225a4\"><img alt=\"Fig8 Teixeira FutureInternet2018 10-8.png\" src=\"https:\/\/www.limswiki.org\/images\/7\/7a\/Fig8_Teixeira_FutureInternet2018_10-8.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 8<\/b> Accuracy results<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>As shown previously, the accuracy represents the total number of correct predictions divided by the total number of samples. Considering the offline evaluations, Figure 8 shows that Decision Tree and KNN have the best accuracy (100%) compared to the other ML models. However, the difference in accuracy is small among all trained models; in other words, all chosen ML algorithms performed well in terms of accuracy during the offline phase. During the online phase, Decision Tree, Random Forest, Na\u00efve Bayes, and Logistic Regression show a small difference; hence, the performance of these algorithms is similar in both phases (offline and online). The same does not apply to the KNN model: there was a significant difference between the online and offline phases, which indicates that in practice KNN does not provide good accuracy.\n<\/p><p>As shown in Table 5, our dataset is unbalanced. 
Therefore, accuracy is not the ideal measure to evaluate performance.<sup id=\"rdp-ebb-cite_ref-SalmanMachine17_33-0\" class=\"reference\"><a href=\"#cite_note-SalmanMachine17-33\">[33]<\/a><\/sup> Other metrics are needed to compare the performance of the ML algorithms. Figure 9 shows the false alarm rate (FAR) results. The FAR metric is the percentage of the regular traffic which has been misclassified as anomalous by the model.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig9_Teixeira_FutureInternet2018_10-8.png\" class=\"image wiki-link\" data-key=\"2375b5f70a724cbad1a7e62568d62bbd\"><img alt=\"Fig9 Teixeira FutureInternet2018 10-8.png\" src=\"https:\/\/www.limswiki.org\/images\/4\/46\/Fig9_Teixeira_FutureInternet2018_10-8.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 9<\/b> False alarm rate results<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>Regarding the offline and online evaluations, as shown in Figure 9, the Random Forest and Decision Tree models performed best, followed by the KNN model. These three models had the lowest false alarm percentages, followed by Logistic Regression and Na\u00efve Bayes. These low percentages mean that Random Forest, Decision Tree, and KNN perform better in detecting normal traffic. In our dataset, normal traffic is the dominant traffic; therefore, it is expected to have a low FAR value. This low FAR value could be due to the model\u2019s bias toward estimating the normal traffic perfectly, which is common in unbalanced datasets. 
Further, the clustering done in the Random Forest, Decision Tree, and KNN models can be helpful, especially when dealing with two types of data having different network features.\n<\/p><p>Figure 10 shows the results for the un-detection rate (UND) metric. The UND (as shown in the third equation, prior) represents the percentage of traffic that is anomalous but is misclassified as normal (the opposite of the FAR). The traffic represented by this metric is more critical than the traffic represented by the FAR metric because, in this case, an attack can happen without being detected. Further, in our unbalanced dataset, the models are biased toward normal traffic, and this metric shows how biased the models are.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig10_Teixeira_FutureInternet2018_10-8.png\" class=\"image wiki-link\" data-key=\"1a51bf94754158544aab4f30f76a8ccc\"><img alt=\"Fig10 Teixeira FutureInternet2018 10-8.png\" src=\"https:\/\/www.limswiki.org\/images\/2\/29\/Fig10_Teixeira_FutureInternet2018_10-8.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 10<\/b> Un-detected rate results<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>As shown in Figure 10, considering the offline performance results, the UND percentage is small for the Na\u00efve Bayes, Logistic Regression, and KNN models, and zero for the Decision Tree and Random Forest models. That is, all algorithms show excellent performance on this critical metric. However, considering the online performances, the KNN model had the worst performance, which was markedly different from the offline evaluation. 
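<\/p><p>The three metrics used in this evaluation can all be derived from a binary confusion matrix. The following is a minimal sketch (the function name and the toy labels are ours, not from the paper; 0 denotes normal traffic and 1 denotes an attack):\n<\/p>

```python
# Sketch: computing accuracy, false alarm rate (FAR), and un-detection
# rate (UND) from binary labels. Illustrative only; the paper's actual
# pipeline and dataset are not reproduced here.

def detection_metrics(y_true, y_pred):
    # Tally the four confusion-matrix cells for the binary case.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)           # correct / total samples
    far = fp / (fp + tn) if (fp + tn) else 0.0   # normal flagged as attack
    und = fn / (fn + tp) if (fn + tp) else 0.0   # attack missed as normal
    return accuracy, far, und

# Unbalanced toy sample: 8 normal flows, 2 attack flows.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
acc, far, und = detection_metrics(y_true, y_pred)
print(acc, far, und)  # 0.8 0.125 0.5
```

<p>Note how the toy example mirrors the imbalance problem discussed above: accuracy looks reasonable (80%) even though half of the attack flows were missed (UND of 50%), which is why FAR and UND are examined separately.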
The same did not happen with the other models; their online performance is very close to their offline performance. This excellent performance shows that the features selected in this work are well chosen, as they enabled the detection of attacks even in an unbalanced dataset.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Conclusions\">Conclusions<\/span><\/h2>\n<p>This paper presents the development of a SCADA system testbed to be used in cybersecurity research. The testbed was dedicated to controlling a water storage tank, one of several stages in the process of water treatment and distribution. The testbed was used to analyze the effects of attacks on SCADA systems. Using the network traffic, a new dataset was developed that researchers can use to train machine learning algorithms, as well as to validate and compare their results with other available datasets.\n<\/p><p>Five reconnaissance attacks specific to the ICS environment were conducted against the testbed. During the attacks, the network traffic with information about the devices (valves, pumps, sensors) was captured. Using the Wireshark and Argus network tools, features were extracted to build a dataset for training and testing machine learning algorithms.\n<\/p><p>Once the dataset was generated, five traditional machine learning algorithms were used to detect the attacks: Random Forest, Decision Tree, Logistic Regression, Na\u00efve Bayes, and KNN. These algorithms were evaluated in two phases: during the training and testing of the machine learning models (offline), and during the deployment of these models in the network (online). The performance obtained during the online phase was compared to the performance obtained during the offline phase.\n<\/p><p>Three metrics were used to evaluate the performance of the algorithms: accuracy, FAR, and UND. Regarding the accuracy metric, in the offline phase, all ML algorithms showed excellent performance. 
In the online phase, almost all the algorithms performed very close to their offline results; the KNN algorithm was the only one that did not perform well. Moreover, considering the unbalanced dataset and analyzing the FAR and UND metrics, we concluded that the Random Forest and Decision Tree models performed best in both phases.\n<\/p><p>The results show the feasibility of detecting reconnaissance attacks in ICS environments. Our future plans include generating more attacks and checking the models\u2019 feasibility and performance in different environments. Moreover, experiments using unsupervised algorithms will be conducted.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Acknowledgements\">Acknowledgements<\/span><\/h2>\n<p>The statements made herein are solely the responsibility of the authors. The authors would like to thank the Instituto Federal de Educa\u00e7\u00e3o, Ci\u00eancia e Tecnologia de S\u00e3o Paulo (IFSP), Washington University in Saint Louis, and Qatar University.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Author_contributions\">Author contributions<\/span><\/h3>\n<p>M.A.T. built the testbed and performed the experiments. T.S. and M.Z. assisted with revisions and improvements. The work was done under the supervision and guidance of R.J., N.M. 
and M.S., who also formulated the problem.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Funding\">Funding<\/span><\/h3>\n<p>This work has been supported under the grant ID NPRP 10-901-2-370 funded by the Qatar National Research Fund (QNRF) and grant #2017\/01055-4 S\u00e3o Paulo Research Foundation (FAPESP).\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Conflicts_of_interest\">Conflicts of interest<\/span><\/h3>\n<p>The authors declare no conflicts of interest.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"References\">References<\/span><\/h2>\n<div class=\"reflist references-column-width\" style=\"-moz-column-width: 30em; -webkit-column-width: 30em; column-width: 30em; list-style-type: decimal;\">\n<ol class=\"references\">\n<li id=\"cite_note-Arag.C3.B3SCADA14-1\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-Arag.C3.B3SCADA14_1-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Arag\u00f3, A.S.; Mart\u00ednez, E.R.; Clares, S.S. (2014). \"SCADA Laboratory and Test-bed as a Service for Critical Infrastructure Protection\". <i>Proceedings of the 2nd International Symposium on ICS & SCADA Cyber Security Research 2014<\/i>: 25\u20139. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.14236%2Fewic%2Fics-csr2014.4\" data-key=\"b500d9b1daef94d82b55b1f098817bd8\">10.14236\/ewic\/ics-csr2014.4<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=SCADA+Laboratory+and+Test-bed+as+a+Service+for+Critical+Infrastructure+Protection&rft.jtitle=Proceedings+of+the+2nd+International+Symposium+on+ICS+%26+SCADA+Cyber+Security+Research+2014&rft.aulast=Arag%C3%B3%2C+A.S.%3B+Mart%C3%ADnez%2C+E.R.%3B+Clares%2C+S.S.&rft.au=Arag%C3%B3%2C+A.S.%3B+Mart%C3%ADnez%2C+E.R.%3B+Clares%2C+S.S.&rft.date=2014&rft.pages=25%E2%80%939&rft_id=info:doi\/10.14236%2Fewic%2Fics-csr2014.4&rfr_id=info:sid\/en.wikipedia.org:Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-NCSSuper04-2\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-NCSSuper04_2-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Communication Technologies, Inc. (October 2004). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.cedengineering.com\/userfiles\/SCADA%20Systems.pdf\" data-key=\"3fac967f2a20a3b75067f039fbc68520\">\"Supervisory Control and Data Acquisition (SCADA) Systems\"<\/a> (PDF). <i>Technical Information Bulletin 04-1<\/i>. National Communications System<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.cedengineering.com\/userfiles\/SCADA%20Systems.pdf\" data-key=\"3fac967f2a20a3b75067f039fbc68520\">https:\/\/www.cedengineering.com\/userfiles\/SCADA%20Systems.pdf<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 08 August 2018<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Supervisory+Control+and+Data+Acquisition+%28SCADA%29+Systems&rft.atitle=Technical+Information+Bulletin+04-1&rft.aulast=Communication+Technologies%2C+Inc.&rft.au=Communication+Technologies%2C+Inc.&rft.date=October+2004&rft.pub=National+Communications+System&rft_id=https%3A%2F%2Fwww.cedengineering.com%2Fuserfiles%2FSCADA%2520Systems.pdf&rfr_id=info:sid\/en.wikipedia.org:Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-FilkinsITSec16-3\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-FilkinsITSec16_3-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Filkins, B. (02 February 2016). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.sans.org\/reading-room\/whitepapers\/analyst\/membership\/36697\" data-key=\"8f59deb830f373be1a338edb4366dc72\">\"IT Security Spending Trends\"<\/a>. <i>SANS Analyst Papers<\/i>. SANS Institute<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.sans.org\/reading-room\/whitepapers\/analyst\/membership\/36697\" data-key=\"8f59deb830f373be1a338edb4366dc72\">https:\/\/www.sans.org\/reading-room\/whitepapers\/analyst\/membership\/36697<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 05 June 2018<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=IT+Security+Spending+Trends&rft.atitle=SANS+Analyst+Papers&rft.aulast=Filkins%2C+B.&rft.au=Filkins%2C+B.&rft.date=02+February+2016&rft.pub=SANS+Institute&rft_id=https%3A%2F%2Fwww.sans.org%2Freading-room%2Fwhitepapers%2Fanalyst%2Fmembership%2F36697&rfr_id=info:sid\/en.wikipedia.org:Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-StoufferGuide15-4\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-StoufferGuide15_4-0\">4.0<\/a><\/sup> <sup><a href=\"#cite_ref-StoufferGuide15_4-1\">4.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation web\">Stouffer, K.; Pilitteri, V.; Lightman, S. et al. (May 2015). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/nvlpubs.nist.gov\/nistpubs\/SpecialPublications\/NIST.SP.800-82r2.pdf\" data-key=\"6318258d560ae26a2e695740f39973a2\">\"Guide to Industrial Control Systems (ICS) Security\"<\/a> (PDF). <i>NIST Special Publication 800-82 Revision 2<\/i>. National Institute of Standards and Technology. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.6028%2FNIST.SP.800-82r2\" data-key=\"8d1717bda4bc44ce073626c7e2bed1ea\">10.6028\/NIST.SP.800-82r2<\/a><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/nvlpubs.nist.gov\/nistpubs\/SpecialPublications\/NIST.SP.800-82r2.pdf\" data-key=\"6318258d560ae26a2e695740f39973a2\">https:\/\/nvlpubs.nist.gov\/nistpubs\/SpecialPublications\/NIST.SP.800-82r2.pdf<\/a><\/span><span class=\"reference-accessdate\">. Retrieved 05 June 2018<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Guide+to+Industrial+Control+Systems+%28ICS%29+Security&rft.atitle=NIST+Special+Publication+800-82+Revision+2&rft.aulast=Stouffer%2C+K.%3B+Pilitteri%2C+V.%3B+Lightman%2C+S.+et+al.&rft.au=Stouffer%2C+K.%3B+Pilitteri%2C+V.%3B+Lightman%2C+S.+et+al.&rft.date=May+2015&rft.pub=National+Institute+of+Standards+and+Technology&rft_id=info:doi\/10.6028%2FNIST.SP.800-82r2&rft_id=https%3A%2F%2Fnvlpubs.nist.gov%2Fnistpubs%2FSpecialPublications%2FNIST.SP.800-82r2.pdf&rfr_id=info:sid\/en.wikipedia.org:Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-Modbus-5\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-Modbus_5-0\">5.0<\/a><\/sup> <sup><a href=\"#cite_ref-Modbus_5-1\">5.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.modbus.org\/tech.php\" data-key=\"e73dd0ec91f772432036ef2f9a027e61\">\"Modbus Technical Resources\"<\/a>. Modbus Organization, Inc<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.modbus.org\/tech.php\" data-key=\"e73dd0ec91f772432036ef2f9a027e61\">http:\/\/www.modbus.org\/tech.php<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 05 December 2017<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Modbus+Technical+Resources&rft.atitle=&rft.pub=Modbus+Organization%2C+Inc&rft_id=http%3A%2F%2Fwww.modbus.org%2Ftech.php&rfr_id=info:sid\/en.wikipedia.org:Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MBProt-6\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-MBProt_6-0\">6.0<\/a><\/sup> <sup><a href=\"#cite_ref-MBProt_6-1\">6.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/www.modbus.org\/docs\/Modbus_Application_Protocol_V1_1b3.pdf\" data-key=\"755a3df5b1981ec21b0f1770f8cdd52e\">\"Modbus Application Protocol Specification V1.1b3\"<\/a> (PDF). Modbus Organization, Inc. 26 April 2012<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/www.modbus.org\/docs\/Modbus_Application_Protocol_V1_1b3.pdf\" data-key=\"755a3df5b1981ec21b0f1770f8cdd52e\">http:\/\/www.modbus.org\/docs\/Modbus_Application_Protocol_V1_1b3.pdf<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 08 August 2018<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Modbus+Application+Protocol+Specification+V1.1b3&rft.atitle=&rft.date=26+April+2012&rft.pub=Modbus+Organization%2C+Inc&rft_id=http%3A%2F%2Fwww.modbus.org%2Fdocs%2FModbus_Application_Protocol_V1_1b3.pdf&rfr_id=info:sid\/en.wikipedia.org:Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MorrisIndust14-7\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-MorrisIndust14_7-0\">7.0<\/a><\/sup> <sup><a href=\"#cite_ref-MorrisIndust14_7-1\">7.1<\/a><\/sup> <sup><a href=\"#cite_ref-MorrisIndust14_7-2\">7.2<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Morris, T.; Wei, G. (2014). \"Industrial Control System Traffic Data Sets for Intrusion Detection Research\". <i>Proceedings from the International Conference on Critical Infrastructure Protection VIII<\/i>: 65\u201378. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1007%2F978-3-662-45355-1_5\" data-key=\"ea4e37b74ecef00ef26cf595daa59919\">10.1007\/978-3-662-45355-1_5<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Industrial+Control+System+Traffic+Data+Sets+for+Intrusion+Detection+Research&rft.jtitle=Proceedings+from+the+International+Conference+on+Critical+Infrastructure+Protection+VIII&rft.aulast=Morris%2C+T.%3B+Wei%2C+G.&rft.au=Morris%2C+T.%3B+Wei%2C+G.&rft.date=2014&rft.pages=65%E2%80%9378&rft_id=info:doi\/10.1007%2F978-3-662-45355-1_5&rfr_id=info:sid\/en.wikipedia.org:Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MiciolinoComm15-8\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-MiciolinoComm15_8-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Miciolino, E.E.; Bernieri, G; Pascucci, F.; Setola, R. (2015). \"Communications network analysis in a SCADA system testbed under cyber-attacks\". <i>Proceedings of the 23rd Telecommunications Forum TELFOR<\/i>: 341-344. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FTELFOR.2015.7377479\" data-key=\"85e229d4529db5446a2137b1236e39e8\">10.1109\/TELFOR.2015.7377479<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Communications+network+analysis+in+a+SCADA+system+testbed+under+cyber-attacks&rft.jtitle=Proceedings+of+the+23rd+Telecommunications+Forum+TELFOR&rft.aulast=Miciolino%2C+E.E.%3B+Bernieri%2C+G%3B+Pascucci%2C+F.%3B+Setola%2C+R.&rft.au=Miciolino%2C+E.E.%3B+Bernieri%2C+G%3B+Pascucci%2C+F.%3B+Setola%2C+R.&rft.date=2015&rft.pages=341-344&rft_id=info:doi\/10.1109%2FTELFOR.2015.7377479&rfr_id=info:sid\/en.wikipedia.org:Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-RosaAttack17-9\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-RosaAttack17_9-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Rosa, L.; Cruz, T.; Sim\u00f5es, P. et al. (2017). \"Attacking SCADA systems: A practical perspective\". <i>IFIP\/IEEE Symposium on Integrated Network and Service Management<\/i>: 741-746. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.23919%2FINM.2017.7987369\" data-key=\"67bd6c2eaf2cef3a773ae592074ad509\">10.23919\/INM.2017.7987369<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Attacking+SCADA+systems%3A+A+practical+perspective&rft.jtitle=IFIP%2FIEEE+Symposium+on+Integrated+Network+and+Service+Management&rft.aulast=Rosa%2C+L.%3B+Cruz%2C+T.%3B+Sim%C3%B5es%2C+P.+et+al.&rft.au=Rosa%2C+L.%3B+Cruz%2C+T.%3B+Sim%C3%B5es%2C+P.+et+al.&rft.date=2017&rft.pages=741-746&rft_id=info:doi\/10.23919%2FINM.2017.7987369&rfr_id=info:sid\/en.wikipedia.org:Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-KelirisMach16-10\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-KelirisMach16_10-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Keliris, A.; Salehghaffari, H.; Cairl, B. et al. (2016). \"Machine learning-based defense against process-aware attacks on Industrial Control Systems\". <i>Proceedings from the 2016 IEEE International Test Conference<\/i>: 1-10. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FTEST.2016.7805855\" data-key=\"c77ec3a323be402079cc92cf7323dbee\">10.1109\/TEST.2016.7805855<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Machine+learning-based+defense+against+process-aware+attacks+on+Industrial+Control+Systems&rft.jtitle=Proceedings+from+the+2016+IEEE+International+Test+Conference&rft.aulast=Keliris%2C+A.%3B+Salehghaffari%2C+H.%3B+Cairl%2C+B.+et+al.&rft.au=Keliris%2C+A.%3B+Salehghaffari%2C+H.%3B+Cairl%2C+B.+et+al.&rft.date=2016&rft.pages=1-10&rft_id=info:doi\/10.1109%2FTEST.2016.7805855&rfr_id=info:sid\/en.wikipedia.org:Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-TominMach16-11\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-TominMach16_11-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Tomin, N.V.; Kurbatsky, V.G.; Sidorov, D.N. et al. (2016). \"Machine Learning Techniques for Power System Security Assessment\". <i>IFAC-PapersOnLine<\/i> <b>49<\/b> (27): 445\u201350. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1016%2Fj.ifacol.2016.10.773\" data-key=\"111dbdfdf7b32274994e29e46125a357\">10.1016\/j.ifacol.2016.10.773<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Machine+Learning+Techniques+for+Power+System+Security+Assessment&rft.jtitle=IFAC-PapersOnLine&rft.aulast=Tomin%2C+N.V.%3B+Kurbatsky%2C+V.G.%3B+Sidorov%2C+D.N.+et+al.&rft.au=Tomin%2C+N.V.%3B+Kurbatsky%2C+V.G.%3B+Sidorov%2C+D.N.+et+al.&rft.date=2016&rft.volume=49&rft.issue=27&rft.pages=445%E2%80%9350&rft_id=info:doi\/10.1016%2Fj.ifacol.2016.10.773&rfr_id=info:sid\/en.wikipedia.org:Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-CherdantsevaARev16-12\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-CherdantsevaARev16_12-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Cherdantseva, Y.; Burnap, P.; Blyth, A. et al. (2016). \"A review of cyber security risk assessment methods for SCADA systems\". <i>Computers & Security<\/i> <b>56<\/b>: 1\u201327. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1016%2Fj.cose.2015.09.009\" data-key=\"d25c0319ba74e1ddf7762b2379a5cac7\">10.1016\/j.cose.2015.09.009<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+review+of+cyber+security+risk+assessment+methods+for+SCADA+systems&rft.jtitle=Computers+%26+Security&rft.aulast=Cherdantseva%2C+Y.%3B+Burnap%2C+P.%3B+Blyth%2C+A.+et+al.&rft.au=Cherdantseva%2C+Y.%3B+Burnap%2C+P.%3B+Blyth%2C+A.+et+al.&rft.date=2016&rft.volume=56&rft.pages=1%E2%80%9327&rft_id=info:doi\/10.1016%2Fj.cose.2015.09.009&rfr_id=info:sid\/en.wikipedia.org:Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-CandellAnInd15-13\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-CandellAnInd15_13-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Candell, R.; Zimmerman, T.; Stouffer, K. (November 2015). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/nvlpubs.nist.gov\/nistpubs\/ir\/2015\/NIST.IR.8089.pdf\" data-key=\"7adaa4687fe833f30a0f6aa90bab1a20\">\"An Industrial Control System Cybersecurity Performance Testbed\"<\/a> (PDF). <i>NISTIR 80089<\/i>. National Institute of Standards and Technology. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.6028%2FNIST.IR.8089\" data-key=\"712a4a08e02e5df27602ec6c7aca6599\">10.6028\/NIST.IR.8089<\/a><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/nvlpubs.nist.gov\/nistpubs\/ir\/2015\/NIST.IR.8089.pdf\" data-key=\"7adaa4687fe833f30a0f6aa90bab1a20\">https:\/\/nvlpubs.nist.gov\/nistpubs\/ir\/2015\/NIST.IR.8089.pdf<\/a><\/span><span class=\"reference-accessdate\">. Retrieved 03 June 2018<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=An+Industrial+Control+System+Cybersecurity+Performance+Testbed&rft.atitle=NISTIR+80089&rft.aulast=Candell%2C+R.%3B+Zimmerman%2C+T.%3B+Stouffer%2C+K.&rft.au=Candell%2C+R.%3B+Zimmerman%2C+T.%3B+Stouffer%2C+K.&rft.date=November+2015&rft.pub=National+Institute+of+Standards+and+Technology&rft_id=info:doi\/10.6028%2FNIST.IR.8089&rft_id=https%3A%2F%2Fnvlpubs.nist.gov%2Fnistpubs%2Fir%2F2015%2FNIST.IR.8089.pdf&rfr_id=info:sid\/en.wikipedia.org:Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DNP3-14\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-DNP3_14-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.dnp.org\/Pages\/AboutDefault.aspx\" data-key=\"a516777408bc8b5c16c0984fba9ccb7d\">\"Overview of the DNP3 Protocol\"<\/a>. DNP User Group<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.dnp.org\/Pages\/AboutDefault.aspx\" data-key=\"a516777408bc8b5c16c0984fba9ccb7d\">https:\/\/www.dnp.org\/Pages\/AboutDefault.aspx<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 03 June 2018<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Overview+of+the+DNP3+Protocol&rft.atitle=&rft.pub=DNP+User+Group&rft_id=https%3A%2F%2Fwww.dnp.org%2FPages%2FAboutDefault.aspx&rfr_id=info:sid\/en.wikipedia.org:Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DarwishExperi15-15\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-DarwishExperi15_15-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Darwish, I.; Igbe, O.; Saadawi. et al. (2015). \"Experimental and theoretical modeling of DNP3 attacks in smart grids\". <i>Proceedings from the 36th IEEE Sarnoff Symposium<\/i>: 155\u201360. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FSARNOF.2015.7324661\" data-key=\"4078a39cc2c13b48a2052e16e91813e2\">10.1109\/SARNOF.2015.7324661<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Experimental+and+theoretical+modeling+of+DNP3+attacks+in+smart+grids&rft.jtitle=Proceedings+from+the+36th+IEEE+Sarnoff+Symposium&rft.aulast=Darwish%2C+I.%3B+Igbe%2C+O.%3B+Saadawi.+et+al.&rft.au=Darwish%2C+I.%3B+Igbe%2C+O.%3B+Saadawi.+et+al.&rft.date=2015&rft.pages=155%E2%80%9360&rft_id=info:doi\/10.1109%2FSARNOF.2015.7324661&rfr_id=info:sid\/en.wikipedia.org:Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-LiUnder18-16\"><span class=\"mw-cite-backlink\"><a 
href=\"#cite_ref-LiUnder18_16-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Li, Q.; Feng, X.; Wang, H. et al. (2018). \"Understanding the Usage of Industrial Control System Devices on the Internet\". <i>IEEE Internet of Things Journal<\/i> <b>5<\/b> (3): 2178\u201389. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FJIOT.2018.2826558\" data-key=\"da7384d11aaf957bd68be003166a337e\">10.1109\/JIOT.2018.2826558<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Understanding+the+Usage+of+Industrial+Control+System+Devices+on+the+Internet&rft.jtitle=IEEE+Internet+of+Things+Journal&rft.aulast=Li%2C+Q.%3B+Feng%2C+X.%3B+Wang%2C+H.+et+al.&rft.au=Li%2C+Q.%3B+Feng%2C+X.%3B+Wang%2C+H.+et+al.&rft.date=2018&rft.volume=5&rft.issue=3&rft.pages=2178%E2%80%9389&rft_id=info:doi\/10.1109%2FJIOT.2018.2826558&rfr_id=info:sid\/en.wikipedia.org:Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SchneiderModicon-17\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-SchneiderModicon_17-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.schneider-electric.us\/en\/product\/TM241CE40R\/controller-m241-40-io-relay-ethernet\/\" data-key=\"15da3787988833e704ebbbac1e681271\">\"Modicon M241 Micro PLC - TM241CE40R\"<\/a>. Schneider Electric<span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.schneider-electric.us\/en\/product\/TM241CE40R\/controller-m241-40-io-relay-ethernet\/\" data-key=\"15da3787988833e704ebbbac1e681271\">https:\/\/www.schneider-electric.us\/en\/product\/TM241CE40R\/controller-m241-40-io-relay-ethernet\/<\/a><\/span><span class=\"reference-accessdate\">. Retrieved 08 August 2018<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Modicon+M241+Micro+PLC+-+TM241CE40R&rft.atitle=&rft.pub=Schneider+Electric&rft_id=https%3A%2F%2Fwww.schneider-electric.us%2Fen%2Fproduct%2FTM241CE40R%2Fcontroller-m241-40-io-relay-ethernet%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-EricksonProg11-18\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-EricksonProg11_18-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Erickson, K.T. (2011). <i>Programmable Logic Controllers: An Emphasis on Design and Application<\/i> (2nd ed.). Dogwood Valley Press. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" data-key=\"f64947ba21e884434bd70e8d9e60bae6\">ISBN<\/a> 9780976625902.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Programmable+Logic+Controllers%3A+An+Emphasis+on+Design+and+Application&rft.aulast=Erickson%2C+K.T.&rft.au=Erickson%2C+K.T.&rft.date=2011&rft.edition=2nd&rft.pub=Dogwood+Valley+Press&rft.isbn=9780976625902&rfr_id=info:sid\/en.wikipedia.org:Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MantereChall12-19\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-MantereChall12_19-0\">19.0<\/a><\/sup> <sup><a href=\"#cite_ref-MantereChall12_19-1\">19.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Mantere, M.; Uusitalo, I.; Sailio, M. et al. (2012). \"Challenges of Machine Learning Based Monitoring for Industrial Control System Networks\". <i>Proceedings from the 26th International Conference on Advanced Information Networking and Applications Workshops<\/i>: 968-972. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FWAINA.2012.135\" data-key=\"23f7e2bea19195997fbe0d5ce785c990\">10.1109\/WAINA.2012.135<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Challenges+of+Machine+Learning+Based+Monitoring+for+Industrial+Control+System+Networks&rft.jtitle=Proceedings+from+the+26th+International+Conference+on+Advanced+Information+Networking+and+Applications+Workshops&rft.aulast=Mantere%2C+M.%3B+Uusitalo%2C+I.%3B+Sailio%2C+M.+et+al.&rft.au=Mantere%2C+M.%3B+Uusitalo%2C+I.%3B+Sailio%2C+M.+et+al.&rft.date=2012&rft.pages=968-972&rft_id=info:doi\/10.1109%2FWAINA.2012.135&rfr_id=info:sid\/en.wikipedia.org:Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-NgOnDiscrim01-20\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-NgOnDiscrim01_20-0\">20.0<\/a><\/sup> <sup><a href=\"#cite_ref-NgOnDiscrim01_20-1\">20.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Ng, A.Y.; Jordan, M.I. (2001). \"On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes\". 
<i>Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic<\/i>: 841\u201348.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=On+discriminative+vs.+generative+classifiers%3A+A+comparison+of+logistic+regression+and+naive+Bayes&rft.jtitle=Proceedings+of+the+14th+International+Conference+on+Neural+Information+Processing+Systems%3A+Natural+and+Synthetic&rft.aulast=Ng%2C+A.Y.%3B+Jordan%2C+M.I.&rft.au=Ng%2C+A.Y.%3B+Jordan%2C+M.I.&rft.date=2001&rft.pages=841%E2%80%9348&rfr_id=info:sid\/en.wikipedia.org:Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ZhangRandom08-21\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ZhangRandom08_21-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Zhang, J.; Zulkernine, M.; Haque, A. (2008). \"Random-Forests-Based Network Intrusion Detection Systems\". <i>IEEE Transactions on Systems, Man, and Cybernetics, Part C<\/i> <b>38<\/b> (5): 649\u201359. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FTSMCC.2008.923876\" data-key=\"fe8e8d6677e72e9cb50591ddda2eb1e1\">10.1109\/TSMCC.2008.923876<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Random-Forests-Based+Network+Intrusion+Detection+Systems&rft.jtitle=IEEE+Transactions+on+Systems%2C+Man%2C+and+Cybernetics%2C+Part+C&rft.aulast=Zhang%2C+J.%3B+Zulkernine%2C+M.%3B+Haque%2C+A.&rft.au=Zhang%2C+J.%3B+Zulkernine%2C+M.%3B+Haque%2C+A.&rft.date=2008&rft.volume=38&rft.issue=5&rft.pages=649%E2%80%9359&rft_id=info:doi\/10.1109%2FTSMCC.2008.923876&rfr_id=info:sid\/en.wikipedia.org:Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-AmorNaive04-22\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-AmorNaive04_22-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Amor, N.B.; Benferhat, S.; Elouedi, Z. (2004). \"Naive Bayes vs decision trees in intrusion detection systems\". <i>Proceedings of the 2004 ACM Symposium on Applied Computing<\/i>: 420\u201324. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1145%2F967900.967989\" data-key=\"dfe5b1d80713e0d25418f83efcbfb443\">10.1145\/967900.967989<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Naive+Bayes+vs+decision+trees+in+intrusion+detection+systems&rft.jtitle=Proceedings+of+the+2004+ACM+Symposium+on+Applied+Computing&rft.aulast=Amor%2C+N.B.%3B+Benferhat%2C+S.%3B+Elouedi%2C+Z.&rft.au=Amor%2C+N.B.%3B+Benferhat%2C+S.%3B+Elouedi%2C+Z.&rft.date=2004&rft.pages=420%E2%80%9324&rft_id=info:doi\/10.1145%2F967900.967989&rfr_id=info:sid\/en.wikipedia.org:Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ChenApp05-23\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ChenApp05_23-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Chen, W.-H.; Hsu, S.-H.; Shen, H.-P. (2005). \"Application of SVM and ANN for intrusion detection\". <i>Computers & Operations Research<\/i> <b>32<\/b> (10): 2617\u201334. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1016%2Fj.cor.2004.03.019\" data-key=\"4123299c8a1ac1c59783dc0d7fe820b5\">10.1016\/j.cor.2004.03.019<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Application+of+SVM+and+ANN+for+intrusion+detection&rft.jtitle=Computers+%26+Operations+Research&rft.aulast=Chen%2C+W.-H.%3B+Hsu%2C+S.-H.%3B+Shen%2C+H.-P.&rft.au=Chen%2C+W.-H.%3B+Hsu%2C+S.-H.%3B+Shen%2C+H.-P.&rft.date=2005&rft.volume=32&rft.issue=10&rft.pages=2617%E2%80%9334&rft_id=info:doi\/10.1016%2Fj.cor.2004.03.019&rfr_id=info:sid\/en.wikipedia.org:Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ZhangSVMKNN06-24\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ZhangSVMKNN06_24-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Zhang, H.; Berg, A.C.; Maire, M. et al. (2006). \"SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition\". <i>Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition<\/i>: 2126-2136. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FCVPR.2006.301\" data-key=\"e3caa9c03ad64fb716a85cc753b608fb\">10.1109\/CVPR.2006.301<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=SVM-KNN%3A+Discriminative+Nearest+Neighbor+Classification+for+Visual+Category+Recognition&rft.jtitle=Proceedings+of+the+2006+IEEE+Computer+Society+Conference+on+Computer+Vision+and+Pattern+Recognition&rft.aulast=Zhang%2C+H.%3B+Berg%2C+A.C.%3B+Maire%2C+M.+et+al.&rft.au=Zhang%2C+H.%3B+Berg%2C+A.C.%3B+Maire%2C+M.+et+al.&rft.date=2006&rft.pages=2126-2136&rft_id=info:doi\/10.1109%2FCVPR.2006.301&rfr_id=info:sid\/en.wikipedia.org:Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SokolovaASys09-25\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-SokolovaASys09_25-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Sokolova, M.; Lapalme, G. (2009). \"A systematic analysis of performance measures for classification tasks\". <i>Information Processing & Management<\/i> <b>45<\/b> (4): 427\u201337. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1016%2Fj.ipm.2009.03.002\" data-key=\"c5c0d8c4f724d678a416aa93c394f079\">10.1016\/j.ipm.2009.03.002<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+systematic+analysis+of+performance+measures+for+classification+tasks&rft.jtitle=Information+Processing+%26+Management&rft.aulast=Sokolova%2C+M.%3B+Lapalme%2C+G.&rft.au=Sokolova%2C+M.%3B+Lapalme%2C+G.&rft.date=2009&rft.volume=45&rft.issue=4&rft.pages=427%E2%80%9337&rft_id=info:doi\/10.1016%2Fj.ipm.2009.03.002&rfr_id=info:sid\/en.wikipedia.org:Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BudaASyst18-26\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-BudaASyst18_26-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Buda, M.; Maki, A.; Mazurowski, M.A.. \"A systematic study of the class imbalance problem in convolutional neural networks\". <i>Neural Networks<\/i> <b>106<\/b>: 249\u201359. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1016%2Fj.neunet.2018.07.011\" data-key=\"611b1b299db0c88dfc8340eb49e8f30b\">10.1016\/j.neunet.2018.07.011<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+systematic+study+of+the+class+imbalance+problem+in+convolutional+neural+networks&rft.jtitle=Neural+Networks&rft.aulast=Buda%2C+M.%3B+Maki%2C+A.%3B+Mazurowski%2C+M.A.&rft.au=Buda%2C+M.%3B+Maki%2C+A.%3B+Mazurowski%2C+M.A.&rft.volume=106&rft.pages=249%E2%80%9359&rft_id=info:doi\/10.1016%2Fj.neunet.2018.07.011&rfr_id=info:sid\/en.wikipedia.org:Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HeLearning09-27\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-HeLearning09_27-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">He, H.; Garcia, E.A. (2009). \"Learning from Imbalanced Data\". <i>IEEE Transactions on Knowledge and Data Engineering<\/i> <b>21<\/b> (9): 1263-84. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FTKDE.2008.239\" data-key=\"5a62a4537d2bc744d66caec1d038d785\">10.1109\/TKDE.2008.239<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Learning+from+Imbalanced+Data&rft.jtitle=IEEE+Transactions+on+Knowledge+and+Data+Engineering&rft.aulast=He%2C+H.%3B+Garcia%2C+E.A.&rft.au=He%2C+H.%3B+Garcia%2C+E.A.&rft.date=2009&rft.volume=21&rft.issue=9&rft.pages=1263-84&rft_id=info:doi\/10.1109%2FTKDE.2008.239&rfr_id=info:sid\/en.wikipedia.org:Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-CalderonNmap17-28\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-CalderonNmap17_28-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Calderon, P. (2017). <i>Nmap: Network Exploration and Security Auditing Cookbook<\/i> (2nd Revised ed.). Packt Publishing. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" data-key=\"f64947ba21e884434bd70e8d9e60bae6\">ISBN<\/a> 9781786467454.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Nmap%3A+Network+Exploration+and+Security+Auditing+Cookbook&rft.aulast=Calderon%2C+P.&rft.au=Calderon%2C+P.&rft.date=2017&rft.edition=2nd+Revised&rft.pub=Packt+Publishing&rft.isbn=9781786467454&rfr_id=info:sid\/en.wikipedia.org:Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MnemonVuln-29\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-MnemonVuln_29-0\">29.0<\/a><\/sup> <sup><a href=\"#cite_ref-MnemonVuln_29-1\">29.1<\/a><\/sup> <sup><a href=\"#cite_ref-MnemonVuln_29-2\">29.2<\/a><\/sup> <sup><a href=\"#cite_ref-MnemonVuln_29-3\">29.3<\/a><\/sup> <sup><a href=\"#cite_ref-MnemonVuln_29-4\">29.4<\/a><\/sup> <sup><a href=\"#cite_ref-MnemonVuln_29-5\">29.5<\/a><\/sup> <sup><a href=\"#cite_ref-MnemonVuln_29-6\">29.6<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation web\">Mnemon, E.; Soullie, A.; Torrents, A. et al.. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.rapid7.com\/db\/modules\/auxiliary\/scanner\/scada\/modbusclient\" data-key=\"af4486a205a6ff35844346deafc65156\">\"Vulnerability & Exploit Database\"<\/a>. Rapid7 LLC<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.rapid7.com\/db\/modules\/auxiliary\/scanner\/scada\/modbusclient\" data-key=\"af4486a205a6ff35844346deafc65156\">https:\/\/www.rapid7.com\/db\/modules\/auxiliary\/scanner\/scada\/modbusclient<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 30 January 2017<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Vulnerability+%26+Exploit+Database&rft.atitle=&rft.aulast=Mnemon%2C+E.%3B+Soullie%2C+A.%3B+Torrents%2C+A.+et+al.&rft.au=Mnemon%2C+E.%3B+Soullie%2C+A.%3B+Torrents%2C+A.+et+al.&rft.pub=Rapid7+LLC&rft_id=https%3A%2F%2Fwww.rapid7.com%2Fdb%2Fmodules%2Fauxiliary%2Fscanner%2Fscada%2Fmodbusclient&rfr_id=info:sid\/en.wikipedia.org:Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-Wireshark-30\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-Wireshark_30-0\">30.0<\/a><\/sup> <sup><a href=\"#cite_ref-Wireshark_30-1\">30.1<\/a><\/sup> <sup><a href=\"#cite_ref-Wireshark_30-2\">30.2<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.wireshark.org\/\" data-key=\"5873aa400c3f681206e958c37e6d907d\">\"Wireshark\"<\/a>. Wireshark Foundation<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.wireshark.org\/\" data-key=\"5873aa400c3f681206e958c37e6d907d\">https:\/\/www.wireshark.org\/<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 20 October 2017<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Wireshark&rft.atitle=&rft.pub=Wireshark+Foundation&rft_id=https%3A%2F%2Fwww.wireshark.org%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-Argus-31\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-Argus_31-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/qosient.com\/argus\/\" data-key=\"400b7ad0b54b2a1fc5b06d51ae0bcdce\">\"Argus\"<\/a>. QoSient, LLC<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/qosient.com\/argus\/\" data-key=\"400b7ad0b54b2a1fc5b06d51ae0bcdce\">https:\/\/qosient.com\/argus\/<\/a><\/span><span class=\"reference-accessdate\">. Retrieved 10 November 2017<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Argus&rft.atitle=&rft.pub=QoSient%2C+LLC&rft_id=https%3A%2F%2Fqosient.com%2Fargus%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MantereNetwork13-32\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-MantereNetwork13_32-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Mantere, M., Sailio, M., Noponen, S. (2013). \"Network Traffic Features for Anomaly Detection in Specific Industrial Control System Network\". <i>Future Internet<\/i> <b>5<\/b> (4): 460\u201373. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.3390%2Ffi5040460\" data-key=\"19034a303ab8d76392bac97cc1225136\">10.3390\/fi5040460<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Network+Traffic+Features+for+Anomaly+Detection+in+Specific+Industrial+Control+System+Network&rft.jtitle=Future+Internet&rft.aulast=Mantere%2C+M.%2C+Sailio%2C+M.%2C+Noponen%2C+S.&rft.au=Mantere%2C+M.%2C+Sailio%2C+M.%2C+Noponen%2C+S.&rft.date=2013&rft.volume=5&rft.issue=4&rft.pages=460%E2%80%9373&rft_id=info:doi\/10.3390%2Ffi5040460&rfr_id=info:sid\/en.wikipedia.org:Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SalmanMachine17-33\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-SalmanMachine17_33-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Salman, T.; Bhamare, D.; Erbad, A. et al.. \"Machine Learning for Anomaly Detection and Categorization in Multi-Cloud Environments\". <i>Proceedings from the 2017 IEEE 4th International Conference on Cyber Security and Cloud Computing<\/i>: 97\u2013103. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FCSCloud.2017.15\" data-key=\"9869e1880af5a9d2cfbcb1a234f6eff3\">10.1109\/CSCloud.2017.15<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Machine+Learning+for+Anomaly+Detection+and+Categorization+in+Multi-Cloud+Environments&rft.jtitle=Proceedings+from+the+2017+IEEE+4th+International+Conference+on+Cyber+Security+and+Cloud+Computing&rft.aulast=Salman%2C+T.%3B+Bhamare%2C+D.%3B+Erbad%2C+A.+et+al.&rft.au=Salman%2C+T.%3B+Bhamare%2C+D.%3B+Erbad%2C+A.+et+al.&rft.pages=97%E2%80%93103&rft_id=info:doi\/10.1109%2FCSCloud.2017.15&rfr_id=info:sid\/en.wikipedia.org:Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<\/ol><\/div>\n<h2><span class=\"mw-headline\" id=\"Notes\">Notes<\/span><\/h2>\n<p>This presentation is faithful to the original, with only a few minor changes to presentation, grammar, and punctuation. In some cases important information was missing from the references, and that information was added. 
The Buda <i>et al.<\/i> article cited in the original has since been published fully, and the citation here is updated to reflect that.\n<\/p>\n<\/div><div class=\"printfooter\">Source: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach\">https:\/\/www.limswiki.org\/index.php\/Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach<\/a><\/div>\n\t\t\t\t\t\t\t\t\t\t<!-- end content -->\n\t\t\t\t\t\t\t\t\t\t<div class=\"visualClear\"><\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t\t<!-- end of the left (by default at least) column -->\n\t\t<div 
class=\"visualClear\"><\/div>\n\t\t\t\t\t\n\t\t<\/div>\n\t\t\n\n<\/body>","d400aae80e71d72278a98ceb5a2237dd_images":["https:\/\/www.limswiki.org\/images\/c\/c3\/Fig1_Teixeira_FutureInternet2018_10-8.png","https:\/\/www.limswiki.org\/images\/8\/84\/Fig2_Teixeira_FutureInternet2018_10-8.png","https:\/\/www.limswiki.org\/images\/f\/f4\/Fig3_Teixeira_FutureInternet2018_10-8.png","https:\/\/www.limswiki.org\/images\/9\/9f\/Fig4_Teixeira_FutureInternet2018_10-8.png","https:\/\/www.limswiki.org\/images\/2\/2e\/Fig5_Teixeira_FutureInternet2018_10-8.png","https:\/\/www.limswiki.org\/images\/6\/6a\/Fig6_Teixeira_FutureInternet2018_10-8.png","https:\/\/www.limswiki.org\/images\/6\/63\/Fig7_Teixeira_FutureInternet2018_10-8.png","https:\/\/www.limswiki.org\/images\/7\/7a\/Fig8_Teixeira_FutureInternet2018_10-8.png","https:\/\/www.limswiki.org\/images\/4\/46\/Fig9_Teixeira_FutureInternet2018_10-8.png","https:\/\/www.limswiki.org\/images\/2\/29\/Fig10_Teixeira_FutureInternet2018_10-8.png"],"d400aae80e71d72278a98ceb5a2237dd_timestamp":1554145003,"af5b38e70b68468e6df8188586e739da_type":"article","af5b38e70b68468e6df8188586e739da_title":"Security architecture and protocol for trust verifications regarding the integrity of files stored in cloud services (Pinheiro et al. 
2018)","af5b38e70b68468e6df8188586e739da_url":"https:\/\/www.limswiki.org\/index.php\/Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services","af5b38e70b68468e6df8188586e739da_plaintext":"\n\n\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\n\t\t\t\tJournal:Security architecture and protocol for trust verifications regarding the integrity of files stored in cloud services\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\tFrom LIMSWiki\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\tJump to: navigation, search\n\n\t\t\t\t\t\n\t\t\t\t\tFull article title\n \nSecurity architecture and protocol for trust verifications regarding the integrity of files stored in cloud servicesJournal\n \nSensorsAuthor(s)\n \nPinheiro, Alexandre; Canedo, Edna Dias; De Sousa Junior, Rafael Timoteo;,\r\nDe Oliveira Albuquerque, Robson; Villalba, Luis Javier Garcia; Kim, Tai-HoonAuthor affiliation(s)\n \nUniversity of Bras\u00edlia, Universidad Complutense de Madrid, Sungshin Women\u2019s UniversityPrimary contact\n \nEmail: javiergv at fdi dot ucm dot esYear published\n \n2018Volume and issue\n \n18(3)Page(s)\n \n753DOI\n \n10.3390\/s18030753ISSN\n \n1999-5903Distribution license\n \nCreative Commons Attribution 4.0 InternationalWebsite\n \nhttps:\/\/www.mdpi.com\/1424-8220\/18\/3\/753\/htmDownload\n \nhttps:\/\/www.mdpi.com\/1424-8220\/18\/3\/753\/pdf (PDF)\n\nContents\n\n1 Abstract \n2 Introduction \n3 Background \n\n3.1 Trust \n3.2 Encryption \n3.3 Hashes \n\n\n4 Related work \n\n4.1 Computational trust applications \n4.2 Integrity verification and privacy guarantee \n4.3 Management and monitoring of CSS \n\n\n5 Proposed architecture and protocol \n\n5.1 The proposed protocol \n5.2 Trust level classification process \n5.3 Freshness of the trust verification process \n5.4 Variation of the trust level assigned to the cloud storage service \n\n\n6 Architecture implementation \n\n6.1 Client application \n6.2 Integrity checking service application \n6.3 Cloud 
storage service application \n\n\n7 Experimental validation \n\n7.1 Experimental setup \n7.2 Submission of files \n7.3 Network bandwidth consumption \n7.4 File integrity checking by the ICS \n7.5 Simulation of file storage faults \n\n\n8 Discussion \n9 Conclusion and future work \n10 Acknowledgements \n\n10.1 Author contributions \n10.2 Conflicts of interest \n\n\n11 References \n12 Notes \n\n\n\nAbstract \nCloud computing is considered an interesting paradigm due to its scalability, availability, and virtually unlimited storage capacity. However, it is challenging to organize a cloud storage service (CSS) that is safe from the client point-of-view and to implement this CSS in public clouds since it is not advisable to blindly consider this configuration as fully trustworthy. Ideally, owners of large amounts of data should trust their data to be in the cloud for a long period of time, without the burden of keeping copies of the original data, nor of accessing the whole content for verification regarding data preservation. Due to these requirements, integrity, availability, privacy, and trust are still challenging issues for the adoption of cloud storage services, especially when losing or leaking information can bring significant damage, be it legal or business-related. With such concerns in mind, this paper proposes an architecture for periodically monitoring both the information stored in the cloud and the service provider behavior. The architecture operates with a proposed protocol based on trust and encryption concepts to ensure cloud data integrity without compromising confidentiality and without overloading storage services. 
Extensive tests and simulations of the proposed architecture and protocol validate their functional behavior and performance.\nKeywords: cloud computing; cloud data storage; proof of integrity; services monitoring; trust\n\nIntroduction \nCompanies, institutions, and government agencies generate large amounts of digital information every day, such as documents, projects, and transaction records. For legal or business reasons, this information needs to remain stored for long periods of time.\nDue to the popularization of cloud computing (CC), its cost reduction, and an ever-growing supply of cloud storage services (CSS), many companies are choosing these services to store their sensitive information. Cloud computing\u2019s advantages include scalability, availability, and virtually unlimited storage capacity. However, it is a challenge to build safe storage services, mainly when these services run in public cloud infrastructures and are managed by service providers under conditions that are not fully trustworthy.\nData owners often need to keep their stored data for a long time, though it is possible that they rarely will have to access it. Furthermore, some data could be stored in a CSS without its owner having to keep the original copy. However, in these situations, the storage service reliability must be considered, because even the best services sometimes fail[1], and since the loss of these data or their leakage can bring significant business or legal damage, the issues of integrity, availability, privacy, and trust need to be answered before the adoption of the CSS.\nData integrity is defined as the accuracy and consistency of stored data. 
These two properties indicate that the data have not changed and have not been broken.[2] Moreover, besides data integrity, a considerable number of organizations consider both confidentiality and privacy requirements as the main obstacles to the acceptance of public cloud services.[2] Hence, to fulfill these requirements, a CSS should provide mechanisms to confirm data integrity, while still ensuring user privacy and data confidentiality.\nConsidering these requirements, this paper proposes an architecture for periodically monitoring both the information stored in the cloud infrastructure and the contracted storage service behavior. The architecture is based on the operation of a proposed protocol that uses a third party and applies trust and encryption means to verify both the existence and the integrity of data stored in the cloud infrastructure without compromising these data\u2019s confidentiality. Furthermore, the protocol was designed to minimize the overload that it imposes on the cloud storage service.\nTo validate the proposed architecture and its supporting protocol, a corresponding prototype was developed and implemented. Then, this prototype was submitted to testing and simulations by means of which we verified its functional characteristics and its performance.\nThis paper addresses all of this and is structured as follows. The \"Background\" section reviews the concepts and definitions of cloud computing, encryption, and trust, then we present works related to data integrity in the cloud. Then we describe the proposed architecture, while its implementation is discussed in the following section. Afterwards, the \"Experimental validation\" section is devoted to the experiments and respective results, while the main differences between related works and the proposed architecture follow it. 
The paper ends with our conclusions and outlines future works.\n\nBackground \nCloud computing (CC) is a model that allows convenient and on-demand network access to a shared set of configurable computational resources. These resources can be quickly provisioned with minimal management effort and without the service provider\u2019s intervention.[3] Since it constitutes a flexible and reliable computing environment, CC is being gradually adopted in different business scenarios using several available supporting solutions.\nRelying on different technologies (e.g., virtualization, utility computing, grid computing, and service-oriented architecture) and proposing a new computational services paradigm, CC requires high-level management activities, which include: (a) selection of the service provider, (b) selection of virtualization technology, (c) virtual resources\u2019 allocation, and (d) monitoring and auditing procedures to comply with service level agreements (SLAs).[4]\nA particular CC solution comprises several components such as client modules, data centers, and distributed servers. These elements form the three parts of the cloud solution[4][5], each one with a specific purpose and specific role in delivering working applications based on the cloud.\nThe CC architecture is basically structured into two main layers: a lower and a higher resource layer, each one dealing with a particular aspect of making application resources available. The lower layer comprises the physical infrastructure, and it is responsible for the virtualization of storage and computational resources. The higher layer provides specific services, such as software as a service (SaaS), platform as a service (PaaS), and infrastructure as a service (IaaS). 
Each of these layers may have its own management and monitoring systems, independent of one another, thus improving flexibility, reuse, and scalability.[6][7]\nSince CC provides access to a shared pool of configurable computing resources, its provisioning mode can be classified by the intended access methods and coverage of services\u2019 availability, which yields different models of CC services\u2019 deployment, ranging from private clouds, in which resources are shared within an owner organization, to public clouds, in which cloud providers possess the resources that are consumed by other organizations based on contracts, but also including hybrid cloud environments and community clouds.[8]\nThe central concept of this paper\u2019s proposal is the verification by the cloud service user that a particular property, in our case the integrity of files, is fulfilled by the cloud service provider, regardless of the mode of a service's provision and deployment, either in the form of private, public, or hybrid clouds.\nThe verification of file integrity is performed by means of a protocol that uses contemporaneous computational encryption, specifically public key encryption and hashes, which together provide authentication of messages and compact integrity verification sequences that are unequivocally bound to each verified file (signed file hashes).\nThis proposed protocol is conceived to allow the user of cloud services to check whether the services provider is indeed acting as expected in regard to maintaining the integrity of the user files, which corresponds to the idea of the user monitoring the provider to acquire and maintain trust in the provider behavior in this circumstance.\nSome specific aspects of trust, encryption, and hashes that are considered as useful for this paper\u2019s comprehension are briefly reviewed in the subsections below.\n\nTrust \nTrust is a common reasoning process for humans to face the world\u2019s complexities and to think sensibly 
about everyday life possibilities. Trust is strongly linked to expectations about something, which implies a degree of uncertainty and optimism. It is the choice of putting something in another\u2019s hands, considering the other\u2019s behavior to determine how to act in a given situation.[9]\nTrust can be considered as a particular level of subjective probability in which an agent believes that another agent will perform a certain action, which is subject to monitoring.[10] Furthermore, trust can be represented as an opinion so that situations involving trust and trust relationships can be modeled. Thus, positive and negative feedback on a specific entity can be accumulated and used to calculate its future behavior.[11] This opinion may result from direct experience or may come from a recommendation from another entity.[12]\nAccording to Adnane et al.[13] and De Sousa, Jr. and Puttini[14], trust, trust models, and trust management have been the subject of various research works demonstrating that the conceptualization of computational trust allows a computing entity to reason with and about trust, and to make decisions regarding other entities. Indeed, since the initial works on the subject by the likes of Marsh[9] and Yahalom et al.[15], computational trust is recognized as an important aspect for decision-making in distributed and auto-organized applications, and its expression allows formalizing and clarifying trust aspects in communication protocols.\nYahalom et al.[15], for instance, find the notion of \"trust\" to mean that if an entity A trusts an entity B in some respect, this means that A believes that B will behave in a certain way and will perform some action under certain specific circumstances. This leads to the possibility of conducting a protocol operation (action) that is evaluated by the entity A on the basis of what A knows about the entity B and the circumstances of the operation. 
This accurately corresponds to the protocol relationship established between a CC service consumer and a CC service provider, which is the focus of the present paper.\nThus, in our proposal, trust is used in the context of a cloud computing service as a means to verify specific actions performed by the participating entities in this context. Using the definitions by Yahalom et al.[15] and Grandison and Sloman[16], we can state that in a CC service, one entity, the CC service consumer, may trust another one, the CC service provider, for actions such as providing identification to the other entity, not interfering in the other entity sessions, neither passively by reading secret messages, nor actively by impersonating other parties. Furthermore, the CC service provider will grant access to resources or services, as well as make decisions on behalf of the other entity, with respect to a resource or service that this entity owns or controls.\nIn these trust verifications, it is required to ensure some properties such as the secrecy and integrity of stored files, authentication of message sources, and the freshness of the presented proofs, avoiding proof replays. It is required as well to present reduced overhead in cloud computing protocol operations and services. In our proposal, these requirements are fulfilled with modern robust public key encryption involving hashes, as discussed hereafter, considering that these means are adequately and easily deployed in current CC service provider and consumer situations.\n\nEncryption \nEncryption is a process of converting (or ciphering) a plaintext message into a ciphertext that can be deciphered back to the original message. An encryption algorithm, along with one or more keys, is used either in the encryption or the decryption operation.\nThe number, type, and length of the keys used depend on the encryption algorithm, the choice of which is a consequence of the security level needed. 
In conventional symmetric encryption, a single key is used, and with this key the sender can encrypt a message, and a recipient can decrypt the ciphered message. However, key security becomes an issue since at least two copies of the key exist, one at the sender and another at the recipient.\nIn contrast, in asymmetric encryption, the encryption key and the decryption key are correlated, but different, one being a public key of the recipient that can be used by the sender to encrypt the message, while the other related key is the recipient\u2019s private key, allowing the recipient to decrypt the message.[17] The private key can be used by its owner to send messages that are considered signed by the owner since every entity can use the corresponding public key to verify whether a message comes from the owner of the private key.\nThese properties of asymmetric encryption are useful for the trust verifications that in our proposal are designed for checking the integrity of files stored in cloud services. Indeed, our proposal uses encryption of hashes as the principal means to fulfill the trust requirements in these operations.\n\nHashes \nA hash value, hash code, or simply hash is the result of applying a mathematical one-way function that takes a string of any size as the data source and returns a relatively small and fixed-length string. A modification of any bit in the source string dramatically alters the resulting hash code after executing the hash function.[18] These one-way functions are designed to make it very difficult to deduce from a hash value the source string that was used to calculate this hash. 
Furthermore, it is required that it be extremely difficult to find two source strings whose hash codes are the same, i.e., a hash collision.\nOver the years, many cryptographic hash algorithms have been developed, among which the Message-Digest algorithm 5 (MD5) and the Secure Hash Algorithm (SHA) family can be highlighted, due to their wide use in the most diverse information security software packages. MD5 is a very fast cryptographic algorithm that receives as input a message of arbitrary size and produces as output a fixed-length hash of 128 bits.[19]\nThe SHA family is composed of algorithms named SHA-1, SHA-256, and SHA-512, which differ regarding their security level and output hash length, which can vary from 160 to 512 bits. The SHA-3 algorithm was chosen by the National Institute of Standards and Technology (NIST) in an international competition that aimed to replace all of the SHA family of algorithms.[20]\nThe Blake2 algorithm is an improved version of the cryptographic hash algorithm \u201cBlake,\u201d a finalist of the SHA-3 selection competition, and is optimized for software applications. Blake2 can generate hash values from eight to 512 bits. 
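These properties are easy to observe in practice. The brief sketch below (ours, using Python's standard hashlib module, which implements Blake2; it is not code from the paper's prototype) shows the avalanche effect of a minimal input change and Blake2's configurable digest size:

```python
import hashlib

# Avalanche effect: changing a single character of the source string
# produces a completely different hash value.
h1 = hashlib.blake2b(b"stored-file-chunk", digest_size=32).hexdigest()
h2 = hashlib.blake2b(b"stored-file-chunl", digest_size=32).hexdigest()
assert h1 != h2

# Blake2 supports digest sizes from 1 to 64 bytes (8 to 512 bits).
short = hashlib.blake2b(b"data", digest_size=1).hexdigest()   # 8-bit hash
long_ = hashlib.blake2b(b"data", digest_size=64).hexdigest()  # 512-bit hash
assert len(short) == 2 and len(long_) == 128                  # 2 hex chars per byte
```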
The main Blake2 characteristics are: a 32% reduction in memory consumption compared to the SHA algorithms, processing speed greater than that of MD5 on 64-bit platforms, direct parallelism support without overhead, and faster hash generation on multicore processors.[21] In our proposed validation prototype, the Blake2 algorithm was considered a good choice due to its combined characteristics of speed, security, and simplicity.\n\nRelated work \nThis section presents a brief review of papers regarding the themes of computational trust applications, privacy guarantees, data integrity verification, services management, and monitoring, all of them applicable to cloud computing environments.\n\nComputational trust applications \nDepending on the approach used, trust can either be directly measured by one entity based on its own experiences or can be evaluated through the use of third-party opinions and recommendations.\nTahta et al.[22] propose a trust model for peer-to-peer (P2P) systems called \u201cGenTrust,\u201d in which genetic algorithms are used to recognize several types of attacks and to help a well-behaved node find other trusted nodes. GenTrust uses extracted features (number of interactions, number of successful interactions, the average size of downloaded files, the average time between two interactions, etc.) that result from a node\u2019s own interactions. However, when there is not enough information for a node to consider, recommendations from other nodes are used. Then, the genetic algorithm selects which characteristics, when evaluated together and in a given context, present the best result to identify the most trustful nodes.\nAnother approach is presented by Gholami and Arani[23], proposing a trust model named \u201cTurnaround_Trust\u201d aimed at helping clients to find cloud services that can serve them based on service quality requirements. 
The Turnaround_Trust model considers service quality criteria such as cost, response time, bandwidth, and processor speed, to select the most trustful service among those available in the cloud.\nOur approach in this paper differs from these related works since we use trust metrics that are directly related to the stored files in CC and that are paired to the cryptographic proof of these files' integrity.\nCanedo[24] bases the proposed trust model on concepts such as direct trust, trust recommendation, indirect trust, situational trust, and reputation to allow a node selection for trustful file exchange in a private cloud. For the sake of trust calculation, the processing capacity of a node, its storage capacity, and operating system\u2014as well as the link capacity\u2014are adopted as trust metrics that compose a set representative of the node availability. Concerning reputation, the calculation considers the satisfactory and unsatisfactory experiences with the referred node informed by other nodes. The proposed model calculates trust and reputation scores for a node based on previously-collected information, i.e., either information requested from other nodes in the network or information that is directly collected from interactions with the node being evaluated.\nIn the present paper, our approach is applied to both private and public CC services, with the development of the necessary architecture and secure protocol for trust verification regarding the integrity of files in CC services.\n\nIntegrity verification and privacy guarantee \nIn their effort to guarantee the integrity of data stored in cloud services, many research works present proposals in the domain analyzed in this paper.\nA protocol is proposed by Juels and Kaliski, Jr.[25] to enable a cloud storage service to prove that a file subjected to verification is not corrupted. 
To that end, a formal and secure definition of proof of retrievability is presented, and the paper introduces the use of sentinels, which are special blocks hidden in the original file prior to encryption, to be used afterward to challenge the cloud service. Based on Juels and Kaliski, Jr.'s work[25], Kumar and Saxena[26] present another scheme where one does not need to encrypt all the data, but only a few bits per data block.\nGeorge and Sabitha[27] propose a bipartite solution to improve privacy and integrity. The first part, called \u201canonymization,\u201d initially recognizes fields in records that could identify their owners and then uses techniques such as generalization, suppression, obfuscation, and the addition of anonymous records to enhance data privacy. The second part, called \u201cintegrity checking,\u201d uses public and private key encryption techniques to generate a tag for each record on a table. Both parts are executed with the help of a trusted third party called the \u201cenclave\u201d that saves all generated data that will be used by the de-anonymization and integrity verification processes.\nAn encryption-based integrity verification method is proposed by Kavuri et al.[28] The proposed method uses a new hash algorithm, the dynamic user policy-based hash algorithm, to calculate hashes of data for each authorized cloud user. For data encryption, an improved attribute-based encryption algorithm is used. The encrypted data and corresponding hash value are saved separately in cloud storage. Data integrity can be verified only by an authorized user and requires the retrieval of all the encrypted data and corresponding hash.\nAl-Jaberi and Zainal[29] provide another proposal to simultaneously achieve data integrity verification and privacy preservation, proposing the use of two encryption algorithms for every data upload or download transaction. 
The Advanced Encryption Standard (AES) algorithm is used to encrypt client data, which will be saved in a CSS, and an RSA-based partial homomorphic encryption technique is used to encrypt AES encryption keys that will be saved in a third-party entity together with a hash of the file. Data integrity is verified only when a client downloads a file.\nKai et al.[30] propose a data integrity auditing protocol to allow the fast identification of corrupted data using homomorphic cipher-text verification and a recoverable coding methodology. Checking the integrity of outsourced data is done periodically by either a trusted or untrusted entity. The adopted methodology aims at reducing the total auditing time and the communication cost.\nThe work of Wang et al.[31] presents a security model for public verification and assurance of stored file correctness that supports dynamic data operation. The model guarantees that no challenged file blocks should be retrieved by the verifier during the verification process, and no state information should be stored at the verifier side between audits. A Merkle hash tree (MHT) is used to save the authentic data value hashes, and both the values and positions of data blocks are authenticated by the verifier.\nOur proposal in this paper differs from these described proposals since we introduce the idea of trust resulting from file integrity verification as an aggregate concept to evaluate the long-term behavior of a CSS, while including most of the requirements specified in these other proposals, such as hashes of file blocks, freshness of verifications, and integrated support for auditing by an independent party. 
Further discussion on these differences is presented in the \"Discussion\" section based on the results coming from the validation of our proposal.\n\nManagement and monitoring of CSS \nSome other research works were reviewed since their purpose is to provide management tools to ensure better use of the services offered by CSS providers, as well as monitoring functions regarding the quality of these services, thus allowing one to generate a ranking of these providers.\nPflanzner et al.[32] present an approach to autonomous data management within CSS. This approach proposes a high-level service that helps users to better manage data distributed in multiple CSS. The proposed solution is composed of a framework that consists of three components named MeasureTool, DistributeTool, and CollectTool. These components are respectively responsible for monitoring and measuring performance, for splitting and distributing file chunks among different CSS, and for retrieving the split parts of a required file. Both historical performance and latest performance values are used for CSS selection and to define the number of file chunks that will be stored in each CSS.\nFurthermore, they propose the use of cloud infrastructure services to execute applications on mobile data stored in CSS.[32] In this proposal, the services for data management are run in one or more IaaS systems that keep track of the user storage area in CSS and execute the data manipulation processes when new files appear. The service running on an IaaS cloud downloads the user data files from the CSS, executes the necessary application on these files, and uploads the modified data to the CSS. 
This approach permits overcoming the computing capacity limitations of mobile devices.\nThe quality of services (QoS) provided by some commercial CSS is analyzed by Gracia-Tinedo et al.[33] For this, a measurement study is presented where important aspects such as transfer speed (upload\/download), behavior according to client geographic location, failure rate, and service variability related to file size, time, and account load are broadly explored. To perform the measurement, two platforms are employed, one with homogeneous and dedicated machines and the other with shared and heterogeneous machines distributed in different geographic locations. Furthermore, the measurement is executed using each CSS\u2019s own REST interfaces, mainly the PUT and GET methods, respectively used to upload and download files. The applied measurement methodology is demonstrated to be efficient and permits one to learn important characteristics about the analyzed CSS.\nOur contributions in this paper comprise the periodic monitoring of files stored in the cloud, performed by an integrity checking service that is defined as an abstract role so that it can operate independently of either the CSS provider or its consumer, preserving the privacy of stored file contents, and operating according to a new verification protocol. Both the tripartite architecture and the proposed protocol are described hereafter in this paper.\n\nProposed architecture and protocol \nThis section presents the proposed architecture that defines roles that work together to enable periodic monitoring of files stored in the cloud. Furthermore, the companion protocol that regulates how these roles interact with one another is detailed and discussed.\nThe architecture is composed of three roles: (i) Client, (ii) Cloud Storage Service (CSS), and (iii) Integrity Check Service (ICS). 
The Client represents the owner of files that will be stored by the cloud provider and is responsible for generating the needed information that is stored specifically for the purpose of file integrity monitoring. The CSS role represents the entity responsible for receiving and storing the client\u2019s files, as well as receiving and responding to challenges regarding file integrity that come from the ICS role. The ICS interfaces with both the Client and the CSS: it is the role responsible for the information regarding the Client files that are stored by the CSS, and it uses this information to constantly monitor the Client files\u2019 integrity by submitting challenges to the CSS and later validating the responses of the CSS to each verification challenge.\n\nThe proposed protocol \nThe trust-oriented protocol for continuous monitoring of stored files in the cloud (TOPMCloud) was initially proposed by Pinheiro et al.[34][35] Then, it was further developed and tested, leading to the results presented in this paper.\nThe TOPMCloud objective is to make possible an outsourced service that allows clients to constantly monitor the integrity of their files stored in a CSS without having to keep original file copies or reveal the contents of these files.\nFrom another point of view, the primary requirement for the proposed TOPMCloud is to prevent the CSS provider from offering to and charging a client for a storage service that in practice is not being provided. Complementary requirements comprise low bandwidth consumption, minimal CSS overloading, rapid identification of a misbehaving service, strong defenses against fraud, stored data confidentiality, and utmost predictability for the ICS.\nTo respond to the specified requirements, TOPMCloud is designed with two distinct and correlated execution processes that are shown together in Figure 1. 
The first one is called the \u201cFile Storage Process\u201d and runs on demand from the Client, which is the entity that starts this process. The second is the \u201cVerification Process,\u201d which is instantiated by an ICS and is continuously executed to verify a CSS. An ICS can simultaneously verify more than one CSS by means of parallel instances of the Verification Process.\n\nFig. 1 Trust-oriented protocol for continuous monitoring of stored files in the cloud (TOPMCloud) processes\n\nThe File Storage Process starts in the Client with the encryption of the file to be stored in the CSS. This first step, which is performed under the control of the file owner, is followed by the division of the encrypted file into 4096 chunks. These chunks are randomly permuted and are selected to be grouped into data blocks, each one with 16 distinct file chunks, and the position or address of each chunk is recorded. Then, hashes are generated from these data blocks. Each hash, together with the set of its respective chunk addresses, is used to build a data structure named the Information Table, which is sent to the ICS.\nThe selection and distribution of chunks used to assemble the data blocks are done in cycles. The number of cycles will vary according to the file storage period. Each cycle generates 256 data blocks without repeating chunks. The data blocks generated in each cycle contain all of the chunks of the encrypted file (256 * 16 = 4096).\nThe chosen values 4096, 16, and 256 come from a compromise involving the analysis of the protocol in the next subsections and the experimental evaluation that is presented in the \"Experimental validation\" section of this paper. 
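The chunking and block-building steps just described can be sketched in a few lines. This is our own illustrative reconstruction, not the authors' code: it assumes the encrypted file length is an exact multiple of 4096 and uses Blake2 for the block hashes, as in the prototype.

```python
import hashlib
import random

CHUNKS_PER_FILE = 4096   # encrypted file is divided into 4096 chunks
CHUNKS_PER_BLOCK = 16    # each data block groups 16 distinct chunks
BLOCKS_PER_CYCLE = 256   # 256 * 16 = 4096: one cycle covers the whole file

def build_information_table(encrypted: bytes, cycles: int, seed: int = 0):
    """Return a list of (block_hash, chunk_addresses) entries for the ICS."""
    chunk_size = len(encrypted) // CHUNKS_PER_FILE
    chunks = [encrypted[i * chunk_size:(i + 1) * chunk_size]
              for i in range(CHUNKS_PER_FILE)]
    rng = random.Random(seed)
    table = []
    for _ in range(cycles):
        # Random permutation: each cycle uses every chunk exactly once.
        order = rng.sample(range(CHUNKS_PER_FILE), CHUNKS_PER_FILE)
        for b in range(BLOCKS_PER_CYCLE):
            addrs = order[b * CHUNKS_PER_BLOCK:(b + 1) * CHUNKS_PER_BLOCK]
            block = b"".join(chunks[a] for a in addrs)
            table.append((hashlib.blake2b(block).hexdigest(), addrs))
    return table

table = build_information_table(bytes(4096 * 4), cycles=2)
assert len(table) == 2 * BLOCKS_PER_CYCLE   # 256 entries per cycle
```

Each table entry pairs a block hash with the chunk addresses needed to rebuild that block, which is the information the ICS later turns into a challenge.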
Therefore, these values represent choices that were made considering the freshness of the information regarding the trust credited to a CSS, the time for the whole architecture to react to file storage faults, the required number of verifications to hold the trust in a CSS for a certain period of time, as well as the expected performance and the optimization of computational resources and network capacity consumption. The chosen values are indeed parameters in our prototype code, so they can evolve if the protocol requirements change.\nThe Verification Process in the ICS starts with the computation of how many files should be verified and how many challenges should be sent to a CSS, both numbers being calculated according to the trust level assigned to the CSS. Each stored hash and its corresponding chunk addresses will be used only once by the ICS to send an integrity verification challenge to the CSS provider.\nIn the CSS, the stored file will be used to respond to the challenges coming from the ICS. On receiving a challenge with a set of chunk addresses, the CSS reads the chunks from the stored file, assembles the data block, generates a hash from this data block, and sends this hash as the challenge answer to the ICS.\nTo finalize the verification by the ICS, the hash coming in the challenge answer is compared to the original file hash, and the result activates the trust level classification process. 
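The challenge-and-answer exchange can be pictured with a minimal sketch (ours; the function and variable names are illustrative, and a one-byte chunk size is used for brevity, whereas real blocks use 16 chunk addresses):

```python
import hashlib

def answer_challenge(stored_file: bytes, chunk_addrs, chunk_size: int) -> str:
    """CSS side: read the addressed chunks, assemble the data block, hash it."""
    block = b"".join(stored_file[a * chunk_size:(a + 1) * chunk_size]
                     for a in chunk_addrs)
    return hashlib.blake2b(block).hexdigest()

# ICS side: the expected hash was recorded in the Information Table at
# storage time; a matching answer means the challenged chunks are intact.
original = bytes(range(256))          # toy "stored file" with 1-byte chunks
addrs = [7, 3, 11, 0]                 # toy challenge
expected = answer_challenge(original, addrs, 1)

assert answer_challenge(original, addrs, 1) == expected    # file intact
corrupted = bytes([255]) + original[1:]                    # corrupt chunk 0
assert answer_challenge(corrupted, addrs, 1) != expected   # fault detected
```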
For this process, if the compared hashes are equal, this means that the verified content chunks are intact in the stored file in the CSS.\n\nTrust level classification process \nThe trust level is evaluated as a real value in the range (\u22121, +1), with values from \u22121, meaning the most untrustful, to +1, meaning the most trustful, thus constituting the classification level that is attributed by the ICS to the CSS provider.\nIn the ICS, whenever a file hash verification process fails, the trust level of the verified CSS is downgraded, according to the following rules: when the current trust level value is greater than zero, it is set to zero (the ICS reacts quickly to a misbehavior from a CSS that was considered up to the moment as trustful); when the trust value is in the range between zero and \u22120.5, it is reduced by 15%; otherwise, the ICS calculates the value of 2.5% from the difference between the current trust level value and \u22121, and the result is subtracted from the trust level value (the ICS continuously downgrades a CSS that is still considered untrustful). These calculations are shown in Algorithm 1.\n\r\n\n\n\n\n\n\n\n\n\n\n Alg. 1 Pseudocode for computing the TrustLevel in the case of hash verification failures\n\n\n\nConversely, whenever a checking cycle is completed without failures (all data blocks of a file have been checked without errors), the trust level assigned to a CSS is raised. If the current trust level value is less than 0.5, then the trust level value is raised by 2.5%. Otherwise, the ICS calculates the value of 0.5% from the difference between one and the current trust level value, and the result is added to the trust level value. These calculations are shown in Algorithm 2. This means that initially we softly redeem an untrustful CSS, while we exponentially upgrade a redeemed CSS and a still trustful CSS.\n\r\n\n\n\n\n\n\n\n\n\n\n Alg. 
2 Pseudocode for computing the TrustLevel in the case of checking cycles completed without failures\n\nAgain, these chosen thresholds and downgrading\/upgrading values come from the experimental evaluation that is presented in the \"Experimental validation\" section, based on performance and applicability criteria. They are indeed parameters in our prototype code, so they can evolve if the protocol requirements change.\n\nFreshness of the trust verification process \nSince it is important to update the perception that a Client has about a CSS provider, the observed values of trust regarding a CSS are also used to determine the rhythm or intensity of verifications to be performed for this CSS.\nThus, the freshness of results from the trust verification process is assured by updating in the ICS the minimum percentage values of the number of stored files to be verified in a CSS, as well as the minimum percentages of data blocks that should be checked. We choose to present these updates per day, though again, this is a parameter in our implemented prototype.\nConsequently, according to the observed trust level for a CSS, the number of files and the percentage of these file contents checked in this CSS are set as specified in Table 1. In this table, the extreme values one and \u22121 should respectively represent blind trust and complete distrust, but they are not considered valid for our classification purposes, since we expect trust to be an ever-changing variable, including the idea of redemption.\n\nTable 1 Classification of the trust levels for updating purposes\n\nWhenever the trust value equals zero, as a means to have a decidable system, a fixed value must be artificially assigned to it to preserve the dynamics of evaluations. 
Thus, if the last verified result is a positive assessment, the value +0.1 is assigned to the observed trust; otherwise, if a verification fault has been observed, the assigned value is \u22120.1.\n\nVariation of the trust level assigned to the cloud storage service \nAccording to the TOPMCloud definition, the trust level assigned to a CSS always grows when a file-checking cycle is finished without the ICS detecting any verification failures during this cycle. Considering this rule, the first simulations regarding the evolution of trust in the ICS were used to determine the maximum number of days needed for the ICS to finish a checking cycle for a file stored in a CSS. The conclusion of a checking cycle indicates that each of the 4096 file chunks was validated as a part of one of the data blocks that are checked by means of the 256 challenges submitted by the ICS to the CSS.\nThe projected time for our algorithm to finish a file-checking cycle can vary between a minimum and a maximum value depending on the number of files simultaneously monitored by the ICS on a CSS. However, the checked file size should not significantly influence this time because the daily number of checked data blocks on a file is a percentage of the file size, as defined previously in Table 1.\nBy means of mathematical calculations, it is possible to determine that in a CSS classified with a \u201cvery high distrust\u201d level, i.e., the worst trust level, the maximum time to finish a checking cycle is 38 days. Comparatively, in a CSS classified with a \u201cvery high trust\u201d level, i.e., the best trust level, the time to finish a checking cycle can reach 1792 days. Figure 2 shows the maximum and the minimum number of days required to finish a file-checking cycle for each trust level proposed in TOPMCloud.\n\r\n\n\n\n\n\n\n\n\n\n\n Fig. 
2 Time required to complete a file-checking cycle\n\nNotwithstanding the mathematical calculations regarding the proposed protocol\u2019s maximum time required to finish a file-checking cycle, it is noticeable that this time can increase if the ICS or the CSS servers do not have enough computational capacity to respectively generate or answer the necessary protocol challenges for each day. Furthermore, the file-checking cycle depends on the available network bandwidth and can worsen if the network does not support the generated packet traffic. This situation can occur when the number of CSS stored files is very large.\nThe variation of the time to conclude the checking cycle, according to the trust level assigned to the CSS, comes from the different number of data blocks verified per day. This variation aims to reward cloud storage services that historically have no faults, thus minimizing the consumption of resources such as processing capacity and network bandwidth. Moreover, this feature allows our proposed architecture to prioritize the checking of files stored in CSS providers that have already presented faults. Consequently, this feature reduces the time required to determine whether other files were lost or corrupted.\nAnother interesting characteristic of the proposed protocol was analyzed with calculations performed to determine the number of file cycles concluded without identifying any fault so that the trust level assigned to a CSS rises to the highest trust level foreseen in Table 1, \u201cvery high trust.\u201d Figure 3 presents the results of this analysis using as a starting point the \u201cnot evaluated\u201d situation, which corresponds to a trust level equal to zero assigned to a CSS.\n\nFig. 
3 Expected best-performing trust level evolution for a CSS

From the analysis of the results shown in Figure 2 and Figure 3, it can be concluded that the time required for a CSS to reach the maximum trust level is so large that it is practically impossible to attain. This conclusion follows from multiplying the maximum number of days needed to finish a checking cycle at the "high trust" level (896) by the number of successfully concluded cycles required to reach the "very high trust" level (384 − 202 = 182). The result of this calculation is 163,072 days (182 × 896), which is approximately 447 years.

Although this is mathematically correct, in practice this situation would never occur. The simple explanation for this fact is related to the number of files simultaneously monitored by the ICS in the CSS. The maximum expected time for concluding a file-checking cycle only occurs when the number of monitored files in a CSS classified at the "high trust" level is equal to 25 or a multiple of this value. According to Table 1, this is due to the fact that, at the "high trust" level, 16% of the monitored files should be checked per day. The maximum time spent in file checking only occurs when this percentage calculation yields an integer value; otherwise, the result is rounded up, thus increasing the percentage of files effectively checked.

Indeed, if the ICS is monitoring exactly 25 files in a CSS classified at the "high trust" level, and supposing that these files were submitted to the CSS on the same day, the checking cycles for this set of files will finish in 896 days. 
Since in a period of 896 days, there are 25 concluded cycles, then about 20 years are needed for the CSS to attain the 182 cycles requested for reaching the next level, \u201cvery high trust.\u201d However, this situation worsens if the number of considered files decreases. For instance, considering the \u201chigh trust\u201d level, if there are only six files being monitored, then the time to attain the next level exceeds 65 years.\nFigure 4 presents a comparative view of the time required to upgrade to the next trust level according to the number of monitored files. In general, less time will be required to increase the trust level if there are more monitored files.\n\r\n\n\n\n\n\n\n\n\n\n\n Fig. 4 Time to upgrade the trust level according to the number of monitored file\n\n\n\nAs can be seen in Figure 4, the best case is obtained when the number of monitored files is equal to the required number of successfully concluded cycles to upgrade to the next trust level. For this number of files, the time required to increase the trust level is always equal to the time needed to conclude one checking cycle.\nOpposite to the trust level raising curve that reflects a slow and gradual process, the trust level reduction is designed as a very fast process. The trust value assigned to the CSS always decreases when a challenge result indicates a fault in a checked file.\nTo evaluate the proposed process for downgrading the measured trust level, calculations were performed aiming to determine how many file-checking failures are needed for a CSS to reach the maximum distrust level. Any trust level between \u201cvery high trust\u201d and \u201clow trust\u201d could be used as the starting point to these calculations. Then, when a challenge-response failure is identified, the trust value is changed to zero and the CSS is immediately reclassified to the \u201clow distrust\u201d level. 
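The timing arithmetic above, together with the reset-to-zero rule just described, can be condensed into a short sketch. This is an illustrative reconstruction, not the paper's code: the figures (896-day cycles and 182 required cycles at "high trust") are taken from the text, and the batching assumption (all monitored files conclude their cycles together) follows the worked examples.

```java
// Sketch of the trust-level timing arithmetic discussed in the text.
// Assumption: all `files` monitored files were submitted the same day,
// so `files` cycles conclude together every `cycleDays` days.
public class TrustTiming {
    /** Days for a CSS to accumulate the cycles needed for the next level. */
    public static long daysToUpgrade(int cyclesNeeded, int cycleDays, int files) {
        long batches = (cyclesNeeded + files - 1) / files; // ceiling division
        return batches * cycleDays;
    }

    /** On any verification failure the trust value is reset to zero,
     *  i.e., the CSS is immediately reclassified to "low distrust". */
    public static double onFailure(double trust) {
        return 0.0;
    }

    public static void main(String[] args) {
        // "High trust": 896-day cycles, 182 successful cycles to upgrade.
        System.out.println(daysToUpgrade(182, 896, 25) + " days"); // 7168 days, about 20 years
        System.out.println(daysToUpgrade(182, 896, 6) + " days");  // 27776 days, over 65 years
    }
}
```

The asymmetry is deliberate: upgrading multiplies a long cycle time by many required cycles, while a single failed challenge collapses the trust value at once.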
From this level to the \u201cvery high distrust\u201d level, the number of file-checking failures required to reach each next distrust level is shown in Figure 5.\n\r\n\n\n\n\n\n\n\n\n\n\n Fig. 5 Number of file-checking failures needed to downgrade to each distrust level\n\n\n\nSimilarly to the trust level raising process, the required minimum time to downgrade to a distrust level is determined by the number of simultaneously-monitored files. Figure 6 presents a comparative view of the required minimum time to downgrade a CSS considering that all monitored files are corrupted and that failures will be identified upon the ICS receiving the first unsuccessful challenge response from the CSS.\n\r\n\n\n\n\n\n\n\n\n\n\n Fig. 6 Trust level downgrade according to the number of monitored files\n\n\n\nAn important difference between the process to downgrade the trust level assigned to a CSS and the opposite process to upgrade this trust level is that the downgrade time is preferably presented as a number of days, whereas the upgrade time is preferably presented in years. As shown in Figure 6, the minimum time to downgrade a trust level will be one day when the number of monitored files is equal to or greater than the number of identified challenge failures required to downgrade a trust level according to Figure 5.\n\nArchitecture implementation \nThe implementation of the architecture was organized as a validation project comprising three phases. The first phase was devoted to the processes under the responsibility of the client in our proposal. The second and the third phases were respectively aimed at implementing the processes under the ICS responsibility and the processes under the responsibility of the CSS. 
Hence, a completely functional prototype was used for the validation of the proposed architecture and protocol.

In each implementation phase, one application was developed using Java Enterprise Edition (Java EE) components, such as the Java Persistence API (JPA), Enterprise JavaBeans (EJB), Contexts and Dependency Injection (CDI), and the Java API for XML Web Services (JAX-WS).[36] A desktop application was developed for the Client, while two web service applications were developed for the ICS and the CSS, respectively. The chosen application server was GlassFish 4[37], and the chosen database management system (DBMS) was PostgreSQL.[38]

These platforms were chosen in consideration of the distributed characteristics of the proposed architecture and the need for continuous and asynchronous protocol communications between the roles defined in this architecture. Thus, the choice of Java EE was determined by its suitability for implementing web services, task scheduling, event monitoring, and asynchronous calls. Both the GlassFish application server and the PostgreSQL DBMS were chosen because they are open-source applications and fully meet the needs of the developed application.

Client application

The Client application's main tasks are: to encrypt a file and send it to one or more CSS, to split the file and select the file chunks, to assemble the data blocks and generate their hashes, to group them in cycles, to generate the Information Table, and to send this table to the ICS. 
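The encryption step can be sketched with the javax.crypto API, using AES in CBC mode with a 256-bit key as the prototype does. The PBKDF2 password-to-key derivation shown here is an assumption for completeness: the paper states that a password is supplied by the user but not how it becomes a key.

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.SecureRandom;

public class ClientCrypto {
    /** Derives a 256-bit AES key from the user's password. PBKDF2 is an
     *  assumption; the paper only states a password-based 256-bit key. */
    public static SecretKeySpec deriveKey(char[] password, byte[] salt) throws Exception {
        SecretKeyFactory f = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
        byte[] key = f.generateSecret(new PBEKeySpec(password, salt, 10_000, 256)).getEncoded();
        return new SecretKeySpec(key, "AES");
    }

    /** A fresh random 16-byte IV for each encrypted file. */
    public static byte[] randomIv() {
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);
        return iv;
    }

    /** AES/CBC encryption via javax.crypto, as named in the prototype. */
    public static byte[] encrypt(byte[] plain, SecretKeySpec key, byte[] iv) throws Exception {
        Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
        c.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        return c.doFinal(plain);
    }

    public static byte[] decrypt(byte[] cipherText, SecretKeySpec key, byte[] iv) throws Exception {
        Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
        c.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
        return c.doFinal(cipherText);
    }
}
```

Because each encrypted copy of a file uses a different key, every copy produces distinct ciphertext, which is what makes the per-copy data-block hashes independent.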
In our prototype, the client application also allows one to control the inventory of stored files in each CSS, to store the cryptographic keys, to look for pieces of information about the verification process and the file integrity in each CSS, to retrieve a file from a CSS, confirming its integrity, and deciphering its contents.\nThese functions are accessible in the Client application by means of a graphical interface through which the user selects files from the file system and at least one CSS to store the chosen files, as well as an ICS to verify the storage service used for these files. This same interface allows the client to inform about both the password to be used in the file encryption process and the number of years corresponding to the period to keep the file stored in the CSS. Furthermore, a numerical seed is given to add entropy to the process of choosing the chunks that will compose each data block. Figure 7 shows the application interface with the implemented \u201cUpload File\u201d function.\n\r\n\n\n\n\n\n\n\n\n\n\n Fig. 7 The client interface showing the \u201cUpload File\u201d function\n\n\n\nIn this prototype, the list of available CSS comes from previous registration activity, by means of which the user first registers the ICS with which it maintains a service level agreement. After an ICS has been selected, the Client application obtains a list of CSS from the selected ICS web service. This initial process requires the client user to maintain some contractual relationship regarding file storage verifications involving the Client application, the registered ICS, and the selected CSS. 
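The client-side chunk selection and data-block hashing can also be sketched. The seeded SHA1PRNG generator and the 16-chunks-per-block layout match the prototype description later in this section; SHA-256 stands in for Blake2 here, since the JDK does not ship a Blake2 implementation.

```java
import java.security.MessageDigest;
import java.security.SecureRandom;

public class BlockHasher {
    static final int CHUNKS_PER_FILE = 4096;
    static final int CHUNKS_PER_BLOCK = 16;

    /** Draws 16 chunk address codes in [0, 4096) from a seeded SHA1PRNG
     *  generator, so the Client and any later verification agree on them. */
    public static int[] chunkAddressCodes(long seed) throws Exception {
        SecureRandom rng = SecureRandom.getInstance("SHA1PRNG");
        rng.setSeed(seed); // seeding before first use makes the stream deterministic
        int[] codes = new int[CHUNKS_PER_BLOCK];
        for (int i = 0; i < codes.length; i++) codes[i] = rng.nextInt(CHUNKS_PER_FILE);
        return codes;
    }

    /** Assembles a data block from the addressed chunks and hashes it.
     *  SHA-256 is a stand-in for the Blake2 function used by the prototype. */
    public static byte[] blockHash(byte[] file, int chunkSize, int[] codes) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        for (int code : codes) {
            int offset = code * chunkSize;      // first byte of the addressed chunk
            md.update(file, offset, chunkSize); // concatenation via streaming update
        }
        return md.digest();
    }
}
```

Only these block hashes and address codes reach the ICS; the encrypted file content itself never does.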
Then, the Client user is informed about the current trust level assigned by the ICS to each concerned CSS.\nThe file encryption process is performed using resources available in the \u201cjavax.crypto\u201d package[39], using the AES cryptographic algorithm[40], a 256-bit key, and the cipher-block chaining (CBC) operation.[41]\nThe process of sending an encrypted file to a CSS was implemented using Java threads so that it is possible to simultaneously initiate the uploading of the encrypted file to each selected CSS. Furthermore, by means of threads, it is possible to proceed with the next steps in parallel, without the need to wait for a complete file upload, an operation that takes a variable duration according to the file size and the network characteristics.\nThe calculations of either the number of cycles and the chunk distribution is executed according to the number of years that the file must be stored by the CSS and monitored by the ICS. Although the number of used cycles should vary according to the trust level assigned to the CSS, in our validation prototype for these calculations, we choose to consider the worst case value, corresponding to the \u201chigh distrust\u201d level.\nEach chunk address code is obtained through the SHA1PRNG algorithm, a pseudo-random number generator, executed by \u201cSecureRandom,\u201d a class from the \u201cjava.security\u201d package.[39] To produce the hashes, the cryptographic function Blake2[21] was chosen due to its speed, security, and simplicity.\nAlso in our prototype, the monitoring module shown in Figure 8 was developed to provide a practical tool for the user to access functions such as: to manage the inventory of stored files in a set of CSS providers, to check the file status assigned by the ICS according to results from verifications of the CSS, to download files from the CSS, and to decipher the downloaded files.\n\r\n\n\n\n\n\n\n\n\n\n\n Fig. 
8 The monitoring module entry screen\n\n\n\nIn the monitoring module, the \u201cStatus\u201d button shows information such as: the file source folder, the identification hash, the name of the CSS where the file is stored, the name of the ICS responsible for monitoring this CSS, the number of concluded checking cycles, the number of cycles that is currently being performed, the number of data blocks already checked in the current cycle, the ICS monitoring period for the last day, and its current status. Figure 9 shows the file status query screen.\n\r\n\n\n\n\n\n\n\n\n\n\n Fig. 9 File status query screen\n\n\n\nIntegrity checking service application \nThe ICS implementation comprises a web service application called \u201cVerifierService\u201d that presents the following functionalities: to receive the File Information Table submitted by the Client application, to select and send challenges regarding monitored files to the CSS that host them, to manage pending responses to challenges, to receive challenge responses from the CSS, to update the CSS trust levels, to receive and to answer to requests for the monitored file status, to receive, and to answer requests for information about the CSS with which there is a monitoring agreement.\nIn the File Information Table received from the Client application, the CSS is represented by the identifier of the monitoring contract between the ICS and the CSS. For each CSS contract in the information table, the ICS application saves a new record in a table named \u201carchives\u201d so that the monitoring of one file copy does not interfere with the monitoring of other copies. 
The following information is stored in the archives table: a file identifier hash, the chunk size in bytes, the number of generated cycles, and the contract identifier that defines the CSS where the file is stored.

For each data block hash received by means of the information table coming from a Client, a new record is generated by the ICS application in a table named "blocks." The following information is stored in this table: the data block hash, the chunk address codes that compose this block, the cycle number to which it belongs, and the "archives" record identifier.

The process of selecting, generating, and sending challenges is an activity performed periodically (in our prototype, daily) by the ICS. This process comprises the following actions: selecting files to be checked in each CSS, selecting data blocks to be checked in each file, generating challenges, and sending the challenges to the CSS.

The trust level assigned to a CSS will be decremented whenever the ICS identifies a corrupted data block in a file. Conversely, the trust level is incremented after all data blocks from the same cycle have been the object of challenges to the CSS and all the answers to these challenges confirm the integrity of the checked blocks. Other results obtained for a file already marked as corrupted, whether positive or negative, are simply ignored.

Cloud storage service application

The CSS application was developed as a web service named "StorerWebService." This application provides a service capable of receiving challenges from an ICS, processing them, and sending the challenge responses back to the ICS.

The CSS application in our prototype implementation includes the following functionalities: receiving challenges from an ICS and storing them in the CSS database, monitoring the database and processing the pending challenges, storing the responses in the database, and monitoring the database to find and send pending responses to the ICS. 
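These CSS-side functionalities can be reduced to a minimal in-memory sketch. This is an illustration only: the database is replaced by a queue, and the status strings and field names follow the description of the prototype rather than its actual code.

```java
import java.security.MessageDigest;
import java.util.ArrayDeque;
import java.util.Queue;

/** Minimal sketch of the CSS challenge lifecycle: challenges are persisted
 *  first and answered asynchronously by a worker process. */
public class StorerSketch {
    public static class Challenge {
        public final String id;
        public final int[] addressCodes;
        public final int chunkLength;
        public String status = "Waiting for Processing";
        public byte[] responseHash;
        public Challenge(String id, int[] addressCodes, int chunkLength) {
            this.id = id;
            this.addressCodes = addressCodes;
            this.chunkLength = chunkLength;
        }
    }

    private final Queue<Challenge> pending = new ArrayDeque<>();

    /** Receive a challenge and store it for later processing. */
    public void asyncFileCheck(Challenge ch) {
        pending.add(ch);
    }

    /** Worker step: process one pending challenge against the stored file. */
    public Challenge processNext(byte[] storedFile) throws Exception {
        Challenge ch = pending.poll();
        if (ch == null) return null;
        MessageDigest md = MessageDigest.getInstance("SHA-256"); // stand-in for Blake2
        for (int code : ch.addressCodes)
            md.update(storedFile, code * ch.chunkLength, ch.chunkLength);
        ch.responseHash = md.digest();
        ch.status = "Waiting to Send";
        return ch;
    }
}
```

Decoupling receipt from processing in this way is what lets the ICS fire challenges without holding a connection open while the CSS reads and hashes chunks.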
Furthermore, the application also includes features for uploading and downloading files to simulate the features normally available in a CSS.\nThe challenge-receiving functionality is provided in ICS by means of a method called \u201casyncFileCheck\u201d that gets as the input parameter an object of the class called \u201cChallengeBean.\u201d This object contains the attributes \u201cidentifier,\u201d \u201caddressCodes,\u201d \u201cchunkLength,\u201d \u201cresponseUrl,\u201d and \u201cid,\u201d which respectively represent: the file identifier hash, the array with the set of chunk addresses codes to be read in the file and that will compose the data block on which the response hash will be generated, the chunk size in bytes, the ICS web service URL responsible for receiving the challenge answers, and the challenge identifier.\nAfter receiving a challenge, the information is extracted from the ChallengeBean object and is stored in the CSS database, where it gets the \u201cWaiting for Processing\u201d status. The advantage of storing the received challenges for further processing is related to the organization of our general architecture for saving computational resources. This CSS asynchronous procedure prevents the ICS from needing to keep a process locked awaiting a response from the CSS for each submitted challenge, considering that the required time to process a challenge varies according to the checked file size, the number of simultaneously-received challenges, and the CSS computational capacity.\nAnother designed consequence of this model is the possibility of performing load distribution since the services responsible for receiving, processing, and responding to the challenges can be provided by totally different hardware and software infrastructures. 
The only requirement is that all infrastructure components must share access to the same database.

The response hash is generated from a data block assembled with chunks read from the file being checked. The chunks used are those referenced by the 16 chunk addresses defined in the "addressCodes" attribute of the "challenge" object. These address codes are integers ranging from 0 to 4095 that are multiplied by the chunk size to obtain the address of the first byte of each chunk in the file.

After the chunk reading task is complete, the obtained data are concatenated, forming a data block. From this data block, a 256-bit hash is generated using the Blake2 cryptographic hash function.[21] The generated hash is saved in the database while waiting to be sent back to the ICS. A specific process monitors the pending hash responses and sends them as challenge answers to the ICS.

Experimental validation

This section describes the setup used to perform the experiments designed to evaluate the performance, efficiency, and efficacy of the proposed TOPMCloud protocol. Then, the results of the experimental validation are presented and discussed.

Experimental setup

Our experimental environment comprises five similar virtual machines (VMs), each with 12 GB of memory and 200 GB of hard disk space, running the Ubuntu Server 14.04 LTS operating system. All of them were set to run in a private cloud hosted by the Decision Technologies Laboratory at the University of Brasília, Brazil. Since this setup presents common functionalities found in most commercial CSS providers, it is considered a configuration that adequately represents the utilization of these services as provided by commercial cloud services. 
The basic operating system functions required from the CSS provider are file access operations and hash calculations, which are commonly available in cloud computing services and object class libraries; where absent, these functions can easily be deployed by commands issued from the cloud service's client. These VMs are configured to perform the three roles defined in our architecture, with the following distribution of services: one VM holds the Client role, one VM performs the Integrity Check Service (ICS) role, and the remaining three VMs hold the Cloud Storage Service (CSS) role.

The experiments were performed using four files of different sizes (2.5, 5, 10, and 15 GB). For each file, an information table was generated considering the utilization of the file storage service and its monitoring over five different time periods (1, 5, 10, 20, and 30 years). Files with diverse content types and formats were used, such as International Organization for Standardization (ISO) 9660, XenServer Virtual Appliance (XVA), and Matroska Video (MKV). For each considered time period, cryptographic keys of the same size were used, but with different, randomly chosen values, so that each generated encrypted file was completely distinct from the other files generated from the same source file.

With the described configuration operating for three months, logs were generated from the ICS monitoring process verifying the files stored in the three CSS providers. In total, 60 files were monitored by the ICS, 20 of them stored in each CSS. 
In this period, some CSS servers were randomly chosen to be turned off and after a while to be switched on again, in order to simulate fault conditions.\nIn order to evaluate the behavior of the proposed protocol, some experiments were realized with contextual modifications, including the deliberate change of file contents and the change of trust levels assigned to each CSS.\n\nSubmission of files \nOur experiments begin by observing the performance during the process of preparation and submission of files to CSS and the related transmission of their respective information tables to the ICS. The steps of this process have their duration time measured so that we collect observations regarding the following tasks: encrypting the source file to an encrypted file, hashing this encrypted file, computing cycles and distributing chunks on data blocks, hashing these data blocks, and finally sending the information table to ICS.\nEach of these tasks has its execution time varying according to the processed file size or the foreseen time period for its storage in the CSS. We present hereafter average times taken from a number of 20 controlled repetitions of the experiment or test. Figure 10 shows both the encryption and the hash generation average time by file sizes.\n\r\n\n\n\n\n\n\n\n\n\n\n Fig. 10 Average time for file encryption and hash generation by file size\n\n\n\nAs explained before, the task \u201ccomputing cycles and distributing chunks\u201d is responsible for selecting 16 chunk address codes for each data block required for filling the computing cycles. Hence, its execution time varies exclusively according to the file storage period. Figure 11 shows the required average time for computing cycles and distributing chunks.\n\r\n\n\n\n\n\n\n\n\n\n\n Fig. 
11 Required average time for computing cycles and distributing chunks\n\n\n\nThe task \u201chashing data blocks\u201d comprises actions for randomly reading the chunks in the encrypted file, assembling data blocks by the concatenation of chunks and, finally, the hash generation from each assembled data block. In this task, the execution time varies according to both the file size and the expected cloud storage time. The larger the file, the larger will be the size of each data block. Additionally, the longer the storage period, the greater the quantity of data blocks to be generated. Figure 12 shows a graph with the time variation for generating data block hashes according to the file size and the chosen storage period.\n\r\n\n\n\n\n\n\n\n\n\n\n Fig. 12 Average time for data block hashing\n\n\n\nObserving Figure 12, it is possible to verify that the time for generating all hashes from a 15 GB file presents a disproportionate growth when compared with the other file sizes, independent of the cloud storage period. From this observation, it is possible to infer that for files with sizes of the order of 15 GB or greater it is necessary to optimize the proposed protocol.\nAnother important component in the file submission process total execution time is the time required to send the information table to the ICS. This time varies according to the quantity of generated data blocks, which in turn varies according to the storage period for the file in the CSS. The measured time in this task comprises the connection with web service in the ICS, the sending of the information table, its storage in the ICS database, and the receiving of a successful storage confirmation from the ICS. Figure 13 shows the required average time for sending an information table to the ICS according to the CSS storage period.\n\r\n\n\n\n\n\n\n\n\n\n\n Fig. 
13 Required average time for sending an information table to the integrity check service (ICS)\n\n\n\nNetwork bandwidth consumption \nWe show results from experiments realized to determine the effective consumption of network bandwidth by the file monitoring process execution. By design, the TOPMCloud implies a variation in network resource consumption according to the trust level assigned to the CSS.\nFor evaluating this feature, each trust level foreseen in TOPMCloud was successively assigned to a CSS, and for each assigned level, the ICS performed daily file verifications on the said CSS. The measurements regarding the traffic due to challenges sent by the ICS and corresponding answers were based on traffic collection with the Wireshark tool. Using a controlled network environment and with filters applied on Wireshark, it was possible to exclusively capture packets generated either by the concerned ICS and the CSS applications. Figure 14 shows the average daily network bandwidth consumption by stored file according to the trust level assigned to the CSS.\n\r\n\n\n\n\n\n\n\n\n\n\n Fig. 14 Average daily network bandwidth consumption by stored file\n\n\n\nThe rate of network traffic per stored files attains its maximum value when the number of stored files in the CSS is equal to one since the ICS is required to use network bandwidth just for this file. In this case, at least one of these file data blocks will be verified per day, independent of the trust level assigned to the CSS. If this trust level is higher and there is a set of files to be verified, the network traffic will serve to monitor percentages of these files\u2019 contents. 
The network traffic per stored file always attains its minimum when the percentage computation defined in Table 1, column "Files verified by day," yields an integer value for the number of files stored in the CSS, according to its trust level.

File integrity checking by the ICS

As mentioned before, this operation of the ICS was monitored for three months in a setup involving one ICS and three CSS. During that period, both the ICS and the CSS stored their logs in text files, with each log record containing the execution time spent on the actions related to the file integrity-checking process.

As the file integrity checking is executed by means of challenges sent by the ICS to the CSS, the response time is directly proportional to the verified file size. Thus, the information about challenge results obtained from the ICS logs was grouped and classified according to the verified file size. The values registered in the ICS logs comprise the executed actions from the moment the challenge is stored in the "requests" table up to the receipt of the respective answer sent by the CSS. In our experiment, the ICS logs registered 1340 records of successful checking results. Figure 15 shows the average, maximum, and minimum time spent to process a challenge, by verified file size.

Fig. 15 Time spent to conclude the processing of a challenge, by file size

The most time-consuming actions in the processing of a challenge are performed in the CSS and comprise the reassembly of the data block from the stored file chunks, followed by the hashing of the data block. For the three CSS used in our experiment, the number of records collected from their logs was respectively 360, 470, and 510. These records registered the time spent by each CSS executing the aforementioned actions. 
In spite of the same number of files having been stored in all three CSS, the number of processed challenges for each CSS varied, because each CSS was randomly shut down for a random time interval. Figure 16 compares the average time spent by each CSS answering the challenges requested by the ICS.

Fig. 16 Average time spent by the cloud storage service (CSS) to answer ICS challenges, by CSS and file size

Simulation of file storage faults

The simulation of integrity faults in files stored in the CSS was performed by means of the controlled modification of bytes within some stored files. The purpose of this simulation was to determine how much time the ICS requires to identify a fault. For this simulation, 10 of the 15 checked 5 GB files were chosen and randomly distributed into two groups of five files each, with all files in a group receiving the same number of byte modifications.

In Group 1, 5368 sequential bytes (0.0001%) were changed in each file, with the position of the modification inside each file randomly chosen. In Group 2, 5,368,709 sequential bytes (0.1%) were changed at the end of each file.

Figure 17 shows the results regarding the detection of these faults by the ICS.

Fig. 17 Detection by the ICS of integrity faults in files stored in a CSS

Discussion

The related research works described in "Related works" present viable alternatives that respond to the problem of verifying the integrity of files stored in the cloud, but none of them presents a unique and complete solution that allows permanent and active monitoring, execution by an independent third party, file integrity verification without access to file content, low and predictable computational cost, and balanced consumption of computational resources according to the measured CSS QoS. 
The main differences between the protocol proposed in this work and those of the cited related works are discussed as follows.

With the proposals by Juels and Kaliski, Jr.[25] and Kumar and Saxena[26], small changes in files cannot be detected because only a few specific bits in each file are tested, while in TOPMCloud all bits are tested. In this case, our solution benefits from the integrity check proposed and validated according to the presented results.

The solution proposed by George and Sabitha[27] requires a trusted third party to save pieces of information about the files. The trusted third party is necessary because it is not possible to restore the original file content without the saved information. In contrast, in TOPMCloud the third party can be untrusted, because it verifies file integrity without having direct access to the encrypted file or to any information related to its original content.

With Kavuri et al.[28] and Al-Jaberi and Zainal[29], it is necessary to retrieve the whole ciphered file to verify its integrity, while in TOPMCloud the monitoring is performed constantly without retrieving any bit of the stored file from the CSS. In this case, our solution reduces bandwidth consumption where network traffic is a concern, while still guaranteeing file integrity over the storage time required by the client.

With the proposals of Kai et al.[30] and Wang et al.[31], the monitoring solutions are based on asymmetric homomorphic algorithms, a type of cryptographic scheme that consumes large amounts of computational resources. In TOPMCloud, hashes are faster to compute and consume fewer resources. Thus, our solution benefits from speed in processing large files while maintaining both integrity and confidentiality.

Another interesting consideration is that none of the related works were designed to monitor large files of 5, 10, or 15 GB, or larger. 
Consequently, among the reviewed publications, there were none presenting test results analogous to those coming from the TopMCloud validation process. For this reason, it was not possible to perform a qualitative analysis of the results obtained with the tests applied in TopMCloud in comparison to the mentioned related works.\n\nConclusion and future work \nIn this paper, a distributed computational architecture was proposed aimed at monitoring the integrity of stored files in a Cloud Storage Service (CSS) without compromising the confidentiality of these files\u2019 contents. The proposed architecture and its supporting protocol leverage the notion of trust so that a CSS consumer uses a third-party file-checking service, the Integrity Check Service (ICS), to continuously challenge the CSS provider regarding the integrity of stored files and, based on these verifications, to present a level of trust attributed to this CSS provider.\nBased on the behavior of each CSS, the file-checking frequency adapts dynamically, either increasing if stored file integrity failures are observed or decreasing if a low failure rate is observed. 
Consequently, the verification effort is more intensive where it is effectively needed, thus optimizing the computational and network resources consumed by the proposed protocol's execution.

The proposed protocol was also designed to address requirements such as low bandwidth consumption, the capacity to quickly identify misbehaving storage services, strong resistance against fraud, reduced CSS overhead, confidentiality of stored file contents, predictability, and maximum resource savings for the ICS.

Another view of our proposal is that it was designed to provide efficient control over the integrity of files stored in a CSS without overloading service providers that behave appropriately, while acting quickly if this behavior becomes problematic, which requires our architecture to identify faults and provide early alerts about corrupted files to their owners. These are the main reasons for our design choice of performing file integrity monitoring by hashing parts of stored files, without the ICS needing direct access to file contents and without the CSS having to process the complete file on each check.

The design choice of hashing and verifying data blocks assembled from randomly chosen file chunks was demonstrated in our experimental validation to be an effective method for detecting the simulated file integrity faults injected into the CSS under test. Even small modifications in very large files took an average of 14 days to be identified.

Furthermore, based on the test results in our experimental setup, it was possible to verify that the time taken to generate a File Information Table, as well as the size of this table, can be considered adequate, both being proportional to the file size. 
The network bandwidth consumption for the monitoring was very low regardless of the trust level assigned to a CSS.\nAnother feature of the proposed architecture is that, if necessary, it can spare the CSS consumer from storing local copies of files kept in the cloud, since the consumer retains the capacity to retrieve the files from multiple CSS providers, which constitutes a redundant cloud storage configuration. In this case, the redundancy level must be chosen by the storage client according to the criticality of the file information, using the ICS measurements of the trust in each CSS. The classification of CSSs into trust levels, according to their availability and their stored file integrity history, not only relieves the computational load on a well-behaving CSS but also allows clients to select the most suitable CSS to contract, according to the criticality level associated with the files that will be stored in the cloud.\nThe proposed architecture proved quite robust during the tests, responding satisfactorily to the fault simulations. Notably, the developed prototype also withstood failures not foreseen in the protocol, such as unplanned server shutdowns due to power outages, recovering its functionality after restart without any human intervention.\nAs future work, we intend to add functionality that allows sharing the measured CSS trust level between different ICS instances. 
This functionality would allow, for example, a fault identified in a CSS by one ICS to alert other ICSs so that they can react proactively, prioritizing the checking of files stored in that CSS.\nAiming at better performance with files larger than 30 GB, we intend to test modifications of our protocol parameters, such as increasing the number of chunks per file and\/or the number of chunks per data block. In the same sense, we also intend to improve the architecture implementation to support different processing parallelism schemes.\n\nAcknowledgements \nThis research work was supported by Sungshin W. University. In addition, A.P. thanks the Brazilian Army Science and Technology Department. E.D.C. thanks the Ministry of Planning, Development and Management (Grant SEST - 011\/2016). R.T.d.S.J. thanks the Brazilian research and innovation agencies CAPES - Coordination for the Improvement of Higher Education Personnel (Grant 23038.007604\/2014-69 FORTE - Tempestive Forensics Project), CNPq - National Council for Scientific and Technological Development (Grant 465741\/2014-2 Science and Technology National Institute - INCT on Cybersecurity), FAPDF - Research Support Foundation of the Federal District (Grant 0193.001366\/2016 - UIoT - Universal Internet of Things), the Ministry of Planning, Development and Management (Grant 005\/2016 DIPLA), and the Institutional Security Office of the Presidency of the Republic of Brazil (Grant 002\/2017). R.d.O.A. 
thanks the Brazilian research and innovation agencies CAPES - Coordination for the Improvement of Higher Education Personnel (Grant 23038.007604\/2014-69 FORTE - Tempestive Forensics Project), CNPq - National Council for Scientific and Technological Development (Grant 465741\/2014-2 Science and Technology National Institute - INCT on Cybersecurity), FAPDF - Research Support Foundation of the Federal District (Grant 0193.001365\/2016 - SSDDC - Secure Software Defined Data Center), and the Institutional Security Office of the Presidency of the Republic of Brazil (Grant 002\/2017).\n\nAuthor contributions \nA.P., E.D.C. and R.T.d.S.J. conceived the security architecture and the proposed protocol for trust verifications regarding the integrity of files in cloud services. A.P. developed the corresponding prototype for validation purposes. R.d.O.A., L.J.G.V. and T.H.K. conceived the experiments and specified data collection requirements for the validation of results. All authors contributed equally to performing the experiments, analyzing resulting data and writing the paper.\n\nConflicts of interest \nThe authors declare no conflict of interest.\n\nReferences \n\n\n\u2191 Tandel, S.T.; Shah, V.K.; Hiranwal, S. (2013). \"An implementation of effective XML based dynamic data integrity audit service in cloud\". International Journal of Societal Applications of Computer Science 2 (8): 449\u2013553. https:\/\/web.archive.org\/web\/20150118081656\/http:\/\/ijsacs.org\/previous.html .   \n\n\u2191 2.0 2.1 Dabas, P.; Wadhwa, D. (2014). \"A Recapitulation of Data Auditing Approaches for Cloud Data\". International Journal of Computer Applications Technology and Research 3 (6): 329\u201332. doi:10.7753\/IJCATR0306.1002. https:\/\/ijcat.com\/archieve\/volume3\/issue6\/ijcatr03061002 .   \n\n\u2191 Mell, P.; Grance, T. (September 2011). \"The NIST Definition of Cloud Computing\". Computer Security Resource Center. https:\/\/csrc.nist.gov\/publications\/detail\/sp\/800-145\/final .   
\n\n\u2191 4.0 4.1 Miller, M. (2008). Cloud Computing: Web-Based Applications That Change the Way You Work and Collaborate Online. Que Publishing. ISBN 9780789738035.   \n\n\u2191 Velte, T.; Velte, A.; Elsenpeter, R.C. (2009). Cloud Computing: A Practical Approach. McGraw-Hill Education. ISBN 9780071626941.   \n\n\u2191 Zhou, M.; Zhang, R.; Zeng, D.; Qian, W. (2010). \"Services in the Cloud Computing era: A survey\". Proceedings from the 4th International Universal Communication Symposium: 40\u201346. doi:10.1109\/IUCS.2010.5666772.   \n\n\u2191 Jing, X.; Jian-Jun, Z. (2010). \"A Brief Survey on the Security Model of Cloud Computing\". Proceedings from the Ninth International Symposium on Distributed Computing and Applications to Business, Engineering and Science: 475\u20138. doi:10.1109\/DCABES.2010.103.   \n\n\u2191 Mell, P.; Grance, T. (October 2009). \"The NIST Definition of Cloud Computing\" (PDF). https:\/\/www.nist.gov\/sites\/default\/files\/documents\/itl\/cloud\/cloud-def-v15.pdf .   \n\n\u2191 9.0 9.1 Marsh, S.P. (April 1994). \"Formalising Trust as a Computational Concept\" (PDF). University of Stirling. http:\/\/stephenmarsh.wdfiles.com\/local--files\/start\/TrustThesis.pdf .   \n\n\u2191 Gambetta, D. (1990). \"Can We Trust Trust?\". In Gambetta, D. Trust: Making and Breaking Cooperative Relations (PDF, 2008 scanned digital copy). ISBN 0631155066. https:\/\/www.nuffield.ox.ac.uk\/media\/1779\/gambetta-trust_making-and-breaking-cooperative-relations.pdf .   \n\n\u2191 J\u00f8sang, A.; Knapskog, S.J. (1998). \"A Metric for Trusted Systems\" (PDF). Proceedings from the 21st National Information Systems Security Conference. https:\/\/csrc.nist.gov\/csrc\/media\/publications\/conference-paper\/1998\/10\/08\/proceedings-of-the-21st-nissc-1998\/documents\/papera2.pdf .   \n\n\u2191 Victor, P.; De Cock, M.; Cornelis, C. (2011). \"Trust and Recommendations\". In Ricci, F.; Rokach, L.; Shapira, B.; Kantor, P.B.. Recommender Systems Handbook. Springer. pp. 
645\u201375. ISBN 9780387858197.   \n\n\u2191 Adnane, A.; Bidan, C.; de Sousa J\u00fanior, R.T. (2013). \"Trust-based security for the OLSR routing protocol\". Computer Communications 36 (10\u201311): 1159\u201371. doi:10.1016\/j.comcom.2013.04.003.   \n\n\u2191 De Sousa Jr., R.T.; Puttini, R.S. (2010). \"Trust Management in Ad Hoc Networks\". In Yan, Z.. Trust Modeling and Management in Digital Environments: From Social Concept to System Development. IGI Global. pp. 224\u201349. ISBN 9781615206827.   \n\n\u2191 15.0 15.1 15.2 Yahalom, R.; Klein, B.; Beth, T. (1993). \"Trust relationships in secure systems-a distributed authentication perspective\". Proceedings from the 1993 IEEE Computer Society Symposium on Research in Security and Privacy: 150\u201364. doi:10.1109\/RISP.1993.287635.   \n\n\u2191 Grandison, T.; Sloman, M. (2000). \"A survey of trust in internet applications\". IEEE Communications Surveys & Tutorials 3 (4): 2\u201316. doi:10.1109\/COMST.2000.5340804.   \n\n\u2191 Bellare, M.; Boldyreva, A.; Micali, S. (2000). \"Public-Key Encryption in a Multi-user Setting: Security Proofs and Improvements\". Proceedings from Advances in Cryptology \u2014 EUROCRYPT 2000: 259\u201374.   \n\n\u2191 Bose, R. (2008). Information Theory, Coding and Cryptography (2nd ed.). McGraw Hill Education. pp. 297\u20138. ISBN 9780070669017.   \n\n\u2191 Rivest, R. (April 1992). \"The MD5 Message-Digest Algorithm\". ietf.org. https:\/\/tools.ietf.org\/html\/rfc1321 . Retrieved 25 June 2016 .   \n\n\u2191 Dworkin, M.J. (4 August 2015). \"SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions\". NIST. https:\/\/www.nist.gov\/publications\/sha-3-standard-permutation-based-hash-and-extendable-output-functions .   \n\n\u2191 21.0 21.1 21.2 Aumasson, J.-P.; Neves, S.; Wilcox-O'Hearn, Z.; Winnerlein, C. (2013). \"BLAKE2: Simpler, Smaller, Fast as MD5\". 
Proceedings from the 2013 International Conference on Applied Cryptography and Network Security: 119\u201335. doi:10.1007\/978-3-642-38980-1_8.   \n\n\u2191 Tahta, U.E.; Sen, S.; Can, A.B. (2015). \"GenTrust: A genetic trust management model for peer-to-peer systems\". Applied Soft Computing 34: 693\u2013704. doi:10.1016\/j.asoc.2015.04.053.   \n\n\u2191 Gholami, A.; Arani, M.G. (2015). \"A Trust Model Based on Quality of Service in Cloud Computing Environment\". International Journal of Database Theory and Application 8 (5): 161\u201370. https:\/\/pdfs.semanticscholar.org\/487e\/11b3605276b5ff66de363d4e735bcdd740c3.pdf?_ga=2.219348827.37751313.1553622532-1472248397.1551840079 .   \n\n\u2191 Canedo, E.D. (30 January 2013). \"Modelo de confian\u00e7a para a troca de arquivos em uma nuvem privada - Tese (Doutorado em Engenharia El\u00e9trica)\". Universidade de Bras\u00edlia. http:\/\/repositorio.unb.br\/handle\/10482\/11987 .   \n\n\u2191 25.0 25.1 25.2 Juels, A.; Kaliski, Jr., B.S. (2007). \"PORs: Proofs of retrievability for large files\". Proceedings of the 14th ACM Conference on Computer and Communications Security: 584\u201397. doi:10.1145\/1315245.1315317.   \n\n\u2191 26.0 26.1 Kumar, R.S.; Saxena, A. (2011). \"Data integrity proofs in cloud storage\". Proceedings of the Third International Conference on Communication Systems and Networks: 1\u20134. doi:10.1109\/COMSNETS.2011.5716422.   \n\n\u2191 27.0 27.1 George, R.S.; Sabitha, S. (2013). \"Data anonymization and integrity checking in cloud computing\". Proceedings of the Fourth International Conference on Computing, Communications and Networking Technologies: 1\u20135. doi:10.1109\/ICCCNT.2013.6726813.   \n\n\u2191 28.0 28.1 Kavuri, S.K.S.V.A.; Kancherla, G.R.; Bobba, B.R. (2014). \"Data authentication and integrity verification techniques for trusted\/untrusted cloud servers\". Proceedings of the 2014 International Conference on Advances in Computing, Communications and Informatics: 2590-2596. 
doi:10.1109\/ICACCI.2014.6968657.   \n\n\u2191 29.0 29.1 Al-Jaberi, M.F.; Zainal, A. (2014). \"Data integrity and privacy model in cloud computing\". Proceedings of the 2014 International Symposium on Biometrics and Security Technologies: 280\u2013284. doi:10.1109\/ISBAST.2014.7013135.   \n\n\u2191 30.0 30.1 Kai, H.; Chuanhe, H.; Jinhai, W. et al. (2013). \"An Efficient Public Batch Auditing Protocol for Data Security in Multi-cloud Storage\". Proceedings of the 8th ChinaGrid Annual Conference: 51\u201356. doi:10.1109\/ChinaGrid.2013.13.   \n\n\u2191 31.0 31.1 Wang, Q.; Wang, C.; Li, J. et al. (2009). \"Enabling public verifiability and data dynamics for storage security in cloud computing\". Proceedings of the 14th European conference on Research in computer security: 355\u201370. doi:10.1007\/978-3-642-04444-1_22.   \n\n\u2191 32.0 32.1 Pflanzner, T.; Tornyai, R.; Kertesz, A. (2016). \"Towards Enabling Clouds for IoT: Interoperable Data Management Approaches by Multi-clouds\". In Mahmood, Z.. Connectivity Frameworks for Smart Devices. Springer. pp. 187\u2013207. doi:10.1007\/978-3-319-33124-9_8. ISBN 9783319331225.   \n\n\u2191 Gracia-Tinedo, R.; Artigas, M.S.; Moreno-Martinez, A. et al. (2013). \"Actively Measuring Personal Cloud Storage\". Proceedings of the 2013 IEEE Sixth International Conference on Cloud Computing: 301\u2013308. doi:10.1109\/CLOUD.2013.25.   \n\n\u2191 Pinheiro, A.; Canedo, E.D.; De Sousa Jr., R.T. et al. (2016). \"A Proposed Protocol for Periodic Monitoring of Cloud Storage Services Using Trust and Encryption\". Proceedings of the 2016 International Conference on Computational Science and Its Applications: 45\u201359. doi:10.1007\/978-3-319-42108-7_4.   \n\n\u2191 Pinheiro, A.; Canedo, E.D.; De Sousa Jr., R.T. et al. (2016). \"Trust-Oriented Protocol for Continuous Monitoring of Stored Files in Cloud\". Proceedings of the Eleventh International Conference on Software Engineering Advances: 295\u2013301. 
https:\/\/thinkmind.org\/index.php?view=article&articleid=icsea_2016_13_20_10164 .   \n\n\u2191 Jendrock, E.; Cervera-Navarro, R.; Evans, I. et al. (September 2014). \"Java Platform, Enterprise Edition: The Java EE Tutorial\". Java Documentation. https:\/\/docs.oracle.com\/javaee\/7\/tutorial\/ .   \n\n\u2191 \"GlassFish Server Open Source Edition, Release Notes, Release 4.1\" (PDF). Oracle. September 2014. https:\/\/javaee.github.io\/glassfish\/doc\/4.0\/release-notes.pdf . Retrieved 25 February 2018 .   \n\n\u2191 \"PostgreSQL: The World's Most Advanced Open Source Relational Database\". PostgreSQL Global Development Group. https:\/\/www.postgresql.org\/ . Retrieved 22 May 2016 .   \n\n\u2191 39.0 39.1 \"Java Platform, Standard Edition 7: API Specification\". Oracle Corporation. https:\/\/docs.oracle.com\/javase\/7\/docs\/api\/overview-summary.html . Retrieved 21 May 2016 .   \n\n\u2191 \"Advanced Encryption Standard (AES)\". NIST. November 2001. https:\/\/csrc.nist.gov\/publications\/detail\/fips\/197\/final .   \n\n\u2191 Bellare, M.; Kilian, J.; Rogaway, P. (1994). \"The Security of Cipher Block Chaining\". Proceedings of Advances in Cryptology \u2014 CRYPTO \u201994: 341-358. doi:10.1007\/3-540-48658-5_32.   \n\n\nNotes \nThis presentation is faithful to the original, with only a few minor changes to presentation, grammar, and punctuation. 
In some cases important information was missing from the references, and that information was added.\n\nSource: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\">https:\/\/www.limswiki.org\/index.php\/Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services<\/a>\n\nThis page was last modified on 27 March 2019, at 00:38. Content is available under a Creative Commons Attribution-ShareAlike 4.0 International License unless otherwise noted.\n","af5b38e70b68468e6df8188586e739da_html":"<body class=\"mediawiki ltr sitedir-ltr ns-206 ns-subject page-Journal_Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services skin-monobook action-view\">\n<div id=\"rdp-ebb-globalWrapper\">\n\t\t<div id=\"rdp-ebb-column-content\">\n\t\t\t<div id=\"rdp-ebb-content\" class=\"mw-body\" role=\"main\">\n\t\t\t\t<a id=\"rdp-ebb-top\"><\/a>\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t<h1 id=\"rdp-ebb-firstHeading\" class=\"firstHeading\" lang=\"en\">Journal:Security architecture and protocol for trust verifications regarding the integrity of files stored in cloud services<\/h1>\n\t\t\t\t\n\t\t\t\t<div id=\"rdp-ebb-bodyContent\" class=\"mw-body-content\">\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\n\t\t\t\t\t<!-- start content -->\n\t\t\t\t\t<div id=\"rdp-ebb-mw-content-text\" lang=\"en\" dir=\"ltr\" class=\"mw-content-ltr\">\n\n\n<h2><span class=\"mw-headline\" id=\"Abstract\">Abstract<\/span><\/h2>\n<p><a href=\"https:\/\/www.limswiki.org\/index.php\/Cloud_computing\" title=\"Cloud computing\" class=\"wiki-link\" data-key=\"fcfe5882eaa018d920cedb88398b604f\">Cloud computing<\/a> is considered an interesting paradigm due to its scalability, 
availability, and virtually unlimited storage capacity. However, it is challenging to organize a cloud storage service (CSS) that is safe from the client\u2019s point of view and to implement this CSS in public clouds, since it is not advisable to blindly consider this configuration fully trustworthy. Ideally, owners of large amounts of data should be able to entrust their data to the cloud for a long period of time, without the burden of keeping copies of the original data or of accessing the whole content to verify its preservation. Due to these requirements, <a href=\"https:\/\/www.limswiki.org\/index.php\/Data_integrity\" title=\"Data integrity\" class=\"wiki-link\" data-key=\"382a9bb77ee3e36bb3b37c79ed813167\">integrity<\/a>, availability, <a href=\"https:\/\/www.limswiki.org\/index.php\/Information_privacy\" title=\"Information privacy\" class=\"wiki-link\" data-key=\"185f6d9f874e48914b5789317408f782\">privacy<\/a>, and trust are still challenging issues for the adoption of cloud storage services, especially when losing or leaking <a href=\"https:\/\/www.limswiki.org\/index.php\/Information\" title=\"Information\" class=\"wiki-link\" data-key=\"6300a14d9c2776dcca0999b5ed940e7d\">information<\/a> can bring significant damage, be it legal or business-related. With such concerns in mind, this paper proposes an architecture for periodically monitoring both the information stored in the cloud and the service provider\u2019s behavior. The architecture operates with a proposed protocol based on trust and encryption concepts to ensure cloud data integrity without compromising confidentiality and without overloading storage services. 
Extensive tests and simulations of the proposed architecture and protocol validate their functional behavior and performance.\n<\/p><p><b>Keywords<\/b>: cloud computing; cloud data storage; proof of integrity; services monitoring; trust\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Introduction\">Introduction<\/span><\/h2>\n<p>Companies, institutions, and government agencies generate large amounts of digital information every day, such as documents, projects, and transaction records. For legal or business reasons, this information needs to remain <a href=\"https:\/\/www.limswiki.org\/index.php\/Data_retention\" title=\"Data retention\" class=\"wiki-link\" data-key=\"d77533b92d003d39cee958a82b62391a\">stored<\/a> for long periods of time.\n<\/p><p>Due to the popularization of cloud computing (CC), its cost reduction, and an ever-growing supply of cloud storage services (CSS), many companies are choosing these services to store their sensitive information. Cloud computing\u2019s advantages include scalability, availability, and virtually unlimited storage capacity. However, it is a challenge to build safe storage services, mainly when these services run in public cloud infrastructures and are managed by service providers under conditions that are not fully trustworthy.\n<\/p><p>Data owners often need to keep their stored data for a long time, though it is possible that they rarely will have to access it. Furthermore, some data could be stored in a CSS without its owner having to keep the original copy. 
However, in these situations, the storage service reliability must be considered, because even the best services sometimes fail<sup id=\"rdp-ebb-cite_ref-TandelAnImplem13_1-0\" class=\"reference\"><a href=\"#cite_note-TandelAnImplem13-1\">[1]<\/a><\/sup>, and since the loss of these data or their leakage can bring significant business or legal damage, the issues of integrity, availability, privacy, and trust need to be addressed before adopting a CSS.\n<\/p><p>Data integrity is defined as the accuracy and consistency of stored data. These two properties indicate that the data have not been altered or corrupted.<sup id=\"rdp-ebb-cite_ref-DabasARecap14_2-0\" class=\"reference\"><a href=\"#cite_note-DabasARecap14-2\">[2]<\/a><\/sup> Moreover, besides data integrity, a considerable number of organizations consider both confidentiality and privacy requirements as the main obstacles to the acceptance of public cloud services.<sup id=\"rdp-ebb-cite_ref-DabasARecap14_2-1\" class=\"reference\"><a href=\"#cite_note-DabasARecap14-2\">[2]<\/a><\/sup> Hence, to fulfill these requirements, a CSS should provide mechanisms to confirm data integrity, while still ensuring user privacy and data confidentiality.\n<\/p><p>Considering these requirements, this paper proposes an architecture for periodically monitoring both the information stored in the cloud infrastructure and the contracted storage service\u2019s behavior. The architecture is based on the operation of a proposed protocol that uses a third party and applies trust and encryption mechanisms to verify both the existence and the integrity of data stored in the cloud infrastructure without compromising these data\u2019s confidentiality. Furthermore, the protocol was designed to minimize the overhead that it imposes on the cloud storage service.\n<\/p><p>To validate the proposed architecture and its supporting protocol, a corresponding prototype was developed and implemented. 
This prototype was then subjected to tests and simulations through which we verified its functional characteristics and performance.\n<\/p><p>This paper addresses all of this and is structured as follows. The \"Background\" section reviews the concepts and definitions of cloud computing, encryption, and trust, and presents works related to data integrity in the cloud. Next, we describe the proposed architecture; its implementation is discussed in the following section. Afterwards, the \"Experimental validation\" section presents the experiments and their results, followed by a discussion of the main differences between related works and the proposed architecture. The paper ends with our conclusions and an outline of future work.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Background\">Background<\/span><\/h2>\n<p>Cloud computing (CC) is a model that allows convenient and on-demand network access to a shared set of configurable computational resources. These resources can be quickly provisioned with minimal management effort and without the service provider\u2019s intervention.<sup id=\"rdp-ebb-cite_ref-MellTheNIST11_3-0\" class=\"reference\"><a href=\"#cite_note-MellTheNIST11-3\">[3]<\/a><\/sup> Since it constitutes a flexible and reliable computing environment, CC is being gradually adopted in different business scenarios using several available supporting solutions.\n<\/p><p>Relying on different technologies (e.g., virtualization, utility computing, grid computing, and service-oriented architecture) and proposing a new computational services paradigm, CC requires high-level management activities, which include: (a) selection of the service provider, (b) selection of virtualization technology, (c) virtual resources\u2019 allocation, and (d) monitoring and auditing procedures to comply with service level agreements (SLAs).<sup id=\"rdp-ebb-cite_ref-MillerCloud08_4-0\" class=\"reference\"><a 
href=\"#cite_note-MillerCloud08-4\">[4]<\/a><\/sup>\n<\/p><p>A particular CC solution comprises several components such as client modules, <a href=\"https:\/\/www.limswiki.org\/index.php\/Data_center\" title=\"Data center\" class=\"wiki-link\" data-key=\"9f5f939040d714745138ee295af8be71\">data centers<\/a>, and distributed servers. These elements form the three parts of the cloud solution<sup id=\"rdp-ebb-cite_ref-MillerCloud08_4-1\" class=\"reference\"><a href=\"#cite_note-MillerCloud08-4\">[4]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-VelteCloud09_5-0\" class=\"reference\"><a href=\"#cite_note-VelteCloud09-5\">[5]<\/a><\/sup>, each one with a specific purpose and specific role in delivering working applications based on the cloud.\n<\/p><p>The CC architecture is basically structured into two main layers: a lower and a higher resource layer, each one dealing with a particular aspect of making application resources available. The lower layer comprises the physical infrastructure, and it is responsible for the virtualization of storage and computational resources. The higher layer provides specific services, such as <a href=\"https:\/\/www.limswiki.org\/index.php\/Software_as_a_service\" title=\"Software as a service\" class=\"wiki-link\" data-key=\"ae8c8a7cd5ee1a264f4f0bbd4a4caedd\">software as a service<\/a> (SaaS), platform as a service (PaaS), and infrastructure as a service (IaaS). 
Each of these layers may have its own management and monitoring systems, independent of one another, thus improving flexibility, reuse, and scalability.<sup id=\"rdp-ebb-cite_ref-ZhouServices10_6-0\" class=\"reference\"><a href=\"#cite_note-ZhouServices10-6\">[6]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-JingABrief10_7-0\" class=\"reference\"><a href=\"#cite_note-JingABrief10-7\">[7]<\/a><\/sup>\n<\/p><p>Since CC provides access to a shared pool of configurable computing resources, its provisioning mode can be classified by the intended access methods and coverage of services\u2019 availability, which yields different models of CC services\u2019 deployment, ranging from private clouds, in which resources are shared within an owner organization, to public clouds, in which cloud providers possess the resources that are consumed by other organizations based on contracts, but also including hybrid cloud environments and community clouds.<sup id=\"rdp-ebb-cite_ref-MellTheNIST09_8-0\" class=\"reference\"><a href=\"#cite_note-MellTheNIST09-8\">[8]<\/a><\/sup>\n<\/p><p>The central concept of this paper\u2019s proposal is the verification by the cloud service user that a particular property, in our case the integrity of files, is fulfilled by the cloud service provider, regardless of the mode of a service's provision and deployment, either in the form of private, public, or hybrid clouds.\n<\/p><p>The verification of file integrity is performed by means of a protocol that uses contemporaneous computational encryption, specifically public key encryption and hashes, which together provide authentication of messages and compact integrity verification sequences that are unequivocally bound to each verified file (signed file hashes).\n<\/p><p>This proposed protocol is conceived to allow the user of cloud services to check whether the services provider is indeed acting as expected in regard to maintaining the integrity of the user files, which corresponds to the idea of the user 
monitoring the provider to acquire and maintain trust in the provider behavior in this circumstance.\n<\/p><p>Some specific aspects of trust, encryption, and hashes that are considered as useful for this paper\u2019s comprehension are briefly reviewed in the subsections below.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Trust\">Trust<\/span><\/h3>\n<p>Trust is a common reasoning process for humans to face the world\u2019s complexities and to think sensibly about everyday life possibilities. Trust is strongly linked to expectations about something, which implies a degree of uncertainty and optimism. It is the choice of putting something in another\u2019s hands, considering the other\u2019s behavior to determine how to act in a given situation.<sup id=\"rdp-ebb-cite_ref-MarshFormal94_9-0\" class=\"reference\"><a href=\"#cite_note-MarshFormal94-9\">[9]<\/a><\/sup>\n<\/p><p>Trust can be considered as a particular level of subjective probability in which an agent believes that another agent will perform a certain action, which is subject to monitoring.<sup id=\"rdp-ebb-cite_ref-GambettaTrust08_10-0\" class=\"reference\"><a href=\"#cite_note-GambettaTrust08-10\">[10]<\/a><\/sup> Furthermore, trust can be represented as an opinion so that situations involving trust and trust relationships can be modeled. Thus, positive and negative feedback on a specific entity can be accumulated and used to calculate its future behavior.<sup id=\"rdp-ebb-cite_ref-J.C3.B8sangAMetric11_11-0\" class=\"reference\"><a href=\"#cite_note-J.C3.B8sangAMetric11-11\">[11]<\/a><\/sup> This opinion may result from direct experience or may come from a recommendation from another entity.<sup id=\"rdp-ebb-cite_ref-VictorTrust11_12-0\" class=\"reference\"><a href=\"#cite_note-VictorTrust11-12\">[12]<\/a><\/sup>\n<\/p><p>According to Adnane <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-AdnaneTrust13_13-0\" class=\"reference\"><a href=\"#cite_note-AdnaneTrust13-13\">[13]<\/a><\/sup> and De Sousa, Jr. 
and Puttini<sup id=\"rdp-ebb-cite_ref-DeSousaTrust10_14-0\" class=\"reference\"><a href=\"#cite_note-DeSousaTrust10-14\">[14]<\/a><\/sup>, trust, trust models, and trust management have been the subject of various research works demonstrating that the conceptualization of computational trust allows a computing entity to reason with and about trust, and to make decisions regarding other entities. Indeed, since the initial works on the subject by the likes of Marsh<sup id=\"rdp-ebb-cite_ref-MarshFormal94_9-1\" class=\"reference\"><a href=\"#cite_note-MarshFormal94-9\">[9]<\/a><\/sup> and Yahalom <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-YahalomTrust93_15-0\" class=\"reference\"><a href=\"#cite_note-YahalomTrust93-15\">[15]<\/a><\/sup>, computational trust is recognized as an important aspect for decision-making in distributed and auto-organized applications, and its expression allows formalizing and clarifying trust aspects in communication protocols.\n<\/p><p>Yahalom <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-YahalomTrust93_15-1\" class=\"reference\"><a href=\"#cite_note-YahalomTrust93-15\">[15]<\/a><\/sup>, for instance, find the notion of \"trust\" to mean that if an entity A trusts an entity B in some respect, this means that A believes that B will behave in a certain way and will perform some action under certain specific circumstances. This leads to the possibility of conducting a protocol operation (action) that is evaluated by the entity A on the basis of what A knows about the entity B and the circumstances of the operation. This accurately corresponds to the protocol relationship established between a CC service consumer and a CC service provider, which is the focus of the present paper.\n<\/p><p>Thus, in our proposal, trust is used in the context of a cloud computing service as a means to verify specific actions performed by the participating entities in this context. 
Using the definitions by Yahalom <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-YahalomTrust93_15-2\" class=\"reference\"><a href=\"#cite_note-YahalomTrust93-15\">[15]<\/a><\/sup> and Grandison and Sloman<sup id=\"rdp-ebb-cite_ref-GrandisonASurv00_16-0\" class=\"reference\"><a href=\"#cite_note-GrandisonASurv00-16\">[16]<\/a><\/sup>, we can state that in a CC service, one entity, the CC service consumer, may trust another one, the CC service provider, for actions such as providing identification to the other entity and not interfering in the other entity\u2019s sessions, whether passively by reading secret messages or actively by impersonating other parties. Furthermore, the CC service provider will grant access to resources or services, as well as make decisions on behalf of the other entity, with respect to a resource or service that this entity owns or controls.\n<\/p><p>These trust verifications must ensure properties such as the secrecy and integrity of stored files, the authentication of message sources, and the freshness of the presented proofs, avoiding proof replays. They must also impose only reduced overhead on cloud computing protocol operations and services. In our proposal, these requirements are fulfilled with modern robust public key encryption involving hashes, as discussed hereafter, considering that these means are adequately and easily deployed in current CC service provider and consumer situations.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Encryption\">Encryption<\/span><\/h3>\n<p>Encryption is a process of converting (or ciphering) a plaintext message into a ciphertext that can be deciphered back to the original message. An encryption algorithm, along with one or more keys, is used either in the encryption or the decryption operation.\n<\/p><p>The number, type, and length of the keys used depend on the encryption algorithm, the choice of which is a consequence of the security level needed. 
In conventional symmetric encryption, a single key is used, and with this key the sender can encrypt a message, and a recipient can decrypt the ciphered message. However, key security becomes an issue since at least two copies of the key exist, one at the sender and another at the recipient.\n<\/p><p>In contrast, in asymmetric encryption, the encryption key and the decryption key are correlated but different: one is a public key of the recipient that can be used by the sender to encrypt the message, while the other is the recipient\u2019s private key, allowing the recipient to decrypt the message.<sup id=\"rdp-ebb-cite_ref-BellarePublic00_17-0\" class=\"reference\"><a href=\"#cite_note-BellarePublic00-17\">[17]<\/a><\/sup> The private key can be used by its owner to send messages that are considered signed by the owner, since every entity can use the corresponding public key to verify whether a message comes from the owner of the private key.\n<\/p><p>These properties of asymmetric encryption are useful for the trust verifications that in our proposal are designed for checking the integrity of files stored in cloud services. Indeed, our proposal uses encryption of hashes as the principal means to fulfill the trust requirements in these operations.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Hashes\">Hashes<\/span><\/h3>\n<p>A hash value, hash code, or simply hash is the result of applying a mathematical one-way function that takes a string of any size as the data source and returns a relatively small and fixed-length string. A modification of any bit in the source string dramatically alters the resulting hash code after executing the hash function.<sup id=\"rdp-ebb-cite_ref-BoseInfo08_18-0\" class=\"reference\"><a href=\"#cite_note-BoseInfo08-18\">[18]<\/a><\/sup> These one-way functions are designed to make it very difficult to deduce from a hash value the source string that was used to calculate this hash. 
Furthermore, it should be extremely difficult to find two source strings whose hash codes are the same, i.e., a hash collision.\n<\/p><p>Over the years, many cryptographic hash algorithms have been developed, among which the Message-Digest algorithm 5 (MD5) and the Secure Hash Algorithm (SHA) family can be highlighted, due to their wide use in the most diverse information security software packages. MD5 is a very fast cryptographic algorithm that receives as input a message of arbitrary size and produces as output a fixed-length hash of 128 bits.<sup id=\"rdp-ebb-cite_ref-RivestTheMD5_92_19-0\" class=\"reference\"><a href=\"#cite_note-RivestTheMD5_92-19\">[19]<\/a><\/sup>\n<\/p><p>The SHA family is composed of algorithms named SHA-1, SHA-256, and SHA-512, which differ in their security levels and output hash lengths, the latter varying from 160 to 512 bits. The SHA-3 algorithm was chosen by the National Institute of Standards and Technology (NIST) in an international competition that aimed to replace all of the SHA family of algorithms.<sup id=\"rdp-ebb-cite_ref-DworkinSHA-3_15_20-0\" class=\"reference\"><a href=\"#cite_note-DworkinSHA-3_15-20\">[20]<\/a><\/sup>\n<\/p><p>The Blake2 algorithm is an improved version of the cryptographic hash algorithm called \u201cBlake,\u201d a finalist of the SHA-3 selection competition that is optimized for software applications. Blake2 can generate hash values from eight to 512 bits. 
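<\/p><p>As a brief illustration of these variable digest sizes, the following Python sketch uses the standard library\u2019s <code>hashlib<\/code> implementation of Blake2b to compute digests of different lengths and to show how a minimal change in the source string alters the whole hash (the sample input strings are arbitrary):<\/p>

```python
import hashlib

data = b"TOPMCloud stored data block"

# Blake2b digests can be sized from 1 to 64 bytes (8 to 512 bits).
for size_bytes in (1, 32, 64):
    digest = hashlib.blake2b(data, digest_size=size_bytes).hexdigest()
    print(size_bytes * 8, digest)

# Changing a single character yields a completely different hash value.
h1 = hashlib.blake2b(b"TOPMCloud stored data block", digest_size=32).hexdigest()
h2 = hashlib.blake2b(b"TOPMCloud stored data blocK", digest_size=32).hexdigest()
print(h1 != h2)  # True
```

<p>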
The main Blake2 characteristics are: memory consumption reduced by 32% compared to other SHA algorithms, processing speed greater than that of MD5 on 64-bit platforms, direct parallelism support without overhead, and faster hash generation on multicore processors.<sup id=\"rdp-ebb-cite_ref-AumassonBLAKE2_13_21-0\" class=\"reference\"><a href=\"#cite_note-AumassonBLAKE2_13-21\">[21]<\/a><\/sup> In our proposed validation prototype, the Blake2 algorithm was considered a good choice due to its combined characteristics of speed, security, and simplicity.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Related_work\">Related work<\/span><\/h2>\n<p>This section presents a brief review of papers regarding the themes of computational trust applications, privacy guarantees, data integrity verification, services management, and monitoring, all of them applicable to cloud computing environments.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Computational_trust_applications\">Computational trust applications<\/span><\/h3>\n<p>Depending on the approach used, trust can either be directly measured by one entity based on its own experiences or be evaluated through the use of third-party opinions and recommendations.\n<\/p><p>Tahta <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-TahtaGenTrust15_22-0\" class=\"reference\"><a href=\"#cite_note-TahtaGenTrust15-22\">[22]<\/a><\/sup> propose a trust model for peer-to-peer (P2P) systems called \u201cGenTrust,\u201d in which genetic algorithms are used to recognize several types of attacks and to help a well-behaved node find other trusted nodes. GenTrust uses extracted features (number of interactions, number of successful interactions, the average size of downloaded files, the average time between two interactions, etc.) that result from a node\u2019s own interactions. However, when there is not enough information for a node to consider, recommendations from other nodes are used. 
Then, the genetic algorithm selects which characteristics, when evaluated together and in a given context, present the best result to identify the most trustful nodes.\n<\/p><p>Another approach is presented by Gholami and Arani<sup id=\"rdp-ebb-cite_ref-GholamiATrust15_23-0\" class=\"reference\"><a href=\"#cite_note-GholamiATrust15-23\">[23]<\/a><\/sup>, proposing a trust model named \u201cTurnaround_Trust\u201d aimed at helping clients to find cloud services that can serve them based on service quality requirements. The Turnaround_Trust model considers service quality criteria such as cost, response time, bandwidth, and processor speed, to select the most trustful service among those available in the cloud.\n<\/p><p>Our approach in this paper differs from these related works since we use trust metrics that are directly related to the stored files in CC and that are paired with the cryptographic proof of these files' integrity.\n<\/p><p>Canedo<sup id=\"rdp-ebb-cite_ref-CanedoModelo13_24-0\" class=\"reference\"><a href=\"#cite_note-CanedoModelo13-24\">[24]<\/a><\/sup> bases the proposed trust model on concepts such as direct trust, trust recommendation, indirect trust, situational trust, and reputation to allow node selection for trustful file exchange in a private cloud. For the sake of trust calculation, the processing capacity of a node, its storage capacity, and operating system\u2014as well as the link capacity\u2014are adopted as trust metrics that compose a set representative of the node availability. Concerning reputation, the calculation considers the satisfactory and unsatisfactory experiences with the referred node informed by other nodes. 
The proposed model calculates trust and reputation scores for a node based on previously-collected information, i.e., either information requested from other nodes in the network or information that is directly collected from interactions with the node being evaluated.\n<\/p><p>In the present paper, our approach is applied to both private and public CC services, with the development of the necessary architecture and secure protocol for trust verification regarding the integrity of files in CC services.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Integrity_verification_and_privacy_guarantee\">Integrity verification and privacy guarantee<\/span><\/h3>\n<p>In their effort to guarantee the integrity of data stored in cloud services, many research works present proposals in the domain analyzed in this paper.\n<\/p><p>A protocol is proposed by Juels and Kaliski, Jr.<sup id=\"rdp-ebb-cite_ref-JuelsPors07_25-0\" class=\"reference\"><a href=\"#cite_note-JuelsPors07-25\">[25]<\/a><\/sup> to enable a cloud storage service to prove that a file subjected to verification is not corrupted. To that end, a formal and secure definition of proof of retrievability is presented, and the paper introduces the use of sentinels, which are special blocks hidden in the original file prior to encryption to be afterward used to challenge the cloud service. Based on Juels and Kaliski, Jr.'s work<sup id=\"rdp-ebb-cite_ref-JuelsPors07_25-1\" class=\"reference\"><a href=\"#cite_note-JuelsPors07-25\">[25]<\/a><\/sup>, Kumar and Saxena<sup id=\"rdp-ebb-cite_ref-KumarData11_26-0\" class=\"reference\"><a href=\"#cite_note-KumarData11-26\">[26]<\/a><\/sup> present another scheme where one does not need to encrypt all the data, but only a few bits per data block.\n<\/p><p>George and Sabitha<sup id=\"rdp-ebb-cite_ref-GeorgeData13_27-0\" class=\"reference\"><a href=\"#cite_note-GeorgeData13-27\">[27]<\/a><\/sup> propose a bipartite solution to improve privacy and integrity. 
The first part, called \u201canonymization,\u201d initially recognizes fields in records that could identify their owners and then uses techniques such as generalization, suppression, obfuscation, and the addition of anonymous records to enhance data privacy. The second part, called \u201cintegrity checking,\u201d uses public and private key encryption techniques to generate a tag for each record on a table. Both parts are executed with the help of a trusted third party called the \u201cenclave\u201d that saves all generated data that will be used by the de-anonymization and integrity verification processes.\n<\/p><p>An encryption-based integrity verification method is proposed by Kavuri <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-KavuriData14_28-0\" class=\"reference\"><a href=\"#cite_note-KavuriData14-28\">[28]<\/a><\/sup> The proposed method uses a new hash algorithm, the dynamic user policy-based hash algorithm, to calculate hashes of data for each authorized cloud user. For data encryption, an improved attribute-based encryption algorithm is used. The encrypted data and corresponding hash value are saved separately in cloud storage. Data integrity can be verified only by an authorized user and requires the retrieval of all the encrypted data and the corresponding hash.\n<\/p><p>Al-Jaberi and Zainal<sup id=\"rdp-ebb-cite_ref-Al-JaberiData14_29-0\" class=\"reference\"><a href=\"#cite_note-Al-JaberiData14-29\">[29]<\/a><\/sup> provide another proposal to simultaneously achieve data integrity verification and privacy preservation, which proposes the use of two encryption algorithms for every data upload or download transaction. The Advanced Encryption Standard (AES) algorithm is used to encrypt client data, which will be saved in a CSS, and an RSA-based partial homomorphic encryption technique is used to encrypt the AES encryption keys that will be saved in a third-party entity together with a hash of the file. 
Data integrity is verified only when a client downloads one file.\n<\/p><p>Kai <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-KaiAnEffic13_30-0\" class=\"reference\"><a href=\"#cite_note-KaiAnEffic13-30\">[30]<\/a><\/sup> propose a data integrity auditing protocol to allow the fast identification of corrupted data using homomorphic cipher-text verification and a recoverable coding methodology. Checking the integrity of outsourced data is done periodically by either a trusted or untrusted entity. The adopted methodology aims at reducing the total auditing time and the communication cost.\n<\/p><p>The work of Wang <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-WangEnabling09_31-0\" class=\"reference\"><a href=\"#cite_note-WangEnabling09-31\">[31]<\/a><\/sup> presents a security model for public verification and assurance of stored file correctness that supports dynamic data operation. The model guarantees that no challenged file blocks should be retrieved by the verifier during the verification process, and no state information should be stored at the verifier side between audits. A Merkle hash tree (MHT) is used to save the authentic data value hashes, and both the values and positions of data blocks are authenticated by the verifier.\n<\/p><p>Our proposal in this paper differs from these described proposals since we introduce the idea of trust resulting from file integrity verification as an aggregate concept to evaluate the long-term behavior of a CSS, while including most of the requirements specified in these other proposals, such as hashes of file blocks, freshness of verifications, and integrated support for auditing by an independent party. 
Further discussion on these differences is presented in the \"Discussion\" section based on the results coming from the validation of our proposal.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Management_and_monitoring_of_CSS\">Management and monitoring of CSS<\/span><\/h3>\n<p>Some other research works were reviewed since their purpose is to provide management tools to ensure better use of the services offered by CSS providers, as well as monitoring functions regarding the quality of these services, thus allowing one to generate a ranking of these providers.\n<\/p><p>Pflanzner <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-PflanznerTowards16_32-0\" class=\"reference\"><a href=\"#cite_note-PflanznerTowards16-32\">[32]<\/a><\/sup> present an approach to autonomous data management within CSS. This approach proposes a high-level service that helps users to better manage data distributed in multiple CSS. The proposed solution is a framework consisting of three components, named MeasureTool, DistributeTool, and CollectTool, which are respectively responsible for measuring CSS performance, for splitting files and distributing their chunks between different CSS, and for retrieving the split parts of a required file. Both historical performance and latest performance values are used for CSS selection and to define the number of file chunks that will be stored in each CSS.\n<\/p><p>Furthermore, they propose the use of cloud infrastructure services to execute applications on mobile data stored in CSS.<sup id=\"rdp-ebb-cite_ref-PflanznerTowards16_32-1\" class=\"reference\"><a href=\"#cite_note-PflanznerTowards16-32\">[32]<\/a><\/sup> In this proposal, the services for data management are run in one or more IaaS systems that keep track of the user storage area in CSS and execute the data manipulation processes when new files appear. 
The service running on an IaaS cloud downloads the user data files from the CSS, executes the necessary application on these files, and uploads the modified data to the CSS. This approach permits overcoming the computing capacity limitations of mobile devices.\n<\/p><p>The quality of services (QoS) provided by some commercial CSS is analyzed by Gracia-Tinedo <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-Gracia-TinedoActively13_33-0\" class=\"reference\"><a href=\"#cite_note-Gracia-TinedoActively13-33\">[33]<\/a><\/sup> For this, a measurement study is presented where important aspects such as transfer speed (upload\/download), behavior according to client geographic location, failure rate, and service variability related to file size, time, and account load are broadly explored. To perform the measurement, two platforms are employed, one with homogeneous and dedicated machines and the other with shared and heterogeneous machines distributed in different geographic locations. Furthermore, the measurement is executed using each CSS\u2019s own REST interfaces, mainly the PUT and GET methods, respectively used to upload and download files. The applied measurement methodology is demonstrated to be efficient and permits one to learn important characteristics about the analyzed CSS.\n<\/p><p>Our contributions in this paper comprise the periodic monitoring of files stored in the cloud, performed by an integrity checking service that is defined as an abstract role so that it can operate independently of both the CSS provider and its consumer, preserving the privacy of stored file contents, and operating according to a new verification protocol. 
Both the tripartite architecture and the proposed protocol are described hereafter in this paper.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Proposed_architecture_and_protocol\">Proposed architecture and protocol<\/span><\/h2>\n<p>This section presents the proposed architecture that defines roles that work together to enable periodic monitoring of files stored in the cloud. Furthermore, the companion protocol that regulates how these roles interact with one another is detailed and discussed.\n<\/p><p>The architecture is composed of three roles: (i) Client, (ii) Cloud Storage Service (CSS), and (iii) Integrity Check Service (ICS). The Client represents the owner of files that will be stored by the cloud provider and is responsible for generating the needed information that is stored specifically for the purpose of file integrity monitoring. The CSS role represents the entity responsible for receiving and storing the client\u2019s files, as well as receiving and responding to challenges regarding file integrity that come from the ICS role. 
The ICS interfaces with both the Client and the CSS: it is the role responsible for the information regarding the Client files that are stored by the CSS, and it uses this information to constantly monitor the Client files\u2019 integrity by submitting challenges to the CSS and later validating the responses of the CSS to each verification challenge.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"The_proposed_protocol\">The proposed protocol<\/span><\/h3>\n<p>The trust-oriented protocol for continuous monitoring of stored files in the cloud (TOPMCloud) was initially proposed by Pinheiro <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-PinheiroAProposed16_34-0\" class=\"reference\"><a href=\"#cite_note-PinheiroAProposed16-34\">[34]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-PinheiroTrustOriented16_35-0\" class=\"reference\"><a href=\"#cite_note-PinheiroTrustOriented16-35\">[35]<\/a><\/sup> It was then further developed and tested, leading to the results presented in this paper.\n<\/p><p>The objective of TOPMCloud is to enable clients to use an outsourced service to constantly monitor the integrity of their files stored in a CSS without having to keep original file copies or reveal the contents of these files.\n<\/p><p>From another point of view, the primary requirement for the proposed TOPMCloud is to prevent the CSS provider from offering to and charging a client for a storage service that in practice is not being provided. Complementary requirements comprise low bandwidth consumption, minimal CSS overloading, rapid identification of a misbehaving service, strong defenses against fraud, stored data confidentiality, and utmost predictability for the ICS.\n<\/p><p>To respond to the specified requirements, TOPMCloud is designed with two distinct and correlated execution processes that are shown together in Figure 1. The first one is called the \u201cFile Storage Process\u201d and runs on demand from the Client, which is the entity that starts this process. 
The second is the \u201cVerification Process,\u201d which is instantiated by an ICS and is continuously executed to verify a CSS. An ICS can simultaneously verify more than one CSS by means of parallel instances of the Verification Process.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig1_Pinheiro_Sensors2018_18-3.png\" class=\"image wiki-link\" data-key=\"069dd2b0a96c9bae84476cbcd57c2207\"><img alt=\"Fig1 Pinheiro Sensors2018 18-3.png\" src=\"https:\/\/www.limswiki.org\/images\/c\/c1\/Fig1_Pinheiro_Sensors2018_18-3.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 1<\/b> Trust-oriented protocol for continuous monitoring of stored files in the cloud (TOPMCloud) processes<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>The File Storage Process starts in the Client with the encryption of the file to be stored in the CSS. This first step, which is performed under the control of the file owner, is followed by the division of the encrypted file into 4096 chunks. These chunks are randomly permuted and are selected to be grouped into data blocks, each one with 16 distinct file chunks, and the position or address of each chunk is memorized. Then, hashes are generated from these data blocks. Each hash together with the set of its respective chunk addresses are used to build a data structure named the Information Table, which is sent to the ICS.\n<\/p><p>The selection and distribution of chunks used to assemble the data blocks are done in cycles. The number of cycles will vary according to the file storage period. Each cycle generates 256 data blocks without repeating chunks. 
The data blocks generated in each cycle contain all of the chunks of the encrypted file (256 * 16 = 4096).\n<\/p><p>The chosen values 4096, 16, and 256 come from a compromise involving the analysis of the protocol in the next subsections and the experimental evaluation that is presented in the \"Experimental validation\" section of this paper. Therefore, these values represent choices that were made considering the freshness of the information regarding the trust credited to a CSS, the time for the whole architecture to react to file storage faults, the required number of verifications to hold the trust in a CSS for a certain period of time, as well as the expected performance and the optimization of computational resources and network capacity consumption. The chosen values are indeed parameters in our prototype code, so they can evolve if the protocol requirements change.\n<\/p><p>The Verification Process in the ICS starts with the computation of how many files should be verified and how many challenges should be sent to a CSS, both numbers being calculated according to the trust level assigned to the CSS. Each stored hash and its corresponding chunk addresses will be used only once by the ICS to send an integrity verification challenge to the CSS provider.\n<\/p><p>In the CSS, the stored file will be used to respond to the challenges coming from the ICS. On receiving a challenge with a set of chunk addresses, the CSS reads the chunks from the stored file, assembles the data block, generates a hash from this data block, and sends this hash as the challenge answer to the ICS.\n<\/p><p>To finalize the verification by the ICS, the hash coming in the challenge answer is compared to the original file hash, and the result activates the trust level classification process. 
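<\/p><p>A minimal Python sketch of these two processes is shown below, assuming fixed-size chunks, a seeded random permutation, and 256-bit Blake2b hashes; the function names and the stand-in file are illustrative, not part of the TOPMCloud specification:<\/p>

```python
import hashlib
import random

CHUNKS = 4096            # chunks per encrypted file
CHUNKS_PER_BLOCK = 16    # distinct chunks grouped into one data block
BLOCKS_PER_CYCLE = 256   # 256 * 16 = 4096, so one cycle covers every chunk

def split_chunks(data: bytes):
    # Fixed-size chunks for simplicity; padding of odd-sized files is ignored here.
    size = len(data) // CHUNKS
    return [data[i * size:(i + 1) * size] for i in range(CHUNKS)]

def build_information_table(encrypted_file: bytes, seed: int = 0):
    # Client side: permute chunk addresses, group them into data blocks,
    # and record each block's hash together with its chunk addresses.
    chunks = split_chunks(encrypted_file)
    addresses = list(range(CHUNKS))
    random.Random(seed).shuffle(addresses)
    table = []
    for b in range(BLOCKS_PER_CYCLE):
        addrs = addresses[b * CHUNKS_PER_BLOCK:(b + 1) * CHUNKS_PER_BLOCK]
        block = b"".join(chunks[a] for a in addrs)
        table.append((hashlib.blake2b(block, digest_size=32).hexdigest(), addrs))
    return table

def answer_challenge(stored_file: bytes, addrs):
    # CSS side: reassemble the challenged data block from the stored file
    # and return its hash as the challenge answer.
    chunks = split_chunks(stored_file)
    block = b"".join(chunks[a] for a in addrs)
    return hashlib.blake2b(block, digest_size=32).hexdigest()

# ICS side: send a challenge (a set of chunk addresses) and compare the answer
# with the hash kept in the Information Table.
original = bytes((i * 31) % 256 for i in range(CHUNKS * 64))  # stand-in encrypted file
table = build_information_table(original)
expected_hash, addrs = next((h, a) for h, a in table if 0 in a)
print(answer_challenge(original, addrs) == expected_hash)     # True: file intact

corrupted = bytes([original[0] ^ 0xFF]) + original[1:]        # flip one byte in chunk 0
print(answer_challenge(corrupted, addrs) == expected_hash)    # False: corruption detected
```

<p>Each Information Table entry is used only once by the ICS, so a misbehaving CSS cannot simply replay an earlier answer, and the stored hashes are kept by the ICS rather than by the CSS, which only ever receives chunk addresses.<\/p>
<p>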
For this process, if the compared hashes are equal, this means that the verified content chunks are intact in the stored file in the CSS.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Trust_level_classification_process\">Trust level classification process<\/span><\/h3>\n<p>The trust level is evaluated as a real value in the range (\u22121, +1), with values from \u22121, meaning the most untrustful, to +1, meaning the most trustful, thus constituting the classification level that is attributed by the ICS to the CSS provider.\n<\/p><p>In the ICS, whenever a file hash verification process fails, the trust level of the verified CSS is downgraded, according to the following rules: when the current trust level value is greater than zero, it is set to zero (the ICS reacts quickly to a misbehavior from a CSS that was considered up to the moment as trustful); when the trust value is in the range between zero and \u22120.5, it is reduced by 15%; otherwise, the ICS calculates the value of 2.5% from the difference between the current trust level value and \u22121, and the result is subtracted from the trust level value (the ICS continuously downgrades a CSS that is still considered untrustful). These calculations are shown in Algorithm 1.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Alg1_Pinheiro_Sensors2018_18-3.png\" class=\"image wiki-link\" data-key=\"86938169c90d555ba028042cf0632122\"><img alt=\"Alg1 Pinheiro Sensors2018 18-3.png\" src=\"https:\/\/www.limswiki.org\/images\/1\/16\/Alg1_Pinheiro_Sensors2018_18-3.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Alg. 
1<\/b> Pseudocode for computing the TrustLevel in the case of hash verification failures<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>Conversely, whenever a checking cycle is completed without failures (all data blocks of a file have been checked without errors), the trust level assigned to a CSS is raised. If the current trust level value is less than 0.5, then the trust level value is raised by 2.5%. Otherwise, the ICS calculates the value of 0.5% from the difference between one and the current trust level value, and the result is added to the trust level value. These calculations are shown in Algorithm 2. This means that initially we softly redeem an untrustful CSS, while we exponentially upgrade a redeemed CSS and a still trustful CSS.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Alg2_Pinheiro_Sensors2018_18-3.png\" class=\"image wiki-link\" data-key=\"15bd1e471915a5d917c674452fa43267\"><img alt=\"Alg2 Pinheiro Sensors2018 18-3.png\" src=\"https:\/\/www.limswiki.org\/images\/3\/38\/Alg2_Pinheiro_Sensors2018_18-3.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Alg. 2<\/b> Pseudocode for computing the TrustLevel in the case of checking cycles completed without failures<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>Again, these chosen thresholds and downgrading\/upgrading values come from the experimental evaluation that is presented in the \"Experimental validation\" section, based on performance and applicability criteria. 
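<\/p><p>The downgrading and upgrading rules of Algorithms 1 and 2 can be sketched in Python as follows. Since the percentage rules above admit more than one reading, the update formulas below are one plausible interpretation; the zero-value handling follows the rule given later in the freshness discussion, and all names are illustrative:<\/p>

```python
def downgrade(trust: float) -> float:
    # One reading of Algorithm 1: applied after a failed hash verification.
    if trust > 0:
        return 0.0                        # react quickly to a misbehaving trusted CSS
    if trust >= -0.5:
        return trust * 1.15               # "reduced by 15%": pushed closer to -1
    return trust - 0.025 * (trust + 1)    # subtract 2.5% of the distance to -1

def upgrade(trust: float) -> float:
    # One reading of Algorithm 2: applied after a checking cycle with no failures.
    if trust < 0.5:
        return trust + 0.025 * abs(trust)  # "raised by 2.5%": soft redemption
    return trust + 0.005 * (1 - trust)     # add 0.5% of the distance to +1

def resolve_zero(trust: float, last_check_ok: bool) -> float:
    # A trust value of exactly zero is replaced by a fixed value that
    # depends on the last verification result.
    if trust == 0.0:
        return 0.1 if last_check_ok else -0.1
    return trust

print(downgrade(0.7))             # 0.0
print(round(downgrade(-0.4), 4))  # -0.46
print(round(upgrade(0.6), 4))     # 0.602
```

<p>Note that both update rules only move the trust level asymptotically toward the extremes, so the bounds \u22121 and +1 are never actually reached.<\/p>
<p>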
They are indeed parameters in our prototype code, so they can evolve if the protocol requirements change.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Freshness_of_the_trust_verification_process\">Freshness of the trust verification process<\/span><\/h3>\n<p>Since it is important to update the perception that a Client has about a CSS provider, the observed values of trust regarding a CSS are also used to determine the rhythm or intensity of verifications to be performed for this CSS.\n<\/p><p>Thus, the freshness of results from the trust verification process is assured by updating in the ICS the minimum percentage of the stored files to be verified in a CSS, as well as the minimum percentage of data blocks that should be checked. We choose to present these updates by day, though again, this is a parameter in our implemented prototype.\n<\/p><p>Consequently, according to the observed trust level for a CSS, the number of files and the percentage of these file contents checked in this CSS are set as specified in Table 1. 
In this table, the extreme values one and \u22121 should respectively represent blind trust and complete distrust, but they are not considered as valid for our classification purposes, since we expect trust to be an ever-changing variable, including the idea of redemption.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Tab1_Pinheiro_Sensors2018_18-3.png\" class=\"image wiki-link\" data-key=\"bcfbaa9f2ccf6a6288e8ba3df1c30b20\"><img alt=\"Tab1 Pinheiro Sensors2018 18-3.png\" src=\"https:\/\/www.limswiki.org\/images\/f\/f2\/Tab1_Pinheiro_Sensors2018_18-3.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Table 1<\/b> Classification of the trust levels for updating purposes<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>Whenever the trust value equals zero, as a means to have a decidable system, a fixed value must be artificially assigned to it to preserve the dynamics of evaluations. Thus, if the last verified result is a positive assessment, the value +0.1 is assigned to the observed trust; otherwise, if a verification fault has been observed, the assigned value is \u22120.1.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Variation_of_the_trust_level_assigned_to_the_cloud_storage_service\">Variation of the trust level assigned to the cloud storage service<\/span><\/h3>\n<p>According to the TOPMCloud definition, the trust level assigned to a CSS always grows when a file-checking cycle is finished without the ICS detecting any verification failures during this cycle. 
Considering this rule, the first simulations regarding the evolution of trust in the ICS were used to determine the maximum number of days needed for the ICS to finish a checking cycle for a file stored in a CSS. The conclusion of a checking cycle indicates that each of the 4096 file chunks was validated as a part of one of the data blocks that are checked by means of the 256 challenges submitted by the ICS to the CSS.\n<\/p><p>The projected time for our algorithm to finish a file-checking cycle can vary between a minimum and a maximum value depending on the number of files simultaneously monitored by the ICS on a CSS. However, the checked file size should not significantly influence this time because the daily number of checked data blocks on a file is a percentage of the file size, as defined previously in Table 1.\n<\/p><p>By means of mathematical calculations, it is possible to determine that in a CSS classified with a \u201cvery high distrust\u201d level, i.e., the worst trust level, the maximum time to finish a checking cycle is 38 days. Comparatively, in a CSS classified with a \u201cvery high trust\u201d level, i.e., the best trust level, the time to finish a checking cycle can reach 1792 days. 
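<\/p><p>A back-of-the-envelope calculation relates these cycle durations to the average number of challenges issued per file per day, given that a cycle comprises 256 challenges (the helper name is illustrative):<\/p>

```python
BLOCKS_PER_CYCLE = 256  # challenges needed to cover all 4096 chunks of one file

def avg_challenges_per_day(cycle_days: int) -> float:
    # Average daily challenge rate implied by a given cycle duration.
    return BLOCKS_PER_CYCLE / cycle_days

# Worst trust level ("very high distrust"): cycle finished in at most 38 days.
print(round(avg_challenges_per_day(38), 2))    # 6.74 blocks per file per day
# Best trust level ("very high trust"): a cycle can take up to 1792 days.
print(round(avg_challenges_per_day(1792), 2))  # 0.14 blocks per file per day
```

<p>This roughly 47-fold difference in verification intensity is what rewards trustworthy providers with lower resource consumption.<\/p>
<p>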
Figure 2 shows the maximum and the minimum number of days required to finish a file-checking cycle for each trust level proposed in TOPMCloud.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig2_Pinheiro_Sensors2018_18-3.png\" class=\"image wiki-link\" data-key=\"a9ea4883ac9c3b31716c14415a312174\"><img alt=\"Fig2 Pinheiro Sensors2018 18-3.png\" src=\"https:\/\/www.limswiki.org\/images\/a\/ab\/Fig2_Pinheiro_Sensors2018_18-3.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 2<\/b> Time required to complete a file-checking cycle<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>Notwithstanding the mathematical calculations regarding the proposed protocol\u2019s maximum time required to finish a file-checking cycle, it is noticeable that this time can increase if the ICS or the CSS servers do not have enough computational capacity to respectively generate or to answer the necessary protocol challenges for each day. Furthermore, the file-checking cycle depends on the available network bandwidth and can worsen if the network does not support the generated packet traffic. This situation can occur when the number of CSS stored files is very large.\n<\/p><p>The variation of the time to conclude the checking cycle, according to the trust level assigned to the CSS, comes from the different number of data blocks verified per day. This variation aims to reward cloud storage services that historically have no faults, thus minimizing the consumption of resources such as processing capacity and network bandwidth. 
Moreover, this feature allows our proposed architecture to prioritize the checking of files stored in CSS providers that have already presented faults. Consequently, this feature reduces the time required to determine if other files were lost or corrupted.\n<\/p><p>Another interesting characteristic of the proposed protocol was analyzed with calculations performed to determine the number of file-checking cycles that must be concluded without any identified fault for the trust level assigned to a CSS to rise to the highest trust level foreseen in Table 1, \u201cvery high trust.\u201d Figure 3 presents the results of this analysis using as a starting point the \u201cnot evaluated\u201d situation, which corresponds to a trust level equal to zero assigned to a CSS.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig3_Pinheiro_Sensors2018_18-3.png\" class=\"image wiki-link\" data-key=\"54221820e30ebf397a2927cde711691d\"><img alt=\"Fig3 Pinheiro Sensors2018 18-3.png\" src=\"https:\/\/www.limswiki.org\/images\/5\/5a\/Fig3_Pinheiro_Sensors2018_18-3.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 3<\/b> Expected best performing trust level evolution for a CSS.<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>From the analysis of the results shown in Figure 2 and Figure 3, it could be concluded that the time required for a CSS to obtain the maximum trust level is so long that it will be practically impossible to reach this level. 
This conclusion is easily obtained by taking the maximum number of days needed to finish a checking cycle at the \u201chigh trust\u201d level (896) multiplied by the number of successfully concluded cycles needed to reach the level of \u201cvery high trust\u201d (384 \u2212 202 = 182). The result of this calculation is 163,072 days (182 * 896), which is approximately 453 years.\n<\/p><p>Although this is mathematically correct, in practice this situation would never occur. The simple explanation for this fact is related to the number of files simultaneously monitored by the ICS in the CSS. The maximum expected time for the conclusion of a file-checking cycle only occurs when the number of monitored files in a CSS classified at the \u201chigh trust\u201d level is equal to 25 or a multiple of this value. According to Table 1, this is due to the fact that, at the \u201chigh trust\u201d level, 16% of the file content is required to be checked per day. The maximum time spent in file checking only occurs when the result of this percentage calculation is equal to an integer value. Otherwise, the result is rounded up, thus increasing the percentage of files effectively checked.\n<\/p><p>Indeed, if the ICS is monitoring exactly 25 files in a CSS classified at the \u201chigh trust\u201d level, and supposing that these files were submitted to the CSS on the same day, the checking cycles for this set of files will finish in 896 days. Since there are 25 concluded cycles in a period of 896 days, about 20 years are needed for the CSS to attain the 182 cycles required to reach the next level, \u201cvery high trust.\u201d However, this situation worsens if the number of considered files decreases. 
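<\/p>
<p>The arithmetic behind these estimates is easy to reproduce. The snippet below is illustrative only; all constants are the figures quoted above, and the 365-day year is our simplification (the paper\u2019s \u201capproximately 453 years\u201d suggests a coarser convention):<\/p>

```java
// Back-of-the-envelope check of the upgrade-time estimates quoted in the text.
public final class UpgradeTime {

    static final int CYCLE_DAYS_HIGH_TRUST = 896; // max days per cycle at "high trust"
    static final int CYCLES_NEEDED = 384 - 202;   // cycles to reach "very high trust" (182)

    /** Worst case: cycles conclude one after another. */
    static long worstCaseDays() {
        return (long) CYCLES_NEEDED * CYCLE_DAYS_HIGH_TRUST; // 182 * 896 = 163,072
    }

    /** With 'files' monitored files, 'files' cycles conclude per 896-day period. */
    static long daysWithFiles(int files) {
        long periods = (CYCLES_NEEDED + files - 1) / files; // ceil(182 / files)
        return periods * CYCLE_DAYS_HIGH_TRUST;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseDays());           // 163072 days
        System.out.println(daysWithFiles(25) / 365.0); // roughly 20 years
    }
}
```
<p>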
For instance, considering the \u201chigh trust\u201d level, if there are only six files being monitored, then the time to attain the next level exceeds 65 years.\n<\/p><p>Figure 4 presents a comparative view of the time required to upgrade to the next trust level according to the number of monitored files. In general, less time will be required to increase the trust level if there are more monitored files.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig4_Pinheiro_Sensors2018_18-3.png\" class=\"image wiki-link\" data-key=\"004dc13598314bc0cb620e4dd5b1b111\"><img alt=\"Fig4 Pinheiro Sensors2018 18-3.png\" src=\"https:\/\/www.limswiki.org\/images\/2\/2b\/Fig4_Pinheiro_Sensors2018_18-3.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 4<\/b> Time to upgrade the trust level according to the number of monitored file<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>As can be seen in Figure 4, the best case is obtained when the number of monitored files is equal to the required number of successfully concluded cycles to upgrade to the next trust level. For this number of files, the time required to increase the trust level is always equal to the time needed to conclude one checking cycle.\n<\/p><p>Opposite to the trust level raising curve that reflects a slow and gradual process, the trust level reduction is designed as a very fast process. 
The trust value assigned to the CSS always decreases when a challenge result indicates a fault in a checked file.\n<\/p><p>To evaluate the proposed process for downgrading the measured trust level, calculations were performed aiming to determine how many file-checking failures are needed for a CSS to reach the maximum distrust level. Any trust level between \u201cvery high trust\u201d and \u201clow trust\u201d could be used as the starting point to these calculations. Then, when a challenge-response failure is identified, the trust value is changed to zero and the CSS is immediately reclassified to the \u201clow distrust\u201d level. From this level to the \u201cvery high distrust\u201d level, the number of file-checking failures required to reach each next distrust level is shown in Figure 5.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig5_Pinheiro_Sensors2018_18-3.png\" class=\"image wiki-link\" data-key=\"087aef02a20b85070f2c605e66716b82\"><img alt=\"Fig5 Pinheiro Sensors2018 18-3.png\" src=\"https:\/\/www.limswiki.org\/images\/1\/1e\/Fig5_Pinheiro_Sensors2018_18-3.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 5<\/b> Number of file-checking failures needed to downgrade to each distrust level<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>Similarly to the trust level raising process, the required minimum time to downgrade to a distrust level is determined by the number of simultaneously-monitored files. 
Figure 6 presents a comparative view of the required minimum time to downgrade a CSS considering that all monitored files are corrupted and that failures will be identified upon the ICS receiving the first unsuccessful challenge response from the CSS.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig6_Pinheiro_Sensors2018_18-3.png\" class=\"image wiki-link\" data-key=\"cfec7a5842f6f6edb4a5dceb397e769f\"><img alt=\"Fig6 Pinheiro Sensors2018 18-3.png\" src=\"https:\/\/www.limswiki.org\/images\/7\/74\/Fig6_Pinheiro_Sensors2018_18-3.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 6<\/b> Trust level downgrade according to the number of monitored files<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>An important difference between the process to downgrade the trust level assigned to a CSS and the opposite process to upgrade this trust level is that the downgrade time is preferably presented as a number of days, whereas the upgrade time is preferably presented in years. As shown in Figure 6, the minimum time to downgrade a trust level will be one day when the number of monitored files is equal to or greater than the number of identified challenge failures required to downgrade a trust level according to Figure 5.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Architecture_implementation\">Architecture implementation<\/span><\/h2>\n<p>The implementation of the architecture was organized as a validation project comprising three phases. The first phase was devoted to the processes under the responsibility of the client in our proposal. 
The second and the third phases were respectively aimed at implementing the processes under the ICS responsibility and the processes under the responsibility of the CSS. Hence, a completely functional prototype was used for the validation of the proposed architecture and protocol.\n<\/p><p>In each implementation phase, one application was developed using Java Enterprise Edition (Java EE) components, such as Java Persistence API (JPA), Enterprise JavaBeans (EJB), Contexts and Dependency Injection (CDI), and Java API for XML Web Services (JAX-WS).<sup id=\"rdp-ebb-cite_ref-JendrockJava14_36-0\" class=\"reference\"><a href=\"#cite_note-JendrockJava14-36\">[36]<\/a><\/sup> A desktop application was developed for the Client, while two web service applications were developed respectively for the ICS and CSS. The chosen application server was Glassfish 4<sup id=\"rdp-ebb-cite_ref-OracleGlassFish14_37-0\" class=\"reference\"><a href=\"#cite_note-OracleGlassFish14-37\">[37]<\/a><\/sup>, and the chosen database management system (DBMS) was <a href=\"https:\/\/www.limswiki.org\/index.php\/PostgreSQL\" title=\"PostgreSQL\" class=\"wiki-link\" data-key=\"a5dd945cdcb63e2d8f7a5edb3a896d82\">PostgreSQL<\/a>.<sup id=\"rdp-ebb-cite_ref-PostgreSQL_38-0\" class=\"reference\"><a href=\"#cite_note-PostgreSQL-38\">[38]<\/a><\/sup>\n<\/p><p>These platforms were chosen to take into consideration the distributed characteristics of the proposed architecture and the need for continuous and asynchronous protocol communications between the predicted roles in this architecture. Thus, the choice of Java EE was determined by its usability for the implementation of web services, task scheduling, event monitoring, and asynchronous calls. 
Both the Glassfish application server and the PostgreSQL DBMS were chosen because they are open-source applications and fully meet the needs of the developed applications.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Client_application\">Client application<\/span><\/h3>\n<p>The Client application\u2019s main tasks are: to encrypt a file and to send it to one or more CSS, to split and select the file chunks, to assemble the data blocks and to generate their hashes, to group them in cycles, to generate the Information Table, and to send this table to the ICS. In our prototype, the client application also allows one to control the inventory of stored files in each CSS, to store the cryptographic keys, to look for information about the verification process and the file integrity in each CSS, and to retrieve a file from a CSS, confirming its integrity and deciphering its contents.\n<\/p><p>These functions are accessible in the Client application by means of a graphical interface through which the user selects files from the file system and at least one CSS to store the chosen files, as well as an ICS to verify the storage service used for these files. This same interface allows the client to provide both the password to be used in the file encryption process and the number of years during which the file is to be kept stored in the CSS. Furthermore, a numerical seed is given to add entropy to the process of choosing the chunks that will compose each data block. 
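<\/p>
<p>To make the seeding step concrete, the following sketch shows how seeded chunk selection might look, using the SHA1PRNG generator from the \u201cjava.security\u201d package that the prototype relies on (as described later in this section). How the prototype actually mixes the user seed into the generator is not specified in the article, so seeding directly with setSeed() is an assumption made here for determinism.<\/p>

```java
import java.security.SecureRandom;

// Illustrative sketch: pick the 16 chunk address codes (out of 4096 chunks)
// that compose one data block, driven by a user-supplied seed.
public final class ChunkSelector {

    static final int TOTAL_CHUNKS = 4096;   // chunks per file
    static final int CHUNKS_PER_BLOCK = 16; // address codes per data block

    static int[] selectBlockAddresses(long userSeed) {
        try {
            SecureRandom rng = SecureRandom.getInstance("SHA1PRNG");
            rng.setSeed(userSeed); // assumed seeding scheme, for illustration
            int[] codes = new int[CHUNKS_PER_BLOCK];
            for (int i = 0; i < CHUNKS_PER_BLOCK; i++) {
                codes[i] = rng.nextInt(TOTAL_CHUNKS); // a value in 0..4095
            }
            return codes;
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA1PRNG not available", e);
        }
    }

    public static void main(String[] args) {
        for (int code : selectBlockAddresses(42L)) {
            System.out.print(code + " ");
        }
    }
}
```
<p>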
Figure 7 shows the application interface with the implemented \u201cUpload File\u201d function.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig7_Pinheiro_Sensors2018_18-3.png\" class=\"image wiki-link\" data-key=\"ace519f7dd281f1673f3d698a37eaad0\"><img alt=\"Fig7 Pinheiro Sensors2018 18-3.png\" src=\"https:\/\/www.limswiki.org\/images\/3\/3f\/Fig7_Pinheiro_Sensors2018_18-3.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 7<\/b> The client interface showing the \u201cUpload File\u201d function<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>In this prototype, the list of available CSS comes from previous registration activity, by means of which the user first registers the ICS with which it maintains a service level agreement. After an ICS has been selected, the Client application obtains a list of CSS from the selected ICS web service. This initial process requires the client user to maintain some contractual relationship regarding file storage verifications involving the Client application, the registered ICS, and the selected CSS. 
Then, the Client user is informed about the current trust level assigned by the ICS to each concerned CSS.\n<\/p><p>The file encryption process is performed using resources available in the \u201cjavax.crypto\u201d package<sup id=\"rdp-ebb-cite_ref-OracleJavaPlat_39-0\" class=\"reference\"><a href=\"#cite_note-OracleJavaPlat-39\">[39]<\/a><\/sup>, using the AES cryptographic algorithm<sup id=\"rdp-ebb-cite_ref-NISTAdvanced01_40-0\" class=\"reference\"><a href=\"#cite_note-NISTAdvanced01-40\">[40]<\/a><\/sup>, a 256-bit key, and the cipher-block chaining (CBC) mode of operation.<sup id=\"rdp-ebb-cite_ref-BellareTheSec94_41-0\" class=\"reference\"><a href=\"#cite_note-BellareTheSec94-41\">[41]<\/a><\/sup>\n<\/p><p>The process of sending an encrypted file to a CSS was implemented using Java threads so that it is possible to simultaneously initiate the uploading of the encrypted file to each selected CSS. Furthermore, by means of threads, it is possible to proceed with the next steps in parallel, without the need to wait for a complete file upload, an operation that takes a variable duration according to the file size and the network characteristics.\n<\/p><p>The calculations of both the number of cycles and the chunk distribution are executed according to the number of years that the file must be stored by the CSS and monitored by the ICS. 
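<\/p>
<p>A minimal sketch of the encryption step described above, using only the \u201cjavax.crypto\u201d resources named in the text (AES, a 256-bit key, CBC mode). Deriving the key by hashing the user\u2019s password with SHA-256 is our simplification, since the article does not specify the key-derivation scheme (a production implementation would use PBKDF2 or similar), and PKCS5 padding is likewise assumed:<\/p>

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Sketch of client-side file encryption: AES-256 in CBC mode via javax.crypto.
public final class FileCipher {

    // Assumption: a 256-bit key obtained by hashing the password (not from the paper).
    static byte[] keyFromPassword(String password) throws Exception {
        return MessageDigest.getInstance("SHA-256")
                .digest(password.getBytes(StandardCharsets.UTF_8)); // 32 bytes
    }

    static byte[] encrypt(byte[] plain, byte[] key, byte[] iv) throws Exception {
        Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return c.doFinal(plain);
    }

    static byte[] decrypt(byte[] cipherText, byte[] key, byte[] iv) throws Exception {
        Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
        c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return c.doFinal(cipherText);
    }

    public static void main(String[] args) throws Exception {
        byte[] key = keyFromPassword("user-password");
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv); // the IV must be kept with the ciphertext
        byte[] enc = encrypt("file contents".getBytes(StandardCharsets.UTF_8), key, iv);
        System.out.println(new String(decrypt(enc, key, iv), StandardCharsets.UTF_8)); // file contents
    }
}
```
<p>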
Although the number of used cycles should vary according to the trust level assigned to the CSS, in our validation prototype for these calculations, we chose to consider the worst-case value, corresponding to the \u201chigh distrust\u201d level.\n<\/p><p>Each chunk address code is obtained through the SHA1PRNG algorithm, a pseudo-random number generator, executed by \u201cSecureRandom,\u201d a class from the \u201cjava.security\u201d package.<sup id=\"rdp-ebb-cite_ref-OracleJavaPlat_39-1\" class=\"reference\"><a href=\"#cite_note-OracleJavaPlat-39\">[39]<\/a><\/sup> To produce the hashes, the cryptographic function Blake2<sup id=\"rdp-ebb-cite_ref-AumassonBLAKE2_13_21-1\" class=\"reference\"><a href=\"#cite_note-AumassonBLAKE2_13-21\">[21]<\/a><\/sup> was chosen due to its speed, security, and simplicity.\n<\/p><p>Also in our prototype, the monitoring module shown in Figure 8 was developed to provide a practical tool for the user to access functions such as: to manage the inventory of stored files in a set of CSS providers, to check the file status assigned by the ICS according to results from verifications of the CSS, to download files from the CSS, and to decipher the downloaded files.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig8_Pinheiro_Sensors2018_18-3.png\" class=\"image wiki-link\" data-key=\"e559d060952e51f7cba6ec933fda966d\"><img alt=\"Fig8 Pinheiro Sensors2018 18-3.png\" src=\"https:\/\/www.limswiki.org\/images\/1\/12\/Fig8_Pinheiro_Sensors2018_18-3.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 
8<\/b> The monitoring module entry screen<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>In the monitoring module, the \u201cStatus\u201d button shows information such as: the file source folder, the identification hash, the name of the CSS where the file is stored, the name of the ICS responsible for monitoring this CSS, the number of concluded checking cycles, the number of cycles that is currently being performed, the number of data blocks already checked in the current cycle, the ICS monitoring period for the last day, and its current status. Figure 9 shows the file status query screen.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig9_Pinheiro_Sensors2018_18-3.png\" class=\"image wiki-link\" data-key=\"647dc5fa34e6ca37f7e7c50fc7eafaa1\"><img alt=\"Fig9 Pinheiro Sensors2018 18-3.png\" src=\"https:\/\/www.limswiki.org\/images\/9\/98\/Fig9_Pinheiro_Sensors2018_18-3.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 
9<\/b> File status query screen<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<h3><span class=\"mw-headline\" id=\"Integrity_checking_service_application\">Integrity checking service application<\/span><\/h3>\n<p>The ICS implementation comprises a web service application called \u201cVerifierService\u201d that presents the following functionalities: to receive the File Information Table submitted by the Client application, to select and send challenges regarding monitored files to the CSS that host them, to manage pending responses to challenges, to receive challenge responses from the CSS, to update the CSS trust levels, to receive and answer requests for the monitored file status, and to receive and answer requests for information about the CSS with which there is a monitoring agreement.\n<\/p><p>In the File Information Table received from the Client application, the CSS is represented by the identifier of the monitoring contract between the ICS and the CSS. For each CSS contract in the information table, the ICS application saves a new record in a table named \u201carchives\u201d so that the monitoring of one file copy does not interfere with the monitoring of other copies. The following information is stored in the archives table: a file identifier hash, the chunk size in bytes, the number of generated cycles, and the contract identifier that defines the CSS where the file is stored.\n<\/p><p>For each data block hash received by means of the information table coming from a Client, a new record is generated by the ICS application in a table named \u201cblocks.\u201d The following information is stored in this table: the data block hash, the chunk address codes that compose this block, the cycle number to which it belongs, and the \"archives\" record identifier.\n<\/p><p>The process of selecting, generating, and sending challenges is an activity performed periodically (in our prototype, daily) by the ICS. 
This process comprises the following actions: selecting files to be checked in each CSS, selecting data blocks to be checked in each file, generating challenges, and sending the challenges to the CSS.\n<\/p><p>The trust level assigned to a CSS will be decremented whenever the ICS identifies a corrupted data block in a file. Conversely, the trust level is incremented after all data blocks from the same cycle have been the object of challenges to the CSS, and all the answers to these challenges confirm the integrity of the checked blocks. Any further results obtained for a file already marked as corrupted, whether positive or negative, are simply ignored.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Cloud_storage_service_application\">Cloud storage service application<\/span><\/h3>\n<p>The CSS application was developed as a web service named \u201cStorerWebService.\u201d This application provides a service capable of receiving challenges from an ICS, processing them, and sending the challenge responses back to the ICS.\n<\/p><p>The CSS application in our prototype implementation includes the following functionalities: to receive challenges from an ICS and store them in the CSS database, to monitor the database and process the pending challenges, to store the responses in the database, and to monitor the database to find and send pending responses to the ICS. 
Furthermore, the application also includes features for uploading and downloading files to simulate the features normally available in a CSS.\n<\/p><p>The challenge-receiving functionality is exposed to the ICS by means of a method called \u201casyncFileCheck\u201d that takes as its input parameter an object of the class \u201cChallengeBean.\u201d This object contains the attributes \u201cidentifier,\u201d \u201caddressCodes,\u201d \u201cchunkLength,\u201d \u201cresponseUrl,\u201d and \u201cid,\u201d which respectively represent: the file identifier hash, the array with the set of chunk address codes to be read from the file, which will compose the data block from which the response hash will be generated, the chunk size in bytes, the ICS web service URL responsible for receiving the challenge answers, and the challenge identifier.\n<\/p><p>After receiving a challenge, the information is extracted from the ChallengeBean object and is stored in the CSS database, where it receives the \u201cWaiting for Processing\u201d status. The advantage of storing the received challenges for further processing is related to the organization of our general architecture for saving computational resources. This asynchronous procedure in the CSS prevents the ICS from having to keep a process locked while awaiting a response from the CSS for each submitted challenge, considering that the required time to process a challenge varies according to the checked file size, the number of simultaneously-received challenges, and the CSS computational capacity.\n<\/p><p>Another consequence of this design is the possibility of performing load distribution since the services responsible for receiving, processing, and responding to the challenges can be provided by totally different hardware and software infrastructures. 
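<\/p>
<p>To make the challenge flow concrete, the sketch below models the ChallengeBean payload with the attributes listed above and computes a response hash in the way the text goes on to describe (each address code multiplied by the chunk size, the chunks concatenated, and the resulting block hashed). It is illustrative only: SHA-256 stands in for the Blake2 function used by the prototype, since Blake2 is not part of the standard Java library.<\/p>

```java
import java.security.MessageDigest;

// Illustrative sketch of how a CSS could answer an integrity challenge.
public final class ChallengeResponder {

    /** Minimal stand-in for the ChallengeBean payload described in the text. */
    static final class Challenge {
        String identifier;   // file identifier hash
        int[] addressCodes;  // chunk address codes (0..4095)
        int chunkLength;     // chunk size in bytes
        String responseUrl;  // ICS web service URL that receives the answer
        long id;             // challenge identifier
    }

    /** Assembles the data block from the stored file and hashes it. */
    static byte[] respond(byte[] storedFile, Challenge ch) throws Exception {
        byte[] block = new byte[ch.addressCodes.length * ch.chunkLength];
        int out = 0;
        for (int code : ch.addressCodes) {
            int offset = code * ch.chunkLength; // first byte of the chunk in the file
            System.arraycopy(storedFile, offset, block, out, ch.chunkLength);
            out += ch.chunkLength;
        }
        // SHA-256 as a stand-in for Blake2: both yield a 256-bit digest.
        return MessageDigest.getInstance("SHA-256").digest(block);
    }
}
```
<p>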
The only requirement is that all infrastructure components must share access to the same database.\n<\/p><p>The response hash is generated from a data block assembled with file chunks that are read from the file being checked. The chunks used are those referenced by the 16 chunk addresses defined in the \u201caddressCodes\u201d attribute of the \u201cchallenge\u201d object. These address codes are integers ranging from zero to 4095 that are multiplied by the chunk size to obtain the address of the first byte of each chunk in the file.\n<\/p><p>After the completion of the chunk reading task, the obtained data are concatenated, forming a data block. From this data block, a 256-bit hash is generated using the Blake2 cryptographic hash function.<sup id=\"rdp-ebb-cite_ref-AumassonBLAKE2_13_21-2\" class=\"reference\"><a href=\"#cite_note-AumassonBLAKE2_13-21\">[21]<\/a><\/sup> The generated hash is saved in the database while waiting to be sent back to the ICS. A specific process monitors the pending hash responses and sends them as challenge answers to the ICS.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Experimental_validation\">Experimental validation<\/span><\/h2>\n<p>This section describes the setup used to perform the experiments designed to evaluate the performance, efficiency, and efficacy of the proposed protocol TOPMCloud. Then, the results of the experimental validation are presented and discussed.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Experimental_setup\">Experimental setup<\/span><\/h3>\n<p>Our experimental environment comprises five similar virtual machines (VM), each one with 12 GB of memory and 200 GB of hard disk space, running the operating system Ubuntu Server 14.04 LTS. All of them were set to run in a private cloud hosted by the Decision Technologies Laboratory at the University of Bras\u00edlia, Brazil. 
Since this setup presents common functionalities found in most commercial CSS providers, it is considered a configuration that adequately represents the utilization of these services as provided by commercial cloud providers. The basic operating system functions that are required from the CSS provider are file access operations and hash calculations\u2014which are commonly available in cloud computing services\u2014and object class libraries. If absent, these functions can easily be deployed by commands from the cloud services client. These VMs are configured to perform the three roles designed in our architecture, with the following distribution of services in the VM set: one VM holds the Client role; one VM performs the Integrity Check Service (ICS) role; and the remaining three VMs hold the Cloud Storage Services (CSS) role.\n<\/p><p>The experiments were performed using four files with different sizes (2.5, 5, 10, and 15 GB). For each file, an information table was generated considering the utilization of the file storage service and its monitoring during five different time periods (1, 5, 10, 20, and 30 years). Files with diverse content types and formats, such as International Organization for Standardization (ISO) 9660, XenServer Virtual Appliance (XVA), and Matroska Video (MKV), were used. For each considered time period, cryptographic keys of the same size were used, but with different, random values, so that each generated encrypted file was completely distinct from other files generated from the same origin file.\n<\/p><p>With the described configuration operating for three months, logs were generated from the ICS monitoring process verifying files stored in the three CSS providers. In total, 60 files were monitored by the ICS, 20 of them stored in each CSS. 
In this period, some CSS servers were randomly chosen to be turned off and, after a while, switched on again in order to simulate fault conditions.\n<\/p><p>In order to evaluate the behavior of the proposed protocol, some experiments were performed with contextual modifications, including the deliberate change of file contents and the change of trust levels assigned to each CSS.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Submission_of_files\">Submission of files<\/span><\/h3>\n<p>Our experiments begin by observing the performance during the process of preparation and submission of files to the CSS providers and the related transmission of their respective information tables to the ICS. The steps of this process have their durations measured so that we can collect observations regarding the following tasks: encrypting the source file, hashing the resulting encrypted file, computing cycles and distributing chunks into data blocks, hashing these data blocks, and finally sending the information table to the ICS.\n<\/p><p>Each of these tasks has its execution time varying according to the processed file size or the foreseen time period for its storage in the CSS. We present hereafter average times taken from 20 controlled repetitions of each experiment or test. 
Figure 10 shows both the encryption and the hash generation average time by file sizes.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig10_Pinheiro_Sensors2018_18-3.png\" class=\"image wiki-link\" data-key=\"056c4498771a5667a625d6b1f089f543\"><img alt=\"Fig10 Pinheiro Sensors2018 18-3.png\" src=\"https:\/\/www.limswiki.org\/images\/6\/65\/Fig10_Pinheiro_Sensors2018_18-3.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 10<\/b> Average time for file encryption and hash generation by file size<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>As explained before, the task \u201ccomputing cycles and distributing chunks\u201d is responsible for selecting 16 chunk address codes for each data block required for filling the computing cycles. Hence, its execution time varies exclusively according to the file storage period. Figure 11 shows the required average time for computing cycles and distributing chunks.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig11_Pinheiro_Sensors2018_18-3.png\" class=\"image wiki-link\" data-key=\"69915bbe27c4741780b395f086ed370a\"><img alt=\"Fig11 Pinheiro Sensors2018 18-3.png\" src=\"https:\/\/www.limswiki.org\/images\/f\/fe\/Fig11_Pinheiro_Sensors2018_18-3.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 
11<\/b> Required average time for computing cycles and distributing chunks<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>The task \u201chashing data blocks\u201d comprises actions for randomly reading the chunks in the encrypted file, assembling data blocks by the concatenation of chunks and, finally, the hash generation from each assembled data block. In this task, the execution time varies according to both the file size and the expected cloud storage time. The larger the file, the larger will be the size of each data block. Additionally, the longer the storage period, the greater the quantity of data blocks to be generated. Figure 12 shows a graph with the time variation for generating data block hashes according to the file size and the chosen storage period.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig12_Pinheiro_Sensors2018_18-3.png\" class=\"image wiki-link\" data-key=\"38b5d3dfebe4b334714035a1b9201cfd\"><img alt=\"Fig12 Pinheiro Sensors2018 18-3.png\" src=\"https:\/\/www.limswiki.org\/images\/a\/a9\/Fig12_Pinheiro_Sensors2018_18-3.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 12<\/b> Average time for data block hashing<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>Observing Figure 12, it is possible to verify that the time for generating all hashes from a 15 GB file presents a disproportionate growth when compared with the other file sizes, independent of the cloud storage period. 
From this observation, it is possible to infer that for files on the order of 15 GB or greater, the proposed protocol needs to be optimized.\n<\/p><p>Another important component of the total execution time of the file submission process is the time required to send the information table to the ICS. This time varies according to the quantity of generated data blocks, which in turn varies according to the storage period for the file in the CSS. The measured time in this task comprises the connection with the web service in the ICS, the sending of the information table, its storage in the ICS database, and the receipt of a successful storage confirmation from the ICS. Figure 13 shows the required average time for sending an information table to the ICS according to the CSS storage period.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig13_Pinheiro_Sensors2018_18-3.png\" class=\"image wiki-link\" data-key=\"9a9f9cbbfec47a8ce9b6829a4f1cb29d\"><img alt=\"Fig13 Pinheiro Sensors2018 18-3.png\" src=\"https:\/\/www.limswiki.org\/images\/1\/18\/Fig13_Pinheiro_Sensors2018_18-3.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 13<\/b> Required average time for sending an information table to the integrity check service (ICS)<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<h3><span class=\"mw-headline\" id=\"Network_bandwidth_consumption\">Network bandwidth consumption<\/span><\/h3>\n<p>We show results from experiments performed to determine the effective consumption of network bandwidth during the file monitoring process. 
By design, TOPMCloud varies its network resource consumption according to the trust level assigned to the CSS.\n<\/p><p>To evaluate this feature, each trust level foreseen in TOPMCloud was successively assigned to a CSS, and for each assigned level, the ICS performed daily file verifications on that CSS. The measurements of the traffic due to challenges sent by the ICS and the corresponding answers were based on traffic collection with the Wireshark tool. Using a controlled network environment and with filters applied in Wireshark, it was possible to capture only the packets generated by the ICS and CSS applications under test. Figure 14 shows the average daily network bandwidth consumption per stored file according to the trust level assigned to the CSS.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig14_Pinheiro_Sensors2018_18-3.png\" class=\"image wiki-link\" data-key=\"78f215e77d6a9534d788c3332f8cd6a0\"><img alt=\"Fig14 Pinheiro Sensors2018 18-3.png\" src=\"https:\/\/www.limswiki.org\/images\/0\/0d\/Fig14_Pinheiro_Sensors2018_18-3.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 14<\/b> Average daily network bandwidth consumption by stored file<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>The rate of network traffic per stored file attains its maximum value when the number of stored files in the CSS is equal to one, since the ICS is required to use network bandwidth just for this file. In this case, at least one of this file\u2019s data blocks will be verified per day, independent of the trust level assigned to the CSS. 
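The relation between trust level and daily verification effort can be sketched as follows. The actual per-level percentages come from Table 1 (\u201cFiles verified by day\u201d), which is not reproduced here, so the mapping below is purely illustrative; the invariant from the text is that at least one file, and hence at least one data block, is verified per day.

```python
import math

# Illustrative fractions only -- the actual values are defined in Table 1
# ("Files verified by day") for each trust level assigned to a CSS.
FILES_VERIFIED_PER_DAY = {"low": 1.0, "medium": 0.5, "high": 0.1}

def files_to_verify_today(n_stored_files, trust_level):
    """Number of files the ICS challenges today on one CSS: a trust-level-
    dependent share of the stored files, but never fewer than one file."""
    share = FILES_VERIFIED_PER_DAY[trust_level]
    return max(1, math.ceil(share * n_stored_files))
```

With a single stored file this always returns 1, matching the maximum per-file traffic case described above.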
At higher trust levels, when there is a set of files to be verified, the network traffic serves to monitor a percentage of these files\u2019 contents. The network traffic per stored file always attains its minimum when the percentage computation defined in Table 1, column \u201cFiles verified by day,\u201d yields an integer value for the number of files stored in the CSS, according to its trust level.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"File_integrity_checking_by_the_ICS\">File integrity checking by the ICS<\/span><\/h3>\n<p>As mentioned before, this operation of the ICS was monitored for three months, in a setup involving one ICS and three CSS. During that period, both the ICS and the CSS stored their logs in text files, with each log record containing the execution time spent on the actions related to the file integrity-checking process.\n<\/p><p>As the file integrity checking is executed by means of challenges sent by the ICS to the CSS, the response time is directly proportional to the verified file size. Thus, the information about challenge results obtained from the ICS logs was grouped and classified according to verified file size. The values registered in the ICS logs comprise the actions executed from the moment the challenge is stored in the \u201crequests\u201d table up to the receipt of the respective answer sent by the CSS. In our experiment, the ICS logs registered 1340 records of successful checking results. 
Figure 15 shows the mean, maximum, and minimum time spent to process a challenge, by verified file size.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig15_Pinheiro_Sensors2018_18-3.png\" class=\"image wiki-link\" data-key=\"7d5b4ac8bcb5f7261842adf636ea3d3b\"><img alt=\"Fig15 Pinheiro Sensors2018 18-3.png\" src=\"https:\/\/www.limswiki.org\/images\/3\/39\/Fig15_Pinheiro_Sensors2018_18-3.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 15<\/b> Time spent to conclude the processing of a challenge by file size<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>The most time-consuming actions in the processing of a challenge are performed in the CSS and comprise the reassembly of the data block from the stored file chunks, followed by the hashing of the data block. For the three CSS used in our experiment, the numbers of records collected in their logs were 360, 470, and 510, respectively. These records registered the time spent by each CSS in the execution of the aforementioned actions. Although the same number of files was stored in all three CSS, the number of processed challenges varied for each CSS, because each CSS was randomly shut down for a random time interval. 
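The challenge processing measured above can be sketched end to end: the ICS sends the chunk addresses of one data block, the CSS reassembles that block from its stored chunks and returns the digest, and the ICS compares it with the digest recorded in the file\u2019s information table at submission time. This is a minimal sketch; the transport, the \u201crequests\u201d table handling, and the hash algorithm (SHA-256 here) are illustrative simplifications.

```python
import hashlib

def css_answer_challenge(stored_chunks, chunk_addresses):
    """CSS side: reassemble the challenged data block from the stored
    file chunks and return its digest."""
    block = b"".join(stored_chunks[a] for a in chunk_addresses)
    return hashlib.sha256(block).hexdigest()

def ics_verify(expected_digest, answer):
    """ICS side: the check succeeds only if the CSS reproduced the
    digest recorded in the information table at submission time."""
    return answer == expected_digest

# Toy setup: 8 stored chunks and one challenged data block.
chunks = [bytes([i]) * 32 for i in range(8)]
addresses = [0, 3, 5, 6]
expected = hashlib.sha256(b"".join(chunks[a] for a in addresses)).hexdigest()
```

Because any corruption of a chunk referenced by the challenge changes the reassembled block, the returned digest no longer matches and the fault is detected without the ICS ever reading the file contents.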
Figure 16 compares the average time spent by each CSS answering the challenges requested by the ICS.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig16_Pinheiro_Sensors2018_18-3.png\" class=\"image wiki-link\" data-key=\"79114db35f7313ceced6f2f145f5abfd\"><img alt=\"Fig16 Pinheiro Sensors2018 18-3.png\" src=\"https:\/\/www.limswiki.org\/images\/8\/86\/Fig16_Pinheiro_Sensors2018_18-3.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 16<\/b> Average time spent by the cloud storage service (CSS) to answer ICS challenges, by CSS and file size<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<h3><span class=\"mw-headline\" id=\"Simulation_of_file_storage_faults\">Simulation of file storage faults<\/span><\/h3>\n<p>The simulation of integrity faults in files stored in the CSS was performed by means of the controlled modification of bytes within some stored files. The purpose of this simulation was to determine how much time the ICS requires to identify a fault. For this simulation, 10 of the 15 checked 5 GB files were chosen and randomly distributed into two groups of five files each; within each group, the same number of bytes was modified in every file.\n<\/p><p>In File Group 1, 5368 sequential bytes (0.0001%) were changed in each file, and the position of this modification inside each file was randomly chosen. 
In Group 2, 5,368,709 sequential bytes (1%) were changed at the end of each file.\n<\/p><p>Figure 17 shows the results regarding the perception of these faults by the ICS.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig17_Pinheiro_Sensors2018_18-3.png\" class=\"image wiki-link\" data-key=\"d48185fce4124022c6a7de42d8d644aa\"><img alt=\"Fig17 Pinheiro Sensors2018 18-3.png\" src=\"https:\/\/www.limswiki.org\/images\/f\/f3\/Fig17_Pinheiro_Sensors2018_18-3.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Fig. 17<\/b> Detection by the ICS of integrity faults in files stored in a CSS<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<h2><span class=\"mw-headline\" id=\"Discussion\">Discussion<\/span><\/h2>\n<p>The related research works described in \"Related works\" present viable alternatives for verifying the integrity of files stored in the cloud, but none of them presents a single, complete solution that allows permanent and active monitoring, execution by an independent third party, file integrity verification without access to file content, low and predictable computational cost, and balanced computational resource consumption according to the measured CSS QoS. 
The main differences between the protocol proposed in this work and those of the cited related works are discussed as follows.\n<\/p><p>With the proposals by Juels and Kaliski, Jr.<sup id=\"rdp-ebb-cite_ref-JuelsPors07_25-2\" class=\"reference\"><a href=\"#cite_note-JuelsPors07-25\">[25]<\/a><\/sup>, and Kumar and Saxena<sup id=\"rdp-ebb-cite_ref-KumarData11_26-1\" class=\"reference\"><a href=\"#cite_note-KumarData11-26\">[26]<\/a><\/sup>, small changes in files cannot be detected because only a few specific bits in each file are tested, while in our TOPMCloud, all bits are tested. Our solution thus benefits from the more thorough integrity check proposed and tested according to the presented results.\n<\/p><p>The solution proposed by George and Sabitha<sup id=\"rdp-ebb-cite_ref-GeorgeData13_27-1\" class=\"reference\"><a href=\"#cite_note-GeorgeData13-27\">[27]<\/a><\/sup> requires a trusted third party to save pieces of information about the files. The trusted third party is necessary because it is not possible to restore the original file content without the saved information. By contrast, in TOPMCloud, the third party can be untrusted because it verifies the file integrity without having direct access to the encrypted file or any information related to its original content.\n<\/p><p>With Kavuri <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-KavuriData14_28-1\" class=\"reference\"><a href=\"#cite_note-KavuriData14-28\">[28]<\/a><\/sup> and Al-Jaberi and Zainal<sup id=\"rdp-ebb-cite_ref-Al-JaberiData14_29-1\" class=\"reference\"><a href=\"#cite_note-Al-JaberiData14-29\">[29]<\/a><\/sup>, it is necessary to retrieve the whole ciphered file to verify its integrity, while in TOPMCloud, monitoring is performed constantly without retrieving any bit of the stored file from the CSS. 
Our solution thus reduces network bandwidth consumption while still guaranteeing the integrity of the files throughout their storage time, according to the client\u2019s needs.\n<\/p><p>With the proposals of Kai <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-KaiAnEffic13_30-1\" class=\"reference\"><a href=\"#cite_note-KaiAnEffic13-30\">[30]<\/a><\/sup> and Wang <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-WangEnabling09_31-1\" class=\"reference\"><a href=\"#cite_note-WangEnabling09-31\">[31]<\/a><\/sup>, the monitoring solutions are based on asymmetric homomorphic algorithms, a type of cryptography scheme that consumes large amounts of computational resources. In TOPMCloud, hashes are computed faster and consume fewer resources; our solution thus benefits from speed in processing large files while maintaining both integrity and confidentiality.\n<\/p><p>Another important consideration is that none of the related works was designed to monitor large files of 5, 10, or 15 GB, or more. Consequently, none of the reviewed publications presents test results analogous to those from the TOPMCloud validation process. For this reason, it was not possible to perform a qualitative analysis of the results obtained with the tests applied to TOPMCloud in comparison with the mentioned related works.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Conclusion_and_future_work\">Conclusion and future work<\/span><\/h2>\n<p>In this paper, a distributed computational architecture was proposed for monitoring the integrity of stored files in a Cloud Storage Service (CSS) without compromising the confidentiality of these files\u2019 contents. 
The proposed architecture and its supporting protocol leverage the notion of trust so that a CSS consumer uses a third-party file-checking service, the Integrity Check Service (ICS), to continuously challenge the CSS provider regarding the integrity of stored files and, based on these verifications, to present a level of trust attributed to this CSS provider.\n<\/p><p>Based on the behavior of each CSS, the file-checking frequency adapts dynamically, either increasing if stored file integrity failures are observed or decreasing if a low failure rate is observed. Consequently, the verification effort is more intensive when it is effectively needed, thus optimizing the computational and network resource consumption of the proposed protocol\u2019s execution.\n<\/p><p>The proposed protocol was also designed to address requirements such as low bandwidth consumption, the capacity to quickly identify misbehaving storage services, strong resistance against fraud, reduced CSS overhead, confidentiality of stored file contents, and the capacity to provide predictability and maximum resource savings for the ICS.\n<\/p><p>Our proposal was also designed to provide efficient control over the integrity of files stored in a CSS without overloading service providers that behave appropriately, while acting quickly if this behavior becomes problematic; this requires our architecture to identify faults and provide early alerts about corrupted files to their owners. 
These are the main reasons our design performs file integrity monitoring by hashing parts of the stored file, without the ICS needing direct access to file contents and without the CSS having to process the complete file on each check.\n<\/p><p>The design choice of hashing and verifying data blocks, which are assembled from randomly chosen file chunks, was demonstrated in our experimental validation to be an effective method for detecting the simulated file integrity faults that were injected into the CSS under test. Even small modifications in very large files were identified in an average of 14 days.\n<\/p><p>Furthermore, based on the test results in our experimental setup, it was possible to verify that the time taken to generate a File Information Table, as well as the size of this table, can be considered adequate, being proportional to the file size. The network bandwidth consumption for the monitoring was very low regardless of the trust level assigned to a CSS.\n<\/p><p>Another feature of the proposed architecture is that, if necessary, it can relieve the CSS consumer of the need to store copies of files that are stored in the cloud, since the consumer is ensured the capacity to retrieve the files from multiple CSS providers, which constitutes a redundant cloud storage configuration. In this case, the redundancy level must be appropriately chosen by the storage client according to the criticality of the file information and using the measurements made by the ICS regarding the trust in each CSS used. 
The classification of CSS into trust levels, according to their availability and their stored file integrity history, besides relieving the computational load on a well-behaved CSS, also allows clients to select the most suitable CSS to be contracted, according to the criticality level of the information in the files that will be stored in the cloud.\n<\/p><p>The proposed architecture proved to be quite robust during the tests, responding satisfactorily to the fault simulations. Notably, the developed prototype also resisted failures not foreseen in the protocol, such as unplanned server shutdowns due to electricity outages, with functionality returning after system restart without any human intervention.\n<\/p><p>As future work, we intend to add functionality that allows the measured CSS trust level to be shared between different ICS. This would allow, for example, a fault identified in a CSS by one ICS to alert other ICS so that they can react proactively, prioritizing the checking of files stored in that CSS.\n<\/p><p>Aiming for better performance with files larger than 30 GB, we intend to test modifications of our protocol parameters, such as increasing the number of chunks per file and\/or the number of chunks per data block. In the same sense, we also intend to improve the architecture implementation in order to configure different processing parallelism schemes.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Acknowledgements\">Acknowledgements<\/span><\/h2>\n<p>This research work was supported by Sungshin W. University. In addition, A.P. thanks the Brazilian Army Science and Technology Department. E.D.C. thanks the Ministry of Planning, Development and Management (Grant SEST - 011\/2016). R.T.d.S.J. 
thanks the Brazilian research and innovation Agencies CAPES - Coordination for the Improvement of Higher Education Personnel (Grant 23038.007604\/2014-69 FORTE - Tempestive Forensics Project), CNPq - National Council for Scientific and Technological Development (Grant 465741\/2014-2 Science and Technology National Institute - INCT on Cybersecurity), FAPDF - Research Support Foundation of the Federal District (Grant 0193.001366\/2016 - UIoT - Universal Internet of Things), the Ministry of Planning, Development and Management (Grant 005\/2016 DIPLA), and the Institutional Security Office of the Presidency of the Republic of Brazil (Grant 002\/2017). R.d.O.A. thanks the Brazilian research and innovation Agencies CAPES - Coordination for the Improvement of Higher Education Personnel (Grant 23038.007604\/2014-69 FORTE - Tempestive Forensics Project), CNPq - National Council for Scientific and Technological Development (Grant 465741\/2014-2 Science and Technology National Institute - INCT on Cybersecurity), FAPDF - Research Support Foundation of the Federal District (Grant 0193.001365\/2016 - SSDDC - Secure Software Defined Data Center), and the Institutional Security Office of the Presidency of the Republic of Brazil (Grant 002\/2017).\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Author_contributions\">Author contributions<\/span><\/h3>\n<p>A.P., E.D.C. and R.T.d.S.J conceived the security architecture and the proposed protocol for trust verifications regarding the integrity of files in cloud services. A.P. developed the corresponding prototype for validation purposes. R.d.O.A., L.J.G.V and T.H.K. conceived the experiments and specified data collection requirements for the validation of results. 
All authors contributed equally to performing the experiments, analyzing resulting data and writing the paper.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Conflicts_of_interest\">Conflicts of interest<\/span><\/h3>\n<p>The authors declare no conflict of interest.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"References\">References<\/span><\/h2>\n<div class=\"reflist references-column-width\" style=\"-moz-column-width: 30em; -webkit-column-width: 30em; column-width: 30em; list-style-type: decimal;\">\n<ol class=\"references\">\n<li id=\"cite_note-TandelAnImplem13-1\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-TandelAnImplem13_1-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Tandel, S.T.; Shah, V.K.; Hiranwal, S. (2013). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/web.archive.org\/web\/20150118081656\/http:\/\/ijsacs.org\/previous.html\" data-key=\"20cb93ab5eacfef99a1720c4f16cddc9\">\"An implementation of effective XML based dynamic data integrity audit service in cloud\"<\/a>. <i>International Journal of Societal Applications of Computer Science<\/i> <b>2<\/b> (8): 449\u2013553<span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/web.archive.org\/web\/20150118081656\/http:\/\/ijsacs.org\/previous.html\" data-key=\"20cb93ab5eacfef99a1720c4f16cddc9\">https:\/\/web.archive.org\/web\/20150118081656\/http:\/\/ijsacs.org\/previous.html<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=An+implementation+of+effective+XML+based+dynamic+data+integrity+audit+service+in+cloud&rft.jtitle=International+Journal+of+Societal+Applications+of+Computer+Science&rft.aulast=Tandel%2C+S.T.%3B+Shah%2C+V.K.%3B+Hiranwal%2C+S.&rft.au=Tandel%2C+S.T.%3B+Shah%2C+V.K.%3B+Hiranwal%2C+S.&rft.date=2013&rft.volume=2&rft.issue=8&rft.pages=449%E2%80%93553&rft_id=https%3A%2F%2Fweb.archive.org%2Fweb%2F20150118081656%2Fhttp%3A%2F%2Fijsacs.org%2Fprevious.html&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DabasARecap14-2\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-DabasARecap14_2-0\">2.0<\/a><\/sup> <sup><a href=\"#cite_ref-DabasARecap14_2-1\">2.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Dabas, P.; Wadhwa, D. (2014). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/ijcat.com\/archieve\/volume3\/issue6\/ijcatr03061002\" data-key=\"ab6243b7f66877124457cbdf22f8d746\">\"A Recapitulation of Data Auditing Approaches for Cloud Data\"<\/a>. <i>International Journal of Computer Applications Technology and Research<\/i> <b>3<\/b> (6): 329\u201332. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.7753%2FIJCATR0306.1002\" data-key=\"0f4526a352b0f001fcce84827febf25f\">10.7753\/IJCATR0306.1002<\/a><span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/ijcat.com\/archieve\/volume3\/issue6\/ijcatr03061002\" data-key=\"ab6243b7f66877124457cbdf22f8d746\">https:\/\/ijcat.com\/archieve\/volume3\/issue6\/ijcatr03061002<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+Recapitulation+of+Data+Auditing+Approaches+for+Cloud+Data&rft.jtitle=International+Journal+of+Computer+Applications+Technology+and+Research&rft.aulast=Dabas%2C+P.%3B+Wadhwa%2C+D.&rft.au=Dabas%2C+P.%3B+Wadhwa%2C+D.&rft.date=2014&rft.volume=3&rft.issue=6&rft.pages=329%E2%80%9332&rft_id=info:doi\/10.7753%2FIJCATR0306.1002&rft_id=https%3A%2F%2Fijcat.com%2Farchieve%2Fvolume3%2Fissue6%2Fijcatr03061002&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MellTheNIST11-3\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-MellTheNIST11_3-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Mell, P.; Grance, T. (September 2011). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/csrc.nist.gov\/publications\/detail\/sp\/800-145\/final\" data-key=\"ae8ce90ee7b772c45c8b110f207a5144\">\"The NIST Definition of Cloud Computing\"<\/a>. <i>Computer Security Resource Center<\/i><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/csrc.nist.gov\/publications\/detail\/sp\/800-145\/final\" data-key=\"ae8ce90ee7b772c45c8b110f207a5144\">https:\/\/csrc.nist.gov\/publications\/detail\/sp\/800-145\/final<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=The+NIST+Definition+of+Cloud+Computing&rft.atitle=Computer+Security+Resource+Center&rft.aulast=Mell%2C+P.%3B+Grance%2C+T.&rft.au=Mell%2C+P.%3B+Grance%2C+T.&rft.date=September+2011&rft_id=https%3A%2F%2Fcsrc.nist.gov%2Fpublications%2Fdetail%2Fsp%2F800-145%2Ffinal&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MillerCloud08-4\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-MillerCloud08_4-0\">4.0<\/a><\/sup> <sup><a href=\"#cite_ref-MillerCloud08_4-1\">4.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation book\">Miller, M. (2008). <i>Cloud Computing: Web-Based Applications That Change the Way You Work and Collaborate Online<\/i>. Que Publishing. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" data-key=\"f64947ba21e884434bd70e8d9e60bae6\">ISBN<\/a> 9780789738035.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Cloud+Computing%3A+Web-Based+Applications+That+Change+the+Way+You+Work+and+Collaborate+Online&rft.aulast=Miller%2C+M.&rft.au=Miller%2C+M.&rft.date=2008&rft.pub=Que+Publishing&rft.isbn=9780789738035&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-VelteCloud09-5\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-VelteCloud09_5-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Velte, T.; Velte, A.; Elsenpeter, R.C. (2009). <i>Cloud Computing: A Practical Approach<\/i>. McGraw-Hill Education. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" data-key=\"f64947ba21e884434bd70e8d9e60bae6\">ISBN<\/a> 9780071626941.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Cloud+Computing%3A+A+Practical+Approach&rft.aulast=Velte%2C+T.%3B+Velte%2C+A.%3B+Elsenpeter%2C+R.C.&rft.au=Velte%2C+T.%3B+Velte%2C+A.%3B+Elsenpeter%2C+R.C.&rft.date=2009&rft.pub=McGraw-Hill+Education&rft.isbn=9780071626941&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ZhouServices10-6\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ZhouServices10_6-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Zhou, M.; Zhang, R.; Zeng, D.; Qian, W. (2010). \"Services in the Cloud Computing era: A survey\". <i>Proceedings from the 4th International Universal Communication Symposium<\/i>: 40\u201346. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FIUCS.2010.5666772\" data-key=\"3c8f6fba08ed7db2f3e683f3fe0ebe8a\">10.1109\/IUCS.2010.5666772<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Services+in+the+Cloud+Computing+era%3A+A+survey&rft.jtitle=Proceedings+from+the+4th+International+Universal+Communication+Symposium&rft.aulast=Zhou%2C+M.%3B+Zhang%2C+R.%3B+Zeng%2C+D.%3B+Qian%2C+W.&rft.au=Zhou%2C+M.%3B+Zhang%2C+R.%3B+Zeng%2C+D.%3B+Qian%2C+W.&rft.date=2010&rft.pages=40%E2%80%9346&rft_id=info:doi\/10.1109%2FIUCS.2010.5666772&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-JingABrief10-7\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-JingABrief10_7-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Jing, X.; Jian-Jun, Z. (2010). \"A Brief Survey on the Security Model of Cloud Computing\". <i>Proceedings from the Ninth International Symposium on Distributed Computing and Applications to Business, Engineering and Science<\/i>: 475\u20138. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FDCABES.2010.103\" data-key=\"03320831552afa8902a61c81874b806a\">10.1109\/DCABES.2010.103<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+Brief+Survey+on+the+Security+Model+of+Cloud+Computing&rft.jtitle=Proceedings+from+the+Ninth+International+Symposium+on+Distributed+Computing+and+Applications+to+Business%2C+Engineering+and+Science&rft.aulast=Jing%2C+X.%3B+Jian-Jun%2C+Z.&rft.au=Jing%2C+X.%3B+Jian-Jun%2C+Z.&rft.date=2010&rft.pages=475%E2%80%938&rft_id=info:doi\/10.1109%2FDCABES.2010.103&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MellTheNIST09-8\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-MellTheNIST09_8-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Mell, P.; Grance, T. (October 2009). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.nist.gov\/sites\/default\/files\/documents\/itl\/cloud\/cloud-def-v15.pdf\" data-key=\"468e6827937e213cac2d42a2cc805d8b\">\"The NIST Definition of Cloud Computing\"<\/a> (PDF)<span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.nist.gov\/sites\/default\/files\/documents\/itl\/cloud\/cloud-def-v15.pdf\" data-key=\"468e6827937e213cac2d42a2cc805d8b\">https:\/\/www.nist.gov\/sites\/default\/files\/documents\/itl\/cloud\/cloud-def-v15.pdf<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=The+NIST+Definition+of+Cloud+Computing&rft.atitle=&rft.aulast=Mell%2C+P.%3B+Grance%2C+T.&rft.au=Mell%2C+P.%3B+Grance%2C+T.&rft.date=October+2009&rft_id=https%3A%2F%2Fwww.nist.gov%2Fsites%2Fdefault%2Ffiles%2Fdocuments%2Fitl%2Fcloud%2Fcloud-def-v15.pdf&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MarshFormal94-9\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-MarshFormal94_9-0\">9.0<\/a><\/sup> <sup><a href=\"#cite_ref-MarshFormal94_9-1\">9.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation web\">Marsh, S.P. (April 1994). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/stephenmarsh.wdfiles.com\/local--files\/start\/TrustThesis.pdf\" data-key=\"3cc5ae170f6dde6017c28d5252198952\">\"Formalising Trust as a Computational Concept\"<\/a> (PDF). University of Stirling<span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/stephenmarsh.wdfiles.com\/local--files\/start\/TrustThesis.pdf\" data-key=\"3cc5ae170f6dde6017c28d5252198952\">http:\/\/stephenmarsh.wdfiles.com\/local--files\/start\/TrustThesis.pdf<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Formalising+Trust+as+a+Computational+Concept&rft.atitle=&rft.aulast=Marsh%2C+S.P.&rft.au=Marsh%2C+S.P.&rft.date=April+1994&rft.pub=University+of+Stirling&rft_id=http%3A%2F%2Fstephenmarsh.wdfiles.com%2Flocal--files%2Fstart%2FTrustThesis.pdf&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-GambettaTrust08-10\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-GambettaTrust08_10-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Gambetta, D. (1990). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.nuffield.ox.ac.uk\/media\/1779\/gambetta-trust_making-and-breaking-cooperative-relations.pdf\" data-key=\"763cb0e6fb9b775cb7d658226bd3b00c\">\"Can We Trust Trust?\"<\/a>. In Gambetta, D. (PDF). <i>Trust: Making and Breaking Cooperative Relations (2008 Scanned Digital Copy)<\/i>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" data-key=\"f64947ba21e884434bd70e8d9e60bae6\">ISBN<\/a> 0631155066<span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.nuffield.ox.ac.uk\/media\/1779\/gambetta-trust_making-and-breaking-cooperative-relations.pdf\" data-key=\"763cb0e6fb9b775cb7d658226bd3b00c\">https:\/\/www.nuffield.ox.ac.uk\/media\/1779\/gambetta-trust_making-and-breaking-cooperative-relations.pdf<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Can+We+Trust+Trust%3F&rft.atitle=Trust%3A+Making+and+Breaking+Cooperative+Relations+%282008+Scanned+Digital+Copy%29&rft.aulast=Gambetta%2C+D.&rft.au=Gambetta%2C+D.&rft.date=1990&rft.isbn=0631155066&rft_id=https%3A%2F%2Fwww.nuffield.ox.ac.uk%2Fmedia%2F1779%2Fgambetta-trust_making-and-breaking-cooperative-relations.pdf&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-J.C3.B8sangAMetric11-11\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-J.C3.B8sangAMetric11_11-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">J\u00f8sang, A.; Knapskog, S.J. (1998). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/csrc.nist.gov\/csrc\/media\/publications\/conference-paper\/1998\/10\/08\/proceedings-of-the-21st-nissc-1998\/documents\/papera2.pdf\" data-key=\"5ecba1df1904afa05b5a2334e38010be\">\"A Metric for Trusted Systems\"<\/a> (PDF). <i>Proceedings from the 21st National Information Systems Security Conference<\/i><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/csrc.nist.gov\/csrc\/media\/publications\/conference-paper\/1998\/10\/08\/proceedings-of-the-21st-nissc-1998\/documents\/papera2.pdf\" data-key=\"5ecba1df1904afa05b5a2334e38010be\">https:\/\/csrc.nist.gov\/csrc\/media\/publications\/conference-paper\/1998\/10\/08\/proceedings-of-the-21st-nissc-1998\/documents\/papera2.pdf<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+Metric+for+Trusted+Systems&rft.jtitle=Proceedings+from+the+21st+National+Information+Systems+Security+Conference&rft.aulast=J%C3%B8sang%2C+A.%3B+Knapskog%2C+S.J.&rft.au=J%C3%B8sang%2C+A.%3B+Knapskog%2C+S.J.&rft.date=2011&rft_id=https%3A%2F%2Fcsrc.nist.gov%2Fcsrc%2Fmedia%2Fpublications%2Fconference-paper%2F1998%2F10%2F08%2Fproceedings-of-the-21st-nissc-1998%2Fdocuments%2Fpapera2.pdf&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-VictorTrust11-12\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-VictorTrust11_12-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Victor, P.; De Cock, M.; Cornelis, C. (2011). \"Trust and Recommendations\". In Ricci, F.; Rokach, L.; Shapira, B.; Kantor, P.B.. <i>Recommender Systems Handbook<\/i>. Springer. pp. 645\u201375. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" data-key=\"f64947ba21e884434bd70e8d9e60bae6\">ISBN<\/a> 9780387858197.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Trust+and+Recommendations&rft.atitle=Recommender+Systems+Handbook&rft.aulast=Victor%2C+P.%3B+De+Cock%2C+M.%3B+Cornelis%2C+C.&rft.au=Victor%2C+P.%3B+De+Cock%2C+M.%3B+Cornelis%2C+C.&rft.date=2011&rft.pages=pp.%26nbsp%3B645%E2%80%9375&rft.pub=Springer&rft.isbn=9780387858197&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-AdnaneTrust13-13\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-AdnaneTrust13_13-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Adnane, A.; Bidan, C.; de Sousa J\u00fanior, R.T. (2013). \"Trust-based security for the OLSR routing protocol\". <i>Computer Communications<\/i> <b>36<\/b> (10\u201311): 1159-71. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1016%2Fj.comcom.2013.04.003\" data-key=\"85c9da820316f3c0402f58847bfb2884\">10.1016\/j.comcom.2013.04.003<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Trust-based+security+for+the+OLSR+routing+protocol&rft.jtitle=Computer+Communications&rft.aulast=Adnane%2C+A.%3B+Bidan%2C+C.%3B+de+Sousa+J%C3%BAnior%2C+R.T.&rft.au=Adnane%2C+A.%3B+Bidan%2C+C.%3B+de+Sousa+J%C3%BAnior%2C+R.T.&rft.date=2013&rft.volume=36&rft.issue=10%E2%80%9311&rft.pages=1159-71&rft_id=info:doi\/10.1016%2Fj.comcom.2013.04.003&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DeSousaTrust10-14\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-DeSousaTrust10_14-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">De Sousa Jr., R.T.; Puttini, R.S. (2010). \"Trust Management in Ad Hoc Networks\". In Yan, Z.. <i>Trust Modeling and Management in Digital Environments: From Social Concept to System Development<\/i>. IGI Global. pp. 224\u201349. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" data-key=\"f64947ba21e884434bd70e8d9e60bae6\">ISBN<\/a> 9781615206827.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Trust+Management+in+Ad+Hoc+Networks&rft.atitle=Trust+Modeling+and+Management+in+Digital+Environments%3A+From+Social+Concept+to+System+Development&rft.aulast=De+Sousa+Jr.%2C+R.T.%3B+Puttini%2C+R.S.&rft.au=De+Sousa+Jr.%2C+R.T.%3B+Puttini%2C+R.S.&rft.date=2010&rft.pages=pp.%26nbsp%3B224%E2%80%9349&rft.pub=IGI+Global&rft.isbn=9781615206827&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-YahalomTrust93-15\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-YahalomTrust93_15-0\">15.0<\/a><\/sup> <sup><a href=\"#cite_ref-YahalomTrust93_15-1\">15.1<\/a><\/sup> <sup><a href=\"#cite_ref-YahalomTrust93_15-2\">15.2<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Yahalom, R.; Klein, B.; Beth, T. (1993). \"Trust relationships in secure systems-a distributed authentication perspective\". <i>Proceedings from the 1993 IEEE Computer Society Symposium on Research in Security and Privacy<\/i>: 150\u201364. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FRISP.1993.287635\" data-key=\"67aacca3f3649d34c7585d8597f14cfb\">10.1109\/RISP.1993.287635<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Trust+relationships+in+secure+systems-a+distributed+authentication+perspective&rft.jtitle=Proceedings+from+the+1993+IEEE+Computer+Society+Symposium+on+Research+in+Security+and+Privacy&rft.aulast=Yahalom%2C+R.%3B+Klein%2C+B.%3B+Beth%2C+T.&rft.au=Yahalom%2C+R.%3B+Klein%2C+B.%3B+Beth%2C+T.&rft.date=1993&rft.pages=150%E2%80%9364&rft_id=info:doi\/10.1109%2FRISP.1993.287635&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-GrandisonASurv00-16\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-GrandisonASurv00_16-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Grandison, T.; Sloman, M. (2000). \"A survey of trust in internet applications\". <i>IEEE Communications Surveys & Tutorials<\/i> <b>3<\/b> (4): 2\u201316. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FCOMST.2000.5340804\" data-key=\"14947ab2bae5c5b5956051c0a12afa56\">10.1109\/COMST.2000.5340804<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+survey+of+trust+in+internet+applications&rft.jtitle=IEEE+Communications+Surveys+%26+Tutorials&rft.aulast=Grandison%2C+T.%3B+Sloman%2C+M.&rft.au=Grandison%2C+T.%3B+Sloman%2C+M.&rft.date=2000&rft.volume=3&rft.issue=4&rft.pages=2%E2%80%9316&rft_id=info:doi\/10.1109%2FCOMST.2000.5340804&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BellarePublic00-17\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-BellarePublic00_17-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Bellare, M.; Boldyreva, A.; Micali, S. (2000). \"Public-Key Encryption in a Multi-user Setting: Security Proofs and Improvements\". <i>Proceedings from Advances in Cryptology \u2014 EUROCRYPT 2000<\/i>: 259\u201374. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FCOMST.2000.5340804\" data-key=\"14947ab2bae5c5b5956051c0a12afa56\">10.1109\/COMST.2000.5340804<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Public-Key+Encryption+in+a+Multi-user+Setting%3A+Security+Proofs+and+Improvements&rft.jtitle=Proceedings+from+Advances+in+Cryptology+%E2%80%94+EUROCRYPT+2000&rft.aulast=Bellare%2C+M.%3B+Boldyreva%2C+A.%3B+Micali%2C+S.&rft.au=Bellare%2C+M.%3B+Boldyreva%2C+A.%3B+Micali%2C+S.&rft.date=2000&rft.pages=259%E2%80%9374&rft_id=info:doi\/10.1109%2FCOMST.2000.5340804&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BoseInfo08-18\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-BoseInfo08_18-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Bose, R. (2008). <i>Information Theory, Coding and Cryptography<\/i> (2nd ed.). Mcgraw Hill Education. pp. 297\u20138. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" data-key=\"f64947ba21e884434bd70e8d9e60bae6\">ISBN<\/a> 9780070669017.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Information+Theory%2C+Coding+and+Cryptography&rft.aulast=Bose%2C+R.&rft.au=Bose%2C+R.&rft.date=2008&rft.pages=pp.%26nbsp%3B297%E2%80%938&rft.edition=2nd&rft.pub=Mcgraw+Hill+Education&rft.isbn=9780070669017&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-RivestTheMD5_92-19\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-RivestTheMD5_92_19-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Rivest, R. (April 1992). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/tools.ietf.org\/html\/rfc1321\" data-key=\"8da7f011547b1dca8a318d423b918f69\">\"The MD5 Message-Digest Algorithm\"<\/a>. <i>ietf.org<\/i><span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/tools.ietf.org\/html\/rfc1321\" data-key=\"8da7f011547b1dca8a318d423b918f69\">https:\/\/tools.ietf.org\/html\/rfc1321<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 25 June 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=The+MD5+Message-Digest+Algorithm&rft.atitle=ietf.org&rft.aulast=Rivest%2C+R.&rft.au=Rivest%2C+R.&rft.date=April+1992&rft_id=https%3A%2F%2Ftools.ietf.org%2Fhtml%2Frfc1321&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DworkinSHA-3_15-20\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-DworkinSHA-3_15_20-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Dworkin, M.J. (04 August 2015). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.nist.gov\/publications\/sha-3-standard-permutation-based-hash-and-extendable-output-functions\" data-key=\"9063f642c90ef56ee74f764db982de98\">\"SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions\"<\/a>. NIST<span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.nist.gov\/publications\/sha-3-standard-permutation-based-hash-and-extendable-output-functions\" data-key=\"9063f642c90ef56ee74f764db982de98\">https:\/\/www.nist.gov\/publications\/sha-3-standard-permutation-based-hash-and-extendable-output-functions<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=SHA-3+Standard%3A+Permutation-Based+Hash+and+Extendable-Output+Functions&rft.atitle=&rft.aulast=Dworkin%2C+M.J.&rft.au=Dworkin%2C+M.J.&rft.date=04+August+2015&rft.pub=NIST&rft_id=https%3A%2F%2Fwww.nist.gov%2Fpublications%2Fsha-3-standard-permutation-based-hash-and-extendable-output-functions&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-AumassonBLAKE2_13-21\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-AumassonBLAKE2_13_21-0\">21.0<\/a><\/sup> <sup><a href=\"#cite_ref-AumassonBLAKE2_13_21-1\">21.1<\/a><\/sup> <sup><a href=\"#cite_ref-AumassonBLAKE2_13_21-2\">21.2<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Aumasson, J.-P.; Neves, S.; Wilcox-O'Hearn, Z.; Winnerlein, C. (2013). \"BLAKE2: Simpler, Smaller, Fast as MD5\". <i>Proceedings from the 2013 International Conference on Applied Cryptography and Network Security<\/i>: 119\u201335. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1007%2F978-3-642-38980-1_8\" data-key=\"801920c337b01ae105eb389e24c4ea4f\">10.1007\/978-3-642-38980-1_8<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=BLAKE2%3A+Simpler%2C+Smaller%2C+Fast+as+MD5&rft.jtitle=Proceedings+from+the+2013+International+Conference+on+Applied+Cryptography+and+Network+Security&rft.aulast=Aumasson%2C+J.-P.%3B+Neves%2C+S.%3B+W.-O.%3B+Winnerlein%2C+C.&rft.au=Aumasson%2C+J.-P.%3B+Neves%2C+S.%3B+W.-O.%3B+Winnerlein%2C+C.&rft.date=2013&rft.pages=119%E2%80%9335&rft_id=info:doi\/10.1007%2F978-3-642-38980-1_8&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-TahtaGenTrust15-22\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-TahtaGenTrust15_22-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Tahta, U.E.; Sen, S.; Can, A.B. (2015). \"GenTrust: A genetic trust management model for peer-to-peer systems\". <i>Applied Soft Computing<\/i> <b>34<\/b>: 693\u2013704. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1016%2Fj.asoc.2015.04.053\" data-key=\"789e817763d6a33f868620bee2f56805\">10.1016\/j.asoc.2015.04.053<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=GenTrust%3A+A+genetic+trust+management+model+for+peer-to-peer+systems&rft.jtitle=Applied+Soft+Computing&rft.aulast=Tahta%2C+U.E.%3B+Sen%2C+S.%3B+Can%2C+A.B.&rft.au=Tahta%2C+U.E.%3B+Sen%2C+S.%3B+Can%2C+A.B.&rft.date=2015&rft.volume=34&rft.pages=693%E2%80%93704&rft_id=info:doi\/10.1016%2Fj.asoc.2015.04.053&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-GholamiATrust15-23\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-GholamiATrust15_23-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Gholami, A.; Arani, M.G. (2015). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/pdfs.semanticscholar.org\/487e\/11b3605276b5ff66de363d4e735bcdd740c3.pdf?_ga=2.219348827.37751313.1553622532-1472248397.1551840079\" data-key=\"bde683a418c4077be829aa520932bf18\">\"A Trust Model Based on Quality of Service in Cloud Computing Environment\"<\/a>. <i>International Journal of Database Theory and Application<\/i> <b>8<\/b> (5): 161\u201370<span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/pdfs.semanticscholar.org\/487e\/11b3605276b5ff66de363d4e735bcdd740c3.pdf?_ga=2.219348827.37751313.1553622532-1472248397.1551840079\" data-key=\"bde683a418c4077be829aa520932bf18\">https:\/\/pdfs.semanticscholar.org\/487e\/11b3605276b5ff66de363d4e735bcdd740c3.pdf?_ga=2.219348827.37751313.1553622532-1472248397.1551840079<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+Trust+Model+Based+on+Quality+of+Service+in+Cloud+Computing+Environment&rft.jtitle=International+Journal+of+Database+Theory+and+Application&rft.aulast=Gholami%2C+A.%3B+Arani%2C+M.G.&rft.au=Gholami%2C+A.%3B+Arani%2C+M.G.&rft.date=2015&rft.volume=8&rft.issue=5&rft.pages=161%E2%80%9370&rft_id=https%3A%2F%2Fpdfs.semanticscholar.org%2F487e%2F11b3605276b5ff66de363d4e735bcdd740c3.pdf%3F_ga%3D2.219348827.37751313.1553622532-1472248397.1551840079&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-CanedoModelo13-24\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-CanedoModelo13_24-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Canedo, E.D. (30 January 2013). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/repositorio.unb.br\/handle\/10482\/11987\" data-key=\"434ea1b36113e446aabc4a485682a08d\">\"Modelo de confian\u00e7a para a troca de arquivos em uma nuvem privada - Tese (Doutorado em Engenharia El\u00e9trica)\"<\/a>. Universidade de Bras\u00edlia<span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"http:\/\/repositorio.unb.br\/handle\/10482\/11987\" data-key=\"434ea1b36113e446aabc4a485682a08d\">http:\/\/repositorio.unb.br\/handle\/10482\/11987<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Modelo+de+confian%C3%A7a+para+a+troca+de+arquivos+em+uma+nuvem+privada+-+Tese+%28Doutorado+em+Engenharia+El%C3%A9trica%29&rft.atitle=&rft.aulast=Canedo%2C+E.D.&rft.au=Canedo%2C+E.D.&rft.date=30+January+2013&rft.pub=Universidade+de+Bras%C3%ADlia&rft_id=http%3A%2F%2Frepositorio.unb.br%2Fhandle%2F10482%2F11987&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-JuelsPors07-25\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-JuelsPors07_25-0\">25.0<\/a><\/sup> <sup><a href=\"#cite_ref-JuelsPors07_25-1\">25.1<\/a><\/sup> <sup><a href=\"#cite_ref-JuelsPors07_25-2\">25.2<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Juels, A.; Kaliski, Jr., B.S. (2007). \"PORs: Proofs of retrievability for large files\". <i>Proceedings of the 14th ACM Conference on Computer and Communications Security<\/i>: 584\u201397. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1145%2F1315245.1315317\" data-key=\"24f33589902208484f31276c4781ffb2\">10.1145\/1315245.1315317<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=PORs%3A+Proofs+of+retrievability+for+large+files&rft.jtitle=Proceedings+of+the+14th+ACM+Conference+on+Computer+and+Communications+Security&rft.aulast=Juels%2C+A.%3B+Kaliski%2C+Jr.%2C+B.S.&rft.au=Juels%2C+A.%3B+Kaliski%2C+Jr.%2C+B.S.&rft.date=2007&rft.pages=584%E2%80%9397&rft_id=info:doi\/10.1145%2F1315245.1315317&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-KumarData11-26\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-KumarData11_26-0\">26.0<\/a><\/sup> <sup><a href=\"#cite_ref-KumarData11_26-1\">26.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Kumar, R.S.; Saxena, A. (2011). \"Data integrity proofs in cloud storage\". <i>Proceedings of the Third International Conference on Communication Systems and Networks<\/i>: 1\u20134. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FCOMSNETS.2011.5716422\" data-key=\"01926aca9147d64831839b0d17509167\">10.1109\/COMSNETS.2011.5716422<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Data+integrity+proofs+in+cloud+storage&rft.jtitle=Proceedings+of+the+Third+International+Conference+on+Communication+Systems+and+Networks&rft.aulast=Kumar%2C+R.S.%3B+Saxena%2C+A.&rft.au=Kumar%2C+R.S.%3B+Saxena%2C+A.&rft.date=2011&rft.pages=1%E2%80%934&rft_id=info:doi\/10.1109%2FCOMSNETS.2011.5716422&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-GeorgeData13-27\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-GeorgeData13_27-0\">27.0<\/a><\/sup> <sup><a href=\"#cite_ref-GeorgeData13_27-1\">27.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">George, R.S.; Sabitha, S. (2013). \"Data anonymization and integrity checking in cloud computing\". <i>Proceedings of the Fourth International Conference on Computing, Communications and Networking Technologies<\/i>: 1\u20135. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FICCCNT.2013.6726813\" data-key=\"b80140909b4c9d58500bdbe1fda5c109\">10.1109\/ICCCNT.2013.6726813<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Data+anonymization+and+integrity+checking+in+cloud+computing&rft.jtitle=Proceedings+of+the+Fourth+International+Conference+on+Computing%2C+Communications+and+Networking+Technologies&rft.aulast=George%2C+R.S.%3B+Sabitha%2C+S.&rft.au=George%2C+R.S.%3B+Sabitha%2C+S.&rft.date=2013&rft.pages=1%E2%80%935&rft_id=info:doi\/10.1109%2FICCCNT.2013.6726813&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-KavuriData14-28\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-KavuriData14_28-0\">28.0<\/a><\/sup> <sup><a href=\"#cite_ref-KavuriData14_28-1\">28.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Kavuri, S.K.S.V.A.; Kancherla, G.R.; Bobba, B.R. (2014). \"Data authentication and integrity verification techniques for trusted\/untrusted cloud servers\". <i>Proceedings of the 2014 International Conference on Advances in Computing, Communications and Informatics<\/i>: 2590-2596. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FICACCI.2014.6968657\" data-key=\"02318bf813fb3bba8569fcf6d6e93f0e\">10.1109\/ICACCI.2014.6968657<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Data+authentication+and+integrity+verification+techniques+for+trusted%2Funtrusted+cloud+servers&rft.jtitle=Proceedings+of+the+2014+International+Conference+on+Advances+in+Computing%2C+Communications+and+Informatics&rft.aulast=Kavuri%2C+S.K.S.V.A.%3B+Kancherla%2C+G.R.%3B+Bobba%2C+B.R.&rft.au=Kavuri%2C+S.K.S.V.A.%3B+Kancherla%2C+G.R.%3B+Bobba%2C+B.R.&rft.date=2014&rft.pages=2590-2596&rft_id=info:doi\/10.1109%2FICACCI.2014.6968657&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-Al-JaberiData14-29\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-Al-JaberiData14_29-0\">29.0<\/a><\/sup> <sup><a href=\"#cite_ref-Al-JaberiData14_29-1\">29.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Al-Jaberi, M.F.; Zainal, A. (2014). \"Data integrity and privacy model in cloud computing\". <i>Proceedings of the 2014 International Symposium on Biometrics and Security Technologies<\/i>: 280-284. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FISBAST.2014.7013135\" data-key=\"881573be47976f4c0f4cd777f23c526c\">10.1109\/ISBAST.2014.7013135<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Data+integrity+and+privacy+model+in+cloud+computing&rft.jtitle=Proceedings+of+the+2014+International+Symposium+on+Biometrics+and+Security+Technologies&rft.aulast=Al-Jaberi%2C+M.F.%3B+Zainal%2C+A.&rft.au=Al-Jaberi%2C+M.F.%3B+Zainal%2C+A.&rft.date=2014&rft.pages=280-284&rft_id=info:doi\/10.1109%2FISBAST.2014.7013135&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-KaiAnEffic13-30\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-KaiAnEffic13_30-0\">30.0<\/a><\/sup> <sup><a href=\"#cite_ref-KaiAnEffic13_30-1\">30.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Kai, H.; Chuanhe, H.; Jinhai, W. et al. (2013). \"An Efficient Public Batch Auditing Protocol for Data Security in Multi-cloud Storage\". <i>Proceedings of the 8th ChinaGrid Annual Conference<\/i>: 51-56. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FChinaGrid.2013.13\" data-key=\"593f922d9279895965fdec31dbf79c04\">10.1109\/ChinaGrid.2013.13<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=An+Efficient+Public+Batch+Auditing+Protocol+for+Data+Security+in+Multi-cloud+Storage&rft.jtitle=Proceedings+of+the+8th+ChinaGrid+Annual+Conference&rft.aulast=Kai%2C+H.%3B+Chuanhe%2C+H.%3B+Jinhai%2C+W.+et+al.&rft.au=Kai%2C+H.%3B+Chuanhe%2C+H.%3B+Jinhai%2C+W.+et+al.&rft.date=2013&rft.pages=51-56&rft_id=info:doi\/10.1109%2FChinaGrid.2013.13&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-WangEnabling09-31\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-WangEnabling09_31-0\">31.0<\/a><\/sup> <sup><a href=\"#cite_ref-WangEnabling09_31-1\">31.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Wang, Q.; Wang, C.; Li, J. et al. (2009). \"Enabling public verifiability and data dynamics for storage security in cloud computing\". <i>Proceedings of the 14th European conference on Research in computer security<\/i>: 355\u201370. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1007%2F978-3-642-04444-1_22\" data-key=\"7f1c4022270a9a30c238e4d47ee45e11\">10.1007\/978-3-642-04444-1_22<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Enabling+public+verifiability+and+data+dynamics+for+storage+security+in+cloud+computing&rft.jtitle=Proceedings+of+the+14th+European+conference+on+Research+in+computer+security&rft.aulast=Wang%2C+Q.%3B+Wang%2C+C.%3B+Li%2C+J.+et+al.&rft.au=Wang%2C+Q.%3B+Wang%2C+C.%3B+Li%2C+J.+et+al.&rft.date=2009&rft.pages=355%E2%80%9370&rft_id=info:doi\/10.1007%2F978-3-642-04444-1_22&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-PflanznerTowards16-32\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-PflanznerTowards16_32-0\">32.0<\/a><\/sup> <sup><a href=\"#cite_ref-PflanznerTowards16_32-1\">32.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation book\">Pflanzner, T.; Tornyai, R.; Kertesz, A. (2016). \"Towards Enabling Clouds for IoT: Interoperable Data Management Approaches by Multi-clouds\". In Mahmood, Z. publisher=Springer. <i>Connectivity Frameworks for Smart Devices<\/i>. pp. 187\u2013207. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1007%2F978-3-319-33124-9_8\" data-key=\"52ec7a184fcad2fe9e99a2631dcda759\">10.1007\/978-3-319-33124-9_8<\/a>. <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" data-key=\"f64947ba21e884434bd70e8d9e60bae6\">ISBN<\/a> 9783319331225.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Towards+Enabling+Clouds+for+IoT%3A+Interoperable+Data+Management+Approaches+by+Multi-clouds&rft.atitle=Connectivity+Frameworks+for+Smart+Devices&rft.aulast=Pflanzner%2C+T.%3B+Tornyai%2C+R.%3B+Kertesz%2C+A.&rft.au=Pflanzner%2C+T.%3B+Tornyai%2C+R.%3B+Kertesz%2C+A.&rft.date=2016&rft.pages=pp.%26nbsp%3B187%E2%80%93207&rft_id=info:doi\/10.1007%2F978-3-319-33124-9_8&rft.isbn=9783319331225&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-Gracia-TinedoActively13-33\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-Gracia-TinedoActively13_33-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Gracia-Tinedo, R.; Artigas, M.S.; Moreno-Martinez, A. et al. (2013). \"Actively Measuring Personal Cloud Storage\". <i>Proceedings of the 2013 IEEE Sixth International Conference on Cloud Computing<\/i>: 301-308. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1109%2FCLOUD.2013.25\" data-key=\"aa4b0284e2ad8adf3cecb68177934033\">10.1109\/CLOUD.2013.25<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Actively+Measuring+Personal+Cloud+Storage&rft.jtitle=Proceedings+of+the+2013+IEEE+Sixth+International+Conference+on+Cloud+Computing&rft.aulast=Gracia-Tinedo%2C+R.%3B+Artigas%2C+M.S.%3B+Moreno-Martinez%2C+A.+et+al.&rft.au=Gracia-Tinedo%2C+R.%3B+Artigas%2C+M.S.%3B+Moreno-Martinez%2C+A.+et+al.&rft.date=2013&rft.pages=301-308&rft_id=info:doi\/10.1109%2FCLOUD.2013.25&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-PinheiroAProposed16-34\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-PinheiroAProposed16_34-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Pinheiro, A.; Canedo, E.D.; De Sousa, Jr.; R.T. et al. (2016). \"A Proposed Protocol for Periodic Monitoring of Cloud Storage Services Using Trust and Encryption\". <i>Proceedings of the 2016 International Conference on Computational Science and Its Applications<\/i>: 45\u201359. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1007%2F978-3-319-42108-7_4\" data-key=\"57f94427ed88626aba6e9993f74029ed\">10.1007\/978-3-319-42108-7_4<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+Proposed+Protocol+for+Periodic+Monitoring+of+Cloud+Storage+Services+Using+Trust+and+Encryption&rft.jtitle=Proceedings+of+the+2016+International+Conference+on+Computational+Science+and+Its+Applications&rft.aulast=Pinheiro%2C+A.%3B+Canedo%2C+E.D.%3B+De+Sousa%2C+Jr.%3B+R.T.+et+al.&rft.au=Pinheiro%2C+A.%3B+Canedo%2C+E.D.%3B+De+Sousa%2C+Jr.%3B+R.T.+et+al.&rft.date=2016&rft.pages=45%E2%80%9359&rft_id=info:doi\/10.1007%2F978-3-319-42108-7_4&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-PinheiroTrustOriented16-35\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-PinheiroTrustOriented16_35-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Pinheiro, A.; Canedo, E.D.; De Sousa, Jr.; R.T. et al. (2016). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/thinkmind.org\/index.php?view=article&articleid=icsea_2016_13_20_10164\" data-key=\"62f6a74d27fd938417986480fb94c5a1\">\"Trust-Oriented Protocol for Continuous Monitoring of Stored Files in Cloud\"<\/a>. <i>Proceedings of the Eleventh International Conference on Software Engineering Advances<\/i>: 295\u2013301<span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/thinkmind.org\/index.php?view=article&articleid=icsea_2016_13_20_10164\" data-key=\"62f6a74d27fd938417986480fb94c5a1\">https:\/\/thinkmind.org\/index.php?view=article&articleid=icsea_2016_13_20_10164<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Trust-Oriented+Protocol+for+Continuous+Monitoring+of+Stored+Files+in+Cloud&rft.jtitle=Proceedings+of+the+Eleventh+International+Conference+on+Software+Engineering+Advances&rft.aulast=Pinheiro%2C+A.%3B+Canedo%2C+E.D.%3B+De+Sousa%2C+Jr.%3B+R.T.+et+al.&rft.au=Pinheiro%2C+A.%3B+Canedo%2C+E.D.%3B+De+Sousa%2C+Jr.%3B+R.T.+et+al.&rft.date=2016&rft.pages=295%E2%80%93301&rft_id=https%3A%2F%2Fthinkmind.org%2Findex.php%3Fview%3Darticle%26articleid%3Dicsea_2016_13_20_10164&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-JendrockJava14-36\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-JendrockJava14_36-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Jendrock, E.; Cervera-Navarro, R.; Evans, I. et al. (September 2014). <a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/docs.oracle.com\/javaee\/7\/tutorial\/\" data-key=\"7a709c9186dae75c8cf504d276a8204c\">\"Java Platform, Enterprise Edition: The Java EE Tutorial\"<\/a>. <i>Java Documentation<\/i><span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/docs.oracle.com\/javaee\/7\/tutorial\/\" data-key=\"7a709c9186dae75c8cf504d276a8204c\">https:\/\/docs.oracle.com\/javaee\/7\/tutorial\/<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Java+Platform%2C+Enterprise+Edition%3A+The+Java+EE+Tutorial&rft.atitle=Java+Documentation&rft.aulast=Jendrock%2C+E.%3B+Cervera-Navarro%2C+R.%3B+Evans%2C+I.+et+al.&rft.au=Jendrock%2C+E.%3B+Cervera-Navarro%2C+R.%3B+Evans%2C+I.+et+al.&rft.date=September+2014&rft_id=https%3A%2F%2Fdocs.oracle.com%2Fjavaee%2F7%2Ftutorial%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-OracleGlassFish14-37\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-OracleGlassFish14_37-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/javaee.github.io\/glassfish\/doc\/4.0\/release-notes.pdf\" data-key=\"28f47afe8a5467c2de484ce3ca9b363d\">\"GlassFish Server Open Source Edition, Release Notes, Release 4.1\"<\/a> (PDF). Oracle. September 2014<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/javaee.github.io\/glassfish\/doc\/4.0\/release-notes.pdf\" data-key=\"28f47afe8a5467c2de484ce3ca9b363d\">https:\/\/javaee.github.io\/glassfish\/doc\/4.0\/release-notes.pdf<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 25 February 2018<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=GlassFish+Server+Open+Source+Edition%2C+Release+Notes%2C+Release+4.1&rft.atitle=&rft.date=September+2014&rft.pub=Oracle&rft_id=https%3A%2F%2Fjavaee.github.io%2Fglassfish%2Fdoc%2F4.0%2Frelease-notes.pdf&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-PostgreSQL-38\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-PostgreSQL_38-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/www.postgresql.org\/\" data-key=\"9898fa08ab53ad93fe00c36437b6a72b\">\"PostgreSQL: The World's Most Advanced Open Source Relational Database\"<\/a>. PostgreSQL Global Development Group<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/www.postgresql.org\/\" data-key=\"9898fa08ab53ad93fe00c36437b6a72b\">https:\/\/www.postgresql.org\/<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 22 May 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=PostgreSQL%3A+The+World%27s+Most+Advanced+Open+Source+Relational+Database&rft.atitle=&rft.pub=PostgreSQL+Global+Development+Group&rft_id=https%3A%2F%2Fwww.postgresql.org%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-OracleJavaPlat-39\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-OracleJavaPlat_39-0\">39.0<\/a><\/sup> <sup><a href=\"#cite_ref-OracleJavaPlat_39-1\">39.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/docs.oracle.com\/javase\/7\/docs\/api\/overview-summary.html\" data-key=\"a378af9910ab1fd38152d3aae2cf39d3\">\"Java Platform, Standard Edition 7: API Specification\"<\/a>. Oracle Corporation<span class=\"printonly\">. <a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/docs.oracle.com\/javase\/7\/docs\/api\/overview-summary.html\" data-key=\"a378af9910ab1fd38152d3aae2cf39d3\">https:\/\/docs.oracle.com\/javase\/7\/docs\/api\/overview-summary.html<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 21 May 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Java+Platform%2C+Standard+Edition+7%3A+API+Specification&rft.atitle=&rft.pub=Oracle+Corporation&rft_id=https%3A%2F%2Fdocs.oracle.com%2Fjavase%2F7%2Fdocs%2Fapi%2Foverview-summary.html&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-NISTAdvanced01-40\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-NISTAdvanced01_40-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"nofollow\" class=\"external text wiki-link\" href=\"https:\/\/csrc.nist.gov\/publications\/detail\/fips\/197\/final\" data-key=\"b4f9ff717c495db559f57b5c285be379\">\"Advanced Encryption Standard (AES)\"<\/a>. NIST. November 2001<span class=\"printonly\">. 
<a rel=\"nofollow\" class=\"external free wiki-link\" href=\"https:\/\/csrc.nist.gov\/publications\/detail\/fips\/197\/final\" data-key=\"b4f9ff717c495db559f57b5c285be379\">https:\/\/csrc.nist.gov\/publications\/detail\/fips\/197\/final<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Advanced+Encryption+Standard+%28AES%29&rft.atitle=&rft.date=November+2001&rft.pub=NIST&rft_id=https%3A%2F%2Fcsrc.nist.gov%2Fpublications%2Fdetail%2Ffips%2F197%2Ffinal&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BellareTheSec94-41\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-BellareTheSec94_41-0\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Bellare, M.; Kilian, J.; Rogaway, P. (1994). \"The Security of Cipher Block Chaining\". <i>Proceedings of Advances in Cryptology \u2014 CRYPTO \u201994<\/i>: 341-358. 
<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" data-key=\"ae6d69c760ab710abc2dd89f3937d2f4\">doi<\/a>:<a rel=\"nofollow\" class=\"external text wiki-link\" href=\"http:\/\/dx.doi.org\/10.1007%2F3-540-48658-5_32\" data-key=\"08d6e40a54eb451540264af3810a44e9\">10.1007\/3-540-48658-5_32<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=The+Security+of+Cipher+Block+Chaining&rft.jtitle=Proceedings+of+Advances+in+Cryptology+%E2%80%94+CRYPTO+%E2%80%9994&rft.aulast=Bellare%2C+M.%3B+Kilian%2C+J.%3B+Rogaway%2C+P.&rft.au=Bellare%2C+M.%3B+Kilian%2C+J.%3B+Rogaway%2C+P.&rft.date=1994&rft.pages=341-358&rft_id=info:doi\/10.1007%2F3-540-48658-5_32&rfr_id=info:sid\/en.wikipedia.org:Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<\/ol><\/div>\n<h2><span class=\"mw-headline\" id=\"Notes\">Notes<\/span><\/h2>\n<p>This presentation is faithful to the original, with only a few minor changes to presentation, grammar, and punctuation. 
In some cases important information was missing from the references, and that information was added.\n<\/p>\n<\/div><div class=\"printfooter\">Source: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services\">https:\/\/www.limswiki.org\/index.php\/Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services<\/a><\/div>\n\t\t\t\t\t\t\t\t\t\t<!-- end content -->\n\t\t\t\t\t\t\t\t\t\t<div class=\"visualClear\"><\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t\t<!-- end of the left (by default at least) column -->\n\t\t<div 
class=\"visualClear\"><\/div>\n\t\t\t\t\t\n\t\t<\/div>\n\t\t\n\n<\/body>","af5b38e70b68468e6df8188586e739da_images":["https:\/\/www.limswiki.org\/images\/c\/c1\/Fig1_Pinheiro_Sensors2018_18-3.png","https:\/\/www.limswiki.org\/images\/1\/16\/Alg1_Pinheiro_Sensors2018_18-3.png","https:\/\/www.limswiki.org\/images\/3\/38\/Alg2_Pinheiro_Sensors2018_18-3.png","https:\/\/www.limswiki.org\/images\/f\/f2\/Tab1_Pinheiro_Sensors2018_18-3.png","https:\/\/www.limswiki.org\/images\/a\/ab\/Fig2_Pinheiro_Sensors2018_18-3.png","https:\/\/www.limswiki.org\/images\/5\/5a\/Fig3_Pinheiro_Sensors2018_18-3.png","https:\/\/www.limswiki.org\/images\/2\/2b\/Fig4_Pinheiro_Sensors2018_18-3.png","https:\/\/www.limswiki.org\/images\/1\/1e\/Fig5_Pinheiro_Sensors2018_18-3.png","https:\/\/www.limswiki.org\/images\/7\/74\/Fig6_Pinheiro_Sensors2018_18-3.png","https:\/\/www.limswiki.org\/images\/3\/3f\/Fig7_Pinheiro_Sensors2018_18-3.png","https:\/\/www.limswiki.org\/images\/1\/12\/Fig8_Pinheiro_Sensors2018_18-3.png","https:\/\/www.limswiki.org\/images\/9\/98\/Fig9_Pinheiro_Sensors2018_18-3.png","https:\/\/www.limswiki.org\/images\/6\/65\/Fig10_Pinheiro_Sensors2018_18-3.png","https:\/\/www.limswiki.org\/images\/f\/fe\/Fig11_Pinheiro_Sensors2018_18-3.png","https:\/\/www.limswiki.org\/images\/a\/a9\/Fig12_Pinheiro_Sensors2018_18-3.png","https:\/\/www.limswiki.org\/images\/1\/18\/Fig13_Pinheiro_Sensors2018_18-3.png","https:\/\/www.limswiki.org\/images\/0\/0d\/Fig14_Pinheiro_Sensors2018_18-3.png","https:\/\/www.limswiki.org\/images\/3\/39\/Fig15_Pinheiro_Sensors2018_18-3.png","https:\/\/www.limswiki.org\/images\/8\/86\/Fig16_Pinheiro_Sensors2018_18-3.png","https:\/\/www.limswiki.org\/images\/f\/f3\/Fig17_Pinheiro_Sensors2018_18-3.png"],"af5b38e70b68468e6df8188586e739da_timestamp":1554145002,"2f4c8b4d90b1c8731d79d7ff410d7c37":{"type":"chapter","title":"1. 
Cybersecurity","key":"2f4c8b4d90b1c8731d79d7ff410d7c37"}},"link":"https:\/\/www.limswiki.org\/index.php\/Book:LIMSjournal_-_Spring_2019","price_currency":"","price_amount":"","book_size":"","download_url":"https:\/\/www.limsforum.com?ebb_action=book_download&book_id=79638","language":"","cta_button_content":"","toc":[{"type":"chapter","name":"1. Cybersecurity","id":"2f4c8b4d90b1c8731d79d7ff410d7c37","children":[{"type":"article","name":"Security architecture and protocol for trust verifications regarding the integrity of files stored in cloud services (Pinheiro et al. 2018)","id":"af5b38e70b68468e6df8188586e739da","pageUrl":"https:\/\/www.limswiki.org\/index.php\/Journal:Security_architecture_and_protocol_for_trust_verifications_regarding_the_integrity_of_files_stored_in_cloud_services"},{"type":"article","name":"SCADA system testbed for cybersecurity research using machine learning approach (Teixeira et al. 2018)","id":"d400aae80e71d72278a98ceb5a2237dd","pageUrl":"https:\/\/www.limswiki.org\/index.php\/Journal:SCADA_system_testbed_for_cybersecurity_research_using_machine_learning_approach"}]},{"type":"chapter","name":"2. Health, public health, and clinical informatics","id":"f4f13a36b3c5fbcb8475802ef2644a2e","children":[{"type":"article","name":"Codesign of the Population Health Information Management System to measure reach and practice change of childhood obesity programs (Green et al. 2018)","id":"945e3454ada339aaa7a7668d339d588c","pageUrl":"https:\/\/www.limswiki.org\/index.php\/Journal:Codesign_of_the_Population_Health_Information_Management_System_to_measure_reach_and_practice_change_of_childhood_obesity_programs"},{"type":"article","name":"Development of an electronic information system for the management of laboratory data of tuberculosis and atypical mycobacteria at the Pasteur Institute in C\u00f4te d\u2019Ivoire (Kon\u00e9 et al. 
2019)","id":"625b72cffd2a8d803eb5cb58c6ef954e","pageUrl":"https:\/\/www.limswiki.org\/index.php\/Journal:Development_of_an_electronic_information_system_for_the_management_of_laboratory_data_of_tuberculosis_and_atypical_mycobacteria_at_the_Pasteur_Institute_in_C%C3%B4te_d%E2%80%99Ivoire"},{"type":"article","name":"Data to diagnosis in global health: A 3P approach (Pathinarupothi et al. 2018)","id":"8d21eded7dba3fec86203cded8451b7e","pageUrl":"https:\/\/www.limswiki.org\/index.php\/Journal:Data_to_diagnosis_in_global_health:_A_3P_approach"},{"type":"article","name":"Building a newborn screening information management system from theory to practice (Pluscauskas et al. 2019)","id":"ab125d6daef2f763e588fcd5432c1b66","pageUrl":"https:\/\/www.limswiki.org\/index.php\/Journal:Building_a_newborn_screening_information_management_system_from_theory_to_practice"},{"type":"article","name":"Adapting data management education to support clinical research projects in an academic medical center (Read 2019)","id":"cb9038099fb8453d3ea802865335a88b","pageUrl":"https:\/\/www.limswiki.org\/index.php\/Journal:Adapting_data_management_education_to_support_clinical_research_projects_in_an_academic_medical_center"},{"type":"article","name":"Transferring exome sequencing data from clinical laboratories to healthcare providers: Lessons learned at a pediatric hospital (Swaminathan et al. 2018)","id":"47e85bcf8a99fb2753262f8d1499e7f0","pageUrl":"https:\/\/www.limswiki.org\/index.php\/Journal:Transferring_exome_sequencing_data_from_clinical_laboratories_to_healthcare_providers:_Lessons_learned_at_a_pediatric_hospital"}]},{"type":"chapter","name":"3. Information retrieval, analysis, and visualization","id":"463407f6febae4584ffab2b203063e7a","children":[{"type":"article","name":"What Is health information quality? Ethical dimension and perception by users (Al-Jefri et al. 
2018)","id":"86da8ed36fc493b6a573df8d0f7095ac","pageUrl":"https:\/\/www.limswiki.org\/index.php\/Journal:What_Is_health_information_quality%3F_Ethical_dimension_and_perception_by_users"},{"type":"article","name":"A view of programming scalable data analysis: From clouds to exascale (Talia 2019)","id":"804be563fdd6e10a6921069440e3e962","pageUrl":"https:\/\/www.limswiki.org\/index.php\/Journal:A_view_of_programming_scalable_data_analysis:_From_clouds_to_exascale"},{"type":"article","name":"Semantics for an integrative and immersive pipeline combining visualization and analysis of molecular data (Trellet et al. 2018)","id":"6ee24d5f7bd1af8e24033922d437ffd0","pageUrl":"https:\/\/www.limswiki.org\/index.php\/Journal:Semantics_for_an_integrative_and_immersive_pipeline_combining_visualization_and_analysis_of_molecular_data"},{"type":"article","name":"Research on information retrieval model based on ontology (Yu 2019)","id":"15ab90bc3c6b03e3f0954255a3ab8dc7","pageUrl":"https:\/\/www.limswiki.org\/index.php\/Journal:Research_on_information_retrieval_model_based_on_ontology"}]}],"settings":{"show_cover":1,"show_title":1,"show_subtitle":0,"show_full_title":1,"show_editor":1,"show_editor_pic":1,"show_publisher":1,"show_language":0,"show_size":0,"show_toc":0,"show_content_beneath_cover":1,"toc_links":"logged-in","cta_button":"1","content_location":"1","log_in_msg":"","cover_size":"medium"},"title_image":"https:\/\/www.limsforum.com\/wp-content\/uploads\/Fig1_Talia_JOfCloudComp2019_8.png"}}
LIMSjournal - Spring 2019
Volume 5, Issue 1
Editor: Shawn Douglas
Publisher: LabLynx Press
Copyright LabLynx Inc. All rights reserved.