Full article title: Open data in scientific communication
Journal: Folia Forestalia Polonica, Series A – Forestry
Author(s): Grygoruk, Dorota
Author affiliation(s): Forest Research Institute
Primary contact: farfald at ibles dot waw dot pl (email)
Year published: 2018
Volume and issue: 60(3)
Page(s): 192–98
DOI: 10.2478/ffp-2018-0019
ISSN: 2199-5907
Distribution license: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
Website: https://content.sciendo.com/view/journals/ffp/60/3/article-p192.xml
Download: https://content.sciendo.com/downloadpdf/journals/ffp/60/3/article-p192.xml (PDF)

Abstract

The development of information technology makes it possible to collect and analyze a growing number of data resources. The results of research, regardless of the discipline, constitute one of the main sources of data. Currently, research results are increasingly being published in the open access model. The open access concept has been accepted and recommended worldwide by many institutions financing and implementing research. Initially, the idea of openness concerned only the results of research and scientific publications; at present, more attention is paid to the problem of sharing scientific data, including raw data. Progress towards open data is complex, as the specific nature of data requires the development of an appropriate legal, technical and organizational model, followed by the implementation of data management policies at both the institutional and national levels.

The aim of this publication is to present the development of the open data concept in the context of open-access ideas and problems related to defining data in the process of data sharing and data management.

Keywords: open access, open data, research data, data management

Introduction

Modern information technology allows for the collection and analysis of a growing number of data resources. At the beginning of this century, it was estimated that new stored information grew about 30% a year between 1999 and 2002.[1] Scientific studies are one of the main sources of data, and their results are increasingly available in the form of scientific publications in the open access model. The beginning of open access (OA) dates back to the 1960s, when the first centers of scientific information were established in the United States of America. Publishing in prestigious scientific journals has become the guarantee of the professional advancement of authors and promotion of research centers.[2] In the opinion of Nielsen[3], the growth of the scientific journal system has created a body of shared knowledge and a collective long-term memory that is the basis for progress in science. New possibilities for disseminating research findings emerged along with the development of the internet and digital technology. The first journals published exclusively on the internet were launched in the late 1980s.[2] The first open scientific repository in the fields of physics, astronomy, mathematics and computer science was established in 1991, and as of the end of August 2018 it contained 1,433,214 documents.[4] Currently, no library in the world subscribes to all printed scientific journals, as their prices and the number of studies published in them grow faster than the libraries’ budgets. The essence of open access is both access to research results without a fee and the possibility of their reuse for scientific purposes: reading, saving to a computer disk, copying, printing, searching and linking, as well as correct citation that acknowledges the authorship of the work. The OA model ensures that publications are reviewed, does not violate copyrights and adheres to anti-plagiarism regulations.[5]

The goals of open access have been defined in three declarations, that is, the Budapest Open Access Initiative[6], the Bethesda Statement on Open Access Publishing[7] and the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities.[8] According to the Registry of Open Access Repository Mandates and Policies (ROARMAP), open access policies had been adopted by a total of 83 non-research funders, 56 funding research organizations, 716 university and research institutions, 11 multiple research collectives and 75 sub-units of research organizations (as of March 2018).[9] From Poland, six scientific units have been entered in the register: the Adam Mickiewicz University; the Institute of Nuclear Physics Polish Academy of Sciences; the Interdisciplinary Centre of Mathematical and Computational Modelling (ICM), University of Warsaw; the Medical University of Lodz; the Nofer Institute of Occupational Medicine in Lodz; and the Institute of Biochemistry and Biophysics, Polish Academy of Sciences.[10]

In Poland, the Ministry of Science and Higher Education (MNiSW) is responsible for science policy. In 2004, Poland signed the OECD Declaration on Access to Research Data From Public Funding. According to the provisions of the declaration, open access to research data is a prerequisite for innovation and improvement of scientific staff qualifications, as well as international scientific and technological cooperation. The document, however, does not provide guidelines for the implementation of the open science model.[11] Since 2010, MNiSW has financed the Springer Open Choice/Open Access Program, under which the employees and students affiliated with all the Polish academic, educational and scientific institutions can publish their research in the scientific journals published in open access by Springer.[12] In 2011, an expert opinion on the implementation and promotion of open access to scientific and educational contents was commissioned by MNiSW. The results of the analysis—carried out with reference to 12 countries and selected international organizations—were used to develop a model for the implementation of the OA model in Poland’s science system. The most important recommendations in the report regarded incorporation of the open access policy in the parametric evaluation of research centers and introduction of the OA mandate in the Polish institutions financing the research. At the same time, the need for OA training and modernization of IT infrastructure was emphasized.[13]

A range of international organizations, as well as the European Union (EU), have a great influence on shaping the science system in Poland. For example, EU documents such as 2012/417/EU: The Commission Recommendation of 17 July 2012 on Access to and Preservation of Scientific Information[14] and Regulation (EU) No 1290/2013 of the European Parliament and of the Council of 11 December 2013[15] recommend open access to research results financed by the EU; under the Horizon 2020 projects, open access to research results is obligatory. In 2015, the Minister of Science and Higher Education adopted the Directions for the Development of Open Access to Publications and the Results of Scientific Research in Poland. The document emphasizes that dissemination of open access to research results is a global trend, largely related to the development of information and communication technologies.[16] In 2018, the Report on the Implementation of the Policy of Open Access to Scientific Publications in 2015–2017 was published, which discusses the basic problems of the process of introducing open access to scientific content in Poland and provides recommendations for future activities. According to the authors of the report, only 20 research centers and universities in Poland have an institutional OA policy, and only 18% of all scientific publications are published in the OA system. Poland lacks systemic solutions and adequate OA infrastructure. In addition, OA activities are not rewarded in the evaluation of scientific units or in the assessment of the academic staff.[17]

Open data

The development of the idea of open access to data is closely related to the activity of CODATA, the Committee on Data of the International Council for Science, which was established in 1966. The mission of the organization is to promote global cooperation in order to improve the availability and usability of data for all areas of research and to support international science for the benefit of society. CODATA performs its tasks both on an international scale and on the scale of individual member states, including Poland. CODATA also runs publishing activities and collaborates in the organization of large data conferences such as SciDataCon and International Data Week.[18] As a peer-reviewed, open electronic journal, the Data Science Journal publishes articles on the management, dissemination, use and reuse of research data and databases in all areas of research. The scope of the journal includes descriptions of data systems, their implementation and publication, applications, infrastructure, software, legal issues, reproducibility and transparency, accessibility and usability of complex data sets, with particular emphasis on principles, policies and practices for open data.[19]

Open access to data increases transparency of the research process and promotes scientific cooperation and the implementation of interdisciplinary scientific research. The development of some scientific disciplines (e.g., bioinformatics) is based on access to data, while other fields (e.g., astronomy, physics, climatology) are strongly associated with collecting and sharing data at a global level. The growing interest in the availability of research data is to a large extent related to the rapid development of digital technologies. Modern IT solutions enable generating, storing, processing and transmitting ever-larger data sets.[2]

Activities for open access and open data are complementary; however, the specific nature of data requires the development of a legal, technical and organizational model as well as the implementation of appropriate data management procedures. The first key problem in the field of access to data is the lack of an agreed definition of "research data."[20][21] Because scientific fields are diverse and have their own specifics, research data is defined in various ways, for example:

  • data as registered factual materials, necessary to evaluate the results of scientific research and widely recognized by the scientific community
  • data as information, in particular collected facts and figures that can be used for research and be treated as a basis for further conclusions, discussions or calculations
  • data as records of facts (expressed as numbers, text, graphics or sounds) that are the result of study (e.g., observations, measurements, experiences, experiments, etc.), used as a base for scientific conclusions
  • data as raw data, which was obtained directly as a result of the use of a research tool (e.g., computer program, measuring equipment, survey, questionnaire); organized but not processed, for example, by means of statistical analyses
  • data as descriptions and information on data origin, e.g., metadata

Similar problems arise when defining "open research data." According to James[22], open data can be freely used and distributed by anyone, anywhere, for any purpose. The authors of other definitions introduce certain limitations, for example, through licenses specifying the conditions for data sharing and the information on data source. In contrast to the publications made available in the open access model, the essential feature of open data is the possibility of its reuse in new analyses and re-dissemination. For this reason, it must be taken into account that data may be subject to exclusive rights, that is, copyrights, database rights and regulations on access to public data or the protection of personal data.[20]

According to the European Commission (EC), open access to research data from projects financed from public funds should be standard practice.[23] The EC recommends the FAIR Principles for research data stewardship to make data findable, accessible, interoperable and reusable: easy to find in open repositories or on the internet (for example, by linking to a scientific publication), available to everyone (including under license terms) and stored in standard formats that are easy to open, read and reuse.[24] In 2013, the European Commission, the United States National Science Foundation, the National Institute of Standards and Technology and the Australian Government’s Department of Innovation launched the Research Data Alliance (RDA) as a community-driven organization. The goal of this organization was to create the social and technical infrastructure to enable open sharing of data. As of September 2018, the RDA had over 7,251 individual members from 137 countries; representatives of the Interdisciplinary Centre of Mathematical and Computational Modelling (ICM) at the University of Warsaw are members of the RDA and represent the Polish scientific community. The Research Data Alliance enables data to be shared without barriers through working groups and interest groups (a total of 93 working groups), formed of experts from all around the world, from academia, industry and government.[25]
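The four FAIR properties described above can be illustrated with a rough self-check on a dataset description. The following is only a minimal sketch in Python: the `DatasetRecord` class, its field names, the `OPEN_FORMATS` set and the example identifier and URL are all hypothetical and are not taken from the EC guidelines or from any official FAIR assessment tool.

```python
from dataclasses import dataclass

# Assumed examples of open, widely readable formats; not an official list.
OPEN_FORMATS = {"csv", "json", "xml", "txt"}


@dataclass
class DatasetRecord:
    """Hypothetical description of a shared dataset (illustrative fields only)."""
    identifier: str = ""   # persistent identifier, e.g. a DOI
    access_url: str = ""   # where the data can be retrieved
    file_format: str = ""  # e.g. "csv"
    license: str = ""      # reuse terms, e.g. "CC-BY-4.0"


def fair_checklist(record: DatasetRecord) -> dict:
    """Return a crude yes/no answer for each of the four FAIR properties."""
    return {
        "findable": bool(record.identifier),
        "accessible": bool(record.access_url),
        "interoperable": record.file_format.lower() in OPEN_FORMATS,
        "reusable": bool(record.license),
    }


# Hypothetical example record; the identifier and URL are placeholders.
example = DatasetRecord(
    identifier="doi:10.1234/example-dataset",
    access_url="https://repository.example.org/datasets/42",
    file_format="CSV",
    license="CC-BY-4.0",
)
print(fair_checklist(example))
```

A real assessment would look at much more than the presence of four fields (for instance, the richness of metadata and the use of community vocabularies), so the checklist should be read as an illustration of the idea rather than a compliance test.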

The diversity of data collected in research processes, recording formats and storage standards requires implementing system solutions in the area of open access policy at the institutional, national and international levels. Providing open access to research data will enable the reuse of data for analyses, surveys and tests, as well as publication of new results.[17][21][26]

Research data management

Each scientific process has a data life cycle that includes the stages of collecting, processing, analyzing, using and sharing data. The life cycle of scientific data can be extended ("given a second life") by appropriate management procedures, which enable data to be shared again and used in other scientific projects. Specific activities in the field of data management are required by some scientific journals (e.g., Nature, PLoS) and are also included in grant agreements, for example those signed with the EC or Poland’s government agency the National Science Centre (NCN).[27] In line with the EC recommendations, research data management should be carried out both during and after the implementation of the scientific project.[24] These activities include defining data, selecting formats, describing metadata and determining the current storage location, followed by selecting and preparing data for long-term storage, as well as choosing measures to secure data and to ensure data sharing. The selection of data for archiving should be based on scientific and historical criteria, as well as an assessment of the quality of data documentation and the possibilities of future use and replication. Another important issue is the regulation of legal status during data sharing; for instance, data can be made available without a license on any terms of use, or with the FAIR Data Management Creative Commons license, or with a statement of surrender.[27]
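The management activities listed above can be thought of as a checklist that a project fills in over time. The sketch below, in Python, shows one possible way to record them and to report which items are still open. The `DataManagementPlan` class and its field names are assumptions made for illustration; they do not reproduce any template required by the EC, the NCN or a journal.

```python
from dataclasses import dataclass, fields


@dataclass
class DataManagementPlan:
    """Hypothetical plan with one free-text field per activity named in the text."""
    data_description: str = ""      # defining the data
    file_formats: str = ""          # selecting formats
    metadata_description: str = ""  # describing metadata
    current_storage: str = ""       # current storage location
    long_term_storage: str = ""     # selection and preparation for archiving
    security_measures: str = ""     # measures to secure the data
    sharing_terms: str = ""         # how, and under what legal terms, data is shared


def missing_items(plan: DataManagementPlan) -> list:
    """List the plan fields that are still empty, in declaration order."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name).strip()]


# Hypothetical draft plan with only two activities documented so far.
draft = DataManagementPlan(
    data_description="Field measurements from forest sample plots",
    file_formats="CSV",
)
print(missing_items(draft))
```

Keeping the plan as structured data rather than free text makes it easy to review the same items both during the project and after it ends.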

According to Görögh[28], regardless of the methodology used, the long-term protection of scientific data (including raw data at an institutional level) has numerous advantages. It contributes to the comprehensive gathering of knowledge, increases the transparency of research and builds the prestige of a given scientific institution, as well as enhances the development of international cooperation and encourages participation in research consortia. Furthermore, open access to raw data at an institutional level ensures legal data protection, in particular, protection against copyright-related risks.

In the last decade, the role of research data management (RDM) in scientific communication has grown. An RDM policy was implemented at the University of Oxford in 2012. The main policy objectives refer to the evaluation of data collected through university projects, the determination of the minimum period of data storage after the publication of research results and the scope of responsibility of scientists and the university. It is the responsibility of researchers to develop and document procedures with regard to collecting, retaining, using, reusing and sharing scientific data. The university provides access to appropriate services and devices, including support by IT staff, and organizes training in research methods and data management.[29] The policy implementation was preceded by a survey on research data and the principles of data sharing. About 300 academic employees responded to the survey questions. The answers confirmed the diversity of data (text, numeric, spatial, statistical, multimedia, audio, bibliographic) collected during the implementation of scientific projects. About 75% of the respondents said that sharing data is not necessary, but at the same time, the majority of respondents acknowledged that data management is necessary and important for the research process. They also considered access to the scientific data from completed projects as an inspiration for new research ideas.[30]

A data management policy has also been implemented at the University of Cambridge. Here, regardless of the funder, each project starts with the preparation of a data management plan. The plan includes choosing the data format, software type and the method for storing data. Formats that are less vulnerable to obsolescence and easy to describe with metadata are recommended, which facilitates data interpretation and reuse in the future.[31]

Open access and data management policies have also been gradually implemented in Polish research centers. The Interdisciplinary Centre for Mathematical and Computational Modelling at the University of Warsaw (ICM UW) has established the first repository of accessible publications by Polish scientists. Since 2011, the Centre of Open Science (CeON) has collected and made available to anyone, free of charge, scientific articles, books, post-conference materials, scientific monographs and doctoral dissertations (in compliance with the CC-BY or CC-BY-SA copyright license). The data repository is compliant with the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH); therefore, the publications are easily accessible via the websites providing information on digital scientific resources.[32]
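To give a sense of what OAI-PMH compliance means in practice, the sketch below builds a `ListRecords` request and reads record titles from a response in the `oai_dc` (Dublin Core) format, which the protocol requires every repository to support. It is a minimal Python illustration only: the base URL is a placeholder rather than the address of the CeON repository, the sample response is hand-written and heavily shortened, and no network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of Dublin Core elements used inside oai_dc metadata.
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for the given endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def extract_titles(xml_text: str) -> list:
    """Collect the text of every dc:title element in an OAI-PMH response."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter(f"{DC_NS}title")]


# Hand-written, shortened response used instead of a live request (assumption).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset description</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

# Placeholder endpoint; a real repository publishes its own base URL.
print(list_records_url("https://repository.example.org/oai"))
print(extract_titles(SAMPLE_RESPONSE))
```

A real harvester would also need to follow the resumption tokens that repositories use to split long result lists, which this sketch leaves out.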

Due to the rapid growth of data in many fields of science, the demand for technical infrastructure with high capacity, durability and performance has also increased. In Poland, as part of the OCEAN (Open Data Centre and Analysis) and RepOD (Open Data Repository) projects, a modern infrastructure for storing and sharing data was implemented in order to manage research data responsibly and provide services to other scientific institutions and public sector institutions.[33][34]

According to Kędzierska et al.[35], although the Polish scientific community declares the need to share raw data alongside scientific publications and appreciates data reprocessing, more of its data is still stored on personal computers than in institutional data repositories. Surveys on the rules of sharing raw data and research results were carried out in more than 200 research centers in Poland. Respondents indicated direct measurements and experimentation as the main sources of raw data and personal computers as the main tool for data storage. The idea of establishing a central data repository in research centers was approved by about 70% of the respondents, who, at the same time, confirmed greater acceptance of open access to scientific publications than to raw data.[36]

In Poland, a survey regarding the collection, storage and sharing of scientific data was also carried out at the Forest Research Institute in 2015. Sixty-four percent of the institute’s academic staff took part in the survey. The results confirm the diversity of data collected in the research on forest ecosystems. Most of the data is generated during field measurements, where modern measuring equipment is increasingly used, for example, terrestrial laser scanners and telemetry devices. The size of database resources at the institute has clearly increased in recent years, which is the result of increased processing and analysis of spatial data. The respondents most often indicated personal computers as a tool for archiving their databases (raw and processed data), that is, a means that does not assure storage quality and security.

It is worth noting that modern IT tools are available at the institute, whose IT system was modernized in 2010–2014 as part of an infrastructure project (Project # POIG.02.03.00-00-052/10).[37] The scope of the project included, among others, new technological solutions in the field of data archiving. Most of the survey respondents (82%) considered it useful to use archival data at the stage of drawing scientific conclusions and planning new research. The open access concept was accepted by 74% of respondents, above all in the context of access to scientific publications. Open access to databases raises considerable controversy and concern among the scientific staff of the institute (e.g., in the context of copyright).[38]

The presented survey results[30][36][38] characterize various scientific environments both in Poland and other countries. The results obtained also show similarities in the work of researchers, irrespective of the field of science. The modern measurement and analytical technology available today makes it possible to generate and process a variety of data resources of growing volume. However, the routines and habits of the scientific community are still a mental barrier to the dissemination of new forms of sharing knowledge in many scientific institutions, even though the concept of open access in science is no longer a niche initiative.[3]

Conclusion

Contemporary science is closely related to the development of information technology. In the internet age, open access to scientific publications as well as research data influences the development of scientific communication and cooperation. Until recently, the achievements of scientific centers were mainly evaluated on the basis of completed projects, scientific publications and professional achievements of the scientific staff. Today, the evaluation criteria are organizational and technological solutions that enable analysis. In this situation, it becomes necessary for research institutions to implement data management policies. Securing data against loss and guaranteeing access to it for future generations is a challenge for science centers, not only in Poland.

References

  1. Lyman, P.; Varian, H.R. (2003). "How Much Information? 2003". University of California at Berkeley. http://groups.ischool.berkeley.edu/archive/how-much-info-2003/. 
  2. Hofmokl, J.; Tarkowski, A.; Bednarek-Michalska, B. et al. (2009) (PDF). Przewodnik po otwartej nauce. Interdyscyplinarne Centrum Modelowania. pp. 92. ISBN 9788391715048. https://depot.ceon.pl/bitstream/handle/123456789/65/przewodnik-po-otwartej-nauce.pdf. 
  3. Nielsen, M. (17 July 2008). "The Future of Science". MichaelNielsen.org. http://michaelnielsen.org/blog/the-future-of-science-2/. 
  4. "arXiv.org". Cornell University Library. August 2018. https://arxiv.org/. 
  5. Suber, P. (2014) (PDF). Otwarty dostęp. Wydawnictwa Uniwersytetu Warszawskiego. pp. 198. ISBN 9788323515777. https://www.ifj.edu.pl/library/open-access/materials/Suber.pdf. 
  6. Chan, L.; Cuplinskas, D.; Eisen, M. et al. (14 February 2002). "Read the Budapest Open Access Initiative". Budapest Open Access Initiative. https://www.budapestopenaccessinitiative.org/read. 
  7. Brown, P.O.; Lutzker, A.P.; Cabell, D. et al. (20 June 2003). "Bethesda Statement on Open Access Publishing". The SPARC Open Access Newsletter. http://legacy.earlham.edu/~peters/fos/bethesda.htm. 
  8. Max Planck Gesellschaft (22 October 2003). "Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities". Open Access Max Planck Gesellschaft. https://openaccess.mpg.de/Berlin-Declaration. 
  9. "Welcome to ROARMAP". ROARMAP. University of Southampton. March 2018. http://roarmap.eprints.org/. 
  10. "Browse by Country - Poland". ROARMAP. University of Southampton. March 2018. http://roarmap.eprints.org/view/country/616.html. 
  11. OECD (2018). "Declaration on Access to Research Data from Public Funding" (PDF). https://legalinstruments.oecd.org/public/doc/157/157.en.pdf. 
  12. Springer Nature (2018). "Springer Open Choice for Polish Institutions". https://www.springer.com/gp/open-access/springer-open-choice/springer-compact/springer-open-choice-for-polish-institutions/11027898. 
  13. Niezgódka, M. (2011). "Wdrożenie i promocja otwartego dostępu do treści naukowych i edukacyjnych" (PDF). https://depot.ceon.pl/bitstream/handle/123456789/1545/20120208_EKSPERTYZA_OA%20ICM.pdf?sequence=1&isAllowed=y. 
  14. European Commission (17 July 2012). "2012/417/EU: Commission Recommendation of 17 July 2012 on access to and preservation of scientific information". https://publications.europa.eu/en/publication-detail/-/publication/48558fc9-d4c8-11e1-905c-01aa75ed71a1. 
  15. European Commission (12 November 2013). "Regulation (EU) No 1290/2013 of the European Parliament and of the Council of 11 December 2013 laying down the rules for participation and dissemination in "Horizon 2020 - the Framework Programme for Research and Innovation (2014-2020)" and repealing Regulation (EC) No 1906/2006 Text with EEA relevance". https://publications.europa.eu/en/publication-detail/-/publication/3c645e51-6bff-11e3-9afb-01aa75ed71a1/language-en. 
  16. Ministry of Science and Higher Education Poland (13 April 2018). "Kierunki rozwoju otwartego dostępu do publikacji i wyników badań naukowych w Polsce" (PDF). https://www.gov.pl/documents/1068557/1069061/20180413_Kierunki_rozwoju_OD_wersja_ostateczna.pdf. 
  17. Ministry of Science and Higher Education Poland (April 2018). "Raport nt. realizacji polityki otwartego dostępu do publikacji naukowych w latach 2015-2017" (PDF). http://www.bip.mnisw.gov.pl/g2/oryginal/2018_04/7ed78f459cb760b267b19f8f38f8bb22.pdf. 
  18. "About CODATA". Committee on Data of the International Council for Science. 2018. http://www.codata.org/about-codata. 
  19. "Data Science Journal". Committee on Data of the International Council for Science. 2018. https://datascience.codata.org/. 
  20. Leśniak, A.; Morys-Twarowski, M.; Siewicz, K. et al. (2015). Szprot, J.. ed. Open Science in Poland 2014: A Diagnosis. Wydawnictwa ICM. pp. 114. ISBN 9788363490102. http://pon.edu.pl/index.php/nasze-publikacje?pubid=16. 
  21. Strzelczyk, E. (2017). "Otwarte dane badawcze – kolejny krok do otwierania nauki". Materiały konferencyjne EBIB 25. http://open.ebib.pl/ojs/index.php/Mat_konf/article/view/599. 
  22. James, L. (3 October 2013). "Defining Open Data". Open Knowledge International Blog. https://blog.okfn.org/2013/10/03/defining-open-data/. 
  23. "Amsterdam Call for Action on Open Science". Editorial Council. 8 April 2016. https://www.openaccess.nl/en/events/amsterdam-call-for-action-on-open-science. 
  24. European Commission (21 March 2017). "Guidelines to the Rules on Open Access to Scientific Publications and Open Access to Research Data in Horizon 2020" (PDF). http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf. 
  25. "Research Data Alliance". Research Data Alliance Foundation. 2018. https://www.rd-alliance.org/. 
  26. Bednarek-Michalska, B. (2012). "Repozytoria surowych danych — dlaczego biblioteki powinny je znać?" (PDF). Biuletyn EBIB 8 (135). https://repozytorium.umk.pl/bitstream/handle/item/207/135_michalska_.pdf. 
  27. Hoffman-Sommer, M. (11 December 2015). "Zarządzanie danymi badawczymi". SlideShare. LinkedIn Corporation. https://www.slideshare.net/OpenSciencePlatform/zarzdzanie-danymi-badawczymi. 
  28. Görögh, E. (2014). "An introduction to Open Access in scholarly communication, research data and projects" (PDF). Conference on Grey Literature and Repositories: Proceedings 2014: 48–51. http://invenio.nusl.cz/record/180589/files/idr-879_1.pdf. 
  29. "Research Data Oxford". University of Oxford. 2018. http://researchdata.ox.ac.uk/. 
  30. Wilson, J.A.J. (15 January 2015). "Good Practice in enabling the re-use of Research Data: The University of Oxford" (PDF). University of Oxford. http://helios-eie.ekt.gr/EIE/bitstream/10442/14579/1/Wilson-RECODE.pdf. 
  31. "Welcome to the University of Cambridge Research Data Management website". University of Cambridge. 2018. http://www.data.cam.ac.uk/. 
  32. Grodecka, K. (2013) (PDF). Udane projekty open access w Polsce. Stowarzyszenie EBIB. pp. 40. ISBN 9788363458058. https://www.ifj.edu.pl/library/open-access/materials/Grodecka.pdf. 
  33. "OCEAN Home". University of Warsaw. 2018. http://ocean.icm.edu.pl/. 
  34. "CeON RePOD". University of Warsaw. 2018. https://repod.pon.edu.pl/pl/group/icm-uw. 
  35. Kędzierska, E.; Kavalchuk, N.; Stepniak, J. (2014). "The report from the Survey of Polish Scientific and Research-Development Units" (PDF). Conference on Grey Literature and Repositories: Proceedings 2014: 56–9. http://invenio.nusl.cz/record/180589/files/idr-879_1.pdf. 
  36. Stępniak, J. (20 October 2014). "Otwarte surowe dane i wyniki badań: Raport z badań w krajach grupy wyszehradzkiej". Politechniki Warszawskiej. https://repo.pw.edu.pl/docstore/download.seam?fileId=WUT46458a10ea7c4c5cb9f5559c709dfdd8. 
  37. "Leśne Centrum Informacji". Instytut Badawczy Leśnictwa. 3 November 2011. https://www.ibles.pl/web/zz/-/lesne-centrum-informacji. 
  38. Grygoruk, D. (2017). "Open Access to Research Data on Forest Ecosystems in Poland". Task Quarterly 21 (4): 415–21. doi:10.17466/tq2017/21.4/w. 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation, spelling, and grammar. We also added PMCID, DOI, ISBN, and author information when they were missing from the original reference. The original article lists references alphabetically, but this version—by design—lists them in order of appearance. A few of the original URLs in citations were dead and were updated for this version. A few inline citations were simply website URLs, and those were turned into full citations here. No other modifications were made in accordance with the "no derivatives" portion of the distribution license.