Named data networking for genomics data management and integrated workflows
Managing and sharing big data in the cloud poses its own set of cyberinfrastructure challenges, and genomic research data is no exception. With large data sets stored in numerous geographically distributed repositories around the world, using this data effectively for research can become enormously difficult. To that end, Ogle et al., writing in Frontiers in Big Data, present their efforts to reduce these network challenges using an internet architecture called named data networking (NDN). After introducing NDN, the authors describe the problems that arise when managing and using big genomic data sets, as well as how NDN-based solutions can alleviate those problems. They then describe their approach to integrating NDN with the genomics workflow tool GEMmaker on a cloud computing platform called the Pacific Research Platform. They conclude that “NDN can serve data from anywhere, simplifying data management,” and that, when integrated with GEMmaker, “in-network caching can speed up data retrieval and insertion into the workflow by six times,” further improving the use of big genomic data in practical research.
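As a rough illustration of the idea behind in-network caching, the toy sketch below models data being requested by name rather than by host location, with each forwarder along the path keeping a copy of what it has already relayed. It is not the authors' implementation and does not use any real NDN library; the `Producer` and `Router` classes and the data name `/example/genomics/sample-001/reads` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Producer:
    """Hypothetical repository that publishes data under hierarchical names."""
    store: Dict[str, bytes]
    requests_served: int = 0

    def fetch(self, name: str) -> bytes:
        # Every request that reaches the repository is counted.
        self.requests_served += 1
        return self.store[name]


@dataclass
class Router:
    """Toy forwarder with a content store (an in-network cache)."""
    upstream: Union["Router", Producer]
    content_store: Dict[str, bytes] = field(default_factory=dict)
    cache_hits: int = 0

    def fetch(self, name: str) -> bytes:
        # Answer from the local cache when the name has been seen before.
        if name in self.content_store:
            self.cache_hits += 1
            return self.content_store[name]
        # Otherwise forward the request upstream and cache the reply.
        data = self.upstream.fetch(name)
        self.content_store[name] = data
        return data


# Assumed topology: consumer -> edge router -> core router -> repository.
name = "/example/genomics/sample-001/reads"
repo = Producer({name: b"ACGT"})
core = Router(upstream=repo)
edge = Router(upstream=core)

first = edge.fetch(name)   # travels all the way to the repository
second = edge.fetch(name)  # served from the edge router's content store
```

In this sketch the second request never leaves the edge router, so the repository serves the data only once. A real deployment would also have to handle cache eviction, data signatures, and segmentation of large files, which are omitted here.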