NIH Advancing Data Science

Handling biomedical data efficiently and effectively is a major concern for NIH https://www.nih.gov, as the agency routinely supports intramural and extramural research projects that generate tremendous amounts of biomedical data. Regardless of format, all types of data require hardware, architecture, and platforms to capture, organize, store, provide access to, and perform computations on them.

As projects mature, data has traditionally been stored and made available to the broader community via public repositories, data generators, or data aggregators at local institutions. This model has become strained as the number of data-intensive projects and the amount of data generated by each project continue to grow rapidly. To modernize how both present and future data are managed, NIH has released the “NIH Strategic Plan for Data Science”.

NIH has established a new position, Chief Data Strategist, within the NIH Office of the Director and is in the process of filling it. The person selected for the position will work in close collaboration with the NIH Scientific Data Council https://datascience.nih.gov and the NIH Data Science Policy Council (DSPC) https://osp.od.nih.gov/scientific-sharing/nih-data-science-policy-council to guide the development and implementation of NIH’s data science activities.

This new leadership position is expected to forge partnerships with federal advisory offices such as the HHS Data Council https://aspe.hhs.gov/health-and-human-services-hhs-data-council, the HHS Office of the Chief Technology Officer https://hhs.gov/about/agencies/asa/ocio/index.html, and the HHS Office of the National Coordinator for Health Information Technology https://www.healthit.gov.

Other federal agencies, such as the National Science Foundation https://www.nsf.gov and the Department of Energy https://www.energy.gov, will also take part in the discussions, along with international funding agencies and the private sector.

The specific goals to modernize the data infrastructure are to:

  • Support common infrastructure and architecture so that more specialized platforms can be built and interconnected
  • Leverage commercial tools, technologies, services, and expertise, and adopt and adapt tools and technologies from other fields for use in biomedical research
  • Enhance the biomedical data science research workforce through improved training programs and novel partnerships
  • Enhance data sharing, access, and interoperability
  • Ensure the security and confidentiality of patient and participant data in accordance with NIH requirements and applicable law
  • Improve the ability to capture, curate, validate, store, and analyze clinical data for biomedical research
  • Develop, promote, and refine data standards, including standardized data vocabularies, that are applicable to a broad range of fields
  • Coordinate and collaborate with other federal, private, and international funding agencies and organizations to prevent unnecessary duplication
  • Ensure that new NIH data resources are connected to other NIH systems upon implementation

 

As part of modernizing the biomedical research data ecosystem, NIH is funding the NIH Data Commons pilot https://commonfund.nih.gov/commons. The main goal of the pilot is to use a shared virtual space to store and work with biomedical research data and analytical tools so that multiple datasets can be queried together. The pilot will also leverage currently available cloud-computing environments in a flexible and scalable way.

Initially, the NIH Data Commons will enable researchers to work with three test datasets: the National Heart, Lung, and Blood Institute’s “Trans-Omics for Precision Medicine” (TOPMed) program, the NIH Common Fund’s “Genotype-Tissue Expression” (GTEx) program, and various model-organism data repositories.

Go to https://datascience.nih.gov/sites/default/files/NIH_Strategic_Plan-for-Data_Science_Final_508.pdf to view the “NIH Strategic Plan for Data Science”.