NIH Advancing Data Science

Handling biomedical data efficiently and effectively is a major concern for NIH, as the agency routinely supports intramural and extramural research projects that generate tremendous amounts of biomedical data. Regardless of format, all types of data require hardware, architecture, and platforms to capture, organize, and store them, to provide access to them, and to perform computations on them.

As projects mature, data has traditionally been stored and made available to the broader community via public repositories, data generators, or data aggregators at local institutions. This model has become strained as the number of data-intensive projects and the amount of data generated by each project continue to grow rapidly. To modernize the handling of both present and future data, NIH has released the “NIH Strategic Plan for Data Science”.

NIH has established a new position, Chief Data Strategist, within the NIH Office of the Director and is in the process of hiring for it. The person selected will work in close collaboration with the NIH Scientific Data Council and the NIH Data Science Policy Council (DSPC) to guide the development and implementation of NIH’s data science activities.

This new leadership position is expected to forge partnerships with federal advisory offices such as the HHS Data Council, the HHS Office of the Chief Technology Officer, and the HHS Office of the National Coordinator for Health Information Technology.

Other federal agencies, such as the National Science Foundation and the Department of Energy, will also take part in the discussions, along with international funding agencies and the private sector.

The specific goals to modernize the data infrastructure are to:

  • Support common infrastructure and architecture so that more specialized platforms can be built and interconnected
  • Leverage commercial tools, technologies, services, and expertise, plus adopt and adapt tools and technologies from other fields for use in biomedical research
  • Enhance the biomedical data science research workforce through improved training programs and novel partnerships
  • Enhance data sharing, access, and interoperability
  • Ensure the security and confidentiality of patient and participant data in accordance with NIH requirements and applicable law
  • Improve the ability to capture, curate, validate, store, and analyze clinical data for biomedical research
  • Develop, promote, and refine data standards, including standardized data vocabularies, that are applicable to a broad range of fields
  • Coordinate and collaborate with other federal, private, and international funding agencies and organizations to prevent unnecessary duplication
  • Ensure that new NIH data resources are connected to other NIH systems upon implementation


As part of modernizing the biomedical research data ecosystem, NIH is funding the NIH Data Commons pilot. The main goal of the pilot is to use a shared virtual space to store and work with biomedical research data and analytical tools, enabling multiple datasets to be queried together. The pilot will also leverage currently available cloud-computing environments in a flexible and scalable way.

Initially, the NIH Data Commons will enable researchers to work with three test datasets: the National Heart, Lung, and Blood Institute’s “Trans-Omics for Precision Medicine” (TOPMed) program, the NIH Common Fund’s “Genotype-Tissue Expression” (GTEx) program, and various model-organism data repositories.

Go to to view the “NIH Strategic Plan for Data Science”
