Linking NSF Scientist and Engineering Data to Scientific Productivity Data

February 29, 2008

National Science Foundation

Room 120

Arlington, Virginia

Workshop Organizers: Donna Ginther, Jinyoung Kim, and Gerald Marschke



Workshop Objectives

Research on innovators’ scientific careers using the Survey of Doctorate Recipients (SDR) has been hampered by a lack of information on scientific productivity as measured by publications, citations, patents, and grant awards. Likewise, research on innovation and knowledge diffusion has been hampered by a lack of information on inventor characteristics. The goal of this workshop is to foster the creation of the Survey of Doctorate Recipients Productivity Data (SDRPD), the first nationally representative data set that links inventors’ characteristics with their innovations. The SDRPD will create synergies between the fields of labor economics and industrial organization, and the data will provide greater insight into the process of innovation and entrepreneurship.

The workshop will bring together technical, research, and data personnel to explore how to match U.S. patent and publication records to the individual doctorate-holder data in the Survey of Earned Doctorates (SED) and SDR databases. With these data, researchers would be able to address important research and policy questions about the science and technology enterprise that are difficult, if not impossible, to address with existing data sets. These questions include how scientific careers develop in the U.S. economy, how the R&D enterprise is organized in industry and academe, and how the scientific labor market facilitates the diffusion of new technologies and know-how within the economy, among many others.

A workshop is necessary because of the complexity and difficulty of matching the SDR with other data sources. The SDR can be matched only by name and address, variables that cannot be matched without error. In particular, records can be under-matched when names are spelled or transcribed incorrectly, and over-matched when names are very common. Controlling both kinds of error requires automated matching algorithms with statistical decision rules. The goal of the workshop is to bring individuals with data-matching expertise together with researchers interested in the matched data to establish best practices for creating the SDRPD.
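To make the idea of a statistical decision rule concrete, the following is a minimal sketch in the spirit of the standard probabilistic record-linkage framework (Fellegi-Sunter), not the method adopted for the SDRPD. The field names, the agreement probabilities, and the two thresholds are all hypothetical values chosen for illustration. Approximate string comparison guards against under-matching on misspelled names, and the middle "review" zone guards against over-matching on common names.

```python
import math
from difflib import SequenceMatcher

# Hypothetical agreement probabilities (illustrative values, not estimates):
#   M_PROB[f] = P(field f agrees | records refer to the same person)
#   U_PROB[f] = P(field f agrees | records refer to different people)
M_PROB = {"last_name": 0.95, "first_name": 0.90, "city": 0.80}
U_PROB = {"last_name": 0.01, "first_name": 0.05, "city": 0.10}

# Hypothetical decision thresholds on the total match weight.
UPPER = 10.0  # at or above: declare a link
LOWER = 0.0   # at or below: declare a non-link; in between: manual review


def agrees(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two strings as agreeing when they are similar enough, so that
    small spelling or transcription errors do not cause an under-match."""
    ratio = SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()
    return ratio >= threshold


def match_weight(rec_a: dict, rec_b: dict) -> float:
    """Sum of log2 likelihood ratios over the compared fields."""
    total = 0.0
    for field, m in M_PROB.items():
        u = U_PROB[field]
        if agrees(rec_a[field], rec_b[field]):
            total += math.log2(m / u)              # agreement raises the weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


def classify(rec_a: dict, rec_b: dict) -> str:
    """Three-way decision: 'link', 'non-link', or 'review'."""
    weight = match_weight(rec_a, rec_b)
    if weight >= UPPER:
        return "link"
    if weight <= LOWER:
        return "non-link"
    return "review"


if __name__ == "__main__":
    # Fictional records: a survey respondent and a patent inventor entry.
    survey = {"last_name": "Martinez", "first_name": "Elena", "city": "Albany"}
    patent = {"last_name": "Martines", "first_name": "Elena", "city": "Albany"}
    print(classify(survey, patent), round(match_weight(survey, patent), 2))
```

With these illustrative settings, a pair that agrees only on a common first and last name falls between the two thresholds and is sent to review instead of being linked automatically. In practice the agreement probabilities would be estimated from the data and not fixed by hand.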

This workshop is funded by National Science Foundation Grant SRS-0725467.



8:30-9:00 am

Continental Breakfast


9:00-9:30 am

Opening Remarks

Donna Ginther (University of Kansas), Jinyoung Kim (Korea University), Gerald Marschke (University at Albany)

Donna Ginther, "Linking Academic Productivity Data to the SDR: An Attainable Goal"

9:30-10:45 am

Researcher Uses of Linked Data  

Richard Freeman (Harvard University), "What Magic Can We Do with Linked Data Sets"

Paula Stephan (Georgia State University), "Matching the SDR to Publications in the 1980s: Experiences and Outcomes"

Julia Lane (NSF), "Linking Administrative and Survey Data: Practical Experiences"


10:45-11:00 am



11:00-11:15 am

Coffee Break


11:15 am-12:00 pm

Linking Methods

William Winkler (U.S. Census Bureau), "Overview of Record Linkage for Name Matching" (references)

Michael Larsen (Iowa State University), "Practical and Theoretical Considerations for Linking Survey Data with Other Sources"

12:00-12:30 pm

Discussion


12:30-1:30 pm



1:30-2:30 pm

Data Dissemination

Tim Mulcahy and Chet Bowie (National Opinion Research Center), "NORC Data Enclave"

Stephen Cohen (NSF), "NSF Data Policies"


2:30-2:45 pm



2:45-3:00 pm

Coffee Break


3:00-3:45 pm

Conference Summary

Donna Ginther (University of Kansas), Jinyoung Kim (Korea University), Gerald Marschke (University at Albany)

Other useful materials:

  1. Trajtenberg, Shiff, and Melamed, "The 'Names Game': Harnessing Inventors' Patent Data for Economic Research" (in presentation format), February 2008.


Additional information about the conference can be found in the project summary and description from the NSF grant proposal.