Appendix 7

National Survey on Drug Use and Health

Survey methodology

Note: The following information was excerpted from U.S. Department of Health and Human Services, Substance Abuse and Mental Health Services Administration, Results from the 2003 National Survey on Drug Use and Health: Main Findings (Rockville, MD: U.S. Department of Health and Human Services, 2004), pp. 7, 87-95. Non-substantive editorial adaptations have been made.

The National Survey on Drug Use and Health (NSDUH) is an annual survey of the civilian, noninstitutionalized population of the United States age 12 and older and is sponsored by the U.S. Department of Health and Human Services, Substance Abuse and Mental Health Services Administration. Prior to 2002, the survey was called the National Household Survey on Drug Abuse (NHSDA). Because of improvements to the survey in 2002, the 2002 data constitute a new baseline for tracking trends in substance use and other measures. Therefore, estimates from the 2002 and 2003 NSDUHs should not be compared with estimates from the 2001 and earlier NHSDAs to assess changes in substance use over time.

NSDUH collects information from residents of households, noninstitutional group quarters (e.g., shelters, rooming/boarding houses, college dormitories, migratory worker camps), and civilians living on military bases. Persons excluded from the survey include homeless persons who do not use shelters, military personnel on active duty, and residents of institutional group quarters, such as jails, prisons, hospitals, and nursing homes. Since 1999, the NSDUH interview has been carried out using computer-assisted interviewing (CAI). Most of the questions are administered with audio computer-assisted self-interviewing (ACASI). ACASI is designed to provide the respondent with a highly private and confidential means of responding to questions to increase the level of honest reporting of illicit drug use and other sensitive behaviors. Less sensitive items are administered by interviewers using computer-assisted personal interviewing (CAPI).

Nationally, 130,605 addresses were screened for the 2003 survey, and 67,784 completed interviews were obtained. The survey was conducted from January through December 2003. Weighted response rates for household screening and for interviewing were 90.72% and 77.39%, respectively.

Although the design of the 2002 and 2003 NSDUHs is similar to the design of the 1999 through 2001 surveys, there are methodological differences that affect comparability of 2002 and 2003 estimates with estimates from prior surveys. In addition to the name change, each NSDUH respondent is now given an incentive payment of $30. These changes, implemented as of the 2002 survey, resulted in substantial improvement in survey response rates. The changes also affected respondents' reporting of many critical items that are the basis of prevalence measures reported by the survey each year. Comparability also could be affected by improved data collection quality control procedures that were introduced beginning in 2001, and by incorporating new population data from the 2000 decennial census into NSDUH sample weighting procedures. Analyses of the effects of each of these factors on NSDUH estimates have shown that 2002 and 2003 data should not be compared with earlier NHSDA survey data.

The 2002 and 2003 surveys were part of a coordinated 5-year 50-State sample design with an independent, multistage area probability sample for each of the 50 States and the District of Columbia to facilitate State-level estimation. For the 5-year 50-State design, 8 States were designated as large sample States (California, Florida, Illinois, Michigan, New York, Ohio, Pennsylvania, and Texas) with samples large enough to support direct State estimates. For the 2003 survey, sample sizes in these States ranged from 3,541 to 3,711. For the remaining 42 States and the District of Columbia, smaller, but adequate, samples were selected to support State estimates using small area estimation (SAE) techniques. Sample sizes in these States ranged from 856 to 964 in 2003.

States were first stratified into a total of 900 field interviewer (FI) regions (48 regions in each large sample State and 12 regions in each small sample State). These regions were contiguous geographic areas designed to yield the same number of interviews on average. Within FI regions, adjacent census blocks were combined to form the first-stage sampling units, called area segments. A total of 96 segments per FI region were selected with probability proportional to population size. Eight sample segments per FI region were fielded during the 2003 survey year.
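The region and segment counts above can be checked with simple arithmetic. The sketch below is only an illustration; the constant and function names are hypothetical. It assumes the District of Columbia is counted with the small sample States, which is the only reading under which the stated total of 900 regions reconciles.

```python
# Counts taken from the text; names are illustrative, not from the survey documentation.
LARGE_SAMPLE_STATES = 8
SMALL_SAMPLE_AREAS = 42 + 1          # 42 remaining States plus the District of Columbia (assumption)
REGIONS_PER_LARGE_STATE = 48
REGIONS_PER_SMALL_AREA = 12
SEGMENTS_FIELDED_PER_REGION = 8      # fielded during the 2003 survey year


def total_fi_regions() -> int:
    """Number of field interviewer (FI) regions implied by the design."""
    return (LARGE_SAMPLE_STATES * REGIONS_PER_LARGE_STATE
            + SMALL_SAMPLE_AREAS * REGIONS_PER_SMALL_AREA)


def total_segments_fielded() -> int:
    """Number of area segments fielded in one survey year."""
    return total_fi_regions() * SEGMENTS_FIELDED_PER_REGION


print(total_fi_regions())        # 900
print(total_segments_fielded())  # 7200
```

The 7,200 figure is not stated in the text; it is only what the stated per-region counts imply.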

These sampled segments were allocated equally into four separate samples, one for each 3-month period during the year, so that the survey is essentially continuous in the field. In each of these area segments, a listing of all addresses was made, from which a sample of 170,762 addresses was selected. Of the selected addresses, 143,485 were determined to be eligible sample units. In these sample units (which can be either households or units within group quarters), sample persons were randomly selected using an automated screening procedure programmed in a handheld computer carried by the interviewers. The number of sample units completing the screening was 130,605. Youths age 12 to 17 and young adults age 18 to 25 were oversampled at this stage so that each State's sample was approximately equally distributed among three major age groups. Because of the larger sample size, there was no need to oversample racial/ethnic groups, as was done for NHSDAs prior to 1999. A total of 81,631 persons were selected nationwide. The final sample of 67,784 persons was representative of the U.S. general population age 12 and older.
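The counts in this paragraph allow a rough check of how the sample narrowed at each stage. The sketch below computes unweighted rates only, so its results will not match the weighted response rates reported earlier (90.72% and 77.39%), which account for unequal selection probabilities. Variable names are illustrative.

```python
def percent(numerator: int, denominator: int) -> float:
    """Return numerator / denominator as a percentage."""
    return 100.0 * numerator / denominator


# Counts quoted in the text for the 2003 survey.
selected_addresses = 170_762
eligible_units = 143_485
screened_units = 130_605
selected_persons = 81_631
completed_interviews = 67_784

eligibility_rate = percent(eligible_units, selected_addresses)
screening_rate = percent(screened_units, eligible_units)          # unweighted
interview_rate = percent(completed_interviews, selected_persons)  # unweighted

print(f"eligible among selected addresses: {eligibility_rate:.2f}%")
print(f"screened among eligible units:     {screening_rate:.2f}%")
print(f"interviewed among selected persons: {interview_rate:.2f}%")
```

The unweighted screening and interview rates come out near 91% and 83%, respectively.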

The data collection method involves in-person interviews, incorporating procedures that would be likely to increase respondents' cooperation and willingness to report honestly about their illicit drug use behavior. Confidentiality is stressed in all written and oral communications with potential respondents. Respondents' names are not collected with the data, and computer-assisted interviewing (CAI) methods, including audio computer-assisted self-interviewing (ACASI), are used to provide a private and confidential setting to complete the interview.

Introductory letters are sent to sampled addresses, followed by an interviewer visit. A 5-minute screening procedure conducted using a handheld computer involves listing all household members along with their basic demographic data. The computer uses the demographic data in a preprogrammed selection formula to select zero to two sample person(s), depending on the composition of the household. This selection process is designed to provide the necessary sample sizes for the specified population age groupings.

The interviewer requests the selected respondent to identify a private area in the home to conduct the interview away from other household members. The interview averages about 1 hour and includes a combination of CAPI and ACASI. The interview begins in CAPI mode with the FI reading the questions from the computer screen and entering the respondent's replies into the computer. The interview then transitions to the ACASI mode for the sensitive questions. In this mode, the respondent can read the questions silently on the computer screen and/or listen to the questions read through headphones and enter his or her responses directly into the computer. At the conclusion of the ACASI section, the interview returns to the CAPI mode with the interviewer completing the questionnaire. All respondents who complete a full interview are given a $30 cash payment.

Even though editing and consistency checks are done by the CAI program during the interview, additional, more complex, edits and consistency checks also are conducted. Cases are retained only if respondents provided data on lifetime use of cigarettes and at least nine other substances. An important aspect of subsequent editing routines involves assignment of codes when respondents legitimately were skipped out of questions that definitely did not apply to them (e.g., if respondents never used a drug of interest). For key drug use measures, the editing procedures identify inconsistencies between related variables. Inconsistencies in variables pertaining to the most recent period that respondents used a drug are edited by assigning an "indefinite" period of use (e.g., use at some point in the lifetime, which could mean use in the past 30 days or past 12 months). Inconsistencies in other key drug use variables are edited by assigning missing data codes. These inconsistencies then are resolved through statistical imputation procedures discussed below.
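The case retention rule described above can be sketched as a simple check. This is a minimal illustration, not the survey's editing code: the substance names, the dictionary layout, and the use of None for a missing answer are all assumptions.

```python
from typing import Dict, Optional

MIN_OTHER_SUBSTANCES = 9  # "at least nine other substances", per the text


def is_retained(lifetime_use: Dict[str, Optional[bool]]) -> bool:
    """Keep a case only if lifetime cigarette use was answered and lifetime
    use was answered for at least nine other substances.

    Keys are substance names; values are True/False answers, or None when
    the answer is missing (hypothetical representation).
    """
    if lifetime_use.get("cigarettes") is None:
        return False
    answered_others = sum(
        1
        for name, answer in lifetime_use.items()
        if name != "cigarettes" and answer is not None
    )
    return answered_others >= MIN_OTHER_SUBSTANCES


# Hypothetical substance list used only to exercise the rule.
others = ["alcohol", "marijuana", "cocaine", "crack", "heroin",
          "hallucinogens", "inhalants", "pain_relievers",
          "tranquilizers", "stimulants"]

complete_case = {"cigarettes": True, **{name: False for name in others}}
sparse_case = {"cigarettes": True, **{name: None for name in others}}

print(is_retained(complete_case), is_retained(sparse_case))  # True False
```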

For some key variables that still have missing or ambiguous values after editing, statistical imputation is used to replace these values with appropriate response codes. For example, the response is ambiguous if the editing procedures assigned a respondent's most recent use of a drug to "use at some point in the lifetime," with no definite period within the lifetime. In this case, the imputation procedures assign a definite value for when the respondent last used the drug (e.g., in the past 30 days, more than 30 days ago but within the past 12 months, more than 12 months ago). Similarly, if the response is completely missing, the imputation procedures replace missing values with nonmissing ones.

In most cases, missing or ambiguous values are imputed using a methodology called predictive mean neighborhoods (PMN), which was developed specifically for the 1999 survey and used in all subsequent survey years. PMN is a combination of a model-assisted imputation methodology and a random nearest neighbor hot-deck procedure. Whenever feasible, the imputation of variables using PMN is multivariate, in which imputation is accomplished on several response variables at once. In general, hot-deck imputation replaces a missing or ambiguous value with a value taken from a "similar" respondent who has complete data. For random nearest neighbor hot-deck imputation, the missing or ambiguous value is replaced by a responding value from a donor randomly selected from a set of potential donors. Potential donors are those defined to be "close" to the unit with the missing or ambiguous value, according to a predefined function, called a distance metric. In the hot-deck stage of PMN, the set of candidate donors (the "neighborhood") consists of respondents with complete data who have a predicted mean close to that of the item nonrespondent. In particular, the neighborhood consists of either the set of the closest 30 respondents or the set of respondents with a predicted mean (or means) within 5% of the predicted mean(s) of the item nonrespondent, whichever set is smaller. If no respondents are available who have a predicted mean (or means) within 5% of the item nonrespondent, the respondent with the predicted mean(s) closest to that of the item nonrespondent is selected as the donor.
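The neighborhood rule in the hot-deck stage can be sketched for the simplest case of a single predicted mean. This is an illustration under stated assumptions, not the survey's implementation: "within 5%" is read here as a relative difference from the nonrespondent's predicted mean, distance is the absolute difference, and the function names are hypothetical. The multivariate case would need a distance over several predicted means, which the text does not specify.

```python
import random
from typing import List, Sequence


def pmn_neighborhood(target_mean: float,
                     donor_means: Sequence[float],
                     max_neighbors: int = 30,
                     rel_tol: float = 0.05) -> List[int]:
    """Return the indices of candidate donors for one item nonrespondent.

    The neighborhood is the smaller of (a) the max_neighbors closest donors
    and (b) the donors whose predicted mean is within rel_tol of the target.
    If no donor is within rel_tol, the single closest donor is returned.
    """
    if not donor_means:
        raise ValueError("at least one potential donor is required")

    # Rank donors from closest to farthest predicted mean.
    ranked = sorted(range(len(donor_means)),
                    key=lambda i: abs(donor_means[i] - target_mean))
    closest = ranked[:max_neighbors]
    within = [i for i in ranked
              if abs(donor_means[i] - target_mean) <= rel_tol * abs(target_mean)]

    if not within:
        return ranked[:1]  # fall back to the nearest donor
    return within if len(within) < len(closest) else closest


def draw_donor(neighborhood: Sequence[int], rng: random.Random) -> int:
    """Select one donor at random from the neighborhood."""
    return rng.choice(list(neighborhood))


donor_means = [0.100, 0.101, 0.200, 0.500]
neighborhood = pmn_neighborhood(0.100, donor_means)
donor = draw_donor(neighborhood, random.Random(0))
print(neighborhood, donor)
```

With these made-up predicted means, only the first two donors fall within 5% of the target, so the donor is drawn from those two.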

Although statistical imputation could not proceed separately within each State due to insufficient pools of donors, information about each respondent's State of residence was incorporated in the modeling and hot-deck steps. For most drugs, respondents were separated into three "State usage" categories as follows: respondents from States with high usage of a given drug were placed in one category, respondents from States with medium usage into another, and the remainder into a third category. This categorical "State rank" variable was used as one set of covariates in the imputation models. In addition, eligible donors for each item nonrespondent were restricted to be of the same State usage category (i.e., the same "State rank") as the nonrespondent.

The general approach to developing and calibrating analysis weights involved developing design-based weights as the inverse of the selection probabilities of the households and persons. Adjustment factors then were applied to the design-based weights to adjust for nonresponse, to poststratify to known population control totals, and to control for extreme weights when necessary. In view of the importance of State-level estimates with the 50-State design, it was necessary to control for a much larger number of known population totals. Several other modifications to the general weight adjustment strategy that had been used in past surveys also were implemented for the first time beginning with the 1999 CAI sample.

This general approach was used at several stages of the weight adjustment process, including (1) adjustment of household weights for nonresponse at the screener level, (2) poststratification of household weights to meet population controls for various demographic groups by State, (3) adjustment of household weights for extremes, (4) poststratification of selected person weights, (5) adjustment of person weights for nonresponse at the questionnaire level, (6) poststratification of person weights, and (7) adjustment of person weights for extremes.
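The weighting steps can be illustrated with a toy example. The sketch below uses simple ratio adjustments within a single adjustment cell, which is an assumption; the text does not say what adjustment model the survey used, and the extreme-weight step is omitted because its method is not described. All function names and numbers are hypothetical.

```python
from typing import List, Sequence


def design_weight(selection_probability: float) -> float:
    """Design-based weight: the inverse of the selection probability."""
    return 1.0 / selection_probability


def nonresponse_adjust(weights: Sequence[float],
                       responded: Sequence[bool]) -> List[float]:
    """Shift the weight of nonrespondents onto respondents in the same cell
    so that the respondents' weights sum to the full sample total."""
    full_total = sum(weights)
    respondent_total = sum(w for w, r in zip(weights, responded) if r)
    factor = full_total / respondent_total
    return [w * factor if r else 0.0 for w, r in zip(weights, responded)]


def poststratify(weights: Sequence[float], control_total: float) -> List[float]:
    """Scale weights in a cell so they sum to a known population total."""
    factor = control_total / sum(weights)
    return [w * factor for w in weights]


# Toy cell of four sampled persons; the second one did not respond.
probabilities = [0.01, 0.01, 0.02, 0.02]
responded = [True, False, True, True]

base = [design_weight(p) for p in probabilities]  # sums to about 300
adjusted = nonresponse_adjust(base, responded)    # still sums to about 300
final = poststratify(adjusted, control_total=330.0)

print([round(w, 1) for w in final])  # [165.0, 0.0, 82.5, 82.5]
```

In practice these adjustments would be applied within many cells defined by State and demographic group, and repeated at the household and person stages listed above.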

An important limitation of the NSDUH estimates of drug use prevalence is that they are designed to describe only the target population of the survey, i.e., the civilian noninstitutionalized population age 12 and older. Although this population includes almost 98% of the total U.S. population age 12 and older, it does exclude some important and unique subpopulations who may have very different drug-using patterns. The survey excludes active military personnel, who have been shown to have significantly lower rates of illicit drug use. Persons living in institutional group quarters, such as prisons and residential drug treatment centers, are not included in the NSDUH and have been shown in other surveys to have higher rates of illicit drug use. Also excluded are homeless persons not living in a shelter on the survey date, another population shown to have higher than average rates of illicit drug use.

Table 1. NSDUH sample sizes by demographic characteristics

                                               2002      2003
Total                                        68,126    67,784
Male                                         32,767    32,611
Female                                       35,359    35,173
Age group
  12 to 17 years                             23,645    22,665
  18 to 25 years                             23,066    22,738
  26 years and older                         21,415    22,381
Race, ethnicity
  White, non-Hispanic                        46,548    45,870
  Black, non-Hispanic                         8,278     8,153
  American Indian or Alaska Native              921       845
  Native Hawaiian or other Pacific Islander     273       252
  Asian                                       1,890     2,048
  More than one race                          1,405     1,543
  Hispanic                                    8,811     9,073

Note: These sample size figures are the unweighted number of completed interviews in the 2002 and 2003 National Surveys on Drug Use and Health.