Data Processing and Weighting

Kelly L. Myrick; Marko Salvaggio; Lacreisha Ejike-King; Sheba K. Dunston; Rashida Dorsey-Johnson; Meena Khare; Denys T. Lau

NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Myrick KL, Salvaggio M, Ejike-King L, et al. Planning, Development, Design, and Operation of the 2016 National Culturally and Linguistically Appropriate Services Survey for Office-based Physicians [Internet]. Atlanta (GA): National Center for Health Statistics (NCHS); 2025 Jan.


Data Edits and Quality Control

SRA International, Inc. was the data collection contractor for the National CLAS Physician Survey. After SRA collected electronic data from the web-based questionnaire, several steps were required for data processing. More detail on this process can be found in the public-use file documentation (15).

Upon receipt, all mailed questionnaires were grouped into batches. One data entry operator keyed each batch into the system, a second operator independently rekeyed it, and a third person resolved any discrepancies between the two entries. The discrepancy rate was 0.58 discrepancies per questionnaire (15). Survey eligibility status could not be determined for a large number of National CLAS Physician Survey physicians (n = 1,115, or 46.4%) (15).

Estimation Procedures and Weighting

Statistics produced from the 2016 National CLAS Physician Survey use a multistage estimation procedure designed to produce essentially unbiased national estimates (15). The weighting procedure has three components: 1) inflation by reciprocals of the selection probabilities, 2) ratio adjustment to fixed totals, and 3) adjustment for nonresponse (15). Each component is described in the subsections that follow; the information provided is drawn entirely from the 2016 National CLAS Survey public-use file documentation (15).
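The three components combine multiplicatively. The notation below is ours, not the report's: for physician i in sampling stratum h, n_h is the number of sampled physicians, N_h the number on the sampling frame, M_h the June 2017 master-file count of sample-eligible physicians, s_h the set of sampled physicians, and a_{c(i)} a single factor standing in for the sequential nonresponse and eligibility adjustments in the physician's adjustment cell c. Taking the ratio-adjustment denominator as a sum of base weights over the sampled physicians is an assumption, and the smoothing step is not represented.

```latex
\begin{aligned}
\pi_h &= \frac{n_h}{N_h}, \qquad d_i = \frac{1}{\pi_h} = \frac{N_h}{n_h} \\
\text{PS\_WGT}_i &= d_i \cdot \frac{M_h}{\hat{M}_h}, \qquad \hat{M}_h = \sum_{j \in s_h} d_j \\
\text{CLASWEIGHT}_i &= \text{PS\_WGT}_i \cdot a_{c(i)}
\end{aligned}
```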

Inflation by Reciprocals of Sampling Probabilities

The first weight component is the sampling weight, that is, the reciprocal of the physician's selection probability. Because the survey used a one-stage sample design, selection probabilities were determined by the sampling strata, which were defined by U.S. Census Bureau region and physician specialty group. For each stratum, the initial selection probability is the number of sampled physicians in the stratum divided by the total number of physicians listed in the sampling frame for that stratum.
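A minimal sketch of the base-weight calculation, assuming stratum keys of the form (Census region, specialty group); the keys and counts below are hypothetical. Because the selection probability is n_h / N_h, its reciprocal is simply N_h / n_h, which the code computes directly to avoid an extra floating-point division.

```python
from collections import Counter

# Hypothetical stratum key: (Census region, physician specialty group).
Stratum = tuple[str, str]


def base_weights(frame_counts: dict[Stratum, int],
                 sample_strata: list[Stratum]) -> dict[Stratum, float]:
    """Sampling (base) weight for each stratum represented in the sample.

    Selection probability pi_h = n_h / N_h, so the weight 1 / pi_h equals N_h / n_h.
    """
    sample_counts = Counter(sample_strata)  # n_h for each stratum
    return {h: frame_counts[h] / n_h for h, n_h in sample_counts.items()}


frame = {("Northeast", "Pediatrics"): 1000, ("South", "General surgery"): 500}
sample = [("Northeast", "Pediatrics")] * 10 + [("South", "General surgery")] * 20
weights = base_weights(frame, sample)
```

Every sampled physician in a stratum receives the same base weight; here each sampled pediatrician in the Northeast stands in for 100 frame physicians.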

Ratio Adjustment

The initial sampling weights were adjusted to ensure that estimates would reflect the physician population in 2016, when the survey was conducted. A post-ratio adjustment was made within each sampling stratum (defined by Census region and physician specialty group) to account for changes in the physician population represented in the sampling frame between the time the sample was selected and the time the survey was conducted. The numerator of the ratio adjustment was the number of sample-eligible physicians listed for the stratum in the American Medical Association and American Osteopathic Association master files obtained in June 2017 (that is, the first files obtained after the end of 2016); the denominator was the sample-based estimate of that same number. The resulting adjusted sampling weight is referred to as PS_WGT.
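The post-ratio adjustment can be sketched as follows. We assume, for illustration, that the denominator is the sum of base weights over all sampled physicians in the stratum; the stratum labels and control totals are hypothetical. After adjustment, the weights in each stratum sum to that stratum's updated master-file count.

```python
from collections import defaultdict


def ratio_adjust(records: list[tuple[str, float]],
                 control_totals: dict[str, float]) -> list[float]:
    """Post-ratio adjustment of base weights within strata.

    records: (stratum, base_weight) pairs for sampled physicians.
    control_totals: updated master-file count of sample-eligible physicians per stratum.
    Returns adjusted weights (PS_WGT in the report) in the same order as `records`.
    """
    estimated = defaultdict(float)
    for stratum, w in records:
        estimated[stratum] += w  # sample-based estimate of the stratum count
    factor = {h: control_totals[h] / est for h, est in estimated.items()}
    return [w * factor[stratum] for stratum, w in records]


records = [("A", 100.0)] * 10 + [("B", 25.0)] * 20
ps_wgt = ratio_adjust(records, {"A": 1100.0, "B": 450.0})
```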

Adjustment for Nonresponse and Eligibility

After the ratio adjustment, the National CLAS Physician Survey weights were adjusted for nonresponse and eligibility by assigning each sample unit to a composite response or eligibility class. Adjustments were made both for physicians whose eligibility for the survey could not be determined and for in-scope physicians who did not participate in the survey. Ultimately, the weights of nonrespondent or noneligible physicians were shifted to physicians deemed eligible respondents within the same Census region, specialty type (primary care, surgical specialty, or medical specialty), and physician specialty group. Smoothing techniques were used to avoid outlier weights. The weight resulting from these sequential adjustments is the final survey weight, CLASWEIGHT.
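One common way to implement sequential cell-based adjustments is sketched below. It is an illustration under stated assumptions, not the report's exact procedure: the disposition codes and cell labels are hypothetical, unknown-eligibility weight is first spread over all known-eligibility cases in the cell, eligible-nonrespondent weight is then spread over eligible respondents, and the smoothing step is omitted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Case:
    cell: str      # hypothetical adjustment-cell label (region x specialty type x specialty group)
    status: str    # hypothetical codes: "resp", "nonresp", "inelig", or "unknown"
    weight: float  # ratio-adjusted weight (PS_WGT in the report)


def adjust_unknown_and_nonresponse(cases: list[Case]) -> list[float]:
    # Step 1: spread unknown-eligibility weight over known-eligibility cases in the same cell.
    total, known = defaultdict(float), defaultdict(float)
    for c in cases:
        total[c.cell] += c.weight
        if c.status != "unknown":
            known[c.cell] += c.weight
    w1 = [0.0 if c.status == "unknown" else c.weight * total[c.cell] / known[c.cell]
          for c in cases]

    # Step 2: spread eligible nonrespondents' weight over eligible respondents in the same cell.
    eligible, responding = defaultdict(float), defaultdict(float)
    for c, w in zip(cases, w1):
        if c.status in ("resp", "nonresp"):
            eligible[c.cell] += w
        if c.status == "resp":
            responding[c.cell] += w

    # Only eligible respondents keep a final weight. Assumes each cell has at least
    # one respondent; in practice, sparse cells would be collapsed first.
    return [w * eligible[c.cell] / responding[c.cell] if c.status == "resp" else 0.0
            for c, w in zip(cases, w1)]


cases = [Case("A", "resp", 10.0), Case("A", "nonresp", 10.0),
         Case("A", "inelig", 10.0), Case("A", "unknown", 10.0)]
final = adjust_unknown_and_nonresponse(cases)
```

In this toy cell, two-thirds of the known-eligibility weight is eligible, so the respondent ends up representing two-thirds of the cell's total weight of 40.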

Final Estimation and Analytic Weights

The 2016 National CLAS Physician Survey data file contains CLASWEIGHT, the physician-level analysis weight for producing national estimates from sample data (15). Each record in the data file represents one sampled physician, who in turn represents physicians in the same region and specialty group. Summing CLASWEIGHT over the 397 sample records for 2016 yields an estimated total of 293,306 physicians in the United States (15). This total is about 11% lower than the estimate of 330,582 physicians obtained from the 2016 NAMCS (14). The difference is due to the large number of National CLAS Physician Survey physicians for whom survey eligibility status could not be determined (n = 1,115, or 46.4%; Table D, final disposition codes 5 and 7) (15).
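The weighted total and a weighted percentage are both simple sums over records. The sketch below uses hypothetical weight vectors; only the two published totals are taken from the text, and the last lines reproduce the size of the gap between the CLAS and NAMCS totals.

```python
def weighted_total(weights: list[float]) -> float:
    """Estimated population total: the sum of final weights over respondent records."""
    return sum(weights)


def weighted_percent(weights: list[float], in_group: list[bool]) -> float:
    """Weighted percentage of records flagged True in `in_group`."""
    return 100.0 * sum(w for w, g in zip(weights, in_group) if g) / sum(weights)


# Published totals quoted in the text.
CLAS_TOTAL = 293_306   # sum of CLASWEIGHT over the 397 records
NAMCS_TOTAL = 330_582  # 2016 NAMCS physician estimate
gap = NAMCS_TOTAL - CLAS_TOTAL
relative_gap = gap / NAMCS_TOTAL
```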

These weights allow data users to calculate estimates and the associated variances. Example code for SUDAAN, SAS, Stata, and SPSS is provided in Appendix III.
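Appendix III gives the official code. As a rough cross-check outside those packages, the sketch below estimates a weighted proportion and its Taylor-linearized standard error for a stratified one-stage design, treating each physician as a primary sampling unit and using the with-replacement approximation (no finite population correction). The stratum labels are hypothetical, since the design variable names in the public-use file are not given here.

```python
import math
from collections import defaultdict


def weighted_proportion_se(y: list[int], w: list[float],
                           strata: list[str]) -> tuple[float, float]:
    """Weighted proportion and its Taylor-linearized SE (stratified, with replacement)."""
    total_w = sum(w)
    p = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Linearized values of the ratio estimator sum(w*y) / sum(w).
    z = [wi * (yi - p) / total_w for wi, yi in zip(w, y)]
    by_stratum = defaultdict(list)
    for zi, h in zip(z, strata):
        by_stratum[h].append(zi)
    var = 0.0
    for zs in by_stratum.values():
        n_h = len(zs)
        if n_h < 2:
            continue  # single-unit stratum: no within-stratum variance; real software collapses strata
        mean = sum(zs) / n_h
        var += n_h / (n_h - 1) * sum((zi - mean) ** 2 for zi in zs)
    return p, math.sqrt(var)


p, se = weighted_proportion_se([1, 0, 1, 0], [1.0] * 4, ["A"] * 4)
```

With equal weights and a single stratum, the result matches the familiar simple-random-sample value sqrt(p(1 - p) / (n - 1)), which is a useful sanity check.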

Assessment of Nonresponse Bias and Weighting Evaluation

Nonresponse bias in the National CLAS Physician Survey estimates was evaluated at the physician level through three comparisons. Comparison 1 contrasted estimates for all sampled physicians with estimates for CLAS respondents, both using the ratio-adjusted weights (PS_WGT). A difference between the selected sample and the respondent sample would indicate potential bias before the nonresponse and eligibility adjustments. Note that if the selected sample contains many ineligible physicians, the characteristics of eligible respondent physicians may differ from those of all sampled physicians even in the absence of nonresponse bias. Comparison 2 contrasted estimates for CLAS respondents using the final weights adjusted for nonresponse and unknown eligibility (CLASWEIGHT) with estimates for all sampled physicians using PS_WGT. This comparison was intended to indicate whether the nonresponse and unknown-eligibility adjustments improved the estimates by reducing nonresponse bias. Comparison 3 contrasted estimates for CLAS respondents using CLASWEIGHT with estimates for 2016 NAMCS respondents using that survey's final weights (PHYZWT), to indicate whether differences exist for variables shared by the two surveys. The 2016 NAMCS was used for this analysis because it was conducted among office-based physicians in the same survey year. Wald 95% confidence intervals were constructed for each comparison.
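A Wald interval is the estimate plus or minus a normal quantile times its standard error. The sketch below assumes a standard normal quantile; survey software sometimes substitutes a t quantile based on design degrees of freedom, which the text does not specify. The estimate and standard error in the usage line are hypothetical.

```python
from statistics import NormalDist


def wald_ci(estimate: float, se: float, level: float = 0.95) -> tuple[float, float]:
    """Symmetric Wald interval: estimate +/- z * SE, with z from the standard normal."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return estimate - z * se, estimate + z * se


# Hypothetical weighted percentage and standard error.
lo, hi = wald_ci(50.0, 2.0)
```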

Comparing different weighting-adjustment methods or different survey systems by direct testing requires an understanding of all sources of variation. When that information is limited or difficult to assess, an "overlap of confidence intervals" method is often used (29), and this approach was used for this analysis. Although there are multiple ways to assess nonresponse bias, the confidence interval method was chosen because it accommodates estimates from complex survey designs. It also does not require a gold standard, and none has been established for the National CLAS Physician Survey and NAMCS measures used in this analysis. Nonoverlapping confidence intervals, whether between the two weighting schemes within the CLAS survey or between the CLAS survey and NAMCS, were considered evidence of a difference; overlapping intervals were treated as an indication that the estimates were similar.
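The decision rule itself reduces to an interval-overlap check. The interval endpoints in the test below are hypothetical. Note that the rule is conservative for detecting differences: two intervals can overlap even when a direct test of the difference would reject equality, so "similar" here means "no difference detected by this rule".

```python
Interval = tuple[float, float]


def intervals_overlap(a: Interval, b: Interval) -> bool:
    """True if two closed intervals [lo, hi] share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def overlap_rule(a: Interval, b: Interval) -> str:
    """Rule described in the text: no overlap -> 'different'; overlap -> 'similar'."""
    return "similar" if intervals_overlap(a, b) else "different"
```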

Potential bias was assessed by sex (female or male), physician age (younger than 50 or 50 and older), and metropolitan status of the physician practice (metropolitan statistical area or nonmetropolitan statistical area). These variables were available in the sample files for both surveys and were not used in the weight adjustments. Other unmeasured factors may also have contributed to nonresponse bias; because they were not measured, they could not be compared in the bias assessment.

Results of Nonresponse Bias

Figure 1 compares the weighted percent distributions of all sampled physicians and respondents in the 2016 National CLAS Physician Survey by physician sex, physician age, and metropolitan status of the physician practice. Using the base weight (PS_WGT), estimates for female and male physicians were similar between respondents and all sampled physicians. The confidence intervals overlapped, although they were wider for respondents than for all sampled physicians, as expected given the smaller number of respondents. Results for respondents using the adjusted weight (CLASWEIGHT) were similar to those using the base weight, and results by metropolitan statistical area status followed the same pattern. Estimates by age did not. With base weights, the percentage of physicians younger than 50 was lower among respondents than among all sampled physicians, and the confidence intervals did not overlap. The final weights increased the estimate for respondents younger than 50, so that its confidence interval overlapped with that for all sampled physicians using base weights, although the estimate remained lower (33.8% for respondents compared with 41.4% for all sampled physicians). Because the two age groups are complementary, the results for physicians age 50 and older mirrored this pattern in the opposite direction: the estimate for respondents was higher than that for all sampled physicians, and the final weights moved it toward the sampled-physician estimate.
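The complementarity argument can be written out with the two reported figures. In our notation, R denotes respondents using final weights and S denotes all sampled physicians using base weights; the sketch assumes every physician falls in exactly one of the two age groups (no missing age values), so the two percentages sum to 100.

```latex
\begin{aligned}
\hat{p}^{R}_{<50} - \hat{p}^{S}_{<50} &= 33.8 - 41.4 = -7.6 \\
\hat{p}_{\ge 50} &= 100 - \hat{p}_{<50} \\
\hat{p}^{R}_{\ge 50} - \hat{p}^{S}_{\ge 50} &= -\left(\hat{p}^{R}_{<50} - \hat{p}^{S}_{<50}\right) = 66.2 - 58.6 = 7.6
\end{aligned}
```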

Figure 1

Weighted percent distribution of all sampled physicians and respondents, by selected physician characteristics, 2016 National CLAS Physician Survey

Figure 2 compares weighted percent distributions from the National CLAS Physician Survey, using the final weights adjusted for nonresponse (CLASWEIGHT), with those from the 2016 NAMCS (PHYZWT). Based on the overlap of the confidence intervals, no significant differences between the National CLAS Physician Survey and the 2016 NAMCS were observed for physician sex, physician age, or metropolitan statistical area status of the physician practice.

Figure 2

Final estimates for 2016 National CLAS Physician Survey and 2016 National Ambulatory Medical Care Survey

All material appearing in this report is in the public domain and may be reproduced or copied without permission; citation as to source, however, is appreciated.

Bookshelf ID: NBK612216


Vital and Health Statistics. Series 1. Programs and Collection Procedures
