Development and Initial Validation of an Instrument to Measure Physicians' Use of, Knowledge about, and Attitudes Toward Computers
- Affiliations of the authors: University of North Carolina, Chapel Hill, NC (RDC); University of Virginia, Charlottesville, VA (WMD); University of Pittsburgh, Pittsburgh, PA (CPF)
- Correspondence and reprints: Randy D. Cork, MD, Oceania, Suite 103, 3145 Porter Drive, Palo Alto, CA 94304.
- Received 29 April 1996
- Accepted 19 September 1997
This paper describes details of four scales of a questionnaire—“Computers in Medical Care”—measuring attributes of computer use, self-reported computer knowledge, computer feature demand, and computer optimism of academic physicians. The reliability (i.e., precision, or degree to which the scale's result is reproducible) and validity (i.e., accuracy, or degree to which the scale actually measures what it is supposed to measure) of each scale were examined by analysis of the responses of 771 full-time academic physicians across four departments at five academic medical centers in the United States. The objectives of this paper were to define the psychometric properties of the scales as the basis for a future demonstration study and, pending the results of further validity studies, to provide the questionnaire and scales to the medical informatics community as a tool for measuring the attitudes of health care providers.
Methodology The dimensionality of each scale and degree of association of each item with the attribute of interest were determined by principal components factor analysis with orthogonal varimax rotation. Weakly associated items (factor loading < .40) were deleted. The reliability of each resultant scale was computed using Cronbach's alpha coefficient. Content validity was addressed during scale construction; construct validity was examined through factor analysis and by correlational analyses.
Results Attributes of computer use, computer knowledge, and computer optimism were unidimensional, with the corresponding scales having reliabilities of .79, .91, and .86, respectively. The computer-feature demand attribute differentiated into two dimensions: the first reflecting demand for high-level functionality with reliability of .81 and the second demand for usability with reliability of .69. There were significant positive correlations between computer use, computer knowledge, and computer optimism scale scores and respondents' hands-on computer use, computer training, and self-reported computer sophistication. In addition, items posited on the computer knowledge scale to be more difficult generated significantly lower scores.
Conclusion The four scales of the questionnaire appear to measure with adequate reliability five attributes of academic physicians' attitudes toward computers in medical care: computer use, self-reported computer knowledge, demand for computer functionality, demand for computer usability, and computer optimism. Results of initial validity studies are positive, but further validation of the scales is needed. The URL of a downloadable HTML copy of the questionnaire is provided.
The potential benefits of the application of computers to medical care are well recognized1; however, physicians must adopt and utilize computer technology as a part of their practices if these benefits are to be realized. Some authors believe that the medical profession as a whole has been slow to utilize computers for patient care.1 2
Many factors affect the use of computers by physicians, including personality characteristics,3 4 speciality,5 prior computing experience,5 and attitude toward computers and medical computing.6 7 8 Young7 notes “the nature of the doctor's work, his attitudes, interests, and enthusiasms” to be “the major reason for the non-acceptance of computer systems.” Anderson et al.5 9 found that physicians' attitudes were significantly related to hospital information system (HIS) use9 and that these attitudes “account for a significant portion of variance in HIS use even when other variables are controlled…. ”5 For this reason, it is important to develop methods for understanding and accurately measuring attributes of physicians and other health professionals that may predict their acceptance and mode of use of computer systems and thus guide the design of such systems. These attributes include how physicians currently use computers and how much they know about computers as well as their relevant beliefs and attitudes. This need has already been recognized by Farrell et al.,10 who note (in reference to practicing psychologists) that such measures “could be used to further explore the relationship between attitudes and computer implementation, to identify variables related to practitioners' attitudes toward computers, and to design and evaluate the impact of interventions aimed at overcoming practitioner resistance.”
A common approach to such measurement is the self-administered questionnaire composed of multiple separate items organized into scales, with each scale assumed to measure a particular attribute or attitude dimension. Use of multiple items to assess each dimension is essential to the measurement process. To develop these questionnaires it is necessary to conduct studies that examine the reliability and validity of the measurement process itself. Such measurement studies are distinct from more common demonstration studies, which make descriptive or comparative assertions based on the results of measurements. Measurement studies are important because they: 1) determine the psychometric properties (reliability and validity) of an instrument and consequently the degree of confidence that can be placed in assertions based on that instrument, and 2) define and document the instrument for reuse by future researchers.11
Two important properties of an instrument determined by measurement studies are reliability and validity. Reliability is generally synonymous with precision and indicates the degree to which the measurement process is consistent or reproducible. Reliability may be quantified by administering an instrument to the same group of subjects multiple times (test-retest reliability) or by examining the concordance between multiple items provided once to a group of subjects (internal consistency reliability). Cronbach's alpha12 is one commonly accepted measure of internal consistency reliability. The value of alpha ranges from zero (unreliable) to one (perfect reliability), with a value of .70 or greater considered acceptable for most purposes.13 Validity is generally synonymous with accuracy and is the degree to which the process measures what it is intended to measure. Three kinds of validity are generally recognized: 1) content validity: do the items appear to measure what they are intended to measure? 2) construct validity: do the item scores intercorrelate with other measures as expected? and 3) criterion-related validity: do the item scores correlate with an external standard?14
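To make the internal consistency calculation concrete, the following minimal sketch computes Cronbach's alpha from its standard formula: k/(k - 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score, where k is the number of items. It is an illustration only, not the analysis code used in this study; the function name and the demonstration data are hypothetical.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents.
    item_vars = [pvariance([row[i] for row in responses]) for i in range(k)]
    # Variance of the summed scale score across respondents.
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical data: four respondents answering three items on a 1-5 scale.
demo = [
    [1, 2, 1],
    [2, 3, 2],
    [4, 4, 5],
    [5, 5, 4],
]
alpha = cronbach_alpha(demo)
print(round(alpha, 3))
```

Because the same variance estimator is used in the numerator and denominator, the choice between population and sample variance does not change the result.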
Measurement studies have been conducted for instruments measuring attitudes toward computers among varying groups including students,15 16 17 18 19 20 hospital information-system personnel,21 psychologists,10 and nurses.22 23 24 These studies have provided well-defined instruments that have subsequently been used by other authors to examine the attitudes of new populations. For example, at least six studies25 26 27 28 29 30 have examined nurses' attitudes toward computers using the “Nurses' Attitudes Toward Computerization Questionnaire” developed by Stronge and Brodt in 1985.23 Several compendia of survey instruments that measure attitudes of workers both in and out of health care are available.31 32 33
A number of surveys of physicians' attitudes toward computers in medical care have been conducted over the past 30 years. Most34 35 36 37 38 39 40 41 42 43 44 45 46 47 report only demonstration results and have not provided any evaluation of the psychometric properties of the measurement instrument employed. Others, as summarized in Table 1,48 49 50 51 52 53 54 55 have provided psychometric information as part of their reported results. These studies have addressed a wide variety of constructs, which may be categorized as opinions on computer characteristics, computer effects on health care, computer effects on health care personnel, prior computer experience, general attitudes towards computers, attitudes toward computer use, attitudes toward computer use in medicine, and user characteristics. As shown in Table 1, each study typically begins with a set of a priori constructs: the attributes that the instrument is hypothesized to be assessing. The subsequent data analysis, often employing the statistical technique of factor analysis, generates a set of a posteriori constructs: what, based on the data, the instrument appears to be assessing.
Although it is possible to use instruments developed for other health professions to measure physicians' attitudes toward computers,48 this approach may prove less than satisfactory. First, it is not known whether physicians and other professionals share a similar structure of attitudes and beliefs. Attitudes toward computers have been shown to differ among professions, including professions within health care.50 51 Such differences are not surprising, given the differing training, experience, roles, and activities of these professions. Even more important, instruments developed for other professions (e.g., nursing) may not address the unique training, roles, activities, and responsibilities of physicians.
While the literature on attitudes of physicians toward computers is fairly extensive, some important attributes have not been rigorously explored. These include for what purposes health professionals actually use computers and how much they know about the underlying technology. While several authors have previously measured physician attitudes toward the use of computers or have measured actual physician use of computers in medicine, only Anderson et al.5 8 9 have addressed computer use as a psychologic construct. To our knowledge, psychometric analysis of physician knowledge of computers has not been previously reported. One prior study49 measured end-user computer sophistication without specification of parameters describing how accurate or precise measurements using these methods would be. Teach and Shortliffe's widely cited study, published in 1981,52 employed as a priori constructs physicians' expectations, demands, acceptability, experience, and knowledge of computer-based consultation systems. While the wide recognition of this study suggests that the included constructs are of importance to the field, the focus on consultation systems and the use of a convenience sample of professional meeting attendees to validate the instrument are factors limiting the ability to generalize their results.
Study Goals and Questions
Because physicians' attitudes and other attributes appear to be important in determining the use of computers and because existing instruments may be less than satisfactory for measuring these attributes, we sought to develop an instrument specifically designed for physicians that measures with well-defined psychometric properties four important attributes regarding computers in medical care. To these ends, we convened a working group to modify the instrument originally used in the often-cited Teach and Shortliffe study.52 After developing the questionnaire,* we administered it to academic physicians at five institutions, generating a study sample with 771 subjects. The resulting data allowed us to explore the psychometric properties of the item sets (scales) purported to address each attribute:
Is the dimensionality of each attribute, as measured by the scale, as hypothesized?
Which items appear not to address the attribute and thus not to belong in the item set?
What is the reliability of the scales formed by these item sets?
To what extent does each scale appear to be a valid measure of the associated attribute?
Demonstration aspects of this research, focusing on the measured values of the attributes and their relationships to a variety of physician characteristics, have been the subject of some preliminary work56 57 and will be the subject of a future report.
In developing a questionnaire rooted in the instrument developed by Teach and Shortliffe,52 our goal was to develop an instrument both more general than the original in its evaluation of physicians' attitudes toward computer-based clinical decision aids and more representative of the current medical computing environment, yet similar enough to allow comparison with the results of the original study. In addition, we designed the new instrument to include measures of computer use, not included in most prior studies, and to specifically address the roles and activities of physicians.
To develop the instrument, we established a six-member group experienced in medical informatics and evaluation/measurement techniques. The group comprised two of this manuscript's coauthors (WMD and CPF) as well as four persons whose contributions are cited in the acknowledgments. The group engaged in an item-design process that proceeded over four months. After reviewing the original Teach and Shortliffe instrument and results of the reported study of its measurement properties,52 the group conceptualized four physician attributes to be assessed by the revised instrument: 1) extent of computer use; 2) self-reported knowledge of computer technology; 3) feature demand: how sophisticated information systems must be before physicians would be willing to use them; and 4) optimism about the impact of information technology on health care. As indicated in Table 1, these attributes closely resemble those addressed by the Teach and Shortliffe instrument. The new instrument was created by adding and deleting items from the original. Other items were modified to broaden their scope or update them in light of more recent technology. The final instrument, which has been briefly described elsewhere,56 57 consists of 89 items in four sections:
Section 1: Demographics. Respondent's age, gender, medical specialty/subspecialty, and percentage of professional time spent in each of the typical activities of an academic physician (clinical care and clinical teaching, didactic teaching, research, administration).
Section 2: Computer Experience. Number of hours of hands-on computer use per week, type of computer(s) used (IBM-compatible, Macintosh, terminal), configuration and location of computer(s) used (desktop at office, desktop at home, laptop), extent of prior computer training and experience, and self-rated computer sophistication. This section also included a set of ten items hypothesized to assess the “computer use” attribute. Each item listed a specific task undertaken by an academic physician along with five options for the respondent to indicate the relative frequency with which he or she personally uses a computer for this task.
Section 3: Computer Knowledge. This section comprised the 18 items used to assess the “self-reported computer knowledge” attribute. For each item, using a three-point response scale, respondents indicated the extent of their understanding of the distinction between a pair of medical computing concepts—for example, “hardware versus software.” This format was adapted directly from the Teach and Shortliffe instrument.
Section 4: Applications of Computers in Medicine. This part of the survey included three subsections. The first listed 18 potential functions of computers in medicine and asked the respondent to indicate the six considered highest priority and the six considered lowest priority for future system development. This subsection is not viewed as measuring an attribute of the respondent and is not further considered here. The second subsection included 17 items assessing the attitude of “feature demand.” Each item presented a feature or capability of a medical computing system. Using a five-point scale, respondents indicated the extent to which it was necessary that a system have each feature. The third subsection included the 18 items assessing the “computer optimism” attribute, which was modified from the “expectations” scale in the Teach and Shortliffe instrument. Each item listed a potential effect of computers on medicine or health care. Using a five-point response scale, the respondent indicated the extent to which each effect is considered beneficial or detrimental.
The sample consisted of 1,478 full-time physician faculty members in the Departments of Internal Medicine, Surgery, Radiology, and Radiation Oncology at Stanford University, the University of North Carolina at Chapel Hill, the University of California at San Francisco, Northwestern University, and the University of Illinois at Chicago. Responses were received from 771 subjects, for a response rate of 52%. The four specialties sampled reflect a diversity of medical practice. The institutions in the sample span a range of governance modes and geographic regions.
Questionnaires were distributed via campus mail accompanied by a cover letter generated by a faculty member identified with medical informatics at each institution. The cover letter assured confidentiality of the responses. Completed instruments were returned via campus mail. A second questionnaire was mailed to all subjects four to five weeks after the initial mailing, with a response requested only from those who had previously not responded. It is estimated that the instrument required 20 minutes to complete.
Responses were entered into a personal computer spreadsheet and checked for accuracy. Analyses were performed using Microsoft Excel for Windows version 5.0 and the statistical analysis program SYSTAT for Windows version 5.05 (SYSTAT Inc., Evanston, IL).
Analyses focused on the four item sets hypothesized to address computer use, computer knowledge, feature demand, and optimism. Using pairwise deletion of missing values, we conducted a principal components factor analysis with orthogonal varimax rotation for each item set.58 Each analysis was performed initially specifying one-, two-, and three-factor solutions. The sorted factor loadings, eigenvalues,58 59 and scree plots60 resulting from these analyses were examined to identify the number of dimensions or factors that made up the best solution for each item set. Some respondents had multiple missing values within an item set. We therefore established, for each item set, a threshold number of responses necessary to include the subject in the factor analysis of that set. Subjects below the threshold were excluded.
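The rotation step can be illustrated for the two-factor case with the sketch below. It is not the SYSTAT procedure used in this study: it assumes the unrotated loadings are already available, applies Kaiser's closed-form angle for raw (unnormalized) varimax, and uses hypothetical loadings and function names. Statistical packages may differ in details such as Kaiser normalization.

```python
import math

def varimax_two_factor(loadings):
    """Rotate a two-column loading matrix by Kaiser's closed-form varimax angle."""
    p = len(loadings)
    u = [x * x - y * y for x, y in loadings]
    v = [2 * x * y for x, y in loadings]
    a, b = sum(u), sum(v)
    c = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
    d = 2 * sum(ui * vi for ui, vi in zip(u, v))
    # atan2 selects the quadrant that maximizes, rather than minimizes, the criterion.
    phi = math.atan2(d - 2 * a * b / p, c - (a * a - b * b) / p) / 4
    cos_p, sin_p = math.cos(phi), math.sin(phi)
    return [(x * cos_p + y * sin_p, -x * sin_p + y * cos_p) for x, y in loadings]

def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    p = len(loadings)
    total = 0.0
    for col in zip(*loadings):
        sq = [value * value for value in col]
        total += sum(s * s for s in sq) / p - (sum(sq) / p) ** 2
    return total

# Hypothetical unrotated loadings: every item loads equally on both factors.
unrotated = [(0.5, 0.5), (0.6, 0.6), (0.5, -0.5), (0.6, -0.6)]
rotated = varimax_two_factor(unrotated)
before = varimax_criterion(unrotated)
after = varimax_criterion(rotated)
for row in rotated:
    print([round(value, 3) for value in row])
```

After rotation each hypothetical item loads on a single factor, which is the "simple structure" that makes sorted loading tables easier to interpret.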
After determining the dimensionality of each item set from the factor analyses, we examined the factor loadings to determine whether all items in the set were associated with the attribute of interest. Items with a factor loading less than .40 were deleted. We computed the reliability, using Cronbach's alpha coefficient, of the resulting item set for each attribute. The reliability coefficient indicates the precision of measurement conducted by assigning each respondent an attribute score based on the summed (or averaged) responses across the items in the set.
The methods used in this study also allowed us to address some aspects of the validity of each attribute. Content validity was addressed through the instrument development process, both by basing the items on the prior Teach and Shortliffe instrument52 and by collegial development of the new items using a panel experienced in medical informatics. Construct validity was established in part by the results of the factor analysis. We hypothesized that the use, knowledge, and optimism attributes would be unidimensional. Based on the Teach and Shortliffe study,52 we expected a multidimensional structure for the feature demand scale. Construct validity was also explored by examination of the correlations among the attributes themselves and by examination of correlations between the attributes and other characteristics of the respondents as measured by selected other items of the survey. Specifically, we hypothesized that the computer use and computer knowledge attributes should be highly intercorrelated, whereas the other attributes should be only modestly intercorrelated. In the special case of the item set addressing computer knowledge, subsets of the items were hypothesized to fall into three categories of difficulty. Higher mean scores for items seen as less difficult would be evidence of the construct validity of this scale. Criterion-related validity was not addressed explicitly in this study.
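The item-deletion rule can be sketched as follows. This is an illustration under stated assumptions rather than the procedure as implemented in the study: the function name, item labels, and loadings are hypothetical, and assigning each retained item to the factor on which it loads most strongly is one plausible reading of how multi-factor item sets are split into subscales.

```python
def assign_items(loadings, threshold=0.40):
    """Map each retained item to the factor it loads on most strongly.

    `loadings` maps an item label to a tuple of rotated loadings, one per factor.
    Items whose largest absolute loading is below `threshold` are dropped.
    """
    retained, dropped = {}, []
    for item, row in loadings.items():
        best = max(range(len(row)), key=lambda f: abs(row[f]))
        if abs(row[best]) < threshold:
            dropped.append(item)
        else:
            retained[item] = best
    return retained, dropped

# Hypothetical two-factor loadings; labels and values are illustrative only.
demo_loadings = {
    "item_a": (0.72, 0.10),
    "item_b": (0.15, 0.61),
    "item_c": (0.39, 0.22),
    "item_d": (0.40, 0.05),
}
retained, dropped = assign_items(demo_loadings)
print(retained, dropped)
```

With a strict "less than .40" rule, an item loading exactly .40 is kept while one loading .39 is deleted.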
In this section we first report some demographic characteristics and other characteristics of the respondents. We then report factor analysis results, with reliability indices, for each item set. Following this, we include a section addressing validity of all attributes.
Of the respondents (n = 771), 80.4% were male; the average age was 45.0 (±.4)† years. The distribution of specialties was 55.6% internal medicine, 23.9% surgery, 11.5% radiology, and 2.6% radiation oncology. An additional 6.4% of respondents reported specialties in other fields, primarily emergency medicine. Since these persons were on mailing lists of the targeted departments and likely had joint appointments, we retained them in the sample.
Respondents indicated that they devoted, on average, 49.1% (±.9%) of their professional time to clinical care and clinical teaching, 26.7% (±.9%) to research, 15.3% (±.6%) to administration, and 8.9% (±.3%) to didactic teaching. They reported a mean of 9.5 (±.3) hours of hands-on use of a computer per week. The modal respondent had participated in one (of six possible) types of computer training, with “self-guided learning about computers” as the dominant type. Respondents self-rated their computer sophistication on a five-point scale ranging from “very unsophisticated” (coded to one) to “very sophisticated” (coded to five). Mean score was 2.8, with a mode and median of 3.
Factor Analyses and Reliabilities
Items Assessing Computer Use
For the set of ten items addressing computer use, response options ranged from “Never perform this task” (coded to one) to “Always use a computer” (coded to five). Excluded from analysis were the responses of 87 physicians who responded “zero” to a preceding question, “In a typical week, how many hours do you personally use a computer hands-on?” Therefore, the results for this item set reflect only computer users in the sample. Also excluded were the responses of three additional physicians who completed fewer than eight of the ten items. Results of the remaining 681 respondents, including only those items with factor loading greater than .40 and sorted by factor loading, are provided in Table 2. Scree-plot analysis supported a one-factor solution including seven of the ten items and explaining 46% of the total variance. The reliability of the resulting seven-item scale was .79.
Adopting a one-factor solution necessitated that three items relating primarily to clinical uses of computers (“documenting patient information,” “accessing clinical data,” and “scheduling patient appointments”) were eliminated from the scale due to low factor loadings. This affected the interpretation of the scale in ways that will be discussed below.
Items Assessing Computer Knowledge
For this set, responses to each pair of computing concepts ranged from “I don't understand the distinction at all” (coded to one) to “I can define the distinction precisely” (coded to three). Responses of 16 physicians who completed fewer than 16 of the 18 items were excluded. Results for the remaining 755 respondents are provided in Table 3, sorted by factor loading. The scree plot supported a one-factor solution explaining 41% of the total variance. All 18 items displayed factor loadings greater than .40 and were retained. The reliability of the resulting scale was .91.
Items Assessing Feature Demand
For this item set, respondents rated each of 17 features of computer systems on a response scale ranging from “Vitally necessary” (coded to one) to “Not necessary” (coded to four). Responses of 86 physicians who completed fewer than 15 of the 17 items were excluded, leaving 685 respondents in the analysis. As shown in Table 4, factor analysis suggested that this item set was two-dimensional.
Two items (“Availability of on-line help” and “Confidentiality and security better than the paper record”) displayed factor loadings of less than .40 for both factors and were eliminated from both scales.
The first factor explained 23% of the total variance and included seven items with a reliability of .81. By inspection of the items loading on this first dimension, it was characterized as “demand for sophisticated computer features” as measured by demand for the capability to explain the rationale for patient care advice, provide accurate treatment recommendations, make accurate diagnoses, and other functions as shown in Table 4.
The second factor explained 17% of the total variance and included eight items with a reliability of .69. It was characterized as “demand for usability” as measured by demand for the capability to respond to queries in less than five seconds, display images in less than 30 seconds, allow access at any place in clinical setting, and other functions as shown in Table 4.
Items Assessing Optimism
In this set of 17 items, responses ranged from a belief that each stated impact of computers on health care would be “highly detrimental” (coded one) to “highly beneficial” (coded five). Responses of 50 physicians who completed fewer than 16 of the 17 items were excluded. Results of the remaining 721 respondents are provided in Table 5, sorted by factor loading. Results were consistent with a one-factor solution.
One item (“access to health care in remote or rural areas”) displayed a factor loading of .39 and was eliminated from the scale. The resulting one-factor, 16-item scale explained 34% of the total variance with reliability of .86.
Attribute Score Construction and Validity Analyses
As indicated previously, content validity for the item sets was addressed by grounding the questionnaire development in an earlier survey instrument and in the development of the form with guidance from an expert panel. Construct validity was established in part by the factor analyses and in part through an additional set of correlational analyses described separately for each item set below. To conduct these analyses, we first computed for each respondent a score for each attribute by summing over the items retained for each attribute. Because the feature demand item set was two-dimensional, five attribute scores were generated for each respondent. Because of the multiple comparisons made, only p values less than .01 were considered significant.
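The scoring and correlation steps can be sketched as below. The item labels, responses, and hours-of-use values are hypothetical, and the sketch computes only the Pearson correlation coefficient; the significance test against the .01 threshold is omitted.

```python
import math

def attribute_score(responses, retained_items):
    """Sum one respondent's answers over the items retained for an attribute."""
    return sum(responses[item] for item in retained_items)

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical respondents; "q3" stands in for an item deleted after factor analysis.
answers = [
    {"q1": 1, "q2": 2, "q3": 1},
    {"q1": 3, "q2": 3, "q3": 5},
    {"q1": 4, "q2": 5, "q3": 2},
    {"q1": 5, "q2": 4, "q3": 4},
]
hours_per_week = [0, 5, 10, 20]

scores = [attribute_score(a, ["q1", "q2"]) for a in answers]
r = pearson_r(scores, hours_per_week)
print(scores, round(r, 3))
```

Averaging instead of summing the retained items would rescale the scores without changing the correlation.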
Tables 6 and 7 report correlation coefficients among the attribute scores and between the attribute scores and specific other items in the questionnaire that were employed in the construct validation studies.
For the computer-use items, we hypothesized that respondents with higher scores on the computer-use attribute would report greater times of hands-on computer use, more computer training experience, and greater self-reported computer sophistication. As shown in Table 6, all three posited correlations were positive and significant.
For the computer knowledge items (see Table 3), each of the 18 items had been placed a priori into categories seen as “easy,” “intermediate,” or “difficult.” As a test of construct validity it was expected that the mean responses to items in each of these categories would differ. The means (± SEM) were: 2.48 (±.02) for the “easy” items, 2.2 (±.02) for the “intermediate” items, and 1.5 (±.02) for the “difficult” items. By repeated measures analysis of variance, this difference was highly significant (F(2, 1498) = 2903.8, p < .0001). It was also hypothesized that respondents with greater computer knowledge would display greater levels of self-reported computer use, computer training, and computer sophistication. As shown in Table 6, these correlations are both positive and significant.
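A one-way repeated measures analysis of variance of this kind can be sketched as follows. The data and function name are hypothetical and the sketch is not the SYSTAT analysis reported here. With three difficulty categories, the error degrees of freedom are 2(n - 1), so the reported value of 1498 would be consistent with 750 respondents contributing a mean score in all three categories.

```python
def repeated_measures_anova(data):
    """One-way repeated-measures ANOVA.

    `data` is a list of subjects, each a list with one score per condition.
    Returns (F, df_conditions, df_error).
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]
    # Partition the total sum of squares into condition, subject, and error parts.
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_cond - ss_subj
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f_ratio = (ss_cond / df_cond) / (ss_error / df_error)
    return f_ratio, df_cond, df_error

# Hypothetical mean scores per respondent: easy, intermediate, difficult items.
demo_scores = [
    [3.0, 2.0, 1.0],
    [2.5, 2.5, 1.5],
    [3.0, 2.0, 2.0],
    [2.5, 1.5, 1.5],
]
f_ratio, df_cond, df_error = repeated_measures_anova(demo_scores)
print(round(f_ratio, 2), df_cond, df_error)
```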
For the feature demand items, it was expected that less demanding respondents would spend more time with computers. Results of the correlational analysis, provided in Table 6, revealed a small and nonsignificant correlation.
For the optimism items, we hypothesized that respondents more optimistic about the impact of computers on health care would display greater weekly computer use, computer training, and computer sophistication. As shown in Table 6, small but still significant positive correlations are present.
Correlations among the five attribute scores are reported in Table 7. As hypothesized, a sizable and significant positive correlation is present between computer use and computer knowledge. Other correlations are small, even though some are significant because of the large sample size.
This report has focused on the measurement properties of a survey instrument to assess aspects of physicians' use, knowledge, and beliefs about computers in health care. This work differs in several ways from most prior studies of physicians' attitudes towards computers in medical care. First, the sample size of 771 is larger than that of prior works. Second, our work was guided by the earlier study of Teach and Shortliffe52 with defined a priori attributes and item sets hypothesized to assess each attribute. Third, the study distinguishes measurement issues, reported here, from demonstration issues to be reported in a separate report later.
Computer Use Attribute
Of the four attributes evaluated, the factor structure of the items addressing computer use was the least clear. The one-factor solution covered seven aspects of computer use; however, only one item directly relating to clinical computing (obtaining diagnostic/therapeutic advice) was retained. The remaining clinical items (documenting patient information, accessing clinical data, scheduling patient appointments) exhibited low loadings on the single factor and were excluded. A two-factor solution (not included) for this attribute created two four-item factors readily interpretable as “academic computing” and “clinical computing”; however, the two remaining items loaded on both factors. Also, while the four items loading on the “academic” factor in a two-factor solution would have resulted in a scale with acceptable reliability, the four items loading on the “clinical” factor exhibited a reliability level too low to be useful for research. Therefore, we rejected a two-factor solution.
To the extent that these academic physicians have a single measured value of “computer use,” this use supports academic rather than clinical responsibilities; there is only weak evidence here to support “clinical computer use” as a construct. This may be attributable to the fact that end-user tools supporting academic activity have been widely available for two decades, whereas the analogous end-user tools for clinical computing are relatively new at the institutions included in this study. Item sets assessing computer use should be revalidated in the near future to see whether the development and use of clinical computing applications will change this result.
Computer Knowledge Attribute
This scale measures one factor with high reliability and with all items retained. The Teach and Shortliffe study measured knowledge of computing concepts in a similar way with 22 items but did not conduct factor analysis of responses. Although our scale is based on theirs, we extended their work by generating subsets of items purported to be of differing levels of difficulty. The finding that items hypothesized to be more difficult generated lower mean scores adds substantially to the evidence supporting construct validity of this scale.
It is important to emphasize that this item set measures perceived knowledge, rather than actual knowledge as might be determined by a test administered under controlled conditions. We felt, as apparently did Teach and Shortliffe, that measuring perceived knowledge was a more practical strategy both because testing of actual knowledge may have been resented by respondents and because such testing could not have been administered under controlled or proctored conditions.
Feature Demand Attribute
This study confirmed the findings of Teach and Shortliffe that physicians' demand for functionality of computer systems is multidimensional. Examining the “demand” construct, they discovered two factors: demand for performance and demand for system accuracy. Our data differentiated two subscales that relate closely to those factors: the “sophisticated features” subscale taps a physician's belief that systems must provide high-level functionality, and the “usability” subscale taps a belief that systems must be user-friendly, ergonomic, and convenient. The differentiation of these subscales seems intuitive, as physicians who feel strongly about one dimension could feel very differently about the other. Support for the validity of these subscales derives primarily from the development process and the factor analysis. An anticipated negative correlation between hours of hands-on computer use and feature demand was not found in the data.
Computer Optimism Attribute
Several studies have undertaken factor analyses of physicians' attitudes toward the effects and applications of computers in medical care. Similar to our one-factor solution, Startsman and Robinson54 described a factor of “possible benefit of the application of computers to the problems of hospitals,” and Melhorn et al.53, using an almost identical instrument, uncovered a single factor of “attitudes toward specific uses and scientific applications of the computer.” The Teach and Shortliffe52 paper, referring to computer-based consultation systems, described as separate factors the effects of computers on individual practitioners, medical practice in general, and health manpower needs. Anderson et al.5,8,9 discovered five factors, as listed in Table 1. Our data, based on a large sample, offer strong support for a more parsimonious one-factor solution and the consequent greater reliability it provides for attribute measurement. The correlations between optimism scores and computer use, training, and sophistication, although statistically significant, are smaller than expected. This may be because the clinical orientation of the item set differs from the primarily academic computer use of the respondents.
This study has three important limitations that merit discussion and further research. These relate to the survey sample itself, the response rate to the survey, and the preliminary nature of the validity studies.
The survey sample consisted of academic physicians in four departments that were selected to represent the range of medical practice. Sampling entire departments was a deliberate strategy to maximize return rate, because a small number of departmental chairs could then be asked to promote the survey. The five participating institutions included those where the investigators were themselves located or had close colleagues who agreed to administer the survey. Methodologically, the five institutions comprised a convenience sample. The sample is therefore not completely representative of all academic physicians, and academic physicians are not, themselves, representative of all physicians or other care providers. Differences in discipline, type of responsibility, and practice volume will affect attitudes. For these reasons, the results of the study could not be generalized, even if all physicians in the sample had returned the survey. Researchers who wish to apply this instrument to other populations will need to establish reliability and validity for those populations.
The response rate of 52% raises the additional question of whether the responding group is representative of the sample surveyed. Bias in survey research caused by nonresponse has been extensively studied. As illustrated in examples provided by Kish,61 such biases typically disappear as response rates approach 80%. So while the researcher can generally be confident with an 80% response, a survey with a lower response rate is at risk. Two strategies are possible to minimize this risk. One is the use of extensive, but expensive, methods advocated by Dillman62 to maximize survey returns; the second is an a posteriori approach of studying a relatively small number of non-respondents to see whether they differ from respondents on specific characteristics. Although limitations imposed by sampling and nonresponse are of less concern in this study, which explores the measurement properties of an instrument, than they would be in a study whose purpose was to estimate the mean values of various parameters in a sample, future studies designed to provide a more complete validation should use one of these methods.
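The a posteriori check described above can be sketched in Python as a comparison of respondents with a small follow-up sample of non-respondents on one characteristic. This is an illustrative sketch only: the function name `welch_t`, the choice of Welch's unequal-variance t statistic, the characteristic (weekly hours of hands-on computer use), and all data values are assumptions, not part of the study.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t, df) for Welch's unequal-variance comparison of two groups."""
    na, nb = len(group_a), len(group_b)
    # Squared standard error contributed by each group.
    qa = variance(group_a) / na
    qb = variance(group_b) / nb
    t = (mean(group_a) - mean(group_b)) / sqrt(qa + qb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (qa + qb) ** 2 / (qa ** 2 / (na - 1) + qb ** 2 / (nb - 1))
    return t, df


# Hypothetical weekly hours of hands-on computer use (illustrative only).
respondents = [10, 12, 8, 15, 9, 11, 14, 7]
nonrespondents = [9, 6, 12, 8, 10]

t, df = welch_t(respondents, nonrespondents)
print(round(t, 2), round(df, 1))
```

The sketch reports only the test statistic and its approximate degrees of freedom; judging significance would additionally require the t distribution, for example from a statistics library.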
Finally, the validity studies conducted and reported in this paper are themselves preliminary in nature. Based on these findings, other investigators can employ this instrument with substantial confidence about the reliability of the scales but with less confidence that the scales measure what the investigators claim. Further validity studies are necessary to complement the initial content validity and construct validity investigations reported here. For example, criterion-related validity studies might administer to a sample of subjects our instrument along with a proctored test of computer knowledge in which the respondents must answer questions. This study would explore how well the self-reported computer knowledge, measured by our instrument, estimates “gold standard” computer knowledge as measured by an actual test. Another type of validity study, a construct validity study, would compare the responses of groups of physicians who, on theoretic grounds, would be expected to differ markedly in their responses. This would be done, for example, by administering the survey to graduates of medical informatics training programs and comparing their responses to those in a general sample.
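A criterion-related validity study of the kind proposed above would typically summarize agreement between self-reported and tested knowledge with a correlation coefficient. The following minimal Python sketch computes Pearson's r; the function name `pearson_r` and the paired scores are hypothetical, and the choice of Pearson's r (rather than, say, a rank correlation) is an assumption made for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired lists with at least two scores")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / sqrt(sxx * syy)


# Hypothetical paired scores for six physicians (illustrative only):
# self-reported knowledge scale score and proctored test score.
self_report = [22, 35, 41, 18, 30, 27]
proctored = [50, 68, 80, 45, 60, 66]

print(round(pearson_r(self_report, proctored), 2))
```

A high positive coefficient in such a study would support the use of the self-reported scale as an estimate of tested knowledge; a low coefficient would indicate that perceived and actual knowledge diverge.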
The authors thank Robert Carlson, Mark Musen, Ted Shortliffe, and Jeremy Wyatt for their support and assistance in developing the instrument that is the focus of this study. They also thank Arthur Elstein, Michael Ravitch, and Paul Tang for their assistance in survey data collection.