4-Year Health Outcomes for Elderly, Poor, Chronically Ill Patients in HMO and Fee-for-Service Systems... [Fulltext, (c) AMA 1996

Oct 2 JAMA. 1996;276:1039-1047

Differences in 4-Year Health Outcomes for Elderly and Poor, Chronically Ill Patients Treated in HMO and Fee-for-Service Systems

Results From the Medical Outcomes Study

John E. Ware, Jr, PhD; Martha S. Bayliss, MSc; William H. Rogers, PhD; Mark Kosinski, MA; Alvin R. Tarlov, MD

Objective.--To compare physical and mental health outcomes of chronically ill adults, including elderly and poor subgroups, treated in health maintenance organization (HMO) and fee-for-service (FFS) systems.

Study Design.--A 4-year observational study of 2235 patients (18 to 97 years of age) with hypertension, non-insulin-dependent diabetes mellitus (NIDDM), recent acute myocardial infarction, congestive heart failure, and depressive disorder sampled from HMO and FFS systems in 1986 and followed up through 1990. Those aged 65 years and older covered under Medicare and low-income patients (at or below 200% of the poverty line) were analyzed separately.

Setting and Participants.--Offices of physicians practicing family medicine, internal medicine, endocrinology, cardiology, and psychiatry, in HMO and FFS systems of care. Types of practices included both prepaid group (72% of patients) and independent practice association (28%) types of HMOs, large multispecialty groups, and solo or small, single-specialty practices in Boston, Mass, Chicago, Ill, and Los Angeles, Calif.

Outcome Measures.--Differences between initial and 4-year follow-up scores of summary physical and mental health scales from the Medical Outcomes Study 36-Item Short-Form Health Survey (SF-36) for all patients and practice settings.

Results.--On average, physical health declined and mental health remained stable during the 4-year follow-up period, with physical declines larger for the elderly than for the nonelderly (P<.001). In comparisons between HMO and FFS systems, physical and mental health outcomes did not differ for the average patient; however, they did differ for subgroups defined by age and poverty status. For elderly patients (those aged 65 years and older) treated under Medicare, declines in physical health were more common in HMOs than in FFS plans (54% vs 28%; P<.001). In 1 site, mental health outcomes were better (P<.05) for elderly patients in HMOs relative to FFS but not in 2 other sites. For patients differing in poverty status, opposite patterns of physical health (P<.05) and mental health (P<.001) outcomes were observed across systems; outcomes favored FFS over HMOs for the poverty group and favored HMOs over FFS for the nonpoverty group.

Conclusions.--During the study period, elderly and poor chronically ill patients had worse physical health outcomes in HMOs than in FFS systems; mental health outcomes varied by study site and patient characteristics. Current health care plans should carefully monitor the health outcomes of these vulnerable subgroups.

ENROLLMENTS in health maintenance organizations (HMOs) have increased nearly 10-fold since 1976, and in some regions of the country, half of privately insured Americans are enrolled in HMOs.[ref. 1] Policies at the state and federal levels seek to effect a similar shift for those who are publicly insured, including both Medicare and Medicaid. Congress has passed legislation that will give Medicare patients strong financial incentives to enroll in managed care plans. Yet, as documented in a recent literature analysis,[ref. 2] little is known about health outcomes in HMOs for the elderly and the poor, who have historically tended to favor fee-for-service (FFS) over HMO systems.

The Medical Outcomes Study (MOS) was fielded to compare 4-year health outcomes for chronically ill patients treated in well-established HMOs and FFS plans serving the same "medical marketplaces" in 3 cities.[ref. 3] To increase the generalizability of results, adults with 4 physical conditions (hypertension, non-insulin-dependent diabetes mellitus [NIDDM], recent acute myocardial infarction, and congestive heart failure) and 1 mental condition (depressive disorder) were followed. Sampling patients with the same diagnoses across systems of care and measuring them with the same methods allowed more valid comparisons of outcomes across plans. To better address policy issues, the MOS oversampled the elderly and the poor. Focusing on chronically ill patients and oversampling of the elderly and poor increased the likelihood of detecting differences in health outcomes because these subgroups account for a disproportionate share of health care expenditures and are, therefore, prime targets of cost containment.

We report here the results of comparing changes in physical and mental health status between FFS and HMO systems, measured over a 4-year period. In contrast to previous MOS reports of outcomes for the average patient, we focus on outcomes for policy-relevant subgroups--including patients aged 65 years and older covered by Medicare and those near and below the poverty line. Further, results are reported for patients across all of the conditions sampled in the MOS and not just for patients with hypertension and NIDDM[ref. 4] and mental disorders.[ref. 5] [ref. 6]

METHODS

The MOS was an observational study of variations in practice styles and of outcomes for chronically ill adults treated in staff-model and independent practice HMOs vs FFS care in large multispecialty groups, small, single-specialty groups, and solo practices serving the same areas. Details of the MOS design, including site selection, sampling, clinician and patient recruitment, and data collection methods are documented elsewhere.[ref. 3-12] To briefly recap the study design, MOS sites included Boston, Mass, Chicago, Ill, and Los Angeles, Calif, which represent 3 of the 4 US census regions. When sampling began in 1986 and 1987, these cities included well-developed HMO and FFS plans, including 2 of the country's largest HMOs employing salaried physicians and 2 of the largest independent practice association (IPA) networks. In each city, 5 or 6 practice sites were sampled from each group practice HMO. The physician sample included 206 general internists, 87 family practitioners, 42 cardiologists, 27 endocrinologists, and 65 psychiatrists. In HMOs, patients treated by 8 nurse practitioners were also sampled. In addition, patients with a depressive disorder were sampled from the practices of 59 clinical psychologists and 9 social workers. Clinicians averaged 39.6 years of age; 22% were female, and 29% were international medical graduates.

Patient Sampling and Characteristics

Patients followed up longitudinally were selected from 28,257 adults who visited an MOS site in 1986; 71.6% agreed to participate. In 18,794 (92.9%) of the visits, a standardized screening form was completed both by the MOS clinician and the patient. Using criteria documented elsewhere,[ref. 3] clinicians identified patients with hypertension, NIDDM, myocardial infarction within the past 6 months, and congestive heart failure. Patients with depressive disorder were identified independently in a 2-stage screen, which included a patient-completed form and a computer-assisted diagnostic interview by telephone[ref. 3]; 80% of those contacted completed this screening process.

Patients were selected for follow-up on the basis of diagnosis and participation in baseline data collection, as documented in detail elsewhere.[ref. 5,7] Inclusion of patients with more than 1 of the 5 conditions, with or without other comorbidities, allowed for a more generalizable study. Of the 3589 eligible patients, 2708 (75.5%) completed a baseline assessment. We randomly selected 2235 of these for follow-up, by chronic condition and severity of their disease. A patient sample of this size was sufficient to detect clinically and socially relevant differences in health outcomes, defined as an average difference of 2 points or larger on a scale of 0 to 100,[ref. 3] in a comparison between HMO and FFS systems. Specifically, the statistical power was greater than 80%, with alpha at the .05 level for a 2-tailed test.
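The power statement above can be illustrated with a rough normal-approximation sketch. It is not the authors' calculation: the MOS comparisons were covariate adjusted, and the standard deviation and group sizes used below are assumptions chosen only for illustration, as is the function name.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(delta, sd, n1, n2, alpha=0.05):
    """Approximate power of a 2-tailed z test for a difference in means.

    delta:  true difference between systems, in scale points
    sd:     common standard deviation of the outcome (ASSUMED, not from the article)
    n1, n2: patients per system (ASSUMED, not from the article)
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the 2-tailed test
    se = sd * sqrt(1 / n1 + 1 / n2)             # standard error of the difference
    shift = abs(delta) / se
    # Probability of rejecting the null hypothesis in either tail
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical inputs: 2-point difference, SD of 10 points, 1100 patients per system
power = two_sample_power(delta=2.0, sd=10.0, n1=1100, n2=1100)
print(f"approximate power: {power:.3f}")
```

Under these assumed inputs the approximate power exceeds 80%; a larger standard deviation or smaller groups would lower it.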

Patients ranged from 18 to 97 years of age, with a mean just under 58 years. At baseline, 36.8% were 65 years of age or older; all but 1 reported being covered by Medicare. (An additional 144 patients aged into this group during the 4-year follow-up.) A slight majority (54%) were female. About 22% were at or below 200% of the poverty line; 16% of those reported being covered by Medicaid. Three of 10 eligible for Medicare were also in the poverty group. Three of 4 had completed at least a 12th grade education; about 1 in 5 was nonwhite.

Patients sampled had the following diagnoses: hypertension (n=1318), NIDDM (n=441), congestive heart failure (n=215), recent acute myocardial infarction (n=104), and depressive disorder (n=444). (These numbers add to more than 2235 because some patients had more than one condition.)[ref. 7,9] As in previous MOS analyses,[ref. 8] FFS patients followed up in this study were significantly older (41.9 vs 32.9 years on average) than HMO patients, were more likely to be female (62.8% vs 57.8%), and were more likely to be in the poverty group (25.4% vs 18.1%). The FFS patients followed were also more likely to have congestive heart failure (11.8% vs 7.3%) and to have had a recent myocardial infarction (8.9% vs 3.4%). As documented in detail elsewhere (MOS unpublished data; see acknowledgment footnote at the end of this article for availability of all MOS unpublished data), 99% of patients followed in both FFS and HMO systems had 1 or more comorbid conditions; the most prevalent conditions were back pain/sciatica (39% and 37% in FFS and HMO systems, respectively), musculoskeletal complaints (24% and 22%), dermatitis (17% in each), and varicosities (15% and 14%).

Longitudinal Data Collection

After screening in the physician's office and enrollment by telephone interview, each patient was sent a baseline health survey by mail.[ref. 10] The baseline survey was completed, on average, 4 months after the patient's screening visit with an MOS clinician. Four-year follow-up data were obtained for 1574 of the 2235 patients (70.4% of the longitudinal cohort). Patients were lost to follow-up for a variety of reasons including refusals and failure to contact (n=661; 29.6%); 137 (6.1%) who died during follow-up were included in the analysis. Analysis of initial health status for those lost to follow-up for reasons other than death revealed no differences, and loss to follow-up was equally likely in HMO and FFS systems. However, younger and poverty-stricken patients were more likely to be lost from both HMO and FFS systems. All analyses of outcomes adjusted for age, poverty status, and other variables to take into account this potential source of bias (see "Statistical Analysis").
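The follow-up accounting in this paragraph can be checked with a few lines of arithmetic. The counts are those reported above; the variable and function names are illustrative only.

```python
# Counts reported in the text
cohort = 2235    # patients selected for longitudinal follow-up
followed = 1574  # patients with 4-year follow-up data
lost = 661       # refusals, failures to contact, and other losses
died = 137       # deaths during follow-up, which the text says were kept in the analysis


def pct(part, whole):
    """Percentage rounded to 1 decimal place, as reported in the article."""
    return round(100 * part / whole, 1)


# The followed and lost groups together account for the whole cohort
accounted_for = followed + lost

print(accounted_for == cohort)
print(pct(followed, cohort), pct(lost, cohort), pct(died, cohort))
```

The three percentages reproduce the 70.4%, 29.6%, and 6.1% given in the text.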

Health Status Measures

Summary physical and mental health scales constructed from the Medical Outcomes Study 36-Item Short-Form Health Survey (SF-36) were analyzed. These summary measures capture 82% of the reliable variance in the 8 SF-36 health scores estimated using the internal-consistency reliability method.[ref. 13] [ref. 14] [ref. 15] The construction of summary measures, score reliability and validity, and normative and other interpretation guidelines are documented elsewhere.[ref. 13,14]

Changes in health were estimated in 2 ways. First, baseline scores were subtracted from 4-year follow-up scores, with deaths assigned a follow-up physical health score of 0 (Table 1). Although these average change scores have the advantage of reflecting the magnitude of change in the metric of the scales, they mask the proportion of patients with follow-up scores that differed from those at baseline. Therefore, individual patients also were classified into 3 change categories: (1) those whose follow-up score did not change more than would be expected by chance ("same" group); (2) those who improved more than would be expected ("better" group); and (3) those whose score declined more than would be expected and those who died ("worse" group) (Table 1). This latter method has the advantage of combining health status and mortality without making any assumption about the "scale value" of death. Unlikely to be due to measurement error, changes large enough to be labeled better or worse also have been shown to be relevant in terms of a wide range of clinical and social criteria.[ref. 13]
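The 2 ways of estimating change described above can be sketched as follows. The cutoff for a change "more than would be expected by chance" is not stated in this text, so the threshold below is a hypothetical placeholder, as are the function names and the example patients.

```python
def change_score(baseline, followup, died):
    """Follow-up minus baseline; deaths are assigned a follow-up physical score of 0."""
    return (0.0 if died else followup) - baseline


def change_category(baseline, followup, died, threshold):
    """Classify a patient as 'better', 'same', or 'worse'.

    threshold is the smallest change treated as exceeding chance
    (HYPOTHETICAL value supplied by the caller; not taken from the article).
    Deaths are classified as 'worse' without assigning a scale value to death.
    """
    if died:
        return "worse"
    delta = followup - baseline
    if delta > threshold:
        return "better"
    if delta < -threshold:
        return "worse"
    return "same"


# Hypothetical patients: (baseline score, follow-up score, died during follow-up)
patients = [(45.0, 52.0, False), (45.0, 44.0, False), (45.0, 36.0, False), (30.0, None, True)]
THRESHOLD = 6.5  # illustrative cutoff only

categories = [change_category(b, f, d, THRESHOLD) for b, f, d in patients]
scores = [change_score(b, f, d) for b, f, d in patients]
print(categories)
print(scores)
```

The categorical version needs no score for the patient who died, which is the advantage the paragraph attributes to it.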

Estimates of health outcomes for survivors only were substantially biased because deaths were more common among those with congestive heart failure, aged 65 years and older, and under FFS care; deaths were less likely for the clinically depressed group. Differences in survival rates between FFS and HMO systems were not statistically significant after adjustment for baseline patient characteristics. Thus, alternative methods of coding deaths[ref. 16] in estimating outcomes did not affect comparisons between FFS and HMO systems (MOS unpublished data).

Statistical Analysis

The goal of the analysis was to compare HMO and FFS systems of care in terms of average changes in health status and in terms of the percentages of patients who were better, the same, or worse at follow-up. These outcomes were estimated for all patients, and separately for subgroups differing in age, poverty status, and initial health. Multivariate statistical methods were used to adjust baseline scores so that the HMO and FFS groups would begin as equal as possible in terms of demographic and socioeconomic characteristics, study site, chronic conditions, disease severity, comorbid conditions, initial health status, and other design variables.

Independent regression models were estimated for physical and mental health summary measures, and F tests of significance determined whether adjusted change scores differed, on average, across HMO and FFS systems. To make sure that the summary measures did not miss a difference concentrated in 1 of the 8 scales, all comparisons between FFS and HMO systems also were replicated for each of the 8 SF-36 scales. Because the summary measures captured all significant differences, results of their analyses are reported here. Results for the 8 SF-36 scales are documented elsewhere (MOS unpublished data).
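A minimal sketch of regression adjustment, assuming a single covariate, may help make the idea concrete. It fits ordinary least squares by the normal equations on constructed data and reads the adjusted system difference off the coefficient of an HMO indicator. The MOS models adjusted for many more variables and used F tests, which this sketch does not reproduce; all names and values are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.

    Assumes the system is nonsingular; no check is made.
    """
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(predictors, outcome):
    """Least-squares coefficients (intercept first) via the normal equations."""
    x = [[1.0] + list(row) for row in predictors]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, outcome)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# CONSTRUCTED data: (HMO indicator, baseline score) and an exactly linear change score
predictors = [(0, 30.0), (0, 40.0), (0, 50.0), (1, 35.0), (1, 45.0), (1, 55.0)]
changes = [2.0 - 1.5 * hmo - 0.1 * base for hmo, base in predictors]

intercept, hmo_effect, baseline_slope = fit_ols(predictors, changes)
print(round(hmo_effect, 3), round(baseline_slope, 3))
```

Because the data were built without noise, the fit recovers the coefficients used to construct them; with real data the HMO coefficient would be an estimate with its own standard error.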

Multinomial (polytomous) logistic regression[ref. 17] methods were used to compare categorical changes (better, same, worse) in physical and mental health across HMO and FFS systems for the total sample and for the subgroups. Adjusted percentages for change categories were generated with statistical adjustments for the same baseline characteristics used in linear models (Table 2). The chi² tests of significance were computed to determine whether the percentages across change categories differed between HMO and FFS systems of care.
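As a simplified illustration of comparing change categories between 2 systems, the sketch below computes an unadjusted Pearson chi-square statistic for a 2 x 3 table of counts. This is a stand-in, not the adjusted multinomial logistic model used in the MOS, and the counts are hypothetical. A 2 x 3 table has 2 degrees of freedom, for which the upper-tail probability of the chi-square distribution is exp(-x/2).

```python
from math import exp


def pearson_chi2(table):
    """Pearson chi-square statistic for a rows x columns table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_2df(stat):
    """Upper-tail probability for a chi-square variable with 2 degrees of freedom."""
    return exp(-stat / 2)


# HYPOTHETICAL counts: rows are systems (HMO, FFS); columns are better, same, worse
counts = [[10, 40, 50],
          [10, 60, 30]]
stat = pearson_chi2(counts)
print(round(stat, 2), round(p_value_2df(stat), 4))
```

The closed-form tail probability applies only to 2 degrees of freedom; tests with other degrees of freedom need the general chi-square distribution.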

Comparisons of outcomes across systems reported here combine results for IPA "network" and staff-model HMOs. As in previous MOS analyses,[ref. 4] there were no significant differences in outcomes for those in IPAs and staff-model HMOs in any of the analyses performed and there were no consistent trends suggesting a difference between IPAs and staff-model HMOs. However, because only 28% of prepaid patients were sampled from IPAs, the MOS did not have enough statistical power to meaningfully compare outcomes across types of HMOs.

To facilitate interpretation, regression models were used to estimate adjusted outcomes for the total sample and for each subgroup in comparing outcomes between FFS and HMO systems. Formal statistical tests for interactions were performed to determine whether conclusions about differences between systems were the same across subgroups differing in age (Medicare), poverty status, Medicaid coverage, and initial health. To test for differences in outcomes for groups in better or worse initial health status, patients were stratified using baseline physical and mental health measures, both for linear and logistic regression models. Thirds of the sample were identified based on whether they were functioning (physically or mentally) higher, lower, or as would be expected at baseline, given their age and medical condition (Table 2).

In keeping with the logic of an intention-to-treat analysis, patients were analyzed according to the system from which they were sampled. In support of this decision, the great majority of patients had been in their system 4 years or more at the time of sampling and most who switched did not do so for another 2 years. Thus, more than two thirds of those who switched systems during the 4-year follow-up had been in the type of system they were sampled from for 6 or more years before switching. However, because MOS patients were more likely to switch from an HMO than from an FFS plan (20% vs 15%; P<.01), estimates of outcomes could have been biased. This potential source of bias was evaluated by comparing rates of switching within elderly and poverty subgroups along with average outcomes for those who did and did not switch. As documented elsewhere (MOS unpublished data), the relative probability of switching from an HMO observed within the elderly and poverty subgroups was comparable to that for the total sample. Further, baseline scores and average changes in physical and mental health did not differ significantly for those who did and did not switch plans within either subgroup (MOS unpublished data). Thus, conclusions about system differences in health outcomes are not likely to have been biased by the intention-to-treat method of analysis used in this study.

To evaluate whether differences in rates of loss to follow-up were a source of bias in comparisons of outcomes between systems, these rates were compared for the total sample and separately for the elderly and poverty subgroups. As documented in detail elsewhere (MOS unpublished data), follow-up rates did not differ between the 2 system cohorts for the total sample (71% vs 70% for FFS and HMO, respectively), among the elderly (both 74%), or for those in poverty (62% vs 60%). Baseline physical health scores for those followed up and lost to follow-up did not differ between FFS and HMO cohorts in analyses of the total sample or for elderly or poverty subgroups. To determine whether those lost and followed for health status outcomes had equal survival probabilities, survival was monitored for all study participants for 7 years after baseline. Survival probabilities did not differ for those followed up and those lost to follow-up. As documented in detail elsewhere (MOS unpublished data), mental health scores for those lost to follow-up were significantly (P<.001) lower at baseline for both FFS and HMO cohorts. The same pattern was observed for elderly and poverty subgroups, with a significant difference favoring FFS over HMO for the poverty group (P<.05) (MOS unpublished data). However, as documented in the tables cited in the "Results," adjusted physical and mental health scores for the follow-up samples analyzed here did not differ at baseline in comparisons between FFS and HMO cohorts within the total follow-up sample, the elderly subgroup, or the poverty subgroup.

To test whether differences in patient outcomes between FFS and HMO systems could be explained by the specialty of their regular physicians, these differences were also estimated with statistical adjustment for physician specialties. Estimates of outcomes for each system were equivalent with and without adjustment for specialty and are reported here without adjustment.

To facilitate interpretation, all tables of results include 95% confidence intervals around average change scores, and all differences associated with a chance probability of .05 or less were considered statistically significant. Significance tests were not adjusted for multiple comparisons.
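For readers who want to see what these conventions imply, the sketch below computes a normal-approximation 95% confidence interval around a mean change score and shows the stricter per-test threshold that a Bonferroni adjustment would impose. The Bonferroni method is named here only for contrast, since the article did not adjust for multiple comparisons; the change scores and function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci(values, level=0.95):
    """Normal-approximation confidence interval for a mean.

    For small samples a t-based interval would be wider; this is a sketch.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    center = mean(values)
    half_width = z * stdev(values) / sqrt(len(values))
    return center - half_width, center + half_width


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold under a Bonferroni adjustment."""
    return alpha / n_comparisons


# HYPOTHETICAL change scores for a handful of patients
changes = [-6.0, -4.0, -3.0, -2.0, 0.0, 1.0, -5.0, -1.0]
low, high = mean_ci(changes)
print(round(low, 2), round(high, 2))
print(bonferroni_alpha(0.05, 3))
```

With 3 comparisons, for example, each test would need a probability below about .017 rather than .05 to be called significant.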

We hypothesized that the MOS sample would score below 50, the norm for the general population, on both measures at baseline, and they did. Because there are good arguments for hypothesizing better or worse outcomes across HMO and FFS systems over the 4-year follow-up period, we used 2-tailed tests of significance throughout.

RESULTS

Adjusted physical and mental health scores were virtually identical at baseline for patients sampled from HMO and FFS systems. In relation to published norms for the US general population,[ref. 13] MOS patients scored at the 24th and 35th percentiles for physical and mental health, respectively, indicating substantially more physical impairment and emotional distress than experienced by the great majority of adults. During the 4-year follow-up, average changes in physical and mental health were indistinguishable between HMO and FFS systems. Physical health scores declined about 3 points in both systems, lowering the average patient to the 19th percentile at follow-up. Mental health improved slightly in both systems, raising the average to about the 38th percentile.

The MOS had sufficient statistical power to detect differences in health outcomes as small as 1 to 2 points between HMO and FFS systems of care. According to published interpretation guidelines for the SF-36 Health Survey,[ref. 13] differences of this amount or smaller are rarely clinically or socially relevant. Thus, there is a basis for confidence that an important average difference in health outcomes between HMO and FFS systems was not missed.

Analyses of change scores categorized as better, same, or worse confirmed these results for physical and mental health for the average patient. However, the categorical analyses called attention to substantial variation in outcomes. Physical health scores at follow-up differed (from those at baseline) for 45% of patients; about 30% declined and 15% improved, more than would be expected due to measurement error. The reverse pattern--improvement more often than decline--was observed for mental health scores (Table 3).

Variations in Outcomes for Elderly and Poverty Groups

The average adjusted physical decline was greater for elderly than nonelderly patients (Delta=-5.8 vs -1.9; P<.001); 36% and 26% of elderly and nonelderly patients, respectively, scored worse at follow-up than at baseline (P<.001) (Table 3). Elderly patients scored higher in mental health than nonelderly at baseline (P<.001); nonelderly patients improved significantly over time while the elderly did not.

Both poverty and nonpoverty groups declined in physical health (Delta=-3.6 and -2.9, respectively); these declines did not differ significantly. Mental health improved significantly for nonpoverty patients but did not improve for those in the poverty group.

Differences in Outcomes by System: Elderly and Nonelderly

Although adjusted baseline scores were equivalent for elderly and nonelderly patients in comparisons between HMO and FFS systems, changes in physical and mental health scores over time for the elderly in HMO and FFS plans were significantly different from those for the nonelderly (F=2.1, P<.05, and chi²=35.6, P<.001 for physical health; F=1.3, P>.05, and chi²=25.9, P<.01 for mental health) (Table 4). Physical health outcomes were, on average, more favorable for nonelderly patients in HMOs, while physical health outcomes were more favorable for elderly patients in FFS.

Although we could say with statistical confidence that the patterns of average change scores were different across HMO and FFS systems for elderly and nonelderly patients, only pairwise comparisons between categories of changes were significant for the elderly (Table 4). The analysis of change categories also revealed that physical health was much less stable over time for elderly patients in HMOs compared to those in FFS (37% vs 63%, respectively, stayed the same; chi²=19.2, P<.001). The elderly treated in HMOs were nearly twice as likely to decline in physical health over time (54% vs 28%; P<.001) (Table 4). The difference in physical health outcomes favoring FFS over HMOs was statistically significant for elderly patients regardless of their initial health (MOS unpublished data). Physical health outcomes favoring FFS over HMOs for the elderly were also apparent in all 3 study sites (MOS unpublished data).

Average changes in mental health for elderly and nonelderly patients did not favor 1 system over the other (P>.05). However, analyses of mental health change categories for elderly patients favored HMOs over FFS; the elderly were twice as likely to improve in an HMO (26% vs 13% for FFS; chi²=7.1, P<.03). This result was due entirely to the better performance of HMOs in 1 study site. A formal test for a statistical interaction between plan and site revealed that mental health outcomes in HMOs differed significantly across the 3 sites (F=2.44, P<.01).
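The phrases "nearly twice as likely" and "twice as likely" correspond to simple ratios of the adjusted percentages reported for elderly patients. The sketch below computes them; the function name is illustrative, and the "better" percentages for physical health are implied by subtraction and are therefore subject to rounding.

```python
def risk_ratio(pct_a, pct_b):
    """Ratio of 2 percentages (system A relative to system B)."""
    return pct_a / pct_b


# Adjusted percentages for elderly patients, as reported in the text
physical = {"HMO": {"same": 37, "worse": 54}, "FFS": {"same": 63, "worse": 28}}
mental_better = {"HMO": 26, "FFS": 13}

decline_ratio = risk_ratio(physical["HMO"]["worse"], physical["FFS"]["worse"])
improve_ratio = risk_ratio(mental_better["HMO"], mental_better["FFS"])

# Remaining share in each system after 'same' and 'worse' are accounted for
implied_better = {name: 100 - p["same"] - p["worse"] for name, p in physical.items()}

print(round(decline_ratio, 2), improve_ratio)
print(implied_better)
```

By this arithmetic the implied share improving in physical health is about the same in both systems, so the difference between systems lies in how many patients stayed the same vs declined.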

Differences in Outcomes of Poverty and Nonpoverty Groups by System

As shown in Table 5, comparisons of physical and mental health outcomes across HMO and FFS systems produced different patterns of results for poverty and nonpoverty groups (F=2.7, P<.01, and chi²=24.2, P<.02 for physical health; F=4.2, P<.001, and chi²=23.0, P<.03 for mental health). Only the pairwise comparisons between HMO and FFS systems for poor patients who were in ill health at baseline were significant. Those in HMOs experienced an average decline of 2.0 points in physical health; those in FFS improved 5.4 points, on average (P<.001). Comparison of categorical changes for poor patients in initial ill health also favored FFS plans, with 57% scoring better at follow-up in FFS vs 22% in HMOs (chi²=10.2, P<.006).

To determine whether Medicaid status accounted for differences observed in outcomes for the poor, HMO and FFS systems were compared among Medicaid patients (n=216). Medicaid patients in HMOs did not differ from Medicaid patients in FFS plans in health status at baseline or in health outcomes, as documented elsewhere (MOS unpublished data), and there were no noteworthy trends. However, because of the relatively small sample of Medicaid patients, the MOS did not have sufficient precision to rule out an important difference among Medicaid patients favoring either system.

COMMENT

Limitations

Limitations of the MOS have been discussed extensively,[ref. 3-9,11] but some limitations and potential sources of bias warrant special emphasis here. Analyses of 4-year health outcomes have been a long time coming because of the many methodological challenges faced by the MOS. Do results apply to current health care? If cost-containment pressures have increased since MOS data collection ended in the early 1990s, high-risk patient groups may be at an even greater risk today. If information systems for monitoring and improving the quality of care are better now and if health promotion and disease prevention initiatives are more successful in HMOs, MOS results may not apply to current health care.

The MOS was not a randomized trial; such trials are rare in health care policy research.[ref. 18] [ref. 19] Although quasi-experimental methods[ref. 20] achieved equivalent average baseline health status scores for nearly all pairwise comparisons between FFS and HMO systems of care, unmeasured risk factors could have biased estimates of differences in outcomes. Further, differences in outcomes that occurred "on the watch" of the FFS and HMO systems are not necessarily their responsibility. Structural and process differences in care beyond their control, such as arrangements for home health and long-term care, may account in part for MOS findings.

The MOS monitored outcomes in only 3 large urban cities; results should not be generalized to HMO or FFS plans in other cities or rural areas. Although the MOS represented 5 chronic conditions and many patients had comorbid conditions such as angina, back pain/sciatica, lung disease, and osteoarthritis, these patients do not necessarily represent other conditions or results of care provided by other medical specialties. All patients had a regular source of care. All patients were being actively treated when the MOS began, and only three fourths who agreed to participate were followed up longitudinally.

Two potential sources of bias in estimates of health outcomes--plan switching and loss to follow-up--were systematically studied. Patient loss to follow-up is an unlikely source of bias in comparisons of outcomes between systems because adjusted physical health scores at baseline did not differ between FFS and HMO cohorts followed within the total sample or for elderly or poverty subgroups (Tables 3 through 5). Further, all study participants were followed up through 1993 to determine their survival.[ref. 4] Seven years after baseline, those included and not included in this 4-year analysis were equally likely to have survived (MOS unpublished data).

Two of 10 HMO patients switched to an FFS plan by the end of the 4-year follow-up. Comparisons between systems could have been biased had these rates differed within elderly or poverty subgroups or had switchers experienced different outcomes than nonswitchers. However, rates of switching did not differ for elderly or poverty subgroups, and system differences in physical and mental health outcomes were indistinguishable for those who stayed in the same system, in comparison with those who switched (MOS unpublished data). Thus, it is unlikely that conclusions about system differences in outcomes were biased by switching. Because more than two thirds of patients who switched systems during the follow-up period had been in their system at least 6 years before switching, we adhered to the logic of intent to treat and analyzed patients according to the systems from which they were sampled. The finding that MOS patients were significantly more likely to switch from an HMO than to an HMO (20% vs 15%; chi²=7.3, P<.01) is surprising given that most MOS patients were aged 60 years or older, all were chronically ill, and financial incentives were beginning to favor HMOs over FFS during the MOS. The dynamics of switching and their implications for monitoring current health outcomes warrant further study.

Although the MOS achieved the desired statistical precision for overall HMO vs FFS comparisons, confidence intervals were too large for meaningful interpretation of some comparisons that yielded nonsignificant differences in outcomes. Examples include comparisons between IPAs, the fastest growing form of HMO, and staff-model HMOs; comparisons between Medicaid and non-Medicaid groups; and comparisons between plans within sites, which were relatively imprecise, although the difference in 1 site was large enough to reach significance. (This difference would not have been significant with an adjustment for multiple comparisons.) For many comparisons, the MOS cannot rule out large differences in outcomes in either direction.

Interpretation of Results

The success of HMOs in reducing health care utilization has been documented in numerous studies.[ref. 2,19] With few exceptions, the best-designed and most recent studies show that HMOs achieve lower hospital admission rates and shorter hospital stays, rely on fewer subspecialists, and make less use of expensive technologies. Results from FFS-HMO comparisons of utilization rates in the MOS[ref. 6,11] are consistent with previous studies, and extend that evidence to the population of adults with chronic conditions, for whom health outcomes are reported here. Rarely have the same studies addressed health outcomes.[ref. 2,18,21-23]

Results from the MOS lead us to several conclusions about health outcomes for the chronically ill adults who were treated in HMO and FFS systems of care during the years of the MOS. First, similarities in health outcomes between systems previously reported[ref. 4] for the average MOS patient with hypertension or NIDDM do not appear to hold for elderly patients covered by Medicare or for those in poverty. Elderly patients sampled from an HMO were more likely (than those sampled from an FFS plan) to have a poor physical health outcome in all 3 sites studied. Second, patients in the poverty group and particularly those most physically limited appear to be at a greater risk of a decline in health in an HMO than similar patients in an FFS plan. Finally, MOS results suggest the need for caution in generalizing conclusions about outcomes across study sites. Mental health outcomes for Medicare patients differed significantly across HMOs, suggesting that their performance relative to FFS plans may depend on site.

Previous studies[ref. 21-23] that found no differences in health outcomes between FFS and HMO plans followed patients for only 1 year. Were these studies too brief to draw conclusions about health outcomes? Supporting this explanation, significant differences in health outcomes observed between the FFS and HMO systems after 4 years of follow-up in the MOS were not statistically significant after 1 year. The importance of a longer follow-up is underscored by the observation that the 4-year statistical models reported here explained twice as much of the variance in patient outcomes as did the same models in analyses of 1- and 2-year outcomes (MOS unpublished data). Thus, follow-up periods longer than 1 year may be required to detect differences in outcomes across health care systems and among groups differing in chronic condition, age, and income.

Future Outcomes Studies

Our results raise many questions that the MOS was not designed to address. What are the "clinical" correlates of changes in patient-assessed functional health and well-being? What can health care plans do to improve outcomes, and what specific treatments have been linked to physical and mental health outcomes as measured by the SF-36 Health Survey? Adverse medical events were too rare for meaningful comparison between plans in the MOS and were monitored only during the first 2 years of follow-up.[ref. 4] However, these events were significantly related to health outcomes, as hypothesized. Declines in SF-36 physical health scores were significantly more likely among patients who experienced a new myocardial infarction, weight loss sufficient to warrant a physician visit, and chest pain sufficient to require hospitalization (MOS unpublished data). These preliminary MOS results are consistent with published studies that have linked SF-36 health scores to disease severity and to treatment response, including severity of soft-tissue injuries[ref. 24] and changes in hematocrit among chronic dialysis patients.[ref. 25] The SF-36 studies of outcomes have also linked treatment to outcomes, including drug treatment for depression among the elderly,[ref. 26] total knee replacement,[ref. 27,28] heart valve replacement surgery,[ref. 29] use of aerosol inhalers in treating asthma,[ref. 30] intermittent vs maintenance drug therapy for duodenal ulcer,[ref. 31] elective hip arthroplasty,[ref. 32] elective coronary revascularization,[ref. 33] and various other elective surgical procedures.[ref. 34] Three dozen such studies using the SF-36 are cited elsewhere.[ref. 35] Identification of the clinical correlates of changes in physical and mental health status warrants high priority in outcomes and effectiveness research.[ref. 36]

Future studies should address whether variations in the quality of care explain differences in outcomes across systems. The MOS patients in HMOs reported fewer financial barriers and better coordination of services in comparisons with equivalent FFS patients.[ref. 12,36] Analyses of primary care quality criteria indicated that those in FFS systems experienced shorter treatment queues and better comprehensiveness and continuity of care and rated the quality of their care more favorably.[ref. 12,37] Do such variations in process account for differences in outcomes? Practice-level analyses in progress have linked scores for primary care process indicators[ref. 12] to 4-year health outcomes, as defined here, supporting this hypothesis. These and other associations warrant further study to determine which practice styles and specific treatments are most likely to improve health outcomes. Because many of the structural and process indicators being relied on to evaluate the quality of current health care have not been shown to predict outcomes, targeted monitoring efforts are required to discern health outcomes.

The MOS has demonstrated the feasibility and usefulness of readily available patient-based assessment tools, such as the SF-36 Health Survey, in monitoring outcomes across diverse patient populations and practice settings. The SF-36 summary measures of physical and mental health reduce the number of comparisons necessary to monitor outcomes while retaining the option of analyzing the 8-scale SF-36 health profile on which they are based. The reporting of results in change categories in terms of better, same, and worse may simplify the reporting of outcomes to diverse audiences and may make results easier for them to understand. More practical data collection and processing systems--under development--and advances in understanding of the specific treatments that improve health scores the most and the clinical and social relevance of those improvements will increase their usefulness in improving patient outcomes.[ref. 38]

Policy Implications

The MOS results reported here and previously[ref. 4] for the average chronically ill patient constitute good news for those who consider HMOs as a solution to rising health care costs. Outcomes were equivalent for the average patient because those who were younger, relatively healthy, and relatively well-off financially did at least as well in HMOs as in the FFS plans. However, our results sound a cautionary note to policymakers who expect overall experience to date with HMOs to generalize to specific subgroups, such as Medicare beneficiaries or the poor. Patients who were elderly and poor were more than twice as likely to decline in health in an HMO as in an FFS plan (68% declined in physical health in an HMO vs 27% for FFS; P<.001) (MOS unpublished data). An implication for future evaluations of changes in health care policies is that high-risk groups, including the elderly and poor who are chronically ill, should be oversampled when outcomes are monitored to achieve the statistical precision necessary to rule out harmful health effects.
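
The size of the contrast for elderly and poor patients can be checked directly from the 2 percentages reported above. The following minimal sketch computes the unadjusted risk ratio and risk difference; the function and variable names are illustrative, and the calculation ignores the adjustments for patient characteristics used in the MOS statistical models.

```python
# Sketch only: unadjusted arithmetic on the two reported percentages
# (68% of elderly poor patients declined in HMOs vs 27% in FFS plans).

def risk_ratio(risk_group: float, risk_reference: float) -> float:
    """Ratio of the probability of decline in one group to that in a reference group."""
    if risk_reference <= 0:
        raise ValueError("risk_reference must be positive")
    return risk_group / risk_reference


def risk_difference(risk_group: float, risk_reference: float) -> float:
    """Absolute difference in the probability of decline between two groups."""
    return risk_group - risk_reference


hmo_decline = 0.68  # proportion declining in physical health, HMO
ffs_decline = 0.27  # proportion declining in physical health, FFS

rr = risk_ratio(hmo_decline, ffs_decline)
rd = risk_difference(hmo_decline, ffs_decline)

print(f"risk ratio: {rr:.2f}")
print(f"risk difference: {rd * 100:.0f} percentage points")
```

On these figures the ratio is about 2.5 and the difference is 41 percentage points, consistent with the statement that declines were more than twice as likely in an HMO.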

Medicaid coverage did not explain the differences in physical or mental health outcomes observed for the poor in MOS comparisons between FFS and HMO systems. Only 1 in 5 poor were covered under Medicaid. Further, when outcomes for MOS patients covered and not covered under Medicaid were compared, there were no significant differences between FFS and HMO plans and there were no noteworthy trends (MOS unpublished data). Poverty status, as opposed to Medicaid beneficiary status, was the better marker of risk of a poor health outcome in an HMO. This is not a new finding. The Health Insurance Experiment also observed that some health outcomes were less favorable over a 5-year follow-up for low-income patients in poor health in 1 HMO compared with equivalent patients under FFS care.[ref. 18]

Final Comment

In this article, the MOS has documented variations in health outcomes for chronically ill patients that cannot be explained in terms of measurement error. For elderly Medicare patients and for poor patients, variations in outcomes during a 4-year period extending through 1990 were linked to FFS and HMO systems of care (the latter were predominantly staff-model HMOs). Other explanatory factors included practice site, suggesting that health outcomes should be monitored on an ongoing basis, by particular HMO and by marketplace. Outcomes did not differ across systems for those covered under Medicaid and could not be explained in terms of the specialty training of physicians. The contrast between results reported here for high-risk patients vs results reported previously for the average patient[ref. 4] underscores the hazard in generalizing about outcomes on the basis of averages. This is why quality improvement initiatives focus on variations rather than only on usual performance.[ref. 38] Patient-based assessments of outcomes are likely to add significantly to the evidence used in informing the public and policymakers regarding which health care plans perform best--not just in terms of price, but in overall quality and effectiveness.

From The Health Institute, New England Medical Center (Drs Ware, Rogers, and Tarlov, Ms Bayliss, and Mr Kosinski), Tufts University School of Medicine (Drs Ware and Tarlov), and Harvard School of Public Health (Drs Ware and Tarlov), Boston, Mass.

Indications in the text of "MOS unpublished data" refer to 16 pages of additional documents that are available at http://www.sf-36.com on the Internet. These data are also available from the National Auxiliary Publications Service, document 05340. Order from NAPS, c/o Microfiche Publications, PO Box 3513, Grand Central Station, New York, NY 10163-3513. Remit in advance, in US funds only, $7.75 for photocopies or $5 for microfiche. Outside the United States and Canada, add postage of $4.50. The postage charge for any microfiche order is $1.50.

Collection of 4-year health outcome data and preparation of this article were supported by grant 91-013 from the Functional Outcomes Program of the Henry J. Kaiser Family Foundation, at The Health Institute, New England Medical Center, Boston, Mass (John E. Ware, Jr, PhD, principal investigator). Design and implementation of the MOS were sponsored by the Robert Wood Johnson Foundation, Princeton, NJ; the Henry J. Kaiser Family Foundation, Menlo Park, Calif; and the Pew Charitable Trusts, Philadelphia, Pa. Previously reported analyses were sponsored by the National Institute on Aging, Bethesda, Md; the Agency for Health Care Policy and Research; and the National Institute of Mental Health, Rockville, Md. Participating plans, professional organizations who assisted in recruitment, and our many colleagues who contributed to the success of the MOS are acknowledged elsewhere.[ref. 3] The authors acknowledge the thorough and constructive suggestions received from Allyson Ross Davies, PhD, Kathleen Lohr, PhD, Edward Perrin, PhD, Dana Safran, ScD, and anonymous JAMA peer reviewers; and gratefully acknowledge the editing and typing assistance of Orna Feldman, Sharon Ployer, Rebecca Voris, and Andrea Molina.

Reprints: John E. Ware, Jr, PhD, The Health Institute, New England Medical Center, Box 345, 750 Washington St, Boston, MA 02111 (e-mail: john.ware@es.nemc.org).

References

1. Group Health Association of America. Patterns in HMO Enrollment. Washington, DC: Group Health Association of America; June 1995.

2. Miller RH, Luft HS. Managed care plan performance since 1980: a literature analysis. JAMA. 1994;271:1512-1519.

3. Tarlov AR, Ware JE, Greenfield S, Nelson EC, Perrin E, Zubkoff M. The Medical Outcomes Study: an application of methods for monitoring the results of medical care. JAMA. 1989;262:925-930.

4. Greenfield S, Rogers W, Mangotich M, Carney MF, Tarlov AR. Outcomes of patients with hypertension and non-insulin-dependent diabetes mellitus treated by different systems and specialties: results from the Medical Outcomes Study. JAMA. 1995;274:1436-1444.

5. Wells KB, Hays RD, Burnam MA, Rogers W, Greenfield S, Ware JE. Detection of depressive disorder for patients receiving prepaid or fee-for-service care: results from the Medical Outcomes Study. JAMA. 1989;262:3298-3302.

6. Rogers WH, Wells KB, Meredith LS, Sturm R, Burnam A. Outcomes for adult outpatients with depression under prepaid or fee-for-service financing. Arch Gen Psychiatry. 1993;50:517-525.

7. Stewart AL, Ware JE, eds. Measuring Functioning and Well-being: The Medical Outcomes Study Approach. Durham, NC: Duke University Press; 1992.

8. Kravitz RL, Greenfield S, Rogers WH, et al. Differences in the mix of patients among medical specialties and systems of care: results from the Medical Outcomes Study. JAMA. 1992;267:1617-1623.

9. Stewart AL, Greenfield S, Hays RD, et al. Functional status and well-being of patients with chronic conditions: results from the Medical Outcomes Study. JAMA. 1989;262:907-913.

10. Berry S. Methods of collecting health data. In: Stewart AL, Ware JE, eds. Measuring Functioning and Well-being: The Medical Outcomes Study Approach. Durham, NC: Duke University Press; 1992:48-64.

11. Greenfield S, Nelson EC, Zubkoff M, et al. Variations in resource utilization among medical specialties and systems of care: results from the Medical Outcomes Study. JAMA. 1992;267:1624-1630.

12. Safran D, Tarlov AR, Rogers W. Primary care performance in fee-for-service and prepaid health care systems: results from the Medical Outcomes Study. JAMA. 1994;271:1579-1586.

13. Ware JE, Kosinski M, Keller SD. SF-36 Physical and Mental Health Summary Scales: A User's Manual. Boston, Mass: The Health Institute, New England Medical Center; 1994.

14. Ware JE, Kosinski M, Bayliss MS, McHorney CA, Rogers WH, Raczek A. Comparison of methods for scoring and statistical analysis of SF-36 Health Profiles and Summary Measures: summary of results from the Medical Outcomes Study. Med Care. 1995;33(suppl 4):AS264-AS279.

15. McHorney CA, Ware JE, Raczek AE. The MOS 36-Item Short-Form Health Survey (SF-36), II: psychometric and clinical tests of validity in measuring physical and mental health constructs. Med Care. 1993;31:247-263.

16. Diehr P, Patrick D, Hedrick S, et al. Including deaths when measuring health status over time. Med Care. 1994;32(suppl 4):AS164-AS172.

17. STATA Reference Manual: Release 3.1, Volume 3. 6th ed. College Station, Tex: STATA Corp; 1993: 3-16.

18. Ware JE, Brook RH, Rogers WH, et al. Comparison of health outcomes at a health maintenance organization with those of fee-for-service care. Lancet. 1986;1:1017-1022.

19. Manning WG, Leibowitz A, Goldberg GA, Rogers WH, Newhouse JP. A controlled trial of the effect of a prepaid group practice on use of services. N Engl J Med. 1984;310:1505-1510.

20. Cook TD, Campbell DT. The design and conduct of quasi-experiments and true experiments in field settings. In: Dunnette MD, ed. Handbook of Industrial and Organizational Psychology. Chicago, Ill: Rand McNally College Publishing Co; 1976:223-326.

21. Lurie N, Moscovice IS, Finch M, Christianson JB, Popkin MK. Does capitation affect the health of the chronically mentally ill? results from a randomized trial. JAMA. 1992;267:3300-3304.

22. Retchin SM, Clement DG, Rossiter LF, Brown B, Brown R, Nelson L. How the elderly fare in HMOs: outcomes from the Medicare competition demonstrations. Health Serv Res. 1992;27:651-669.

23. Clement DG, Retchin SM, Brown RS, Stegall MH. Access and outcomes of elderly patients enrolled in managed care. JAMA. 1994;271:1487-1492.

24. Beaton DE, Bombardier C, Hogg-Johnson S. Choose your tool: a comparison of the psychometric properties of five generic health status instruments in workers with soft tissue injuries. Qual Life Res. 1994;3:50-56.

25. Beusterien KM, Nissenson AR, Port FK, Kelly M, Steinwald B, Ware JE. The effects of recombinant human erythropoietin on functional health and well-being in chronic dialysis patients. J Am Soc Nephrol. 1996;7:1-11.

26. Beusterien K, Steinwald B, Ware JE. Usefulness of the SF-36 health survey in measuring health outcomes in the depressed elderly. J Geriatr Psychiatry Neurol. 1996;9:1-9.

27. Kantz ME, Harris WJ, Levitsky K, Ware JE, Davies AR. Methods for assessing condition-specific and generic functional status outcomes after total knee replacement. Med Care. 1992;30(suppl 5):MS240-MS252.

28. Hawker G, Melfi C, Paul J, Green R, Bombardier C. Comparison of a generic (SF-36) and a disease-specific (WOMAC) instrument in the measurement of outcomes after knee replacement surgery. J Rheumatol. 1995;22:1193-1196.

29. Phillips RC, Lansky DJ. Outcomes management in heart valve replacement surgery: early experience. J Heart Valve Dis. 1992;1:42-50.

30. Okamoto LJ, Noonan M, Kirchdoerfer LJ, Boyer JG, Kellerman DJ, Saiers JA. Quality of life in patients with severe asthma: baseline health profile and effects of fluticasone propionate aerosol. Ann Allergy Asthma Immunol. 1996;76:1-7.

31. Rampal P, Martin C, Marquis P, Ware JE, Bonfils S. A quality of life study in five hundred and eighty-one duodenal ulcer patients. Scand J Gastroenterol. 1994;29(suppl):44-51.

32. Stucki G, Liang MH, Phillips C, Katz JN. The Short Form-36 is preferable to the SIP as a generic health status measure in patients undergoing elective total hip arthroplasty. Arthritis Care Res. 1995;8:174-181.

33. Krumholz HM, McHorney CA, Clark L, Levesque M, Baim DS, Goldman L. Changes in health after elective percutaneous coronary revascularization: a comparison of generic and specific measures. Med Care. 1996;34:754-759.

34. Temple PC, Travis B, Sachs L, Strasser S, Choban P, Flancbaum L. Functioning and well-being of patients before and after elective surgical procedures. J Am Coll Surg. 1995;181:17-25.

35. Shiely J-C, Bayliss MS, Keller SD, Tsai C, Ware JE. SF-36 Health Survey Annotated Bibliography: First Edition, 1988-1995. Boston, Mass: The Health Institute, New England Medical Center. In press.

36. Roper WL, Winkenwerder W, Hackbarth GM, Krakauer H. Effectiveness in health care: an initiative to evaluate and improve medical practice. N Engl J Med. 1988;319:1197-1202.

37. Rubin H, Gandek B, Rogers WH, Kosinski M, McHorney C, Ware JE. Patients' ratings of outpatient visits in different practice settings: results from the Medical Outcomes Study. JAMA. 1993;270:835-840.

38. Davies AR, Halpern R. Health Care Outcomes: An Introduction. Irving, Tex: Voluntary Hospitals of America Inc; 1993.