COSP, The Consumer Operated Services Program Multisite Research Initiative

Rationale for Measures. Since the Well-Being Project a decade ago (Campbell & Schraiber, 1989), it has been clear that consumers can and should be involved in the development of tools and protocols that measure things that are important in their lives. The types of questions you ask determine the types of answers you get. Consumer instruments and multi-stakeholder measurement efforts teach us new things. We were pleased that the GFA provides the opportunity to use both consumer and professional tools. The CP could be developed for this project in two ways: (1) advance a CP that includes a combination of subscales from validated instruments, along with new items designed to capture desired domains, or (2) use a battery of validated instruments combined with consumer-developed surveys. Developing a CP that includes a combination of subscales and new items can be a lengthy process, from identifying domains and items, to field-testing, to analysis of psychometric properties. MIMH has, however, recently completed an outcomes protocol for community-based peer support programs through support from the UIC National Research and Training Center. The Peer Outcomes Protocol (POP, 1998) will be validated this summer and its psychometric properties established. The POP was developed through a systematic review of over 30 consumer surveys that produced a pool of 400 items. Items were ranked and clustered into domains through a focus group process called concept-mapping (Trochim et al., 1993). The POP includes the following modules, which can be used together or separately: Basic Demographics, Services, Hospitalization, Employment, Housing and Community Life, Social Support, Quality of Life and Well-Being, Recovery, Empowerment and Personhood, Crime and Violence, and Program Satisfaction. It also includes subscales from the SF-12 (Ware & Keller, 1996b) and Lehman’s Quality of Life Instrument (Lehman, 1988). 
We will submit the POP for review by the Methods and CP TF and the Consumer Research and Provider TF, and for possible adoption by the SC. The Outcome Based Performance Measures (1993) developed by The Accreditation Council on Services for People with Disabilities, the Toolkit for Measuring Psychosocial Rehabilitation Outcomes (1998), and the process evaluation tool for consumer-run drop-in centers by Mowbray & Tan (1992) should also be evaluated for possible use.

Other consumer-developed instruments or multi-stakeholder tools that can be used to measure the outcomes and services identified in the GFA include: (1) housing and supports preference survey (Ralph & Campbell, 1995); (2) housing and housing services satisfaction (Felton et al., 1993); (3) empowerment (Rogers et al., 1997; Segal et al., 1995); (4) satisfaction (Dow & Ward, 1995; MHSIP, 1996; Chamberlin et al., 1996); and (5) social inclusion and well-being (Campbell & Schraiber, 1989; Campbell, 1992). A scale to measure the construct of empowerment as defined by consumers was developed at the Center for Psychiatric Rehabilitation at Boston University (Rogers et al., 1997). Factor analysis on the 28-item scale identified 5 underlying dimensions of empowerment: self-efficacy–self-esteem, power–powerlessness, community activism, righteous anger, and optimism–control over the future. The scale identifies an empowered person as one who has a sense of self-worth, self-efficacy, and power; is optimistic about the ability to exert control over his/her life; and recognizes the importance of the group to effect change while still valuing personal autonomy. Empowerment was related to quality of life and income but not to demographic variables. The authors write that the scale demonstrates internal consistency and some evidence for validity, but further testing is required to establish whether it has discriminant validity and is sensitive to change. The empowerment scale developed by Segal et al. (1995) through baseline and six-month interviews in four consumer-run self-help agencies includes 3 scales: Personal Empowerment Scale, Organizational Empowerment Scale, and Extra-Organizational Empowerment Scale. These scales showed strong internal consistency and stability and were sensitive to user changes over time. Other consumer instruments also measure important outcomes such as healing and recovery (Dumont, 1995). 
The Crisis Hostel Project, a consumer-run research demonstration project, developed the healing scale related to hospitalization, empowerment, and satisfaction. This scale is also being used as part of the SAMHSA Managed Care for Vulnerable Populations initiative.

If the SC elects to use a supplemental battery of existing instruments, the following review can provide direction on the availability of appropriate professional scales. Studies of mental health stakeholders (McGuirk et al., 1994) and state departments of mental health (Evensen & Viewig, 1998) are in good agreement on outcome domains. Based on these studies, we also recommend that the following tools be used to measure such concepts and conditions as: (1) Felt Coercion (Gardner et al., 1993); (2) Satisfaction (Attkisson & Greenfield, 1994a; Attkisson & Greenfield, 1994b; Attkisson & Zwick, 1982); (3) Health Status (Ware, Kosinski & Keller, 1996a, 1996b); (4) Quality of Life, in particular the measurement of social support (Lehman, 1988); (5) Self Esteem (Rosenberg, 1965); (6) Symptom Level: the BASIS-32 combines symptoms, functioning and quality of life into a 32-item scale (Eisen, Dill, & Grob, 1994); (7) Level of Functioning: the most commonly used assessment has been the GAF, which forms Axis V of DSM-IV (American Psychiatric Association, 1994). Although the GAF is short and simple to use, there have been problems with its reliability, and we recommend instead the Multnomah Community Ability Scale, a 17-item instrument targeted at functional assessment for a severely mentally ill population (Barker et al., 1993); (8) Employment: a small number of items used in the SAMHSA Managed Care for Vulnerable Populations initiative, which measure earned income, employment status, job characteristics, and volunteer activities, could be used; and (9) a measure of cultural competency (Missouri Department of Mental Health, 1998).

Strategies for Data Collection, Processing, Control and Storage.

Consumer-level Database. The consumer-level database will contain information on individual socio-demographic and clinical characteristics; service needs; service delivery and utilization ("claims data"); the key outcomes as measured by the CPs as well as specific measures intended to capture special outcomes reflecting features of each site’s program; and consumer satisfaction. In addition, it will include information about services received by the individual.

Project-level Database. The multi-site evaluation will make systematic use of a wide variety of qualitative and quantitative process, implementation, and contextual data about the COS and their community contexts. These data will be maintained in a project-level database (PLD), which will be crucial to analyzing these data effectively and to maximizing their utility for policy guidance. This type of database has been used to link the process and outcome components of multi-site evaluations (NIAAA, 1992; CMHS, 1996). It also represents one kind of link between qualitative and quantitative research traditions (Dennis, Fetterman & Sechrest, 1994) in assessing project implementation and in tracking fidelity to program models. R.O.W. Sciences has pioneered the development of this technology in the context of multi-site studies (Johnsen et al., 1997; R.O.W. Sciences, 1998; Sonnefeld, 1996). The proposed PLD is also a device for incorporating characteristics of projects and their environments into multilevel analyses, including hierarchical linear modeling, as detailed below. It can also be used to support quantitative research synthesis (meta-analysis) strategies for integrative reviews of evaluations. In a field such as the evaluation of COS, where little formal evaluation exists, the design and development of a project-level database can lay an important foundation for the continuing synthesis of subsequent evaluations of such programs. Data on populations, program models and components, implementation problems, local environmental contexts, and evaluation methods will be rated or coded, with checks on the reliability of the coding (Orwin, 1993; Stock, 1993). The goal is to accommodate all of these sources of analytical complexity and potential problems (e.g., comparison group nonequivalence) by recognizing them at the outset and as they arise, and by facilitating solutions to these problems within the analysis. 
As changes occur and new issues arise, each can also be tracked and coded.

Plans for Statistical Analysis. The statistical analyses of outcomes data will take place at three levels: (a) analysis within sites to determine the effect of specific interventions; (b) multi-site analysis which employs a meta-analytic approach; and (c) longitudinal, multi-level analytic approaches which can accommodate variations in program interventions, program implementation, and system characteristics within a single analytic framework. Each of these approaches will be affected both by the designs employed by sites and by the problems likely to be encountered in such an effort.

Design Features. While the GFA leads to an expectation that sites would propose designs featuring random assignment to traditional services or traditional services plus consumer-operated service programs, a subsequent advisory memo softened this requirement. Because of this, we expect that at least some sites will propose an experimental framework, but that others will propose a quasi-experimental framework. In a quasi-experimental framework, because one cannot depend on randomization to render "all else equal," one must adjust for prior differences between consumers in the consumer-operated programs and those in traditional services. Among these differences may be differences related to self-selection into these types of programs. We will therefore check for specific baseline non-equivalences with simple univariate tests for each variable, scale, or composite within each key domain, and assess and adjust for overall non-equivalence as part of the multivariate selection modeling described below. In addition to this potential difference among sites, we also anticipate that there may be some differences in the comparison groups that are recommended. We anticipate that some sites may, for example, incorporate a third comparison group of persons who receive only services from COS. These variations in the research design from site to site will complicate both the plans for statistical data analysis and the clarity with which we can draw certain types of conclusions. Taking appropriate steps to rule out extremely non-equivalent groups should help to resolve this issue.
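
As an illustration of the univariate baseline checks described above, the following Python sketch computes a standardized mean difference for a single baseline variable. It is a descriptive check rather than a formal test, and it is only a sketch: the function name, the sample scores, and the 0.1 flagging threshold are assumptions made for illustration and are not part of the proposed protocol.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(cos_group, comparison_group):
    """Difference in group means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(cos_group) + variance(comparison_group)) / 2)
    return (mean(cos_group) - mean(comparison_group)) / pooled_sd


# Hypothetical baseline scores on one scale for the two groups.
cos_baseline = [12, 15, 14, 10, 13, 16]
comparison_baseline = [11, 12, 10, 9, 13, 11]

smd = standardized_mean_difference(cos_baseline, comparison_baseline)
flagged = abs(smd) > 0.1  # assumed rule-of-thumb threshold, not from the proposal
```

In practice such a check would be repeated for each variable, scale, or composite in each key domain, with flagged variables carried forward into the selection modeling.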

Data analysis of multi-site data. Our approach to conducting multi-site evaluations is informed by linking process and outcome components of the evaluation through a project level database. That approach permits the cross-site evaluation to absorb and standardize selected aspects of the site-level studies that are conducted locally or at the county or state level, and make use of the results of these studies to enrich the quantitative analyses of the outcomes of the consumer-run programs. The approach is well-suited to the goal of promoting continuous participation of mental health consumer/survivor researchers. Because of its focus on developing appropriate standardized descriptions of key program features and coding those that emerge over the course of the study, the process data collection and analysis procedures can be based on the perspectives of all participants. We are taking this approach in our role in the ongoing CMHS ACCESS demonstration and the CC for SAMHSA’s Managed Care for Vulnerable Populations cooperative agreement.

There are many ways to analyze cross-site data. We favor two general approaches: (1) meta-analysis or quantitative synthesis of site-level effect-size estimates, and (2) multi-level analysis of the aggregated cross-site data. The two approaches are statistically related but practically distinct, and proceed in very different ways. The first proposed strategy involves the synthesis of "best estimates" of program effects on outcomes for each site derived from site-level analyses and the subsequent analysis of variations among these estimates in the same way that an explanatory meta-analysis examines the variation in effect sizes derived from separate published studies. The second strategy makes use of recently developed statistical and analytical tools for multi-level analysis with hierarchical linear modeling (Bryk & Raudenbush, 1992). (Related techniques are also referred to as random regression or mixed regression.) Hierarchical linear modeling techniques have been developed to deal with the kinds of "nested" data generated by multi-site studies. Cross-site data will be nested when individuals are assigned to traditional or consumer-operated service programs, and provide information at baseline and follow-up, but are members of programs within communities. Statistical techniques implemented in software such as HLM, MIXREG, and the SAS Proc Mixed routine assess the intra-class correlation due to the nested data (e.g., individuals nested within intact groups or communities). This correlation is ignored by ordinary linear regression models, which leads to understated standard errors and inflated significance levels. Both meta-analytic and hierarchical linear modeling techniques have been used with recent success in cross-site analyses of multi-site studies (see Orwin et al., 1994a, and Selzer, 1994, for quantitative synthesis and multi-level modeling, respectively). 
The details of the statistical models used with the COS evaluation data will of course depend on the characteristics of the design and data collection procedures agreed upon by the SC. The choice of statistical analysis will depend in part on, for example, the extent of agreement that outcomes in each site’s program will be well-measured by the CP. To the extent that there are important variations in outcome measurement across sites, meta-analytic techniques rather than the pooling characteristic of hierarchical techniques will be employed. If the efforts of the CC to encourage consistency in other design features such as follow-up periods, interviewer training and quality control are successful, the pooling and hierarchical analysis may be advisable.
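
To make the first strategy concrete, the Python sketch below pools site-level effect-size estimates with inverse-variance weights (a fixed-effect synthesis) and computes Cochran's Q as a summary of the variation among site estimates. It is a minimal sketch under stated assumptions: the function name and the three site estimates and variances are hypothetical, and the actual synthesis model would depend on the design agreed upon by the SC.

```python
from math import sqrt


def pool_effect_sizes(effects, variances):
    """Fixed-effect inverse-variance pooling of site-level effect sizes.

    Returns the pooled estimate, its standard error, and Cochran's Q.
    """
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * d for w, d in zip(weights, effects)) / total_weight
    standard_error = sqrt(1.0 / total_weight)
    q_statistic = sum(w * (d - pooled) ** 2 for w, d in zip(weights, effects))
    return pooled, standard_error, q_statistic


# Hypothetical standardized effect sizes and sampling variances for three sites.
site_effects = [0.30, 0.10, 0.50]
site_variances = [0.04, 0.02, 0.04]

pooled, standard_error, q_statistic = pool_effect_sizes(site_effects, site_variances)
```

A large Q relative to its degrees of freedom (the number of sites minus one) would suggest that site estimates vary by more than sampling error, which is the variation an explanatory meta-analysis then tries to account for with project-level characteristics.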

In either the meta-analytic approach or the hierarchical linear model approach, assessments of the non-equivalence of the groups being compared will be necessary to permit statistical adjustments intended to reduce selection bias. In the final achieved sample of participants on whom there are both baseline and follow-up data, non-equivalence between the traditional-services and COS groups may arise in two ways: either because of non-random assignment or because of differential attrition from assessment (loss to follow-up). It is important to assess and attempt to adjust for selection bias because substantial non-equivalence arising from either source may result in unfairly negative judgments of the effectiveness of COS. Available techniques for adjusting for selection bias range from traditional ANCOVA, through Heckman’s (1980; 1989) widely used method, to the currently promising propensity-score approach developed by Rubin (1991; 1997; Rosenbaum and Rubin, 1985). Although all available techniques fall short of ensuring the elimination of selection bias except in certain favorable circumstances (Stolzenberg & Relles, 1990; Glynn, Laird, & Rubin, 1986), we will be able to explore the sensitivity of any estimates of COS effectiveness to the statistical adjustment methods used.
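
One way to use propensity scores is subclassification: participants are grouped into strata with similar scores, and the COS versus comparison difference is averaged across strata. The Python sketch below illustrates that idea only. It assumes the propensity scores have already been estimated elsewhere (for example, from baseline covariates), and the function name, the record layout, and the toy data are all hypothetical.

```python
from statistics import mean


def subclassified_effect(records, n_strata=5):
    """Stratum-size-weighted mean of within-stratum outcome differences.

    records: (propensity_score, in_cos_group, outcome) tuples, where the
    propensity score is assumed to have been estimated beforehand.
    """
    ordered = sorted(records, key=lambda r: r[0])
    n = len(ordered)
    weighted_sum = 0.0
    weight_total = 0
    for s in range(n_strata):
        stratum = ordered[s * n // n_strata:(s + 1) * n // n_strata]
        cos = [y for _, in_cos, y in stratum if in_cos]
        comparison = [y for _, in_cos, y in stratum if not in_cos]
        if not cos or not comparison:
            continue  # no overlap in this stratum, so it contributes nothing
        weighted_sum += len(stratum) * (mean(cos) - mean(comparison))
        weight_total += len(stratum)
    if weight_total == 0:
        raise ValueError("no stratum contains both groups")
    return weighted_sum / weight_total


# Toy data: the outcome rises with the propensity score, and COS adds 2 points.
records = []
for score, n_cos in [(0.1, 1), (0.3, 1), (0.5, 2), (0.7, 3), (0.9, 3)]:
    for i in range(4):
        in_cos = i < n_cos
        records.append((score, in_cos, 10 * score + (2 if in_cos else 0)))

adjusted = subclassified_effect(records)
naive = (mean(y for _, c, y in records if c)
         - mean(y for _, c, y in records if not c))
```

In this constructed example the unadjusted difference overstates the built-in effect because participants with higher scores are more likely to be in the COS group; comparing within strata removes that imbalance. Real data would not separate this cleanly, which is why sensitivity to the adjustment method matters.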

Statistical power analysis is recommended in any evaluation study, especially in the light of the finding that many evaluations have been underpowered (Lipsey, 1990). But attention to statistical power is particularly important in the analysis of multi-site evaluations, where sites will be implementing different programs, different study designs, and site-specific measures beyond the CP. Concerns about power traditionally focus on the need for adequate sample size, but other factors are also important. Some sites may have smaller samples than others, and some may have particularly unbalanced designs, choose special outcome measures with less sensitivity to change, employ less sensitive statistical tests, or be less successful in maintaining program integrity during implementation. Any of these factors could reduce the power of a study to detect statistically reliable effects of a given size under standard statistical criteria such as p<.05 (Hansen & Collins, 1994; Lipsey, 1990). Although these areas will be the focus of TA efforts, it can be expected that at the end of the study some sites will be able to address the research questions of the study with greater statistical power than others. This should be considered in any cross-site analysis of findings, to distinguish between differences in outcomes and differences in power. Proposed team members have conducted integrated statistical analyses of outcome and service data similar to those outlined for the COS evaluation.
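
The effect of sample size and design balance on power can be illustrated with a normal-approximation calculation for a two-group comparison of means. The Python sketch below is an approximation under stated assumptions (a two-sided z-test on a standardized mean difference); the function name, the effect size of 0.5, and the group sizes are hypothetical and are not drawn from any site's design.

```python
from math import sqrt
from statistics import NormalDist


def two_group_power(effect_size, n_group_1, n_group_2, alpha=0.05):
    """Approximate power of a two-sided z-test for a standardized mean difference."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative hypothesis.
    shift = effect_size * sqrt(n_group_1 * n_group_2 / (n_group_1 + n_group_2))
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)


# Same total sample (128) and same assumed effect size, different balance.
balanced_power = two_group_power(0.5, 64, 64)
unbalanced_power = two_group_power(0.5, 100, 28)
```

With the total sample held fixed, the unbalanced allocation yields lower power than the balanced one, which is one reason sites with similar enrollment may still differ in their ability to detect the same effect.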

Plan to Ensure that Multi-Site Study Design Addresses Purposes of GFA. To ensure that the final plan proposed for the multi-site study design addresses the purposes of the GFA and incorporates all aspects of the logic model, members of the CC will carefully review the plan for consistency with the GFA before submitting it for adoption by the SC. Concurrently, we will provide a copy of the same plan to the GPO and ask that the GPO review it within a two-week period to ensure that all required elements are included. After making any changes needed to bring the plan in line with GFA purposes, members of the CC will bring the final plan forward to the SC for ratification. The plan which is adopted will be distributed to each site and serve as the blueprint for later collaborative efforts.

Support site-specific study features. In the development of study approaches by project sites, we anticipate that a number of hypotheses will be proposed, and that there may be several divergent or unique approaches to the proposed research, as well as some overlap. The CC will facilitate a process for identifying areas of overlap and bringing these into closer alignment through the CP. The CC will also set up a process that allows sites to add site-specific elements at the end of the CP to address site-specific issues.

Monitor the fidelity of models at study sites. When evaluating service demonstration programs, it is critical to assess the fidelity of implemented services to the intended program models. The importance of this has been demonstrated recently in the RWJ-PCMI and the CMHS ACCESS demonstrations. RWJ-PCMI evaluators suggested that the failure to implement ACT-type CM in local sites may have accounted for nonsignificant differences in client outcomes (Lehman et al., 1994; Goldman et al., 1992; Ridgly et al., 1996). To assess fidelity to ACT, an empirically validated approach to fidelity assessment was implemented in the ACCESS program. Evaluators at participating sites and at ROW collaborated with Greg Teague to extend the ACT fidelity measures developed by McGrew & Bond (1994). Critical to proper interpretation of results within this multi-site study is development of an understanding of the characteristics of the programs at each site. While all study sites will share some characteristics (i.e., all will be COSs), they may differ in the program model employed and the services provided. It is important to understand similarities and differences to properly evaluate the success of a particular model. The instrument used in the fidelity study will draw upon several recent works which have proposed models of COS. The framework proposed by Mowbray and Moxley (1997), as well as the recent work looking at consumer/survivor-operated self-help programs (Van Tosh & del Vecchio, 1998), provides a fertile starting point for articulating similarities and differences in project goals and objectives, services provided, target populations, organization and administration, and implementation issues. Secondary databases of organizational-level data on COSs that are now being developed or are already available could be helpful as well. 
Most notable is the pioneering epidemiological study of organized self-help groups, funded by the Center for Mental Health Services, that is underway at Chilton Research Services in collaboration with the National Mental Health Consumer’s Self-Help Clearinghouse. The Consumer Component of the State Mental Health Agency Profiling System (Campbell, 1998) also ranks state funding levels for self-help and state support for consumer involvement in policy and research. Some states were identified as "Islands of Excellence". The fidelity study will begin with a process of working with each site to develop logic models that fairly represent their program’s logic or program theory. Logic models can be useful in articulating program elements which should be measured and incorporated within analytic plans. In addition, in some cases, they may serve to guide implementation and outcome analysis (Johnsen et al., 1997). During a first-year site visit, one task will be a collaboration between site visitors and program staff to articulate a logic model for the consumer-operated service program at the site. During the second and fourth years, provisions will be made to update these logic models during site visits to each site. Two members of the CC team will participate in each site visit. On the basis of the logic models and other information obtained during the first site visit, the express purpose of the second-year site visit will be to carry out a fidelity study. The framework provided in Mowbray and Moxley (1997) and Van Tosh and del Vecchio (1998) is recommended.

Incorporating Qualitative and Quantitative Data. To achieve the multifaceted goals of the GFA, we will use a range of data collection techniques which encompass both the qualitative and quantitative research traditions. Data collected through the various qualitative and quantitative methods will be entered into the project level database. Quantitative data obtained from sites through the CP will be analyzed to assess the effectiveness of consumer operated services. Key outcomes as measured by the CPs will be entered into the consumer-level database and analyzed using the statistical procedures described. Additional quantitative data will include individual socio-demographic and clinical characteristics; service needs; service delivery and utilization; and consumer satisfaction. Data on program implementation and fidelity, including information about the program model, the intensity of services provided, and program staffing, will be obtained through both quantitative and qualitative methods. Quantitative data will be obtained via the fidelity study instrument. Qualitative information on program fidelity will be gathered through site visits. In addition to capturing key elements of programs through the logic models, during site visits we will conduct interviews with program staff to understand the evolution of the programs. During site visits we will also obtain or confirm data about the program’s system-level context (funding, availability of services in the area, linkages with other programs and agencies, managed care penetration and the program’s orientation to other programs).

Plan for Cost Study. As articulated within the GFA, the primary question of concern for the cost study is: To what extent does participation in a consumer-operated service program affect costs for the following: inpatient hospitalization, crisis intervention, and emergency room utilization, as well as offsetting costs in housing, criminal justice, vocational rehabilitation, physical health care and income support? The GFA indicates that examination of patterns of service use in retrospective or prospective claims data will be coordinated by the CC. Study sites will be responsible for providing the information for the cost study, while the CC will be responsible for developing a design and method for collecting and analyzing all cost information. It is likely that the COS will not undertake cost studies themselves, but they will collect the information required. While the cost of many COS may be lower than other types of services, to answer the GFA’s questions it is essential to go beyond the programs, to compare the entire package of services received by persons utilizing consumer services. Because of the likelihood of encountering a number of design or implementation problems in one or more sites, we will propose a plan which has some flexibility built into it: (1) We anticipate that all sites will be able to implement a common protocol (including items required for the cost study) for both the experimental and control groups, and will be able to provide information on the costs of consumer-operated program services, including unit cost(s). From this information, we will be able to determine the extent to which outcomes in the experimental and control groups differ, and the cost of providing COS that were received. At the very least, this will provide information sufficient to develop a cost-outcome model, and will provide a metric that can be used in reporting the impact of these programs (Yates, 1995; Yates & Newman, 1980). 
(2) We anticipate that some study sites will reside in systems in which public sector services have been organized into managed care arrangements, and will have established relationships with managed care organizations which could potentially provide administrative data on service use. Where possible, we will work with these sites to devise plans to (a) compare service use and cost between participants in the experimental (traditional services plus self-help) and control (traditional services only) conditions; and (b) compare service use and costs for participants in the experimental condition before they began to receive COS and after they received COS. In our experience the information systems of managed care organizations are relatively efficient, so it may be possible to generate this type of information fairly quickly. (3) We anticipate that some sites may have the ability to use Medicaid claims data. We will use a Cost-Procedure-Process-Outcome model to frame the development of the cost study and to integrate the outcome and cost studies. Our approach in the cost study is informed by a considerable body of work in cost-effectiveness and cost-benefit analysis by the researcher leading the TF on Cost Analysis (Yates, 1995). Constructing this cost-procedure-process-outcome (or CPPO) model begins with operationalizing the resources (costs), procedures, processes, and outcomes of the treatment and then collecting data on these variables. A variety of analyses, including a series of multiple regressions (e.g., Yates, Besteman, Pilipezak, Greenfield & DeSmet, 1994) or structural equation models (Joreskog & Sorbom, 1989), can be used to test the significance of each hypothesized link between components of the model.

The result of combining these cost-procedure, procedure-process, and process-outcome analyses can be a more complete and potentially useful understanding of treatment than would have been provided by either traditional clinical research or traditional cost-effectiveness and cost-benefit analysis. By finding the significant links between costs, procedures, processes, and outcomes for these programs, those interested in developing consumer-operated service programs will be in a better position to select those procedures that require the least expensive resources and that produce the process that will achieve the best interim and long-term outcomes. Beyond establishing relationships among costs, procedures, processes, and outcomes, CPPO models can be manipulated to show how treatment costs can be minimized, treatment outcomes maximized, or both. Sometimes this manipulation is obvious, for example, when the most effective model is also the least expensive. More complex CPPO models may, however, require more sophisticated manipulations to show which combination and sequencing of procedures yields the most cost-effective and cost-beneficial solution. These would not be path analyses, which show the strength of different routes between resource expenditure and outcome attainment, but linear programming and other operations research analyses (Yates, 1980).
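
The kind of optimization question described above can be illustrated on a very small scale. The Python sketch below uses exhaustive enumeration, not linear programming, to find the least expensive combination of procedures that reaches a target outcome level; it stands in for the operations research methods named in the text only as a toy. The procedure names, costs, outcome gains, and the assumption that gains simply add across procedures are all hypothetical.

```python
from itertools import combinations


def cheapest_combination(procedures, target_outcome):
    """Return (cost, names) of the cheapest subset reaching the target, or None.

    procedures maps a name to (cost, outcome_gain); gains are assumed additive.
    """
    best = None
    names = sorted(procedures)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            cost = sum(procedures[name][0] for name in subset)
            gain = sum(procedures[name][1] for name in subset)
            if gain >= target_outcome and (best is None or cost < best[0]):
                best = (cost, subset)
    return best


# Hypothetical procedures with illustrative unit costs and outcome gains.
procedures = {
    "drop_in_center": (40, 3),
    "peer_counseling": (25, 2),
    "advocacy_training": (30, 2),
    "warm_line": (15, 1),
}

best = cheapest_combination(procedures, target_outcome=5)
```

Enumeration is workable only for a handful of procedures; with more procedures, sequencing constraints, or continuous resource levels, a formal programming formulation of the kind cited above would be needed.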