
Selection Criteria and Randomization

    June 2, 1999

The Issues

The study sites, the Coordinating Center, the federal representatives, and the Logistics Subcommittee Group members have struggled to agree on the best way to define the study population and to randomize participants into the control (traditional mental health services only) and the experimental (consumer-operated services plus traditional mental health services) conditions. Thus far, we have not been able to find common ground on this very important issue. There are at least three related but distinguishable issues here:

1. Whether study participants will be selected from the pool of all consumers in a community who are eligible for participation in the study ("adults with serious mental illness..." "...who have actively been participating in a traditional service program for a minimum of 1 year, but have not participated for at least 6 months in the consumer-operated service program" GFA, pp. 6, 8) or whether this pool will be limited in some way to a sub-group of individuals more likely to participate fully in the types of programs represented in our multisite study;

2. Whether some sites need to restrict the study population further, to the group that is the focus of the program at that site (e.g., persons who have been "dually diagnosed"); and

3. The timing of random assignment in relation to the selection points and the administration of the baseline interviews.

In many ways our differences on these issues are similar to the diversity of opinion we have experienced around measurement of symptoms. There are many stakeholders involved in the COSP and we all feel passionate about our perspectives. We also want to have the best study possible: one that produces knowledge about consumer-operated services that is accepted by the mental health research community and is compelling for policy-makers. But we must also recognize that there are potentially many goals for a multisite study, and therefore several ways of achieving excellence and usefulness for policy purposes. In the name of excellence we do not want to endanger the capacity of the individual study sites to participate in the cross-site study, nor do we want to embrace actions that are so contrary to the values of some of the partners involved in the COSP that we undermine our collaborative process. Usually a compromise can be worked out if we continue the dialogue and do not abandon the faith that a quality project will emerge from our differences.

Defining the Research Questions

The GFA challenged us to demonstrate for whom, at what cost, and with what outcomes people receiving traditional mental health services can be referred to consumer-operated services. This is certainly what managed care companies are interested in finding out before they commit to including consumer-operated services in their provider networks. The GFA’s statement that this program is to "focus on the evaluation of existing consumer-operated programs" might be interpreted as implying that consumer programs would not be required to significantly alter their current target populations or referral/recruiting procedures. But the main reason for focusing on existing programs was to avoid the start-up issues that new programs face, when many successful programs were already in existence. In addition, there was a belief that once we have developed knowledge about the efficacy of particular programs, we are more likely to be able to disseminate that knowledge and to replicate a program if the original program is still in existence after the grant. It was felt that consumer-operated programs that were in existence prior to the study were much more likely to continue.

The GFA does not prohibit altering recruitment or referral procedures; in fact, the purpose of the service enhancement money is to raise the capacity of any single program to include more people. This could mean including different types of people than those usually served (e.g., dual diagnosis, those not going through such extensive pre-screening, etc.). So the GFA should be seen as neutral on this issue, allowing for the spectrum of possibilities in recruitment and referral as appropriate to answering the many questions of the GFA.

Uniformity and Generalization

Randomization is a method for assuring comparability of persons entering our control and experimental conditions. With random assignment, every person being randomized has an equal chance of getting into either the traditional-service-only program or the traditional plus consumer-operated service program. It is this equality of chance at the outset that allows us to make comparisons of outcomes between the two groups with the greatest amount of confidence. If the groups were to differ substantially on even one factor, for example, gender, this one factor could possibly account for all of the differences between our two conditions, and we would not be able to draw any solid conclusions about whether the outcome differences were caused by the type of service program. There are some statistical procedures, such as "case-mix adjustment," that can be used to help control for some pre-existing differences, but these have limitations. In some cases they cannot be used at all: for example, if 80% of the control group were male and 80% of the experimental group female, the available statistical control procedures would tend to fall apart and we would not be able to adjust for this pre-existing difference. Even with the use of statistical fixes, our confidence in the end result is never as good as if the two groups had been identical before assignment to one of the two conditions.
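The balancing property described above can be illustrated with a minimal sketch. Everything in it is hypothetical and for illustration only: the pool of 1,000 people, the 40% share of women, the seed, and the function names are assumptions, not part of the COSP protocol. It uses simple (coin-flip) randomization, so the two arms need not be exactly equal in size.

```python
import random

def randomize(participant_ids, seed=None):
    """Give every participant an equal chance of either condition."""
    rng = random.Random(seed)
    return {pid: rng.choice(["control", "experimental"]) for pid in participant_ids}

def share_female(assignments, gender, arm):
    """Proportion of women among the participants assigned to one arm."""
    members = [pid for pid, assigned in assignments.items() if assigned == arm]
    return sum(gender[pid] == "F" for pid in members) / len(members)

# Hypothetical pool: 1,000 people, 40% of them women.
gender = {pid: ("F" if pid < 400 else "M") for pid in range(1000)}
assignments = randomize(gender.keys(), seed=1999)

for arm in ("control", "experimental"):
    print(arm, round(share_female(assignments, gender, arm), 2))
```

With a pool this large, the share of women in each arm should land close to the 40% in the pool, which is the sense in which randomization makes the two groups comparable on gender and on every other characteristic, measured or not.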

At issue for our group isn't whether to randomize (it’s required in the GFA), but when to randomize (the GFA didn't specify this as clearly). The question is whether our multisite study should: 1) sample from the broadest possible group of typical users of traditional mental health services and randomize at that point, or, at the other end of the spectrum, 2) sample from a much more select and narrow group of consumers and randomize at that level. With the first scenario we are better able to generalize our findings with confidence to the larger universe of all consumers who use traditional mental health services, but we risk having high levels of study withdrawal, lower levels of participation in programs, and potentially having to change some of the ways in which the programs have been recruiting. With the second scenario, we are less likely to have study withdrawal, and programs can pretty much conduct themselves as they have in the past, but the group of consumers to whom we can generalize our results is much smaller.

The Essential Tension

There is an inevitable tension between the study sites and the Coordinating Center because the goals, needs, and proposed study designs take us in two different directions.  The study sites are interested in people who attend their programs and the integrity of the research they have designed to evaluate the characteristics they feel are most important in their programs.  The COSPs are differentiated based on type of COSP, traditional mental health service delivery system, and characteristics of program participants.  In the interest of making it possible to draw fair comparisons among the programs, the multi-site study seeks to standardize as much as possible the elements of the study such as the CP, the data waves, selection criteria, and how people are randomized into the study.  

The multi-site study seeks commonality because without standardization in as many areas as possible, pooled data will be more difficult (or in some cases, impossible) to interpret. Standardization across more dimensions (such as timing of administration and common protocol) facilitates cross-site comparisons. For example, if selection differences result in study samples that vary greatly in composition from site to site, there will not only be limitations to the generalizability of the overall study results as indicated above, but valid comparisons among the sites will be more difficult because substantive differences in programs will be confounded with differences in the samples (case mix). It may help to draw a distinction that the reliability/generalizability theorist Cronbach used to make as a way of promoting greater policy-relevance for evaluations. This is the distinction between generalizing findings across a set of programs and trying to decide what populations the set of findings generalizes to. Both are important to this study. Every reduction in unnecessary variation across study designs will strengthen our ability to draw valid conclusions about the observed cross-site variation in patterns of outcomes and costs.

There are, of course, some things that cannot be standardized.  That is why we had to develop two tracks in the cost study.  Sometimes data analysis will enable us to adjust for differences, such as the two sites that cannot follow the baseline, 4-month, 8-month, 12-month data collection schedule.  Then there are the differences in the type of programs themselves (drop-in, peer support, education).  We cannot change or moderate those differences except by doing cluster studies, which aren’t the same thing as a multi-site study. 

To preserve the possibility that analyses based on a pooled, cross-site design will be meaningful, we have to proceed on the belief that there are important similarities among all the programs because they are all consumer-operated and people were randomly assigned from the general pool of people who are receiving traditional mental health services to the experimental conditions. We should acknowledge, however, that generalizing to the pool of all consumers of traditional services is only one possible goal, and that careful descriptions of the selection procedures and the resulting characteristics of individuals in the studies at each site may make it possible to talk about efficacy for that group, even though uncertainties about the sources of cross-site variation in outcomes may have to be stated in terms of both programmatic and case-mix differences.

Recruitment and Retention Fears

Now we come to the heart of the issue of selection of study participants. There is a growing concern among the sites that they will not be able to recruit and/or retain enough study participants to have a viable study. Factors such as the exclusion criteria, projected withdrawal or low levels of program participation, cross-over or contamination, and a small population from which to sample challenge the capacity of the sites to achieve the numbers of full participants they originally proposed. In other words, too few participants may receive an effective "dose" or experience of what the program has to offer. The resulting effect sizes could be too small for the individual studies to maintain sufficient statistical power, and could perhaps even underpower the cross-site study.
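The link between "dose," effect size, and statistical power can be made concrete with a rough sketch. It rests on assumptions that are not part of the study design: a two-sided comparison of two group means, the standard normal-approximation sample size formula, a hypothetical full-dose effect of 0.5 standard deviations, and a simple dilution model in which people who do not receive an effective dose show no effect at all. The function names are illustrative.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per arm (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

def diluted_effect(full_dose_effect, share_receiving_dose):
    """Assumed model: people without an effective dose contribute no effect."""
    return full_dose_effect * share_receiving_dose

# Hypothetical full-dose effect of 0.5 standard deviations.
for share in (1.0, 0.75, 0.5):
    d = diluted_effect(0.5, share)
    print(f"share with dose={share:.2f}  effect={d:.3f}  n per arm={n_per_group(d)}")
```

Because the required sample size grows with the inverse square of the effect size, halving the share of participants who receive an effective dose roughly quadruples the number of participants needed per arm under these assumptions.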

The Introduction of Bias

Strategies to resolve this potential problem include:

1. Expanding the potential population from which to sample (opening up the exclusion criteria to permit those who have had limited participation in the COSP into the study), and

2. Recruiting people who are more likely to stay in the COSP (prior to randomization, conducting some pre-selection process or offering a detailed or visual program introduction so that people who are not interested in the COSP would deselect).

There are advantages and disadvantages to both of these strategies. 

The former strategy reduces the possibility of recording the "whopper effect" on outcomes: recording those short-term outcomes that may result from people realizing for the first time that they are not alone as a consumer, and that peer support gives them hope for a better life.  This hypothesized effect is lost if  most participants have some prior familiarity with consumer-operated programs in general, and the COSP in particular.  If significant changes due to initial exposure to the COSP are expected, prior exposure will reduce the overall effects as well.  We may have to increase the number of participants to get enough power to be able to report significant differences between COSP and traditional services. 

The latter strategy could create difficulties if the point of the COSP initiative is to have a study that generalizes to all consumers receiving traditional services, and it may prevent us from being able to answer the questions of the cross-site study for the primary study population: people with serious mental illness in traditional mental health services who are referred to the COSP. Selection bias can be introduced when participants are selected from a pool that is composed of people who are familiar with the COSP, such as those who are interested in participating after they have had a COSP orientation or those who meet additional selection criteria that make them a "good" COSP candidate. These study participants would look more like those already in the COSP than like those randomly referred from a traditional service provider. Our study would then become one in which we are looking at people who are predisposed to stay and/or succeed in the COSP. The result could be similar to what happened in studies of Alcoholics Anonymous, which was the source of continuing severe criticism of the model's applicability. The AA literature, until a few recent studies, had primarily looked at efficacy for very select samples, so that it was difficult to say from a scientific standpoint whether AA was appropriate for a general population of persons with alcohol problems. We would like to avoid similar criticisms of the COSP findings. It would be easy for others to dismiss the findings if the samples are too highly pre-selected, especially if we don't have a good description of precisely how those samples were selected. Because consumer-operated programs and their members/recipients vary from site to site, the bias would also vary across sites. It would be more difficult to combine the results from such diverse populations and be able to say anything meaningful. In other words, the policy impact of having a multisite trial would be greatly reduced, with less likelihood of dissemination and adoption of these services by others.

 

Other information could be lost as well. Let's say that when people of color went to the COSP orientation they didn't find much cultural diversity in terms of program emphasis or members who were like them. They may decide not to participate and would, therefore, "deselect" themselves prior to randomization. The same could be true for middle-class people who are introduced to a program that caters mainly to homeless people, or elderly consumers who are introduced to a COSP that is geared to activities for younger consumers. On the other hand, it might make no difference at all. The only way that we can tell what programs work for whom, at what cost, and with what outcomes is to randomly assign study participants to the COSP without pre-selection and then follow the results. It is possible that some programs have focused on a narrow subgroup of consumers, and that COSPs could be shown to be effective for a much broader group than they are currently serving.

Alternative Approaches

If the study sites randomize from the traditional service settings without introducing the sampling bias described above, are there ways to make sure that the effective sample sizes are sufficient to do the research, and also assure that the burden to sites and participants is not extreme?

If the pool of potential participants is large enough, sites can over-sample to replace people who stop attending COSPs after only one session. We can keep everyone in the study, even those who leave. There are some ethical issues in studying people who leave a study: we must get the consumers' permission to be interviewed even though they have declined services. Also, combined data will tend to show less effect because people who quit attending the experimental condition may not have outcomes that show significant change from baseline.
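The over-sampling arithmetic can be sketched as follows. The target of 100 participants and the withdrawal rates are hypothetical planning figures, not projections from any site, and the function name is illustrative. The sketch assumes withdrawal is the only source of loss and that every recruit is equally likely to withdraw.

```python
from math import ceil

def recruits_needed(target_participants, expected_withdrawal_rate):
    """Recruits required so that the expected number who stay meets the target."""
    if not 0 <= expected_withdrawal_rate < 1:
        raise ValueError("withdrawal rate must be at least 0 and less than 1")
    return ceil(target_participants / (1 - expected_withdrawal_rate))

# Hypothetical target of 100 continuing participants per site.
for rate in (0.0, 0.25, 0.5):
    print(f"withdrawal={rate:.2f}  recruits needed={recruits_needed(100, rate)}")
```

The inflation is not linear: a site expecting half of its recruits to stop attending would need to recruit twice its target, which is only feasible where the pool of eligible consumers is large enough.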

We could also increase the number of data waves and thereby maintain statistical power with a reduced sample size. However, there is only so much time for data collection, and increasing the number of data points might mean that people recruited later in the study do not have sufficient time to complete all data collection waves. That could create missing data. Intensification of data collection within a shorter period could also create burden for the respondents. Both of these strategies may have study cost impacts as well.

One thing we must recognize is that the projected rate of withdrawal of participants is hypothetical, not a fact.  There may be ways to reduce withdrawal from the programs that are consistent with the way they currently operate.  Sites were provided with additional resources to increase capacity.  Could some of these resources be used to fund retention tactics (build capacity to keep more people) such as strengthening programmatic and administrative functions so the programs do what they do, only better?  This is what the federal representatives intended the service enhancement dollars to be used for--to build capacity for the purpose of increasing sample sizes in the study.  Better orientation programs, supported transportation, rigorous tracking, mentoring/tutoring and buddy systems implemented where appropriate, and enhanced follow-up are all ways to keep people connected.

Next Steps

We could be at a turning point in the multi-site study. If we break down into clusters rather than pushing for standardization, or if we allow each site to do its own thing, we could end up with eight individual studies but no meaningful opportunity to do cross-site data analyses. The federal representatives would be seriously concerned if we went in this direction. It would violate one of the parameters of the GFA in that there would not be an adequate multisite study. Many important questions would not be answered. There have been multi-site studies where the program types were not ready for a multi-site study because the field was too diverse. Those multi-site studies ended up in a salvage process. If people in the COSP believe that a multi-site study is important to the future of consumer-operated services, and if they believe in the intellectual power of collaboration, we can get beyond this impasse to a place that will accommodate rigorous science while still respecting individual study sites. The key is to take the time to elaborate the issues and tradeoffs carefully, and to work toward a consensus on this critical issue.


Missouri Institute of Mental Health, 5400 Arsenal Street, St. Louis, Missouri 63139
Phone: 314-644-8787, Fax: 314-644-8834