A Model of Cross-Disciplinary Communication for Collaborative Statisticians: Implications for Curriculum Design

The ability to bridge multiple disciplines is critical to the successful practice of collaborative statistics, yet the literature on statistical education devotes relatively little attention to how this skill can be taught. Our goal here is to describe a general conceptual framework within which a curriculum on communication and leadership could ultimately be organized. The primary research question pertains to whether an actionable model of cross-disciplinary communication for collaborative statisticians can be developed, and our task here is to describe such a model and also to illustrate its use. Within this model most communications either share or request information. For example, statisticians might provide information about statistics (e.g., specific statistical approaches, general statistical principles), comment on the clinician’s understanding of statistics, share their understanding of clinical content, and request information (e.g., about clinical content, the design and execution of the study being discussed, etc.). Clinical investigators contribute an analogous set of components. In addition, a critical element to the interaction is the higher-level task of developing a mutually understood agreement about the work to be performed: in essence, proposing and negotiating such an agreement. The model is illustrated using a case study, and general qualitative feedback from investigators who performed the case study was obtained, commenting on both successful and unsuccessful interactions with statisticians. Implications for curriculum development are discussed.


Introduction
In biomedical applications, among many others, team science is becoming increasingly common (Bennett & Gadlin, 2012). Within the context of team science two types of collaborators are particularly valued. The first type of collaborator has deep expertise in a topic area. The second type of collaborator has the ability to bridge multiple disciplines. If these two types of collaborator can be combined within the same individual, then so much the better.
Our ultimate interest is in training collaborative statisticians who will work within biomedical research teams. How to train the first type of collaborator above is a problem which is traditional and straightforward (although not trivial): simply teach the student all you can about statistics and hope for the best. How to train the second type of collaborator is a problem which is equally important, yet has received relatively less attention. Indeed, apart from a course in consulting, the typical degree program in statistics gives relatively little systematic emphasis to, and provides relatively little specific training in, how to communicate and lead. This is ironic, since such skills are acknowledged to be a critical element not only of team science (Bennett & Gadlin, 2012), but also of effective statistical practice (e.g., Baskerville, 1981; Begg & Vaughan, 2011; McCullich et al., 1985; Sharples et al., 2010; Taplin, 2003; Taplin, 2007; VanBelle, 1982; Winkvist, 1990).
Within the applied statistics community there are three general models for training collaborative statisticians. The first model involves teaching both statistics and specific subject-matter expertise in another scientific area. An example of the first model is programs in statistical genomics, and the intention is to provide deep training in both statistics and genomics, and also to provide hands-on opportunities to integrate the two. An advantage is that the student gains deep knowledge of genomics in addition to statistics, and thus is ideally suited to statistical genomics. A disadvantage is that the student runs the risk of over-specialization and will not be able to easily transition to other areas of statistical application.
The second model involves teaching both statistics and general knowledge about biology and medicine. Within our graduate programs, an example of this model is a core course on biology and medicine for statisticians. The organizing principle of this course is evolution, the emphasis is on basic information and terminology, and also practice in techniques for acquiring additional information as needed. An advantage of this model is that even a small bit of basic knowledge can be helpful, especially when that basic knowledge is embedded within a robust conceptual framework, and also that this provides maximum flexibility for students who are uncertain which specific areas of biology and medicine are of most interest to them. A disadvantage is that coverage must, of necessity, be rather selective and shallow.
A third model involves developing skills in cross-disciplinary communication and leadership. It is agnostic about the level of training that the student receives about biomedical content, can be applied in conjunction with the other two models and, indeed, can supplement them and increase their effectiveness. Model three directly focuses on skills which will help students to actively participate and lead collaborative teams, and is the conceptual framework here.
This is a study about patient distress during initial treatment for Hodgkin's Lymphoma. The clinical investigator ("the clinician") has collected information about distress using an instrument called the "distress thermometer" (DT). The DT, which is recorded in integer values, ranges from 0 (no distress) to 10 (maximum possible distress). Associated with the DT is a 39-item problem list, which links DT readings with potential causes of distress. The problem list is not considered here. A similar use of the DT is described elsewhere (Troy et al., 2018). Figure 2 illustrates the data collection schedule for an illustrative patient.
What follows is a somewhat stylized version of an actual conversation between the statistician and the clinician (in the actual application: an experienced investigator attentive to the nuances of cross-disciplinary communication), and is intended to illustrate various elements of cross-disciplinary communication. The content of the conversation is labeled A-J: for now, those labels should be ignored.

Interaction between the Statistician and the Clinician
Element 1: Statistician (C, D, I): "I understand that the overall goal is to describe distress during therapy. I propose that our task is to translate this goal into more specific study questions which are both clinically important and statistically actionable. We'll probably have to simplify the questions a bit, but they should still maintain clinical suitability. But, first, please tell me more about how patients are treated, and what you're thinking as a clinician when you see a patient, and what our data can ultimately do to make those clinical encounters more effective."
Element 2: Clinician (E, J): "That seems like a good overall plan. The diagnosis of lymphoma can be particularly challenging for a patient, since it is so unexpected. One day the patient receives blood work for symptoms that are often rather like anemia, and instead they receive a diagnosis of cancer, and moreover need to quickly start a difficult program of chemotherapy. Fortunately, for most patients the chemotherapy is successful, and they might go into remission for decades.
Even under the best of circumstances the chemotherapy has various side effects - extreme fatigue, for example - but these typically aren't life threatening. But we do have to keep an eye out for dangerous adverse events - including damage to the heart and lungs - and, also, the more chemotherapy we provide the greater the risk of cancer and other unfortunate things happening in the future. Sometimes the therapy becomes so hard on the patient that we have to pause. Indeed, on a chemo day the patient first has some lab work, a DT is requested, and we wait for the lab results (and also the patient's report of how they're feeling) to make the final decision about whether to proceed with chemo on that particular day.
Over time we've been providing less and less intensive chemotherapy to patients, and the goal is to get by with as little as possible, consistent with achieving remission. We typically prescribe 6 cycles of therapy, although it can be as many as 8 or more in some cases, and as few as 4 or even 2 if imaging shows that the patient's cancer burden has decreased sufficiently. Also, therapy can be delayed because of holidays or other circumstances that aren't directly related to the patient's clinical condition."
Element 3: Statistician (C): "I understand that, even though the schedule for therapy is tentatively set beforehand, in reality it's flexible. A major component of that flexibility is that if the patient is suffering from serious side effects you'll delay it. Also, you're trying to give as little chemotherapy as possible so long as it's doing its job, and so if things are going well you'll end chemo as early as possible. Is that roughly correct?"
Element 4: Clinician (F): "Yes."
Element 5: Statistician (D): "Is it safe to assume that, even though patients receive different numbers of cycles of treatment, and even though those cycles might be differently spaced, everyone eventually receives all the chemo you think they need?"
Element 6: Clinician (E): "Yes. Occasionally there are disasters where the patient doesn't respond to chemotherapy, or there are such severe adverse events that we have to stop, but none of that happened in this particular study."
[…] schedule for one of the patients. Could you please walk me through it?"
Element 10: Clinician (E): "Certainly. There are 12 chemotherapy visits, divided among 6 cycles of 2 visits each. For example, the two visits within cycle 1 are labelled 1A and 1B. This particular patient received 6 cycles of chemotherapy, followed by radiation. There is also a standard pre-chemo visit, which is the natural point to provide information about what the patient is likely to experience.
The standard time between chemo visits is 2 weeks (14 days), although it isn't critical to maintain exactly equal intervals between doses if circumstances require slight modification of the schedule. For example, we might reschedule visits because of an upcoming holiday. If the patient is doing poorly we might also delay chemo. We try to record a DT at every clinic visit, as for this patient, although in reality we don't always succeed in doing so. For example, the second DT in cycle 5 is missing."
Element 11: Statistician (A, D): "Do you conceptualize therapy in terms of cycles of therapy or in terms of visits? The reason I ask is that we have the option of conceptualizing the statistical analysis in either fashion."
Element 12: Clinician (E): "We usually conceptualize therapy in terms of cycles, but the therapy is delivered on a visit by visit basis. For example, if the patient is doing poorly at a particular visit we'll delay chemo, even if it's in the middle of a cycle."
Element 13: Statistician (A, D): "Missing data can be a major issue with longitudinal datasets such as this. It's a particular problem when it's systematically missing - for example, if the DT is more commonly missing when the patient has a high degree of distress. How much of a problem is that likely to be here?"
Element 14: Clinician (E): "It probably isn't a big problem here. On the days when the patient is feeling so poorly as to not receive chemo we'll still usually have a DT recorded. Even though it's only an approximation, I think you could safely assume that there's no particular pattern to the missing DT data."
Element 15: Statistician (D): "Returning to the idea of what you could tell your patients before beginning treatment, would it be helpful to tell your patients about the maximum level of distress that patients typically experience?"
Element 16: Clinician (E, G, H): "Certainly. And I see that what you've done is to boil all the data from each patient into a single number.
Is that what we'll be doing in general?"
Element 17: Statistician (A, B, D): "Sometimes our summaries might have to be more complicated than a single number, but you're quite right that what we're trying to do is to select summary measures which are both clinically relevant and as simple as possible, and of course the simplest possible summary measure is a single number. What else would you like to be able to tell your patients? You could translate things into a single number if you like, or you could just talk me through the issue and we could derive a summary measure together."
[…] precise. Why not try 10 charts as a pilot and then let's look at the results?"
Element 28: Clinician (E): "I anticipate that we'll find that 9 or more out of these cancellations will be for clinical reasons, and that would probably be all our readers need to know to interpret our results."
Element 29: Statistician (D): "Anything else you'd like to be able to tell your patients?"
Element 30: Clinician (E, H): "Is there enough of a pattern in the DT values over time for me to describe a likely trajectory or trajectories?"
Element 31: Statistician (A): "That's potentially complicated - let me sleep on that one."
Element 32: Statistician (A, I): "From an operational perspective, what I'll be doing is to create a data array which will first translate the information in the figure we just reviewed into a flat file with one record per patient and whose columns will be the various DT values taken over time. Then I'll add some new columns to the data array which are derived from the existing columns - for example, one such derived column would be the maximum DT value for that patient. Finally, I'll use that new data array as the input file to create the tables and figures which we'll include in our manuscript - for example, I can create a frequency table of the maximum DT values for each patient, and then translate that table into a histogram.
To illustrate, the first row in Table 1 is a simplified version of the data array for our example patient. I've labeled the observations by cycle - for example, 1A and 1B are the first and second DT values within cycle 1. There is space for up to 8 cycles. As derived variables, I've counted the number of cycles, calculated the maximum DT, and also the cycle in which that maximum value was achieved. I haven't used the pre-chemo DT within the calculations, but have included it in the data array to remind us that we need to eventually figure out how to include it in the analysis plan. The data array would actually have multiple rows - one row per patient - and I've added data values for 3 additional patients.
One way we'd use the data array is to produce analysis tables. For example, I've illustrated how we'd use the 4 illustrative patients to start to populate some frequency tables (or histograms if you prefer).
Is this description of my task consistent with your understanding of what we're trying to achieve?"
Element 33: Clinician (J): "Yes. Did I happen to mention to you the deadline date for the abstract I'd like to submit on this? It's coming up soon."
And the conversation proceeds…

Patient  Pre  1A  1B  2A  2B  3A  3B  4A  4B  5A  5B  6A  6B  7A  7B  8A  8B  Cycles  Max DT  Max cycle
129      8    7   7   3   2   2   1   0   0   9   .   3   1                   6       9       5
140      9    3   3   2   2   1   0   0   0   0   0   0   0   0   0   0   0   8       3       1
142      2    3   2   2   2   1   1   1   0                                   4       3       1
144      8    3   1   1   1   1   1   1   0                                   4       5       1

Table 1 is the data array for an illustrative patient. The last 3 columns represent derived values - that is, the number of cycles, the maximum DT value and the cycle during which that maximum value is achieved. A "." marks a missing DT, and blank cells correspond to cycles which the patient did not receive.

In addition, a critical element to the interaction is the development of a mutually understood agreement about the task at hand:
• I: Proposing a description of the task at hand (this could be a counter-proposal, if the previous proposal was either insufficiently understood or unacceptable)
• J: Agreeing to the description of the task at hand
For example, in element 3 the statistician shares their understanding of the clinical content being studied, whereas in element 5 the statistician prompts the clinician for information about clinical content. In element 9 the statistician describes a general statistical principle (i.e., thinking in terms of data structures), whereas in element 19 the statistician proposes specific analytical approaches.
The most common actions by the statistician are, as might be expected, A and D: that is, providing information about statistics and prompting for information. (Similarly, the most common actions by the clinicians are E and H.) Of interest, under A the statistician doesn't merely recommend particular statistical procedures, but also shares general principles of statistics. Doing so helps the clinician place more specific recommendations into better context, and also educates the clinician about those principles, which can help the clinician to communicate more effectively with statisticians in the future. (In fact, this particular clinician is quite experienced with working with statisticians, as illustrated, for example, by describing clinical issues in lay terms.) Although less common, the statistician describes their understanding of the general clinical content, which the clinician subsequently validates - this helps reassure the clinician that the basic elements of biomedical science are understood and the statistician's recommendations are likely to be on point. The clinician acts similarly. It is the development of this mutual understanding that is a key element of successful communication across disciplines.
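As a minimal sketch of how such a tally could be produced, the labels attached to the conversation elements can be counted by speaker. The list below includes only those elements whose labels are printed in the conversation above, so the counts are illustrative rather than a complete tally; the names ELEMENTS and tally are hypothetical and introduced only for this example.

```python
from collections import Counter

# (element number, speaker, labels) for the elements printed above.
# Elements whose labels do not appear in the text are deliberately omitted.
ELEMENTS = [
    (1, "Statistician", "CDI"),
    (2, "Clinician", "EJ"),
    (3, "Statistician", "C"),
    (4, "Clinician", "F"),
    (5, "Statistician", "D"),
    (6, "Clinician", "E"),
    (10, "Clinician", "E"),
    (11, "Statistician", "AD"),
    (12, "Clinician", "E"),
    (13, "Statistician", "AD"),
    (14, "Clinician", "E"),
    (15, "Statistician", "D"),
    (16, "Clinician", "EGH"),
    (17, "Statistician", "ABD"),
    (28, "Clinician", "E"),
    (29, "Statistician", "D"),
    (30, "Clinician", "EH"),
    (31, "Statistician", "A"),
    (32, "Statistician", "AI"),
    (33, "Clinician", "J"),
]


def tally(elements, speaker):
    """Count how often each label is attached to one speaker's elements."""
    counts = Counter()
    for _, who, labels in elements:
        if who == speaker:
            counts.update(labels)  # each character is one label
    return counts


stat_counts = tally(ELEMENTS, "Statistician")
clin_counts = tally(ELEMENTS, "Clinician")

print(stat_counts.most_common(2))  # the two most frequent statistician labels
print(clin_counts.most_common(1))  # the most frequent clinician label
```

Among the listed elements, D and A are the statistician's most frequent labels and E is the clinician's, which is consistent with the description above. The same tally could serve as a simple classroom exercise in which trainees label a transcript and compare their counts.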
Finally, we highlight items I and J, which describe the development of mutual understanding about the task at hand on a higher level of abstraction. That is, above and beyond achieving consensus about particular analyses, the overall task of writing a scientific paper using this longitudinal dataset can be fruitfully conceptualized in terms of a conversation between a clinician and a patient about the magnitude of distress they are likely to experience during therapy, the inputs to which would be single measures (i.e., single numbers, such as the maximum value of the DT) derived from analysis of this more complicated longitudinal array of data.
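The derivation of single-number summaries from the longitudinal data array can be sketched as follows, using the DT values of three of the patients in Table 1 (129, 140 and 142). This is an illustration under stated assumptions rather than the analysis actually performed: the number of cycles is taken to be the number of scheduled visits divided by two, ties in the maximum are resolved in favor of the earliest cycle, the pre-chemo DT is excluded (as in the conversation), and the names VISITS, RAW and derive are hypothetical.

```python
from collections import Counter

# Visit labels in schedule order: 1A, 1B, 2A, ..., 8B (space for up to 8 cycles).
VISITS = [f"{cycle}{visit}" for cycle in range(1, 9) for visit in "AB"]

# DT values in visit order for three patients from Table 1.
# None marks a missing DT; the pre-chemo DT is not included.
RAW = {
    129: [7, 7, 3, 2, 2, 1, 0, 0, 9, None, 3, 1],
    140: [3, 3, 2, 2, 1] + [0] * 11,
    142: [3, 2, 2, 2, 1, 1, 1, 0],
}


def derive(dt_values):
    """Return the derived columns for one patient's row of the data array."""
    observed = [(label, dt) for label, dt in zip(VISITS, dt_values)
                if dt is not None]
    # Assumption: two scheduled visits per cycle, whether or not a DT was recorded.
    n_cycles = (len(dt_values) + 1) // 2
    # Assumes at least one DT was recorded; max() fails on an empty list.
    max_dt = max(dt for _, dt in observed)
    # Assumption: with ties, report the first cycle in which the maximum occurs.
    max_label = next(label for label, dt in observed if dt == max_dt)
    return {"n_cycles": n_cycles, "max_dt": max_dt,
            "max_cycle": int(max_label[:-1])}


derived = {patient: derive(values) for patient, values in RAW.items()}

# Frequency table of the maximum DT across patients (input to a histogram).
max_dt_frequency = Counter(row["max_dt"] for row in derived.values())

for patient, row in derived.items():
    print(patient, row)
print(dict(max_dt_frequency))
```

For these three patients the derived values agree with the last three columns of Table 1. Keeping the derivation in one small function makes each choice (how ties and missing values are handled) explicit, which is itself a useful talking point with the clinician.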

Discussion
Our primary question was whether an actionable model of cross-disciplinary communication for collaborative statisticians can be developed. We have described such a model, and also illustrated its use.
For this model to be "actionable", it should, among other things, be useful in curriculum development. We argue that this model can assist in the conceptualization of a curriculum, by replacing the general and nebulous construct of "communication skills" with more specific sub-constructs such as "providing information about statistics", "commenting on the clinician's understanding about statistics", "sharing their understanding of clinical content", "prompting for information" (e.g., about clinical content, study design, etc.), and "clearly outlining expectations".
We further argue that these sub-constructs can be useful in developing classroom exercises. Example exercises might be organized around the following:
• Practice summarizing information about scientific and clinical content. A natural progression would be to begin with written content (for example, creating short summaries of review articles or of the background sections of grants), then move to summarizing the content of videotaped interactions between statisticians and clinicians, and finally to providing such summaries within the structure of real-time conversations.
• Practice summarizing information about statistical content to other statisticians. Rubrics can be particularly helpful - for example, some of the typical elements to be communicated include the study question, the study design, the type of variables (e.g., predictor, outcome, covariate), the scale of measurement of the variables, the structure of the data array, the statistical methods, etc. Indeed, much of the communication from statistician to statistician can occur even in the absence of clinical content. For example: "this is a longitudinal study design, with an irregular number of observations per patient, usually but not always collected at regular intervals, with an outcome variable (on a 0-10 integer scale) which can be treated as continuous, and for which the primary task is to derive (ideally, scalar) summary measures at the level of the patient."
• Practice providing statistical information and interpretations to clinicians. Many elements of the rubric for communicating with statisticians can be helpful - for example, restating the study question, describing the data array, etc. What differs is the de-emphasis on statistical jargon and an increased emphasis on "show and tell": for example, an example of the data array (as illustrated by Figures 2 and 3), simplified numerical examples, data tables (as illustrated by Table 2), etc.
• Practice prompting for information. Often, an effective way to do so is to frame such prompts in terms of "this is what I think I know and this is what else I need to know".
• Integrating all of the above - for example, in simulated interactions with clinical investigators. These interactions can be videotaped, and the exercise might begin by having trainees grade various elements of videotaped interactions selected from a library.
We argue that one advantage of the proposed model is that its steps are explicit. In particular, it recognizes that every component of a typical interaction between a clinician and a statistician, in effect, either provides information or prompts for information. It also clarifies that the information in question either pertains to statistical content, clinical content, or an integration of the two. Finally, the model clarifies that the ultimate goal is a shared understanding of the task at hand. While a classic communication or consulting course could help provide the skills to gather information in a meeting and develop an analysis plan from it, different and more extensive exercises are likely needed to help teach students to accomplish the higher-level tasks of shaping project expectations and building a plan in concert with other collaborators.
Variation is a fundamental element of biology, and it is true that, due to variation in personality and life experiences, some students of statistics will enter their training with better overall "communication skills" than others. And, it must also be acknowledged that the stereotype of statisticians (i.e., math nerds whose first language might not be English) as poor communicators has some basis in fact. Nor are statisticians the first choice for stand-up comedy: In the same spirit of separating art from science, we also believe it is possible to assess communication more systematically than "I know a good communicator when I see one". For example, a rubric for "prompts for information regarding study design" might be:
• Excellent: Audience has a clear and specific understanding of what is requested. This might include placing the request within the clinical and scientific context of the study, and using figures and tables for clarification. Example: "I want to better understand the process by which DT measurements are either present or absent, since if DT measurements are systematically missing this could induce a bias (context) in the analysis. I see that in Figure 2 most but not all visits have a DT (use of figures and tables). Why is the second DT in cycle 5 missing (specific request)?"
• Good: Audience has a general understanding of what is requested, but might have some residual confusion about details or context. Example: "I see that one of the DT measurements is missing. Why is that?" (A generally clear request but no explanation of why the information is needed. Pointing to the second DT in cycle 5 would have been more specific.)
• Fair: Audience understands some but not all of what is requested. Example: "The statistical techniques I'm using assume that there isn't any pattern to the missing data. Is that OK here?" (Some context and a request, but "pattern to the missing data" doesn't capture the entire construct and, although the audience will understand that the request is about the pattern of missing data, precisely what is desired isn't necessarily clear.)
• Poor: Audience has little to no idea what is requested. Example: "I'm going to treat missing data as completely at random." ("Missing completely at random" is jargon which is likely unfamiliar to the audience. There is no actionable request.)
So long as these ratings have an adequate degree of reliability (e.g., within and across raters), such qualitative components need not induce unease, especially if they are accompanied by illustrations of the type of work products which are placed into the various categories.
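One way the across-rater reliability of such rubric ratings might be checked is with Cohen's kappa, which compares the observed agreement between two raters with the agreement expected by chance. The sketch below is an illustration only: the choice of kappa is our assumption rather than a method prescribed here, the ratings are made-up values for eight hypothetical prompts, and the four categories are treated as unordered (a weighted kappa would be a natural alternative for an ordered scale).

```python
from collections import Counter

CATEGORIES = ["Poor", "Fair", "Good", "Excellent"]


def cohen_kappa(ratings_1, ratings_2):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(ratings_1) != len(ratings_2) or not ratings_1:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_1)
    # Observed proportion of items on which the two raters agree.
    p_observed = sum(a == b for a, b in zip(ratings_1, ratings_2)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_1, counts_2 = Counter(ratings_1), Counter(ratings_2)
    p_expected = sum(counts_1[c] * counts_2[c] for c in CATEGORIES) / n ** 2
    if p_expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical ratings of eight prompts by two raters (not data from this study).
rater_a = ["Excellent", "Good", "Good", "Fair", "Poor", "Good", "Fair", "Excellent"]
rater_b = ["Excellent", "Good", "Fair", "Fair", "Poor", "Good", "Good", "Excellent"]

kappa = cohen_kappa(rater_a, rater_b)
print(round(kappa, 3))
```

With these made-up ratings the raters agree on 6 of 8 prompts, and kappa is about 0.65 once chance agreement is removed. The same function applied to one rater's scores on two occasions would give a rough within-rater check.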
Our goal here is not to provide a detailed description of a curriculum which is based upon the conceptual model discussed here, but instead to describe a general conceptual framework within which such a curriculum could be developed. Especially helpful in curriculum development will be some practice in operationalizing its elements. Indeed, a particularly promising target audience for preliminary testing is the group of masters-trained statisticians within our in-house consulting group. In contradistinction to graduate students, practicing statisticians will be regularly encountering the issues in question and, for example, case studies can be based upon students' actual experiences.
Also helpful at an early stage of curriculum design is general qualitative feedback about the content area in question, whether provided in writing, via individual interviews, via focus groups, etc. In that spirit, we asked the collaborative team to provide comment on their interaction with the statisticians during the project used as our illustrative case study. The questions were posed broadly: "What went particularly well (or poorly) for this project?" and, more generally, "What has gone well (or poorly) with your interactions with (or as) statisticians in collaborative projects?" Some of the insights from this feedback include:
• Figures, tables and similar illustrations are an effective aid to communication. Indeed, the basic training of statisticians might include standard templates for describing studies - for example: the research question, study design (e.g., Figure 2), data array (e.g., Table 1), analysis plan as illustrated through analysis tables (e.g., Table 2), etc.
• An important element of cross-disciplinary communication, and one which is sometimes overlooked, links the data collectors to the other investigators. Among other things, the data collectors understand crucial nuances about data quality - for example, when and why data values might be missing, when and why recorded data values might be suspect, etc. Indeed, in our case study the graphical description of the data collection schedule (i.e., Figure 2) - a figure which was extraordinarily helpful to all the investigators including the statistician - was produced by one of the individuals who collected the data.
• In a cross-disciplinary collaboration, including "why" when asking for more information is helpful for all parties. For example, this helps the clinician assess the statistician's understanding of the clinical issues, helps the statistician assess the clinician's understanding of the statistical issues, etc.
• Not every question requires an immediate answer. For example, in the case study element 31 illustrates tabling a question for later consideration, allowing the current focus to be maintained. Indeed, as the writing of a collaborative manuscript proceeds things should gradually come into focus, so there is something to be said for performing a few analyses, interpreting the results, and then using this as information to help inform subsequent efforts.
• Explicit discussion of the process of collaboration is helpful. In particular, it is often helpful to agree ahead of time about the likely number and content of meetings. For example, the first meeting might discuss the study questions, the study design, and a rough outline of the analysis plan. The second meeting might discuss preliminary results, their interpretation, and revisions to the analysis plan (and perhaps revisions to the research questions as well). The third meeting might flesh out the story, and also discuss which additional analyses can add color to that story. The final meeting might be a final check on the study results before submission to a journal (reproducibility being critical), as well as thoughts about further publications and studies, a discussion of lessons learned and what can be improved in the future, etc. Of course, not every collaboration will follow this particular schedule - instead, what is important is that the plan for collaboration be explicit, thus helping everyone to understand where things currently stand and what is being asked of them.
• Our model of the components of interactions within collaborative statistical practice (i.e., elements A-J) is, in turn, embedded within a broader model of how this practice is organized (i.e., Figure 1). The success of this latter model is predicated on an a priori peer relationship between statistical and clinical collaborators, and the early integration of statistical support into the research project. Of course, not every project actually operates that way, and in our experience, when things go wrong the root cause is usually a combination of lack of knowledge, miscommunication and unrealistic expectations. However, we would argue that the basic elements of our model - for example, focusing on making discipline-specific knowledge explicit and sharing it with others - can also be effective when trying to recover a project that's taken a turn for the worse. Indeed, there's much to be said for including dispute resolution within curricula for those who plan to pursue cross-disciplinary collaboration.
• In a cross-disciplinary collaboration everyone is responsible for its success. Our model focuses on the role of the statistician, and this would be particularly so within educational programs which train statisticians, but in actual practice the "process management" described here could potentially be led by others such as the more experienced of the investigators, the project manager, etc. Indeed, tracking the quality of the collaboration and intervening if necessary is one of the characteristics of excellent project managers - for example, an experienced project manager can help bridge gaps in understanding among the parties and keep the work on track.
• The previous comment points to the role of leadership in a collaborative research project - it shouldn't be limited to that of the statistician, but statisticians should be trained to provide their share, and also in what to do if things go wrong. Conceptualizing a curriculum for collaborative statisticians in terms of communication and leadership rather than just communication has a lot to recommend it.
Our model for cross-disciplinary communication is fundamentally constructivist in nature (Fosnot, 1995; Phillips, 1996). Indeed, by explicitly classifying the elements of a successful conversation between a statistician and an investigator, the intention is to assist the student to build the scaffolding needed to create a sound mental map of this element of their discipline. Once such scaffolding is created, the student can tailor the details of that mental map to their own circumstances and strengths: for example, although every statistician must explain statistical concepts to others, precisely how they do so is a highly individualized skill, developed over time.
In closing, experience suggests that the "silo" version of biomedical science - that is, where the statistician is only responsible for analyzing the data as data but without context, where the clinician is unaware of how the analyses are performed and entirely delegates this task, etc. - can easily lead to "the right answer to the wrong question". Indeed, it can be rather like the children's game of telephone where a message is whispered from one person to another and is usually quite humorously mangled in the process. We have argued that cross-disciplinary science involves a number of exchanges of information - for example, between the clinician and the statistician - and that effectively managing these exchanges is key to developing effective bridges between disciplines. We have additionally argued that a simple model of how cross-disciplinary communication should occur could potentially assist in the development of curricula in communication and leadership, and we look forward to developing additional experience with this model.

Conclusions
Our answers to the questions of "can a model of cross-disciplinary communication for statisticians be developed?" and "is such a model actionable?" are "definitely yes" and "tentatively yes", respectively. These are intended to be initial and formative efforts, albeit informed by years of experience in collaborative research, and the next steps to turn the answer to the second question to "definitely yes" are to develop educational materials based upon this model, and to submit those materials to formal evaluation.