A Course in Biology and Communication Skills for Master of Biostatistics Students

We describe an innovative, semester-long course in biology and communication skills for master’s degree students in biostatistics. The primary goal of the course is to make the connection between biological science and statistics more explicit. The secondary goals are to teach oral and written communication skills in an appropriate context for applied biostatisticians, and to teach a structured approach to thinking that enables students to become lifelong learners in biology, study design, and the application of statistics to biomedical research. Critical evaluation of medical literature is the method used to teach biology and communication. Exercises are constructivist in nature, designed to be hands-on and encourage reflection through writing and oral communication. A single disease area (cancer) provides a motivating example to: 1) introduce students to the most commonly used study designs in medical and public health research, 2) illustrate how study design is used to address questions about human biology and disease, 3) teach basic biological concepts necessary for a successful career in biostatistics, and 4) train students to read and critically evaluate publications in peer-reviewed journals. We describe the design and features of the course, the intended audience, and provide detailed examples for instructors interested in designing similar courses.


Introduction
The successful practice of biostatistics requires a unique intersection of skills, including analytical skills, biological knowledge, and effective oral and written communication skills. The primary focus of most graduate-level training programs in biostatistics is on the analytical content, and with good reason: biostatisticians cannot be effective without a well-developed analytical skillset. On the other hand, biostatisticians who have successfully integrated with multidisciplinary scientific teams realize that having biological knowledge and being able to effectively communicate or translate between statisticians and non-statistician scientists are also prerequisites for effective collaboration (Zapf et al., 2019). In fact, one of the primary competencies required for collaborative biostatisticians is defined to be "Clinical and Domain Knowledge", a broad understanding of specific and relevant biomedical areas (Pomann et al., 2020). Given the importance of these skills to the success of future collaborative biostatisticians, university faculty who design biostatistics curricula often debate two key questions: 1) how much biology should we teach our students, and 2) how should we help our students communicate better?
Through our experience teaching biostatistics to master's degree students, the majority of whom join the workforce after graduation, we have come to a different point of view regarding these prototypical questions. Recently, we posited a model of biostatistical practice that conceptualizes the biostatistician as a life-long learner who is equipped not with some critical mass of encyclopedic knowledge, but who is an expert at connecting their domain-specific knowledge to new problems in the biological sciences through the establishment of a mental map; i.e., an approach to thinking (Troy et al., 2021). Our philosophy, therefore, is not to teach students "all they need to know" but rather to teach them how to gain knowledge efficiently and effectively in the future. The course described here teaches how biostatisticians approach problems and navigate the collaborative process (Samsa, 2018). The course is designed to use cancer as the primary application but employs methods that teach skills so they can be applied to any problem in the biomedical sciences. Importantly, the course uses critical evaluation of medical literature as a method for teaching biology and communication. This method is consistent with constructivist principles in that it follows a "see-one and do-one" approach and encourages reflection. The reflection is purposefully designed to be both written and oral to provide practice at communication, contextualized for the biomedical sciences.
This article describes the goals of the course and the student audience, gives an overview of the course design, and reviews the prominent features of the course. Specific examples are given in the appendix for instructors who are interested in developing similar courses. Finally, we share student feedback from end-of-semester surveys and discuss opportunities for expanding the course in the future.

Course Goals
Our primary goal for this course was to make the connection between biological science and statistics more explicit. Our secondary goals were to teach oral and written communication skills in an appropriate context for students who would be future applied biostatisticians, and to teach students a structured approach to thinking that enables them to become self-learners in biology, research study design, and the application of statistics to answer research questions in the biomedical sciences. The course is the first in a two-part, required sequence in the practice of biostatistics that students complete in their first year. The second course in the sequence covers the practical aspects of advising investigators on study design, writing analysis plans, conducting analyses, and reporting results.

Student Population
The course was designed for our two-year Master of Biostatistics program, which is housed within the School of Medicine and has the primary objective of training students to enter the workforce as highly effective collaborative biostatisticians. A secondary objective of the program is to prepare students who are interested in pursuing a PhD in Biostatistics or another related quantitative science. Historically, about 25% of students who complete our master's program will enter PhD programs after graduation. Admission to our master's program requires, among other things, a minimum of two semesters of calculus and one semester of linear algebra. Most of the students in this program are moving directly from undergraduate to graduate training, although some enter with a few years of work experience. We typically admit students who have undergraduate degrees in subjects like mathematics, biology, and public health. Students have the option of electing one of three specialty tracks in their second year: biomedical data science, mathematical statistics, or clinical and translational research. Our intent was to create a course that would be informative for students interested in any of these areas, and that would be positioned in the curriculum in the first year before students choose a specialty track.

Course Design
The course design process started in November 2017 and extended through July 2018, with the first iteration of the course offered in the Fall of 2018. The course design was accomplished by committee, led by the Director of Graduate Studies (DGS). Two active, collaborative researchers were selected as co-instructors for the course, one with expertise in biology and the other with expertise in biostatistics and research study design. The co-instructors were responsible for drafting a curriculum with input from the DGS and other advisory committee members who represented expertise in specific areas of biostatistics that the curriculum was likely to cover, e.g., clinical trials. Committee meetings were held one to two times per month. Agreement on the course curriculum and instructional approach was reached through an informal, iterative consensus process.
The course consists of two core components: a 3-credit classroom experience and a required discussion section (see Appendix A for the course syllabus, and Appendix B for an example discussion section). The over-arching theme of the course is to explicitly link statistical methods to the biological basis for research. The approach is to introduce students to important concepts in research study design and biology using concrete examples from peer-reviewed literature and hands-on exercises that encourage critical thought, discussion, and application. The course design committee made a deliberate decision to avoid creating a course that provides comprehensive education in biology per se, and instead to create one that teaches students how to learn about the connection between biological sciences and statistics. Toward that end, the course is oriented around a single disease area, head and neck cancer, which serves as a motivating example to: 1) introduce students to the most commonly used research study designs in medical and public health research, 2) illustrate how study design is used to address questions about human biology and disease, 3) teach basic biological concepts necessary for a successful career in biostatistics, and 4) train students to read and critically evaluate scientific research publications from peer-reviewed journals.

Why Cancer?
Cancer was selected as a motivating example because of its public health importance: it is the second leading cause of death in the United States (US) behind heart disease, with ~600,000 deaths from cancer expected in the US during 2020 (Siegel et al., 2020). Cancer also represents a numerical enigma. For example, less than 1% of the US population is diagnosed with cancer each year, qualifying it as a "rare" disease. Yet, men and women of all ages in the US are faced with a 1 in 2 to 1 in 3 chance of being diagnosed with cancer during their lifetime, and there are currently over 15 million cancer survivors in the US (The National Cancer Institute, 2020). Cancer is also biologically puzzling. Cancer is fundamentally a disease of dysregulated cell growth, but its occurrence in different tissues yields vastly different phenotypes and prognoses for patients, and the molecular mechanisms of cancer are even more vast and complex than its varied presentation (Allison & Sledge, 2014). There are environmental causes of cancer as well as endogenous causes (Ames et al., 1995). Cancer is typically a disease of old age, yet children are sometimes diagnosed (Bhakta et al., 2019). Some cancers are easily treated while others continue to present therapeutic challenges (Siegel et al., 2020). For all these reasons, cancer is also an organizing principle for understanding many central aspects of biomedicine, including molecular biology, genetics, cell biology, and evolution.
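The apparent tension between a low annual incidence and a high lifetime risk can be illustrated with a short calculation. The Python sketch below is not part of the course materials. It assumes, purely for illustration, a constant annual risk and no competing causes of death, which is a strong simplification because cancer incidence rises steeply with age. The function name and the example risk values are hypothetical.

```python
# Illustrative sketch only: assumes a constant annual risk of diagnosis and
# ignores competing causes of death. Real incidence is strongly age-dependent.

def cumulative_risk(annual_risk: float, years: int) -> float:
    """Probability of at least one diagnosis over `years` at a constant annual risk."""
    # The chance of remaining undiagnosed every year is (1 - annual_risk) ** years.
    return 1.0 - (1.0 - annual_risk) ** years


if __name__ == "__main__":
    # Hypothetical annual risks, all below the 1% mentioned in the text.
    for annual_risk in (0.003, 0.005, 0.008):
        lifetime = cumulative_risk(annual_risk, 80)
        print(f"annual risk {annual_risk:.1%} -> 80-year risk {lifetime:.0%}")
```

Under these assumptions, an annual risk of 0.5% sustained over 80 years accumulates to roughly a one-in-three chance of at least one diagnosis, which shows how a disease that is rare in any single year can still be common over a lifetime.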
Head and neck cancers were selected because of the breadth of existing knowledge. Specifically, the disease has a well-known etiology (Hashibe et al., 2009; Herrero et al., 2003) and a well-researched molecular biology (Leemans et al., 2018), effective treatments are available (Lee et al., 2018), and excellent examples of all the major study designs exist in the literature. Because many students who take the course are unfamiliar with cancer, we ask students to read The Emperor of All Maladies (Mukherjee, 2010) as a gentle introduction to cancer from a layperson's perspective as part of their summer pre-orientation course, which they complete before taking this course (Neely et al., 2022). This book provides an excellent motivation for understanding the biological basis for research as it discusses many ground-breaking discoveries, as well as stunning failures, in the long history of cancer treatment. The first homework assignment in this course consists of open-ended questions about topics covered in the book that are intended to stimulate thinking about the link between biological sciences and statistical methods.

Teaching Students How to Think
The course materials are organized around a paradigm for evaluation of causal associations in medical research studies (Elwood, 2017). For example, this paradigm teaches students to evaluate medical evidence by first describing what they see (e.g., patient population, study design, stated hypothesis vs. tested hypothesis, and the main result) and then considering alternate explanations for the results (bias, confounding, etc.). In this way, students learn how to think deeply about medical evidence and avoid dogmatic approaches such as stereotyping clinical trial evidence as "good" and evidence from observational studies as "bad". Under this paradigm of critical review, useful evidence can be generated by any study design and, likewise, any study design can be executed poorly.
The course provides students the opportunity to apply this "mental map" for evaluation of medical literature in various contexts, focusing on statistical thinking without doing much hands-on statistics (applied data analysis is covered in a second course the students are concurrently enrolled in). For example, concepts surrounding systematic error receive greater emphasis than random error, although the course includes a gentle conceptual introduction to the intersection of these two types of error in research studies. Finally, the approach is rooted in the use of published literature as examples. Importantly, both good and bad practices are highlighted in the examples.

Course Structure
The course is offered in the first semester of the two-year master's degree program. The classroom component requires active participation twice per week for 1.25 hours. Students are separated into small groups for the discussion sections, with each small group meeting every other week. The course follows the standard semester, which includes 13 weeks of instruction followed by a reading period and final exam. As shown in Figure 1, the course is composed of three integrated components: 1) research study designs commonly used in biomedical research, 2) biological concepts (essential biology, cancer biology, and specifics of head and neck cancers), and 3) critical review of medical literature. The content in each of these components is delivered in blocks that are integrated such that biological concepts are taught in the context of study designs appropriate for addressing questions related to the biology.

Figure 1. Design of a Course to Teach Biology and Communication to Graduate Students in Biostatistics Using Head and Neck Squamous Cell Carcinoma as an Example

For example, as shown in Figure 2 and described in detail in Appendix C, the fundamentals of clinical trial design are taught simultaneously with concepts around proteins and cell signaling pathways as drug targets. The drug cetuximab, which targets the epidermal growth factor receptor (EGFR) pathway in head and neck cancers, is used as an example. After covering clinical trial design concepts and the scientific basis for EGFR as a drug target, students read and critically review the primary publication from the Phase III trial leading to approval of cetuximab as a treatment for head and neck cancer. Using this approach, we can explicitly teach students how and why an in-depth understanding of the biological mechanism impacts study design and application of statistical methods.
A mixture of learning methods is used, including traditional didactic instruction, flipped classroom, and discussion sections. The mixture of learning methods serves to address some of the primary goals of the course surrounding development of communication skills. For example, the flipped classroom activity for the clinical trials module requires students to discuss a problem in small groups, write a solution to the problem, and present that solution to the class. This exercise emphasizes the importance of oral communication both within the students' peer group (i.e., other statisticians) and to an external audience, and it provides an opportunity to practice concise scientific writing.

Evaluation of Student Performance
Student evaluation is based on homework assignments, participation in the discussion section, and written exams. Homework assignments and exam questions are open-ended to encourage critical thinking and to allow students the opportunity to practice scientific writing skills (see Appendix D for an example). Students can work together but are asked to write answers in their own words. The penultimate class activity is a group project that requires students to apply what they've learned to critically review research related to another disease and biological mechanism that was not covered during the class. This assignment allows students to apply what they have learned about study design and biology to a new scientific area, but also gives them the opportunity for self-discovery as they learn about a new topic on their own.

Student Feedback
Table 1 notes: Numbers in the table are the number reporting being satisfied or very satisfied (numerator) over the number responding to the question (denominator), followed by the proportion in parentheses.
a. The class size was 28, 26, and 47 students in 2018, 2019, and 2020, respectively. The class size was expanded in 2020 in accordance with the program's growth objectives. The 2020 class participated entirely online due to COVID-19.
In accordance with standard departmental practice, students are surveyed at the end of the semester prior to taking the final exam. Each year we made incremental improvements in the course in response to student feedback, e.g., by adding more homework assignments, incorporating a mid-term exam in addition to a final exam, and re-organizing the course website. Table 1 shows responses to the core survey questions from the first three offerings of the course. Students are asked to respond to a series of questions on a 5-point Likert scale with the top two rankings being Satisfied or Very Satisfied. In general, student satisfaction is high and has improved markedly over time. The one area that we note for improvement relates to the discussion sections. While most survey respondents agreed that the discussion sections integrate well with the classroom component of the course, only ~2/3 of respondents reported satisfaction in 2019 and 2020 as compared to 80% reporting satisfaction with the discussion sections in 2018. The inaugural offering of the course in 2018 included 7 discussion topics whereas the course offerings in 2019 and 2020 included only 5 topics. One of the 2 discussions that were dropped was incorporated into the classroom component of the course, and the other (an exercise in which the students collaboratively designed a study during the discussion section) was dropped from the curriculum due to time constraints. This suggests that students may value some discussion topics more than others. In the future, we plan to ask for feedback with a series of targeted questions at the end of each discussion section so that we might better calibrate the discussion topics to the needs of our students.

Summary
In this article we describe an innovative, graduate level course on biology and communications skills for biostatistics students. Our approach is highly contextualized, using critical review of medical literature in a well-researched area of human disease to teach core concepts and facilitate practice with oral and written communications. Importantly, our approach emphasizes how to think rather than what to know. We accomplish this by teaching students how to establish mental maps, or ways of thinking about problems using a set of common knowledge regarding study design and biology, combined with a paradigm for evaluating medical evidence. The course offers students the chance to learn through discussion and writing, which sharpens communication skills as they learn to think about complex scientific problems from the standpoint of a collaborative biostatistician.
Much has been written about the characteristics of successful collaborative biostatisticians that training programs should focus on (Begg & Vaughan, 2011; DeMets et al., 2006; Perkins et al., 2016; Van Steen et al., 2001). Nearly all emphasize the importance of excellent written and oral communication skills, and many have emphasized the need to be conversant in the clinical or biological domain within which the collaborative biostatistician is working (Pomann et al., 2020). Our course is a concrete example of how these core competencies can be taught. Although we offer the course to master's degree students, it could also be adapted to PhD programs with more depth to the content, e.g., by adding coverage of mathematical or theoretical foundations, or conducting computer simulations. Based on student feedback, our next steps in developing this course include upgrading the discussion sessions. Our next steps in overall curriculum development center around applying lessons learned in delivering this course throughout the curriculum.
In summary, the core focus of our course is to create lifelong learners who, when presented with unfamiliar scenarios, will rely naturally on their critical thinking and communication skills to help them apply appropriate statistical methods for a given biological problem.

Appendix A: Course Syllabus
The following syllabus was used for the Fall 2020 course, which was held during the COVID-19 pandemic. University and departmental policy required all learning to be remote, so the syllabus includes instructions for participation in live class sessions using Zoom.

Course Overview
This course will introduce biostatistics students to important concepts in research study design and biology using concrete examples of research studies published in peer-reviewed literature. The course uses head and neck cancer as a motivating example to: 1) Introduce students to the most commonly used research study designs in medical and public health research; 2) Illustrate how study design is used to address questions about human biology and disease; 3) Teach basic biological concepts necessary for a successful career in biostatistics; and 4) Train students to read and critically evaluate scientific research publications from peer-reviewed journals.

Class Format and Instructions for Participation
The course consists of a lecture and a discussion section. The lecture meets twice per week and each student attends a discussion section every other week. Students are assigned to a discussion section by the instructors. All lectures and discussion sections will be held virtually via Zoom at scheduled times. Attendance at all sessions is required unless extenuating circumstances prevent active participation. All sessions will be recorded. Use the course schedule (below) as a guide. Links are provided to the Sakai site where all course materials are kept.
Note that the class is divided into four discussion sections such that each student attends a discussion every other week. You will either have a Tuesday afternoon section or a Friday morning section. Assignments to Tuesday or Friday are based on your time zone. Instructions for each discussion will be posted on Sakai in advance. You will have to spend some time preparing for each discussion, usually by reading an article and/or answering some questions. Everyone is expected to participate in the discussion, and participation is part of your course grade.
Most of the class sessions are didactic. A flipped classroom approach will be used for some sessions, particularly those that cover the example study designs: cohort, clinical trial, and case-control studies. For these sessions you will have to do some work outside of class to prepare for an in-class discussion. In addition, during the last few weeks of class, we will switch from a didactic lecture format to an active learning exercise in which you work with your classmates to complete a case study and present the results to the class.

Homework Assignments
There will be 4 homework assignments. In general, the homework assignments will cover the topics presented in the 2-3 lectures preceding the date the homework is assigned. The only exception to this is the third homework, administered approximately in the middle of the semester, which will serve as a cumulative mid-semester review that is weighted more heavily than the other 3 homework assignments. You may work together on all homework assignments and use the textbook or online resources to answer the questions. However, you must write your answers in your own words. Do not copy or quote phrases from the textbook or any other resource.

Final Exam
The final exam for this course will be a take-home exam. You may work with your fellow students on this exam but the answers you turn in must be in your own words. Although you may use textbooks or online resources to answer the exam questions, you must write your answers in your own words. As with the homework assignments, do not copy or quote phrases from the textbook or any other resource.

Recommended: The Cartoon Guide to Biology by Larry Gonick and David R Wessner (ISBN-10: 0062398652). Yes, we are serious! This book is excellent background reading for all of the basic biology topics we will cover. Unfortunately, The Cartoon Guide to Head and Neck Cancer is not out yet, so we will depend on other sources for cancer-specific readings. We will recommend specific sections of this book to accompany most of the biology lectures. If you understand biology at the level that is covered in this book, you should do well in the biology portions of this course and the master's qualifying exam, and you will feel comfortable discussing the basic biology of biostatistical projects with collaborators who are biologists and physicians. If you took an introductory biology course as an undergraduate, you might not need this book.

Evaluation:
Your course grade will be determined from the following components.

Appendix C: Example Module on Clinical Trials
The following outline relates to Weeks 6-8 in the example course schedule. The content covers an introduction to clinical trials in the context of drug development for treatment of head and neck cancer. The example uses cetuximab, an FDA-approved drug for treatment of head and neck cancer, as a tool for teaching students how clinical trials are designed to test biological hypotheses. There are two lectures each on clinical trial design and biology, as well as a flipped classroom session that requires students to read and discuss the published report of a Phase III trial of cetuximab.
After completing this module of the course students will be able to demonstrate the following knowledge related to clinical trials and cellular/molecular biology:
1. Discuss the regulatory environment for drug development in the United States.
2. Describe the typical phases of drug/device development.
3. Understand the commonly used parallel group clinical trial design.
4. Know the difference between objectives, endpoints, and outcome measures.
5. Define blinding (or masking) and understand why it is needed in clinical trials.
6. Understand the purposes of randomization in the context of treatment selection bias.
7. Describe how blocked and stratified randomization work, and why they are used.
8. Discuss how randomization might be implemented in a real study.
9. Know the difference between intention-to-treat and per-protocol analyses.
10. Discuss the motivations for addressing Type I error in trials and identify the pros and cons of some common approaches to Type I error control in trials with multiple endpoints.
11. Explain the roles of DNA, RNA, and proteins in cells, and how information flows among these macromolecules.
12. Give examples of some proteins and what they do.
13. Explain what a receptor is, what a ligand is, and their roles in cancer.
14. Explain what cetuximab is and how it works to treat cancer.

The following readings are assigned for this module. Students may complete these at their own pace but are encouraged to read in parallel with the delivery of the course content.
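As an illustration of objective 7 in the list above, the following Python sketch shows one common way that blocked and stratified randomization can be implemented. It is not taken from the course materials or from any trial discussed here. The function names, the block size of 4, the two arm labels, and the stratum labels are all hypothetical choices made for the example.

```python
import random


def permuted_block_schedule(n_participants, rng, arms=("A", "B"), block_size=4):
    """Return a treatment sequence built from randomly permuted, balanced blocks."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    schedule = []
    while len(schedule) < n_participants:
        # Each block contains every arm equally often, in random order.
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        schedule.extend(block)
    # The final block may be cut short if enrollment stops mid-block.
    return schedule[:n_participants]


def stratified_schedules(strata_sizes, seed=None, arms=("A", "B"), block_size=4):
    """Build a separate blocked schedule within each stratum."""
    rng = random.Random(seed)
    return {
        stratum: permuted_block_schedule(n, rng, arms, block_size)
        for stratum, n in strata_sizes.items()
    }


if __name__ == "__main__":
    # Hypothetical strata; a real trial would define these in its protocol.
    schedules = stratified_schedules({"stratum_1": 8, "stratum_2": 6}, seed=2024)
    for stratum, sequence in schedules.items():
        print(stratum, "".join(sequence))
```

Blocking keeps the arms balanced after every completed block, and running the blocks separately within each stratum keeps the arms balanced on the stratification factor as well.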

Clinical Trials, Day 2: Outline
Important Topics in the Design and Analysis of Clinical Trials

In preparation for the flipped classroom session, students read the published report of the Phase III trial of cetuximab (... Giralt Jordi, et al. N Engl J Med 2006;354:567-578) and watch a short video prepared by the instructors. The video is intended to guide the students through some of the more difficult parts of the reading assignment, e.g., by explaining in more detail what immunohistochemistry is and how it works. Slides are provided along with the video.
Students are given the following questions to review in advance of the class. Students are told that when they arrive for the class session they will be randomly divided into breakout rooms. Students are given 20 minutes to discuss and write answers for their assigned questions. During the activity instructors travel between the breakout rooms to provide assistance and guide the discussion as necessary. At the end of the 20-minute period the class reconvenes, and students present the answers to the questions in front of the class. Some students nominate a single person from their breakout room to make the presentation, and in other cases multiple students present answers. The choice of how this is done is left entirely to the students.

Questions for the Breakout Rooms
Breakout Room 4. The locoregional control endpoint was evaluated on an intention-to-treat basis.
a. Explain what "intention to treat" means.
b. How many patients were included in this analysis, and of that number how many followed the protocol with respect to the use of radiation and cetuximab?
c. Why did the authors not focus the analysis only on those patients who complied perfectly with the protocol?
Breakout Room 5. Answer the following questions about the results for the locoregional control outcome.
a. Describe the result presented in Figure 1. Give both a description of what you see in the graph and an interpretation of the hazard ratio and 95% confidence interval discussed in the footnotes to the graph.
b. The authors give a long description of compliance with the assigned treatments in the Results section and in the Discussion the authors say the following: "The superiority of the cetuximab-plus-radiotherapy regimen we used cannot be attributed to underperformance in the radiotherapy group." Explain what this means and why it is important to mention.
c. Patients were excluded from this study if they previously had cancer, received chemotherapy within the preceding three years, or had previously received surgery or radiotherapy for head and neck cancer. Why did the investigators decide to exclude such patients?
d. Briefly discuss internal and external validity of the results in terms of participant selection. Specifically, say whether and why you think the results apply to each of the following populations:
• Everyone who joined the study

a. Write a brief summary of the primary result of the study, including an interpretation of the relative risk and 95% confidence interval.
b. Fill in the following diagram depicting the selection of subjects for this study.

[Diagram: selection of subjects, with boxes to fill in. Recoverable labels: "Enrolled (write the number of people)", "Early Development"]

6. (2 points) The following refers to the ice hockey study described in question 5. Assume 10/25 participants who were lost to follow-up had resolution of concussion symptoms at Day 30 (4/10 of the losses from the Early group, and 6/15 losses from the Late group). What is this pattern of loss-to-follow-up called, and would you expect it to cause bias in the RR? If bias is present, in what direction would the bias be with respect to the value of the RR under the null hypothesis? Hint: try drawing a diagram similar to the ones we used during the lecture to visualize where participants drop out during follow-up.