A Comparison of Selected Assessment Results before and during the COVID-19 Pandemic in One University in South Africa

The study examined the impact of coursework-only assessment, as made necessary at the onset of the COVID-19 pandemic, adopting a quantitative research approach with 1013 students. The data obtained were analysed using SPSS version 27.0 to obtain descriptive and inferential statistics. The results revealed significant differences between the 2019 and 2020 marks for the same courses. In two of the science courses (T2 and T3), the mean scores for 2019 were significantly higher than the mean scores for 2020. In the mathematics course, the 2020 marks were significantly higher than the marks for 2019. While a normal distribution was assumed for the science courses, the mathematics course showed marks that were clustered toward the upper end of the distribution. A higher number of distinctions in the F1 course and a significant decline in the mean scores for T2 and T3 imply that there is a need for professional development of lecturers teaching in the online space. It is, therefore, recommended that higher education lecturers receive adequate professional development on setting and administering online assessments. Assessments should test both lower- and higher-order cognitive skills for sufficient testing of student knowledge during online assessments. Furthermore, a variety of assessment methods and a diversity of tasks may be used to ensure the reliability of the assessment outcomes.


Introduction
The onset of the COVID-19 pandemic brought several challenges to the teaching and learning community, with abrupt changes in instructional practices as well as assessment methods. Guangul, Suhail, Khalit, and Khidhir (2020) observed challenges of academic dishonesty, coverage of learning outcomes, and lack of commitment by students to submit assessment tasks. Mafugu (2020) noted challenges of poor attendance as well as network problems, which led to posting PowerPoint presentations with audio; however, monitoring the learning process after posting the audio was not possible. Poor attendance and failure to interact with the learning material have a significant impact on assessment outcomes.
Assessment is a process that follows careful planning and teaching; a process to evaluate the student's performance. It is an orderly process that is done during teaching, at the end of a chapter, at the end of a term, and at the end of a year. According to Black and Wiliam (1998), assessment refers to activities undertaken by both teachers and students that provide feedback that can be used for student learning and to modify teaching. Assessments are usually grade-based, and they include exams, portfolios, final projects, and standardised tests.
Two broad groups of assessment are formative and summative assessment. Formative assessments give insights into student learning and shape the learning process (Kampen, 2020). Formative assessments also help teachers to meet the learning needs of students and to provide remedial measures for those who are struggling with content comprehension. They also indicate whether what the student has learnt is in line with the curriculum (Kampen, 2020). Formative assessment is done to check learners' understanding of concepts, while summative assessment can be used to report learners' performance and to progress students to the next level (LaDue, Libarkin, & Thomas, 2015). In addition, formative assessments provide essential feedback to improve one's instructional strategy, and help to teach critical thinking skills as well as problem-solving skills (Kampen, 2020).
In a study by Bakhru and Mehta (2020), students affirmed that group work assignments and presentations assisted them in understanding the topic because they were able to get effective corrections and guidance during the presentation. Students also indicated that the presentations assisted them in developing confidence. Presentations and online computer-based testing can help to develop a better learning environment at institutions of higher learning (Bakhru & Mehta, 2020). According to several studies (Calvo, 2004, 2006; Ellis, Goodyear, Calvo & Prosser, 2008), learning through discussion deepens understanding of concepts.
Even when instruction is planned with great care, delivered effectively, and in a way that engages students, the learning outcomes often bear little or no relation to what was intended. According to Wiliam (2011), careful planning of the lesson delivery strategy, effective lesson delivery, and student engagement do not guarantee a total grasp of all the concepts taught. Students comprehend concepts at different rates; hence the need for assessment. Assessment assists in identifying areas that have not been understood by students, as well as misconceptions and areas that need remediation (Bakhru & Mehta, 2020; Ghaica, 2016).
The study is underpinned by social cognitive theory, which views learning as an active process of knowledge construction based on social interaction and the environment (Kay & Kibble, 2016). In the current study, the construction of knowledge occurs through the interaction among the lecturer, students, and the environment (including learning platforms). Murray (2019) views assessment as a key pillar of evaluating the effectiveness of pedagogical instructional strategies and learners' comprehension of the content during the learning process. The interactions and feedback on the individual assessment activities facilitate the learning process. According to Ghaica (2016), pedagogical instruction and assessment are complementary processes that are critical in providing knowledge and support. Continuous assessment, which involves both formative and summative assessment, can be used to obtain the feedback necessary to track students at risk, monitor progress, modify pedagogical practices, and support certification (Ghaica, 2016). Assessment assists the lecturer to reflect and engage in student-lecturer discussion on how to improve teaching, learning, and achievement scores. According to Jimaa (2011), the methods used to assess students influence the learning process: assessment methods influence how learners study the content in preparation for the assessment.
The environmental factors that influence the pedagogical practices and assessment methods include the availability of training for the online platforms used, contextual constraints for the organisation, lecturers and students (Tinoca, Oliveira & Pereira, 2014). The constraints may include availability of resources such as laptops, internet, electricity, class size, and other key result areas in which the lecturer is expected to perform by the institution. Palm (2008) and Pereira, Tinoca and Oliveira (2010) indicate the need to align the assessment design with competencies intended to be developed and the learning situations experienced by the student during the teaching process. Furthermore, Dierick and Dochy (2001), and Herrington and Herrington (1998) emphasise the need to employ a variety of assessment methods and diversity of tasks to ensure the reliability of the assessment outcome.
When setting assessments, it is important to consider Bloom's taxonomy of cognitive domains to ensure that learning outcomes are aligned with the assessment methods. The revised Bloom's taxonomy has four knowledge dimensions, namely factual, conceptual, procedural, and metacognitive, and six cognitive process dimensions: remember, understand, apply, analyse, evaluate, and create (Krathwohl, 2002; Dalton, 2003). Bloom's taxonomy includes lower-order cognitive skills, which require simple recall and a minimum level of understanding (Jideani & Jideani, 2012). The lower-order cognitive skills focus on understanding and remembering facts, concepts, procedures, and metacognitive strategies. The revised Bloom's taxonomy also focuses on higher-order skills, where students are tested on their ability to apply the knowledge gained to analyse, evaluate, and create. The higher-order cognitive skills require deep conceptual understanding, and therefore most students struggle to perform well in them (Crowe, Dirks, & Wenderoth, 2008). Tests must focus on both lower- and higher-order cognitive skills. For students to perform well in higher-order cognitive skills, they must be afforded enough practice to develop a deep conceptual understanding of the material (Crowe, Dirks, & Wenderoth, 2008; Jideani & Jideani, 2012).
According to LaDue et al. (2015), scientific and engineering practices emphasise the importance of developing and using models, analysing and interpreting data, using mathematics and computational thinking, and obtaining, evaluating, and communicating information. LaDue et al. (2015) also claim that assessment in science should incorporate disciplinary core ideas, cross-cutting concepts, and scientific and engineering practices. Assessment should contain multiple components to adequately assess the standards of scientific knowledge and process skills. Visual representations such as graphs and tables are highly rated in science education (LaDue, Libarkin, & Thomas, 2015). Due to the value ascribed to visuals, they often form part of science assessment activities. The 40-question ACT science test provides a good example of a science test because it measures science process skills, which include data interpretation, analysis, evaluation, reasoning, and problem solving (ACT, 2021).
Previous studies focused on assessment where examinations were written under controlled conditions, with invigilators monitoring to ensure that there were no reference books or access to the internet. The current study explores a unique and unusual situation where students complete assessments with all books, notes, and internet sources available for reference, with no physical invigilation. This requires new questioning techniques to ensure that the desired learning outcomes are achieved, and that learning is reflected by the achievement scores. For many lecturers in tertiary institutions, this was not an effortless process.
The study, therefore, intended to answer the research question: How do the achievement scores of selected undergraduate courses in which there were no final examinations during the COVID-19 period in 2020 compare with the results of the pre-COVID-19 period in 2019?
The following hypotheses were tested at the 5% significance level:
1) H0: There is no significant difference in the student mean achievement scores of the T1 course between 2019, when both coursework and an examination were written, and 2020, when only coursework was considered.
2) H0: There is no significant difference in the student mean achievement scores of the T2 course between 2019, when both coursework and an examination were written, and 2020, when only coursework was considered.
3) H0: There is no significant difference in the student mean achievement scores of the T3 course between 2019, when both coursework and an examination were written, and 2020, when only coursework was considered.
4) H0: There is no significant difference in the student mean achievement scores of the F1 course between 2019, when both coursework and an examination were written, and 2020, when only coursework was considered.

Research Methodology
The study was underpinned by a quantitative research approach. Only marks from lecturers who gave consent for analysis were compared. Consent was sought from two lecturers: one teaching three science courses, and one teaching one mathematics course.
The final-year marks for three science courses (T1, T2 & T3) and one mathematics course (F1) for 2019 and 2020 were downloaded from the gradebook onto an Excel sheet. The marks were transferred into SPSS version 27.0 and categorised into the following sections: 0-39; 40-49; 50-59; 60-69; 70-74; 75-100. The frequencies of marks in each category were obtained in a table in the SPSS output (Table 2). Furthermore, descriptive statistics (mean and standard deviation, Table 3) were obtained for each set of results for 2019 and 2020.
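The categorisation and descriptive-statistics steps above can be sketched outside SPSS. The following minimal Python illustration uses the same mark categories; the marks themselves are invented for demonstration, not the study's data:

```python
import pandas as pd

# Hypothetical marks for one course and year (the study used real gradebook data).
marks = pd.Series([35, 45, 52, 58, 63, 66, 68, 71, 73, 78, 82, 91])

# Mark categories used in the study: 0-39; 40-49; 50-59; 60-69; 70-74; 75-100.
bins = [0, 40, 50, 60, 70, 75, 101]
labels = ["0-39", "40-49", "50-59", "60-69", "70-74", "75-100"]
categories = pd.cut(marks, bins=bins, labels=labels, right=False)

# Frequency of marks per category (as in Table 2) ...
frequencies = categories.value_counts().sort_index()

# ... and descriptive statistics (as in Table 3).
mean, sd = marks.mean(), marks.std()
```

The `right=False` argument makes each bin closed on the left (e.g. a mark of exactly 40 falls in 40-49), matching the non-overlapping categories reported in the study.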
Inferential statistics: the independent samples t-test was used to determine whether the difference in the mean scores for 2019 and 2020 was significant. This was only done for the sets of data that met the condition of normality under the Kolmogorov-Smirnov test. For the set of data where the condition of normality was not met (F1), the non-parametric equivalent of the independent samples t-test, the Mann-Whitney U test, was used in the comparison. Levene's test for equality of variances was also performed using SPSS, to determine whether equality of variance should be assumed in the independent samples t-test. Where Levene's test indicated no significant difference (p > 0.05), equal variance was assumed in the independent samples t-test, while equal variance was not assumed where Levene's test indicated a significant difference (p < 0.05). All tests were conducted at the 5% significance level.
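The decision rules above (normality check first, then Levene's test to choose between the pooled and Welch t-test, falling back to the Mann-Whitney U test when normality fails) can be mirrored in Python. This is a sketch with synthetic data, not a reproduction of the study's SPSS procedure:

```python
import numpy as np
from scipy import stats

# Synthetic mark vectors standing in for the real 2019 and 2020 gradebook data.
rng = np.random.default_rng(0)
marks_2019 = rng.normal(65, 10, 120)
marks_2020 = rng.normal(61, 12, 200)

def compare_years(a, b, alpha=0.05):
    """Apply the same decision rules described in the methodology."""
    # Kolmogorov-Smirnov normality check against a normal fitted to each sample.
    normal = all(
        stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1))).pvalue > alpha
        for x in (a, b)
    )
    if not normal:
        # Normality failed: use the non-parametric Mann-Whitney U test.
        return "Mann-Whitney U", stats.mannwhitneyu(a, b)
    # Levene's test decides whether equal variances may be assumed.
    equal_var = stats.levene(a, b).pvalue > alpha
    name = "independent t-test" if equal_var else "Welch t-test"
    return name, stats.ttest_ind(a, b, equal_var=equal_var)

test_name, result = compare_years(marks_2019, marks_2020)
```

Setting `equal_var=False` in `scipy.stats.ttest_ind` gives Welch's correction, which corresponds to the "equal variances not assumed" row of the SPSS output.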
The researchers obtained ethical clearance from the University Ethical Clearance Committee, and the lecturers who taught the modules from 2019 to 2020 gave informed consent. The course data were coded T1, T2, T3 and F1 to ensure anonymity and confidentiality.

Results
This section presents the frequencies of marks that were obtained in four courses during 2019 and 2020. In 2019, the students were assessed using the coursework marks obtained through mini-assessment tasks during the term, and an examination that students wrote at the end of the semester. The coursework marks contributed 50% to the final mark, with the other 50% reflecting the final semester examination. In 2020, only coursework marks were considered. The COVID-19 pandemic's restrictions required social distancing to minimise the spread of the coronavirus, and therefore dictated that students could not write venue-based examinations with physical invigilation. In Table 2, the values in brackets represent percentages.
The majority of the students for T1, 59 (50.4%) in 2020 and 71 (56.6%) in 2019, obtained marks between 60-69, while the majority in the F1 course, 128 (48.3%), obtained marks between 60-69 in 2019, with a few students at the extremes (Table 2). In 2019, most of the students undertaking T2 (eleven, or 68.8% of the class) had marks above 70%. For T3, the majority of the students had marks between 50-69 in both 2020 and 2019, with few students at the extremes. In F1 in 2020, the majority of students obtained distinctions (marks above 75%). The absence of examinations may have contributed to the reflected performance difference.

For T1 and F1, the mean marks for 2020 were higher than the means for 2019 (Table 3). However, in courses T2 and T3, the mean marks for 2019 were higher than the marks for 2020, where the number of learners was more than double.

Levene's test for equality of variances indicated that the difference in variance for the T1 2019 and 2020 marks was significant (Table 4); equal variance was therefore not assumed in the independent samples t-test at the 5% level of significance. There was no significant difference in student mean marks for T1, t(209) = 1.12, p = 0.26, despite students in 2020 (M = 62.85, SD = 12.06) attaining higher scores than 2019 students (M = 61.34, SD = 8.60). Levene's test indicated that the difference in variance for the T2 2019 and 2020 marks was not significant (p = 0.26) at the 5% level of significance; hence, equal variance was assumed in the independent samples t-test (Table 5). The 16 students for T2 in 2019 (M = 71.38, SD = 9.32) obtained significantly higher scores than the 42 students in 2020 (M = 60.81, SD = 13.51), t(56) = -2.87, p = 0.006. Similarly, Levene's test indicated that the difference in variance for the T3 2019 and 2020 marks was not significant (p = 0.90) at the 5% level of significance; hence, equal variance was assumed in the independent samples t-test (Table 6).
The 60 students for T3 in 2019 (M = 65.07, SD = 11.88) obtained significantly higher scores than the 133 students in 2020 (M = 59.16, SD = 10.89), t(191) = -3.39, p < 0.001. For F1, the achievement scores for 2020 (mean rank = 359.3) were higher than those for 2019 (mean rank = 159.3). A Mann-Whitney test indicated that this difference was statistically significant, U(N2020 = 260, N2019 = 260) = 7478.00, z = -15.24, p < 0.001 (Table 7).
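The F1 values of 359.3 and 159.3 exceed any attainable percentage mark, which is consistent with SPSS reporting mean ranks rather than medians in its Mann-Whitney output. A short Python sketch with invented marks shows how such mean ranks are computed (the real F1 data are not reproduced here):

```python
import numpy as np
from scipy import stats

# Invented F1-style marks; in the study each year had N = 260 students.
rng = np.random.default_rng(1)
marks_2019 = rng.integers(40, 80, 260).astype(float)
marks_2020 = rng.integers(55, 100, 260).astype(float)

# Mann-Whitney U statistic and p-value, as reported in Table 7.
u_stat, p_value = stats.mannwhitneyu(marks_2020, marks_2019)

# Mean ranks: rank all 520 marks together, then average within each year.
ranks = stats.rankdata(np.concatenate([marks_2019, marks_2020]))
mean_rank_2019 = ranks[:260].mean()
mean_rank_2020 = ranks[260:].mean()
# With equal group sizes, the two mean ranks always sum to N_total + 1 = 521.
```

A large gap between the two mean ranks, as observed for F1, indicates that one year's marks consistently outrank the other's across the pooled distribution.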

Discussion
Both the 2019 and 2020 results for the T1 to T3 courses, and the F1 course results for 2019, indicate clear discrimination and a good distribution of students' marks, with few student scores at the extremes and the majority of the scores (in the 60-69% category) clustered around the mean. The T2 results for 2019 and 2020 show a similar trend, with the highest frequencies recorded in the 70-74% and 50-59% categories respectively. According to Krathwohl (2002), a normal distribution of students' performance scores can be achieved when the assessment tasks are carefully designed to include discriminatory activities or questions that focus on judging, critiquing, constructing, designing, and hypothesising (Figure 1). In designing assessments, it is essential to focus on cognitive process dimensions; the diagram (Figure 1) can assist one in designing appropriate questions that fall into specific cognitive process dimensions.

Figure 1. The cognitive process dimensions (Krathwohl, 2002)

The pattern of the T1, T2 and T3 results reflects that the learners were heterogeneous in ability and that the assessment activities adequately discriminated the gifted students from the average students in both 2019 and 2020. The pattern of results displayed by both the 2019 and 2020 T1 and T3 courses, and the F1 course results for 2019, could also be explained by a possible good alignment of the assessment design with the competences intended to be developed and the learning situations experienced by the student during the teaching process (Palm, 2008; Pereira, Tinoca & Oliveira, 2010). The other dimension that could explain the results, according to Dierick and Dochy (2001) and Herrington and Herrington (1998), was a possible use of a variety of assessment methods and a diversity of tasks to ensure the reliability of the assessment outcome.
The significantly higher mean scores of the 2019 T2 and T3 courses compared to the same courses in 2020 could be attributed to the challenges of the novel online teaching and assessment strategies that were abruptly introduced in tertiary institutions. Guangul et al. (2020) observed challenges of coverage of learning outcomes and lack of commitment by students to submit assessment tasks. Mafugu (2020) noted challenges of poor attendance as well as network challenges, which negatively affect assessment outcomes. The onset of the pandemic could have resulted in limited interaction due to challenges such as network availability. According to social cognitive theory, learning is more effective when learners are actively involved in the process of knowledge construction through peer-peer or student-lecturer interaction, as well as interaction with the environment (Kay & Kibble, 2016; Mafugu, 2021). Li and Konstantopoulos (2017) observed no significant effect of class size on student performance; hence, we can assume that class size had no effect on performance here.

In the course F1, the 2020 scores show the majority of students, 114 (43.8%), in the 75-100% category in a course with a total of 260 students. Furthermore, the scores for F1 in 2020 were significantly higher than the scores for 2019. The 2020 F1 results indicate a lack of adequately discriminating questions, resulting in students' final scores clustering at the upper extreme. This could have been an effect of the rapid change to online teaching, which made it difficult for some lecturers to produce strategies that allow adequate discrimination of student scores. The skewed results could have resulted from a lack of higher-order questions demanding application of knowledge under conditions where students have access to all resources.
According to Crowe, Dirks and Wenderoth (2008), higher-order cognitive skills require deep conceptual understanding; hence most students struggle to perform well in them. According to Rush, Rankin and White (2016), an increase in item complexity increases the index of difficulty and the discrimination value of the item. Guangul et al. (2020) observed challenges of academic dishonesty among students when writing assessments with no one monitoring for any form of cheating. In the F1 course, the assessor, who was used to giving a final-year examination written under strict conditions, possibly had difficulties in setting challenging activities for students who were not monitored. Lemons and Lemons (2013) highlight that it is important to consider Bloom's taxonomy, as well as the amount of time required to complete the task and the student's experience in practising similar assessment tasks, to determine the index of question difficulty (Figure 2). Higher-order questions may require more time to complete, while increased student experience with a particular type of assessment makes it easier for students to complete it (Kotovsky, Hayes, & Simon, 1985; Sweller & Chandler, 1994). This study adds another principal factor to consider when conducting assessments online: the availability of resources. A question that is classified under higher-order critical skills may be easy when students have access to resources that provide examples guiding them to the answer. Such guidance may be on the internet, or in other reference resources.

Figure 2. Modified framework for research showing how one can conceptualise questions intended to assess higher-order critical skills during online teaching and learning. Adapted from Lemons and Lemons (2013).

An understanding of subject-specific skills is also essential and should be considered during the process of assessment.
For example, in science subjects, specific skills that are linked to the subject should be assessed. The assessment should include application of knowledge, data analysis and interpretation, converting one form of data into another, and communication, which are important during online teaching and learning, as they assist in discriminating the average students from the gifted. This is in line with LaDue, Libarkin and Thomas (2015), who emphasise that scientific practices should highlight the importance of developing and using models, analysing and interpreting data, using mathematics and computational thinking, and obtaining, evaluating, and communicating information. Ensuring that lecturers understand the critical skills will assist in setting questions that meet the subject standards. Higher-order questions provide students with the opportunity to develop one of the 21st-century skills, namely critical thinking (Tyas, Nurkamto, Marmanto, & Laksani, 2019). This, in turn, ensures that students develop the relevant skills needed in the job market.

Implications for Teaching and Professional Development
The findings presented in this study have implications for designing teaching activities and assessment tasks suitable when face-to-face examinations are not possible. Assessment must include higher-order cognitive skills. Bloom's taxonomy, the amount of time required to complete the task, students' experience with the particular type of assessment, as well as the availability of resources, must be considered when determining the difficulty index of a question. Assessment should discriminate gifted students from average students by including discriminatory activities or questions that focus on judging, critiquing, constructing, and designing. A well-set formal task should produce a normal distribution curve, with few scores at the two extremes and the majority close to the mean mark. This rule is particularly important for large classes. It is critical to ensure that lecturers are aware of Bloom's taxonomy, as well as other factors that can be used to evaluate whether a question elicits higher-order cognitive skills. More research needs to focus on the type of questions that lecturers administer when students are assessed under conditions where they have access to all resources and lack test- or examination-type controlled conditions.

Limitations of the Study Results
The assessments made in 2019 and 2020 were of a different nature due to the COVID-19 pandemic. While there were examinations and continuous assessment marks in 2019, there were no examinations in 2020 due to the pandemic, which called for social distancing worldwide. However, the comparison was motivated by the need to determine whether the lack of examinations resulted in either exaggeratedly high marks or unduly low student performance. Another limitation is that the study did not focus on the specific types of questions administered, or on the knowledge dimensions they assessed.
