Development of Foreign Language Lesson Satisfaction Scale (FLSS): Validity and Reliability Study

This study aims to develop an instrument that measures university students' satisfaction with foreign language lessons in a valid and reliable manner. The research was conducted with three separate study groups totalling 460 students in the spring semester of the 2017-2018 academic year. First, expert opinion was sought on the content and face validity of the scale, which was prepared on the basis of a literature review and of student and expert opinions. Exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) were then applied. The EFA yielded a construct of 28 items and three factors that explained 59.97% of the total variance. The factors were named Curriculum, Teaching Staff and Physical Conditions. CFA was performed in the final application phase, and the fit indexes were found to be acceptable. For criterion validity, the correlations of the factors with the total score of the scale were calculated, and high, significant correlations were observed. Internal consistency and split-half values for the whole scale show that the scale is highly reliable. Based on these findings, the scale can be considered a valid and reliable tool for measuring university students' satisfaction with foreign language lessons.

• Self-confidence: Positive self-perception is very effective in learning foreign languages.
• Anxiety and Satisfaction: While high or low anxiety can cause negative effects on foreign language learning, control of anxiety level and level of satisfaction in language learning can positively contribute to foreign language learning.
As Krashen (2009) has mentioned, the concept of satisfaction has an important place in the education process. Satisfaction can be explained as the meeting of the perceptions and expectations that form in individuals as a result of any goods or services offered (Oliver, 1999, p. 34; Robins & Coulter, 2009; Robinson, Decenzo & Coulter, 2011). According to Alsadoon (2018, p. 226), Cobb (2009, p. 242), and Reio and Crim (2013, p. 123), learners' level of satisfaction plays a key role in learning. Student satisfaction is also an important part of the self-assessment of educational institutions (Bengi & Özberk, 2017, p. 75; Saydan, 2008, p. 65).
Foreign language education, which runs from primary school to graduate level, is compulsory for at least two semesters in higher education under the following article of the law (yok.gov.tr/documents).
Higher Education Law No. 2547 dated 6/11/1981, Chapter 2, Article 5, clause (ı): (Amended: 29/5/1991 - 3747/1 article.) In higher education institutions, Atatürk's principles and history of revolution, Turkish language and foreign language are compulsory courses. In addition, courses in physical education or in one of the fine arts are taught on a non-compulsory basis. All these courses are scheduled and taught for at least two semesters.
Student satisfaction is the sum of students' views about the teaching process (Elliott & Healy, 2001, p. 2; Elliot & Shin, 2002, p. 198; Grossman, 1999, p. 49). The importance of learner satisfaction for learning has received steadily increasing attention (Baykal, Sökmen, Korkmaz & Akgün, 2002, p. 24). For this reason, this study set out to determine the satisfaction level of students taking the foreign language lesson at universities, and it was deemed necessary to develop the "Foreign Language Lesson Satisfaction Scale".

Study Group
In this study, the study groups from which data were obtained in the pre-application and final application phases consisted of different students. The study groups consist of students studying at a vocational school of a state university in Turkey. The data were collected during the spring semester of the 2017-2018 academic year. 150 students were reached in the pre-application, and 280 students, ten times the number of scale items, were reached in the final application. In both applications, students were briefly informed about the research subject before the data were collected. The data collection tool (Foreign Language Lesson Satisfaction Scale, FLSS) applied to the study groups and the operations performed on the data obtained from each group are summarized in Table 1.

Data Collection Instrument
At this stage of the scale development study, variables that could affect student satisfaction were identified, similar scales were examined, and students and experts were consulted. Under the review of several educational sciences experts, these variables were grouped into three sub-themes: the foreign language lesson curriculum (aim, subject matter, learning experiences, and evaluation), the foreign language lesson instructor, and physical (hardware) conditions. A pool of 68 items was created across the three sub-themes, and following expert opinion the pre-application form was reduced to 38 items. The main reason for reducing the number of items was to help participants respond as consistently as possible without losing concentration. The 38 items were distributed across the sub-themes as follows: Foreign Language Lesson Curriculum = 13 items, Foreign Language Lesson Instructor = 16 items, Physical (Hardware) Conditions = 9 items.
Before data collection, the pre-application form of the scale was sent to two foreign language and Turkish language instructors with more than 10 years of university experience and to two educational sciences experts, and the necessary revisions were made. Finally, the items in the pre-application form were read to a class of 30 students to assess their intelligibility. A 5-point Likert-type format was chosen for the scale [Strongly Agree (5), Agree (4), Somewhat Agree (3), Disagree (2), Strongly Disagree (1)]. In this method, the participant is presented with a series of statements and asked to select one response option for each (Erkuş, 2016, p. 78).

Data Analysis and Procedure
The suitability of the study data for exploratory factor analysis (EFA) can be assessed with the Kaiser-Meyer-Olkin (KMO) and Bartlett tests (Büyüköztürk, 2006, p. 126; Tabachnick & Fidell, 2001). First, the KMO and Bartlett tests were performed to determine the adequacy of the sample size and the suitability of the data for EFA. EFA was then performed to examine construct validity and to identify the similarities between the variables. EFA can give the researcher information about the factors to be measured (Tavşancıl, 2006). In other words, EFA is used to determine how many sub-factors the variables in a measurement tool group under and how these are related (Seçer, 2015, p. 78). When the factors are correlated, the Direct Oblimin rotation technique is used in EFA (Büyüköztürk, 2002, p. 477; Costello & Osborne, 2005, p. 3; Tabachnick & Fidell, 2001). In this study, the factors were found to be related, so the Direct Oblimin rotation technique was used. Factor loadings in EFA are expected to be at least 0.30 (Büyüköztürk, 2004; Cattell & Baggaley, 1960; Neale & Liebert, 1980). For this reason, items with a factor loading below 0.30 were removed from the scale. Factors with an eigenvalue greater than 1 are retained in EFA (Tabachnick & Fidell, 2001); factors with an eigenvalue less than 1 were therefore not considered. The factors that emerged from the factor analysis were then checked with the parallelism test (parallel analysis), which compares the eigenvalues of the data with eigenvalues obtained from random data (Pallant, 2017, p. 202). The EFA results were accordingly also compared with the Monte Carlo parallelism test.
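The two factor-retention rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure run in the statistics package: the function names are hypothetical, and the eigenvalues are invented for the example and are not the values obtained in this study.

```python
def kaiser_count(eigenvalues):
    """Number of factors with an eigenvalue greater than 1."""
    return sum(1 for value in eigenvalues if value > 1.0)


def parallel_analysis_count(observed, random_criteria):
    """Count the leading factors whose observed eigenvalue exceeds the
    matching eigenvalue obtained from random data; stop at the first
    factor that fails the comparison."""
    count = 0
    for obs, rand in zip(observed, random_criteria):
        if obs <= rand:
            break
        count += 1
    return count


# Hypothetical eigenvalues (assumption, NOT taken from this study).
observed = [11.9, 2.5, 2.3, 1.1, 0.9]
random_criteria = [1.9, 1.7, 1.6, 1.5, 1.4]

kaiser = kaiser_count(observed)
parallel = parallel_analysis_count(observed, random_criteria)
print(kaiser, parallel)
```

With these invented numbers the eigenvalue rule alone would keep one more factor than the comparison with random data, which is the kind of disagreement the parallelism test is meant to expose.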
Confirmatory factor analysis (CFA) was performed to verify the factor structure with the data obtained from the final application of the scale. For the CFA, participants with the same characteristics as those in the pre-application, but different individuals, were reached. To verify the EFA results with CFA, some basic conditions (normality, multicollinearity, sample size) must first be checked (Kline, 2005; Tavşancıl, 2014, p. 51). For this reason, normality, multicollinearity and sample size analyses were performed first. Model fit was evaluated with the RMSEA, SRMR, GFI, AGFI, NFI, χ²/df, TLI and CFI fit indexes. In CFA, values > 0.90 are considered acceptable for CFI, GFI, AGFI, NFI and TLI, while values > 0.95 are considered extremely good. For SRMR and RMSEA, values < 0.08 are considered acceptable and values < 0.05 are considered extremely good. In addition, the χ²/df value in CFA should be less than 5 (Cole, 1987; Hu & Bentler, 1999; Kline, 2011; Marcoulides & Schumacher, 2001; Özdamar, 2017; Schumacher & Lomax, 2004; Seçer, 2015, p. 98). Reference values of the fit indexes are shown in Table 5. To examine the criterion validity of the scale, the correlation coefficients between the total score of the scale and the factors were analyzed.
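As a rough illustration of how such cutoffs are applied, the sketch below classifies the fit values reported for the final application of the FLSS. It assumes the conventional reading in which higher values are better for CFI, GFI, AGFI, NFI and TLI and lower values are better for RMSEA and SRMR (acceptable below 0.08, extremely good below 0.05). The function name and the "poor" label are illustrative assumptions, not part of the study.

```python
HIGHER_IS_BETTER = {"CFI", "GFI", "AGFI", "NFI", "TLI"}
LOWER_IS_BETTER = {"RMSEA", "SRMR"}


def classify_fit(name, value):
    """Return 'good', 'acceptable' or 'poor' for one fit index."""
    if name in HIGHER_IS_BETTER:
        if value > 0.95:
            return "good"
        return "acceptable" if value > 0.90 else "poor"
    if name in LOWER_IS_BETTER:
        if value < 0.05:
            return "good"
        return "acceptable" if value < 0.08 else "poor"
    if name == "chi2/df":
        # Only a single cutoff (less than 5) is given for this index.
        return "acceptable" if value < 5 else "poor"
    raise ValueError(f"unknown fit index: {name}")


# Fit values reported for the final application of the scale.
reported = {
    "chi2/df": 2.092, "RMSEA": 0.063, "SRMR": 0.045, "CFI": 0.95,
    "GFI": 0.91, "AGFI": 0.92, "NFI": 0.91, "TLI": 0.92,
}
results = {name: classify_fit(name, value) for name, value in reported.items()}
for name, label in results.items():
    print(f"{name}: {label}")
```

Because the cutoffs are strict inequalities, a value that sits exactly on a boundary falls into the lower band under this sketch.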
To assess the internal consistency of the scale, Cronbach's alpha coefficients were calculated for both the whole scale and the subscales. Cronbach's alpha ranges from 0 to 1; the closer the value is to 1, the higher the reliability and internal consistency of the scale. In evaluating the coefficient, values in the range 0.80-1.00 are considered highly reliable (Özdamar, 1997, p. 500). In addition, split-half tests were conducted to determine the reliability of the scale. When a test is split into halves, the two halves are assumed to be parallel; in other words, their means and variances are considered equal (Erkuş, Sünbül, Sünbül, Yormaz & Aşiret, 2017, p. 26). Statistical package programs were used to analyze the data.
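For readers who want to see the calculation behind these coefficients, the following sketch computes Cronbach's alpha with the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), and then computes alpha separately for each half of the items. Two points are assumptions made for illustration: the split is taken as first half versus second half of the items, since the splitting rule is not stated here, and the response matrix is invented rather than drawn from the study data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (equal lengths)."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def split_half_alphas(rows):
    """Alpha for the first and second half of the items (assumed split)."""
    half = (len(rows[0]) + 1) // 2  # first half gets the extra item if k is odd
    first = [row[:half] for row in rows]
    second = [row[half:] for row in rows]
    return cronbach_alpha(first), cronbach_alpha(second)


# Hypothetical 5-point Likert responses: 5 respondents x 4 items.
responses = [
    [5, 4, 5, 4],
    [4, 4, 4, 3],
    [3, 3, 2, 3],
    [2, 2, 2, 1],
    [1, 2, 1, 1],
]
alpha = cronbach_alpha(responses)
alpha_first, alpha_second = split_half_alphas(responses)
print(round(alpha, 3), round(alpha_first, 3), round(alpha_second, 3))
```

The sketch does not guard against a zero total-score variance (all respondents scoring the same), which a production routine would need to handle.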

Results Related to Exploratory Factor Analysis (EFA)
The 68-item pool was reduced, in line with expert and student opinions, to a 38-item pre-application form covering three sub-themes (Curriculum = 13 items, Instructor = 16 items and Physical Conditions = 9 items), so as not to overtax the participants' attention. The pre-application was administered to 150 randomly selected students studying in different departments at the vocational school of a state university. The data were analyzed in a statistical software package. Cronbach's alpha for the entire scale was calculated to determine the reliability of the pre-application. Cronbach's alpha coefficients in the range 0.80-1.00 are considered highly reliable (Özdamar, 1997, p. 500). The pre-application Cronbach's alpha of 0.97 indicates that the data obtained from the pre-application are highly reliable. The split-half technique was also used to re-evaluate the reliability of the pre-application. The split-half values are shown in Table 2. KMO values of 0.60 and above indicate that the sample size is sufficient (Büyüköztürk, 2006, p. 126; Tabachnick & Fidell, 2001). Factor loadings in EFA are expected to be at least 0.30, and items that load on more than one factor are removed from the scale (Büyüköztürk, 2004; Cattell & Baggaley, 1960; Neale & Liebert, 1980). For this reason, items with factor loadings (item-total test correlations) below 0.30 and items loading on more than one factor were removed from the scale in consultation with experts. In total, 10 items were removed as a result of the pre-application. The factor analysis of the pre-application is shown in Table 3 and the scree plot in Figure 1.
The EFA performed after the pre-application grouped the items under three factors, as shown in Figure 1 and Table 3. The 10 items that make up the first factor consist of statements about the foreign language lesson curriculum and explain 42.78% of the variance. The 14 items that make up the second factor consist of statements about the teaching staff and explain 8.80% of the variance. The 4 items that make up the third factor consist of statements about physical (hardware) conditions and explain 8.39% of the variance. Together, the three factors explain 59.97% of the total variance. In the social sciences, explained variance between 40% and 60% is regarded as sufficient (Pallant, 2017, p. 222; Tavşancıl, 2014; Thurstone, 1947). It can therefore be concluded that the variance explained by the scale is sufficient. Table 4 shows the results of the parallel analysis based on principal components analysis (PCA), comparing the criterion values with the eigenvalues. The Monte Carlo parallelism test (Watkins, 2000) gives results compatible with the scale factors, and on this basis it was concluded that the scale has a three-factor structure. The number of items in the scale for the final application was set at 28.
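The item-removal rule used in this section (drop items whose loading is below 0.30 and items that load on more than one factor) might be sketched as follows. The loadings are invented for illustration. The 0.10 minimum gap between the two strongest loadings is an assumed operational definition of loading on more than one factor; the study does not state its criterion, and the final decisions were made in consultation with experts.

```python
def retained_items(loadings, min_loading=0.30, min_gap=0.10):
    """loadings: dict mapping item name -> list of loadings, one per factor.

    An item is kept when its strongest absolute loading reaches min_loading
    and exceeds its second strongest loading by at least min_gap.
    min_gap is an assumption made for this sketch.
    """
    kept = []
    for item, values in loadings.items():
        ranked = sorted((abs(v) for v in values), reverse=True)
        if ranked[0] < min_loading:
            continue  # loading too weak on every factor
        if len(ranked) > 1 and ranked[0] - ranked[1] < min_gap:
            continue  # loads on more than one factor
        kept.append(item)
    return kept


# Hypothetical pattern-matrix rows (NOT taken from Table 3).
example = {
    "item_a": [0.72, 0.10, 0.05],   # clear loading on factor 1
    "item_b": [0.25, 0.12, 0.08],   # below 0.30 everywhere
    "item_c": [0.48, 0.45, 0.02],   # loads on two factors
    "item_d": [0.05, -0.66, 0.11],  # clear (negative) loading on factor 2
}
kept = retained_items(example)
print(kept)
```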

Results Related to Confirmatory Factor Analysis (CFA)
It is important that the sample allows relationships to be determined reliably. In other words, for reliable results the sample must be adequate in both size and composition. A sample size ten times the number of observed variables can be considered sufficient (Büyüköztürk, 2002, p. 480; Erkuş, 2016, p. 60; Kurnaz & Yiğit, 2010, p. 33; Stevens, 1996). For this reason, the Foreign Language Lesson Satisfaction Scale, consisting of 28 items and three factors, was applied to 280 students, ten times the number of items, and was tested with confirmatory factor analysis (CFA). In scale development, CFA examines the relationship between observed variables and latent variables (Şencan, 2005, p. 408). The 280 students were different from those reached in the pre-application and were selected randomly from the same vocational school of the same state university. Items with factor loadings of 0.50 and above are considered important items for the scale (Güngören, Bektaş, Öztürk & Horzum, 2014, p. 74; Jöreskog & Sörbom, 1996; Naktiyok, 2015, p. 118). For this purpose, the fit indexes of the items were determined first. As shown in Figure 2, the CFA produced the following standardized solutions: for the Curriculum factor (SubD.1), .72, .77, .78, .76, .71, .74, .73, .79, .78 and .74; for the Instructor factor (SubD.2), .71, .71, .76, .72, .77, .75, .77, .76, .68, .72, .77, .66, .74 and .69; and for the Physical Conditions factor (SubD.3), .77, .75, .76 and .61. Factor loadings for research dimensions should be higher than .50 (Yaşlıoğlu, 2017, p. 78). Since all factor loadings are higher than .50, each item can be regarded as important for the scale.
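The two checks in this paragraph, the ten-respondents-per-item rule and the .50 loading cutoff, are easy to reproduce from the standardized solutions listed above. The snippet below is only a bookkeeping sketch; the variable names are illustrative.

```python
# Standardized solutions as reported for the three factors.
loadings = {
    "Curriculum": [.72, .77, .78, .76, .71, .74, .73, .79, .78, .74],
    "Instructor": [.71, .71, .76, .72, .77, .75, .77, .76, .68, .72,
                   .77, .66, .74, .69],
    "Physical Conditions": [.77, .75, .76, .61],
}

item_count = sum(len(values) for values in loadings.values())
required_sample = 10 * item_count  # ten respondents per item
weakest = min(min(values) for values in loadings.values())
all_above_cutoff = all(
    value > 0.50 for values in loadings.values() for value in values
)

print(item_count, required_sample, weakest, all_above_cutoff)
```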
The fit indexes of the CFA and the reference ranges specified in the literature (Hooper, Coughlan, & Mullen, 2008; Meydan & Şeşen, 2011; Schermelleh-Engel, Moosbrugger, & Müller, 2003; Yıldırım & Naktiyok, 2017, p. 297) are shown in Table 5. The values in Table 5 indicate acceptable fit for the CFA results. In the CFA, modifications were needed to strengthen the fit indexes between items M5-M6 of the curriculum factor (SubD.1) and between items M14-M15, M19-M20 and M20-M21 of the instructor factor (SubD.2), and these were made. Taken together, the fit indexes and the CFA results show that the three-factor structure of the FLSS meets the reference values. To determine the criterion validity of the FLSS, the correlation coefficients (r) between the total score of the scale and the factors were evaluated separately. The correlation coefficient takes a value between -1 and +1 and gives information about the direction and strength of the relationship. According to Cohen (1988, pp. 79-81), correlation coefficients between 0 and 1 are interpreted as follows: low relationship, r = .10-.29; moderate relationship, r = .30-.49; high relationship, r = .50-1.0. There is a high correlation, significant at the .01 level, between the total score obtained from the scale and the factors. Correlation coefficients between the factors of the FLSS vary between .50 and .75, as shown in Table 6. On the basis of these correlation values, it can be concluded that the factors are compatible with one another. As shown in Table 7, Cronbach's alpha values for both the whole scale and the three dimensions were calculated to determine the reliability of the data from the final application of the scale. Cronbach's alpha coefficients in the range 0.80-1.00 are considered highly reliable (Erkuş et al., 2017, p. 26). As shown in Table 7, the Cronbach's alpha values for the sub-dimensions of the final application vary between .81 and .93.
Cronbach's alpha for the whole scale is .96, indicating that the final application of the scale is highly reliable. In the final application, the split-half technique was also used to re-evaluate reliability. The split-half values are shown in Table 8. According to the split-half results, Cronbach's alpha is .92 for the first part and .94 for the second part. These results from the final application also indicate that the scale is highly reliable.
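The correlation bands attributed to Cohen in this section can be expressed as a small helper. This is a sketch under two assumptions that are not part of the study: the sign of r is ignored by taking the absolute value, and coefficients below .10 are given the label "negligible", since no band is listed for them.

```python
def correlation_strength(r):
    """Classify a correlation coefficient using the bands quoted above."""
    size = abs(r)  # assumption: strength is judged on the absolute value
    if size > 1:
        raise ValueError("a correlation coefficient lies between -1 and +1")
    if size >= 0.50:
        return "high"
    if size >= 0.30:
        return "moderate"
    if size >= 0.10:
        return "low"
    return "negligible"  # assumed label; no band is given below .10


# The between-factor correlations are reported to range from .50 to .75.
lower_bound = correlation_strength(0.50)
upper_bound = correlation_strength(0.75)
print(lower_bound, upper_bound)
```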

Discussion and Conclusions
In the 21st century, with the fourth industrial revolution under way and knowledge continuing to grow, the value placed on human capital continues to increase. In other words, well-educated people are among the foremost needs of societies. Individuals can pass on what they have learned to those around them through various channels of communication, and language is the foremost of these channels. Societies that want to raise well-educated individuals seek to support mother tongue education with second language education at an early age. In Turkey, language education, which began in the Ottoman Empire period, continues today. Since the 2013-2014 school year, foreign language lessons have started in the second year of primary school and continue through higher education. Although foreign language education appears to be given importance in quantitative terms, Turkey is not at the desired level of foreign language proficiency and remains below international standards. Turkey's English Proficiency Index (EPI) score for 2017 was 47.79, which falls in the very low proficiency band. According to the 2017 EPI data, Turkey ranks 62nd out of 80 countries.
Student satisfaction has an important place in student achievement. Satisfaction can be explained as the meeting of the perceptions and expectations that form in individuals as a result of any goods or services offered. The aim of this study is to develop a "Foreign Language Lesson Satisfaction Scale" to determine how satisfied the approximately 7 million students in higher education institutions are with the compulsory foreign language lessons. In this study, the scale development process consists of three parts: pre-application, EFA and CFA.
A 5-point Likert-type format was chosen for the scale. Before the item pool was formed, the relevant literature was reviewed and the opinions of experts and students were obtained. Based on expert opinion, the item pool for measuring foreign language lesson satisfaction was prepared around three main themes: the foreign language lesson curriculum, the foreign language lesson instructor, and physical conditions. The 68 items were reduced to 38 in line with expert opinion. The items were distributed across the sub-themes as follows: Foreign Language Lesson Curriculum = 13 items, Foreign Language Lesson Instructor = 16 items, Physical (Hardware) Conditions = 9 items.
The pre-application of the scale was carried out at a vocational school of a state university with 150 randomly selected students studying in different departments. To determine the reliability of the pre-application, Cronbach's alpha values were calculated for both the subscales and the whole scale, and these values indicate that the pre-application is highly reliable. The split-half values for the pre-application also show that it is highly reliable. To determine whether the sample size was sufficient, the Kaiser-Meyer-Olkin (KMO) and Bartlett tests were applied to the pre-application data. The KMO and Bartlett test values indicate that the data are appropriate for EFA. Because the factors were related, the Direct Oblimin rotation technique was used in the EFA, and items with factor loadings below .30 were removed from the scale in consultation with experts. In addition, items loading on more than one factor were removed from the scale. The EFA performed after the pre-application grouped the items under three dimensions. The 10 items that make up the first factor consist of statements about the foreign language lesson curriculum and explain 42.78% of the variance. The 14 items that make up the second factor consist of statements about the teaching staff and explain 8.80% of the variance. The 4 items that make up the third factor consist of statements about physical (hardware) conditions and explain 8.39% of the variance. Together, the three factors explain 59.97% of the total variance. The scale factors are compatible with the results of the Monte Carlo parallelism test, according to which it was concluded that the scale should have a three-factor structure. The number of items on the scale for the final application was set at 28.
The Foreign Language Lesson Satisfaction Scale, consisting of 28 items and three factors, was applied to 280 students, ten times the number of items, and was tested with CFA. Because the factor loadings of the items are above 0.50, they were considered important items for the scale. The CFA produced the following results: χ²/df = 2.092, RMSEA = .063, SRMR = .045, CFI = .95, GFI = .91, AGFI = .92, NFI = .91, TLI = .92. These values indicate acceptable fit. In the CFA, modifications were needed to strengthen the fit indexes between items M5-M6 of the curriculum factor (SubD.1) and between items M14-M15, M19-M20 and M20-M21 of the instructor factor (SubD.2), and these were made. Taken together, the fit indexes and the CFA results show that the three-factor structure of the FLSS meets the reference values. To determine the criterion validity of the FLSS, the correlation coefficients (r) between the total score of the scale and the factors were evaluated separately. According to the correlation values, it can be concluded that the factors are compatible, and that there is a high correlation between the total score obtained from the scale and the factors. To determine the reliability of the final application, Cronbach's alpha values were calculated for both the subscales and the whole scale, and these values indicate that the final application is highly reliable. The split-half values for the final application likewise show that the final application of the scale is highly reliable.
In this study, the "Foreign Language Lesson Satisfaction Scale", whose reliability and validity have been tested, was developed to determine higher education students' satisfaction with the compulsory foreign language lesson. Similar studies can be conducted with the scale in different schools or departments. Its external criterion validity can be tested against similar measurement tools. Since this study was conducted at associate degree level, the validity of the scale can also be tested with undergraduate and graduate students.