Absolute and relative reliability of SCRuM test battery components assembled for schoolboy rugby players playing competitive rugby in low-resource settings: A pragmatic in-season test-retest approach

Background Schoolboy rugby is a popular sport that forms the bedrock of rugby development in many African countries, including Zimbabwe. With burgeoning talent identification programmes, the development of multi-dimensional, logically validated and reliable test batteries is essential to inform the objective selection of potentially talented young rugby athletes. Objectives This study sought evidence on the absolute and relative test-retest reliability of the component test items in the newly assembled SCRuM test battery. Methods Using a pragmatic test-retest design, 41 Under-19 (U19) schoolboy players playing competitive rugby in the elite Super Eight Schools Rugby League in Harare, Zimbabwe, participated in the study. Results Physiological and game-specific skills tests that showed good to excellent relative reliability and acceptable absolute reliability included: 20 m and 40 m speed, L-run, Vertical Jump (VJ), 60 s Push-Up, 2 kg Medicine Ball Chest Throw test (2 kg MBCT), Wall Sit Leg Strength test (WSLS), Repeated High Intensity Exercise test (RHIE), One Repetition Maximum Back Squat (1-RM BS) and Bench Press tests (1-RM BP), Yo-Yo Intermittent Recovery Level 1 test (Yo-Yo IRT L1), Tackling Proficiency test, Passing Ability Skill test and Running and Catching Ability skill test. Conclusion All these tests are reliable and warrant inclusion in the SCRuM test battery for possible profiling of U19 schoolboy rugby players during the 'in-season' phase, provided there is adequate participant familiarisation and test standardisation. The test-retest intraclass correlation coefficients (ICCs) and measurement errors are generalisable to other young athletes in this population, making the tests useful for evaluating training and developmental effects on the measured constructs.

Increased competition demands at the elite senior level worldwide have prompted professional rugby union (RU) clubs and national RU governing bodies to invest in talent identification (TID) and long-term junior development programmes. [1][2][3] These efforts have produced a pool of young rugby players with the potential to become successful future elite athletes, strengthening the growth and development of the sport. The TID process depends on screening tests that measure the important characteristics of rugby players, and these tests should be practically feasible and have acceptable psychometric properties. However, the many test batteries for profiling young rugby players that are available in the literature are heterogeneous in composition [4], and the measurement properties of their constituent tests are often unclearly reported.
Regardless of age and playing standard, RU is a physically and technically challenging sport requiring commensurate physiological adaptations and specialised training of rugby-specific skills for optimal performance. [1] A combination of appropriate anthropometric qualities, physiological characteristics and rugby-specific skills defines the key attributes required of players for effective performance. Test batteries that are logically validated against the needs of young rugby players, and that contain practically feasible and reliable tests, are more likely to be relevant for use in TID programmes. In addition, coaches, strength and conditioning experts and sports scientists can use them for longitudinal monitoring of athletic motor skills, technical performance, and responses to injury rehabilitation. Cross-sectionally, such test batteries can provide data on players' competency levels to assist in team selection, and can profile an individual athlete's anthropometric, physiological and game-specific characteristics. Therefore, following the development of the first version of the SCRuM (School Clinical Rugby Measure) test battery and the subsequent evaluation of the face validity, logical validity and practical feasibility of its component test items [4][5][6], the specific objective of this study was to identify test items in the SCRuM test battery with an acceptable coefficient of variation (CV) and a high intraclass correlation coefficient (ICC), as measures of absolute and relative reliability respectively, among a sample of young rugby players.
playing government (n=2) and private (n=6) high schools across the country and is generally considered the "elite" rugby-playing league. All the SESRL schools have a strong local reputation and a long-standing culture of playing competitive rugby. [4] Each year, the SESRL produces U19 rugby players capable of joining adult professional clubs.

Data collection approach
A pragmatic "in-season" approach, previously used by Enright et al. [11], was adopted for the study. Specifically, the study sought to determine the reliability of SCRuM tests when test and retest assessments are scheduled on training days without disrupting classes, training schedules or competitive match days. Given the large number of tests in the SCRuM test battery and the repeated measures, this approach was more likely to gain approval from the coaching staff, parents and school authorities. The design required all participants to perform the SCRuM test items on two separate occasions, at the same time of day and on the same day of the week.
Two familiarisation sessions were conducted to ensure sufficient exposure of the study participants to the SCRuM test items. At the second session, eligible participants completed a brief questionnaire that solicited demographic and rugby-related information. Participant testing took place during the competitive season, so that participants had match fitness and were close to peak performance. On each testing day, participants completed the Physical Activity Readiness Questionnaire (PAR-Q) and were excluded if they reported injuries, illness, or health-related conditions aggravated by exertion. Eligible participants then completed a standardised warm-up procedure before testing. The order of testing is shown in Supplementary File 1. A recovery period of 10 minutes was allowed between tests to minimise fatigue-induced effects. The retest assessments were conducted seven days later, at the same time of day for each participant.
Two well-trained research assistants conducted all the SCRuM tests except the skinfold measurements and the game-specific skills tests, which were conducted by a purposively recruited anthropometrist and by rugby coaches, respectively. Each assistant always assessed the same athlete. Field tests were conducted on a natural grass pitch and strength/power-based tests in a gymnasium, and participants were asked to wear the same clothing on each occasion. The researchers gave similar verbal encouragement to all participants during each test. Test results were deliberately withheld from the participating athletes to avoid influencing retest performance. Additionally, participants were not told about the seven-day interval before the retest assessments, and were advised to maintain a normal diet and adequate hydration and to avoid taking ergogenic aids during the experimental period.

The SCRuM test battery
The SCRuM test battery comprised (i) anthropometric variables (height, sitting height, body mass and seven skinfold measurements: biceps, triceps, subscapular, suprailiac, abdomen, thigh and calf), (ii) physiological characteristics (speed, agility, upper- and lower-body muscular strength and power, prolonged high-intensity intermittent running ability, muscle flexibility and repeated high-intensity exercise performance ability) and (iii) rugby-specific game skills (tackling proficiency, passing ability, passing-for-accuracy, and running-and-catching ability). A full description of the SCRuM test battery and the methodological procedures followed for test execution is provided in Supplementary File 2. Briefly, linear speed was measured using the 5 m, 10 m, 20 m and 40 m speed tests. Agility was assessed using the L-run agility test. Upper- and lower-body muscular strength were assessed using the One Repetition Maximum Bench Press test and the One Repetition Maximum Back Squat test, respectively. Two further tests, the Wall Sit Leg Strength test and the 60 s Push-Up test, were also included to assess lower- and upper-body muscular strength, respectively. Upper- and lower-body muscular power were assessed using the 2 kg Medicine Ball Chest Throw and Vertical Jump tests, respectively. Prolonged high-intensity intermittent running ability was evaluated using the Yo-Yo Intermittent Recovery Level 1 test. Lower back and hamstring muscle flexibility were assessed using the sit-and-reach test. The repeated high-intensity exercise performance ability of the participants was evaluated using the Repeated High Intensity Exercise (RHIE) test.
The development of the SCRuM test battery was based on recommendations from the literature on instrument and test battery development. Briefly, the development process followed a multi-phase approach that involved conducting:
i. a narrative literature review to establish what is known about the key requirements of rugby, specifically targeting anthropometric, physical or physiological characteristics and rugby-specific game skills;
ii. a qualitative exploratory study to gather rugby coaches' perceptions of the key attributes, qualities and game skills that are needed in rugby and should be incorporated in test batteries for TID programmes. This phase also identified the tests commonly used for these attributes and skills in the local context;
iii. a systematic literature review to determine the physical or physiological characteristics and rugby-specific game skills frequently covered in the literature and their corresponding tests, including an evaluation of the psychometric properties of each identified test per construct.
The above-mentioned processes produced the first version of the test battery, which was subsequently evaluated for face validity, logical validity and practical feasibility. Therefore, the present study aimed to evaluate the reliability of the content-validated and practically feasible version of the SCRuM test battery. [10] To test for absolute reliability, the standard error of measurement (SEM) was calculated for each test. The SEM represents the expected trial-to-trial measurement error and was computed as the standard deviation of the differences (SDdifferences) between test and retest assessments divided by √2. [11] To facilitate comparison of reliability values between studies, the coefficient of variation (CV%) expressed the SEM as a percentage of the grand mean [12], and an arbitrary CV boundary of <10% was considered acceptable [12]. The smallest detectable change (SDC95%) for each test was calculated as SEM × 1.96 × √2. [13]

Results
Table 1 shows the demographics and rugby-related information of all participants. The mean age of the participants was 17.5 ± 0.9 years. The median experience of playing schoolboy rugby was five years (interquartile range, IQR: four to five years). Forwards (49%) and backline players (51%) were almost equally represented in the sample.
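The absolute-reliability indices defined in the statistical analysis (SEM, CV% and SDC95%) can be computed from paired test and retest scores as in the minimal Python sketch below. It assumes one score per participant per session, uses the sample standard deviation of the retest-minus-test differences, and takes the grand mean over all test and retest scores. The function name `absolute_reliability` and the sprint times are hypothetical illustrations, not the authors' code or data.

```python
import math
from statistics import fmean, stdev


def absolute_reliability(test, retest, cv_limit=10.0):
    """Hypothetical helper: SEM, CV% and SDC95% from paired test-retest scores."""
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need at least two paired scores")
    # Retest minus test for each participant (sign convention is an assumption).
    diffs = [r - t for t, r in zip(test, retest)]
    # SEM = SD of the differences divided by sqrt(2).
    sem = stdev(diffs) / math.sqrt(2)
    # Grand mean over every test and retest score.
    grand_mean = fmean([*test, *retest])
    # CV% expresses the SEM as a percentage of the grand mean.
    cv_percent = 100.0 * sem / grand_mean
    # SDC95% = SEM x 1.96 x sqrt(2), i.e. 1.96 x SD of the differences.
    sdc95 = sem * 1.96 * math.sqrt(2)
    return {
        "sem": sem,
        "cv_percent": cv_percent,
        "sdc95": sdc95,
        "cv_acceptable": cv_percent < cv_limit,  # arbitrary <10% boundary
    }


# Illustrative (made-up) 40 m sprint times in seconds for five players.
result = absolute_reliability([5.61, 5.48, 5.90, 5.72, 5.55],
                              [5.58, 5.52, 5.85, 5.75, 5.50])
print(result)
```

Because the SDC95% is in the test's own units, a change between two sessions larger than this value would be the threshold for treating an individual's change as real rather than measurement noise.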

Discussion
The purpose of the present study was to provide contextual evidence on the test-retest reliability of each component test item in the newly assembled SCRuM test battery. Establishing reliability is an important step in test battery development, as it shows whether test items can differentiate participants, maintaining the same relative order of participants in replicate measures under similar conditions. [14] The ICC is the most commonly reported statistic providing evidence of relative reliability in the literature. It increases with greater between-subject variability in the measured construct and decreases with greater measurement error. Among 41 U19 schoolboy players, most SCRuM test items showed no systematic bias, low CV% values and high ICCs, indicating acceptable absolute and relative reliability when the assessments are made during the 'in-season' phase. These results likely reflect the careful implementation of the SCRuM test items as well as temporal stability of the constructs over the measured interval. Overall, the high ICCs could be attributed to the large between-subject variability observed for most test performances, which may stem from natural differences in participant ability, heterogeneity of playing positions, or varied rugby experience.
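The ICC model used by the authors is not stated in this section, so the sketch below computes, as an assumption, two common two-way forms from ANOVA mean squares for n participants × k sessions: ICC(2,1) (absolute agreement) and ICC(3,1) (consistency). It illustrates the point above that the ICC rises with between-subject variability (the participants mean square) and falls with measurement error (the residual mean square). The `classify_icc` cut-offs are widely cited conventions assumed here, not thresholds taken from the authors, and all names and data are hypothetical.

```python
from statistics import fmean


def icc_two_way(scores):
    """Return (ICC(2,1), ICC(3,1)) for an n-participant x k-session table.

    Hypothetical helper; assumes a complete table with no missing values.
    """
    n, k = len(scores), len(scores[0])
    grand = fmean(v for row in scores for v in row)
    row_means = [fmean(row) for row in scores]  # one mean per participant
    col_means = [fmean(row[j] for row in scores) for j in range(k)]  # per session

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between sessions
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_err = ss_total - ss_rows - ss_cols  # residual (measurement) error

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    # Consistency form: ignores systematic session-to-session shifts.
    icc_3_1 = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    # Absolute-agreement form: also penalises systematic session effects.
    icc_2_1 = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
    return icc_2_1, icc_3_1


def classify_icc(icc):
    """Assumed cut-offs: <0.50 poor, 0.50-0.75 moderate, 0.75-0.90 good, >0.90 excellent."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"


# Made-up vertical jump heights (cm): one row per player, columns = test, retest.
vj = [[48.0, 49.5], [55.0, 54.0], [61.5, 62.0], [44.0, 45.5], [58.0, 57.5]]
icc2, icc3 = icc_two_way(vj)
print(round(icc2, 3), round(icc3, 3), classify_icc(icc2))
```

The two forms differ only when there is a systematic shift between sessions: a uniform learning effect leaves ICC(3,1) unchanged but lowers ICC(2,1).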
As expected, good to excellent ICCs were shown for all anthropometric variables. In addition, 12 of the 14 physiological tests administered to U19 schoolboy rugby players showed good to excellent relative reliability. The tests included the following: 20 m and 40 m speed, modified L-run agility, VJ, SR, 60 s push-ups, 2 kg MBCT, WSLS, RHIE, 1 RM BS, 1 RM BP, [15]. Dobbin et al. [11] reported an ICC (CV%) of 0.69 (4.9%) for the 10 m speed test among 50 U19 academy rugby league players. However, besides differences in sample size and sport, there were methodological differences between the Dobbin et al. [11] study and ours (timing gates vs an electronic handheld stopwatch; three vs two repeated measures). In contrast, Gabbett et al. [16] reported high ICCs (CV%) for the 5 m and 10 m speed tests of 0.84 (3.2%) and 0.87 (1.9%), respectively, among 42 adult rugby league players. This suggests that methodological, sport and population differences partly explain differences in ICC results between studies. Reliability parameters depend on the variability of the measured construct in the sampled population, and the results therefore generalise only to populations with similar variability. [13]
Another key but unexpected finding was the low relative reliability of the 7 m passing-for-accuracy test. This may be explained by the lower between-participant variability, evidenced by the smaller standard deviations of test and retest scores. No previous study has reported ICC values for the 7 m passing-for-accuracy test in U19 schoolboy rugby players. Pienaar et al. [17] reported a test-retest correlation (r = 0.66) and 95% Limits of Agreement (LoA), suggesting moderate reliability among thirty-six 10-year-old schoolboys with varied rugby experience. However, the use of r has been criticised in the contemporary literature because it evaluates only the linear association between repeated scores and is insensitive to systematic differences between trials. [18] Instead, the ICC is frequently reported for relative reliability. [19] The low reliability of the 7 m passing-for-accuracy test in the present study could also be linked to test novelty. Unlike previous tests, in which stationary participants passed a ball to a static target placed 7 m away and the accuracy of hitting the target was judged [17,20], the present study had a moving recipient catch a pass from a running player. The test also uniquely included a research assistant offering standardised defensive play against the tested player. This was designed to test passing-for-accuracy as an open skill simulating real game situations. However, given the low reliability, it is possible that the test was relatively easy for U19 rugby players, producing a ceiling effect that limited between-player differences and hence the test's ability to discriminate between players. To minimise measurement error, critical test elements, such as the running velocity of the tested and target players and their positioning in the passing grid zone when executing the pass, may need careful consideration in future modifications of the test.
All SCRuM tests except the sit-and-reach test showed acceptable variability (CV < 10%) among U19 rugby players, indicating good agreement between test and retest scores. The sit-and-reach test showed the greatest variability (CV = 17.3%), and a paired-samples t-test showed a difference between test and retest assessments that approached statistical significance (p = 0.05). Thus, it is possible that the sit-and-reach test lacked standardisation, resulting in the observed difference in mean scores between test and retest assessments, or that more careful standardisation of the warm-up is required before the test is undertaken. With a mean difference between trials of -0.63, a learning effect could have influenced the test-retest results for the sit-and-reach test. Future studies may therefore need an extra familiarisation trial for the sit-and-reach test, or more than two repeated measures.
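The systematic-bias check described above (the mean difference between trials and a paired-samples t-test), together with the 95% limits of agreement mentioned for Pienaar et al., can be sketched as below. This is a minimal standard-library illustration under stated assumptions: differences are taken as retest minus test, and the function name `paired_bias` and the sit-and-reach scores are hypothetical. A p-value requires the t distribution; `scipy.stats.ttest_rel(retest, test)` returns both the statistic and the p-value, whereas this sketch stops at the t statistic and its degrees of freedom.

```python
import math
from statistics import fmean, stdev


def paired_bias(test, retest):
    """Hypothetical helper: systematic bias between two trials.

    Returns the mean difference (retest - test, an assumed sign convention),
    the paired t statistic with its degrees of freedom, and the 95% limits
    of agreement (mean difference +/- 1.96 SD of the differences).
    """
    n = len(test)
    diffs = [r - t for t, r in zip(test, retest)]
    mean_diff = fmean(diffs)
    sd_diff = stdev(diffs)
    # Paired t statistic: mean difference over its standard error.
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    loa = (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff)
    return {"mean_diff": mean_diff, "t": t_stat, "df": n - 1, "loa95": loa}


# Made-up sit-and-reach scores (cm) for six players, test then retest.
print(paired_bias([30.0, 25.5, 34.0, 28.0, 31.5, 27.0],
                  [31.0, 26.0, 34.5, 29.5, 31.0, 28.0]))
```

A consistently non-zero mean difference with limits of agreement shifted to one side of zero would be the pattern expected from the learning effect discussed for the sit-and-reach test.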

Critical assessment of the study
The study used a larger sample than is commonly reported in similar studies of the reliability of anthropometric and performance tests in rugby. The response and test completion rates were high, minimising the effects of non-participation bias and missing test data. However, the study had some limitations:
i. We chose a pragmatic approach involving one age category of participants purposively selected from one school, and conducted the study during the competitive rugby season. Residual fatigue from training, especially on the previous day, and from competitive matches could have affected participants' performance.
ii. During the test-retest study, no attempts were made to standardise the timing, type or quantity of food and fluid intake.

Conclusion
Among U19 schoolboy rugby players involved in competitive rugby, good to excellent intraclass correlation coefficients were shown for all anthropometric variables. The SCRuM physiological and game skills tests that showed good to excellent relative reliability and acceptable absolute reliability were: 20 m speed, 40 m speed, L-run, Vertical Jump, 60 s Push-Up, 2 kg Medicine Ball Chest Throw, Wall Sit Leg Strength, Repeated High Intensity Exercise, One Repetition Maximum Back Squat, One Repetition Maximum Bench Press, Yo-Yo Intermittent Recovery Level 1, Tackling Proficiency, Passing Ability and Running-and-Catching Ability. All these tests warrant inclusion in the SCRuM test battery for possible profiling of U19 schoolboy "elite" rugby players during the 'in-season' competitive phase, provided there is adequate participant familiarisation and test standardisation.

Conflict of interest and source of funding:
The authors declare no conflict of interest and no source of funding.
Availability of data and material: All relevant data are within the paper and its supporting information and supplementary files.

Acknowledgements:
The authors would like to acknowledge all the male high school adolescent rugby players who participated in this study. The lead author thanks the research assistants who collected part or all of the data for this project. We also thank the Zimbabwean Ministry of Primary and Secondary Education, and the headmasters, school sports directors and rugby coaches who gave permission to access the schools. Further, we would like to extend our gratitude to the parents and guardians who gave informed consent for their children to participate in the study. The authors also thank the expert rugby coaches who rated the participants on game-specific skills, the anthropometrist who performed the skinfold measurements, the former U19 adolescent rugby players used as 'dummy' players for the assessment of game-specific skills, and the content experts who validated the data collection instruments.
Author contributions: MC, BS-E and GF contributed to the conception and design of the study, data analysis and interpretation. MC also conducted the literature review and recruited and trained the research assistants and participants, with assistance from the people named in the Acknowledgements. MC supervised the data collection and did the analysis. MC drafted the manuscript for publication and acted as the corresponding author. BS-E, CT, JMD and NSM critically revised the manuscript and provided extensive revisions prior to submission to the journal for review. All the authors read and approved the final version of the manuscript.