Collagen gene interactions and endurance running performance

SAJSM VOL. 26 NO. 1 2014
Background. Although variants within genes that encode protein components of several biological systems have been associated with athletic performance, few studies have investigated the collagen genes that encode the structural components of connective tissues.
Objective. To investigate the association of variants within collagen genes with endurance performance in South African (SA) Ironman triathletes.
Methods. A total of 661 white, male participants were recruited from four SA Ironman triathlon events for this genetic case-control association study. All participants were genotyped for COL3A1 rs1800255 (G/A) and COL12A1 rs970547 (A/G).
Results. No independent associations were identified between COL3A1 rs1800255 or COL12A1 rs970547 and overall finishing time or the time taken to complete any of the individual components (3.8 km swim, 180 km bike or 42.2 km run) of the 226 km event. The major G+A-inferred pseudo-haplotype, constructed from COL3A1 rs1800255 and COL12A1 rs970547, was, however, significantly (p=0.010 and p=0.027) over-represented in the fast run tertile (58.7%) compared with the middle (53.5%) and slow (49.5%) run tertiles, respectively. The major G+T+A-inferred pseudo-haplotype, constructed from COL3A1 rs1800255, COL5A1 rs12722 (T/C) and COL12A1 rs970547, was also significantly (p=0.022) over-represented in the fast run tertile (35.2%) compared with the slow run tertile (28.9%).
Conclusion. Our main novel finding was that the COL3A1 rs1800255 and COL12A1 rs970547 variants interacted to modulate endurance running performance in the four SA Ironman triathlons investigated. In addition, the interaction between these variants and COL5A1 rs12722 appeared to modulate endurance running performance.

[7] Furthermore, the COL5A1 TT genotype of single nucleotide polymorphism (SNP) rs12722 (T/C) and the COL6A1 TT genotype of SNP rs35796750 (T/C) have been associated with improved endurance running and endurance cycling performance, respectively, during the South African (SA) Ironman triathlon.[4] The association between the COL5A1 rs12722 TT and rs71746744 (-/AGGG) AGGG/AGGG genotypes and improved endurance running performance was later replicated in a road running event.[4] In addition, it has been proposed that both COL5A1 variants, located in a functional region of the COL5A1 3'-untranslated region (UTR), regulate type V collagen production.[8] Specifically, the rs12722 T and rs71746744 AGGG alleles of COL5A1 are associated with increased COL5A1 mRNA stability, which may lead to increased synthesis of the type V collagen α1 chain.[8] Increased type V collagen production may affect normal collagen fibrillogenesis and alter the mechanical properties of the tissue, leading to improved endurance performance.[9][12] The α1 chains of types III and XII collagen are encoded by the COL3A1 and COL12A1 genes, respectively. The non-synonymous COL3A1 rs1800255 (G/A) and COL12A1 rs970547 (A/G) variants within these genes have also been associated with a number of multifactorial soft tissue phenotypes.[4,13,14] Furthermore, both COL3A1 rs1800255 and COL12A1 rs970547 are proposed to be functional.[14,15] Specifically, the alanine-to-threonine substitution at position 698 of the α1(III) chain, resulting from COL3A1 rs1800255, could affect the tensile strength of type III collagen fibres.[14] In addition, functional bioinformatic analysis of COL12A1 rs970547 indicated that the resulting glycine-to-serine substitution is potentially damaging to the α1(XII) chain.[15]
Therefore, since types III and XII collagen, like types V and VI, are implicated in fibrillogenesis, it is plausible that common, potentially functional variants within the COL3A1 and COL12A1 genes are also associated with athletic endurance performance.

Objectives
The primary objective of our study was to determine whether COL3A1 rs1800255 and COL12A1 rs970547, like COL5A1 rs12722 and COL6A1 rs35796750, are associated with athletic endurance performance in the participants of four SA Ironman triathlon events. On the basis of the proposed functional effects of these variants, we hypothesised that the COL3A1 rs1800255 GG and COL12A1 rs970547 AA genotypes would be associated with improved endurance performance.
The secondary objective was to investigate whether gene-gene interactions between COL3A1 rs1800255 and COL12A1 rs970547, together with the previously associated collagen gene variants where appropriate, are associated with endurance performance. We hypothesised that the G+A pseudo-haplotype is associated with improved endurance performance, and that the COL5A1 rs12722 T and COL6A1 rs35796750 T alleles, when included in gene-gene interactions with COL3A1 rs1800255 and COL12A1 rs970547, contribute to the interactions for endurance running and cycling, respectively.

Methods
A total of 661 white, male participants were recruited from four SA Ironman triathlon events for this genetic case-control association study, using previously outlined recommendations.[16,17] Participants were recruited at registration for either the 2000 (n=96) or 2001 (n=294) events held in Gordon's Bay (~50 km from Cape Town), or the 2006 (n=219) or 2007 (n=52) events held in Port Elizabeth (PE) (~750 km east of Cape Town). All participants were required to complete the event for inclusion in the study. For participants who entered more than one event, data from only one race year were used, since their overall finishing times were similar across years (data not shown).
Race results were obtained from the race organisers, and participants were divided into three equal tertiles based on their finishing times for the 3.8 km swim, 180 km cycle, 42.2 km run and overall race. The fastest triathletes were placed in the fast tertile, those who finished mid-field in the middle tertile, and the slowest triathletes in the slow tertile.
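The tertile split can be sketched as a ranking followed by three consecutive slices. The paper does not state how ties or totals not divisible by three were handled, so this hypothetical `assign_tertiles` helper assumes ties are broken by participant ID and that any remainder is added to the faster groups first; the same function would be applied separately to the swim, cycle, run and overall times.

```python
from typing import Dict


def assign_tertiles(finish_times: Dict[str, float]) -> Dict[str, str]:
    """Label each participant 'fast', 'middle' or 'slow' by ranked finishing time.

    Assumptions (not stated in the paper): ties are broken by participant ID,
    and when the count is not divisible by three the extra participants are
    added to the faster tertiles first.
    """
    ranked = sorted(finish_times, key=lambda pid: (finish_times[pid], pid))
    base, extra = divmod(len(ranked), 3)
    sizes = [base + (1 if i < extra else 0) for i in range(3)]
    labels: Dict[str, str] = {}
    start = 0
    for label, size in zip(("fast", "middle", "slow"), sizes):
        for pid in ranked[start:start + size]:
            labels[pid] = label
        start += size
    return labels


# Hypothetical run-leg times (hours) for six made-up participants.
run_times = {"p1": 4.2, "p2": 5.1, "p3": 3.9, "p4": 6.0, "p5": 4.6, "p6": 5.5}
print(assign_tertiles(run_times))
```

With 661 participants the three groups cannot be exactly equal, which is why some tie-breaking and remainder rule has to be chosen.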
Study approval was granted by the Human Research Ethics Committee, Faculty of Health Sciences, University of Cape Town, and by the race organisers. All participants completed informed consent forms and a physical activity questionnaire. Participants of the PE subgroup also completed training history questionnaires; training history was not documented at the Gordon's Bay events. Since training data were obtained only at the PE events, the event priority for participants who had completed more than one event was 2006, followed by 2007 and finally 2001, which had a larger, more complete dataset than the 2000 event.
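The rule for keeping a single race year per repeat participant can be expressed as a priority lookup. This is a minimal sketch: `select_one_event` and the participant IDs are hypothetical, and it assumes each participant can be matched across race years; the priority order itself (2006, 2007, 2001, then 2000) follows the text above.

```python
from typing import Dict, Iterable, Tuple

# Race-year priority described in the Methods: 2006, then 2007, then 2001, then 2000.
EVENT_PRIORITY = (2006, 2007, 2001, 2000)


def select_one_event(entries: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Map each (hypothetical) participant ID to the single race year retained."""
    rank = {year: position for position, year in enumerate(EVENT_PRIORITY)}
    chosen: Dict[str, int] = {}
    for pid, year in entries:
        # Keep this year if it is the participant's first entry or outranks the current choice.
        if pid not in chosen or rank[year] < rank[chosen[pid]]:
            chosen[pid] = year
    return chosen


# Made-up (participant, race year) records for illustration.
entries = [("a", 2001), ("a", 2006), ("b", 2000), ("b", 2001), ("c", 2007)]
print(select_one_event(entries))
```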

Blood collection and DNA extraction
At event registration, ~4.5 ml of venous blood was collected from each participant into an ethylenediaminetetraacetic acid vacutainer tube by venipuncture of a forearm vein. Samples were stored at 4°C until DNA was extracted, as previously described, with minor modifications.[18] All analyses were performed at the UCT/MRC Research Unit for Exercise Science and Sports Medicine, University of Cape Town.

COL3A1 rs1800255 genotyping
Genotyping of COL3A1 rs1800255 was performed using a custom-designed, fluorescence-based TaqMan polymerase chain reaction (PCR) assay (Applied Biosystems, USA). Allele-specific probes and flanking primer sets (sequences available on request) were used along with a pre-made PCR master mix containing AmpliTaq Gold DNA polymerase (Applied Biosystems, USA) in a final reaction volume of 8 µl. The PCR cycling comprised a 10 min heat activation step at 95°C, followed by 40 cycles of 15 s at 92°C and 1 min at 60°C. The reactions were performed using an XP Thermal Cycler (Block model XP-G, BIOER Technology Co., Japan). Genotypes were determined by end-point fluorescence using a 7900HT Fast Real-Time PCR System and SDS software (version 2.3).

COL12A1 rs970547 genotyping
COL12A1 rs970547 was genotyped as previously described.[19] Briefly, fragments containing COL12A1 rs970547 were amplified by PCR. The PCR products were digested with AluI to produce 599 and 16 bp fragments for the G allele, and 460, 139 and 16 bp fragments for the A allele. The fragments were resolved alongside a 100 bp DNA ladder on a 6% non-denaturing polyacrylamide gel and visualised by SYBR Gold staining (Invitrogen Molecular Probes™, USA). The gels were photographed under ultraviolet light using a Uvitec photodocumentation system (Uvitec Limited, UK).
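The AluI scoring logic described above amounts to a lookup from the bands seen on the gel to a genotype. This sketch assumes the 599, 460 and 139 bp bands are the ones scored (the 16 bp fragment is shared by both alleles and is not informative); `call_rs970547` is a hypothetical helper, not part of any genotyping software.

```python
from typing import Iterable

# AluI fragment sizes (bp) reported in the Methods for each rs970547 allele.
G_ALLELE_FRAGMENTS = (599, 16)
A_ALLELE_FRAGMENTS = (460, 139, 16)

# Both digestion patterns should add up to the same undigested PCR product length.
assert sum(G_ALLELE_FRAGMENTS) == sum(A_ALLELE_FRAGMENTS) == 615


def call_rs970547(bands: Iterable[int]) -> str:
    """Infer the COL12A1 rs970547 genotype from band sizes scored on the gel."""
    observed = set(bands)
    has_g = 599 in observed  # the 599 bp band is specific to the G allele
    has_a = {460, 139} <= observed  # the additional AluI cut marks the A allele
    if has_g and has_a:
        return "AG"
    if has_g:
        return "GG"
    if has_a:
        return "AA"
    raise ValueError(f"band pattern {sorted(observed)} matches neither allele")


print(call_rs970547([599, 460, 139]))
```

The length check is a cheap sanity test that the reported fragment sizes are internally consistent (both alleles derive from one 615 bp amplicon).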

Statistics
Continuous variables were compared between genotype groups using one-way analysis of variance (ANOVA). Chi-squared or Fisher's exact tests were used to compare categorical variables. Basic descriptive statistics and frequencies were determined using Statistica (version 11) and GraphPad InStat (version 6). Inferred pseudo-haplotypes between gene variants were tested using Hapstat (version 3.0). Hardy-Weinberg equilibrium was assessed using Genepop (version 4.0.10; http://genepop.curtin.edu.au). Statistical significance was set at p<0.05.
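As an illustration of what a Hardy-Weinberg equilibrium check tests, the sketch below implements the classical chi-squared goodness-of-fit test for a biallelic SNP (1 degree of freedom: three genotype classes, minus one, minus one estimated allele frequency). This is a swapped-in textbook test, not Genepop's procedure: Genepop reports exact tests, so its p-values will differ, particularly when some genotypes are rare. The function name `hwe_chi_square` is hypothetical.

```python
import math
from typing import Tuple


def hwe_chi_square(n_hom1: int, n_het: int, n_hom2: int) -> Tuple[float, float]:
    """Chi-squared goodness-of-fit test for Hardy-Weinberg equilibrium (1 df).

    Returns (chi-squared statistic, p-value) for the three genotype counts of a
    biallelic SNP.
    """
    n = n_hom1 + n_het + n_hom2
    p = (2 * n_hom1 + n_het) / (2 * n)  # frequency of allele 1
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("SNP is monomorphic in this sample; test is undefined")
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_hom1, n_het, n_hom2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-squared variable with 1 df equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

A p-value below the chosen threshold would suggest a departure from Hardy-Weinberg proportions (for example, genotyping error or population structure).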

Participant training history
Table 1 summarises the self-reported training history data for the 15 weeks prior to each event, collected at the 2006 and 2007 PE SA Ironman triathlon events. Although probably not biologically relevant, the COL3A1 rs1800255 variant was significantly (p=0.002) associated with swim training duration (h/week). Participants with the COL3A1 rs1800255 GA genotype (3.4±1.6 h/week) trained significantly (p=0.001) more than participants with the GG (2.8±1.0 h/week) or AA (2.3±0.9 h/week) genotype. The distance (km/week) and duration (h/week) trained for the cycle, run and combined (swim, cycle and run) components were not significantly associated with COL3A1 rs1800255 (Table 1). Furthermore, no significant associations were identified between COL12A1 rs970547 and the distance or duration trained for the swim, cycle, run or combined components (Table 1).
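The genotype comparison of weekly training duration is a one-way ANOVA. A minimal pure-Python sketch of the F statistic follows; `one_way_anova` is a hypothetical helper, the inputs would be per-participant values grouped by genotype (no raw data are reproduced here), and the paper does not state which post hoc test produced the pairwise p-value. The p-value for F comes from the F distribution with (df_between, df_within) degrees of freedom, for example via `scipy.stats.f.sf(F, df_between, df_within)`.

```python
from statistics import mean
from typing import Dict, List, Tuple


def one_way_anova(groups: Dict[str, List[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA across groups."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

With the three rs1800255 genotype groups (GG, GA, AA), df_between would be 2.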

COL3A1 rs1800255 and COL12A1 rs970547 and performance
The COL3A1 rs1800255 and COL12A1 rs970547 variants were not significantly associated with overall finishing time or the time taken to complete any of the individual components of the triathlon (Table 2). Furthermore, when participants were grouped into performance tertiles, no significant differences in COL3A1 rs1800255 or COL12A1 rs970547 genotype distributions were identified between the tertiles for overall finishing time or for the time taken to complete any of the individual components of the triathlon (Table 3).
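Comparing genotype distributions across performance tertiles amounts to a chi-squared test of independence on a genotype × tertile contingency table (3 × 3, hence 4 degrees of freedom). The sketch below computes the Pearson statistic without continuity correction and uses the exact chi-squared upper tail, which has a closed form for 1 or an even number of degrees of freedom. It is illustrative only: `chi_square_independence` is a hypothetical helper, and it does not implement Fisher's exact test, which the authors also used.

```python
import math
from typing import List, Tuple


def chi_square_independence(table: List[List[int]]) -> Tuple[float, int, float]:
    """Pearson chi-squared test of independence for an r x c table of counts.

    Returns (statistic, degrees of freedom, p-value). The p-value is computed
    only for df == 1 or even df, where the upper tail has a closed form.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    if df == 1:
        p_value = math.erfc(math.sqrt(chi2 / 2))
    elif df % 2 == 0:
        # For df = 2m: P(X > x) = exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!
        half = chi2 / 2
        p_value = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
    else:
        raise NotImplementedError("odd df > 1 needs the incomplete gamma function")
    return chi2, df, p_value
```

Rows would hold genotype counts (GG, GA, AA) and columns the fast, middle and slow tertiles.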

Gene-gene interactions and performance
Since there were no independent associations of the COL3A1 and COL12A1 variants with endurance performance, inferred pseudo-haplotypes were constructed from COL3A1 rs1800255 G/A and COL12A1 rs970547 A/G. All four inferred pseudo-haplotypes were identified for the overall finishing time, as well as for the time taken to complete each individual component of the triathlon. For the overall tertiles, the major G+A-inferred pseudo-haplotype was significantly (p=0.007 and p=0.029) over-represented in the fast tertile (58%; n=149) compared with the middle (55%; n=140) and slow (50%; n=127) tertiles, respectively (Fig. 1d). When the individual components of the triathlon were analysed, the major G+A-inferred pseudo-haplotype was significantly (p=0.010 and p=0.027) over-represented in the fast run tertile (58.7%; n=144) compared with the middle (54%; n=131) and slow (50%; n=114) run tertiles, respectively (Fig. 1c). No significant associations were identified between the inferred pseudo-haplotypes and the swim (Fig. 1a) or cycling (Fig. 1b) components of the triathlon. Since this association was identified for the run component, and COL5A1 rs12722 was previously associated with the run component in this cohort,[4] inferred pseudo-haplotypes were also constructed from COL3A1 rs1800255 G/A, COL5A1 rs12722 T/C and COL12A1 rs970547 A/G (Fig. 2). All eight inferred pseudo-haplotypes were identified. The major G+T+A-inferred pseudo-haplotype was again significantly (p=0.022) over-represented in the fast run tertile (35%; n=86) compared with the slow run tertile (29%; n=67) (Fig. 2).
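Because genotypes are unphased, a participant heterozygous at both loci could carry either G+A/A+G or G+G/A+A, which is why these pseudo-haplotypes are described as inferred. The sketch below shows one standard way to estimate two-locus haplotype frequencies from unphased genotypes, the expectation-maximisation (EM) algorithm. It is a swapped-in illustration, not the likelihood-based method implemented in Hapstat, and the helper name `em_haplotype_frequencies` and its input encoding are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple

# A haplotype is (COL3A1 rs1800255 allele, COL12A1 rs970547 allele), e.g. ("G", "A").
LOCUS1_ALLELES = ("G", "A")  # COL3A1 rs1800255
LOCUS2_ALLELES = ("A", "G")  # COL12A1 rs970547
Haplotype = Tuple[str, str]
Genotype = Tuple[Tuple[str, str], Tuple[str, str]]  # unphased allele pairs at locus 1, locus 2


def em_haplotype_frequencies(
    genotypes: List[Genotype], max_iter: int = 500, tol: float = 1e-10
) -> Dict[Haplotype, float]:
    """Estimate two-locus haplotype frequencies from unphased genotypes by EM."""
    haplotypes = list(product(LOCUS1_ALLELES, LOCUS2_ALLELES))
    freqs = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform starting values
    n_chromosomes = 2 * len(genotypes)
    for _ in range(max_iter):
        expected_counts: Counter = Counter()
        for (a1, a2), (b1, b2) in genotypes:
            # E-step: the two ways this genotype can be split into a haplotype pair.
            # They coincide unless the participant is heterozygous at both loci.
            phasings = [((a1, b1), (a2, b2)), ((a1, b2), (a2, b1))]
            weights = [freqs[h1] * freqs[h2] for h1, h2 in phasings]
            total = sum(weights)
            for (h1, h2), w in zip(phasings, weights):
                expected_counts[h1] += w / total
                expected_counts[h2] += w / total
        # M-step: new frequencies are expected haplotype counts per chromosome.
        new_freqs = {h: expected_counts[h] / n_chromosomes for h in haplotypes}
        converged = max(abs(new_freqs[h] - freqs[h]) for h in haplotypes) < tol
        freqs = new_freqs
        if converged:
            break
    return freqs
```

Running this separately within each tertile would give haplotype frequencies that can then be converted to counts and compared between tertiles.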
Furthermore, when the cycling component of the triathlon was investigated with inferred pseudo-haplotypes constructed from COL3A1 rs1800255, COL6A1 rs35796750 and COL12A1 rs970547, no significant associations were identified (Fig. 3).

Discussion
The main novel finding of this study was that the COL3A1 rs1800255 (G/A) and COL12A1 rs970547 (A/G) variants interacted to modulate endurance running performance in the four SA Ironman triathlon events. No significant independent associations were identified between these gene variants and the time taken to complete the overall race, or the 3.8 km swim, 180 km cycle or 42.2 km run components. Previously, we showed the association of COL5A1 rs12722 (T/C) and COL6A1 rs35796750 (T/C) with endurance running and endurance cycling performance, respectively, in the SA Ironman triathlon.[4] Furthermore, variants within the COL5A1 3'-UTR, including rs12722, are proposed to alter the expression of type V collagen, thereby modulating normal fibrillogenesis and altering collagen fibril architecture, structure and mechanical properties.[9] Similarly, COL6A1 rs35796750 is proposed to cause aberrant splicing of COL6A1 mRNA, which may also affect the role of type VI collagen in normal fibrillogenesis.[20] Types III and XII collagen are also implicated in fibrillogenesis.[10,12] Therefore, we proposed that common variants within the COL3A1 and COL12A1 genes, namely rs1800255 and rs970547, could be associated with endurance performance in the SA Ironman triathlons, in a manner similar to that proposed for COL5A1 rs12722 and COL6A1 rs35796750. Despite this rationale, no independent associations were identified between COL3A1 rs1800255 or COL12A1 rs970547 and endurance swimming, cycling, running or overall performance in the triathlons. However, when inferred pseudo-haplotypes were constructed from COL3A1 rs1800255 and COL12A1 rs970547, significant gene-gene interactions were identified. Specifically, the major G+A pseudo-haplotype was significantly over-represented in the fast tertile, compared with the middle and slow tertiles, for overall finishing time as well as for the running component of the triathlon.
Furthermore, since the COL5A1 rs12722 variant was previously associated with endurance running,[4] additional gene-gene interactions between COL3A1 rs1800255, COL5A1 rs12722 and COL12A1 rs970547 were investigated. Again, the major G+T+A pseudo-haplotype was significantly over-represented in the fast tertile, compared with the slow tertile, for the running component of the triathlon only. This implicates COL3A1 and COL12A1, as well as their interaction with COL5A1, as potential markers of endurance running performance. Additional studies should investigate these genes in stand-alone endurance running events, such as marathons, to confirm these findings. Furthermore, since no independent single-variant associations were identified for COL3A1 rs1800255 or COL12A1 rs970547, these findings highlight the importance of investigating gene-gene interactions in multigenic complex traits such as endurance performance.
Finally, no significant associations were identified between the cycling component of the triathlon and inferred pseudo-haplotypes constructed from COL3A1 rs1800255, COL6A1 rs35796750 and COL12A1 rs970547.

Study limitations
Study limitations include the lack of training data for the 2000 and 2001 Gordon's Bay events, as well as the lack of data on other important

Conclusion
Our main novel finding was that the COL3A1 rs1800255 and COL12A1 rs970547 variants interacted to modulate endurance running performance in the four SA Ironman triathlons investigated. Furthermore, these variants also interacted with COL5A1 rs12722 to modulate endurance running performance. This implicates COL3A1 and COL12A1 as potential markers for endurance running performance.

Fig. 1. Frequency distributions of inferred pseudo-haplotypes constructed from COL3A1 rs1800255 and COL12A1 rs970547 across the fast, middle and slow tertiles in terms of: (a) time taken to complete the swim component of the triathlon; (b) time taken to complete the cycling component of the triathlon; (c) time taken to complete the run component of the triathlon; and (d) overall time taken to complete the triathlon. The number of participants is indicated above each column. (* Fast v. slow tertile; † Fast v. middle tertile.)

Funding acknowledgements.
This research was supported in part by the National Research Foundation (NRF), the Medical Research Council of South Africa and the University of Cape Town. MP was supported by the Thembakazi Trust.

Fig. 2. Frequency distributions of inferred pseudo-haplotypes constructed from COL3A1 rs1800255, COL5A1 rs12722 and COL12A1 rs970547 across the fast, middle and slow tertiles for the time taken to complete the run component of the triathlon. The number of participants is indicated above each column. (* Fast v. slow tertile.)

Table 1. Self-reported training history for the COL3A1 rs1800255 and COL12A1 rs970547 genotypes of the PE subgroup
* Statistically significant (p<0.05). † Combined = swim, cycle and run.