Study design and sample collection
The initial study involved recruitment of infants born at <37 weeks completed gestation and with a birth weight <1850 g, and was conducted in five neonatal units in the UK between 1982 and 1985.11 The study design is outlined in figure 1. The trial protocol and subsequent follow-up evaluations have been published extensively.2 9–11 13–16 18 No registration is available as the original trial was conducted in the early 1980s, at a time when registration was not common. A balanced randomisation sequence was prepared for each centre, within strata defined by birth weight: <1200 g and 1200–1849 g. Treatment assignments were held in sealed numbered envelopes.11
Figure 1. Original cohort details and recruitment data. Subjects were enrolled into one of two parallel trials and subsequently randomised to one of three diets: preterm formula (PTF), term formula (TF) or banked breast milk (BBM). Urine samples were collected during a follow-up study at age 20.
Following parental consent, infants were randomised to one of three diets within 48 hours of birth: a) BBM, b) PTF (Osterprem, Farley’s Health Products, Kendal, UK) or c) TF (Ostermilk, Farley’s Health Products, Kendal, UK). Infants received this as their sole diet until their weight reached 2000 g or they were discharged home (median age 4 weeks). If a mother did not wish to provide her own breast milk (trial 1), infants (260 boys, 242 girls) were randomised to receive, as their sole diet, BBM or PTF (in Cambridge, Ipswich, King’s Lynn: study 1); or to receive TF or PTF (in Norwich, Sheffield: study 2). If the mother wished to provide her own breast milk (trial 2), infants (203 boys, 221 girls) received as much maternal breast milk (MBM) as was available each day, and the rest of the diet was made up by randomising to the same options as trial 1.
Of the 926 infants enrolled in the neonatal trials, 831 (89.7%) survived to discharge. Twenty years later, 272 young adults from this cohort were successfully contacted and invited to participate in a follow-up study investigating the effects of early diet on bone mass18 (70 declined and the remainder could not be traced or did not reply); 202 participants were included in this study, representing 22% of those originally enrolled and 24% of known survivors. Urine samples were requested from all participants and obtained from a total of 197 (mean (SD) age 20.25 (0.55) years). Urine samples were collected in standard NHS urine specimen pots without addition of antibacterial substances and stored immediately at −80°C.
Participants providing urine samples had been randomised for 1 month postnatally to BBM (n=55; 28 men), TF (n=48; 14 men) or PTF (n=94; 40 men). Participants in the follow-up study were more likely to be women and to have been classified in the highest social class category at birth. A greater proportion had been breast fed after discharge, although the median duration of any or exclusive breast feeding was not significantly different between follow-up participants and non-participants. There were no differences in indicators of neonatal disease severity or the proportion with a birth weight <1250 g or below the 10th centile. Participants had a significantly reduced height SD score (SDS) and significantly higher body mass index SDS18 in comparison with the UK population reference values.
High-resolution 1H-NMR spectroscopic analyses of urine samples
Untargeted global metabolic profiles were acquired from 197 urine samples using 1H-NMR spectroscopy. Samples were prepared and standard one-dimensional spectral data acquired using a Bruker DRX-600 spectrometer (Bruker Biospin, Karlsruhe, Germany) operating at 600.29 MHz, according to published protocols.19 Samples were run at a temperature of 300 K. Spectral preprocessing (phasing, baseline correction and referencing to the TSP (trimethylsilyl-2,2,3,3-tetradeuteropropionic acid) singlet peak at δ 0.00) was performed using the Bruker TopSpin V.3.1 programme. Spectral data were imported into Matlab (V.R2014a, The MathWorks, Natick, Massachusetts, USA), and the regions occupied by water and urea (δ 4.45–6.00) were removed. Data points were then aligned and normalised using the probabilistic quotient normalisation method20 to reduce the effects of differential dilution or concentration on the data analysis (using scripts coded in-house in Matlab).
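For readers unfamiliar with probabilistic quotient normalisation, the idea can be illustrated with a minimal sketch. This is not the in-house Matlab code used in the study: it is written in Python with NumPy, the function name `pqn_normalise` is hypothetical, and the preliminary total-area scaling and the choice of the median spectrum as reference are assumptions based on the commonly described form of the method.

```python
import numpy as np

def pqn_normalise(spectra, reference=None):
    """Illustrative probabilistic quotient normalisation.

    spectra: 2-D array, one row per sample, one column per spectral data point.
    reference: optional reference spectrum; the median spectrum is assumed if omitted.
    Returns the normalised spectra and the per-sample dilution factors.
    """
    spectra = np.asarray(spectra, dtype=float)

    # Step 1 (assumed preliminary step): scale each spectrum to unit total area.
    scaled = spectra / spectra.sum(axis=1, keepdims=True)

    # Step 2: build the reference spectrum as the median across samples.
    if reference is None:
        reference = np.median(scaled, axis=0)

    # Step 3: quotient of each data point against the reference,
    # using only points where the reference is positive.
    mask = reference > 0
    quotients = scaled[:, mask] / reference[mask]

    # Step 4: the median quotient estimates each sample's dilution factor.
    factors = np.median(quotients, axis=1, keepdims=True)

    # Step 5: divide each spectrum by its estimated dilution factor.
    return scaled / factors, factors.ravel()
```

Because the dilution factor is taken as the median quotient, a large change in a few peaks has little influence on the scaling of the remaining peaks, which is the property that makes the method attractive for urine, where overall concentration varies widely between samples.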
Data analyses
Spectroscopic data were analysed using both unsupervised and supervised multivariate statistical modelling methods (using proprietary Matlab scripts). Principal component analysis (PCA), an unsupervised approach, was conducted initially to discern the presence of inherent similarities or differences in urinary spectral profiles, and to identify outliers (based on the positioning of points on the PCA scores scatter plot). The model’s R2 statistic indicates how much of the variation within a data set can be explained by the model components, with PC1 (principal component 1) explaining the greatest source of variation, followed by PC2, then PC3, and so on. The Q2 statistic is an indicator of a model’s predictive ability, indicating how accurately the data, either classed or non-classed, can be predicted. The number of principal components used to build the models was based on optimal R2 and Q2 values (both of which vary between 0 and 1). PCA was followed by supervised modelling using orthogonal projections to latent structure discriminant analysis (O-PLS-DA), using the NMR spectroscopic data as X variables and diet as the Y variable describing class membership, to optimally model class differences corresponding to diet and to identify the metabolites contributing to this difference.21 Group comparisons were calculated for individuals on (1) high-calorie (PTF) versus low-calorie (TF and BBM combined) diets, (2) PTF vs BBM and (3) PTF vs TF, for both the whole dataset and for men and women separately. In addition to these discrete models, we also conducted continuous modelling to identify metabolites correlated with birth weight, gestational age and 2-week weight z score. Multivariate modelling was conducted both on the whole dataset and on sex-stratified data.
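The relationship between principal components and the R2 statistic described above can be illustrated with a short sketch. This is not the proprietary Matlab code used in the study: it is written in Python with NumPy, the function name `pca_r2` is hypothetical, and mean-centring without further scaling is an assumption, since the scaling applied is not stated here. Q2 is not computed, because it requires a cross-validation scheme that is not specified in the text.

```python
import numpy as np

def pca_r2(X, n_components):
    """Illustrative PCA via singular value decomposition.

    X: 2-D array, one row per sample, one column per spectral variable.
    Returns scores, loadings, R2 per component and cumulative R2.
    """
    X = np.asarray(X, dtype=float)

    # Mean-centre each variable (assumed preprocessing; no unit-variance scaling).
    Xc = X - X.mean(axis=0)

    # SVD returns singular values in descending order, so PC1 explains
    # the most variation, followed by PC2, PC3, and so on.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

    scores = U[:, :n_components] * s[:n_components]  # sample coordinates for the scores plot
    loadings = Vt[:n_components]                     # variable contributions to each component

    # R2 for a component is the fraction of the total sum of squares it explains.
    total_ss = (Xc ** 2).sum()
    r2_per_pc = s[:n_components] ** 2 / total_ss

    return scores, loadings, r2_per_pc, np.cumsum(r2_per_pc)
```

Plotting the first two columns of `scores` against each other gives the kind of scores scatter plot on which outliers would be inspected; the cumulative R2 approaches 1 as components are added, which is why R2 alone cannot determine the number of components and is considered alongside Q2.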