Array images can be extracted and analysed with any image quantification software, including analysis software supplied by scanner manufacturers, third-party software or open-source programs.
A GenePix Array List (GAL) file is generated for each array to aid image analysis. Please note that GAL files are grid files specific to the GenePix software and may not be compatible with other software. The GAL file allows grids to be placed automatically on the array slide, enabling automatic spot detection during image analysis.
Yes, we share our data analysis pipeline with our customers. The pipeline is written in R and covers data pre-processing and normalisation as well as the algorithm for biomarker discovery.
Functional protein microarrays differ in many respects from DNA or RNA microarrays. Unlike DNA microarrays, functional protein microarrays often aim to discover the global interactions of a single probe (protein) in a single colour channel. As a result, only a relatively small set of specific proteins shows strong signals for a given sample. For this reason, we have designed a robust data pre-processing method to ensure that each reported signal intensity is accurate and significant.
Each replica spot on the array is checked against multiple quality control thresholds. These quality control steps ensure that replica spots from proteins showing high variance are flagged as outliers.
Our normalisation step uses both quantile-based and total intensity-based methods. These methods use the common underlying distribution of the control probes on the array to correct for technical variance while conserving the biological differences between samples.
The biomarker discovery step applies a protein-specific threshold, calculated from the mean signal intensities of healthy controls, thereby highlighting case-specific responses.
Each step of the data analysis pipeline is designed to maintain data quality so that only true positive signals are reported.
Our protein array slides can be scanned on any microarray scanner that can detect Cy3 fluorescence, e.g. Perkin-Elmer or Axon scanners. We would, however, recommend the Agilent scanner, as we have found it to be more stable over time than the other scanners we have tested.
The horizontal distance between protein spots is 405 μm and the vertical distance between protein spots is 360 μm.
The diameter of individual protein spots is 150 μm.
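When gridding without a GAL file, the spot geometry can be turned into nominal spot positions. The following is a minimal sketch that assumes the 405 μm and 360 μm figures are centre-to-centre distances. The function names, block size, origin and scanner pixel size are hypothetical and do not come from any supplied file format.

```python
# Assumption: the stated distances are centre-to-centre pitches between spots.
H_PITCH_UM = 405.0        # horizontal distance between spot centres
V_PITCH_UM = 360.0        # vertical distance between spot centres
SPOT_DIAMETER_UM = 150.0  # diameter of an individual protein spot


def spot_centres(n_cols, n_rows, origin_x_um=0.0, origin_y_um=0.0):
    """Return (row, col, x_um, y_um) for each spot in a hypothetical block."""
    return [
        (r, c, origin_x_um + c * H_PITCH_UM, origin_y_um + r * V_PITCH_UM)
        for r in range(n_rows)
        for c in range(n_cols)
    ]


def spot_pixel_radius(pixel_size_um):
    """Spot radius in scanner pixels, e.g. for sizing a circular feature mask."""
    return SPOT_DIAMETER_UM / 2 / pixel_size_um


# Edge-to-edge gaps between neighbouring spots under the centre-to-centre assumption
h_gap = H_PITCH_UM - SPOT_DIAMETER_UM  # 255 um
v_gap = V_PITCH_UM - SPOT_DIAMETER_UM  # 210 um
```

At a 10 μm scan resolution, for example, `spot_pixel_radius(10.0)` gives a radius of 7.5 pixels.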
There are more than 403 kinases, 600 cancer-related proteins, 380 transcription factors, 190 signalling proteins, 20 cytokines/chemokines and 360 proteins from other classes.
Data extracted using the image quantification software is subjected to preprocessing, in which the percentage coefficient of variation (CV%) is calculated for each replica spot at the intra-protein, intra-slide and inter-array levels. Replica spots with a CV% greater than 20% fail this quality benchmark and are flagged.
Please refer to the IMMUNOME protein array protocol for more information on how the data is preprocessed.
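The CV% check can be illustrated for a single level: the replica spots of each protein. The following is a minimal sketch that uses the sample standard deviation. The same calculation could be applied at the intra-slide and inter-array levels by grouping intensities differently. The data layout and names are hypothetical, and the exact procedure is the one given in the protocol.

```python
from statistics import mean, stdev

# Replica groups whose CV% exceeds this value are flagged as outliers.
CV_THRESHOLD_PCT = 20.0


def cv_percent(values):
    """Percentage coefficient of variation of replica-spot intensities (sample SD).

    Needs at least two values; returns infinity if the mean is zero.
    """
    m = mean(values)
    if m == 0:
        return float("inf")
    return 100.0 * stdev(values) / m


def flag_replicas(replicas):
    """Map each protein to (CV%, flagged) from its replica-spot intensities."""
    results = {}
    for protein, values in replicas.items():
        cv = cv_percent(values)
        results[protein] = (cv, cv > CV_THRESHOLD_PCT)
    return results


# Hypothetical intensities for two proteins, four replica spots each
example = {
    "protein_A": [1000, 1050, 980, 1020],
    "protein_B": [500, 900, 450, 700],
}
flags = flag_replicas(example)
```

With these made-up values, `protein_A` has a CV of about 2.9% and passes. `protein_B` has a CV of about 32% and is flagged.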
The data is normalised using a combination of quantile and intensity-based methods. Please refer to the IMMUNOME protein array protocol for detailed calculations.
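Quantile normalisation is one half of this combination and can be sketched in its textbook form. The sketch does not show how it is combined with the intensity-based step, or which probes it is applied to; the protocol defines those details. The function name is hypothetical.

```python
def quantile_normalise(arrays):
    """Quantile-normalise arrays so they share one intensity distribution.

    `arrays` is a list of arrays, each a list of probe intensities in the same
    probe order. Ties are ranked by position rather than averaged, which is a
    simplification of the usual method.
    """
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must contain the same number of probes")

    # Reference distribution: mean of the k-th smallest value across arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [
        sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n_probes)
    ]

    normalised = []
    for a in arrays:
        # Indices of this array's probes from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]
        normalised.append(out)
    return normalised


print(quantile_normalise([[5, 2, 3], [4, 1, 6]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalisation, each array contains exactly the reference values 1.5, 3.5 and 5.5. Each probe keeps its rank within its own array.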
Data is analysed using a combination of penetrance fold change and penetrance frequency. Please refer to the IMMUNOME protein array protocol for detailed calculations of how the penetrance fold change is derived.
Where sample cohorts are available, i.e. case and control, we adopt the penetrance-based fold change (pFC) analysis method to identify proteins with high intensity in each case sample. This method is designed to eliminate false positive signals from the analyses by applying a protein-specific threshold, i.e. a background threshold.
The per-protein background threshold is calculated from the signal intensities measured for that protein across a given cohort of healthy control samples.
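The threshold and penetrance idea can be illustrated for one protein. The protocol's exact formulas are not reproduced here, so this sketch relies on three hypothetical definitions. The threshold is taken as the control mean plus `K_SD` control standard deviations. A case counts as penetrant when it exceeds that threshold. The pFC is taken as the mean fold change over the control mean among penetrant cases.

```python
from statistics import mean, stdev

# Hypothetical parameter: threshold = control mean + K_SD * control SD.
K_SD = 2.0


def background_threshold(control_values, k_sd=K_SD):
    """Per-protein background threshold from healthy-control intensities."""
    return mean(control_values) + k_sd * stdev(control_values)


def penetrance_stats(case_values, control_values, k_sd=K_SD):
    """Return (threshold, pFC, penetrance frequency %) for a single protein.

    Hypothetical definitions for illustration: a case is penetrant when its
    intensity exceeds the background threshold, and pFC is the mean fold change
    over the control mean, taken over penetrant cases only.
    """
    threshold = background_threshold(control_values, k_sd)
    control_mean = mean(control_values)
    penetrant = [v for v in case_values if v > threshold]
    frequency_pct = 100.0 * len(penetrant) / len(case_values)
    if not penetrant:
        return threshold, 0.0, frequency_pct
    pfc = mean(v / control_mean for v in penetrant)
    return threshold, pfc, frequency_pct


# Made-up intensities for one protein
controls = [100, 110, 90, 100]
cases = [100, 400, 95, 300, 105, 98, 102, 99, 101, 100]
threshold, pfc, freq = penetrance_stats(cases, controls)
# threshold ≈ 116.3; two of ten cases exceed it, so freq = 20.0 and pfc = 3.5
```

Because the fold change is averaged only over cases above the threshold, a strong response in a minority of cases is not diluted by the non-responders.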
Putative biomarkers are identified and ranked according to the following criteria:
1. Penetrance fold change for case ≥ 2
2. Penetrance frequency for case ≥ 10%
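The two criteria can be applied as a simple filter followed by a sort. In this minimal sketch, sorting by frequency first and then by fold change, both descending, is an assumption. The input format and function name are hypothetical.

```python
PFC_MIN = 2.0        # criterion 1: penetrance fold change for case >= 2
FREQ_MIN_PCT = 10.0  # criterion 2: penetrance frequency for case >= 10%


def rank_biomarkers(stats):
    """Filter and rank proteins given protein -> (pFC, frequency %)."""
    hits = [
        (protein, pfc, freq)
        for protein, (pfc, freq) in stats.items()
        if pfc >= PFC_MIN and freq >= FREQ_MIN_PCT
    ]
    # Assumed ordering: higher frequency first, then higher fold change.
    return sorted(hits, key=lambda h: (h[2], h[1]), reverse=True)


ranked = rank_biomarkers({
    "A": (3.1, 25.0),
    "B": (1.8, 40.0),  # fails criterion 1
    "C": (2.5, 12.5),
    "D": (4.0, 5.0),   # fails criterion 2
})
```

Both thresholds are inclusive, so a protein exactly at pFC = 2 and 10% frequency is retained.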
A t-test weighs the difference between case and control signals against the intrinsic variability of those signals across samples. However, we would expect the intrinsic variability in true positive signals to be quite high, because typically only a subset of case samples responds strongly to a given protein. This holds even though the variability in signals from the controls should be low, so a t-test may not be the most appropriate statistical test here.
Multiple testing correction then adds a further layer of stringency, essentially turning each initial p-value into a false discovery probability. However, many multiple testing corrections assume that the individual tests are independent. Pathway analyses often show that the identified antigens are not truly independent, i.e. their expression is linked in some way.
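The Benjamini-Hochberg (BH) procedure is one widely used correction that turns p-values into false discovery rate adjusted values. It appears here only as an illustration and is not necessarily the correction referred to above. BH controls the false discovery rate when tests are independent or positively dependent, but it offers no general guarantee under arbitrary dependence. That limitation is why linked antigens complicate its interpretation.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
# approximately [0.04, 0.0533, 0.0533, 0.2]
```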
There are two positive controls to ensure the quality of the experiment.
- The IgG dilution series acts as a positive control for assessing the binding capacity of the fluorescently conjugated secondary antibody incubation. Accurate quantification of the serial dilution is used as a benchmark to ensure that labelling efficiency and spot detection pass quality control thresholds.
- Cy3-BSA controls act as positive controls for each array on the slide. Each slide carries 23 Cy3-BSA markers, and their concentration is kept constant throughout the experiment. They are therefore used as housekeeping probes for normalising signal intensities.
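One simple way to use constant-concentration controls such as Cy3-BSA as housekeeping probes is to rescale each array. Each array's mean control signal is matched to the mean across all arrays. This is a minimal sketch under that assumption, not necessarily the normalisation defined in the IMMUNOME protocol. The data layout and function names are hypothetical.

```python
from statistics import mean


def control_scale_factors(bsa_by_array):
    """Scale factor per array from its Cy3-BSA spot intensities.

    `bsa_by_array` maps array_id -> list of Cy3-BSA intensities on that array.
    """
    array_means = {a: mean(v) for a, v in bsa_by_array.items()}
    target = mean(array_means.values())  # common reference level across arrays
    return {a: target / m for a, m in array_means.items()}


def normalise_to_controls(signals_by_array, bsa_by_array):
    """Multiply every protein signal on an array by that array's scale factor."""
    factors = control_scale_factors(bsa_by_array)
    return {
        a: {protein: value * factors[a] for protein, value in signals.items()}
        for a, signals in signals_by_array.items()
    }


# Made-up example: array2's controls read twice as bright as array1's
bsa = {"array1": [1000, 1000], "array2": [2000, 2000]}
signals = {"array1": {"P1": 200}, "array2": {"P1": 400}}
normalised = normalise_to_controls(signals, bsa)
```

Here both arrays end up reporting the same P1 signal. The apparent twofold difference came entirely from the brighter controls on array2, which is the kind of technical variance these controls are meant to remove.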
For a detailed explanation of the controls, please refer to the IMMUNOME README.
Age-matched healthy controls are important for determining baseline immune responses, against which disease-specific responses can be distinguished. KREX technology is sensitive enough to highlight age-specific immune responses in healthy controls. These responses can then be used as the background threshold for identifying biomarkers in case samples.