Supplementary web site for

Kevin Y. Yip and Mark Gerstein,

Training Set Expansion: An Approach to Improving the Reconstruction of Biological Networks from Limited and Uneven Reliable Interactions


Datasets

Dataset S1: BioGRID-10
This dataset contains all BioGRID interactions of Saccharomyces cerevisiae (version 2.0.44) that satisfy the following criteria:
  1. Having one of the following physical interaction types:
    • FRET
    • Protein-peptide
    • Co-crystal Structure
    • Co-fractionation
    • Co-purification
    • Reconstituted Complex
    • Biochemical Activity
    • Affinity Capture-Western
    • Two-hybrid
    • Affinity Capture-MS
  2. From one of the small-scale studies, defined as studies that report fewer than 10 physical interactions to BioGRID
  3. The proteins/genes involved have valid values for all the features used for learning
The dataset contains 5,126 interactions that involve 2,328 yeast proteins. Download
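The three criteria above can be sketched as a filter. This is only an illustration under stated assumptions: the record layout `(protein_a, protein_b, experiment_type, study_id)`, the function name `filter_small_scale`, and the choice to count distinct protein pairs per study are all hypothetical and are not taken from the BioGRID file format or from the authors' pipeline.

```python
from collections import defaultdict

# Physical interaction types listed in criterion 1.
PHYSICAL_TYPES = {
    "FRET", "Protein-peptide", "Co-crystal Structure", "Co-fractionation",
    "Co-purification", "Reconstituted Complex", "Biochemical Activity",
    "Affinity Capture-Western", "Two-hybrid", "Affinity Capture-MS",
}


def filter_small_scale(records, proteins_with_features, max_per_study=10):
    """Return the set of unordered protein pairs that pass criteria 1-3.

    records: iterable of (protein_a, protein_b, experiment_type, study_id);
             this layout is hypothetical.
    proteins_with_features: proteins with valid values for every feature.
    max_per_study: a study is "small-scale" if it reports fewer pairs than this.
    """
    # Criterion 1: keep physical interaction types only.
    physical = [r for r in records if r[2] in PHYSICAL_TYPES]

    # Criterion 2: count distinct physical pairs per study (assumed unit).
    pairs_by_study = defaultdict(set)
    for a, b, _type, study in physical:
        pairs_by_study[study].add(frozenset((a, b)))
    small = {s for s, pairs in pairs_by_study.items() if len(pairs) < max_per_study}

    # Criterion 3: both proteins need valid values for every feature.
    kept = set()
    for a, b, _type, study in physical:
        if study in small and a in proteins_with_features and b in proteins_with_features:
            kept.add(frozenset((a, b)))
    return kept
```

Under the same assumptions, calling the function with `max_per_study=200` would give the analogous filter for a 200-interaction threshold.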

Dataset S2: BioGRID-200
This dataset is similar to Dataset S1, except that small-scale studies are defined as studies that report fewer than 200 physical interactions to BioGRID. Note that because the four high-throughput datasets used as data features each have more than 200 interactions, they are not included in this dataset.

The dataset contains 12,155 interactions that involve 3,222 yeast proteins. Download

Dataset S3: DIP_MIPS_iPfam
This dataset contains the union of all interactions from DIP (7 Oct 2007 version), MIPS (18 May 2006 version) and iPfam (version 21 of Pfam) that satisfy the following criteria:
  1. For interactions in DIP, only those identified in small-scale experiments or multiple experiments are considered
  2. For interactions in MIPS, only physical interactions that are not from yeast two-hybrid or TAP-MS experiments are considered
  3. The proteins/genes involved have valid values for all the features used for learning
The dataset contains 3,201 interactions that involve 1,681 yeast proteins. Download
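Taking the union of the three sources can be sketched as follows. The sketch assumes each source has already been reduced to a list of `(protein_a, protein_b)` tuples by its own source-specific filter (criteria 1 and 2); the function names and input layout are hypothetical.

```python
def as_pairs(interactions):
    """Normalise (a, b) tuples to unordered pairs so that A-B equals B-A."""
    return {frozenset(pair) for pair in interactions}


def build_union(dip, mips, ipfam, proteins_with_features):
    """Union the three pre-filtered interaction lists, then apply criterion 3."""
    union = as_pairs(dip) | as_pairs(mips) | as_pairs(ipfam)
    # Keep a pair only if every protein in it has valid feature values.
    return {p for p in union if p.issubset(proteins_with_features)}
```

Storing each interaction as a `frozenset` means an interaction reported by more than one database, or in both orientations, is counted once.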
 

Supplementary Tables

Supplementary table S1: complete set of prediction accuracies on BioGRID-10 (in percentage of AUC)
Color code: red - rank 1, green - rank 2, blue - rank 3 of each mode
Method phy loc exp-gasch exp-spellman y2h-ito y2h-uetz tap-gavin tap-krogan int
Mode 1
direct 58.04 66.55 64.61 57.41 51.52 52.13 59.37 61.62 70.91
kCCA 65.80 63.86 68.98 65.10 50.89 50.48 57.56 51.85 80.98
kML 63.87 68.10 69.67 68.99 52.76 53.85 60.86 57.69 73.47
em 71.22 75.14 67.53 64.96 55.90 53.13 63.74 68.20 81.65
local SVM 71.53 71.17 70.35 68.98 67.26 67.25 64.59 67.48 74.77
local+pp SVM 72.07 69.64 76.02 73.54 71.50 71.46 74.41 71.09 82.94
local+ki SVM 71.72 71.15 75.84 71.00 69.32 69.03 70.66 71.89 81.75
local+pp+ki SVM 71.78 70.40 76.73 71.37 70.42 70.43 73.49 72.47 83.19
local SVR 71.67 71.41 72.66 70.63 67.27 67.27 64.60 67.48 75.65
local+pp SVR 73.89 75.25 77.43 75.35 71.60 71.51 74.62 71.39 83.63
local+ki SVR 71.68 71.42 75.89 70.96 69.40 69.05 70.53 72.03 81.74
local+pp+ki SVR 72.40 75.19 77.41 73.81 70.44 70.57 73.59 72.64 83.59
Mode 2
direct 59.99 67.81 66.18 59.22 54.02 54.64 62.28 63.69 72.34
Pkernel 72.98 69.84 78.61 77.30 57.01 54.65 71.16 70.36 87.34
local SVM 76.17 78.68 76.07 73.46 72.26 72.23 68.39 72.48 81.29
local+pp SVM 75.85 73.66 79.71 75.61 74.05 73.80 75.89 75.10 87.80
local+ki SVM 76.06 78.70 79.02 73.32 72.68 72.03 71.22 75.55 85.53
local+pp+ki SVM 76.32 73.73 79.99 75.48 73.58 73.35 74.98 75.87 87.62
local SVR 76.89 78.73 79.72 77.32 72.93 72.89 68.81 73.15 82.82
local+pp SVR 77.71 80.71 82.56 80.62 74.74 74.41 76.36 75.12 88.78
local+ki SVR 76.76 78.73 80.62 76.44 73.39 72.76 72.42 76.22 86.12
local+pp+ki SVR 77.45 80.57 81.93 78.92 74.14 74.01 75.59 76.59 88.56
Mode 3 (mode 2 with self-interactions removed)
direct 57.72 66.69 64.23 56.86 51.36 52.01 60.10 61.60 70.75
Pkernel 72.01 68.89 77.89 76.37 56.24 53.97 71.48 69.67 87.13
local SVM 76.47 78.56 76.27 73.88 72.57 72.54 68.64 72.81 81.39
local+pp SVM 75.84 73.41 79.93 76.16 74.48 74.21 76.38 75.63 87.79
local+ki SVM 76.40 78.57 79.56 73.90 72.92 72.35 71.63 75.94 85.43
local+pp+ki SVM 76.51 73.43 80.32 75.66 73.70 73.60 75.62 76.24 87.62
local SVR 77.17 78.71 79.87 77.56 73.21 73.18 69.05 73.44 82.97
local+pp SVR 78.18 80.44 82.57 80.41 75.05 74.83 76.76 75.70 88.87
local+ki SVR 77.10 78.71 80.74 76.41 73.51 72.97 72.72 76.53 85.96
local+pp+ki SVR 77.52 80.51 81.73 78.51 74.27 74.09 76.10 76.85 88.55
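
The accuracies in these tables are given as percentages of AUC. Assuming AUC here means the area under the ROC curve, a minimal sketch of how such a value could be computed from prediction scores is shown below. It uses the pairwise-comparison form of the AUC (the fraction of positive/negative pairs ranked in the right order); the function name and input layout are hypothetical, and this is not necessarily how the authors computed their figures.

```python
def auc_percent(scores, labels):
    """Area under the ROC curve, in percent, by pairwise comparison.

    scores: predicted score per protein pair (higher = more likely to interact)
    labels: 1 for a known interaction, 0 otherwise
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")

    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return 100.0 * wins / (len(positives) * len(negatives))
```

The double loop is quadratic in the number of pairs, so it is only suitable for small illustrations; a rank-based implementation would be preferable at the scale of the datasets above.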

Supplementary table S2: complete set of prediction accuracies on BioGRID-200 (in percentage of AUC)
Color code: red - rank 1, green - rank 2, blue - rank 3 of each mode
Method phy loc exp-gasch exp-spellman y2h-ito y2h-uetz tap-gavin tap-krogan int
Mode 1
direct 58.89 66.32 65.44 59.68 51.87 51.28 63.98 64.56 71.59
kCCA 69.14 66.36 72.30 62.74 53.52 50.85 63.23 58.49 85.73
kML 65.86 68.57 73.79 73.41 55.00 56.12 64.41 62.67 68.82
em 73.60 75.78 68.66 67.55 56.10 53.47 68.76 70.48 80.89
local SVM 76.67 76.78 78.92 77.49 75.08 75.07 71.24 75.34 82.56
local+pp SVM 76.35 75.85 80.02 78.29 75.86 76.48 77.63 76.51 85.36
local+ki SVM 75.88 76.71 80.42 78.04 75.55 75.27 75.15 76.91 85.46
local+pp+ki SVM 76.51 75.73 80.68 78.00 75.91 75.83 76.83 77.10 85.57
local SVR 77.18 76.48 80.23 79.02 75.08 75.07 71.91 75.34 83.09
local+pp SVR 77.60 78.92 81.98 80.59 76.10 76.48 76.67 76.54 85.98
local+ki SVR 75.79 76.50 80.87 78.59 75.59 75.33 75.03 76.96 85.42
local+pp+ki SVR 76.06 78.94 81.71 79.58 75.98 75.94 76.73 77.15 85.83
Mode 2
direct 60.52 66.81 66.97 61.41 54.01 53.70 65.19 65.81 72.50
local SVM 83.37 83.96 84.94 83.22 81.74 81.75 75.47 81.95 88.76
local+pp SVM 83.26 83.14 86.15 84.23 81.68 81.89 81.34 82.85 91.37
local+ki SVM 81.84 84.00 86.02 82.77 81.30 80.98 78.31 82.63 89.99
local+pp+ki SVM 82.20 83.06 86.17 82.38 81.54 81.54 80.91 82.76 91.16
local SVR 83.88 83.30 86.79 85.54 82.68 82.71 76.36 82.89 89.92
local+pp SVR 84.37 85.62 88.12 87.00 82.43 82.87 80.61 83.65 91.82
local+ki SVR 82.31 83.29 86.93 84.16 82.29 81.99 79.02 83.65 90.17
local+pp+ki SVR 82.63 85.55 87.02 85.03 82.44 82.54 81.09 83.80 91.51
Mode 3 (mode 2 with self-interactions removed)
direct 58.91 65.99 65.61 59.81 52.10 51.79 63.74 64.40 71.37
local SVM 83.63 84.16 85.15 83.55 82.04 82.06 75.72 82.25 88.90
local+pp SVM 83.32 83.70 86.71 84.60 82.33 82.45 81.79 83.59 91.46
local+ki SVM 82.07 84.20 86.54 83.22 81.78 81.49 78.77 83.11 90.05
local+pp+ki SVM 82.50 83.55 86.66 82.75 82.10 82.02 81.64 83.45 91.28
local SVR 84.10 83.51 86.99 85.78 82.99 83.01 76.61 83.20 90.09
local+pp SVR 84.74 85.71 88.21 87.00 82.82 83.37 81.29 84.31 91.88
local+ki SVR 82.49 83.51 87.35 84.27 82.65 82.39 79.41 84.03 90.18
local+pp+ki SVR 82.67 85.74 87.43 85.11 82.79 82.87 81.67 84.25 91.55
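
The color code marks the three best values of each mode. Assuming the ranking is done per feature column within a mode, the ranks can be recovered from the numbers alone, as in the sketch below. The helper `top_three` is hypothetical; the values are the "int" column of Mode 2 in table S2.

```python
def top_three(column):
    """Return the three best methods in one feature column, best first.

    column: dict mapping method name -> AUC (%) for a single mode and feature.
    Ties are not treated specially; tied methods keep their input order.
    """
    ranked = sorted(column.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _auc in ranked[:3]]


# "int" column of Mode 2 in table S2 (values copied from the table).
mode2_int = {
    "direct": 72.50,
    "local SVM": 88.76,
    "local+pp SVM": 91.37,
    "local+ki SVM": 89.99,
    "local+pp+ki SVM": 91.16,
    "local SVR": 89.92,
    "local+pp SVR": 91.82,
    "local+ki SVR": 90.17,
    "local+pp+ki SVR": 91.51,
}

print(top_three(mode2_int))
# → ['local+pp SVR', 'local+pp+ki SVR', 'local+pp SVM']
```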

Supplementary table S3: complete set of prediction accuracies on DIP_MIPS_iPfam (in percentage of AUC)
Color code: red - rank 1, green - rank 2, blue - rank 3 of each mode
Method phy loc exp-gasch exp-spellman y2h-ito y2h-uetz tap-gavin tap-krogan int
Mode 1
direct 63.09 64.23 68.60 62.24 53.40 57.34 63.46 64.58 73.68
kCCA 68.78 62.24 70.93 66.85 55.25 56.70 62.88 62.59 74.45
kML 65.04 67.58 70.09 69.80 58.12 59.90 63.72 61.19 77.58
em 63.22 67.90 65.15 61.74 56.23 58.31 68.02 62.92 78.46
local SVM 72.45 69.90 71.45 69.02 66.56 66.53 64.95 66.92 74.28
local+pp SVM 73.00 70.38 75.69 74.12 72.10 72.10 75.84 71.83 83.26
local+ki SVM 73.67 69.89 76.89 72.01 69.80 69.25 72.75 72.41 82.44
local+pp+ki SVM 72.93 70.76 77.46 72.17 70.86 70.78 74.47 72.81 83.20
local SVR 72.85 70.50 72.89 70.60 66.58 66.56 64.97 66.93 74.76
local+pp SVR 74.48 74.99 78.09 75.89 72.02 72.09 75.88 71.56 83.72
local+ki SVR 74.03 70.47 76.87 72.87 69.88 69.39 72.80 72.43 82.41
local+pp+ki SVR 73.62 74.92 78.35 75.08 70.93 70.97 74.48 73.01 83.39
Mode 2
direct 67.57 66.48 71.54 66.24 57.74 61.52 67.46 68.86 76.53
Pkernel 73.51 68.24 78.91 77.08 58.10 58.51 72.65 69.98 85.04
local SVM 77.78 77.79 76.67 73.99 72.93 72.98 68.68 73.23 81.10
local+pp SVM 77.42 75.15 79.94 77.10 76.21 76.20 78.45 76.28 87.10
local+ki SVM 78.31 77.80 80.86 75.24 75.52 73.99 74.65 77.51 85.95
local+pp+ki SVM 77.71 74.93 81.38 75.46 75.61 75.83 77.37 78.03 86.75
local SVR 78.78 77.80 79.84 77.38 73.46 73.49 69.01 73.72 82.12
local+pp SVR 79.25 81.65 83.01 81.67 76.76 76.88 79.75 76.99 88.26
local+ki SVR 78.88 77.80 81.55 77.83 76.11 74.62 75.56 78.07 86.51
local+pp+ki SVR 78.78 81.68 82.60 79.90 76.08 76.20 77.79 78.72 87.68
Mode 3 (mode 2 with self-interactions removed)
direct 63.77 64.30 68.19 62.24 52.71 56.94 63.61 65.17 73.78
Pkernel 72.66 66.76 78.42 75.88 57.31 56.90 73.18 69.76 85.63
local SVM 77.91 77.88 76.83 74.23 73.34 73.40 68.58 73.68 81.18
local+pp SVM 77.81 75.17 79.91 76.99 75.93 75.76 78.28 77.45 86.80
local+ki SVM 78.49 77.88 80.82 75.64 74.94 73.45 73.96 77.05 85.30
local+pp+ki SVM 78.17 75.32 81.02 75.28 75.25 75.04 76.87 77.53 86.58
local SVR 78.95 77.89 80.09 77.66 73.95 74.00 69.02 74.28 82.40
local+pp SVR 79.05 80.91 82.42 80.88 76.33 76.30 79.10 77.45 87.54
local+ki SVR 78.86 77.90 81.36 77.39 75.34 73.87 74.60 77.45 85.74
local+pp+ki SVR 78.34 80.85 82.15 79.01 75.39 75.26 77.22 77.92 87.32
 

Supplementary Figures

Supplementary Figure S1: Prediction accuracy of local modeling with and without training set expansion when trained on different sub-samples of the BioGRID-10 dataset.
Download
Supplementary Figure S2: For each interaction in the BioGRID-10 dataset, we computed the difference between its rank among all predictions given by training set expansion (local+pp or local+ki) and its best rank among the predictions given by the four methods em, kCCA, kML and local. A positive rank difference indicates that training set expansion ranked the correct interaction higher than all four of the methods in comparison. This figure shows the correlations between the positive rank differences and 1) the minimum node degree, 2) the average node degree, and 3) the similarity (inner product according to the integrated kernel) of the two interacting proteins. Correlations and the corresponding p-values were computed using both Pearson and Spearman correlation.
Download
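
The quantities in Supplementary Figure S2 can be sketched in a few lines. The sign convention is an assumption: rank 1 is taken to be the top prediction, so the difference is computed as the best baseline rank minus the training set expansion rank, which is positive when training set expansion ranks the interaction higher. All function names are hypothetical, and p-values are left out of this sketch.

```python
import math


def rank_difference(expansion_rank, baseline_ranks):
    """Best baseline rank minus the expansion rank (rank 1 = top prediction).

    Positive when training set expansion ranks the interaction higher
    (smaller rank number) than every baseline method.
    """
    return min(baseline_ranks) - expansion_rank


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

In practice, library routines such as `scipy.stats.pearsonr` and `scipy.stats.spearmanr` return both the coefficient and a p-value, and would be the more usual choice when SciPy is available.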