This documentation file was generated on 2019-08-27 by Dariia Vyshenska

-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset
Identification of Cervical Cancer Key Regulators using Network Biology Approach - Supplementary tables

2. Creator Information

Name: Dariia Vyshenska
Institution: Oregon State University
College, School or Department: College of Pharmacy
Email: vyshensd@oregonstate.edu

Name: Andrey Morgun
Institution: Oregon State University
College, School or Department: College of Pharmacy
Email: andriy.morgun@oregonstate.edu

Name: Heidi Lyng
Institution: Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway
College, School or Department: Institute for Cancer Research

3. Collaborator information

Name: Natalia Shulzhenko
Institution: Oregon State University
College, School or Department:

Name: Richard Rosario Rodrigues
Institution: Oregon State University
College, School or Department:

Name: Anja Nilsen
Institution: Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway
College, School or Department:

Name: Kaito Hioki
Institution: Oregon State University
College, School or Department:

Name: Kimberly White
Institution: Oregon State University
College, School or Department:

4. Contact Information

Name: Andrey Morgun
Institution: Oregon State University
College, School or Department: College of Pharmacy
Email: andriy.morgun@oregonstate.edu

-------------------
CONTEXTUAL INFORMATION
-------------------

1. Abstract for the dataset
This dataset contains the large supplementary tables for the PhD dissertation of Dariia Vyshenska, titled "Identification of Cervical Cancer Key Regulators using Network Biology Approach". The dataset contains data from two research projects.

Research #1: Identification of bacterial regulators of cervical cancer gene expression. It contains microbial community data from cervical cancer samples of human patients.
Prevotella bivia was identified as a key bacterial regulator of cervical cancer gene expression by reconstructing a transkingdom network from patients' gene expression and bacterial abundance data. The data contains the transkingdom network, information about the bacteria found in each cervical cancer sample, and the qRT-PCR primers that were used to test host genes for regulation by P. bivia.

Research #2: Identification of key cervical cancer driver genes and their combinations that are critical for cancer proliferation. We identified 9 host genes responsible for cancer cell growth and identified the pairs of these driver genes that have the highest impact on cancer proliferation. To achieve this goal, we identified the targets of each driver gene and reconstructed a gene co-expression network from the union of these targets. The data contains the gene co-expression network, qRT-PCR primers for 34 tested potential drivers, network measurements for each confirmed driver gene (average shortest path, number of proliferation associated targets), and proliferation measurements for single driver and driver pair knock downs.

2. Context of the research project that this dataset was collected for.
All files are related to the Cervical Cancer project, Morgun & Shulzhenko Lab.

3. Date of data collection:

4. Geographic location of data collection:

5. Funding sources that supported the collection of the data:

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

1. Licenses/restrictions placed on the data:
The data is published under a Creative Commons Attribution 4.0 (CC-BY) license.
https://creativecommons.org/licenses/by/4.0/

Different parts of the dataset are also published in several journals and in the dissertation (Dariia Vyshenska (2019) Identification of Cervical Cancer Key Regulators using Network Biology Approach (Doctoral dissertation)). Before using any part of this dataset, please check the licenses of all versions of the dataset; this information can be found in the dissertation.

2. Links to publications related to the dataset:
This dataset supplements the dissertation:
Dariia Vyshenska (2019) Identification of Cervical Cancer Key Regulators using Network Biology Approach (Doctoral dissertation). Oregon State University, Corvallis, USA.

Vyshenska D, Lam KC, Shulzhenko N, Morgun A: Interplay between viruses and bacterial microbiota in cancer development. Seminars in Immunology 2017, 32:14-24. doi: 10.1016/j.smim.2017.05.003

Lam KC, Vyshenska D, Hu J, Rodrigues RR, Nilsen A, Zielke RA, Brown NS, Aarnes EK, Sikora AE, Shulzhenko N et al: Transkingdom network reveals bacterial players associated with cervical cancer gene expression program. PeerJ 2018, 6:e5590. doi: 10.7717/peerj.5590

3. Recommended citation for the data:

4. Dataset Digital Object Identifier (DOI)
10.7267/f4752p203

--------------------------
VERSIONING AND PROVENANCE
--------------------------

1. Last modification date
2019-07-21

There are other published versions of this dataset. Please refer to Dariia Vyshenska's doctoral dissertation (see the citation under "Links to publications related to the dataset") for information on the different versions and links to this dataset.

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

The methods that were followed to generate this dataset can be found in Dariia Vyshenska's doctoral dissertation (see the citation under "Links to publications related to the dataset").
---------------------
DATA & FILE OVERVIEW
---------------------

1. File List

A. Filename: Supplemental File S1. Gene Co-expression Network.cys
   Short description: A Cytoscape file of the cervical cancer gene co-expression network (from Chapter 4 of the Thesis).

B. Filename: Supplemental Table S1.csv
   Short description: Total amounts of bacterial DNA and numbers of 16S reads acquired for cervical cancer patients' biopsies.

C. Filename: Supplemental Table S2_file1.csv
   Short description: Identified operational taxonomic unit (OTU) frequencies in cervical cancer patients' biopsies.

D. Filename: Supplemental Table S2_file2.csv
   Short description: OTU taxonomic assignments and mean/median frequencies in cervical cancer biopsies.

E. Filename: Supplemental Table S3.csv
   Short description: Transkingdom network of cervical cancer gene expression and cervical cancer microbiome, Chapter 3 (nodes table).

F. Filename: Supplemental Table S4s.csv
   Short description: Mean frequencies of Prevotella and Lactobacillus (as well as P. bivia and L. crispatus) in vagina and healthy cervix reported previously in the literature.

G. Filename: Supplemental Table S5.csv
   Short description: List of qRT-PCR primers used for LAMP3, STAT1, TAP1, BTN3A2, HLA-A, IFI44L, IFI44, DDN, 18S.

H. Filename: Supplemental Table S6. List of siRNA used in the experiments.csv
   Short description: List of siRNA used in the experiments to knock down cervical cancer cell proliferation regulating gene drivers, from Chapter 4.

J. Filename: Supplemental Table S7. List of qRT-PCR primers used in the experiments.csv
   Short description: List of qRT-PCR primers used to measure efficiency of siRNA knock down of cervical cancer cell proliferation regulating gene drivers, from Chapter 4.

K. Filename: Supplemental Table S8. Single Driver KD Proliferation Measurements.csv
   Short description: Single driver knock down proliferation measurements for the ME180 cell line at 72 hours, from Chapter 4.

L. Filename: Supplemental Table S9. Network Edges.csv
   Short description: Cervical cancer gene co-expression network (table of edges) from Chapter 4.

M. Filename: Supplemental Table S10. Network Nodes With Attributes.csv
   Short description: Cervical cancer gene co-expression network (table of nodes) from Chapter 4.

N. Filename: Supplemental Table S11. Subnetworks pathway enrichment.csv
   Short description: Results of molecular pathway enrichment analysis for three subnetworks identified in the cervical cancer gene co-expression network, in Chapter 4.

O. Filename: Supplemental Table S12. Average Shortest Path Lengths And Unique Prol. Assoss. Targets.csv
   Short description: List of average shortest path lengths and total numbers of unique proliferation associated targets calculated for all possible pairs of cervical cancer proliferation regulating drivers, from Chapter 4.

P. Filename: Supplemental Table S13. Driver Pair KD Proliferation Measurements.csv
   Short description: Proliferation measurements for all possible cervical cancer gene driver pair knock downs, from Chapter 4.

2. Relationship between files:
Each file corresponds to a different table of the dissertation. The visualization in the file Supplemental_File_S1._Gene_Co-expression_Network.cys can be recreated with the data in Supplemental_Table_S9._Network_Edges.csv and Supplemental_Table_S10._Network_Nodes_With_Attributes.csv

3. Formats
csv
cys (can be opened with Cytoscape software: https://cytoscape.org/what_is_cytoscape.html )

---------------------
TABULAR DATA-SPECIFIC INFORMATION FOR: Supplemental_Table_S1.csv
---------------------

1. Variables
Name of variable: description (units)

PatientID: patient identifier
SampleID: sample identifier
Histology: histological diagnosis
    Values:
    SCC: Squamous cell carcinoma
    Adeno SCC: Adenocarcinoma
pg bact DNA/10ng total 16S reads: picograms of bacterial DNA per 10 ng of total 16S reads

2. Abbreviations
SCC: Squamous cell carcinoma

3. Null values
No null values in this file.
---------------------
TABULAR DATA-SPECIFIC INFORMATION FOR: Supplemental_Table_S2_file1.csv
---------------------

1. Variables
Name of variable: description (units)

OTU_ID: operational taxonomic unit identifier
P-#: columns containing information for the respective patient.

2. Abbreviations
OTU: operational taxonomic unit

3. Null values
NA: not available

---------------------
TABULAR DATA-SPECIFIC INFORMATION FOR: Supplemental_Table_S2_file2.csv
---------------------

1. Variables
Name of variable: description (units)

OTUID: operational taxonomic unit identifier
ConsensusLineage: consensus taxonomic assignment of each operational taxonomic unit
otu_mean: mean of operational taxonomic unit frequencies in the sample
otu_median: median of operational taxonomic unit frequencies in the sample

2. Abbreviations
OTU: operational taxonomic unit

3. Null values
NA: not available

---------------------
TABULAR DATA-SPECIFIC INFORMATION FOR: Supplemental_Table_S3.csv
---------------------

1. Variables
Name of variable: description (units)

name: name of the node in the network
node_type: type of the node (gene or bacteria)
    Values:
    microbe: bacteria
    degs: differentially expressed genes
subnet: name of the subnetwork the node belongs to
taxonomy: taxonomy of the bacterial node
otu_mean: mean frequencies for the node in patient biopsies
otu_median: median frequencies for the node in patient biopsies
bc_norm_otu_antiviral: betweenness centrality calculated for antiviral genes
bc_norm_otu_cellcycle: betweenness centrality calculated for cell cycle genes
bc_norm_otu_degs: betweenness centrality calculated for all differentially expressed genes
bc_norm_otu_epithelialcelldifferentiation: betweenness centrality calculated for epithelial cell differentiation genes

2. Abbreviations
OTU: operational taxonomic units

3. Null values
blank: data not available
NA: data not available

---------------------
TABULAR DATA-SPECIFIC INFORMATION FOR: Supplemental_Table_S4.csv
---------------------

1. Variables
Name of variable: description (units)

PMID: PubMed identifier of the research article this data was collected from
V6: variable region 6 of bacterial 16S gene
    Values:
    -: was not used for identification of bacterial community
    +: was used for identification of bacterial community
V5: variable region 5 of bacterial 16S gene
    Values:
    -: was not used for identification of bacterial community
    +: was used for identification of bacterial community
V4: variable region 4 of bacterial 16S gene
    Values:
    -: was not used for identification of bacterial community
    +: was used for identification of bacterial community
V3: variable region 3 of bacterial 16S gene
    Values:
    -: was not used for identification of bacterial community
    +: was used for identification of bacterial community
V2: variable region 2 of bacterial 16S gene
    Values:
    -: was not used for identification of bacterial community
    +: was used for identification of bacterial community
V1: variable region 1 of bacterial 16S gene
    Values:
    -: was not used for identification of bacterial community
    +: was used for identification of bacterial community
Sample type: type of the sample collected for the research
    Values:
    Biopsy: biopsy of cervix
    cervix endo: endocervical swab
    cervix ecto: ectocervical swab
    vaginal swab: vaginal swab
    vaginal swabs: endocervical swab
Sample status: diagnosis of the patients used for the research
    Values:
    CC: cervical cancer
    HPV-: human papilloma virus negative
    Normal: healthy
Prevotella Freq (%): frequency of Prevotella
Prevotella bivia (%): frequency of Prevotella bivia
Lactobacillus Freq (%): frequency of Lactobacillus
L.crispatus (%): frequency of Lactobacillus crispatus
Bacillaceae: frequency of Bacillaceae
Halobacteriaceae: frequency of Halobacteriaceae
Peptoniphilus: frequency of Peptoniphilus
Anaerococcus: frequency of Anaerococcus

2. Abbreviations
PMID: PubMed identifier
CC: cervical cancer
HPV: human papilloma virus

3. Null values
NA: not available

---------------------
TABULAR DATA-SPECIFIC INFORMATION FOR: Supplemental_Table_S5.csv
---------------------

1. Variables
Name of variable: description (units)

Gene name: name of the gene for which the primer was designed.
F/R: indicates whether the primer is forward or reverse.
    Values:
    Forward: forward primer
    Reverse: reverse primer
Sequence: DNA sequence of the primer
Location: location in the human genome of the DNA sequence for which this primer was designed
Amplicon size: size of the amplicon produced by the combination of forward and reverse primers (base pairs)

3. Null values
No null values in this file.

---------------------
TABULAR DATA-SPECIFIC INFORMATION FOR: Supplemental Table S6. List of siRNA used in the experiments.csv
---------------------

1. Variables
Name of variable: description (units)

siRNA target: gene that is targeted by the siRNA
Supplier: supplier from which this siRNA was bought
    Values:
    TheromFisher: ThermoFisher Scientific
siRNA ID: identifier of the siRNA in the ThermoFisher Scientific catalog
siRNA type: type of siRNA (supplier's variation)
    Values:
    Silencer Select: Silencer Select type of siRNA (has improved performance over previous generations of siRNA)

3. Null values
No null values in this file.

---------------------
TABULAR DATA-SPECIFIC INFORMATION FOR: Supplemental Table S7. List of qRT-PCR primers used in the experiments.csv
---------------------

1. Variables
Name of variable: description (units)

Target: name of the gene for which the primer was designed.
Forward/Reverse: indicates whether the primer is forward or reverse.
    Values:
    Forward: forward primer
    Reverse: reverse primer
Primer Sequence (5' ->3'): DNA sequence of the primer

3. Null values
No null values in this file.

---------------------
TABULAR DATA-SPECIFIC INFORMATION FOR: Supplemental Table S8. Single Driver KD Proliferation Measurements.csv
---------------------

1. Variables
Name of variable: description (units)

Targeted Driver: gene that was knocked down using siRNA and for which cell proliferation was measured
experiment #: measurement of cell proliferation for the particular driver in the respective experiment (experiments 1-8)

3. Null values
blank: this driver was not tested in this experiment.

---------------------
TABULAR DATA-SPECIFIC INFORMATION FOR: Supplemental Table S9. Network Edges.csv
---------------------

1. Variables
Name of variable: description (units)

NODE1: first node of the edge, represents a gene (Ensembl gene identifier)
NODE2: second node of the edge, represents a gene (Ensembl gene identifier)
cor_direction: sign of the Spearman correlation for the edge (an edge means there is a correlation between these two nodes)
    Values:
    1: positive Spearman correlation between the expression of the two genes
    -1: negative Spearman correlation between the expression of the two genes
corcoef_median: median Spearman correlation coefficient across all data cohorts analysed
fish_pval: Fisher exact p-value for correlation coefficients across all data cohorts analysed
fish_fdr: false discovery rate for the Fisher exact p-values from the previous column
PUC: Proportion of Unexpected Correlations
    Values:
    1: the edge passes the proportion of unexpected correlations test
    blank: the gene was inconsistently regulated (in terms of correlation coefficient) by different cervical cancer drivers, so the proportion of unexpected correlations test could not be performed on this edge.

2. Abbreviations
PUC: Proportion of Unexpected Correlations

3. Null values
blank: proportion of unexpected correlations could not be calculated for this edge.

---------------------
TABULAR DATA-SPECIFIC INFORMATION FOR: Supplemental Table S10. Network Nodes With Attributes.csv
---------------------

1. Variables
Name of variable: description (units)

NODES: node identifier, represents a gene (Ensembl gene identifier)
GENE_SYMBOL: official name of the gene or locus
PvsGE_corDir: correlation between cell proliferation and gene expression, represents the sign of the correlation coefficient
    Values:
    1: positive correlation
    -1: negative correlation
    blank: no statistically significant correlation detected
PvsGE_corMEDIAN: correlation between cell proliferation and gene expression, represents the median value of the correlation coefficients
    Values:
    blank: no statistically significant correlation detected
PvsGE_fish: correlation between cell proliferation and gene expression, represents the Fisher exact p-value of the correlation coefficient between different cohorts
    Values:
    blank: no statistically significant correlation detected
PvsGE_ffdr: correlation between cell proliferation and gene expression, represents the false discovery rate for the Fisher exact p-values (previous column)
    Values:
    blank: no statistically significant correlation detected
FC_consistency: upregulation or downregulation of the gene in driver knock down experiments
    Values:
    -1: negatively regulated in driver knock down experiments
    1: positively regulated in driver knock down experiments
    blank: regulated both negatively and positively in experiments with different driver knock downs
REG_DRIVERS_NUMB: number of drivers that regulate this gene
FCvsP_PUC: indicates whether this gene passed the proportion of unexpected correlations (PUC) test
    Values:
    1: passed
    blank: the proportion of unexpected correlations test could not be applied to this gene
ANP32E_target: indicates whether the gene is a target of the ANP32E driver gene
    Values:
    1: this gene is a target of this driver gene
    0: this gene is not a target of this driver gene
CDCA8_target: indicates whether the gene is a target of the CDCA8 driver gene
    Values:
    1: this gene is a target of this driver gene
    0: this gene is not a target of this driver gene
DTL_target: indicates whether the gene is a target of the DTL driver gene
    Values:
    1: this gene is a target of this driver gene
    0: this gene is not a target of this driver gene
EXO1_target: indicates whether the gene is a target of the EXO1 driver gene
    Values:
    1: this gene is a target of this driver gene
    0: this gene is not a target of this driver gene
ITGB3BP_target: indicates whether the gene is a target of the ITGB3BP driver gene
    Values:
    1: this gene is a target of this driver gene
    0: this gene is not a target of this driver gene
NEK2_target: indicates whether the gene is a target of the NEK2 driver gene
    Values:
    1: this gene is a target of this driver gene
    0: this gene is not a target of this driver gene
S100PBP_target: indicates whether the gene is a target of the S100PBP driver gene
    Values:
    1: this gene is a target of this driver gene
    0: this gene is not a target of this driver gene
TPX2_target: indicates whether the gene is a target of the TPX2 driver gene
    Values:
    1: this gene is a target of this driver gene
    0: this gene is not a target of this driver gene
FC_ANP32E: ratio of this gene's expression in ANP32E (driver gene) knocked down samples to its expression in scramble control transfected samples (control)
    Values:
    blank: this gene was not found to be statistically significantly regulated by this driver gene
FC_CDCA8: ratio of this gene's expression in CDCA8 (driver gene) knocked down samples to its expression in scramble control transfected samples (control)
    Values:
    blank: this gene was not found to be statistically significantly regulated by this driver gene
FC_DTL: ratio of this gene's expression in DTL (driver gene) knocked down samples to its expression in scramble control transfected samples (control)
    Values:
    blank: this gene was not found to be statistically significantly regulated by this driver gene
FC_EXO1: ratio of this gene's expression in EXO1 (driver gene) knocked down samples to its expression in scramble control transfected samples (control)
    Values:
    blank: this gene was not found to be statistically significantly regulated by this driver gene
FC_ITGB3BP: ratio of this gene's expression in ITGB3BP (driver gene) knocked down samples to its expression in scramble control transfected samples (control)
    Values:
    blank: this gene was not found to be statistically significantly regulated by this driver gene
FC_NEK2: ratio of this gene's expression in NEK2 (driver gene) knocked down samples to its expression in scramble control transfected samples (control)
    Values:
    blank: this gene was not found to be statistically significantly regulated by this driver gene
FC_S100PBP: ratio of this gene's expression in S100PBP (driver gene) knocked down samples to its expression in scramble control transfected samples (control)
    Values:
    blank: this gene was not found to be statistically significantly regulated by this driver gene
FC_TPX2: ratio of this gene's expression in TPX2 (driver gene) knocked down samples to its expression in scramble control transfected samples (control)
    Values:
    blank: this gene was not found to be statistically significantly regulated by this driver gene
sNW1: indicates whether the gene belongs to subnetwork #1 in the network
    Values:
    1: belongs
    0: does not belong
sNW2: indicates whether the gene belongs to subnetwork #2 in the network
    Values:
    1: belongs
    0: does not belong
sNW3: indicates whether the gene belongs to subnetwork #3 in the network
    Values:
    1: belongs
    0: does not belong

2. Abbreviations
FC: fold change
GE: gene expression
P: proliferation
PUC: proportion of unexpected correlations
sNW1: subnetwork #1
sNW2: subnetwork #2
sNW3: subnetwork #3

3. Null values
blank: no statistically significant relationship found

---------------------
TABULAR DATA-SPECIFIC INFORMATION FOR: Supplemental Table S11. Subnetworks pathway enrichment.csv
---------------------

1. Variables
Name of variable: description (units)

GeneSetName: name of the subnetwork in the full network the pathway belongs to
    Values:
    sNW1: subnetwork #1
    sNW2: subnetwork #2
    sNW3: subnetwork #3
Pathway Name: name of the pathway in the database
Pathway Id: identifier of the pathway in the database
Source Name: name of the database the pathway was found in
Pathway p-value: p-value (statistical significance level) of the pathway
Pathway p-value (corrected): corrected p-value (statistical significance level) of the pathway

2. Abbreviations
sNW1: subnetwork #1
sNW2: subnetwork #2
sNW3: subnetwork #3
Id: identifier

3. Null values
No null values in this file.

---------------------
TABULAR DATA-SPECIFIC INFORMATION FOR: Supplemental Table S12. Average Shortest Path Lengths And Unique Prol. Assoss. Targets.csv
---------------------

1. Variables
Name of variable: description (units)

Targeted Driver Pair: names of the two drivers that were knocked down in the experiment (separated by "+")
Average Shortest Path Length (all targets): average shortest path length in the full network between all targets of these two driver genes.
Average Shortest Path Length (prolif. corr. targets): average shortest path length in the full network between only the proliferation associated targets of these two driver genes.
Average Shortest Path Length (prolif. non-corr. targets): average shortest path length in the full network between only the proliferation non-associated targets of these two driver genes.
# of unique targets associated with proliferation: total number of unique targets of both drivers (union) associated with proliferation.

2. Abbreviations
No abbreviations in this file.

3. Null values
No null values in this file.
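
Illustrative sketch (not part of the original dataset or its methods): the Python snippet below shows one plausible way to relate the edge table (S9), the node table (S10) and the path lengths reported in S12. It builds an undirected graph from the NODE1/NODE2 columns, reads the <DRIVER>_target flags, and averages the shortest path lengths between the targets of the two drivers named in a "+"-separated pair label. The toy rows, the function names, the choice to average over cross-driver target pairs, and the choice to skip unreachable pairs are all assumptions made for illustration; the dissertation remains the authoritative description of how the published values were calculated.

```python
import csv
import io
from collections import deque

# Toy stand-ins for Supplemental Table S9 (edges) and S10 (nodes).
# The node identifiers and flag values are invented for illustration only.
EDGES_CSV = """NODE1,NODE2,cor_direction
g1,g2,1
g2,g3,1
g3,g4,-1
g4,g5,1
"""
NODES_CSV = """NODES,ANP32E_target,TPX2_target
g1,1,0
g2,1,0
g3,0,0
g4,0,1
g5,0,1
"""


def load_graph(edges_text):
    """Build an undirected adjacency map from the NODE1/NODE2 columns."""
    adjacency = {}
    for row in csv.DictReader(io.StringIO(edges_text)):
        a, b = row["NODE1"], row["NODE2"]
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def load_targets(nodes_text, driver):
    """Return the node identifiers flagged 1 in the <driver>_target column."""
    column = f"{driver}_target"
    reader = csv.DictReader(io.StringIO(nodes_text))
    return {row["NODES"] for row in reader if row[column] == "1"}


def bfs_distances(adjacency, source):
    """Unweighted shortest path lengths from source to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def average_shortest_path(adjacency, targets_a, targets_b):
    """Mean path length over reachable cross pairs (a, b) with a != b."""
    lengths = []
    for a in targets_a:
        distances = bfs_distances(adjacency, a)
        for b in targets_b:
            if a != b and b in distances:
                lengths.append(distances[b])
    return sum(lengths) / len(lengths) if lengths else None


def pair_average(pair_label, edges_text=EDGES_CSV, nodes_text=NODES_CSV):
    """Split a 'DRIVER1+DRIVER2' label and average the target-to-target paths."""
    driver_a, driver_b = [name.strip() for name in pair_label.split("+")]
    adjacency = load_graph(edges_text)
    return average_shortest_path(
        adjacency,
        load_targets(nodes_text, driver_a),
        load_targets(nodes_text, driver_b),
    )


print(pair_average("ANP32E+TPX2"))
```

To try this on the real tables, the two toy strings would be replaced by the contents of the S9 and S10 files; restricting the target sets to proliferation associated genes (for example via the PvsGE_corDir column) would be one way to approximate the "prolif. corr. targets" variant, again as an assumption.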
---------------------
TABULAR DATA-SPECIFIC INFORMATION FOR: Supplemental Table S13. Driver Pair KD Proliferation Measurements.csv
---------------------

1. Variables
Name of variable: description (units)

Targeted Driver/Driver Pair: name of the knocked down driver or pair of driver genes (separated by "+")
KD type: type of the knock down (single or double)
    Values:
    single: single knock down, where a single driver gene was knocked down
    double: double knock down, where a pair of driver genes was knocked down
experiment 1: proliferation of the knocked down sample in experiment #1, as a percentage of the scramble control (scramble control sample = 100%)
experiment 2: proliferation of the knocked down sample in experiment #2, as a percentage of the scramble control (scramble control sample = 100%)
experiment 3: proliferation of the knocked down sample in experiment #3, as a percentage of the scramble control (scramble control sample = 100%)
experiment 4: proliferation of the knocked down sample in experiment #4, as a percentage of the scramble control (scramble control sample = 100%)
experiment 5: proliferation of the knocked down sample in experiment #5, as a percentage of the scramble control (scramble control sample = 100%)
mean: mean of the measurements in experiments 1-5
median: median of the measurements in experiments 1-5

2. Abbreviations
No abbreviations in this file.

3. Null values
No null values in this file.
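
Illustrative sketch (not part of the original dataset): the Python snippet below shows how values of the kind stored in the S13 columns could be derived, assuming each experiment yields one raw proliferation reading for the knocked down sample and one for the scramble control. The readings, the function names and the normalisation from raw readings are hypothetical; the file itself only states that each value is a percentage of the scramble control (= 100%) and that mean and median are taken over experiments 1-5.

```python
from statistics import mean, median


def percent_of_control(knockdown_signal, scramble_signal):
    """Express a knock down reading as a percentage of the scramble control."""
    if scramble_signal <= 0:
        raise ValueError("scramble control reading must be positive")
    return 100.0 * knockdown_signal / scramble_signal


def summarise(label, percentages):
    """Derive KD type from the '+'-separated label and summarise the experiments."""
    drivers = [name.strip() for name in label.split("+")]
    return {
        "drivers": drivers,
        "kd_type": "single" if len(drivers) == 1 else "double",
        "mean": mean(percentages),
        "median": median(percentages),
    }


# Invented raw readings (arbitrary units) for five experiments.
scramble = [200.0, 200.0, 200.0, 200.0, 200.0]
knockdown = [100.0, 120.0, 80.0, 140.0, 110.0]

percentages = [percent_of_control(k, s) for k, s in zip(knockdown, scramble)]
row = summarise("NEK2+TPX2", percentages)
print(percentages)
print(row)
```

A value below 100 in this scheme means the knock down reduced proliferation relative to the scramble control, which is how the experiment columns above are meant to be read.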