major confounding factor -- methanol (wood alcohol) from cigarettes and aspartame circulates with blood half-life 3 hours, entering every cell -- made into uncontrolled formaldehyde inside cells in human tissues with high ADH1 enzyme levels -- WC Monte paradigm: Rich Murray 2013.11.16
Just now, "by chance", I tried a PubMed search for "methanol formaldehyde toxicity"... I share this introduction to three recent papers that exemplify the exponential explosion of often very subtle gene expression research that traces the detailed pathways of harm from very low levels of common toxins -- the last part of this post gives many links for the methanol formaldehyde toxicity paradigm of Prof. Woodrow C. Monte, Food Science and Nutrition, Arizona State University, retired 2004, who gives a free online archive of 745 full text medical research references at his informative site WhileScienceSleeps.com ...

Hum Exp Toxicol. 2013 Nov 12. [Epub ahead of print]
In vitro study on cytotoxicity and intracellular formaldehyde concentration changes after exposure to formaldehyde and its derivatives.
Ke Y, Qin X, Zhang Y, Li H, Li R, Yuan J, Yang X, Ding S.
Hubei Key Laboratory of Genetic Regulation and Integrative Biology, College of Life Science, Huazhong Normal University, Wuhan, People's Republic of China.
Shumao M Ding <dingsm@...>, Laboratory of Environmental Biomedicine, College of Life Sciences, Central China Normal University, No. 152, Luo-Yu Road, Wuhan City 430079, People’s Republic of China.
Email: dingsm@...

Abstract
HeLa cells were exposed to formaldehyde and its metabolic derivatives, methanol, formic acid, and acetaldehyde, to investigate that the toxicity of formaldehyde is not caused by the chemical group. After 1 h of treatment with formaldehyde, mitochondrial assays showed that low concentrations (e.g. 10 μmol/L) of formaldehyde promoted growth of the HeLa cells, while higher concentrations (e.g.
≥62.5 μmol/L) inhibited cell growth; while all four chemicals at a concentration of 125 μmol/L affected cell growth, formaldehyde affected the largest. Reactive oxygen species concentration increased with the concentration of the exposure chemical. The endogenous formaldehyde content increased the most in the formaldehyde group, but in other three groups, it did not increase as the exposure concentration increased. Expression of dehydrogenase (formaldehyde dehydrogenase (FDH)) in the formaldehyde (10.40) and methanol (10.60) groups increased significantly compared with the control (1), while it was similar to the control in formic acid (0.90) and acetaldehyde (1.10) groups. Our results suggest that formaldehyde could affect cell activity and even enter cells. Exposure to formaldehyde changes the endogenous formaldehyde concentration in cells within 24 h, and this induces expression of FDH for formaldehyde degradation to maintain the formaldehyde balance. The toxicity of formaldehyde is not caused by the carbon atoms in the aldehyde, hydroxyl, or carboxyl groups. Formaldehyde is hypothesized to be an important signaling molecule in the regulation of cell growth and maintenance of the endogenous formaldehyde level.
KEYWORDS: Formaldehyde, formaldehyde content, formaldehyde dehydrogenase, reactive oxygen species, real-time qPCR
PMID: 24220877

Inhaled formaldehyde induces DNA–protein crosslinks and oxidative stress in bone marrow and other distant organs of exposed mice (pages 705–718)
Xin Ye 1, Zhiying Ji 2, Chenxi Wei 1, Cliona M.
McHale 2, Shumao Ding 1, Reuben Thomas 2, Xu Yang 1,*, Luoping Zhang 2,*
Article first published online: 18 OCT 2013
DOI: 10.1002/em.21821
Copyright © 2013 Wiley Periodicals, Inc.
Environmental and Molecular Mutagenesis
Volume 54, Issue 9, pages 705–718, December 2013

Author Information
1 Laboratory of Environmental Biomedicine, Hubei Key Laboratory of Genetic Regulation and Integrative Biology, College of Life Sciences, Huazhong Normal University, Wuhan, People's Republic of China
2 Genes and Environment Laboratory, Division of Environmental Health Sciences, School of Public Health, University of California, Berkeley, California
*Correspondence to: Luoping Zhang <luoping@...>, Genes and Environment Laboratory, Division of Environmental Health Sciences, 388 Li-Ka-Shing Center, School of Public Health, University of California, Berkeley, Berkeley, CA 94720, USA. E-mail: luoping@...
or Xu Yang <yangxu@...>, Laboratory of Environmental Biomedicine, Hubei Key Laboratory of Genetic Regulation and Integrative Biology, College of Life Sciences, Huazhong Normal University, Wuhan 430079, P.R. China. E-mail: yangxu@...
Reuben Thomas <reuben.thomas@...>, School of Public Health, 50 University Hall, University of California Berkeley, Berkeley, California 94720-7356. E-mail: reuben.thomas@...
Cliona M. McHale <cmchale@...>, Division of Environmental Health Sciences, 237 Hildebrand Hall, Genes and Environment Laboratory, School of Public Health, University of California, Berkeley, CA 94720, USA. E-mail: cmchale@...

Publication History
Issue published online: 6 NOV 2013
Article first published online: 18 OCT 2013
Manuscript Revised: 16 SEP 2013
Manuscript Accepted: 16 SEP 2013
Manuscript Received: 22 JAN 2013

Funded by
National Natural Science Foundation of China. Grant Number: 51136002
National Institute of Health.
Grant Number: R01ES017452
National Institute of Environmental Health Sciences

Formaldehyde (FA), a major industrial chemical and ubiquitous environmental pollutant, has been classified as a leukemogen. The causal relationship remains unclear, however, due to limited evidence that FA induces toxicity in bone marrow, the site of leukemia induction, and in other distal organs. Although induction of DNA–protein crosslinks (DPC), a hallmark of FA toxicity, was not previously detected in the bone marrow of FA-exposed rats and monkeys in studies published in the 1980s, our recent studies showed increased DPC in the bone marrow, liver, kidney, and testes of exposed Kunming mice. To confirm these preliminary results, in the current study we exposed BALB/c mice to 0, 0.5, 1.0, and 3.0 mg m−3 FA (8 hr per day, for 7 consecutive days) by nose-only inhalation and measured DPC levels in bone marrow and other organs of exposed mice. As oxidative stress is a potential mechanism of FA toxicity, we also measured glutathione (GSH), reactive oxygen species (ROS), and malondialdehyde (MDA), in the bone marrow, peripheral blood mononuclear cells, lung, liver, spleen, and testes of exposed mice. Significant dose-dependent increases in DPC, decreases in GSH, and increases in ROS and MDA were observed in all organs examined (except for DPC in lung). Bone marrow was among the organs with the strongest effects for DPC, GSH, and ROS. In conclusion, exposure of mice to FA by inhalation induced genotoxicity and oxidative stress in bone marrow and other organs. These findings strengthen the biological plausibility of FA-induced leukemogenesis and systemic toxicity.
Keywords: formaldehyde; DPC; oxidative stress; leukemia; bone marrow toxicity

"By using good study designs with precise, individual exposure measurements, sufficient power and incorporation of phenotypic anchors, studies in human populations can identify biomarkers of exposure and/or early effect and elucidate mechanisms of action underlying associated
diseases, even at low doses. Analysis of datasets at the pathway level can compensate for some of the limitations of RNA-Seq and, as more datasets become available, will increasingly elucidate the exposure-disease continuum."

free full text HTML
Review Article
Analysis of the transcriptome in molecular epidemiology studies
Cliona M. McHale*, Luoping Zhang, Reuben Thomas, Martyn T. Smith <martynts@...>
Article first published online: 1 AUG 2013
DOI: 10.1002/em.21798
Copyright © 2013 Wiley Periodicals, Inc.
Environmental and Molecular Mutagenesis
Special Issue: Special Issue on Application of Omics Techniques to Epidemiological Studies
Volume 54, Issue 7, pages 500–517, August 2013
McHale, C. M., Zhang, L., Thomas, R. and Smith, M. T. (2013), Analysis of the transcriptome in molecular epidemiology studies. Environ. Mol. Mutagen., 54: 500–517. doi: 10.1002/em.21798
Division of Environmental Health Sciences, Genes and Environment Laboratory, School of Public Health, University of California, Berkeley, California
*Correspondence to: Cliona M. McHale, Division of Environmental Health Sciences, 237 Hildebrand Hall, Genes and Environment Laboratory, School of Public Health, University of California, Berkeley, CA 94720, USA. E-mail: cmchale@...

Publication History
Issue published online: 14 AUG 2013
Article first published online: 1 AUG 2013
Manuscript Accepted: 8 JUN 2013
Manuscript Revised: 7 JUN 2013
Manuscript Received: 20 MAR 2013

Funded by
National Institutes of Health.
Grant Number: P42ES004705

The human transcriptome is complex, comprising multiple transcript types, mostly in the form of non-coding RNA (ncRNA). The majority of ncRNA is of the long form (lncRNA, ≥ 200 bp), which plays an important role in gene regulation through multiple mechanisms including epigenetics, chromatin modification, control of transcription factor binding, and regulation of alternative splicing. Both mRNA and ncRNA exhibit additional variability in the form of alternative splicing and RNA editing. All aspects of the human transcriptome can potentially be dysregulated by environmental exposures. Next-generation RNA sequencing (RNA-Seq) is the best available methodology to measure this although it has limitations, including experimental bias. The third phase of the MicroArray Quality Control Consortium project (MAQC-III), also called Sequencing Quality Control (SeQC), aims to address these limitations through standardization of experimental and bioinformatic methodologies. A limited number of toxicogenomic studies have been conducted to date using RNA-Seq. This review describes the complexity of the human transcriptome, the application of transcriptomics by RNA-Seq or microarray in molecular epidemiology studies, and limitations of these approaches including the type of cell or tissue analyzed, experimental variation, and confounding. By using good study designs with precise, individual exposure measurements, sufficient power and incorporation of phenotypic anchors, studies in human populations can identify biomarkers of exposure and/or early effect and elucidate mechanisms of action underlying associated diseases, even at low doses. Analysis of datasets at the pathway level can compensate for some of the limitations of RNA-Seq and, as more datasets become available, will increasingly elucidate the exposure-disease continuum.
Keywords: transcriptome; biomarker; long non-coding RNA; RNA-Seq; microarray

INTRODUCTION

The transcriptome
is dynamic, continuously responding to changing physiological and environmental conditions in a cell, tissue, or organism, and its analysis provides the first functional readout between the genome and the expressed phenotype. Transcriptomics, the analysis of the transcriptome, has long been a cornerstone of toxicogenomic studies and has been increasingly applied in human molecular epidemiology. Microarray analysis, a hybridization-based methodology, became the most widely used technology for analysis of known transcriptomes due to its low cost, ease of use and analysis, and optimized framework of quality control [Brazma et al., 2001]. However, the rapid evolution of next generation RNA sequencing technology (RNA-Seq) [Wang et al., 2009; Pertea, 2012], which directly measures an entire transcriptome encompassing both known and novel components, has expanded our understanding of the scope and complexity of the human transcriptome and the myriad ways in which it can potentially be altered on the exposure-disease continuum. The unfolding knowledge produced by RNA-Seq offers the potential for a deeper understanding of the mechanism of action of chemical exposures as well as new opportunities to identify biomarkers of toxicity and early disease.

In this review, we sought to summarize recent developments in our understanding of the human transcriptome and its analysis and to discuss key considerations in the application of transcriptomics in molecular epidemiology studies. We searched the peer-reviewed scientific literature in PubMed through February 2013 using combinations of search terms including transcriptome, transcriptomics, microarray, RNA sequencing, toxicogenomics, disease, molecular epidemiology, exposure, non-coding RNA (ncRNA), long non-coding RNA (lncRNA), small non-coding RNA (sncRNA), and human.

COMPOSITION AND FUNCTION OF THE HUMAN TRANSCRIPTOME

The current concept of a gene -- a DNA sequence that is transcribed to a functional product -- includes protein-coding
genes, of which there are ∼22,000 in the human genome [Pertea and Salzberg, 2010], as well as non-protein coding genes, bringing the total number of estimated genes to 30,000–40,000 [Pertea, 2012]. Non-coding RNA is broadly categorized as sncRNA (≤ 200 bp) and lncRNA (> 200 bp). This categorization is based on size rather than biological significance; the different ncRNA classes have distinct biogenesis machineries and functions. It was recently estimated that of the base pairs in the human transcriptome, 62% are in the form of mRNAs, 53% lncRNAs, and 0.7% sncRNAs (numbers do not add up to 100% as some base pairs are part of overlapping transcripts that fall into different categories) [Pertea, 2012]. As ncRNAs are generally smaller than mRNAs, the vast majority of human transcripts comprise ncRNAs, with lncRNAs and sncRNAs numbering 28,191 and 10,473, respectively, compared with only 8,490 mRNAs [Pertea, 2012]. Overall, ncRNAs exhibit complex patterns of expression and regulation and play important roles in transcriptional and post-transcriptional gene regulation via cis- and trans-acting mechanisms, chromatin modification, control of transcription factor binding, and regulation of alternative splicing [Pertea, 2012].

sncRNAs include the well-known ribosomal RNA (rRNA) and transfer RNA (tRNA), as well as micro RNA (miRNA), small interfering RNA (siRNA), small nucleolar RNA (snoRNA), PIWI-interacting RNA (piRNA), and small nuclear RNA (snRNA) [Martens-Uzunova et al., 2013]. miRNAs and siRNAs are post-transcriptional modulators of gene expression that bind to specific mRNA targets; snoRNAs guide the chemical modification of other RNAs; snRNAs function in the processing of pre-mRNA; and piRNAs form RNA-protein complexes through interactions with PIWI proteins and mediate epigenetic and post-transcriptional gene silencing of retrotransposons and other genetic elements in germ line cells. Dysregulation of ncRNA is involved in cancer and in neurological, developmental, cardiovascular
and other diseases [Esteller, 2011].

lncRNA in Exposure and Disease

lncRNAs are dysregulated in various types of cancer and in other diseases [Taft et al., 2010; Esteller, 2011; Prensner et al., 2011; Moran et al., 2012] and, mechanistically, may act as tumor suppressor genes, e.g., lincp21 [Huarte et al., 2010], as protooncogenes, e.g., GAGE6 [Li et al., 2009], as promoters of metastasis, e.g., HOTAIR in breast cancer [Gupta et al., 2010], and as regulators of alternative splicing, e.g., MALAT1 in lung cancer [Tripathi et al., 2010]. In a limited number of studies, stress and environmental exposure have been shown to alter expression of lncRNAs, e.g., SatIII in the heat shock response [Jolly and Lakhotia, 2006], several long intergenic non-coding RNAs (lincRNAs) regulated by the p53 pathway in the DNA damage response [Huarte et al., 2010], Psoriasis susceptibility-related RNA Gene Induced by Stress (PRINS) by ultraviolet-B irradiation and viral infection [Sonkoly et al., 2005], and several lncRNAs by the tobacco carcinogen nicotine-derived nitrosamine ketone in normal human bronchial epithelial cells [Silva et al., 2010].

sncRNA in Exposure and Disease

Expression of sncRNAs is altered in response to environmental chemical exposures. Using bioinformatics approaches, miRNA target genes were found to be significantly enriched among genes reported to have their expression altered by environmental chemicals [Wu and Song, 2011]. miRNA expression is altered by multiple environmental factors including arsenic, metal-rich particulate matter, cigarette smoke, dioxins, and benzo[a]pyrene [Choudhuri, 2010; Wu and Song, 2011; Smirnova et al., 2012], and in various diseases [Rederstorff and Huttenhofer, 2010; Esteller, 2011; Law et al., 2013].

Arsenic

Results from studies in several human cell lines suggest a role for altered miRNA expression in arsenic toxicity. Exposure of human TK6 lymphoblastoid cells to sodium arsenite altered the expression of five miRNAs (upregulation of hsa-miR-22, hsa-miR-34a,
hsa-miR-221, and hsa-miR-222 and downregulation of hsa-miR-210) [Marsit et al., 2006]. Long-term exposure of TP53-knockdown human bronchial epithelial cells to low doses of sodium arsenite induced downregulation of miR-200 family members and malignant transformation [Wang et al., 2011]. Exposure of human umbilical vein endothelial cells to sodium arsenite induced upregulation of five miRNAs and downregulation of 52 miRNAs, suggesting a role in arsenic-induced vascular injury [Li et al., 2012]. Exposure of T24 human bladder carcinoma cells to arsenic trioxide, an anticancer agent, was found to downregulate expression of oncogenic miR-19a and to upregulate expression of PTEN, a target of miR-19a [Cao et al., 2011]. Exposure of the acute promyelocytic leukemia cell line, NB4, to pharmacological concentrations of arsenic trioxide induced apoptosis and upregulation of the expression of 48 miRNAs with tumor and metastatic suppressor function [Ghaffari et al., 2012].

Particulate Matter

Steel workers at a production plant in Italy (n = 63), who were exposed to particulate matter (PM) and metallic PM components including arsenic, exhibited significantly increased blood leukocyte expression of miR-222 and miR-21 after 3 days of work, compared with baseline expression levels [Bollati et al., 2010]. Exposure of human bronchial epithelial cells, grown at an air–liquid interface, to diesel-generated PM induced altered expression (>1.5-fold) of 197 miRNAs (130 upregulated, 67 downregulated) [Jardim et al., 2009].

Cigarette Smoke

Cigarette smoke exposure has been shown to alter miRNA expression in human subjects and human cell lines. In 20 healthy subjects (10 active smokers compared with 10 never smokers), Schembri et al.
found that expression of 28 miRNAs was altered (80% downregulated) in the bronchial airway epithelial cells of smokers compared with non-smokers. Follow-up studies on miR-218 (downregulated 4-fold in the smokers) showed that downregulation of miR-218 induced a number of smoking-related genes in airway epithelium, revealing a potential mechanism of smoking-induced disease risk [Schembri et al., 2009]. In a separate study by Mascaux et al., downregulation of miRNA expression, including miR-218, was detected in the biopsied bronchial epithelium of smokers with metaplasia or dysplasia compared with normal epithelium of nonsmokers.

Additional RNA Diversity

In addition to comprising multiple RNA classes, the mammalian transcriptome exhibits additional diversity. More than 90% of multiexon protein-coding genes [Kampa et al., 2004; Wang et al., 2008] and 30% of ncRNA genes [Ravasi et al., 2006; Cabili et al., 2011] undergo alternative splicing, contributing to cellular and functional diversity.
RNA editing, a process by which single nucleotide changes occur after RNA has been transcribed, affects both protein-coding and ncRNA genes [Athanasiadis et al., 2004; Sie and Kuchka, 2011; Peng et al., 2012] and may generate even more transcriptome diversity than alternative splicing [Barak et al., 2009].

Further complexity in the human transcriptome comes from inherited inter-individual variability in gene expression and sequence. The majority (∼90%) of disease- and trait-associated single nucleotide polymorphisms identified by genome-wide association studies are intronic or intergenic [Freedman et al., 2011]. These inherited loci, expression quantitative trait loci (eQTL), may act in cis or trans and account for gene expression variation in the population [Morley et al., 2004; Goring et al., 2007; Idaghdour et al., 2010; Powell et al., 2012].

All aspects of the transcriptome can potentially be dysregulated on the exposure-disease continuum and have variously been assessed globally by two main technologies, microarrays and RNA-Seq.

ANALYSIS OF THE HUMAN TRANSCRIPTOME

Microarray

Through probe-based hybridization, gene expression microarrays analyze the expression of known transcripts at the resolution of genes or, in the case of exon or tiling arrays, of exons and known splicing isoforms [Kirby et al., 2007]. The main advantages of microarrays are their affordability and low computational complexity. However, this hybridization-based method suffers from background issues leading to a noisy output signal and cross-hybridization issues leading to false positive signals. Furthermore, microarrays are semiquantitative, have a limited dynamic range due to signal saturation, and lack the sensitivity to detect low-abundance transcripts. Quality control practices designed to overcome biases and experimental variability and to validate the technology and data analysis in risk assessment have been developed by the MicroArray Quality Control (MAQC) Consortium and have been widely accepted
by the scientific community [Shi et al., 2006, 2010]. Despite these improvements, microarrays still suffer from limitations including an inability to comprehensively detect novel transcripts or splice variants, which can be detected by RNA-Seq.

The contribution of the microarray era to toxicogenomics is considerable, as evidenced by repositories of data stored at free, publicly accessible databases: Gene Expression Omnibus (GEO) [Barrett et al., 2009, 2011, 2013], CEBS [Waters et al., 2008b], ArrayExpress [Brazma et al., 2003; Parkinson et al., 2005, 2007; Rustici et al., 2013], Japanese Toxicogenomics Project (TGP) [Uehara et al., 2010], and DrugMatrix [Ganter et al., 2006]. Microarray analysis has been applied to discovery of biomarkers of exposure and early effect, mechanism of action and risk assessment [Cui and Paules, 2010; McHale et al., 2010; Currie, 2012]. We identified genes and pathways significantly altered by benzene exposure in the peripheral blood mononuclear cells of 125 workers exposed to a range of benzene levels (< 1 ppm to > 10 ppm) [McHale et al., 2011]. Among the most significant pathways were acute myeloid leukemia (AML) and immune response. A 16-gene expression signature, with the genes having roles in immune response, inflammatory response, cell adhesion, cell–matrix adhesion, and blood coagulation, was associated with all levels of benzene exposure.

Next-Generation RNA Sequencing

RNA-Seq allows the entire transcriptome to be analyzed in a high-throughput and quantitative manner [Wang et al., 2009]. In Illumina's sequencing by synthesis (SBS) approach, during each sequencing cycle, fluorescently labeled reversible terminators are incorporated, imaged and cleaved to allow incorporation of the next base. Sequencing is conducted simultaneously on millions of different anchored template molecules in parallel. Amplification of template molecules generates a detectable signal. Unlike hybridization-based methods, RNA-Seq has no saturation bias and relatively low
background noise, a much larger dynamic range than microarrays and the ability to detect low expressed genes. Since it is not limited to the detection of known transcripts, the most significant advantage associated with RNA-Seq is its ability to assess diverse aspects of the transcriptome. RNA-Seq has led to the discovery of new species of ncRNAs [MacLean et al., 2009; Zhou et al., 2010] such as piRNA and has increased understanding of the expression and regulation of lncRNAs [Atkinson et al., 2012]. It also has unprecedented base pair resolution, allowing the precise identification of exon and intron boundaries. Co-analysis of genotype data and RNA-Seq data in 60–70 individuals led to the discovery of new eQTLs [Montgomery et al., 2010; Pickrell et al., 2010]. RNA-Seq has revealed several classes of alterations in cancer cells, revealing pathogenic mechanisms, including fusion transcripts, alternative splicing, allelic imbalance, single nucleotide variations, and RNA editing [Morin et al., 2008; Chepelev, 2012; Pertea, 2012; Costa et al., 2013; Mutz et al., 2013]. RNA-Seq has even been applied to identify known and novel microorganisms in high throughput sequencing data generated from human tissue, using a program called PathSeq, which performs computational subtraction to identify non-human nucleic acids [Kostic et al., 2011]. Thus, RNA-Seq has the ability to measure the complete transcriptomic response to stress and stimuli and to clarify the interrelationship among all aspects of gene expression regulation.

Despite its significant advantages over microarrays, RNA-Seq suffers from limitations [Fang and Cui, 2011; Schwartz et al., 2011; Hansen et al., 2012; Costa et al., 2013]. Standard methods of preparation of RNA for RNA-Seq use selection of polyadenylated (polyA) RNAs or rRNA depletion (ribo-depletion). Some studies have suggested that ribo-depletion may be superior for producing reliable coding and non-coding gene expression data [Cui et al., 2010; Huang et al.,
2011]. Experimental bias can arise at multiple steps in sample preparation for RNA-Seq, including fragmentation, cDNA amplification efficiency, and PCR amplification. Another source of bias is heterogeneity in coverage across the length of a transcript [Bullard et al., 2010; Hansen et al., 2010]. Third-generation sequencing approaches are being developed that directly sequence RNA without the need for cDNA or PCR amplification, but they have high error rates and are costly [Schadt et al., 2010].

Challenges in data generation, storage and interpretation also exist, as discussed previously [Costa et al., 2010, 2013; Pertea, 2012; Mutz et al., 2013; Riedmaier and Pfaffl, 2013]. Up to 600 gigabases (Gb) can be generated in a single run, equivalent to 200-fold coverage of the human genome [Pertea, 2012]. Data analysis consists of aligning the short read sequences to a reference genome or transcriptome (or assembling them de novo), counting the number of mapped reads, and calculating transcript expression levels and differential gene expression using normalized gene expression scores and statistical tests. Alignment can be challenging in the case of alternative transcripts with shared exons, strand-specific sequences, and low-abundance transcripts [Pertea, 2012]. For differential expression, non-parametric algorithms are less dependent on sequencing depth and may achieve more robust results [Tarazona et al., 2011].

The third phase of the MAQC project, MAQC-III, called Sequencing Quality Control (SeQC), seeks to assess the technical performance of RNA-Seq platforms through the generation of benchmark data sets with reference samples (http://www.fda.gov/MicroArrayQC/). Tools to assess sequencing performance and library quality, which are critical to the interpretation of RNA-Seq data, are being developed, such as RNA-SeQC, a program which provides key measures of data quality [DeLuca et al., 2012]. RNA-SeQC metrics include yield, alignment and duplication rates; GC bias, rRNA content, regions of
alignment (exon, intron, and intragenic), continuity of coverage, 3′/5′ bias and count of detectable transcripts.

RNA-Seq has been used to analyze the transcriptome of multiple diseases [Kavanagh et al., 2012; Raghavachari et al., 2012; Costa et al., 2013; Jakhesara et al., 2013; Mills et al., 2013]. Fewer toxicogenomic and molecular epidemiology studies using RNA-Seq have been published [Beane et al., 2011; Su et al., 2011; Hackett et al., 2012]. Studies comparing RNA-Seq and microarray data have illustrated strengths and weaknesses of both technologies, enabling their optimization.

Comparison of RNA-Seq and Microarray Data

Several studies have addressed the overlap of RNA-Seq and microarray data in mammalian cells. Liu et al. profiled gene expression in chimpanzees and rhesus macaques using high-density Affymetrix Human Exon Junction Array and Illumina RNA-Seq. They identified 40% more differentially expressed genes by RNA-Seq, reflecting the greater dynamic range. They observed a systematic increase in the RNA-Seq error rate for low-expressed genes, which they further confirmed in the comparison of two MAQC human reference RNA samples. This increased error may be due to insufficient coverage; 10–20 million reads were generated per sample in the study. Marioni et al. reported high coefficients of variation (CV) at low read counts. As RNA-Seq is a sampling method, stochastic Poisson counting error is a source of error in the quantification of rare transcripts. Raghavachari et al.
compared the whole blood transcriptomes of six sickle cell disease (SCD) patients and three controls by Illumina RNA-Seq and Affy Human Exon 1.0 ST microarray. As with Liu et al., they reported a greater sensitivity and dynamic range for RNA-Seq. They reported a higher technical variability among RNA-Seq replicates, indicated by CV independent of expression level, perhaps due to an inadequate sequencing depth of 10 million reads. Nonetheless, both platforms revealed similar biology of SCD. In addition, RNA-Seq identified 16 alternatively spliced genes, as well as novel gene expression from an unannotated genomic region, a novel exon in the ALAS2 gene, and mutations in transcripts.

Two studies examined the alteration of global gene expression in human airway epithelium by cigarette smoking, using both RNA-Seq and microarrays. Hackett et al. performed RNA-Seq to quantify the human small airway epithelium (SAE) transcriptome of five nonsmokers and six healthy smokers and compared the data with Affymetrix Human Genome U133 Plus 2.0 microarray data generated from 12 healthy smokers and 12 non-smokers. Beane et al. compared Illumina RNA-Seq data and Affymetrix Exon 1.0 ST and HGU133A 2.0 microarray data in pooled bronchial airway epithelial cell samples from healthy never smoker (n = 3) and current smoker volunteers (n = 3) and smokers with (n = 8) and without lung cancer (n = 5) [Beane et al., 2011]. In both studies, the two methods were well correlated at the fold change level. RNA-Seq detected many additional smoking-related [Beane et al., 2011; Hackett et al., 2012] and cancer-related transcripts [Beane et al., 2011]. Cigarette smoke creates a field of injury in the airway epithelium of the respiratory tract [Brody, 2012], and these new data have expanded the understanding of the biology of that effect induced by smoking. In addition, Hackett et al. found that smoking had no effect on SAE gene splicing, a known feature of SAE in lung cancer [Xi et al., 2008; Misquitta-Ali et al., 2011] and Beane et al.
identified differentially expressed ncRNAs (lincRNAs, pseudogenes, and processed transcripts), which may have important gene regulatory functions in lung carcinogenesis. Beane et al. also suggested that to completely characterize the transcriptome, library preparation protocols that measure the expression of non-polyadenylated RNAs are needed, using longer read lengths and paired-end (PE) sequencing, both of which yield a higher percentage of mapped reads (greater than 30 M).

At 30–50 M reads, Su et al. found that Illumina RNA-Seq was more sensitive than Affymetrix Rat Genome 230 2.0 microarrays at detecting aristolochic acid-induced transcriptional changes in low-expressed genes in the kidneys of exposed rats (n = 4) compared with controls (n = 4). RNA-Seq detected ∼50% more differentially expressed genes and 300% more if multiple testing correction was applied to the statistical analysis. Both platforms revealed the underlying biology but RNA-Seq was more sensitive.

Van Delft et al. examined the effects of BaP in HepG2 cells after 12 and 24 hr by RNA-Seq and Affy HGU133 Plus 2.0 GeneChip array [van Delft et al., 2012]. RNA-Seq detected 20% more genes and 3-fold more differentially expressed genes as well as providing more insight into biology and mechanisms, through the identification of more significant pathways and processes. In addition, RNA-Seq revealed novel isoforms, including novel exon-skipping events in 735 genes and splice variants with altered expression in 839 genes.

In a pilot case study described in detail in an accompanying article in this issue [Thomas et al., 2013], we analyzed by RNA-Seq the transcriptomes of 10 workers highly exposed (> 5 ppm) to the leukemogen benzene and 10 unexposed control study subjects matched by age, sex, and smoking status that had been previously analyzed by microarray [McHale et al., 2011]. We compared the data obtained by both methods. Overall correlation between RNA-Seq and microarray intensities was 0.6, comparable with published studies [Marioni et
al., 2008; Bradford et al., 2010] and suggested that RNA-Seq was better able to detect low intensity gene expression. In these 20 subjects, we identified 146 statistically significant differentially expressed genes (including 29 ncRNAs) by RNA-Seq compared with 1 gene by microarray. There was overlap among the genes and pathways identified in the RNA-Seq pilot study and those identified in 125 subjects by microarray. We also identified differential splicing as a potential mechanism of benzene toxicity, which should be further investigated as splicing diversity is involved in hematopoietic function and differentiation [Tondeur et al., 2010], and alternative splicing is a known leukemogenesis pathway [Maciejewski and Padgett, 2012].

CONSIDERATIONS IN TRANSCRIPTOME STUDY DESIGN, ANALYSIS, AND INTERPRETATION

Multiple factors influence the usefulness of transcriptomics in molecular epidemiology studies, including the type of tissue or cell type analyzed, study design, and analysis and interpretation of the results, as summarized in Figure 1 and described in detail below.

Figure 1.
Considerations in transcriptome study design, analysis, and interpretation in molecular epidemiology studies.

Choice of Target Tissue/Cell Type to Analyze

In human studies, transcriptomic effects are typically analyzed in readily available tissues such as blood. Such studies have revealed biomarkers of exposure and mechanisms of toxicity in whole blood or peripheral blood leukocytes in populations exposed to benzene [Forrest et al., 2005; McHale et al., 2011], arsenic [Argos et al., 2006; Andrew et al., 2008], perfluorooctanoic acid [Rylander et al., 2011], acetaminophen (APAP) [Fannin et al., 2010; Jetten et al., 2012], metal fumes [Wang et al., 2005], cadmium [Dakeshita et al., 2009], diesel exhaust [Peretz et al., 2007; Pettit et al., 2012], dioxin [McHale et al., 2007], cigarette smoke [Spira et al., 2004a-2004c; Charlesworth et al., 2010; Beineke et al., 2012; Bosse et al., 2012], polychlorinated hydrocarbons [Dutta et al., 2012; Mitra et al., 2012], and metal-rich particulate matter (miRNA) [Bollati et al., 2010]. Biomarkers and mechanisms of disease have also been revealed through analysis of peripheral blood transcriptomes in a range of diseases including colorectal cancer [Han et al., 2008], autism spectrum disorder [Kong et al., 2012], sickle cell disease [Raghavachari et al., 2012], hypertension and Type 2 Diabetes [Stoynev et al., 2013], coronary artery disease [Nuhrenberg et al., 2013], myocardial infarction [Kiliszek et al., 2012], aggressive/advanced prostate cancer [Liong et al., 2012], Alzheimer's disease [Lunnon et al., 2013], and Graves' disease [Liu et al., 2012].

It is unclear how good a surrogate the blood transcriptome is for target tissues. The peripheral blood transcriptome was shown to overlap considerably with those of nine other tissue types in healthy individuals [Liew et al., 2006]. A study mapping expressed genes to gene ontologies recently showed that the white blood cell transcriptome was a good surrogate for a generalized multiorgan transcriptome constructed
using profiles from healthy and diseased individuals [Kohane and Valtchinov, 2012]. Similar patterns of gene expression in RNA-stabilized whole blood and lung were reported for early stage lung adenocarcinoma [Rotunno et al., 2011] and non-small cell lung cancer (NSCLC) cases compared with controls [Showe et al., 2009; Zander et al., 2011]. However, fewer probes (∼300), by an order of magnitude, were found to be affected by smoking in blood lymphocytes [Charlesworth et al., 2010] than in non-tumor lung tissue of lung cancer patients (> 300) [Bosse et al., 2012]. Though the number of differentially expressed probes identified was likely influenced by the statistical methodologies used, this suggests that effects in blood may not fully recapitulate those in the target tissues. In the NSCLC studies, the cancer-associated genes were enriched in immune function [Showe et al., 2009; Zander et al., 2011], and peripheral blood mononuclear cell (PBMC) gene signatures of immune response were also shown to predict NSCLC outcome [Kossenkov et al., 2012; Showe et al., 2012].

Immune response is frequently impacted in toxicogenomic studies. Studies examining gene expression changes in the blood transcriptome during liver injury induced by APAP have identified mechanisms and potential predictive biomarkers of hepatotoxicity in rats and humans [Cui and Paules, 2010; Fannin et al., 2010]. Similar expression changes and biological processes, including immune response, in a subset of genes in liver and blood of APAP exposed rats were reported by analysis of toxicogenomics data from prior studies using an Extracting Patterns and Identifying Co-expressed Genes (EPIG) approach [Zhang et al., 2012]. Using available rat gene expression data sets, Huang et al. demonstrated that blood gene expression profiles could predict liver necrosis induced by exposure to a wide variety of hepatotoxicants and validated their findings in independent data sets [Huang et al., 2010]. Pathways impacted by the predicted
genes included immune (Toll-like receptor signaling) and inflammatory response. We have reported effects of benzene on immune response, including Toll-like receptor signaling and B- and T-cell receptor signaling, in exposed individuals [McHale et al., 2011]. Together, these data suggest that the blood transcriptome may identify biomarkers relevant to the organ of interest and capture processes such as immune response, but may not capture the entirety of the mechanistic response in the target tissue. Even for immune response, circulating immune cell subsets may not fully reflect the entire immune response at the tissue level.

Another challenge associated with analysis of the blood transcriptome is that it is typically analyzed in whole blood or PBMC, a mixed cell population in which proportions of distinct cell types vary by individual. Whole blood or PBMC are typically analyzed because it is not always known a priori which blood population to analyze, and available samples or sample preparation and storage techniques may preclude such analyses. Furthermore, positive selection approaches such as incubation of PBMCs with anti-CD19 or anti-CD20 to purify B-cell populations may activate cell-surface receptors and alter gene expression. However, blood cell populations may be altered by certain hematotoxic exposures, e.g., benzene [Lan et al., 2004] and trichloroethylene [Lan et al., 2010], and changes in gene expression may simply reflect this. Statistical analysis can be used to adjust for blood cell counts in analysis of transcriptomics if these data are available. Exposure to benzene in air could affect the mean gene expression measured in PBMC via two causal pathways: either directly or via the ensuing hematotoxicity.
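The distinction between the two pathways can be illustrated with a small simulation. The following is only a sketch under loud assumptions: the data are synthetic, every coefficient (direct, via_cells, cell_drop) is invented for illustration, and a plain linear regression stands in for the authors' non-parametric approach. It fits expression on exposure alone (total effect) and on exposure plus cell count (an estimate of the direct effect, valid only if nothing unmeasured links cell count and expression).

```python
import random

def ols(columns, y):
    """Ordinary least squares via the normal equations; an intercept is added."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Build X'X and X'y.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)] for r in range(p)]
    b = [sum(X[i][r] * y[i] for i in range(n)) for r in range(p)]
    # Gaussian elimination with partial pivoting.
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        b[k], b[pivot] = b[pivot], b[k]
        for r in range(k + 1, p):
            f = A[r][k] / A[k][k]
            for c in range(k, p):
                A[r][c] -= f * A[k][c]
            b[r] -= f * b[k]
    # Back substitution.
    coef = [0.0] * p
    for k in reversed(range(p)):
        tail = sum(A[k][c] * coef[c] for c in range(k + 1, p))
        coef[k] = (b[k] - tail) / A[k][k]
    return coef  # [intercept, slope for columns[0], slope for columns[1], ...]

def simulate(n=2000, direct=0.30, via_cells=0.50, cell_drop=-0.80, seed=1):
    """Synthetic data; all parameter values are hypothetical."""
    rng = random.Random(seed)
    exposure = [rng.uniform(0, 10) for _ in range(n)]
    # Exposure lowers the cell count (a stand-in for hematotoxicity).
    cells = [60 + cell_drop * x + rng.gauss(0, 2) for x in exposure]
    # Expression depends on exposure directly and on the cell count.
    expr = [5 + direct * x + via_cells * c + rng.gauss(0, 1)
            for x, c in zip(exposure, cells)]
    return exposure, cells, expr

exposure, cells, expr = simulate()
total_effect = ols([exposure], expr)[1]          # unadjusted: both pathways mixed
direct_effect = ols([exposure, cells], expr)[1]  # adjusted for cell count
print("unadjusted slope:", round(total_effect, 3))
print("cell-count-adjusted slope:", round(direct_effect, 3))
```

In this toy setup the unadjusted slope mixes both pathways (0.30 + 0.50 × -0.80 = -0.10 by construction), while the adjusted slope should land near the simulated direct effect of 0.30.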
In a recent analysis, we attempted to estimate the direct effects [Petersen et al., 2006] of benzene exposure on changes in mean gene expression. We estimated these effects non-parametrically using the SuperLearner [Sinisi et al., 2007], an approach that allowed the data to guide the choice of models of mean gene expression as functions of benzene exposure, counts of different types of PBMC, and other potential confounders like gender and smoking status (in submission). Deconvolution approaches have been applied to identify cell-type specific effects from whole blood transcriptomes [Abbas et al., 2009; Shen-Orr et al., 2010; Bolen et al., 2011]. Gene expression profiles of blood cell subsets are increasingly available [Martinez, 2009; Watkins et al., 2009; Shen-Orr et al., 2010; Tondeur et al., 2010; Beyer et al., 2012], facilitating such analyses. Similar approaches have been taken in the assessment of epigenetic effects in blood samples [Houseman et al., 2012; Liu et al., 2013]. Such deconvolution approaches, however, may not be able to reveal information on minor immune subsets or on cells in various stages of differentiation and activation in each heterogeneous lineage. Single cell sequencing may ultimately be necessary to obtain that level of detail. In the case of leukemogens, the relevant target may be the hematopoietic stem cell [McHale et al., 2012].

Accessible, disease-relevant tissues other than blood are useful surrogates for investigating the effects of smoking and possibly other lung toxicants in lung cancer. Cigarette smoke creates a field of injury in the airway epithelium of the respiratory tract [Brody, 2012]. Several groups have demonstrated similarly altered gene expression induced by cigarette smoke in the cytologically normal small and large airway epithelium [Spira et al., 2004b; Beane et al., 2007, 2011; Zhang et al., 2008; Tilley et al., 2009; Gower et al., 2011] and in nasal and buccal epithelium [Sridhar et al., 2008; Boyle et al., 2010; Zhang et
al., 2010]. Tan et al. found concordant expression levels of antioxidant and xenobiotic genes and p16 in laser-captured alveolar macrophages and distal airway epithelial cells of 62 smokers without cancer. Furthermore, airway gene expression in COPD was shown to reflect molecular processes occurring in more distal diseased lung tissue [Gower et al., 2011]. Expression signatures in these accessible tissues have prognostic capability; cytologically normal epithelial cells collected at bronchoscopy from smokers with suspected lung cancer revealed an epithelial cell GEX-based biomarker with diagnostic accuracy of 83% [Spira et al., 2007]. Another disease-relevant tissue analyzed in molecular epidemiology studies is exfoliated bladder cells in the investigation of bladder cancer [Rosser et al., 2009; Urquidi et al., 2012].

Experimental Variation

The number of transcriptomic endpoints generated by an RNA-Seq study is very large, with the current number of genes estimated at 30–40,000 plus alternative isoforms and transcript variants [Pertea, 2012]. Toxicogenomic studies therefore need to be designed with sufficient power (relatively large sample sizes) to detect effects of the exposure under consideration. Depending on the goal of the study, as discussed above, RNA-Seq studies need to have sufficient depth of coverage to accurately measure expression and sequence of rare variants [Marioni et al., 2008; Liu et al., 2012]. Inadequate study design with respect to these factors can increase the probability of false positive findings [Ioannidis, 2005; Robles et al., 2012]. In microarray analysis, we [McHale et al., 2011] and others [Kitchen et al., 2011; Schurmann et al., 2012] found variation due to factors such as RNA extraction, labeling, hybridization, and chip assignment that require statistical adjustment. In our study, analysis with a mixed-effects model minimized potential confounding and experimental variability. Factors related to sample preparation and storage can influence the
quality of the data generated and limit inter-study comparisons. Millions of human biosamples are currently stored in biobanks and have the potential to yield valuable transcriptomic data. Hebels et al. investigated the effect of handling and prolonged storage on the suitability of fresh and biobanked blood samples and isolated components for transcriptomic analysis. They found that adequate amounts of microarray-quality RNA with RNA Integrity Number (RIN) > 6.0 (average RIN = 7.2, similar to fresh samples) could be isolated from ∼85% of the biobank samples tested, even after 13–17 years of storage. Differences in gene expression profiles were mainly associated with longer bench times prior to sample processing, followed by choice of anticoagulant (mainly EDTA vs. heparin) and, to a much lesser extent, storage temperature. They also found that transcriptomics-quality RNA could be isolated from buffy coat samples frozen in the absence of RNA preservative, by thawing these samples in the presence of RNAlater, provided the buffy coats had been deep frozen within 8 hr of blood collection. In another study, even a 4 hr processing delay after phlebotomy led to altered expression of genes involved in inflammatory, immunologic, and cancer pathways [Barnes et al., 2010].

The effect of different isolation techniques has also been examined. Compared with PBMC, PAXgene RNA-stabilized samples showed a lower number of expressed genes, lower gene expression values, and higher variability, probably due to the differing cell populations in each sample type and the presence of globin RNA in the PAXgene samples [Min et al., 2010]. Globin reduction was shown to improve data quality from microarray analysis on Illumina BeadChips [Tian et al., 2009] but may not be necessary for RNA-Seq analyses [Raghavachari et al., 2012]. Debey-Pascher et al.
reported substantially differing expression profiles in fresh and cryopreserved PBMC and found that expression profiles in cryopreserved PBMC samples were significantly altered with increasing storage period, whereas profiles from PAXgene RNA-stabilized samples remained unaltered. Weber et al. found that superior RNA yield and integrity values were obtained from blood samples stabilized with RNAlater than with PAXgene, though both produced RNA of acceptable quality and detected similar expression levels of specific genes by qRT-PCR. Poor overlap in expression profiles from PBMC and PAXgene RNA-stabilized WB was observed in several studies [Debey et al., 2004, 2006; Zander et al., 2011]. All of these factors can limit the usefulness of cross-study comparisons using publicly available datasets.

Confounding Factors

An individual's transcriptome reflects characteristics of the individual including genotype, specifically eQTL; the microbiome or enterotype; irreversible alterations in gene expression acquired in utero and throughout life; and dynamic transcriptional responses to the exposure of interest and all other confounding exposures at the time of sampling. Molecular epidemiological studies assessing toxicogenomic endpoints typically account for age, gender, smoking, some dietary features, infection, alcohol intake, medication use, confounding exposures, and mixed cell populations. It may be difficult to account for all aspects of dietary effect on the human blood transcriptome, as so many components have effects, including macronutrients and micronutrients [Pagmantidis et al., 2008; Ryu et al., 2011; de Mello et al., 2012; Drew, 2012; Sagaya et al., 2012; Vedin et al., 2012]. Through regulation of host gene expression, the human gut microbiome performs functions critical to host physiology including processing and biotransformation of xenobiotics, regulation of human metabolism, and shaping the development of the immune system [Nicholson and Wilson, 2003; Clemente et al., 2012;
Nicholson et al., 2012]. There are three major forms of host enterotype with a huge degree of inter-individual variation [Arumugam et al., 2011]. Environmental stressors that disturb the balance between commensal microbes and their human hosts may alter host physiology and underlie many disease states [Spor et al., 2011; Maurice et al., 2013]. Stress [Kawai et al., 2007] and exercise [Connolly et al., 2004; Zieker et al., 2005; Carlson et al., 2011] also impact the transcriptome.

Studies of gene expression in different ethnic groups in Morocco and Fiji suggested that over a third of the human transcriptome is influenced by environmental geography, with much lesser influences of age, gender, and genetic factors [Idaghdour et al., 2008, 2010; Nath et al., 2012]. Aspects of immune function were found to be strongly affected by regional factors, potentially influencing susceptibility to respiratory and inflammatory disease, in the Morocco study.

Distal environmental conditions, such as in utero or early childhood exposures, can influence an individual's response to a later exposure [Szyf, 2007] and disease risk [Votavova et al., 2011; Martinez et al., 2012]. Fetal exposure to carcinogens was shown to have gender-specific effects on gene expression in the cord blood of newborns [Hochstenbach et al., 2012], and maternal smoking was reported to cause significant changes in the transcriptome of placental and fetal cells and to deregulate pathways associated with autoimmune diseases in the newborns of smokers [Votavova et al., 2011]. These effects could be mediated through long-term effects on gene expression, and cumulative damage such as genetic or epigenetic mutations could increase the risk of disease even at low exposures, particularly those diseases occurring later in life.
Indeed, a greater effect of environmental tobacco smoke was found among smokers compared with never-smokers in a large prospective study of respiratory cancer and chronic obstructive pulmonary disease [Vineis et al., 2005]. Several studies have shown that changes in gene expression induced by cigarette smoke remain altered many years after smoking cessation [Spira et al., 2004c; Beane et al., 2007, 2011].

Given the huge potential for confounding, human toxicogenomic studies need to be carefully designed and analyzed to identify true causal associations with the exposure of interest. For example, this can be done with appropriate choices of cases and controls in case-control studies matched on a given set of confounders and possibly accounting for other confounders in statistical analyses like those involving mixed models [Laird and Ware, 1982]. Mixed models were developed to quantify variation of an outcome like gene expression from different sources (genotype, diet). This would result in a lower residual (unexplained) variance and thus provide greater power in detecting exposure effects.

Interpretation of Subtle Changes in Expression and Low-Dose Effects

In our analysis of global gene expression in 125 benzene-exposed subjects and controls, we reported the subtle (small fold-change, many far less than 2-fold) alteration of expression of ∼2,000 genes, dose-dependent effects on gene expression and biochemical pathways, and an apparently supra-linear response in the expression of a 16-gene signature [McHale et al., 2011]. Using non-parametric approaches to statistically model the dose-response of AML pathway gene expression in our benzene-exposed population including exposed and control individuals, with air benzene exposure levels in the latter estimated from unmetabolized urinary benzene levels, we found that the AML pathway and pathway representative genes exhibited similar supra-linear responses and responses at benzene levels as low as 100 ppb in air (unpublished
data). It is unclear what the implication of these pathway alterations at low benzene exposure levels is for AML risk. In a recent study examining the effects of short term, low dose APAP in human volunteers, transcriptomics (mRNA and miRNA) outperformed clinical chemistry tests, revealing novel response pathways to APAP and detecting dose-specific immune-modulating effects that suggested the occurrence of possible pretoxic effects of therapeutic APAP doses [Jetten et al., 2012]. Both of these studies analyzed the transcriptome by microarray; RNA-Seq has the potential to detect more subtle changes in gene expression. The implications of such subtle changes in expression, and the distinction of adaptive responses from adverse effects at low doses [Jennings, 2013], will present challenges in the application of transcriptomics in molecular epidemiology studies.

Pathway Analysis, Systems Biology, and the Exposure-Disease Continuum

As described earlier, most recently published transcriptomic data is publicly available through the Gene Expression Omnibus (GEO) [Barrett et al., 2009, 2011, 2013], CEBS [Waters et al., 2008b], ArrayExpress [Brazma et al., 2003; Parkinson et al., 2005, 2007; Rustici et al., 2013], Japanese Toxicogenomics Project (TGP) [Uehara et al., 2010], and DrugMatrix [Ganter et al., 2006] databases. Transcriptomic and other toxicogenomic datasets are available through the Comparative Toxicogenomics Database (CTD) [Mattingly et al., 2006] and Chemical Effects in Biological Systems (CEBS) [Waters et al., 2008a]. In July 2012, CTD contained manually curated data on 599,182 chemical-gene interactions, 176,627 chemical-disease relationships, and 23,395 gene-disease relationships, internal integration of which leads to > 10.1 million inferred gene-disease relationships and 913,622 inferred chemical-disease relationships [Davis et al., 2013].