– Welcome everyone to Wednesday Nite at the Lab, I’m Tom Zinnen, I work here at the UW-Madison Biotechnology Center. I also work for UW Extension and Cooperative Extension, and on behalf of those folks and our other co-organizers, Wisconsin Public Television, Wisconsin Alumni Association, and the UW-Madison Science Alliance, thanks again for coming to Wednesday Nite at the Lab. We do this every Wednesday night, 50 times a year. Tonight, it’s my pleasure to introduce to you Frank Pugh. He’s a professor at Penn State University, special guest of John Denu here at the Wisconsin Institute for Discovery, and he’s here to talk about his research on the epigenome. He was born in Freeport, New York, on Long Island, and went to Riverhead High School on Long Island. Then he went to Cornell University, and he came here to get his PhD in biochemistry, right across the street. Then he went to Cal-Berkeley for a postdoc and in 1992, he joined the faculty at Penn State University. Tonight, he’s here to talk to us about what the epigenome can tell us. Please join me in welcoming Professor Frank Pugh to Wednesday Nite at the Lab.
(audience applauding)
– Thank you, Tom, it’s a great pleasure to be here, and tell you about some of the exciting work that my lab has been doing over the past many years. Most of you are pretty familiar with DNA, right, and genes and genetics and genomics. Well, this concept of an epigenome is probably more recent, compared to the notion of DNA, and it’s about how that DNA is regulated, and that’s what I would like to tell you about tonight. So, I want to show you a picture. The pictures that you see here, to me, they look like beautiful works of art, but to others, in the lab, they’re the pictures of the epigenome, and hopefully, by the end of tonight, you’ll understand exactly what this is and how cool it is: a picture of the regulatory structure of a typical genome. What I mean by that is, up here are these proteins that interact with DNA, and it’s the compilation of all these proteins together that comprises this thing called the epigenome and how it regulates DNA. That’s what I want to be talking to you about today. In thinking about this whole process, you might be asking, why does it matter, why should I care about it? And it matters, because when we think about going to the doctor and you’re basically getting a physical exam, you might come across, unfortunately, a tumor and it’s examined further by imaging or other methods, but ultimately, a biopsy might be taken. The idea is, the traditional approaches, when that biopsy goes to pathology to be looked at, are these sort of cytological approaches. However, more recently there’s these sort of molecular genetic techniques that we’re becoming more familiar with, and the important thing is, there’s two flavors of that. There’s the genome itself, the DNA sequence, and changes to that DNA sequence, and that tells you what might happen.
But there’s this whole other path of the epigenome that’s more about the gene expression process, what’s happening with your genes and how they are read. And why that is so valuable is it’s telling you what is happening as opposed to what might happen in the future, and that’s why it matters. So we have terms like genetics and genome, and that relates to DNA, and then you have things like epigenetics and epigenome, and that relates to things that interact with DNA. And these two concepts, I really need to define to you so that you can appreciate what the epigenome is. So, in order to do that, I’m going to assume that we haven’t been working too long with…
Okay, too long with the small molecules, and so the way to think about that is, if we as people are the size of the planet Earth, and we start to dig down into how many cells we have, it’s the size of Camp Randall. If we keep digging deeper into the chromosomes and the nucleus, the size of a chromosome compared to us, as people, is like the size of a football compared to planet Earth. And then we finally get into the DNA that we’re interested in, and that’s like noodles, spaghetti noodles in a plate. And so you can see quite the magnitude of difference in size, and that’s the kind of thing that we need to be thinking about here in terms of scalability. Now, when we go to the DNA itself and read it, of course you know that DNA is a letter, it’s a set, a series of letters, four different letters that are coded in chemical information, and that they are read in groups of three into, from DNA into RNA, and then ultimately, into protein, and that protein folds to create protein complexes that do all sorts of things, including producing metabolites that the cell functions in. So when you take this all together, this is the genetic information flow.
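The information flow just described, DNA read into RNA and RNA read three letters at a time into protein, can be sketched in a few lines of Python. This is only an illustration: the sequence is a short toy example, and the codon table holds just the handful of codons the example needs (the full genetic code has 64). The function names are made up for this sketch.

```python
# Toy codon table: only the codons used below (the real table has 64 entries).
CODON_TABLE = {
    "AUG": "Met", "GUG": "Val", "CAU": "His", "CUG": "Leu",
    "ACU": "Thr", "CCU": "Pro", "GAG": "Glu", "UAA": "Stop",
}


def transcribe(coding_dna: str) -> str:
    """Write the coding-strand DNA as mRNA by swapping T for U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> list[str]:
    """Read the mRNA in groups of three letters until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


dna = "ATGGTGCATCTGACTCCTGAGGAGTAA"  # toy sequence, nine codons
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna)
print("-".join(protein))
```

The point of the sketch is the two-step reading: the same four-letter text is first copied into RNA, then interpreted in triplets.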
Now, you can get misinformation, in which case, a single nucleotide might change, for example here to a T, and that changes in RNA, and then, ultimately, the amino acids that change. When you look at it on the protein level, this is one example that I learned as an undergraduate, one of the first concepts I learned as genetics, and that is that a normal hemoglobin might produce a normal red blood cell, whereas a single mutation here in this amino acid will produce a sickled version, a mutant form of that protein, that produce a diseased state. Now, when you take this in its summation, of genetics going from DNA to RNA to proteins, you have the regulation of that, and that’s the epigenome. RNA can regulate DNA, proteins can regulate DNA, and metabolites can regulate DNA, and that’s what we call the epigenome.
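As a small illustration of the single-letter change described above: the sickle-cell mutation is commonly given as an A-to-T change that turns the codon GAG (glutamic acid) into GTG (valine). The sketch below applies that one substitution to a single codon; the helper name and the two-entry codon table are made up for the example.

```python
# DNA-level codons, only the two needed for this example.
CODON_TABLE = {"GAG": "Glu", "GTG": "Val"}


def mutate(dna: str, position: int, new_base: str) -> str:
    """Return a copy of the sequence with one base replaced (0-based position)."""
    return dna[:position] + new_base + dna[position + 1:]


normal_codon = "GAG"
sickle_codon = mutate(normal_codon, 1, "T")  # middle A becomes T

print(normal_codon, "->", CODON_TABLE[normal_codon])
print(sickle_codon, "->", CODON_TABLE[sickle_codon])
```

One changed letter in the DNA is enough to change the amino acid, which is the sense in which the mutation is "misinformation".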
Now, let’s dig further into what that epigenome is. Here’s your chromosome, here’s your DNA, and everything in between is the epigenome. This is really the proteins, in this case, histones, that wrap the DNA around into these little balls, and there’s a string of balls that ultimately form your chromosome. Here’s one of ’em right here, here’s the DNA wrapped around one of these histone cores shown in blue. Now, I don’t know if you can see this, but there’s a little string of this protein hanging off the end here, and that’s, you can think of that as a light post or a flagpole, in which we’re going to put a little marker on there, a little light, and that’s a little metabolite called a methyl group, and that’s going to attach to that protein, and it’s shown schematically here, and that is going to be a mark, an indicator on the genome about regulating a particular part of that DNA sequence, and it’s called an epigenetic factor.
Okay, now, what determines whether something is really epigenetic or something beyond that is whether it is stable versus dynamic. So I need to spend a few moments describing what that is. So we think about inheritance, right, we, as organisms, we make progeny, so we go through replication, and then we make more of us. When we have cells, they also replicate, inside our body, and they make multiple cells. We have chromosomes, they replicate and make multiple chromosomes. And DNA sequence, ultimately, replicates to make duplicates. But the epigenome does the same thing. If you dig deep inside these chromosomes, they replicate to make more of themselves. Now, if this process is stable, and instructs its own replication, so two properties of this hereditary material in DNA, is that it’s passed on to its progeny and it’s stable. So, for example, the DNA here instructed its own replication, where G specifies C, A specifies T, but in the epigenetic state, you have this component, this mark, that specifies its own replication, by instructing the placement of a second equivalent mark when that component is duplicated. So that would be an indication of what epigenetics is. But what’s important about epigenetic inheritance is that it does something, all right, in this case, if this thing shuts down the expression of this gene, so a normal specialized cell, you have this mark shutting down the expression of the genes, and you have, ultimately, what results is a normalized, specialized, a normal specialized cell. But if this mark were to, let’s just say, mutate or change and not be there anymore, then what happens, that goes away, that all goes away, and instead, you now have expression of this gene that’s involved in self proliferation, and you get an oncogenic cell, that will self replicate. 
So that’s kind of the notion of an epigenetic mark, and how that compares to a genetic mark, would be a gene in which you have a mutation, but when this thing replicates, you’ll have a normalized specialized cell, but if it mutates, then this thing is no longer functional, and now you get an oncogenic cell. So the one is, in one case you have a change in the DNA, and the other case, you have a change in the regulation of that gene.
Okay, so we’ve talked about the stability of both genetic information and epigenetic information, but what my group focuses on is the dynamic aspects of it. So a lot of times, some of these epigenomic marks will change, all right, or be dynamic. And when they change and are dynamic, the expression of the underlying gene can change. All right, and so this is the pattern we are seeing in a typical epigenome. It’s not just nucleosome, but it’s many many more proteins.
All right, so let’s dive into what are some of these. Ultimately, at the end of this talk, what I hope that you’ll appreciate is what these dots and these patterns mean. I still haven’t gotten there, and there’s still a little ways to go before you understand what they mean. So let’s focus on these little cartoon representations of what proteins are that interact with DNA. Here’s a blowup of what the atomic structure of one of these complexes actually looks like. What was really striking to me was when I was actually a graduate student here, I was always interested in what this might be, and I’m going to tell you what it is in just a second, but at the time we had no concept. There were just vague ideas of what it might be. But if we click on here and show you the structure of this protein, this is RNA polymerase, bound to DNA, and that’s this thing here, transcribing RNA from DNA. And then you have these red proteins and yellow proteins that are going to bind specifically to DNA sequence, at the start of a gene, and it brings in RNA polymerase, to know where to begin, at the beginning of a gene, and then, this RNA polymerase will move down through the body of genes and continue on to make RNA. So this was a really recent new structure, from the laboratory of Patrick Cramer, pretty impressive work. So if we take that and now come to the question of, well, how do you even measure or look at or examine all these protein-DNA interactions, all of these interactions with the genome? The way you might do that is, you can start with anything, you can start with tissue, you can start with mammalian cells, and you can start with yeast cells.
Basically, what you do is you throw this stuff into a blender, you get a protein shake out at the end of the day on this, and all these proteins are your epigenome sitting in a test tube. So what we do with that blender, it actually breaks apart the DNA into these little fragments, and then we go in, and we pull out that piece of protein using an antibody that is specific to that protein, and then we put that into a test tube, and really what we want to know is, what is the DNA sequence that is attached to that protein? How do we sequence this DNA? All right, ’cause that DNA sequence will identify where in this genome that protein is present. So I want to tell you a little bit about DNA sequencing technology because it’s something that’s really come into its own, into massive high-throughput, in the past 10 years, and that has enabled sort of routine sequencing of the human genome, and this is the way it is. You basically pull out this piece of DNA, and we want to sequence it. What simply happens is, you float these nucleotides that are fluorescently colored onto this DNA, which is sitting on a cover slip, basically, and if you’ve got a proper match, and of course, you know what matches with G is C, it sticks and you get this blue color. If you go to the next nucleotide, well, what can you imagine will interact with A? It’s going to be a T, so G comes in, A comes in, and it’s only when the T comes in does it actually stick. Okay, so now, when that happens over a large series of parallel reactions on a glass slide, you can do billions of these things. Billions of spots, meaning billions of sequencing reactions in parallel, and each one that I just showed you is one little dot in this whole scheme, and so at the end of the day, you get massive amounts of data out at the end of this.
And so what we care about is one of these is basically one of those spots, and it tells you, by sequencing the DNA, you can look at the sequence, ’cause we know what it is, and we can tell you where this one green protein is sitting, and it’s this protein right here, okay, at this coordinate, all right.
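The matching step in the sequencing description, where candidate nucleotides are floated in and only the complementary one sticks, can be mimicked with a toy loop. This is a simplified sketch of the idea as described in the talk, not a model of any particular sequencing instrument; the flow order and function name are assumptions made for illustration.

```python
# Watson-Crick pairing: which incoming nucleotide sticks to each template base.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hypothetical order in which candidate nucleotides are floated in.
FLOW_ORDER = "GACT"


def read_template(template: str) -> str:
    """For each template base, try candidates in turn and keep the one that pairs."""
    incorporated = []
    for template_base in template:
        for candidate in FLOW_ORDER:
            if COMPLEMENT[template_base] == candidate:
                incorporated.append(candidate)  # this one sticks and lights up
                break
    return "".join(incorporated)


synthesized = read_template("GATTACA")
print(synthesized)
```

Each recorded color corresponds to the complementary base, so the original template can be recovered by complementing the recorded string once more.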
Now, how do we, you, go in and quantify that in a way that speaks to the presence of this protein throughout an entire genome? That’s a really challenging problem, because as I, maybe as I showed you at the beginning, the human genome contains three billion nucleotide positions, that’s rungs on this ladder. That’s a massive amount of positions, so how do we even begin to look at that data? So, one simple way is, well, of course, we need to collect data over many cells, and we quantify that, so in this case, we have four cells, where, let’s just say we’re looking at, and there’s four events that took place, but over here, the same green protein is not bound, it’s a brown protein, and there are zero hits. So we march along the entire chromosome and look at that, and you can bar graph this out, so at this position here, on chromosome four, at position 8,173,744, there’s this protein right here, but elsewhere, you don’t find it, okay? And so, ultimately, what that gets us to is simplifying this on a broader scale to the entire genome. And this is the pattern that we see. Now, there’s a couple more things I need to tell you before you really appreciate what that is, but just simply put, this bar graph here, this little dot here and this protein, binding across the genome, mean essentially the same thing. All right, it’s just different representations of the same data, so, what can we do with this, in terms of discovery? Now, I want to illustrate how we collect this data on a genome scale and illustrate it using nucleosomes. So the beautiful aspect of the packaging of DNA is the wrapping of this DNA double helix around this protein core. That’s called a single nucleosome. And the way that we actually measure this is to simply make cuts in the DNA on either side of these nucleosomes, and there’s enzymes that we can use that will cut the DNA, and now, release each of these nucleosome particles.
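The counting idea from the bar-graph description, tallying how many sequenced fragments land at each position along a chromosome, might be sketched as follows. The coordinates, region, and bin size are made-up toy values, and the function name is hypothetical; real pipelines work on aligned reads across billions of positions.

```python
from collections import Counter


def pileup(read_positions, region_start, region_end, bin_size):
    """Count reads per fixed-width bin across [region_start, region_end)."""
    counts = Counter(
        (pos - region_start) // bin_size
        for pos in read_positions
        if region_start <= pos < region_end
    )
    n_bins = (region_end - region_start + bin_size - 1) // bin_size
    return [counts.get(b, 0) for b in range(n_bins)]


# Toy data: four cells each contribute one read near the same bound site.
reads = [1204, 1210, 1207, 1201]
bars = pileup(reads, region_start=1000, region_end=1500, bin_size=100)

# Text version of the bar graph: one row per bin.
for i, height in enumerate(bars):
    print(f"{1000 + i * 100:>5} {'#' * height}")
```

The bin with four hits plays the role of the site where the protein is bound; the empty bins are the places where it is not found.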
We can then strip the DNA off of them, and sequence them, like I showed you before. Then you get these bar graphs that I just showed you in the previous slide. Those bar graphs, peaks and valleys, peaks and valleys. So each of these little individual green lines is an individual coordinate at which we detect a nucleosome being present. And so, here’s a gene, called DEP1, and it’s got a nucleosome here, here, here. There’s four, five, six, eight nucleosomes on this gene. It’s packaged, so this would be one gene with eight of these brown nucleosome spheres. All right, now, the question is, all right, there’s thousands and thousands of genes in the genome. How do you even display all of this information in a way that we can visualize it? Well, we can turn these peaks and valleys into single dimensional tracks, like railroad tracks, where we call the peaks yellow and the valleys blue. And they’re just colors that we use. And now we can start to assemble all the genes bioinformatically on top of each other or next to each other in order of the length of the gene, so short genes on top, long genes on the bottom, and this is what you get out at the end of the day, and to me, this is just so beautiful and awesome, because it’s a view of the entire chromatin structure of a single cell, and it’s organized, again, so these genes up here have three nucleosomes, and there’s 5,000 rows in this diagram here. You can’t see all the individual rows, but there’s one gene per row, so we’re looking at 5,000 genes. These are long genes, these are short genes. They start here on the left, and they end over there on the right. So that’s part of the epigenome.
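The display trick described above, turning peaks and valleys into two-color tracks and stacking one gene per row from shortest to longest, can be sketched like this. The gene names, lengths, counts, and threshold are all invented toy values, and the letters Y and b simply stand in for the yellow and blue colors.

```python
def to_track(counts, threshold):
    """Call each position a peak ('Y' for yellow) or a valley ('b' for blue)."""
    return "".join("Y" if c >= threshold else "b" for c in counts)


def build_rows(genes, threshold):
    """One row per gene, aligned at the gene start, short genes on top."""
    ordered = sorted(genes, key=lambda gene: gene["length"])
    return [(gene["name"], to_track(gene["counts"], threshold)) for gene in ordered]


# Toy genes: counts are aligned at the gene start and padded with zeros past the end.
genes = [
    {"name": "long_gene", "length": 1200, "counts": [5, 0, 6, 0, 5, 0, 7, 0]},
    {"name": "short_gene", "length": 450, "counts": [6, 0, 5, 0, 0, 0, 0, 0]},
    {"name": "mid_gene", "length": 800, "counts": [4, 0, 5, 0, 6, 0, 0, 0]},
]

rows = build_rows(genes, threshold=3)
for name, track in rows:
    print(f"{name:<10} {track}")
```

Even with three toy genes, the stacked rows show the shape seen in the full picture: every row starts with a peak on the left, and longer genes carry more peaks toward the right.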
Short genes versus long genes. This is the pattern that I’m going to be showing to you through the rest of the talk, because there’s much more to this than simply just nucleosomes. There’s all these other proteins. These proteins interact very specifically with DNA sequences, side chains of these proteins interact with a very precise sequence along a gene or the beginning of a gene to control the regulation of that, essentially, the epigenome. So these patterns that you’re seeing are, each little dot is a different epigenomic protein that’s regulating the expressions of thousands of genes. These ones over here on the left, you see, are at the beginning of the genes, and some of these are at the ends of genes. Some run across the entire gene bodies.
So there’s quite a variety out there. It’s always fun to look at these things. In fact, I want to show you a few more. This is some, our newest collection of data, which I really haven’t shown before. And we went through every single protein that we knew existed inside the nucleus of a cell, and we asked, what genes did it bind to? And here’s one of ’em, REB1 is the name of a protein, and you can see it’s binding some genes, but not all genes, and it’s binding at the beginning. And these are just different classes of genes. All right, now, I’m going to show you a little movie here. It’s going to run for one minute, and it’s going to show you every single different protein that we can measure inside of a nucleus bound to the chromatin. And so you see these different patterns. What’s really cool is sometimes you see little punctate spots, sometimes you see large swaths, sometimes you see things at the left side, sometimes on the right side, sometimes they’re only up here, certain classes of genes. Sometimes they’re spread throughout. Others streak across the gene body, so you’re seeing all different sorts of really interesting patterns. And I know I could just stare at this all day, (audience laughing) but it took us many years to collect this data, and one minute to show it, which is really pretty striking.
But when you compile all of this together, you ultimately get, essentially, a snapshot of the entirety of what might be happening inside a nucleus, in terms of regulating genes, and this is kind of a cartoon image of that whole process. And one of the things you realize quickly is that it’s a pretty crowded place. There’s a lot of things happening. The beginning of genes, in the middle of genes, and why are they all there? They’re all controlling the expression of the gene, why that gene should be turned on or turned off, typically in response to environmental signals, and that’s something I want to touch on towards the end. So, for now, we’re just looking at cool little patterns, but what happens is that these patterns, they’re not static, they change, and they change with environmental signals, whether it’s hormonal signals throughout the body or a viral infectivity, any sort of impact on the human body will change these patterns. So, for example, this is a heat shock of cells, and some things stay the same, but others change. I just want to illustrate one particular example. This is a protein that binds near the beginning of the genes, but it shifts, in terms of its locations, when cells are just briefly shocked by heat for three minutes. That doesn’t kill the cells, it just shocks them, and then they kind of recover from that. So it’s a very dynamic process, so the epigenome can be very dynamic. Anything that’s epigenetic is rather static, ’cause it has a genetic component to it. So this gets back to, sort of, the final aspect of why does all of this matter.
So, as I mentioned, the proteins that interact with the genome are influenced by their environment, so you have gene environment interactions, and so cell fate is tied to this configuration of proteins across the genes, and knowing, even just a part of it, may be predictive of the outcome or the future behavior of these cells. In other words, diagnostics. So, we were really intrigued and really, we got excited, basically, about this paper that was published in Nature from Jason Carroll’s group back in 2012, and they were looking at the binding of an epigenomic component called the estrogen receptor. It binds the DNA, and it regulates certain genes. This is what it looks like. What they found was that if you take the entire pattern of binding of this protein to many different genes in a genome, and distill that down into a single data point and look at it in different patients, these are breast cancer biopsies, what they discovered was that certain patterns indicated a good outcome to the breast cancer, and others had a poor outcome. And so you could segregate, based upon this profiling, what patients might benefit from certain treatments, versus others that may not be effective. So this turned out to be, I think, quite insightful, and so we jumped on this bandwagon and started investigating similar things, but in this case, we were basically looking at endometrial cancers, these are cancers of the uterus, and we were tracing a different transcription factor.
And we’ve just started down this road as of last year, collaborating with a group at Geisinger Medical Center in Pennsylvania, where we were able to get samples of these endometrial cancers, 10 patients, here, 11, and I won’t go into the details of what this data really means, but it’s just an example of the epigenomic profile that we’re able to obtain from a single transcription factor across many different patients. So we were excited that we were actually able to get real data from real patients, rather than just cell lines, and moreover, we were able to segregate these patterns, again, distilling each individual patient into a single data point of binding profiles, we were able to separate grade one tumors from normal tissue, and so this was an initial proof of principle, and so we are continuing this investigation by grabbing more samples and hopefully, we’re able to predict what might be the future outcome of some of these patients. It turns out that when these patients are essentially surgically cured of their endometrial cancers, about 90% of them are fine, so they don’t go through any other treatment, but 10% get a relapse, and what we’d really like to know is in that 10%, is there any sort of epigenomic profile that can indicate that you fall into that 10%, and therefore, would be a good candidate for particular treatments.
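One way to picture "distilling each patient into a single data point" is to score each sample's binding profile against a reference profile. The talk does not say how the distillation was actually done, so the sketch below swaps in a plain Pearson correlation purely as an assumption; the sample names and numbers are invented and do not come from any patient data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length binding profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Invented binding signal at five sites; not real measurements.
reference = [9, 1, 8, 2, 7]
samples = {
    "sample_1": [8, 2, 9, 1, 6],  # resembles the reference pattern
    "sample_2": [2, 7, 1, 9, 3],  # roughly the opposite pattern
}

# One number per sample: how closely its profile tracks the reference.
scores = {name: pearson(profile, reference) for name, profile in samples.items()}
for name, score in scores.items():
    print(f"{name}: {score:+.2f}")
```

With one number per sample, samples can be placed along a single axis and compared, which is the spirit of the separation described above; whether such a score carries any diagnostic meaning is exactly what the ongoing work would have to establish.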
So I’m going to end here with just a few bullet points of what we’ve learned. The epigenome is something that reads and regulates the genome. They’re small molecules of metabolites, they’re proteins and they’re RNAs. All sorts of things interact and regulate the information content of the genome. It has a precise organizational pattern, so you saw those bell-shaped plots and all those little dots and how each one of them was organized? So that very precise organizational pattern. And it’s very dynamic, it’s changing often, in response to environmental cues. But constrained by the genes that you have, the genetic constraints.
And then, finally, we hope that– This is a really early stage, and it’s at early times in this field, but that it may be potentially diagnostic of certain diseases and outcomes. And we really, that’s an area we hope to emphasize a bit more, so anyway, this is my group at Penn State University, a number of folks who were involved in this, both current students and former students, and I think I’ll stop there and take any questions that you have.
(audience applauding)