Encyclopedia Britannica


What does DNA do?

Deoxyribonucleic acid (DNA) is an organic chemical that contains genetic information and instructions for protein synthesis. It is found in most cells of every organism. DNA is a key part of reproduction, in which genetic heredity occurs through the passing down of DNA from parent or parents to offspring.

What is DNA made of?

DNA is made of nucleotides. A nucleotide has three components: the sugar deoxyribose, a phosphate group, and one of four nitrogenous bases (cytosine, thymine, adenine, or guanine). The sugars and phosphates form the backbone of each strand. Genetic code is formed through different arrangements of the bases.

Who discovered the structure of DNA?

The discovery of DNA’s double-helix structure is credited to the researchers James Watson and Francis Crick, who, with fellow researcher Maurice Wilkins, received a Nobel Prize in 1962 for their work. Many believe that Rosalind Franklin should also be given credit, since she made the revolutionary photo of DNA’s double-helix structure, which was used as evidence without her permission.

Can you edit DNA?

Gene editing today is mostly done through a technique based on Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR), adapted from a bacterial mechanism that can cut out specific sections in DNA. One use of CRISPR is the creation of genetically modified organism (GMO) crops.

What’s the difference between DNA and RNA?

DNA is the master blueprint for life and constitutes the genetic material in all free-living organisms. RNA uses DNA to code for the structure of proteins synthesized in cells.

DNA, organic chemical of complex molecular structure that is found in all prokaryotic and eukaryotic cells and in many viruses. DNA codes genetic information for the transmission of inherited traits.

A brief treatment of DNA follows. For full treatment, see genetics: DNA and the genetic code.

The chemical DNA was first discovered in 1869, but its role in genetic inheritance was not demonstrated until 1943. In 1953 James Watson and Francis Crick, aided by the work of biophysicists Rosalind Franklin and Maurice Wilkins, determined that the structure of DNA is a double-helix polymer, a spiral consisting of two DNA strands wound around each other. The breakthrough led to significant advances in scientists’ understanding of DNA replication and hereditary control of cellular activities.

Each strand of a DNA molecule is composed of a long chain of monomer nucleotides. The nucleotides of DNA consist of a deoxyribose sugar molecule to which is attached a phosphate group and one of four nitrogenous bases: two purines (adenine and guanine) and two pyrimidines (cytosine and thymine). The nucleotides are joined together by covalent bonds between the phosphate of one nucleotide and the sugar of the next, forming a phosphate-sugar backbone from which the nitrogenous bases protrude. One strand is held to another by hydrogen bonds between the bases; this bonding is specific: adenine bonds only with thymine, and cytosine only with guanine.

The configuration of the DNA molecule is highly stable, allowing it to act as a template for the replication of new DNA molecules, as well as for the production (transcription) of the related RNA (ribonucleic acid) molecule. A segment of DNA that codes for the cell’s synthesis of a specific protein is called a gene.

DNA replicates by separating into two single strands, each of which serves as a template for a new strand. The new strands are copied by the same principle of hydrogen-bond pairing between bases that exists in the double helix. Two new double-stranded molecules of DNA are produced, each containing one of the original strands and one new strand. This “semiconservative” replication is the key to the stable inheritance of genetic traits.

Within a cell, DNA is organized into dense protein-DNA complexes called chromosomes. In eukaryotes, the chromosomes are located in the nucleus, although DNA also is found in mitochondria and chloroplasts. In prokaryotes, which do not have a membrane-bound nucleus, the DNA is found as a single circular chromosome in the cytoplasm. Some prokaryotes, such as bacteria, and a few eukaryotes have extrachromosomal DNA known as plasmids, which are autonomous, self-replicating genetic material. Plasmids have been used extensively in recombinant DNA technology to study gene expression.

The genetic material of viruses may be single- or double-stranded DNA or RNA. Retroviruses carry their genetic material as single-stranded RNA and produce the enzyme reverse transcriptase, which can generate DNA from the RNA strand. Four-stranded DNA complexes known as G-quadruplexes have been observed in guanine-rich areas of the human genome.

DNA structure and function

Affiliations.

  • 1 MRC Laboratory of Molecular Biology, Cambridge, UK.
  • 2 Department of Biochemistry, University of Cambridge, UK.
  • 3 Jacobs University Bremen, Germany.
  • PMID: 25903461
  • DOI: 10.1111/febs.13307

The proposal of a double-helical structure for DNA over 60 years ago provided an eminently satisfying explanation for the heritability of genetic information. But why is DNA, and not RNA, now the dominant biological information store? We argue that, in addition to its coding function, the ability of DNA, unlike RNA, to adopt a B-DNA structure confers advantages both for information accessibility and for packaging. The information encoded by DNA is both digital - the precise base specifying, for example, amino acid sequences - and analogue. The latter determines the sequence-dependent physicochemical properties of DNA, for example, its stiffness and susceptibility to strand separation. Most importantly, DNA chirality enables the formation of supercoiling under torsional stress. We review recent evidence suggesting that DNA supercoiling, particularly that generated by DNA translocases, is a major driver of gene regulation and patterns of chromosomal gene organization, and in its guise as a promoter of DNA packaging enables DNA to act as an energy store to facilitate the passage of translocating enzymes such as RNA polymerase.

Keywords: A-DNA; B-DNA; DNA as an energy store; DNA backbone conformation; DNA elasticity; DNA information; DNA structure; DNA topology; alternative DNA structures; genome organisation.

© 2015 FEBS.


Grants and funding

  • MC_U105178783/MRC_/Medical Research Council/United Kingdom


Understanding biochemistry: structure and function of nucleic acids

Steve Minchin

School of Biosciences, University of Birmingham, Birmingham, United Kingdom

Julia Lodge

Nucleic acids, deoxyribonucleic acid (DNA) and ribonucleic acid (RNA), carry genetic information which is read in cells to make the RNA and proteins by which living things function. The well-known structure of the DNA double helix allows this information to be copied and passed on to the next generation. In this article we summarise the structure and function of nucleic acids. The article includes a historical perspective and summarises some of the early work which led to our understanding of this important molecule and how it functions; many of these pioneering scientists were awarded Nobel Prizes for their work. We explain the structure of the DNA molecule, how it is packaged into chromosomes and how it is replicated prior to cell division. We look at how the concept of the gene has developed since the term was first coined and how DNA is copied into RNA (transcription) and translated into protein (translation).

The structure of deoxyribonucleic acid

Deoxyribonucleic acid (DNA) is one of the most important molecules in living cells. It encodes the instruction manual for life. The genome is the complete set of DNA molecules within the organism, so in humans this would be the DNA present in the 23 pairs of chromosomes in the nucleus plus the relatively small mitochondrial genome. Humans have a diploid genome, inheriting one set of chromosomes from each parent. A complete and functioning diploid genome is required for normal development and to maintain life.

Discovery and chemical characterisation of DNA

DNA was discovered in 1869 by a Swiss biochemist, Friedrich Miescher. He wanted to determine the chemical composition of leucocytes (white blood cells); his source of leucocytes was pus from fresh surgical bandages. Although initially interested in all the components of the cell, Miescher quickly focussed on the nucleus because he observed that when treated with acid, a precipitate was formed which he called ‘nuclein’. Almost all molecular bioscience graduates would have repeated a form of this experiment in laboratory classes where DNA is isolated from cells. Miescher, Richard Altmann and Albrecht Kossel further characterised ‘nuclein’ and the name was changed to nucleic acid by Altmann. Kossel went on to show that nucleic acid contained purine and pyrimidine bases, a sugar and phosphate. Work in the 1930s from many scientists further characterised nucleic acids, including the identification of the four bases and the presence of deoxyribose, hence the name deoxyribonucleic acid (DNA). Erwin Chargaff later found that DNA molecules from a particular species always contained the same amount of the bases cytosine (C) and guanine (G) and the same amount of adenine (A) and thymine (T). So, for example, the human genome contains 20% C, 20% G, 30% A and 30% T.
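Chargaff's observation can be illustrated with a short calculation. The sketch below is only an illustration: it assumes the complementary pairing (A with T, C with G) described later in this article, and the 10-base strand and the function name `base_percentages` are hypothetical, chosen so that the result matches the percentages quoted above.

```python
from collections import Counter

# Watson-Crick pairing partners (A with T, C with G).
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def base_percentages(strand: str) -> dict[str, float]:
    """Return the percentage of each base in the double-stranded
    molecule formed by `strand` and its complementary strand."""
    partner = "".join(PAIR[base] for base in strand)
    counts = Counter(strand + partner)
    total = sum(counts.values())
    return {base: 100 * counts[base] / total for base in "ACGT"}


# Hypothetical strand; any sequence gives %A == %T and %C == %G in the duplex.
print(base_percentages("AATTATCGCG"))
```

Whatever sequence is supplied, the duplex always contains equal amounts of A and T and equal amounts of C and G, because every base on one strand is matched by its partner on the other.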

DNA is a polymer made of monomeric units called nucleotides (Figure 1A). A nucleotide comprises a 5-carbon sugar, deoxyribose, a nitrogenous base and one or more phosphate groups. The building blocks for DNA synthesis contain three phosphate groups; two are lost during this process, so the DNA strand contains one phosphate group per nucleotide.

Figure 1. (A) A nucleotide (deoxyguanosine triphosphate). The nitrogenous base (guanine in this example) is linked to the 1′ carbon of the deoxyribose and the phosphate groups are linked to the 5′ carbon. A nucleoside is a base linked to a sugar. A nucleotide is a nucleoside with one or more phosphate groups. (B) A DNA strand containing four nucleotides with the nitrogenous bases thymine (T), cytosine (C), adenine (A) and guanine (G) respectively. The 3′ carbon of one nucleotide is linked to the 5′ carbon of the next via a phosphodiester bond. The 5′ end is at the top and the 3′ end at the bottom.

There are four different bases in DNA: the double-ring purine bases, adenine and guanine, and the single-ring pyrimidine bases, cytosine and thymine (Figure 1B). The carbons within the deoxyribose ring are numbered 1′ to 5′. Within each monomer the phosphate is linked to the 5′ carbon of deoxyribose and the nitrogenous base is linked to the 1′ carbon; the link between the base and the sugar is called an N-glycosidic bond. The phosphate group is acidic, hence the name nucleic acid.

In the DNA chain (Figure 1B), the phosphate residue forms a link between the 3′-hydroxyl of one deoxyribose and the 5′-hydroxyl of the next. This linkage is called a phosphodiester bond. DNA strands have a ‘sense of direction’. The deoxyribose at the top of the diagram in Figure 1B is not linked to another deoxyribose; it terminates with a 5′ phosphate group. At the other end the chain terminates with a 3′ hydroxyl.

DNA is the genetic material

Although many scientists, including Miescher, had observed that prior to cell division the amount of nucleic acid increased, it was not believed to be the genetic material until the work of Frederick Griffith, Oswald Avery, Colin MacLeod and Maclyn McCarty. In 1928, Griffith showed that living cells could be transformed by extracts from heat-killed cells and that this transformation had the potential to permanently change the genetic makeup of the recipient cell. Griffith was working with two strains of the bacterium Streptococcus pneumoniae. The encapsulated so-called S strain is virulent, whereas the non-capsulated R strain is nonvirulent. If the S strain is injected subcutaneously into mice, the mice die, whereas if either live R strain or heat-killed S strain is injected, the mice live. However, if a mixture of live R strain and heat-killed S strain is injected into a mouse, the mouse will die, and live S strain can be isolated from the blood. So, in the Griffith experiment a component of the heat-killed S strain is transforming the R strain. In 1944, Avery, MacLeod and McCarty went on to show that it was DNA that could transform the avirulent bacterium. They isolated a crude DNA extract from the S strain, destroyed any protein, lipid, carbohydrate and ribonucleic acid (RNA) component, and showed that this purified DNA could still transform the R strain. However, when the purified DNA was treated with DNase, an enzyme that degrades DNA, transformation was lost.

Alfred Hershey and Martha Chase confirmed that DNA was the genetic material. They used a virus that infects bacteria called a bacteriophage. The bacteriophage contains a protein capsid surrounding a DNA molecule. They showed that when bacteriophage T2 infects Escherichia coli, it is the phage DNA, not protein, that enters the bacterial cell.

Determining the structure of DNA

Once it had been shown that DNA was the genetic material, there was a race to determine the three-dimensional structure of the DNA molecule. At King’s College London, Rosalind Franklin and Maurice Wilkins, having obtained data using X-ray diffraction, had proposed that DNA had a helical structure and Franklin had obtained a particularly good X-ray diffraction pattern. In Cambridge, James Watson and Francis Crick used model building together with data from a variety of sources including Franklin’s X-ray diffraction pattern and Chargaff’s base composition data to work out the now well-known double helix structure of DNA. Their work was published in Nature in 1953. The Watson–Crick structure is shown in Figure 2A.

Figure 2. (A) The DNA double helix, with the sugar phosphate backbone on the outside and the nitrogenous bases in the middle. (B) An A:T and a G:C base pair with the C1′ of the deoxyribose indicated by the arrow. Note that the C1′ of the deoxyribose is in the same position in all base pairs. In this figure, the atoms on the upper edge of the base pair face into the major groove and those on the lower edge face into the minor groove. The hydrogen bonds between the base pairs are indicated by the dotted line.

DNA is a two-stranded helical structure in which the two strands run in opposite directions. In Figure 2A, one strand is running 5′ to 3′ top to bottom, whereas the other strand is running 3′ to 5′ top to bottom. The helix is right-handed, which means that if you are looking down the axis, the helix turns clockwise as it gets further away from you. The two chains interact via hydrogen bonds between pairs of bases, with adenine always pairing with thymine, and guanine always pairing with cytosine. The Watson–Crick structure therefore accounts for and explains the Chargaff data, which showed that there was always an equal amount of C and G and of A and T. The regular nature of the double helix comes about because the distance between the 1′ carbon of the deoxyribose on one strand and the 1′ carbon of the opposite deoxyribose is always the same irrespective of the base pair (Figure 2B). The 1′ carbons of the deoxyriboses of opposing nucleotides do not lie directly opposite each other on the helical axis; this means that the two sugar–phosphate backbones are not equally spaced along the helical axis, resulting in major and minor grooves.
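The combination of specific base pairing and opposite strand direction means that one strand fully determines the other. The following is a minimal sketch of that relationship; the function name `partner_strand` and the example sequence are illustrative assumptions and do not come from the article.

```python
# Watson-Crick base pairs.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def partner_strand(strand_5_to_3: str) -> str:
    """Return the partner strand, also written 5' to 3'.

    Each base is paired with its complement, and the order is reversed
    because the two strands of the helix run in opposite directions.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand_5_to_3))


print(partner_strand("TCAG"))  # CTGA
```

Applying the function twice returns the original sequence, which reflects the fact that each strand carries the same information as its partner.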

The diameter of the helix is 2 nm. Adjacent bases are separated by 0.34 nm (0.34 × 10⁻⁹ m) and related by a rotation of 36°, so the helical structure repeats every 10 residues. DNA molecules are normally very long and the sequence of bases along the DNA chain is not restricted. For example, the genome of the bacterium E. coli is a single circular chromosome which contains 4.6 million base pairs (4.6 × 10⁶ bp); it is therefore 1.6 mm long (4.6 × 10⁶ × 0.34 × 10⁻⁹ m). The human genome is made up of 24 distinct chromosomes (chromosomes 1–22 and the X and Y chromosomes) present in the nucleus, plus mitochondrial DNA. The nuclear chromosomes vary in size from approximately 50 × 10⁶ to 250 × 10⁶ bp; the mitochondrial DNA is 17 × 10³ bp. The total length of a haploid human genome is 3 × 10⁹ bp. Within a single human diploid cell, which contains 23 chromosome pairs, there is 2 m of DNA. Based on the assumption that humans contain 3 trillion cells with a nucleus, if all the DNA from a single human individual was put end to end, it would reach to the sun and back approximately 20 times.
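The figures in this paragraph follow from simple arithmetic, which can be checked with the sketch below. The Earth–Sun distance (about 1.496 × 10¹¹ m) is not given in the article and is an assumption supplied here; the variable and function names are illustrative only.

```python
RISE_PER_BP_M = 0.34e-9       # distance between adjacent base pairs (m)
ROTATION_PER_BP_DEG = 36      # twist between adjacent base pairs (degrees)
EARTH_SUN_M = 1.496e11        # assumed mean Earth-Sun distance (m), not from the article
NUCLEATED_CELLS = 3e12        # cell count assumed in the article


def contour_length_m(base_pairs: float) -> float:
    """Length of a stretch of double helix with the given number of base pairs."""
    return base_pairs * RISE_PER_BP_M


bp_per_turn = 360 / ROTATION_PER_BP_DEG            # residues per helical repeat
ecoli_mm = contour_length_m(4.6e6) * 1e3           # E. coli chromosome in mm
diploid_m = contour_length_m(2 * 3e9)              # DNA in one diploid human cell in m
round_trips = NUCLEATED_CELLS * diploid_m / (2 * EARTH_SUN_M)

print(f"{bp_per_turn:.0f} bp per turn")
print(f"E. coli chromosome: {ecoli_mm:.2f} mm")
print(f"Diploid human cell: {diploid_m:.2f} m")
print(f"Sun and back: {round_trips:.1f} times")
```

Under these assumptions the results agree with the rounded values quoted in the text: 10 residues per turn, roughly 1.6 mm for E. coli, roughly 2 m per diploid cell, and about 20 return trips to the sun.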

Another important class of nucleic acids is RNA; the roles of RNA molecules in the cell will be discussed below. Chemically, RNA is similar to DNA: it is a chain of similar monomers. The building blocks are nucleotides containing the 5-carbon sugar ribose, a phosphate and a nitrogenous base. The phosphate is attached to the 5′ carbon of the ribose and the nitrogenous base to the 1′ carbon (Figure 3). RNA contains four bases: adenine, guanine, cytosine and uracil. RNA is more labile (easily broken down) than DNA and most RNA molecules do not form stable secondary structures; some notable exceptions will be discussed below. The properties of RNA make it ideal as a genetic messenger during protein synthesis. The idea of this genetic messenger, mRNA, was proposed by François Jacob and Jacques Monod.

Figure 3. An RNA strand containing the four nucleotides with the nitrogenous bases adenine (A), cytosine (C), guanine (G) and uracil (U) respectively. The 3′ carbon of the ribose of one nucleotide is linked to the 5′ carbon of the next via a phosphodiester bond. The 5′ end is on the left and the 3′ end on the right.

Packaging of DNA into eukaryotic cells

DNA has to be highly condensed to fit into the bacterial cell or eukaryotic nucleus. In eukaryotes, histone proteins are used to condense the DNA into chromatin. The basic structure of chromatin is the nucleosome, which contains DNA wrapped almost two times around the histone octamer (comprising two copies each of the histone proteins H2A, H2B, H3 and H4) (Figure 4). Further levels of compaction are required to fit the DNA into the nucleus (Figure 4): the nucleosomes are folded upon themselves to form the 30-nm fibre, which is then folded again to form the 300-nm fibre, and during mitosis further compaction can occur, forming the chromatid, which is 700 nm in diameter.

Figure 4. Histone proteins (H2A, H2B, H3 and H4) associate to form a histone octamer. Approximately 147 bp of DNA wraps around the histone octamer to form a nucleosome, generating a ‘beads on a string’ structure. The nucleosomes, together with histone H1, condense into the 30-nm fibre, and there is further condensation to form the 300-nm fibre. During mitosis there is further compaction (not shown).

Processes such as DNA replication and DNA transcription need to occur in the chromatin environment, and the level of compaction acts as a barrier to proteins that need to interact with DNA. Therefore, chromatin structure plays an important role in processes such as regulation of gene expression in eukaryotes. DNA and the histone proteins can be chemically modified. These changes are called epigenetic modifications as they do not change the DNA sequence; however, they can be passed on during cell division and to subsequent generations, a process known as epigenetic inheritance. As these epigenetic modifications can alter the chromatin structure, they regulate gene transcription and can affect the phenotype. Epigenetics plays key roles in many processes, including development, cancer, behaviour and addiction. This will be discussed further later in this article.

Nuclear organisation plays an important role in many biological processes, including regulation of gene transcription. In recent years the development of several techniques, including microscopy, has allowed us to gain an understanding of the way the genome is organised in 3D. Individual chromosomes are not randomly spaced within the nucleus; each chromosome has a distinct territory. Actively transcribed regions from different chromosomes are often close to each other and near the interior of the nucleus, whereas inactive genes are on the periphery or near a special area called the nucleolus, where ribosomal RNA is transcribed.

DNA replication

Whenever a cell divides there is a need to synthesise two copies of each chromosome present within the cell. For example, in a human, prior to cell division, all 23 pairs of chromosomes need to be replicated to form 46 pairs, so that following cell division each daughter cell has a full complement (23 pairs) of chromosomes. The structure of DNA gives us a clue to how it is replicated. This was eloquently postulated by Watson and Crick in their 1953 paper: “It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material”. Each strand can act as a template for the synthesis of the complementary strand, so the replication machinery would ‘unzip’ the double helix and read along the two existing ‘parent’ strands, synthesising a complementary new ‘daughter’ strand with A opposite T, C opposite G etc. This is described as semi-conservative, since each ‘new’ double-stranded DNA molecule has one original ‘parent’ strand and one newly made ‘daughter’ strand.

The evidence that DNA replication was semi-conservative came from an elegant experiment completed by Matthew Meselson and Franklin Stahl. They labelled the parental DNA with a heavy isotope of nitrogen (¹⁵N) by growing bacteria in a growth medium that contained ¹⁵NH₄Cl. They then grew the bacteria in a medium that contained ¹⁴NH₄Cl, in conditions such that any newly synthesised DNA would contain ¹⁴N. Since DNA replication is semi-conservative, after one round of DNA replication, each cell would have a DNA molecule that contains one ‘old’ parental strand labelled with ¹⁵N and one ‘new’ daughter strand labelled with ¹⁴N. This was shown by analysing the density of the DNA using density-gradient centrifugation. As predicted, they observed that the new daughter DNA molecule had a density consistent with the fact that it contained both ¹⁵N and ¹⁴N and that this daughter DNA contained one strand with ¹⁵N and another strand with ¹⁴N.
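The prediction tested by this experiment can be sketched as a small simulation. This is a simplified model, not a description of the original analysis: each DNA molecule is represented only by the isotope labels of its two strands, and the function names `replicate` and `density_bands` are hypothetical. The text above describes only the first round of replication; the later rounds printed here are what the semi-conservative model predicts.

```python
from collections import Counter


def replicate(molecules):
    """Semi-conservative replication in 14N medium: each duplex separates
    and every parental strand is paired with a new light (14N) strand."""
    daughters = []
    for strand_a, strand_b in molecules:
        daughters.append((strand_a, "14N"))
        daughters.append((strand_b, "14N"))
    return daughters


def density_bands(molecules):
    """Count duplexes by density class, based on how many 15N strands each has."""
    names = {0: "light", 1: "hybrid", 2: "heavy"}
    return Counter(names[duplex.count("15N")] for duplex in molecules)


population = [("15N", "15N")]  # fully heavy parental DNA
for generation in range(1, 4):
    population = replicate(population)
    print(generation, dict(density_bands(population)))
```

After one round every molecule is a hybrid of one ¹⁵N strand and one ¹⁴N strand, matching the intermediate density described above. In subsequent rounds the model predicts that the hybrid molecules persist while the proportion of fully light DNA grows.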

DNA polymerase and DNA synthesis

The enzyme DNA polymerase is responsible for DNA synthesis. DNA polymerase is a template-driven enzyme, so it will use the parental DNA strand as a template. It cannot synthesise DNA in the absence of a template. In addition, it will only add nucleotides on to the 3′ end of an existing nucleic acid chain. The building blocks for DNA synthesis are deoxynucleoside triphosphates (dATP, dTTP, dCTP and dGTP). During DNA synthesis, the base within the incoming deoxynucleoside triphosphate pairs with the complementary base on the template strand, and a phosphodiester bond is formed between the 5′ phosphate on the incoming nucleotide and the free 3′ hydroxyl on the existing nucleic acid chain; pyrophosphate is released (Figure 5).
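The two constraints described here, the need for a template and the need for an existing 3′ end, can be illustrated with a minimal sketch. The function `extend_primer` and the example sequences are hypothetical, and the existing chain is written with DNA letters for simplicity.

```python
# Watson-Crick base pairs.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_primer(template_3_to_5: str, primer_5_to_3: str) -> str:
    """Extend an existing chain along a template, adding to its 3' end only.

    The template is written 3'->5' so that position i of the template lies
    opposite position i of the growing strand, which is written 5'->3'.
    """
    if not primer_5_to_3:
        raise ValueError("an existing chain with a free 3' end is required")
    if len(primer_5_to_3) > len(template_3_to_5):
        raise ValueError("existing chain is longer than the template")
    for position, base in enumerate(primer_5_to_3):
        if COMPLEMENT[template_3_to_5[position]] != base:
            raise ValueError("existing chain is not complementary to the template")

    new_strand = primer_5_to_3
    while len(new_strand) < len(template_3_to_5):
        template_base = template_3_to_5[len(new_strand)]
        # One nucleotide is added to the 3' end per step (pyrophosphate released).
        new_strand += COMPLEMENT[template_base]
    return new_strand


print(extend_primer("TACGGCAT", "ATG"))  # ATGCCGTA
```

Because each new base is appended to the right-hand (3′) end of the string, the strand in this sketch can only grow in the 5′ to 3′ direction.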

Figure 5. (A) DNA polymerase binds the template DNA and the new strand. The next nucleotide to be added to the 3′ end of the growing chain will contain guanine (G); this is complementary to the C on the template strand. DNA polymerase catalyses the formation of a phosphodiester bond. (B) The chemical reaction during the formation of a phosphodiester bond, showing the addition of a nucleotide containing guanine and the release of pyrophosphate.

Pyrophosphate consists of the two phosphate residues within the deoxynucleoside triphosphate building block that are not incorporated into the DNA chain. DNA polymerase synthesises DNA in the 5′ to 3′ direction, because it can only add nucleotides on to the 3′ end of the chain. DNA polymerase has proofreading activity, so after the phosphodiester bond has been formed, the base pairing is checked and if a nucleotide with an incorrect base has been added, DNA polymerase will remove the nucleotide using a 3′ to 5′ exonuclease activity. Exonucleases are enzymes that can remove nucleotides from the ends of a DNA molecule; 3′ to 5′ exonucleases remove nucleotides from the 3′ end of a DNA molecule and therefore can remove the last nucleotide that was added during DNA replication. This is analogous to using the delete key to remove a letter that you have typed incorrectly before adding the correct one and continuing typing.

DNA polymerase requires a short double-stranded region with a free 3′ hydroxyl in order to start making a copy of the template; this ensures that DNA is synthesised in a controlled way. Initiation of DNA synthesis uses a small RNA primer (8–12 bases) made by the enzyme primase. DNA polymerase will then extend from the primer, copying the template and synthesising the daughter DNA strand. This means that when DNA synthesis first starts each DNA molecule actually contains a small piece of RNA at its 5′ end. This RNA will ultimately be replaced with DNA; how this is done is discussed below.

The origin of replication and the replisome

A large multiprotein complex, called the replisome, is responsible for DNA replication. In prokaryotes, two replisomes form at a specific point on the chromosome called the origin of replication (ori). The DNA in this region is opened up, or ‘unzipped’, so that the replication machinery can gain access to single-stranded parental DNA, which acts as the template for synthesis of the new daughter strands. The two replisomes then travel in opposite directions around the circular prokaryotic chromosome, each forming a replication fork. A schematic representation of one replication fork is shown in Figure 6.

Figure 6.

A single replication fork showing the leading and lagging strands. The leading strand is synthesised continuously, reading the template 3′ to 5′, synthesising DNA in the 5′ to 3′ direction. The lagging strand is synthesised discontinuously, in short Okazaki fragments (1000 bases in prokaryotes and 100 bases in eukaryotes).

The replication fork

Within the replication fork, on the so-called leading strand, DNA polymerase moves 3′ to 5′ with respect to the template and synthesises DNA in the 5′ to 3′ direction as it moves in the same direction as the replication fork. Although the lagging strand overall grows in the 3′ to 5′ direction, it is actually synthesised discontinuously in small segments called Okazaki fragments, each of which is synthesised 5′ to 3′ (Figure 6). Each Okazaki fragment starts with an RNA primer and is synthesised in the opposite direction to the movement of the replication fork. In prokaryotes, Okazaki fragments are 1000–2000 bases in length. In Figure 6 you will see that the DNA polymerase synthesising an Okazaki fragment will eventually reach the primer for the previous Okazaki fragment. When this happens, the primer for the previous fragment is removed by a DNA polymerase using 5′ to 3′ exonuclease activity. DNA polymerase then replaces the missing nucleotides by adding them to the 3′ end of the most recently made Okazaki fragment. When all of the primer has been removed, there will be two DNA strands adjacent to each other but not joined by a phosphodiester bond; these two strands are joined together by the enzyme DNA ligase.

The replisome contains a number of other important proteins required for DNA replication. The double-stranded DNA needs to be separated, or ‘unzipped’, by a helicase to generate the single-stranded DNA templates for DNA polymerase. As the replication fork moves along the helical DNA, the coils in the DNA in front of the fork become compressed, so the DNA is described as being overwound; a topoisomerase is required to ‘relax’ it by removing the overwinding. Single-stranded binding proteins (SSBs) bind the lagging strand template to stabilise and protect the single-stranded DNA.

The two replication forks that form at the ori will move in opposite directions around the circular prokaryotic genome until they reach the terminator sequence, ter, which is on the opposite side of the genome compared with the ori, i.e. it is at 6 o’clock compared with 12 o’clock. This results in the complete replication of the genome. Once DNA replication has been completed a post-replication DNA repair process will correct errors that were not corrected by the proofreading activity of DNA polymerase. The fidelity of DNA replication is extremely high, resulting in an error rate of 1 mistake per 10⁹–10¹⁰ nucleotides added.
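To give a feel for what this error rate means, the short Python sketch below multiplies it by the number of nucleotides added in one round of replication. The genome size used (4.6 million base pairs, roughly that of E. coli) is an illustrative assumption that does not come from the text above, and the function name `expected_errors` is hypothetical.

```python
# Illustrative assumption: a circular genome of about 4.6 million base pairs.
GENOME_BP = 4_600_000

def expected_errors(nucleotides_added, errors_per_nucleotide):
    """Average number of mistakes expected for a given amount of synthesis."""
    return nucleotides_added * errors_per_nucleotide

# Copying a double-stranded genome adds one new nucleotide per template
# nucleotide on both strands, so 2 x GENOME_BP nucleotides are added.
nucleotides_added = 2 * GENOME_BP

# The two ends of the range quoted in the text: 1 in 10^9 and 1 in 10^10.
for rate in (1e-9, 1e-10):
    print(rate, expected_errors(nucleotides_added, rate))
```

Under these assumptions, far fewer than one mistake is expected each time the genome is copied.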

DNA replication in eukaryotes

DNA replication is essentially the same in eukaryotes and prokaryotes. In both cases two replisomes form at an ori and generate two replication forks moving in opposite directions away from the origin. In each replication fork there are leading and lagging strands. There are two major differences. The first is that, due to the larger genome size, each chromosome has multiple origins of replication, so there will be a large number of replication forks on each chromosome.

The second difference is that, with the exception of mitochondrial DNA, eukaryotic chromosomes are linear, and this creates a problem for lagging strand synthesis. Replication of a linear chromosome results in shortening of one 5′ end of each daughter DNA molecule. This is because when the primer required for the last Okazaki fragment is removed, DNA polymerase cannot fill the gap (Figure 7A). Repeated rounds of DNA replication result in shorter and shorter DNA molecules. If this were not corrected, eukaryotes would become extinct as their chromosomes got shorter with each generation. Eukaryotes have a mechanism to preserve the ends of chromosomes where it counts, that is, in the gametes. The terminal ends of chromosomes, telomeres, contain a highly repeated sequence; for example, in humans the sequence TTAGGG is repeated in tandem from 100 to over 1000 times. Repeated rounds of DNA replication shorten these telomeric sequences, that is, the number of repeats falls. Telomerase, an RNA-containing enzyme, can add additional copies of the repeat sequence to the 3′ end, replacing those lost during DNA replication (see Figure 7).

Figure 7.

( A ) Following DNA replication and removal of the primer for the last Okazaki fragment of the lagging strand, there will be a region at the 3′ end that is not base paired, called a 3′ overhang. ( B ) Telomerase binds and uses the RNA it contains to act as a template to extend the 3′ overhang. This extends the 3′ end sufficiently for a new RNA primer to bind and the final Okazaki fragment to be made.

This actually extends the 3′ end of the telomere rather than the 5′ end that is initially shortened during DNA replication. The RNA sequence within telomerase is complementary to the 3′ telomeric sequence and so can bind and act as a template for synthesis of a short DNA sequence. Telomerase then moves along the newly synthesised strand and the process is repeated. Multiple rounds of elongation and translocation ultimately result in the 3′ end being extended until it is long enough to act as a template for synthesis of another Okazaki fragment, hence extending both strands of the telomere. Only germ cells and a few other actively dividing cells (e.g. haematopoietic cells) have sufficient levels of telomerase activity to counteract the loss of repeat sequences during DNA replication. At birth, telomeres are over 10000 base pairs in length and there are enough repeats to allow DNA replication and somatic cell division during the lifetime of the organism. If telomeres become too short, this triggers programmed cell death (a process called apoptosis). The lack of telomerase activity in somatic cells limits the number of cell divisions that can occur, and this is a ‘problem’ that needs to be overcome by cancer cells. Telomerase activity is reactivated in most cancers, allowing these cells to divide indefinitely, and this activity is therefore a potential target for cancer therapies.
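The cycle of elongation and translocation can be sketched in a few lines of Python. This is a simplified model: the six-base RNA template `CCCUAA` is an assumed, idealised stand-in for the template region of the telomerase RNA, chosen only because it is complementary to the TTAGGG repeat, and the function names are hypothetical.

```python
# Pairing rules for copying an RNA template into DNA.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}

def reverse_transcribe(rna_5_to_3):
    """Return the DNA strand (written 5'->3') complementary to an RNA template."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna_5_to_3))

def extend_overhang(overhang_5_to_3, rna_template, rounds):
    """Add one copy of the templated repeat to the 3' end per round.

    Each round stands for one elongation step followed by translocation.
    """
    repeat = reverse_transcribe(rna_template)
    return overhang_5_to_3 + repeat * rounds

ASSUMED_TEMPLATE = "CCCUAA"   # idealised template, not the full telomerase RNA
print(reverse_transcribe(ASSUMED_TEMPLATE))
print(extend_overhang("TTAGGG", ASSUMED_TEMPLATE, rounds=2))
```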

An understanding of DNA synthesis is central to many experimental approaches in molecular biosciences. It allows us to determine DNA sequences, including that of the human genome, to analyse environmental samples to better understand the living world around us, and to analyse minute biological samples from crime scenes to identify offenders. It is also exploited in medicine; for example, several drugs used to treat HIV infection or exposure are nucleoside analogues that inhibit DNA synthesis. Many chemotherapy agents used to treat cancer target DNA replication.

The genetic code and the concept of a gene

As we have seen in the previous two sections, the genetic material in a cell is made of DNA and can be copied and passed on to progeny through DNA replication, allowing for inheritance of the information that it carries. A large proportion of the information on the DNA is first transcribed into mRNA and then translated into proteins. However, there are some RNAs that are never translated into proteins, and these have important functions too. Phrases like ‘it is in my genes’ or ‘in my DNA’ are used in common speech to mean that something is an important part of who someone is.

The term gene was coined in the early 1900s to describe the basic unit of heredity. Genes were thought of as distinct loci arranged linearly on chromosomes. Breeding experiments with the fruit fly Drosophila supported this view and showed that if two genes are close together on a chromosome they are more likely to be inherited together. The observation that mutations in genes could give rise to altered phenotypes led to the ‘one gene one polypeptide’ hypothesis. Once it became clear that genes were made of DNA, what is referred to as the central dogma of molecular biology was formulated. This describes a two-step process in which the genes on the DNA are transcribed into RNA and then translated into a sequence of amino acids that makes up a protein. The information flow is from DNA to RNA and then to protein (Figure 8).

Figure 8.

The arrows represent steps where DNA or RNA is being used as a template to direct the synthesis of another polymer, either RNA or protein.

However, there are exceptions to this. Firstly, some viruses have RNA genomes, and in some cases these are reverse transcribed into DNA before the genes can be expressed. The retrovirus HIV is an example of this. The other exception is that not all functional RNAs are translated into proteins (see non-coding RNAs below).

The genetic code

The genetic code is the set of rules used by living cells to translate the information encoded within genetic material into proteins. When DNA and RNA were first discovered, the relative simplicity of nucleic acids led many scientists to doubt that they carried the genetic information. DNA only has four different kinds of bases; the question was how it could code for 20 amino acids. If there were a 1:1 correlation between bases and amino acids, DNA could only encode four amino acids. Pairs of bases would give 16 possible combinations, which is still not enough. However, if you consider a triplet code you have 64 possibilities, which is more than enough. This is the code that we are familiar with, where each codon, a sequence of three nucleotides, specifies a particular amino acid. This triplet code still did not seem logical because now you have far more codons than you need. There are some other important questions about the genetic code too: are the spare codons used? Is the code overlapping? And is it continuous or are there spacers indicating the end of each codon?
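The counting argument above (4, 16, then 64 combinations) can be checked directly. The following Python sketch simply enumerates every possible 'word' of a given length; the function names are chosen for illustration.

```python
from itertools import product

BASES = "ACGU"      # the four bases, written as RNA
AMINO_ACIDS = 20    # number of amino acids that must be encoded

def possible_words(length):
    """Count the distinct base sequences of the given length."""
    return len(list(product(BASES, repeat=length)))

def shortest_sufficient_length():
    """Smallest word length that gives at least one word per amino acid."""
    length = 1
    while possible_words(length) < AMINO_ACIDS:
        length += 1
    return length

for n in (1, 2, 3):
    print(n, possible_words(n))
print("shortest sufficient word length:", shortest_sufficient_length())
```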

Table 1 shows the genetic code as we now understand it. It is written as RNA with a U rather than a T because it is RNA that cells translate into amino acids. The code is said to be redundant or degenerate because a single amino acid is often coded for by more than one codon. In most cases it is the third nucleotide in the codon that differs; this is often referred to as the degenerate position.
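The degeneracy of the code can be explored with a small Python sketch. It builds the standard genetic code from a compact 64-letter string (one-letter amino acid symbols, `*` for stop, codons ordered with bases U, C, A, G); the helper names are hypothetical and the layout is just one convenient way of writing the table.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def codons_by_amino_acid():
    """Group the sense codons by the amino acid they encode."""
    groups = defaultdict(list)
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid != "*":
            groups[amino_acid].append(codon)
    return dict(groups)

def fourfold_prefixes():
    """First two bases for which the third base makes no difference at all."""
    prefixes = ("".join(p) for p in product(BASES, repeat=2))
    return [p for p in prefixes
            if len({CODON_TABLE[p + third] for third in BASES}) == 1]

groups = codons_by_amino_acid()
for amino_acid in sorted(groups):
    print(amino_acid, len(groups[amino_acid]), groups[amino_acid])
print("third base irrelevant for:", fourfold_prefixes())
```

Only methionine (M) and tryptophan (W) have a single codon; every other amino acid has between two and six.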

Evidence for the triplet code

The experiments that allowed scientists to decipher the genetic code were carried out long before we were able to determine the sequence of DNA. While it was possible at that time to determine the proportions of each different amino acid in a protein, it was not yet possible to work out the order in which they occurred. Francis Crick and Sydney Brenner answered some key questions with an experiment using mutants of a virus that infects bacteria called bacteriophage. The normal or wild-type phage will infect E. coli and grow. Crick and Brenner investigated mutants that would not grow on some strains of E. coli .

Mutations that are insertions or deletions cause what are called frameshifts. Inserting a single adenine base into the DNA sequence changes not only the amino acid at the position of the insertion but also all subsequent amino acids translated from that sequence (compare Figures 9A and B); the reading frame has been shifted by one base, and this results in a protein that is non-functional. However, if you insert three nucleotides you often get a wild-type or near wild-type phenotype. This is because you have inserted a whole triplet codon: you will get one or two amino acids that were not in the original sequence, but the reading frame is not shifted (Figure 9C) and the rest of the sequence is normal.

Figure 9.

( A ) Wild-type sequence, ( B ) a single base insertion (shown in red) causes a frameshift so all subsequent amino acids are different from the wild-type, ( C ) insertion of three base pairs (shown in red) causes two incorrect amino acids to be incorporated into the protein but there is no frameshift so the rest of the protein has the wild-type sequence.

Crick and Brenner were looking for what they called suppressor mutations that would rescue the mutant and allow it to grow normally. They showed that their suppressor mutants did not simply reverse the original mutation; they often added or subtracted one or more bases. They worked out that if you insert or delete one, two or four nucleotides then you see a mutant phenotype. However, if you insert or delete three nucleotides, this has little or no effect. This was strong supporting evidence for a triplet code. This is also evidence for a redundant code where the same amino acid can be coded in more than one way. If the code were non-redundant there would be 20 codons that code for amino acids and 44 that are ‘nonsense’ codons. In this case inserting three nucleotides would be most likely to introduce a nonsense codon and not restore the wild-type. Crick and Brenner proposed correctly that the genetic code is read from a fixed starting point and the bases are read in groups of three.
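The effect of inserting one, two or three bases can be reproduced with a toy translation routine. In the Python sketch below the RNA sequence is invented for illustration (it is not taken from the phage experiments), and the codon table is the standard genetic code written compactly with one-letter amino acid symbols.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(rna):
    """Read the RNA in non-overlapping triplets from a fixed starting point."""
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))

def insert(rna, position, bases):
    """Return a copy of the RNA with extra bases inserted at `position`."""
    return rna[:position] + bases + rna[position:]

WILD_TYPE = "AUGGCCGAGUUCCGCAAGUGG"   # hypothetical sequence for illustration

print("wild type  ", translate(WILD_TYPE))
print("+1 base    ", translate(insert(WILD_TYPE, 3, "A")))
print("+2 bases   ", translate(insert(WILD_TYPE, 3, "AA")))
print("+3 bases   ", translate(insert(WILD_TYPE, 3, "AAA")))
```

With one or two extra bases every amino acid after the insertion changes, whereas three extra bases add a single amino acid and leave the rest of the sequence in frame.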

Cracking the code

At about the same time, two scientists working in the USA, Marshall Nirenberg and Heinrich Matthaei, had developed a cell-free system which could synthesise proteins in a test tube when provided with an RNA molecule. They showed that when provided with an artificial RNA chain composed only of uracil (polyuracil), the system made a polypeptide composed entirely of phenylalanine residues. They now had a tool that they could use to crack the genetic code. RNA composed of cytosine (C) residues directed the synthesis of polyproline, and RNA composed of adenosine (A) made polylysine. Experiments with combinations of nucleotides demonstrated that, for example, if you make RNA from A and C you produce proteins containing only six amino acids: asparagine, glutamine, histidine, lysine, proline and threonine. There are eight possible triplet codons that can be made from A and C; two of these (CCC and AAA) we already know encode proline and lysine. The remaining four amino acids must be encoded by other combinations of A and C. This of course provides additional evidence for the redundancy of the genetic code.
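The argument about RNA made from only A and C can be verified by listing the eight possible codons and looking each one up. This Python sketch uses the standard genetic code with one-letter amino acid symbols; the variable names are chosen for illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Every triplet that can be built from A and C alone.
ac_codons = ["".join(c) for c in product("AC", repeat=3)]
ac_amino_acids = sorted({CODON_TABLE[codon] for codon in ac_codons})

for codon in ac_codons:
    print(codon, CODON_TABLE[codon])
print(len(ac_codons), "codons encode", len(ac_amino_acids), "amino acids")
```

The six one-letter symbols produced are H, K, N, P, Q and T, i.e. histidine, lysine, asparagine, proline, glutamine and threonine, with proline and threonine each specified by two of the eight codons.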

These experiments using RNA molecules composed of random combinations of two or three bases were not enough to fully crack the genetic code. The use of chemically synthesised RNA molecules of known repeating sequence added some more important information. For example, a synthetic RNA of alternating A and C residues (ACACACACAC…) can be read as two alternating codons, CAC and ACA. It encodes a protein of alternating histidine and threonine residues.
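A repeating two-base RNA gives the same alternating product whichever base translation starts from, which is what made such molecules informative. The Python sketch below translates an ACAC repeat in two different frames, using the standard genetic code with one-letter symbols (H for histidine, T for threonine).

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(rna):
    """Translate non-overlapping triplets, ignoring any incomplete final codon."""
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))

repeat_rna = "AC" * 6                 # ACACACACACAC
print(translate(repeat_rna))          # reading starts at A: ACA CAC ACA CAC
print(translate(repeat_rna[1:]))      # reading starts at C: CAC ACA CAC
```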

In the last section, we will discuss how tRNAs and ribosomes decode the genetic code and synthesise proteins. The final detail of the genetic code was determined by a technique using ribosome-bound tRNAs. Pieces of RNA as short as a single codon will bind to ribosomes, and if amino acids attached to tRNA are added they will associate with the complementary RNA. If you then filter the solution you trap only the tRNAs that are bound to the ribosome; these are the ones specified by the codon in your RNA.

Start and stop codons

Of the 64 possible codons, 61 encode amino acids. The three remaining codons, UAA, UAG and UGA, do not code for an amino acid and are sometimes called nonsense codons. They are stop codons; when the ribosome encounters one of them, protein synthesis stops. The AUG codon encodes the amino acid methionine, but it is also the most common start codon. As you will see in the last section, the first residue in eukaryotic proteins is always a methionine, and in prokaryotes it is a modified amino acid, N-formylmethionine.

Expanding the genetic code

Nature uses a small set of amino acids to make proteins. However, if we were able to engineer cells that could use a wider range of building blocks with different physical and chemical properties, it would be possible to make novel materials, some of which could have useful therapeutic properties; this is one of the aims of synthetic biology. To do this successfully we need to reprogramme the genetic code and to engineer the translation machinery (see later section) to use these new combinations. Some progress has been made, for example in using both the UAG stop codon and the AGG codon for arginine to code for amino acids not normally found in proteins.

Current concept of the gene

Once the genetic code was cracked it was clear that a gene is a sequence of bases on a DNA molecule that codes for a sequence of amino acids in a polypeptide chain or for an RNA molecule with a specific function. The availability of DNA sequences (see ‘Recombinant DNA Technology and DNA Sequencing’ in this issue of Essays in Biochemistry ) of individual genes made it possible to look for patterns characteristic of genes. A gene that codes for a protein has a start codon followed by a series of codons that encode the amino acid sequence and then a stop codon; this is called an open reading frame.
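The definition of an open reading frame translates naturally into a simple search. The Python sketch below scans a DNA coding strand for the first ATG and then reads triplets until it meets a stop codon; it is a minimal illustration only (real gene-finding software also considers both strands, all reading frames and introns), and the example sequence and the function name `find_orf` are invented.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}   # DNA equivalents of UAA, UAG and UGA

def find_orf(dna):
    """Return the first open reading frame (start codon to stop codon), or None."""
    start = dna.find("ATG")
    if start == -1:
        return None                    # no start codon at all
    for i in range(start, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            return dna[start:i + 3]    # include the stop codon
    return None                        # start codon found but no in-frame stop

print(find_orf("CCATGGCCTTTTAAGG"))
```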

Whole genome sequencing has provided biological data on an unprecedented scale. The need to analyse sequence data has led to the development of the field of bioinformatics: the analysis of these data to answer biological questions. One key concept used in bioinformatics is that of homology. Two organisms that have a common ancestor are said to be homologous, and the same can be said of a structure or of a gene. For example, limbs with five digits (the pentadactyl limb) are found not only in humans and other mammals but also in birds, reptiles and amphibians. The limbs are homologous, and this is evidence of a common evolutionary ancestor of all of these groups of animals. The same is true of genes. All vertebrates have red blood cells that contain haemoglobin; adult human haemoglobin is made from two α and two β globin molecules. The DNA sequences of the genes that encode globin molecules in vertebrates are all similar to each other, and you can estimate how long ago two animals shared a common ancestor by looking at how similar their globin genes are. This principle can also be used to find genes in a new piece of DNA sequence; if there is a section of sequence that is similar to a known gene then it is likely to encode a homologous gene.

A gene is more than just the sequence that encodes the protein; it also includes sequences involved in regulation of gene expression, such as promoter sequences that define where transcription starts and are the sites where proteins involved in transcription bind to the DNA. In bacteria, almost all genes are a single uninterrupted sequence of DNA. In eukaryotes the situation is more complicated because the coding region is usually interrupted by introns. The primary transcript is referred to as precursor mRNA or pre-mRNA; this contains both exons and introns. The introns are removed when the pre-mRNA is processed before it leaves the nucleus (Figure 10), leaving the exons, which are spliced together to make the mature mRNA. Eukaryotic mRNAs have a 5′ cap, which is a methylated guanosine nucleotide added to the 5′ end of the mRNA by an unusual 5′ to 5′ linkage; this is important in initiating translation. At the 3′ end is the poly A tail, a chain of between 100 and 250 adenine residues added to the mRNA to increase its stability. Analysis of the human genome sequence suggests that there are approximately 20000–25000 protein-coding genes; however, there are far more different proteins. This is because many genes are capable of encoding several variants of a protein. Alternative splicing allows for different combinations of exons to be included in the mature mRNA, and genes can also have several alternative promoters and alternative poly A sites. It is thought that 95% of human genes are alternatively spliced.

Figure 10.

The DNA includes an untranslated region at both the 5′ and 3′ ends as well as introns and exons. The codon where translation starts (green) and the stop codon (red) are shown. The DNA is transcribed into mRNA and is processed by addition of the 5′ cap, splicing out the introns and addition of the poly A tail. This mature mRNA is exported from the nucleus into the cytoplasm.
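Splicing and alternative splicing can be pictured as simple operations on an ordered list of exons and introns. The Python sketch below is a deliberately simplified model: the sequences are invented, and the rule used for alternative splicing (the first and last exons are always kept, any internal exon may be skipped, and order is preserved) is an assumption made for illustration rather than a description of how splice sites are actually chosen.

```python
from itertools import combinations

def splice(pre_mrna):
    """Remove the introns and join the exons of a pre-mRNA in order.

    `pre_mrna` is a list of (kind, sequence) pairs, kind being 'exon' or 'intron'.
    """
    return "".join(seq for kind, seq in pre_mrna if kind == "exon")

def isoforms(exons):
    """All mature mRNAs obtainable by skipping internal exons (needs >= 2 exons)."""
    first, *internal, last = exons
    result = []
    for kept in range(len(internal) + 1):
        for chosen in combinations(internal, kept):
            result.append("".join([first, *chosen, last]))
    return result

# Hypothetical pre-mRNA with three exons and two introns.
pre_mrna = [("exon", "AUGGCU"), ("intron", "GUAAGU"),
            ("exon", "GAA"), ("intron", "GUCCAG"),
            ("exon", "UAA")]

print(splice(pre_mrna))
print(isoforms([seq for kind, seq in pre_mrna if kind == "exon"]))
```

In this model a gene with k internal exons can give up to 2^k different mature mRNAs, which illustrates how the number of proteins can greatly exceed the number of genes.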

Non-coding RNAs

Only approximately 1.2% of the human genome codes for protein. However, if you compare the genomes of the human, the mouse and the dog you can see that much more of the genome has been under what is called negative selection since the species diverged. Negative selection means that mutations which are disadvantageous are selected against. This suggests that more than just the protein-coding regions affect the fitness of the organism carrying the DNA. Some of these regions are DNA sequences that are important in controlling gene expression (next section). However, systematic screens are revealing large numbers of RNA transcripts that are processed but do not encode proteins. The best known are transfer RNAs and ribosomal RNAs, both of which, as we will see in a later section, are fundamental to protein synthesis. However, we are beginning to understand that there are other non-coding RNAs that carry out important cellular processes.

Two types of non-coding RNA, small interfering RNAs (siRNAs) and microRNAs (miRNAs), have a role in reducing gene expression after the mRNA has been transcribed from the DNA. They work by targeting a protein complex called RISC to specific mRNAs, which it then degrades. Expression of the gene is specifically knocked out or reduced, and the phenotypic effect of this can then be observed. Another group of non-coding RNAs plays an important role in increasing the stability and correct folding of ribosomal RNAs. This process takes place in a compartment within the nucleus called the nucleolus; the RNAs are called small nucleolar RNAs (snoRNAs). These are mostly generated from intron RNA after it has been spliced out of the precursor mRNA, and they function in association with proteins.

Modern concept of a gene

The modern concept of the gene has to take into account all of the complexity of mRNA processing including alternative splicing, regulatory sequences and polyadenylation sites as well as the plethora of non-coding RNAs. A definition of a gene that takes these factors into account would be that a gene codes for one or more transcripts that can function as an RNA or can be translated into one or more proteins.

Transcription

We have seen that a gene can encode either an RNA product or a protein sequence. The production of both requires the gene to be transcribed into RNA, either because the RNA is the final product or because the RNA will need to act as template for protein synthesis. RNA synthesis is very similar in prokaryotes and eukaryotes, being catalysed by the enzyme RNA Polymerase. However, of the processes discussed in this article it is arguably the one that differs most between prokaryotes and eukaryotes. One difference is that in eukaryotes the whole process needs to occur in a chromatin context, so access to the DNA template is limited. Regulation of gene expression is a major facilitator of cell differentiation, homoeostasis and speciation. Different cell types turn on transcription of different genes giving rise to their differentiated phenotypes. If we look at mammals as an example of speciation, they all have roughly the same gene content; it is how transcription is regulated that has changed as mammals have evolved. For example, if you compare humans and mice, the important changes to the human and mouse genome sequence that have occurred since they diverged from a common ancestor, are predominantly in the sequences that control transcription rather than in protein coding sequences.

RNA polymerase

DNA-dependent RNA polymerases are responsible for transcription of DNA into RNA. Like DNA polymerase, RNA polymerase requires a DNA template and nucleoside triphosphate precursors. Unlike DNA polymerase, RNA polymerase does not require a primer. During RNA synthesis, the base within the incoming nucleoside triphosphate pairs with the base on the DNA template, a phosphodiester bond is formed, and pyrophosphate is released. RNA polymerase synthesises RNA in the 5′ to 3′ direction because it can only add nucleotides to the 3′ end of the chain. During transcription only one DNA strand is transcribed into RNA.
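Template-directed RNA synthesis can be sketched as a base-by-base lookup. In the Python example below the template strand is written 3′ to 5′ so that the RNA comes out 5′ to 3′; the short sequence and the function name `transcribe` are invented for illustration.

```python
# Base pairing between a DNA template and the incoming ribonucleotide.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5):
    """Return the RNA (5'->3') made from a DNA template strand read 3'->5'."""
    rna = []
    for base in template_3_to_5:
        rna.append(DNA_TO_RNA[base])   # each nucleotide joins the 3' end
    return "".join(rna)

template = "TACCGGAAA"                 # hypothetical template strand, 3'->5'
coding = "ATGGCCTTT"                   # the other (non-transcribed) strand, 5'->3'

rna = transcribe(template)
print(rna)
# The RNA matches the non-transcribed strand, with U in place of T.
print(rna == coding.replace("T", "U"))
```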

Gene transcription

When a gene is transcribed, RNA polymerase binds upstream of the start of the gene and unwinds almost two turns of the DNA helix to form a transcription bubble. It then adds nucleotides to the growing RNA chain; the last 12 nucleotides added to the RNA chain base pair with the DNA template, forming a DNA–RNA heteroduplex (Figure 11).

Figure 11.

As each nucleotide is added to the growing chain, the transcription bubble and the heteroduplex move with respect to the DNA template. So, as RNA polymerase synthesises RNA, there is unwinding of the DNA template in front of the site of synthesis and rewinding of DNA once RNA polymerase has passed through. Once RNA polymerase has transcribed the gene, transcription will terminate. For some genes, transcription termination is signalled by a particular sequence within the DNA, a terminator sequence, which RNA polymerase recognises. In some cases, RNA polymerase requires the help of other protein factors to recognise the terminator sequence. Finally, many eukaryotic genes do not contain a specific terminator sequence; instead, termination of transcription is linked to other events, for example cleavage of the RNA prior to addition of the poly A tail. Termination of transcription leads to the dissociation of RNA polymerase from the DNA template and release of the RNA product. In prokaryotes, mRNA does not need processing before it can be translated; in fact, as will be discussed below, mRNA is translated as it is being made. However, the initial transcript in eukaryotes does need to be processed to produce a functional mRNA that can be exported to the cytoplasm for translation.

Control of transcription in prokaryotes

At many genes in prokaryotes, RNA polymerase can bind to the gene and initiate transcription without other protein factors. However, for most prokaryotic genes, the binding of RNA polymerase to the gene is controlled by transcription factors to ensure the correct genes are transcribed at the correct level within the cell. Upstream from the transcription start there will be a ‘promoter’ which contains specific DNA sequences that are recognised by RNA polymerase and transcription factors. Each gene will have a different promoter sequence and can be controlled by different transcription factors. A good example of this type of promoter is the promoter that controls the lac operon in E. coli ( Figure 12 ). Transcription factors that up-regulate transcription are called activators and those that down-regulate transcription are called repressors. In this example RNA polymerase on its own can bind the promoter and drive low levels of transcription. If the repressor binds it will stop all transcription and would override RNA polymerase and the activator. In the absence of the repressor, if the activator is present then it can drive high levels of transcription.

Figure 12.

The different binding sites for transcription factors are shown on the DNA; ABS, activator binding site, RBS, repressor binding site. The left-hand panel indicates the presence of lactose and/or glucose in the environment, the right-hand panel indicates transcription levels.

The lac operon codes for genes required to use lactose and needs to be controlled in response to glucose and lactose concentrations. A repressor protein is responsible for responding to lactose concentration and an activator is responsible for responding to glucose ( Figure 12 ).

In the absence of lactose, the lac operon is kept in an off state by the repressor protein binding to the promoter and stopping transcription. If lactose is present in the cell, it binds to the repressor and stops the repressor binding the promoter, so RNA polymerase can bind and drive low levels of transcription. If the cell is also starved of glucose, the activator is turned on; it binds the promoter and helps RNA polymerase to initiate transcription, resulting in high rates of transcription.
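The regulatory logic just described amounts to a small truth table. The Python sketch below encodes it as a function of the two sugars; the three output levels ('off', 'low', 'high') follow the description above, while the function name is hypothetical and the model ignores intermediate concentrations.

```python
def lac_transcription(lactose_present, glucose_present):
    """Qualitative transcription level of the lac operon."""
    repressor_bound = not lactose_present   # lactose releases the repressor
    activator_bound = not glucose_present   # glucose starvation turns on the activator
    if repressor_bound:
        return "off"                        # the repressor overrides everything else
    return "high" if activator_bound else "low"

for lactose in (False, True):
    for glucose in (False, True):
        print(f"lactose={lactose!s:5} glucose={glucose!s:5} ->",
              lac_transcription(lactose, glucose))
```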

In the examples above, RNA polymerase on its own drives low levels of transcription. This might not be the case for all promoters; at some promoters RNA polymerase may not be able to bind and drive transcription without an activator protein. At other promoters RNA polymerase on its own will be able to drive high levels of transcription, and a repressor protein would be needed to turn off transcription.

Control of transcription in eukaryotes

Control of transcription in eukaryotes has to occur on a chromosome which is condensed into chromatin (Figure 13). In addition, transcription requires the assembly of a large multiprotein complex at the gene. This complex contains RNA polymerase and several other general transcription factors (GTFs). The core promoter is a region that overlaps the transcription start and is the binding site for RNA polymerase and the GTFs. In addition, there are further control sequences, enhancers, that can be just upstream of the core promoter or several thousand base pairs away from it. In the absence of activator proteins the chromatin structure stops RNA polymerase and the GTFs binding to the core promoter. Here histone proteins act as generic repressors of transcription. For transcription to be turned on, activators bind the enhancers and recruit co-activators, which open up the chromatin structure and ensure the core promoter is not blocked by histone proteins. The activators and co-activators then assemble RNA polymerase and the GTFs at the core promoter and drive transcription initiation. Transcription factors also ensure the chromatin structure across the whole gene is in a conformation that is suitable for transcription.

Figure 13.

(A) When a gene is in a silent state the surrounding DNA will be in condensed chromatin and the histones will carry epigenetic modifications which facilitate gene repression (red spheres). (B) A gene that is being transcribed will have activators bound to enhancer sequences; the activators recruit co-activators that acetylate the histones and add other epigenetic modifications that facilitate gene transcription (green spheres). The activators and co-activators will recruit RNA polymerase and the GTFs to the core promoter.

Repressors are not normally required to block assembly of the transcription complex at the core promoter; however, they are important in the regulatory patterns needed in complex multicellular organisms. Eukaryotes have repressor proteins which can block the action of a specific activator and ensure the activator is only active when required. Repressors can work in a number of ways, including binding to DNA and blocking the binding of the activator to the DNA, stopping the activator interacting with other proteins required for transcription, or binding to the activator and keeping it in the cytoplasm.

Epigenetics

As discussed above, transcription initiation in eukaryotes requires the opening up of the chromatin structure. This is facilitated by co-activator proteins that can move the position of the nucleosomes ( Figure 4 ) relative to the DNA and hence make certain regions of the DNA more accessible. They can also add chemical tags to both the histone proteins and the DNA ( Figure 13 ). These epigenetic modifications can affect whether a gene or genomic region is available for transcription or is transcriptionally silenced. Histones are acetylated by enzymes that transfer an acetyl functional group from acetyl-coenzyme A to lysine residues in the histone protein. This is linked to activation of transcription because it reduces the positive charge on histones and therefore reduces their affinity for the negatively charged DNA. Acetylation can also act as a tag that is recognised by other proteins that drive gene transcription. This modification is described as epigenetic because it affects gene expression rather than the genetic code itself. Conversely, some repressor proteins will recruit co-repressors that deacetylate histones, increasing their affinity for DNA, causing the chromatin to be highly condensed and leading to transcriptional silencing. Methylation of lysine residues is another epigenetic tag; a single lysine residue can have 1, 2 or 3 methyl groups added. Unlike acetylation, methylation of lysine residues does not change the positive charge. The consequences of histone methylation are more complex because, depending on which lysine residue is methylated and the level of methylation, the tag may mark that region of the genome for transcription activation or repression.
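The charge argument can be made concrete with a small bookkeeping sketch. It assumes each unmodified lysine side chain contributes a charge of +1, that acetylation neutralises it and that methylation leaves it unchanged, as the paragraph states. The five-lysine "tail" and the state labels are hypothetical and chosen only for illustration.

```python
# Charge contributed by one lysine side chain in each modification state.
# Acetylation neutralises the charge; adding 1-3 methyl groups does not.
LYSINE_CHARGE = {"unmodified": 1, "acetylated": 0, "me1": 1, "me2": 1, "me3": 1}


def tail_lysine_charge(states):
    """Sum the positive charge carried by a list of lysine states."""
    return sum(LYSINE_CHARGE[state] for state in states)


# A hypothetical histone tail with five lysines.
print(tail_lysine_charge(["unmodified"] * 5))                       # 5
print(tail_lysine_charge(["acetylated"] * 3 + ["unmodified"] * 2))  # 2
print(tail_lysine_charge(["me3"] * 5))                              # 5
```

The acetylated tail carries less positive charge and so would be expected to bind the negatively charged DNA less tightly, whereas the methylated tail is unchanged in charge and must act through other means, such as being recognised by other proteins.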

DNA methylation is another important epigenetic modification which leads to transcriptional silencing of the genomic region that has been methylated. During differentiation in the developing embryo whole regions of the genome will be methylated and therefore transcriptionally silenced. The DNA methylation patterns are maintained during cell division and future generations of that cell.

Analysing transcription on a global scale

For many years individual scientists would study the transcriptional regulation of their ‘favourite’ gene, and so we gained an understanding of how individual genes were regulated in response to different developmental or environmental signals, for example the control of the lac operon in response to lactose and glucose. In the last 15 years, many techniques have been developed that allow us to study the transcriptional control of all the genes within a cell. Using techniques such as ‘RNA-Seq’, we can isolate the total RNA from a cell and use high-throughput sequencing to catalogue the level of transcription of all genes. In the case of eukaryotes this will also show how the transcripts have been spliced. It is also possible to analyse the binding of transcription factors and study epigenetic changes within histone proteins across the genome using techniques such as ChIP-Seq. By combining techniques such as RNA-Seq and ChIP-Seq, we can determine when and where a protein factor is bound to DNA, study epigenetic changes in a particular cell type and follow the consequences in terms of gene transcription. In combination these techniques give a detailed picture of the factors that affect transcription; this has been used, for example, to look at differences between cancer cells and normal cells from the same patient.

Transcription and disease

Transcription factors and promoters play major roles in health and disease. Below are just a few examples.

  • The transcription factor p53 is a tumour suppressor protein. It guards against cancer, and some human cancers have mutations that knock out p53 function.
  • The drug Tamoxifen, used in the treatment of breast cancer, binds the oestrogen receptor and inhibits its function. The oestrogen receptor is a transcription factor that turns on the transcription of genes in response to oestrogen.
  • Rett Syndrome is a neurodevelopmental disorder that affects approximately 1 in 15000 female births. It is due to mutations in a transcription factor that would normally repress transcription of specific genes; the mutations lead to inappropriate transcription of these genes.
  • Cocaine use results in changes in the expression of many genes, which can include epigenetic changes within genes involved in cognition and brain function. These epigenetic changes can be inherited, and there is evidence that cocaine use by a father can result in epigenetic changes that lead to male, but not female, offspring being cocaine resistant.

Translation of RNA into proteins

The key player in protein synthesis is the ribosome, a complex structure composed of RNA and proteins. The ribosome provides a framework that ensures that the mRNA and tRNA are correctly positioned, enabling the deciphering of the genetic code. There are many other proteins that are important in protein synthesis; some of these are part of the ribosome and some are again correctly positioned by the framework of the ribosome. As we will see, the large subunit ribosomal RNA is a ribozyme: an RNA molecule with catalytic properties similar to those of enzymes. This ribosomal RNA catalyses the formation of a peptide bond between two amino acids.

Transfer RNA

The other nucleic acid that you need for protein synthesis is the tRNA. The tRNA molecule is single stranded and folds up into a characteristic structure by base pairing ( Figure 14 ). tRNAs act as adaptor molecules: each has an anticodon for a specific mRNA codon and each carries the amino acid specified by that codon. The anticodon has a complementary sequence to the codon on the mRNA.
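As a rough illustration of the codon and anticodon relationship, the sketch below derives an anticodon from a codon using only standard Watson-Crick pairing. It assumes both sequences are written 5′ to 3′ and it ignores wobble pairing and modified bases, which real tRNAs use; the function name is hypothetical.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon: str) -> str:
    """Return the anticodon (5'->3') that pairs with an mRNA codon (5'->3')."""
    # Pairing is antiparallel, so complement each base and reverse the order.
    return "".join(RNA_PAIR[base] for base in reversed(codon.upper()))


print(anticodon_for("AUG"))  # CAU, pairs with the start codon
print(anticodon_for("UUC"))  # GAA, pairs with a phenylalanine codon
```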

Figure 14. ( A ) Tertiary structure of the phenylalanine tRNA from yeast showing the anticodon (grey), the acceptor stem (violet) with the nucleotides CCA at the 3′ OH end (yellow). Image modified from ‘TRNA-Phe yeast’, Yikrazuul (licensed under CC BY-SA 3.0). ( B ) Clover leaf representation of the secondary structure of tRNA.

The enzymes which attach amino acids to tRNAs are called aminoacyl tRNA synthetases; they recognise a specific amino acid and the corresponding tRNA. The reaction also requires ATP and is carried out in two steps:

Step 1: amino acid + ATP → aminoacyl-AMP + PP
Step 2: aminoacyl-AMP + tRNA → aminoacyl-tRNA + AMP

In the first step the enzyme hydrolyses ATP, releasing pyrophosphate (PP), and in the second it attaches the amino acid to the 3′ hydroxyl of the tRNA. Aminoacyl tRNA synthetase enzymes are highly specific: they recognise specific amino acids and will only attach them to the correct tRNA. This ensures correct coupling of amino acids and tRNA molecules, which is just as important in ensuring the fidelity of protein synthesis as the matching of the anticodon to the codon by the ribosome. In addition, this step is said to activate the aminoacyl tRNA, as it not only produces the correct substrate for the ribosome but also provides much of the energy required for peptide bond formation during protein synthesis.

Structure of the ribosome

All living things contain ribosomes. The ribosomes in bacteria are slightly smaller than those found in eukaryotic cells ( Table 2 ) but the overall structure and the way in which they work are essentially the same. The 2009 Nobel Prize in Chemistry was awarded to three scientists, Ada Yonath, Thomas Steitz and Venkatraman Ramakrishnan, who used X-ray crystallography to solve the three-dimensional structure of the bacterial ribosome. The ribosome is composed of two subunits: the small subunit, which reads the messenger RNA, and the large subunit, which forms the bonds between amino acids, adding them to the growing polypeptide chain. There are three important binding sites for tRNAs in the ribosome, which are at the interface between the two subunits and only formed when the two subunits come together. These sites are shown in Figure 15 . They are referred to as the acceptor or aminoacyl (A) site, the peptidyl (P) site where the peptide bond between amino acids is formed and the exit (E) site from which spent tRNAs leave the ribosome.

Figure 15. In ( A ) the new tRNA is delivered to the ribosome by elongation factor EF-Tu (purple). In ( B ) the amino acid on the incoming tRNA is brought close to the amino acid on the tRNA in the peptidyl site to facilitate peptide bond formation (bright green). Adapted from Goodsell 2010 , licensed under CC-BY-4.0.

Table 2

                   Bacterial ribosome              Eukaryotic ribosome
Size               70S, 2.3–2.6 MDa                80S, 3.3–4.5 MDa
Small subunit      30S                             40S
                   Approximately 20 proteins       32 proteins
                   16S rRNA, ∼1600 nucleotides     18S rRNA
Large subunit      50S                             60S
                   Approximately 33 proteins       79–80 proteins
                   23S rRNA, ∼2900 nucleotides     25S rRNA
                   5S rRNA, ∼120 nucleotides       5.8S & 5S rRNA

In addition to the ribosome, the mRNA and tRNA, there are a number of small proteins that are not part of the structure of the ribosome, but are required for protein synthesis: initiation factors, elongation factors and termination factors. The importance of these factors is illustrated by the inherited condition Vanishing White Matter Disease (VWM). This serious neurodegenerative disease which results in lesions in the white matter in the brain is due to mutations in one of the initiation factors.

Protein synthesis

During protein synthesis the ribosome brings together the amino acid charged tRNA and the mRNA; the codon and anticodon are matched and the amino acids are joined together in the correct sequence. There are three phases to this process: initiation, where the ribosome assembles on the mRNA; elongation, where the triplet code is read and amino acids are added to the growing peptide chain; and termination, where protein synthesis stops.

A complex of proteins called the cap-binding complex binds to the 5′ cap of the mRNA ( Figure 10 ) in the nucleus. The mRNA is then exported to the cytoplasm where it recruits initiation factors, a tRNA charged with methionine and the small (40S) ribosomal subunit. The small subunit scans along the 5′ untranslated region of the mRNA until it encounters the first AUG start codon ( Figure 16 A). This is recognised by the anticodon of the initiator tRNA; the large subunit then docks to give the translation complex. The 80S ribosome, with the tRNA charged with methionine at the P site, is now ready to accept the next tRNA ( Figure 16 B).

Figure 16. ( A ) During initiation, the mRNA recruits a tRNA charged with a methionine and the small ribosomal subunit, ( B ) the large subunit then docks to give the translation complex, ( C ) a tRNA with an amino acid attached enters the A site, ( D ) the peptide bond is formed between the amino acid in the P site and the one in the A site. The effect is that the growing peptide chain is transferred to the incoming aminoacyl tRNA in the A site leaving an empty tRNA in the P site. ( E ) Finally, everything moves along the mRNA by one codon in a process called translocation so the peptidyl tRNA with the growing peptide chain attached moves to the P site and the spent tRNA to the E site from where it leaves the ribosome. ( F ) When a stop codon is in the A site, a termination or release factor enters the A site, ( G ) the peptide is released from the ribosome and ( H ) the two subunits of the ribosome disassociate and are recycled.

With initiation complete, the mRNA is in the correct reading frame with the A site empty and the next codon exposed. In the elongation phase an aminoacyl tRNA, one charged with an amino acid, is brought to the ribosome in a complex with an elongation factor and enters the A site. If the anticodon it carries is complementary to the exposed codon, it is correctly positioned in the acceptor site and GTP is hydrolysed on the elongation factor ( Figure 16 C). A peptide bond ( Figure 17 ) is then formed between the C terminus of the amino acid in the P site and the N terminus of the amino acid in the A site. This reaction is catalysed in the peptidyl transfer centre of the large subunit of the ribosome. The effect is that the growing peptide chain is transferred to the incoming aminoacyl tRNA in the A site, leaving an empty or spent tRNA in the P site ( Figure 16 D). Finally, the peptidyl tRNA with the growing peptide chain attached moves to the P site. This step is called translocation, and the energy is provided by hydrolysis of GTP by the elongation factor EF-G. The spent tRNA moves to the exit site, from where it can leave the ribosome. The mRNA moves so that the next codon is exposed in the A site ( Figure 16 E), ready to accept a new aminoacyl-tRNA charged with another amino acid. During the elongation phase the ribosome cycles through this process, adding amino acids to the growing peptide chain until a stop codon is exposed in the A site. The new protein emerges from the ribosome through an exit tunnel in the large subunit.

Figure 17. ( A ) Amino acids consist of a carbon atom with an amine group (the N terminus), a carboxylic acid group (the C terminus) and a variable R group. One of the simplest R groups is a methyl group, giving the amino acid alanine. ( B ) When two amino acids are joined together a peptide bond is formed between the N terminus of one amino acid and the C terminus of another. This is a condensation reaction releasing one molecule of water.

Termination

The stop codon is not decoded by being recognised by an anticodon on a tRNA. Instead it is detected by proteins called termination or release factors. In eukaryotes there is a single release factor (eRF1) that recognises all three stop codons and enters the A site ( Figure 16 F). The ester bond linking the peptide chain to the tRNA in the P site is broken and the peptide is released from the ribosome ( Figure 16 G). The two subunits of the ribosome disassociate and are recycled ( Figure 16 H).
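The three phases can be mimicked in a few lines of code. This is a minimal sketch of the decoding logic only, assuming the standard genetic code and the simple "first AUG" scanning rule described above. It says nothing about the ribosome, tRNAs or protein factors, and the example mRNA is made up.

```python
# Standard genetic code, built from the conventional U, C, A, G ordering.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")            # initiation: scan for the first AUG
    if start == -1:
        return ""
    peptide = []
    for pos in range(start, len(mrna) - 2, 3):   # elongation: read triplets
        residue = CODON_TABLE[mrna[pos:pos + 3]]
        if residue == "*":              # termination: stop codon reached
            break
        peptide.append(residue)
    return "".join(peptide)


# Hypothetical mRNA: 5' UTR "GG", then AUG UUU AAA UGG, then the stop codon UAA.
print(translate("GGAUGUUUAAAUGGUAAGG"))  # MFKW
```

The stop codon maps to "*" rather than to an amino acid, mirroring the fact that no tRNA decodes it.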

The structure and function of ribosomes are highly conserved, with a large core of structurally conserved proteins and rRNAs found in both eukaryotic and prokaryotic ribosomes. However, there are some differences both in the rRNAs and in some of the additional proteins involved in translation ( Table 2 ). The elongation phase is highly conserved but there are important differences in how protein synthesis is initiated. Bacterial mRNAs have a specific sequence called the ribosome binding site or Shine–Dalgarno sequence. To ensure that the mRNA is correctly positioned in the ribosome, the Shine–Dalgarno sequence binds to a complementary sequence of the 16S rRNA in the small subunit. In bacteria the initiator tRNA is charged with a modified amino acid, N-formylmethionine.
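A simple search illustrates how a Shine–Dalgarno sequence can mark a bacterial start codon. The sketch below makes several assumptions that are not taken from this article: it uses the commonly quoted consensus motif AGGAGG, requires an exact match, and accepts a spacer of 5 to 10 nucleotides between the motif and the AUG. Real sites vary in both sequence and spacing, and the function name and example sequence are hypothetical.

```python
def find_sd_start_sites(mrna, sd_motif="AGGAGG", min_spacer=5, max_spacer=10):
    """Return (motif_index, aug_index) pairs where an AUG follows the motif."""
    mrna = mrna.upper()
    sites = []
    sd_index = mrna.find(sd_motif)
    while sd_index != -1:
        sd_end = sd_index + len(sd_motif)
        # Look for a start codon at each allowed distance downstream.
        for spacer in range(min_spacer, max_spacer + 1):
            aug_index = sd_end + spacer
            if mrna[aug_index:aug_index + 3] == "AUG":
                sites.append((sd_index, aug_index))
        sd_index = mrna.find(sd_motif, sd_index + 1)
    return sites


# Hypothetical leader: motif at index 2, six-nucleotide spacer, AUG at index 14.
print(find_sd_start_sites("UUAGGAGGUAACAUAUGGCU"))  # [(2, 14)]
```

An AUG with no motif upstream is not reported, which contrasts with the eukaryotic scanning mechanism, where the first AUG from the 5′ end is used.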

Differences between the structure of bacterial and eukaryotic ribosomes can be exploited by antibiotics, which are selective in that they affect protein synthesis in bacteria but not in mammalian cells. Macrolide antibiotics like erythromycin block the exit tunnel in the large subunit of bacterial ribosomes and halt protein synthesis. The exit tunnel in eukaryotic ribosomes is slightly narrower, which means that eukaryotic ribosomes are not affected. Streptomycin, an important antibiotic in the treatment of tuberculosis, binds to the 16S rRNA of bacterial ribosomes. This distorts the structure of the decoding site and results in misreading of the mRNA.

Polyribosome

Protein synthesis can proceed very quickly, particularly in rapidly growing cells or those that are differentiating. In bacteria, between 15 and 20 new peptide bonds can be formed per second. In eukaryotes it is slower, more like five peptide bonds per second. A small human protein like insulin would take only 10 seconds to make, whereas the largest human protein, titin, which is found in human muscle cells, takes about an hour and a half per molecule. One of the mechanisms that ensures that protein synthesis is carried out efficiently is the polyribosome. As soon as one ribosome has started translation, another ribosome binds to initiate synthesis of another protein copy. This gives rise to polyribosomes or polysomes, which can be seen by electron microscopy. Recent cryo-EM images show that ribosomes can be arranged very closely on the mRNA, with the mRNA entry and exit channels aligned to allow the smooth passage of mRNA between them ( Figure 18 ). Sometimes these polyribosomes can form circular structures so that, as soon as the ribosome has finished synthesis of one polypeptide, it can rebind the same mRNA molecule and start synthesis of another copy of the protein.
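The timings quoted above follow from dividing the number of peptide bonds by the rate. The sketch below reproduces them under assumed chain lengths that are not given in this article: 51 residues for mature insulin (its initial translation product is longer) and roughly 27,000 residues for titin (isoforms vary and some are longer). The function name is hypothetical.

```python
def synthesis_time_seconds(n_residues, bonds_per_second):
    """Time to link n_residues amino acids at a fixed rate of bond formation."""
    # A chain of n residues contains n - 1 peptide bonds.
    return (n_residues - 1) / bonds_per_second


EUKARYOTIC_RATE = 5      # peptide bonds per second, from the text
INSULIN_RESIDUES = 51    # assumption: mature insulin, A and B chains
TITIN_RESIDUES = 27_000  # assumption: approximate, isoform dependent

print(synthesis_time_seconds(INSULIN_RESIDUES, EUKARYOTIC_RATE))      # 10.0 s
print(synthesis_time_seconds(TITIN_RESIDUES, EUKARYOTIC_RATE) / 3600) # about 1.5 h
```

At the bacterial rate of 15 to 20 bonds per second, the same calculation gives times three to four times shorter.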

Figure 18. Cryo-electron micrograph reconstruction of a eukaryotic polyribosome. Reprinted from ( Myasnikov 2014 ) by permission.

Closing remarks

The study of nucleic acids, from their first identification as the genetic material, is littered with landmarks in molecular biosciences, many of them marked with Nobel Prizes. Since Watson and Crick proposed their structure of DNA, our knowledge about DNA and how it works has expanded almost exponentially. The topics introduced in this article are covered in all bioscience programmes; understanding them is key to all areas of biosciences, from evolution and animal diversity to health and disease. Recent developments in the techniques that we can use to study DNA, often in living cells, mean that new and exciting developments in our understanding of the way nucleic acids work are occurring all the time. Given the scope of this article we have barely scratched the surface of the topic; however, the reader can find more detail in the articles in the bibliography below and even more detail from a few minutes searching on the internet.

Abbreviations

DNA, deoxyribonucleic acid
GTF, general transcription factor
ori, origin of replication
RNA, ribonucleic acid
RISC, RNA-induced silencing complex

Competing interests

The authors declare that there are no competing interests associated with the manuscript.

Recommended reading and key publications

Nobel lectures

  • Blackburn E.H. (2010) Telomeres and Telomerase: The Means to the End (Nobel Lecture) 49, Int. Ed., pp. 7405–7421, Angewandte Chemie
  • Ehrenberg M. (2009) Scientific Background on the Nobel Prize in Chemistry 2009 Structure and Function of the Ribosome, The Royal Swedish Academy of Sciences, https://www.nobelprize.org/uploads/2018/06/advanced-chemistryprize2009.pdf
  • Kornberg R.D. (2007) The Molecular Basis of Eukaryotic Transcription (Nobel Lecture) 32, Int. Ed., pp. 12955–12961, Angewandte Chemie

Review articles

  • Afonina Z.A. and Shirokov V.A. (2018) Three dimensional organization of polyribosomes–a modern approach. Biochemistry (Moscow) 83, S48–S55 10.1134/S0006297918140055
  • Gerstein M.B., Bruce C., Rozowsky J.S., Zheng D., Du J., Korbel J.O. et al. (2007) What is a gene, post-ENCODE? History and updated definition. Genome Res. 17, 669–681 10.1101/gr.6339607
  • Kruglyak L. and Stern D.L. (2007) An embarrassment of switches. Science 317, 758–759 10.1126/science.1146921
  • Minchin S.D. and Busby S.J.W. (2013) Transcription factors. In Brenner’s Encyclopedia of Genetics (Maloy S. and Hughes K., eds), Elsevier, U.S.A.
  • Roberts M. (2019) Recombinant DNA technology and DNA sequencing. Essays Biochem. 63, 10.1042/EBC20180039
  • Roeder R.G. (2003) The eukaryotic transcriptional machinery: complexities and mechanisms unforeseen. Nat. Med. 9, 1239–1244 10.1038/nm938

Historical perspectives

  • Dahm R. (2008) Discovering DNA: Friedrich Miescher and the early years of nucleic acid research. Hum. Genet. 122, 565–581 10.1007/s00439-007-0433-0
  • McCarty M. (2003) Discovering genes are made of DNA. Nature 421, 406 10.1038/nature01398
  • Maddox B. (2003) The double helix and the “wronged heroine”. Nature 421, 407–408
  • Kemp M. (2003) The Mona Lisa of modern science. Nature 421, 416–420 10.1038/nature01403

Original research papers

  • Crick F.H.C., Barnett L., Brenner S. and Watts-Tobin R.J. (1961) General nature of the genetic code for proteins. Nature 192, 1227–1232 10.1038/1921227a0
  • Franklin R.E. and Gosling R.G. (1953) Molecular configuration in sodium thymonucleate. Nature 171, 740–741 10.1038/171740a0
  • Meselson M. and Stahl F.W. (1958) The replication of DNA in Escherichia coli. Proc. Natl. Acad. Sci. U.S.A. 44, 671–682 10.1073/pnas.44.7.671
  • Watson J. and Crick F. (1953) Molecular structure of nucleic acid. A structure for deoxyribose nucleic acid. Nature 171, 737–738 10.1038/171737a0

Citations for figures

  • Goodsell D. (2010) Molecule of the month: ribosome. https://pdb101.rcsb.org/motm/121
  • Myasnikov A.G. (2014) The molecular structure of the left-handed supra-molecular helix of eukaryotic polyribosomes. Nat. Commun. 5, 5294 10.1038/ncomms6294
  • Yikrazuul X.X. (2010) tRNA-Phe yeast. https://commons.wikimedia.org/wiki/File:TRNA-Phe_yeast_1ehz.png

Classic experiments: DNA as the genetic material

Introduction

Protein vs. DNA

Frederick Griffith: Bacterial transformation

  • R strain. When grown in a petri dish, the R bacteria formed colonies, or clumps of related bacteria, that had well-defined edges and a rough appearance (hence the abbreviation "R"). The R bacteria were nonvirulent, meaning that they did not cause sickness when injected into a mouse.
  • S strain. S bacteria formed colonies that were rounded and smooth (hence the abbreviation "S"). The smooth appearance was due to a polysaccharide, or sugar-based, coat produced by the bacteria. This coat protected the S bacteria from the mouse immune system, making them virulent (capable of causing disease). Mice injected with live S bacteria developed pneumonia and died.

Avery, McCarty, and MacLeod: Identifying the transforming principle

  • The purified substance gave a negative result in chemical tests known to detect proteins, but a strongly positive result in a chemical test known to detect DNA.
  • The elemental composition of the purified transforming principle closely resembled DNA in its ratio of nitrogen and phosphorus.
  • Protein- and RNA-degrading enzymes had little effect on the transforming principle, but enzymes able to degrade DNA eliminated the transforming activity.

The Hershey-Chase experiments

  • One sample was produced in the presence of ³⁵S, a radioactive isotope of sulfur. Sulfur is found in many proteins and is absent from DNA, so only phage proteins were radioactively labeled by this treatment.
  • The other sample was produced in the presence of ³²P, a radioactive isotope of phosphorus. Phosphorus is found in DNA and not in proteins, so only phage DNA (and not phage proteins) was radioactively labeled by this treatment.

Remaining questions


Microbe Notes

DNA: Properties, Structure, Composition, Types, Functions

Deoxyribonucleic acid (DNA) is the hereditary material found in humans and all living organisms. It is a double-stranded molecule and has a unique twisted helical structure.

DNA is made up of nucleotides. Each nucleotide has three components: a sugar (deoxyribose) and a phosphate group, which together form the backbone, and a nitrogen-containing base attached to the sugar.

Each strand contains many nucleotides, and therefore many sugars, phosphate groups, and nitrogenous bases. The nitrogenous bases of one strand are complementary to those of the other strand, which maintains the helical symmetry.

Each base pair is held together by hydrogen bonds. The nitrogenous bases are adenine (A), guanine (G), cytosine (C), and thymine (T); A is complementary to T, and G to C. These bases are responsible for storing the genetic information. Most DNA is located in the cell nucleus and is called nuclear DNA; however, a small amount of DNA is also located in mitochondria and is referred to as mitochondrial DNA.
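The pairing rule (A with T, G with C) is enough to derive one strand from the other. The sketch below is an illustration under the usual convention that sequences are written 5' to 3' and that the two strands run in opposite directions, so the partner strand is the reverse complement. The function names and the example sequence are hypothetical.

```python
from collections import Counter

DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand):
    """Return the partner strand, written 5'->3' (the reverse complement)."""
    return "".join(DNA_PAIR[base] for base in reversed(strand.upper()))


def duplex_base_counts(strand):
    """Count the bases across both strands of the duplex."""
    return Counter(strand.upper() + complementary_strand(strand))


print(complementary_strand("ATGCCG"))  # CGGCAT
counts = duplex_base_counts("ATGCCG")
print(counts["A"] == counts["T"], counts["G"] == counts["C"])  # True True
```

Because every A faces a T and every G faces a C, the counts of A and T, and of G and C, are always equal across the duplex, whatever the sequence.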

Properties of DNA (Deoxyribonucleic acid)

  • DNA is made up of two helical strands that are coiled around the same axis. A helix that coils to the right is called right-handed and one that coils to the left is called left-handed. The right-handed helix is the most stable and is therefore taken as the standard structure.
  • The two chains of the helix run antiparallel to each other. Thus, one strand runs 5’ to 3’ and the other runs 3’ to 5’.
  • The strands separate (denature) on heating and can renature, or hybridize, on cooling. The temperature at which the strands separate is referred to as the melting temperature, and it varies according to the specific sequence of the DNA.
  • For instance, a region rich in G-C pairs has a higher melting temperature because these bases are joined by three hydrogen bonds, which require more energy to break than the two hydrogen bonds of each A-T pair.
  • These nitrogenous bases store genetic information and thus encode the amino acids that give rise to proteins.
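The dependence of melting temperature on base composition can be illustrated with the Wallace rule, a rough rule of thumb for short oligonucleotides that is not part of the text above: roughly 2 °C for every A or T and 4 °C for every G or C. It is only an approximation, it ignores salt and strand concentration, and it is not suitable for long DNA. The function name is hypothetical.

```python
def wallace_tm_celsius(oligo):
    """Estimate the melting temperature of a short oligonucleotide."""
    oligo = oligo.upper()
    at_count = oligo.count("A") + oligo.count("T")
    gc_count = oligo.count("G") + oligo.count("C")
    # G-C pairs (three hydrogen bonds) are weighted twice as heavily
    # as A-T pairs (two hydrogen bonds).
    return 2 * at_count + 4 * gc_count


print(wallace_tm_celsius("ATATATATATATAT"))  # 28
print(wallace_tm_celsius("GCGCGCGCGCGCGC"))  # 56
```

Two oligonucleotides of the same length differ widely in estimated melting temperature when one is made only of A-T pairs and the other only of G-C pairs.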

Structure and Composition of DNA (Deoxyribonucleic acid)

  • DNA is made of two helical chains that intertwine with each other to form a double helix. The most widely accepted structure of DNA is right-handed helix DNA also known as the B-form of DNA, which is 1.9 nm in diameter.
  • These helical chains run anti-parallel to each other, one polynucleotide chain runs from 5’ to 3’ and the other polynucleotide chain runs from 3’ to 5’. These chains are connected to each other via nitrogen bases through hydrogen bonding.
  • Hydrogen bonding contributes to the specificity of base pairing. Adenine preferentially pairs with Thymine through 2 hydrogen bonds. Similarly, Cytosine preferentially pairs with Guanine through 3 hydrogen bonds. 
  • Put another way, each base pair joins a pyrimidine with a purine: the pyrimidines, thymine and cytosine, have a single-ring structure, and the purines, adenine and guanine, have a double-ring structure.
  • The base pairs A = T and G ≡ C are known as complementary base pairs. Hence, the amount of Adenine is equal to the amount of Thymine, and the amount of Guanine is equal to the amount of Cytosine.
  • The geometry of the DNA is influenced by the distance between the backbones and the angle at which the nitrogenous bases are attached to the backbone.
  • The major groove occurs when the backbones are far apart from each other and the minor groove occurs when they are close.
  • The regularity of the helical structure forms two repeating and alternating spaces: Major and Minor grooves.
  • These grooves serve as sites for base-pair recognition and protein binding: the major groove exposes base-pair-specific information, while the minor groove is largely base-pair nonspecific in its protein interactions.
  • The double-helical structure of DNA is highly regular; each turn of the helix contains approximately 10 base pairs. In addition to hydrogen bonding between the bases, the stacking of bases also stabilizes the structure through pi-pi interactions between the stacked aromatic rings of the bases.
  • The length of one full turn of the helix is 3.4 nm.
  • The major groove is 2.2 nm wide and the minor groove is 1.1 nm wide.
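The helix dimensions in the list above allow simple length estimates. The sketch below takes 10 base pairs per turn and 3.4 nm per turn from the list, which gives a rise of 0.34 nm per base pair. The haploid human genome size of about 3.2 billion base pairs is an outside figure used only as an example, and the function names are hypothetical.

```python
BP_PER_TURN = 10
TURN_LENGTH_NM = 3.4
RISE_PER_BP_NM = TURN_LENGTH_NM / BP_PER_TURN  # 0.34 nm per base pair


def b_dna_length_nm(n_bp):
    """End-to-end length of straight B-form DNA with n_bp base pairs."""
    return n_bp * RISE_PER_BP_NM


def b_dna_turns(n_bp):
    """Number of helical turns in n_bp base pairs of B-form DNA."""
    return n_bp / BP_PER_TURN


print(b_dna_length_nm(1000), b_dna_turns(1000))  # about 340 nm, 100 turns

HUMAN_HAPLOID_GENOME_BP = 3.2e9  # assumption: approximate genome size
print(b_dna_length_nm(HUMAN_HAPLOID_GENOME_BP) * 1e-9)  # about 1.1 metres
```

The second figure shows why DNA has to be tightly packaged: laid end to end, the DNA of one haploid human genome would be about a metre long.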

Major and Minor Grooves of the DNA

  • As a result of the double helical nature of DNA, the molecule has two asymmetric grooves. One groove is smaller than the other.
  • This asymmetry is a result of the geometrical configuration of the bonds between the phosphate, sugar, and base groups that forces the base groups to attach at 120 degree angles instead of 180 degree.
  • The larger groove, called the major groove, occurs where the backbones are far apart, while the smaller one, called the minor groove, occurs where they are close together.
  • Since the major and minor grooves expose the edges of the bases, the grooves can be used to tell the base sequence of a specific DNA molecule.
  • The possibility for such recognition is critical, since proteins must be able to recognize specific DNA sequences on which to bind in order for the proper functions of the body and cell to be carried out.

Types of DNA on the basis of forms

  • The major difference between the A and B forms of DNA is the conformation of the deoxyribose sugar ring. In the B form it is in the C2′-endo conformation, while in the A form it is in the C3′-endo conformation.
  • Another important difference between the A and B forms is the placement of the nitrogenous base pairs within the duplex.
  • In B-form, the base pairs lie almost at the center, over the helical axis, whereas in A-form the base pairs are displaced away from the central axis towards the major groove.
  • In A-form, the distance between two base pairs is 0.29 nm. One turn of the helix contains 11 base pairs with a length of 2.8 nm.
  • A-form DNA is shorter than B-form DNA; however, its helical width is 2.3 nm, which is more than that of B-form.
  • A-form has a narrow and deep major groove and a wide and shallow minor groove.
  • A-form is favored by low hydration and by repeating units of purines or pyrimidines.

Figure: Types of DNA on the basis of forms. Image Source: Mauroesguerroto.

  • The standard, commonly known structure of DNA, described by Watson and Crick, is the B form, a right-handed double helix.
  • The double-helical chains run antiparallel to each other, one running from 5’ to 3’ and the other from 3’ to 5’, and are joined together via complementary nitrogenous base pairing.
  • In keeping with Chargaff’s rules, a purine on one strand always pairs with a pyrimidine on the other: A with T, and G with C.
  • Each pair joins a keto base with an amino base, and a purine with a pyrimidine.
  • The two strands of the DNA molecule form a plectonemic coil, meaning that they are coiled around the same axis and intertwined with each other.
  • As a consequence, the two strands cannot be separated without the DNA rotating.
  • The distance between the base pairs is 0.34 nm. One turn of the helix contains 10 base pairs with a length of 3.4 nm.
  • The B-form helix is 1.9 nm in diameter.
  • The major groove is wide (2.2 nm), making it easily accessible to proteins, and the minor groove is narrow (1.1 nm).
  • Z-form DNA is a left-handed helix, and its structure is very different from the A and B forms.
  • This form can arise in sequences of alternating purines and pyrimidines.
  • The backbone is not a smooth helix but an irregular zig-zag, which results from the alternating sequence of purines and pyrimidines.
  • B-form DNA can take the Z form when proteins bound to DNA in one helical conformation force the DNA to adopt a different conformation.
  • The change happens at the G nucleotide: the sugar is in the C3'-endo conformation and the guanine base is in the syn conformation.
  • This places the guanine back over the sugar ring, which is unusual compared with the B and A forms.
  • It is longer and thinner than the B and A forms.
  • The helical width is 1.8 nm, being the smallest among the three forms.
  • The distance between the base pairs is 0.37 nm. One turn of the helix contains 12 base pairs with a length of 4.56 nm.
  • The major groove is flat and the minor groove is narrow and deep.

Types of DNA on the basis of location

1. Nuclear DNA

  • As the name suggests, this DNA is located inside the nucleus, organized into chromosomes.
  • Humans have 23 pairs of chromosomes, which are linear with open ends and contain about 3 billion base pairs per haploid set.
  • Nuclear DNA houses genes that are transcribed into mRNA and ultimately translated into proteins that are necessary for the functioning and integrity of the cell.
  • It is inherited from both parents, so it is diploid and considered unique to each individual except for identical twins.
  • It is usually present in two copies per cell.

2. Mitochondrial DNA

  • It is located inside the mitochondria.
  • It is small and circular in structure.
  • It is inherited only from the mother, so it is haploid.
  • It is present in a much higher copy number, i.e., 100-10,000 copies per cell.
  • It has only 16,500 base pairs and encodes proteins that are specific for mitochondria. These proteins are vital for producing energy.
  • Mitochondrial DNA encoded proteins also play a pivotal role during intracellular protein transport.

Functions of DNA (Deoxyribonucleic acid)

  • DNA stores the complete genetic information required to specify an organism.
  • It is the source of information that is needed in order to synthesize cellular proteins, and other macromolecules required by an organism.
  • It is responsible for identifying and determining the individuality of the given organism.
  • It can also be taken as a targeted element during the diagnosis of a particular disease. 
  • It replicates so that one copy passes to each of the two daughter cells during cell division, thus maintaining the genetic material from generation to generation.

About Author

Rajat Thapa


DNA: Structure, Function and Discovery

Nucleic acids are the organic materials present in all organisms in the form of DNA or RNA. These nucleic acids are formed by the combination of nitrogenous bases, sugar molecules and phosphate groups that are linked by different bonds in a series of sequences. The DNA structure defines the basic genetic makeup of our body. In fact, it defines the genetic makeup of nearly all life on earth.

Table of Contents

  • What is DNA?
  • DNA Structure
  • Chargaff’s Rule
  • DNA Replication
  • Function of DNA
  • Why DNA is called a Polynucleotide Molecule?

Read on to explore DNA meaning, structure, function, DNA discovery and diagram in complete detail.

“DNA is a group of molecules that is responsible for carrying and transmitting the hereditary materials or the genetic instructions from parents to offspring.”

This is also true for viruses, which have either RNA or DNA as their genetic material. The Human Immunodeficiency Virus (HIV), for instance, contains RNA, which is copied into DNA after the virus enters the host cell.

Apart from being responsible for the inheritance of genetic information in all living beings, DNA also plays a crucial role in the production of proteins. Nuclear DNA is the DNA contained within the nucleus of every cell in a eukaryotic organism. It codes for the majority of the organism’s genome, while mitochondrial DNA and plastid DNA handle the rest.

The DNA present in the mitochondria of the cell is termed mitochondrial DNA. It is inherited from the mother to the child. In humans, there are approximately 16,000 base pairs of mitochondrial DNA. Similarly, plastids have their own DNA, and they play an essential role in photosynthesis.

Full-Form of DNA

DNA stands for deoxyribonucleic acid. It is an organic compound that has a unique molecular structure. It is found in all prokaryotic and eukaryotic cells.

There are three different DNA types:

  • A-DNA:  It is a right-handed double helix similar to the B-DNA form. Dehydrated DNA takes an A form that protects the DNA during extreme conditions such as desiccation. Protein binding also removes the solvent from DNA, and the DNA takes an A form.
  • B-DNA:  This is the most common DNA conformation and is a right-handed helix. The majority of DNA has a B type conformation under normal physiological conditions.
  • Z-DNA: Z-DNA is a left-handed DNA where the double helix winds to the left in a zig-zag pattern. It was discovered by Andrew Wang and Alexander Rich. It is found ahead of the start site of a gene and hence is believed to play some role in gene regulation.

Who Discovered DNA?

DNA was first recognized and identified by the Swiss biologist  Johannes Friedrich Miescher in 1869 during his research on white blood cells .

The double helix structure of the DNA molecule was later worked out from experimental data by James Watson and Francis Crick. Finally, it was proved that DNA is responsible for storing genetic information in living organisms.

DNA Diagram

The following diagram explains the DNA structure representing the different parts of the DNA. DNA comprises a sugar-phosphate backbone and the nucleotide bases (guanine, cytosine, adenine and thymine).

Structure of DNA

DNA Diagram representing the DNA Structure

The DNA structure can be thought of as a twisted ladder. This structure is described as a double helix, as illustrated in the figure above. DNA is a nucleic acid, and all nucleic acids are made up of nucleotides.

The basic building blocks of DNA are nucleotides, which are composed of a sugar group, a phosphate group, and a nitrogen base. The sugar and phosphate groups link the nucleotides together to form each strand of DNA. Adenine (A), Thymine (T), Guanine (G)  and Cytosine (C) are four types of nitrogen bases.

These 4 nitrogenous bases pair together in the following way: A with T, and C with G. These base pairs are essential for the DNA’s double helix structure, which resembles a twisted ladder.

The order of the nitrogenous bases determines the genetic code or the DNA’s instructions.

DNA Structure

Components of DNA Structure

Of the three components of DNA structure, the sugar, deoxyribose, alternates with phosphate groups to form the backbone of the DNA molecule. The nitrogenous bases of the opposite strands form hydrogen bonds, forming a ladder-like structure.

DNA Structure Backbone

The DNA molecule contains 4 nitrogen bases, namely adenine (A), thymine (T), cytosine (C) and guanine (G), each of which forms part of a nucleotide. A and G are purines, and C and T are pyrimidines.

The two strands of DNA run in opposite directions. These strands are held together by the hydrogen bonds between complementary bases. The strands are helically twisted, where each strand forms a right-handed coil, and ten base pairs make up a single turn.

The pitch of each helix is 3.4 nm. Hence, the distance between two consecutive base pairs (i.e., hydrogen-bonded bases of the opposite strands) is 0.34 nm.
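The arithmetic behind these two figures can be sketched in a few lines of Python. This is only an illustration: the function name `rise_per_base_pair` is hypothetical, and it assumes ten base pairs per turn, as stated above.

```python
def rise_per_base_pair(pitch_nm: float, base_pairs_per_turn: int) -> float:
    """Return the distance along the helix axis between consecutive base pairs (nm)."""
    if base_pairs_per_turn <= 0:
        raise ValueError("base_pairs_per_turn must be positive")
    return pitch_nm / base_pairs_per_turn


# Values quoted in the text: a pitch of 3.4 nm and ten base pairs per turn.
rise = rise_per_base_pair(3.4, 10)
print(f"Rise per base pair: {rise:.2f} nm")
```

Dividing the pitch by the number of base pairs in one turn gives the 0.34 nm spacing quoted above.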

The DNA coils up, forming chromosomes, and each chromosome has a single molecule of DNA in it. Human beings have twenty-three pairs of chromosomes in the nucleus of their cells. DNA also plays an essential role in the process of cell division.

Erwin Chargaff, a biochemist, discovered that the nitrogenous bases in DNA occur in fixed proportions: the amount of A is equal to the amount of T, and the amount of C is equal to the amount of G.

In other words, the DNA of any cell from any organism should have a 1:1 ratio of purine and pyrimidine bases.
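Chargaff's rule can be checked on a toy example. The sketch below is illustrative only: the helper names (`complement`, `duplex_base_counts`) and the sample strand are assumptions, not taken from any library or data set. It builds the partner strand by base pairing and counts the bases across both strands.

```python
from collections import Counter

# Watson-Crick pairing: A with T, G with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-paired partner of each base in the strand."""
    return "".join(PAIRING[base] for base in strand.upper())


def duplex_base_counts(strand: str) -> Counter:
    """Count the bases over both strands of the double helix."""
    return Counter(strand.upper()) + Counter(complement(strand))


counts = duplex_base_counts("AAAGC")  # hypothetical sample strand
purines = counts["A"] + counts["G"]
pyrimidines = counts["T"] + counts["C"]
print(counts["A"], counts["T"], counts["G"], counts["C"])
print(purines, pyrimidines)
```

A single strand need not contain equal amounts of A and T. The equality appears once both paired strands are counted, which is why the rule applies to double-stranded DNA.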

DNA replication is an important process that occurs during cell division. It is also known as semi-conservative replication, during which DNA makes a copy of itself.

DNA Replication

DNA replication takes place in three stages:

Step 1: Initiation

The replication of DNA begins at a point known as the origin of replication. The two DNA strands are separated by the DNA helicase. This forms the replication fork.

Step 2: Elongation

DNA polymerase III reads the nucleotides on the template strand and makes a new strand by adding complementary nucleotides one after the other. For example, if it reads an adenine on the template strand, it will add a thymine on the complementary strand.

On the lagging strand, nucleotides are added discontinuously in short stretches known as Okazaki fragments. The gaps or nicks between these fragments are sealed by ligase.

Step 3: Termination

The termination sequence present opposite to the origin of replication terminates the replication process. The TUS protein (terminus utilization substance) binds to terminator sequence and halts DNA polymerase movement. It induces termination.
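The base-pairing rule applied during elongation can be sketched as a small Python function. This is a simplified model, not a simulation of the enzyme: the name `synthesize_complement` is hypothetical, and it assumes both the template and the result are written in the 5' to 3' direction, so the template is read in reverse because the strands are antiparallel.

```python
# Watson-Crick pairing used when copying a template: A with T, G with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize_complement(template: str) -> str:
    """Return the new strand (5'->3') for a template strand given 5'->3'."""
    return "".join(PAIRING[base] for base in reversed(template.upper()))


new_strand = synthesize_complement("ATGC")
print(new_strand)
```

Each adenine in the template is matched with a thymine in the new strand, and each guanine with a cytosine, as described in Step 2.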

DNA Function

DNA is the genetic material that carries all the hereditary information. Genes are small segments of DNA, mostly ranging from 250 to 2 million base pairs. A gene codes for a polypeptide molecule, in which each sequence of three nitrogenous bases stands for one amino acid.
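The three-bases-per-amino-acid idea can be illustrated with a short Python sketch. It is deliberately incomplete: `CODON_TABLE` holds only five entries of the standard genetic code, the function name `translate` and the sample sequence are hypothetical, and the input is assumed to be a coding strand written with T rather than U.

```python
# A small excerpt of the standard genetic code (DNA coding-strand codons).
CODON_TABLE = {
    "ATG": "Met",
    "TTT": "Phe",
    "GGC": "Gly",
    "AAA": "Lys",
    "TAA": "Stop",
}


def translate(coding_strand: str) -> list[str]:
    """Read the strand three bases at a time and return the amino acids."""
    sequence = coding_strand.upper()
    peptide = []
    for start in range(0, len(sequence) - 2, 3):
        codon = sequence[start:start + 3]
        amino_acid = CODON_TABLE.get(codon, "???")  # codon not in this excerpt
        if amino_acid == "Stop":
            break
        peptide.append(amino_acid)
    return peptide


print(translate("ATGTTTGGCAAATAA"))
```

A full table has 64 codons. The excerpt here is only large enough to show how a sequence is split into triplets and read until a stop codon.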

Polypeptide chains are further folded in secondary, tertiary and quaternary structures to form different proteins. As every organism contains many genes in its DNA, different types of proteins can be formed. Proteins are the main functional and structural molecules in most organisms. Apart from storing genetic information, DNA is involved in:

  • Replication process: Transferring the genetic information from one cell to its daughters and from one generation to the next and equal distribution of DNA during the cell division
  • Mutations: The changes which occur in the DNA sequences
  • Transcription
  • Cellular Metabolism
  • DNA Fingerprinting
  • Gene Therapy

DNA is called a polynucleotide because the DNA molecule is composed of nucleotides, namely deoxyadenylate (A), deoxyguanylate (G), deoxycytidylate (C) and deoxythymidylate (T), which are combined to create long chains called polynucleotides. As per the DNA structure, DNA consists of two polynucleotide chains.

Frequently Asked Questions

What is the structure of DNA?

DNA is a double helical structure composed of nucleotides. The two strands are joined together by hydrogen bonds. Each strand also has a sugar-phosphate backbone.

What are the three different types of DNA?

The three different types of DNA include:

  • A-DNA
  • B-DNA
  • Z-DNA

How is Z-DNA different from other forms of DNA?

Z-DNA is a left-handed double helix. The helix winds to the left in a zig-zag manner. On the contrary, A and B-DNA are right-handed DNA.

What are the functions of DNA?

The functions of DNA include:

  • Replication
  • Gene expression

What type of DNA is found in humans?

B-DNA is found in humans. It is a right-handed double-helical structure.

A Mammoth First: 52,000-Year-Old DNA, in 3-D

A “fossil chromosome” preserves the structure of a woolly mammoth’s genome — and offers a better grasp of how it once worked.

The foot of a woolly mammoth excavated from the permafrost in Siberia. In a new study, scientists extracted mammoth DNA that retained its original architecture, a feat never before achieved with an ancient genetic sample. Credit: Love Dalén/Stockholm University

By Siobhan Roberts

  • July 11, 2024

In 2018 an international team of scientists — from labs in Houston, Copenhagen, Barcelona and beyond — got their hands on a remarkable biological specimen: a skin sample from a 52,000-year-old woolly mammoth that had been recovered from the permafrost in Siberia. They probed the sample with an innovative experimental technique that revealed the three-dimensional architecture of the mammoth’s genome. The resulting paper was published on Thursday in the journal Cell.

Hendrik Poinar, an evolutionary geneticist at McMaster University in Canada, was “floored” — the technique had successfully captured the original geometry of long stretches of DNA, a feat never before accomplished with an ancient DNA sample. “It’s absolutely beautiful,” said Dr. Poinar, who reviewed the paper for the journal.

The typical method for extracting ancient DNA from fossils, Dr. Poinar said, is still “kind of cave man.” It produces short fragments of code composed of a four-letter molecular alphabet: A (adenine), G (guanine), C (cytosine), T (thymine). An organism’s full genome resides in cell nuclei, in long, unfragmented DNA strands called chromosomes. And, vitally, the genome is three-dimensional; as it dynamically folds with fractal complexity, its looping points of contact help dictate gene activity.

“To have the actual architectural structure of the genome, which suggests gene expression patterns, that’s a whole other level,” Dr. Poinar said.

“It’s a new kind of fossil, a fossil chromosome,” said Erez Lieberman Aiden, a team member who is an applied mathematician, a biophysicist and a geneticist and directs the Center for Genome Architecture at Baylor College of Medicine in Houston. Technically, he noted, it is a non-mineralized fossil, or subfossil, since it has not turned to stone.

July 19, 2024

Study deciphers intricate 3D structure of DNA aptamer for disease theranostics

by Liu Jia, Chinese Academy of Sciences

In a study published in PNAS, a research team has resolved the first high-resolution structure of the sgc8c DNA aptamer that targets protein tyrosine kinase 7 (PTK7), efficiently engineered two optimal sgc8c variants for disease theranostics, and revealed new principles for the sophisticated structural and functional organization of DNA molecules.

Aptamers are functional nucleic acids that have broad applications in clinical diagnosis and targeted drug delivery. The high binding affinity and specificity of an aptamer for its protein target depend on its intricate three-dimensional (3D) structure.

The 3D structure of an aptamer in complex with its protein partner helps to understand and optimize its functionality. However, the structure of the complex is difficult to obtain due to the conformational heterogeneity of the aptamer and/or protein, and the 3D structures of DNA molecules, which are perceived to lack RNA-like tertiary interactions, remain largely unexplored.

Sgc8c is a 41-nt DNA aptamer screened through cell-SELEX to target leukemia cells. The molecular target of sgc8c is PTK7, a transmembrane receptor pseudokinase that is overexpressed in various types of cancers.

Owing to its high binding affinity and specificity for both protein and cell targets, sgc8c has become one of the most widely used DNA aptamers in cancer theranostics. However, the structural basis underlying the functionality of sgc8c remains elusive, and the structure-guided functional understanding and optimization of sgc8c are needed.

In this study, the researchers led by Prof. Tan Weihong, Prof. Han Da, and Assoc. Prof. Guo Pei from the Hangzhou Institute of Medicine (HIM) of the Chinese Academy of Sciences (CAS), first probed 10 Watson–Crick base pairs in sgc8c using solution nuclear magnetic resonance (NMR), and identified three paired regions including P1, P2 and P3.

They then confirmed that nucleotides from P2 constituted the key binding element by using NMR chemical shift perturbations (CSPs) and site-directed mutagenesis assays.

After confirming that binding to PTK7 did not perturb the original 3D fold of sgc8c, the researchers determined the solution NMR structure of sgc8c and elucidated an intricate three-way junction fold stabilized by long-range hydrogen bonding and extensive base-base stacking interactions.

Several tertiary interactions, commonly observed in RNA but rarely found in DNA molecules, are crucial to maintain the structure and function of sgc8c. Most intriguingly, sgc8c can recruit more than ten nucleotides from distinct regions and assemble them into its key structural and functional framework.

Guided by the well-established structural and functional relationship, the researchers efficiently engineered two optimal sgc8c variants that exhibit simultaneously enhanced thermostability, biostability, and binding affinity to both protein and cell targets, providing new avenues for diverse aptamer-based biomedical applications.

This work develops a streamlined NMR-based approach to overcome challenges in understanding and optimizing the function of DNA aptamers that target membrane proteins, and highlights the pivotal role of tertiary interactions in shaping the intricate structure and sophisticated function of DNA molecules.

Journal information: Proceedings of the National Academy of Sciences

Provided by Chinese Academy of Sciences

Australian research discovers DNA structure's role in forming memories

CANBERRA, July 19 (Xinhua) -- A specific type of DNA structure could be key in regulating how the brain forms memories, Australian research has found.

In a study published on Friday, international researchers led by a team from the Australian National University (ANU) discovered that G-quadruplex DNA (G4-DNA) plays a role in transcribing memories.

G4-DNA is generally found in cells when DNA sequences fold into a different, four-stranded structure. It is usually associated with DNA damage and is frequently observed in cancer cells.

Paul Marshall, lead author of the study from the ANU College of Health and Medicine, said that G4-DNA's involvement in stalling the basic functions of some cells had previously been discovered but that the new findings were the first evidence of its role in making new memories.

The study found that the accumulation of G4-DNA in neurons in the brain is required for the activation and silencing of genes that are critically involved in learning and memory.

"We found that causally manipulating G4-DNA can lead to a substantial impairment in memory," Marshall said in a media release.

"But in other scenarios it can result in increased transcription. It can have different effects on memory depending on the area of the brain, and type of memory involved."

The research was conducted on living cells from mice in collaboration with scientists from the University of Queensland, Linköping University in Sweden, the Weizmann Institute of Science in Israel, and the University of California.

Source: Xinhua

Editor: huaxia

2024-07-19 13:53:16

Nursing aide turned sniper: Thomas Crooks' mysterious plot to kill Trump

BUTLER, Pa. – Donald Trump and would-be assassin Thomas Crooks started on their violent collision course long before the former president's political rally ended in gunshots and death.

Crooks, 20, was a one-time registered Republican, a nursing home worker with no criminal record, shy in school, and living in a decent middle-class neighborhood in suburban Pennsylvania with his parents. Trump, 78, was eyeing Crooks' state as a key battleground – but not in the way that anyone envisioned on Saturday.

Riding high on polls showing that he's got a strong chance of toppling President Joe Biden, the former president had been campaigning for reelection in swing states, and Pennsylvania is a key prize. Trump won the state in 2016 but lost it four years later.

And on July 3, Trump's campaign announced he would hold a rally at the Butler Farm Show grounds, about 30 miles north of Pittsburgh.

"Pennsylvania has been ravaged by monumental surges in violent crime as a direct result of Biden’s and Democrats’ pro-criminal policies," Trump's campaign said in announcing the event, noting that when he's elected, he'll "re-establish law and order in Pennsylvania!"

The Saturday attack on Trump turned the heated rhetoric of the 2024 presidential campaign freshly violent. Authorities said bullets fired from Crooks' AR-15 style rifle about 150 yards away grazed Trump's ear, killed a rally attendee as he dove to protect his family, and critically wounded two others. Secret Service agents killed Crooks moments later.

Attack planned well in advance

Investigators are still seeking Crooks' motive – despite his Republican leanings, he had donated to a progressive voter-turnout campaign in 2021 – but they indicated he had planned the attack well in advance.

The shooting marks the first assassination attempt against a former or current U.S. president since President Ronald Reagan was injured in a March 1981 shooting at a Washington, D.C., hotel. 

There are many questions about why Crooks turned into a would-be presidential assassin, firing indiscriminately into hordes of political supporters.

FBI special agent Kevin Rojek said on a call with media that law enforcement located "a suspicious device" when they searched Crooks' vehicle and that it's being analyzed at the FBI crime lab.

"As far as the actions of the shooter immediately prior to the event and any interaction that he may have had with law enforcement, we're still trying to flesh out those details now," Rojek said.

None of Crooks' shocked neighbors or high school classmates described him as violent or said that he had in any way signaled he was intent on harming Trump. Sunday morning, reporters and curious locals swarmed the leafy streets of the home where Crooks lived with his parents in Bethel Park, about 50 miles from the shooting scene.

Those who knew him described a quiet young man who often walked to work at a nearby nursing home. One classmate said he was bullied and often ate alone in high school.

Sunday morning, neighbor Cathy Caplan, 45, extended her morning walk about a quarter mile to glimpse what was happening outside Crooks’ home. “It came on the morning news and I was like ‘I know that street,’” said Caplan, who works for the local school district. “It feels like something out of a movie.”

Dietary aide turned deadly killer

Authorities say they are examining Crooks' phone, social media and online activity for motivation. They said he carried no identification and his body had to be identified via DNA and biometric confirmation.

Although no possible motive has yet been released, Crooks nevertheless embodies the achingly familiar profile of an American mass shooter: a young white man, isolated from peers and armed with a high-powered rifle. His attack was one of at least 59 shootings in the United States on Saturday, according to the Gun Violence Archive.

According to records and online posts of the ceremony, Crooks graduated from Bethel Park High School, about 42 miles from Butler County, on June 3, 2022. That same day, Trump met briefly with investigators at his Mar-a-Lago club in Florida as they examined whether he improperly took classified documents with him when he left the White House.

A classmate remembered Crooks as a frequent target of bullies. Kids picked on him for wearing camouflage to class and his quiet demeanor, Jason Kohler, 21, said. Crooks usually ate lunch alone, Kohler said.

Crooks worked as a dietary aide at the Bethel Park Skilled Nursing and Rehabilitation, less than a mile from his home. In a statement provided to USA TODAY on Sunday, Marcie Grimm, the facility's administrator, said she was "shocked and saddened to learn of his involvement."

Neighbor Dean Sierka, 52, has known Crooks and his parents for years. The families live a few doors apart on a winding suburban street, and Sierka’s daughter, who attended elementary, middle and high school with Crooks, remembers him as quiet and shy. Sierka said they saw Crooks at least once a week, often when he was walking to the nursing home from his parents' three-bedroom brick house.

"You wouldn’t have expected this," Sierka said. "The parents and the family are all really nice people."

"It's crazy," he added.

Secret Service role: Did they do enough?

Founded in 1865, the Secret Service is supposed to stop this kind of attack, and dozens of agents were present Saturday. As the former president and presumptive Republican presidential nominee, Trump's public appearances are managed by the Secret Service, which works with local law enforcement to develop security plans and crowd-management protocols.

In the days before the event, the agency's experts would have scouted the location, identified security vulnerabilities, and designed a perimeter to keep Trump and rally attendees safe. Congress and the Secret Service are now investigating how Crooks was able to get so close to the former president, and several witnesses reported seeing him in the area with the gun before Trump took the stage.

As the event doors opened at 1 p.m., the temperature was already pushing close to 90, and ticketed attendees oozed through metal detectors run by members of the Secret Service's uniformed division. Similar to airport security screenings, rallygoers emptied their pockets to prove they weren't carrying guns or other weapons.

Media reports indicate the Secret Service had in place, as usual, a counter-sniper team scanning the surrounding area for threats.

In an exclusive interview, former Secret Service Director Julia Pierson told USA TODAY that maintaining such a sniper security perimeter is part of the agency's responsibility for safeguarding protectees like Trump from harm. She said agents typically consider 1,000 yards to be the minimum safe distance for sniper attacks.

The Secret Service has confirmed that it is investigating how Crooks got so close to Trump, who took the stage shortly after 6 p.m. Officials say Crooks' rifle was legally obtained but have not yet released specifics.

Outside the venue at that time, Greg Smith says he tried desperately to get the attention of police. He told the BBC that he and his friends saw a man crawling along a roof overlooking the rally. Other witnesses said they also saw a man atop the American Glass Research building outside the official event security perimeter, well within the range of a 5.56 rifle bullet.

"We noticed the guy bear-crawling up the roof of the building beside us, 50 feet away from us," Smith told the BBC. "He had a rifle, we could clearly see him with a rifle."

Smith told the BBC that the Secret Service eventually saw him and his friends pointing at the man on the roof.

"I'm thinking to myself, why is Trump still speaking, why have they not pulled him off the stage?" Smith said. "Next thing you know, five shots rang out."

From his nearby deck, Trump supporter Pat English watched as the former president took the stage to Lee Greenwood's "God Bless the U.S.A.," and attendees raised their cell phones to record.

English had taken his grandson to see the rally earlier but left when it got too hot. From his deck, they listened as Trump began speaking at 6:05 p.m., backed by a crowd of red-hatted MAGA supporters waving "fire Joe Biden" signs.

And then gunfire began.

Boom, boom, boom

"I heard a 'boom, boom, boom' and then screams,” English said Sunday. "I could see people running and the police run in."

Trump was saying the word "happened" as the first pop rang out. He reached up to grab his ear as two more shots echoed, and the crowd behind him – and Trump himself – ducked. Plainclothes Secret Service agents piled atop the president as a fusillade of shots rang out, apparently the Secret Service killing Crooks.

The crowd screamed, and the venue's sound system picked up the agents atop Trump planning to move the former president to safety. One yelled, "shooter's down. Let's move, let's move."

The agents then helped Trump back to his feet as they shielded him on all sides.

The sound system then picked up Trump's voice: "Wait, wait," he said, before turning to the audience and triumphantly raising his fist to yell "fight, fight" as the crowd cheered, blood streaming down his face.

By 6:14 p.m. Trump's motorcade was racing from the scene, and in a later statement, Trump's campaign said he was checked out at a local medical facility.

"I was shot with a bullet that pierced the upper part of my right ear," Trump said in a statement. "I knew immediately that something was wrong in that I heard a whizzing sound, shots, and immediately felt the bullet ripping through the skin. Much bleeding took place, so I realized then what was happening."

Firefighter 'hero' gunned down

Outside of the Butler Township Administration Office Sunday afternoon, Pennsylvania Gov. Josh Shapiro identified the rally attendee killed by Crooks as Corey Comperatore, a firefighter, father of two and longtime Trump supporter.

“Corey died a hero,” Shapiro said. “Corey dove on his family to protect them last night at this rally. Corey was the very best of us. May his memory be a blessing.”

Two other Pennsylvanians are still undergoing treatment for their injuries, Shapiro said.

Pennsylvania State Police identified the two wounded attendees as David Dutch, 57, of New Kensington, and James Copenhaver, 74, of Moon Township. Both are hospitalized and listed in stable condition. Shapiro said he spoke with the family of one victim and received a message from the other.

Biden spoke briefly with Trump on Saturday night, and the president condemned the assassination attempt as “sick.” He said there’s no place for political violence in the U.S. and called on Americans to unite in condemning it.

But earlier in the week, Biden told campaign donors in a private phone call it was time to stop talking about his own disastrous presidential debate performance and start targeting Trump instead.

"I have one job and that's to beat Donald Trump," Biden said. "We're done talking about the (June 27) debate. It's time to put Trump in the bullseye."

Republicans across the country have used similar language to attack their opponents over the years, and political scientists say violent rhetoric used worldwide almost invariably leads to physical violence.

On Sunday, someone parked a truck-mounted electronic billboard at the gates to the Butler Farm Show grounds reading "Democrats attempted assassination," along with a picture of Trump clutching an American flag, his face overlaid with a bullseye crosshairs.

Authorities say they have not yet determined a motive for Crooks' attack. But in a statement, Trump declared the shooting an act of evil and thanked God for preventing the unthinkable.

"We will fear not, but instead remain resilient in our faith and defiant in the face of wickedness," Trump said.

And he said he'd be back on the campaign trail for the Republican National Convention in Milwaukee, which starts Monday.

"Based on yesterday’s terrible events, I was going to delay my trip to Wisconsin, and the Republican National Convention, by two days," Trump said on his Truth Social account Sunday, "but have just decided that I cannot allow a 'shooter,' or potential assassin, to force change to scheduling, or anything else."

Contributing: David Jackson, Aysha Bagchi, Christopher Cann, Bryce Buyakie, Emily Le Coz, Josh Meyer, USA TODAY Network

  • Open access
  • Published: 16 July 2024

A systematic search for RNA structural switches across the human transcriptome

  • Matvei Khoroshkin 1 , 2 , 3 , 4 ,
  • Daniel Asarnow 1   nAff12 ,
  • Shaopu Zhou 1 , 2 , 3 , 4 ,
  • Albertas Navickas   ORCID: orcid.org/0000-0003-0016-2643 1 , 2 , 3 , 4   nAff13 ,
  • Aidan Winters 2 , 3 , 4 , 5 , 6 , 7 ,
  • Jackson Goudreau 1 , 2 , 3 , 4 ,
  • Simon K. Zhou   ORCID: orcid.org/0000-0001-8695-7943 8 , 9 ,
  • Johnny Yu 1 , 2 , 3 , 4 ,
  • Christina Palka 10 ,
  • Lisa Fish 1 , 2 , 3 , 4 ,
  • Ashir Borah 1 , 2 , 3 , 4 ,
  • Kian Yousefi 1 , 2 , 3 , 4 ,
  • Christopher Carpenter 1 , 2 , 3 , 4 ,
  • K. Mark Ansel   ORCID: orcid.org/0000-0003-4840-9879 8 , 9 ,
  • Yifan Cheng   ORCID: orcid.org/0000-0001-9535-0369 1 , 11 ,
  • Luke A. Gilbert   ORCID: orcid.org/0000-0001-5854-0825 2 , 3 , 6 , 7 &
  • Hani Goodarzi   ORCID: orcid.org/0000-0002-9648-8949 1 , 2 , 3 , 4 , 7  

Nature Methods (2024)

  • Computational platforms and environments
  • Riboswitches
  • Systems biology

RNA structural switches are key regulators of gene expression in bacteria, but their characterization in Metazoa remains limited. Here, we present SwitchSeeker, a comprehensive computational and experimental approach for systematic identification of functional RNA structural switches. We applied SwitchSeeker to the human transcriptome and identified 245 putative RNA switches. To validate our approach, we characterized a previously unknown RNA switch in the 3ʹ untranslated region of the RORC (RAR-related orphan receptor C) transcript. In vivo dimethyl sulfate (DMS) mutational profiling with sequencing (DMS-MaPseq), coupled with cryogenic electron microscopy, confirmed its existence as two alternative structural conformations. Furthermore, we used genome-scale CRISPR screens to identify trans factors that regulate gene expression through this RNA structural switch. We found that nonsense-mediated messenger RNA decay acts on this element in a conformation-specific manner. SwitchSeeker provides an unbiased, experimentally driven method for discovering RNA structural switches that shape the eukaryotic gene expression landscape.

Gene expression is regulated at the RNA level in all kingdoms of life. Some of the oldest groups of RNA-based regulatory mechanisms are ribozymes (catalytically active RNA molecules) and RNA structural switches (elements that adopt two mutually exclusive conformations, each leading to different gene-regulatory outcomes) 1 , 2 , 3 . In bacteria, a subset of RNA switches, termed riboswitches, control gene expression by binding small molecule ligands that induce RNA conformational changes 4 , 5 . The discovery of RNA switches in eukaryotes, however, has been more challenging. While a number of thiamine pyrophosphate-sensing riboswitches have been identified in plants and fungi 6 , only two human RNA switches are known: the protein-dependent RNA switch in vascular endothelial growth factor-A (VEGFA), and m6A modification-based switches 7 , 8 . Therefore, the overall impact of RNA switches on gene expression in higher eukaryotes remains unclear, despite their ubiquity in other domains of life. Here, we introduce SwitchSeeker, a systematic computational and experimental framework for unbiased discovery of RNA structural switches in any transcriptome.

While several RNA switch detection software packages have been developed, most identify new switch sequences based on their homology to one of the 40 known RNA switch families 9 . The small minority of tools enabling de novo prediction of RNA switches lack experimental verification of RNA structure and function 10 , 11 . Therefore, there is an unmet need for scalable methods of detecting eukaryotic RNA switches and assessing the extent to which they carry out regulatory functions in gene expression control. The approach we introduce here relies on integrating multiple computational and experimental methods: RNA switches are first predicted in silico, then structurally and functionally characterized in vivo, which in turn informs the next iteration of in silico predictions. First, we developed a computational model called SwitchFinder for de novo RNA switch detection, and showed that it identifies RNA switches from novel families with higher accuracy than existing models. Combining SwitchFinder with a set of high-throughput experimental techniques, we set up an end-to-end iterative predict-and-validate platform that we term SwitchSeeker. We applied SwitchSeeker to the human transcriptome to identify putative RNA switches, which we then characterized structurally and functionally using massively parallel assays in vivo. By iteratively improving the SwitchFinder predictions with experimental data, we ultimately report 245 high-confidence and functional RNA structural switches.

Finally, we selected the top scoring switch, located in the 3ʹ untranslated region (3ʹUTR) of the RORC (RAR-related orphan receptor C) transcript, for further analysis. We used dimethyl sulfate (DMS) mutational profiling with sequencing (DMS-MaPseq) structural probing and single-particle cryogenic electron microscopy (cryo-EM) to confirm that the predicted switch populates alternate molecular conformations. We then performed genome-scale CRISPR-interference (CRISPRi) screens, which showed that one of the two conformations reduces RORC gene expression through activation of the noncanonical nonsense-mediated decay (NMD) pathway. Taken together, our framework provides new insights into the role of RNA structural switches in shaping the human transcriptome, and outlines a broader approach for future comprehensive characterization of RNA switches regulating eukaryotic gene expression across cell types and organisms.

Systematic annotation of human RNA structural switches

We define RNA structural switches as regulatory RNA elements that affect the expression of the host RNA molecule through conformational shifts. To discover new eukaryotic RNA switch families, we devised an approach called SwitchFinder that, unlike most existing methods 12 , 13 , 14 , 15 , 16 , 17 , does not depend on known sequence motifs. Instead, SwitchFinder uses the RNA sequence to generate an ensemble of secondary structures and their corresponding energy landscape using a Boltzmann equilibrium probability distribution 18 . It prioritizes the sequences that show RNA switch-like features, such as having two local minima in close proximity with a relatively small barrier in between (Fig. 1a and Extended Data Fig. 1a,b ). This approach ensures that RNA switches are identified in a generalizable and family-agnostic way, which we validated by demonstrating its high performance on held-out Rfam families (Fig. 1b and Extended Data Fig. 1c ). We compared the performance of SwitchFinder to SwiSpot, the state-of-the-art method for family-agnostic riboswitch prediction 10 , and observed a performance improvement of 44% on average across all RNA switch families except cyclic di-GMP-II (Fig. 1c ). By relying on biophysical features of the folding energy landscape, SwitchFinder captures a wider variety of RNA switches compared with the existing methods.
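The Boltzmann weighting mentioned above can be illustrated with a minimal Python sketch. It is not the SwitchFinder implementation: the function names, the 37 °C temperature and the `max_gap` threshold are assumptions made for illustration, and the toy bi-stability check ignores the energy barrier between minima that the method described above also takes into account.

```python
import math

# Gas constant in kcal/(mol*K) times an assumed temperature of 37 degrees C.
RT_KCAL_PER_MOL = 0.0019872 * 310.15


def boltzmann_probabilities(free_energies):
    """Equilibrium probabilities for structures with the given free energies (kcal/mol)."""
    g_min = min(free_energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(g - g_min) / RT_KCAL_PER_MOL) for g in free_energies]
    partition = sum(weights)
    return [w / partition for w in weights]


def looks_bistable(free_energies, max_gap=3.0):
    """Toy criterion (hypothetical): the two lowest-energy structures lie within max_gap kcal/mol."""
    lowest = sorted(free_energies)[:2]
    return len(lowest) == 2 and (lowest[1] - lowest[0]) <= max_gap
```

Under this picture, a sequence whose two most stable structures have similar free energies populates both at equilibrium, which is the kind of landscape the prioritization step looks for.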

figure 1

a , Example of SwitchFinder locating the RNA switch in the VEGFA mRNA sequence. b , Receiver operating characteristic (ROC) curves of SwitchFinder predictions of RNA switches from the common Rfam families. SwitchFinder was applied to a mix of real sequences and their shuffled counterparts (with preserved dinucleotide content). ROC curves measure its ability to correctly select the real sequences. AUC, area under the ROC curve; riboswitch families, c-di-GMP-I (Cyclic di-GMP); FMN, flavin mononucleotide; NiCo, nickel or cobalt ions; SAM, S-adenosyl-l-methionine; THF, tetrahydrofolate; TPP, thiamine pyrophosphate. c , AUCs of RNA switch predictions across the Rfam families for two models: SwitchFinder and SwiSpot 10 . Each dot represents one Rfam family. The lines show the change in accuracy between the two models. The families that have higher AUCs for SwitchFinder are shown with blue lines; the ones that have higher AUCs for SwiSpot are shown in red. P  value calculated with the paired two-sided t -test ( P  = 0.00056). d , AUCs of RNA switch predictions across various groups of natural and synthetic riboswitches, calculated as in b .

To confirm that SwitchFinder is not overly tailored to bacterial riboswitches, we tested it on eukaryotic and synthetic riboswitches, including those sensing theophylline 19 and specific RNA-binding proteins 20 . Additionally, we applied SwitchFinder to ribosomal RNAs to ensure its ability to distinguish RNA switches from nonswitching but highly structured RNAs. This analysis showed that SwitchFinder could distinguish true riboswitches from shuffled controls much more effectively than it could do so with ribosomal RNAs, and that it performed even better on eukaryotic and synthetic riboswitches than it did on bacterial riboswitches (Fig. 1d ). Altogether, these benchmarking results gave us high confidence that SwitchFinder could nominate new eukaryotic RNA switches that would expand our understanding of RNA structural switching in gene regulation.
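The benchmark described above scores how well real sequences are separated from their shuffled counterparts. A minimal Python sketch of the AUC as a rank statistic follows; the function name and the example scores are hypothetical, and the authors' actual scoring pipeline is not shown in this text.

```python
def roc_auc(real_scores, shuffled_scores):
    """AUC as the probability that a real sequence outscores a shuffled one (ties count half)."""
    if not real_scores or not shuffled_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for r in real_scores:
        for s in shuffled_scores:
            if r > s:
                wins += 1.0
            elif r == s:
                wins += 0.5
    return wins / (len(real_scores) * len(shuffled_scores))


# Hypothetical scores: three of the four real-versus-shuffled pairs are ranked correctly.
example_auc = roc_auc([0.9, 0.4], [0.6, 0.1])
```

An AUC of 0.5 corresponds to chance-level separation and 1.0 to perfect separation, which is how the per-family values in the figure can be read.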

Discovery of RNA switches with regulatory function in the human transcriptome

Messenger RNA secondary structure in the cell is highly dynamic 21 , 22 , 23 and compartment dependent 24 ; therefore, we reasoned that the SwitchFinder predictions may be greatly improved with experimental measurements of RNA secondary structure from living cells. To counteract the limitations of in silico RNA folding predictions in complex eukaryotic transcriptomes 25 , we enhanced SwitchFinder by allowing the incorporation of in vivo RNA secondary structure probing data to refine the model’s energy terms, resulting in an iterative cycle of computational prediction and experimental validation that we name SwitchSeeker. First, we applied the SwitchFinder model using naive in silico folding to the entirety of the 3ʹUTRs of the human transcriptome, and chose the 3,750 top candidate switches (of length ≤186 nucleotides) as putative switch elements. To identify the RNA switches that are both functional and structurally bi-stable in the cell, we independently performed two high-throughput in vivo screens: a ‘structure screen’ that differentiates RNAs that exist as an ensemble of two mutually exclusive conformations from those that exist only in a single conformation, and a ‘functional screen’ that measures the effect of candidate RNA switches on the expression of a reporter gene.

For the structure screen, we performed an in vivo DMS-MaPseq assay on HEK293 cells expressing the library of 3,750 candidate RNA switches in a reporter gene context, to identify the bi-stable RNA structures within this pool (Extended Data Fig. 2b,c ) 26 , 27 . The accessibility of a single nucleotide in the DMS-MaPseq data is measured as a population average of multiple RNA molecules that represent different minima in the Gibbs free energy landscape. If one conformation dominates the landscape, it dominates the DMS-MaPseq reactivity profile; however, if multiple conformations coexist, they all contribute to the reactivity profile 28 , 29 . SwitchSeeker exploits this distinction in nucleotide accessibility to find RNA switches that coexist in a balanced state between two conformations in vivo.
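The population-averaging idea above can be sketched in a few lines of Python. This is an illustrative model only: the function name and the idealized per-nucleotide profiles are assumptions, not part of the published analysis.

```python
def ensemble_reactivity(profile_a, profile_b, fraction_a):
    """Population-average per-nucleotide reactivity for a mixture of two conformations."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same nucleotides")
    if not 0.0 <= fraction_a <= 1.0:
        raise ValueError("fraction_a must lie between 0 and 1")
    return [
        fraction_a * a + (1.0 - fraction_a) * b
        for a, b in zip(profile_a, profile_b)
    ]


# Hypothetical two-nucleotide example: each position is accessible in only one conformation.
dominated = ensemble_reactivity([1.0, 0.0], [0.0, 1.0], 1.0)  # looks like conformation A alone
balanced = ensemble_reactivity([1.0, 0.0], [0.0, 1.0], 0.5)   # both positions appear partly accessible
```

In this simplified view, a balanced mixture gives intermediate reactivity at positions that are paired in only one of the two conformations, which is the signature the structure screen looks for.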

For the functional screen, we implemented a massively parallel reporter assay (MPRA) 30 to functionally interrogate RNA switches in HEK293 cells. We cloned the library of 3,750 candidate RNA switch sequences or cognate scrambled control sequences into a dual enhanced green fluorescent protein (eGFP)–mCherry fluorescent reporter, directly downstream of the eGFP open reading frame (ORF; Extended Data Fig. 2d ). This enabled us to use eGFP fluorescence to measure the effect of candidate RNA switches on gene expression while using the unaffected mCherry fluorescence as an endogenous control. We transduced HEK293 cells with this synthetic library and sequenced DNA and RNA derived from eight bins of cells sorted by flow cytometry according to their eGFP : mCherry expression ratio (Extended Data Fig. 2e , see Methods ). Of the candidate RNA switches tested, 536 (14%) caused significant downregulation of eGFP relative to their scrambled control, and 538 (14%) showed a significant upregulation (Fig. 2b ). While our study focused on characterizing the RNA switches that act in the context of 3ʹUTRs, the SwitchSeeker framework can be readily applied to the study of other types of RNA switches with the use of appropriate reporter constructs.
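One simple way to summarize such a sort-and-sequence readout is to normalize each candidate's read counts across the eight bins and take the abundance-weighted mean bin. The Python sketch below assumes this summary for illustration; the function names are hypothetical, and the statistical test the authors used to call significant up- or downregulation is described in their Methods, not here.

```python
def normalize_across_bins(counts):
    """Convert per-bin read counts for one candidate into relative abundances."""
    total = sum(counts)
    if total == 0:
        raise ValueError("candidate has no reads in any bin")
    return [c / total for c in counts]


def mean_bin(counts):
    """Abundance-weighted mean bin index (bin 1 = lowest eGFP:mCherry ratio)."""
    fractions = normalize_across_bins(counts)
    return sum((i + 1) * f for i, f in enumerate(fractions))
```

A candidate whose mean bin is lower than that of its scrambled control would be read as repressive in this scheme, and a higher one as activating.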

figure 2

a , Overview of SwitchSeeker, the platform for RNA switch identification, applied to the 3ʹUTRs of the human transcriptome. b , Examples of regulatory elements identified by the functional screen. Each row represents a single candidate RNA switch, each column represents a single bin defined by the reporter gene expression (eGFP fluorescence, normalized by mCherry fluorescence). Bin 1 corresponds to the cells with the lowest eGFP fluorescence, bin 8 corresponds to the highest. The value in each cell is the relative abundance of the given RNA switch in the given bin, normalized across the eight bins. The three plots show examples of candidate switches with repressive, neutral and activating effects on gene expression. The plots below show cumulative sequence abundances across all of the candidate switches in each group. c , The set-up of the massively parallel mutagenesis analysis. For each candidate RNA switch, we design four mutated sequence variants. Two of them lock the switch into conformation 1, and the other two lock it into conformation 2. A sequence library is then generated (Extended Data Fig. 2d ), in which each candidate RNA switch is represented by the four mutated sequence variants, along with the reference sequence. d , Example of a high-confidence candidate RNA switch identified using the massively parallel mutagenesis analysis. Bottom: Two alternative conformations as predicted by SwitchSeeker. The RNA secondary structure probing data collected with the Structure Screen is shown in color. The Gibbs free energy difference between the two predicted conformations is 2.4 kcal per mol. Top: The effect of the candidate RNA switch locked in one or another conformation on reporter gene expression. Each row corresponds to a single sequence variation that locks the RNA switch into one of the two conformations. Each column represents a single bin defined by the reporter gene expression. 
The value in each cell is the relative abundance of the given RNA switch in the given bin, normalized across the eight bins.

In the second iteration of SwitchSeeker, guided by in vivo RNA structure data, we refined our predictions, eliminating false positives and focusing on switches with consistent structural configurations in vivo. Comparing the outcomes of this iteration with those of the first, we found a significant increase in the proportion of regulatorily active switches (P = 1 × 10⁻⁶, Extended Data Fig. 2f ), which supports the improved accuracy gained by integrating in vivo data. This process prioritized 1,454 putative RNA switches that occupy two alternative conformational minima and are regulatorily active in vivo.

Having identified a large set of candidate RNA switches that affect gene expression, we aimed to assess the degree to which the two stable conformations show divergent regulatory function. For this, we extended our MPRA to include targeted mutations designed to shift the equilibrium between the two conformations of each candidate RNA switch. This was achieved by either disrupting or strengthening conformation-specific stem loops by introducing either individual mutations or reciprocal mutation pairs (Fig. 2c ). This additional screen enabled us to identify bona fide RNA switches with strong conformation-dependent activity. We found 245 RNA switches that differentially regulated reporter gene expression when locked in a specific structural conformation. An example candidate switch (located in the 3ʹUTR of TCF7 (transcription factor 7)) is shown in Fig. 2d : the TCF7 RNA switch landscape has two local minima, corresponding to two alternative conformations supported by in vivo DMS-MaPseq data (Fig. 2d , bottom). Two mutations in different parts of the switch sequence that favor conformation 1 resulted in lower expression of the eGFP reporter (top). Conversely, two mutations that favor conformation 2 increased eGFP expression. This observation indicates that the two conformations of the TCF7 RNA switch elicit divergent regulatory functions.

A bi-stable RNA switch in the 3ʹUTR of RORC

To demonstrate the validity of SwitchSeeker’s predictions, we aimed to biochemically characterize one of the identified RNA switches. We selected the switch that had the most pronounced difference in regulatory functions between its two conformations: a 186 nucleotide element located in the 3ʹUTR of the RORC mRNA. Based on the predicted secondary structures, we designated the three regions involved in the base pairing as ‘Box 1’ (61–69 nucleotides), ‘Box 2’ (73–81 nucleotides) and ‘Box 3’ (116–123 nucleotides). Our data indicate that Box 1 can form base pairs either with Box 2 or with Box 3, resulting in two mutually exclusive conformations that each exert distinct effects on gene expression (Fig. 3a ). To confirm that the RORC RNA switch exists as an ensemble of two stable conformations, we designed mutation–rescue pairs of sequences that first shift the equilibrium towards one conformation (mutation), and then shift it towards the other conformation (rescue) (Fig. 3b and Supplementary Data Files ), and used in vitro RNA SHAPE (selective 2ʹ-hydroxyl acylation analyzed by primer extension) 31 to monitor the resultant RNA structures. We found that mutating Box 3 (117-AC) reduced the reactivity of the Box 2 region (Fig. 3c ), supporting the idea that Box 1 would switch its contacts from Box 3 to Box 2, thereby stabilizing conformation 2. Introducing the rescue mutation (65-GT,117-AC) into Box 1 restored the original reactivity profile of the element. Complementary experiments using the mutation (77-GA) to stabilize conformation 1, and the rescue mutation (63-TC,77-GA) to stabilize conformation 2, had a similar outcome. Even though we did not observe a substantial decrease in reactivity of Box 3 upon the 77-GA mutation, the rescue significantly increased its reactivity (Extended Data Fig. 3a,b ). These findings support the role of the three highlighted regions in forming an ensemble of states in which Box 2 and Box 3 compete for base pairing to Box 1.

figure 3

a , Arc representation of the two alternative conformations of the RORC RNA switch as predicted by SwitchSeeker. The two conformations are shown in blue and red, respectively. Left: The schematic representations of the two conformations, as used throughout the article. b , The set-up of mutation–rescue experiments. The switching regions are color coded as in a . A-U and C-G base pairing is shown with compatible shapes (triangle and half-circle). The two conformations of the switch reside in the equilibrium state. Mutation of the Box 3 region disrupts the base pairing between the Box 1 and the Box 3 regions. This causes a shift of the equilibrium towards conformation 2. Rescue mutation of the Box 1 switching region restores the base pairing between Box 1 and Box 3, but at the same time it disrupts the base pairing between Box 1 and Box 2. Therefore, the equilibrium shifts towards conformation 1. c , In vitro SHAPE reactivity of the RORC RNA switch sequence in vitro. Left: SHAPE reactivity profiles for the reference sequence (in gray) and for the mutation–rescue pair of sequences (blue, 65-GT,117-AC; red, 117-AC). Shown is the average for three replicates with the respective error bars (s.d.). The SHAPE reactivity changes in the nonmutated regions are highlighted with bold arrows. Right: Barplots of cumulative SHAPE reactivity in the switching regions. d , Secondary structures of the two conformations of RORC RNA switch predicted by the RNAstructure algorithm 56 guided by the DMS reactivity data. The base pairing of Box 1 with either Box 3 (conformation 1) or Box 2 (conformation 2) is highlighted by a red frame. The two clusters were identified using the DRACO unsupervised deconvolution algorithm 28 . e , Accessibility of the Box 2 (x axis) and Box 3 (y axis) regions of the RORC element across cell lines, as measured with DMS-MaPseq (normalized reactivity, see Methods ). 
The cell lines were engineered to express a GFP reporter containing the RORC switch sequence in the 3ʹUTR, and the accessibility of the reporter mRNA was measured with DMS-MaPseq. Linear regression is shown with an orange line. f , Accessibility of the Box 2 (x axis) and Box 3 (y axis) regions of the RORC element in the endogenous RORC mRNA, as measured with DMS-MaPseq (normalized reactivity, see Methods ).

To extend our in vitro observations to living cells, we performed high-coverage DMS-MaPseq of the RORC switch in vivo in the reporter context (Extended Data Fig. 3c ). Using a DMS concentration sufficient to cause multiple modifications to the same RNA molecule, we implemented the DRACO computational approach 28 , which identified two distinct clusters in both biological replicates, representing the two conformations, at relative proportions of 27% to 73% (Fig. 3d and Extended Data Fig. 3e ). The profiles of these clusters were distinct ( P  = 0.18 and P  = 0.72 in replicates 1 and 2, respectively) but showed high correlation within each cluster across replicates (Extended Data Fig. 3d ). To ascertain whether sequence mutations similarly influence the conformational equilibrium in vivo, we conducted DMS-MaPseq on the two rescue mutant sequences (Extended Data Fig. 3f ). This analysis corroborated our SHAPE findings: the (63-TC,77-GA) mutation stabilized conformation 2, while the (65-GT,117-AC) mutation favored conformation 1. The alignment of in vitro SHAPE and in vivo DMS-MaPseq results reinforces the notion that the RORC switch consistently exhibits its conformational dynamics across both experimental settings.

To determine whether the RORC element functions as a dynamic RNA switch or simply represents a static equilibrium of two conformations, we investigated whether the proportions of its alternative conformations change inside cells. To this end, we introduced a reporter containing the RORC sequence into six cell lines representing diverse genetic backgrounds: LNCaP (prostate), MCF-7 (breast), HepG2 (liver), ZR-75-1 (breast), 293T (kidney) and LS174T (colon). Using DMS-MaPseq, we assessed the conformational dynamics of the RORC switch in these cell lines. Our findings not only confirm that the relative proportions of the two conformations vary among these cell lines but also demonstrate a strong anticorrelation in the accessibility of Boxes 2 and 3 ( R  = −0.75) (Fig. 3e ). This anticorrelation supports the hypothesis of their competitive base pairing with Box 1, further suggesting dynamic switching behavior.

To extend our analysis from the reporter to the endogenous context, we performed DMS-MaPseq targeting the endogenous RORC mRNA across the same cell lines. This approach yielded similar observations: a strong anticorrelation in accessibility ( R  = −0.81, Fig. 3f ) and variability in the relative proportions of the two conformations. Importantly, the conformational ratios across cell lines were highly correlated between the reporter and endogenous contexts ( R  = 0.93, Extended Data Fig. 3g ), demonstrating the high relevance of the reporter screening approach to understanding the behavior of RNA switches in the context of their endogenous mRNA. These data strongly support the hypothesis that the RORC element functions as an RNA switch, adopting two alternative conformations, the balance of which is influenced by the cellular landscape.

Finally, we used single-particle cryo-EM to investigate the tertiary structures of the two RORC RNA switch conformations that we identified using SHAPE and DMS-MaPseq. Micrographs of the reference RORC RNA switch contain a mixture of compact and extended particles, with features suggestive of RNA secondary structure (Fig. 4a and Extended Data Fig. 4a–c ), including elongated tertiary features consistent with A-form helices, as well as bends and junctions consistent with complex RNA folding (Extended Data Fig. 4d–f ). Strikingly, particles of the conformation 1 mutant (77-GA) appear more extended, while those of the conformation 2 mutant (117-AC) are mostly compact (Fig. 4a ). Cryo-EM image processing shows that reference RORC RNA can be classified into three structural classes (Classes A, B, and C), with the Class B structure absent in the (77-GA) mutant and Class A absent in the (117-AC) mutant (Fig. 4b ). This analysis suggests that Class A can be assigned to the more extended conformation 1, and Class B to the compact conformation 2 (Fig. 4b ). We propose that Class C, which is present in all three datasets, represents a folding intermediate lacking the tertiary interactions made by either Boxes 2 or 3. Although the extreme flexibility of the RNA limits the resolution of the reconstructions to approximately 10 Å (Extended Data Fig. 5g–i ), it is sufficient for discrimination of these different RNA folds. These results confirm that the RORC RNA switch indeed adopts distinct tertiary structures in solution and that the designed mutations heavily bias toward one conformation or the other.

figure 4

a , Cryo-EM of wild-type RORC mRNA, 77-GA mutant and 117-AC mutant, as representative examples of qualitatively different compact and extended RNA-like particles. Different morphologies are indicated by numbered labels. Source micrographs were phase-flipped, Gaussian filtered and contrast inverted for display (see Extended Data Fig. 5 ). Scale bars, 50 nm. b , Three structural classes of the refolded RORC 3ʹ mRNA element as determined by cryo-EM processing, with RNA-like features (top). Further cryo-EM imaging and 3D classification of the 77-GA mutant (middle) and 117-AC mutant (bottom) indicate that Class A is present in wild-type and 77-GA samples but absent from the 117-AC sample, and Class B is conversely present in wild-type and 117-AC samples but absent from the 77-GA mutant. Class C is common to all three samples. We thus assign Class A as the conformation 1 state, and Class B as the conformation 2 state. We propose Class C to represent a partly folded intermediate that is not disrupted in the mutated constructs.

Alternative conformations of the RORC RNA switch play divergent roles in gene regulation

Having validated that the RORC RNA switch can adopt two stable conformations, we next explored the distinct regulatory activities of each conformation. We engineered HEK293 cell lines to express eGFP reporters carrying RORC switch variants in the 3ʹUTR and assessed eGFP expression changes using flow cytometry. To specifically lock the switch in each conformation, we implemented two parallel strategies: for conformation 1, one strategy involved mutating Box 2 to prevent its pairing with Box 1 (mutant ‘73-CCCTATGA’), and another introduced mutations into both Boxes 1 and 3 to disrupt their interaction with Box 2 (mutant ‘61-TATATAA,116-TTATATA’). Remarkably, both strategies, despite modifying different parts of the sequence, induced similar eGFP expression changes for each conformation: both mutants that stabilized conformation 1 increased reporter gene expression (Fig. 5a ), while analogous strategies applied to stabilize conformation 2 decreased expression. We then investigated whether the modulation in gene expression was primarily influenced by the RNA’s secondary structure rather than its sequence composition. Using cell lines stably expressing mutants from our earlier mutation–rescue experiments (Fig. 3b ), we evaluated the impact on eGFP expression. Across three tested mutation–rescue pairs, the mutants favoring conformation 2 consistently showed reduced eGFP expression compared with those favoring conformation 1 (Fig. 5b ). These findings from the reciprocal mutation–rescue experiments underscore the pivotal role of RNA secondary structure in the specific regulatory functions of the RORC RNA switch.

figure 5

a – c , Box plots of the relative expression of the reporter construct across different RNA conformations and sequences in HEK293 cells ( a ), reciprocal mutations ( b ) and primary Th17 cells ( c ). Relative expression is quantified as the ratio of eGFP to mCherry fluorescence for individual cells, as measured by flow cytometry ( n  = 10,000 cells). The boxes show the quartiles of the dataset, with the central line indicating the median value; the whiskers extend from the 10th to the 90th percentile. The colors denote specific RNA conformations or sequences: conformation 1 in blue, conformation 2 in red, reference sequence in gray, and a scrambled sequence in yellow. The diagrams below the box plots show the balance of the two conformations in the RNA populations, with existing conformations marked by a ‘+’ sign. Statistical significance was determined with a two-sided independent t -test. a , The mutations left to right: 73-CCCTATGA; 61-TATATAA,116-TTATATA; reference; 116-CCCTAAG; 62-GCACAGT,73-ACTGTGC. P  values left to right: 1.1e−10, 2.6e−22, 1.6e−06, 0.00025. b , Effect of the shift in equilibrium between two conformations of the RORC switch on reporter gene expression for reciprocal mutations. The mutation–rescue experiments were performed as shown in Fig. 3b . The mutations left to right: reference; 65-GT,117-AC; 117-AC; 66-AC; 66-AC,74-GT; 77-GA; 63-TC,77-GA. P  values left to right: 7.1e−117, 3.6e−50, 5.9e−260. c , Effect of shift in the equilibrium between two conformations of the RORC switch on reporter gene expression in primary Th17 T cells. Human CD4+ T cells were infected with lentiviral constructs carrying one of the three sequences in the reporter gene’s 3ʹUTR, and subsequently differentiated into Th17 cells. The mutations left to right: scrambled RORC RNA switch; 77-GA; reference. P  values left to right: 1.7e−124, 2.6e−24. 
d , e , Scatterplots of the relationship between the relative conformation ratio of the RORC element, as measured with DMS-MaPseq in reporter-expressing cell lines, and stability of the reporter mRNA ( n  = 3 replicates) ( d ) and the endogenous RORC mRNA ( n  = 2 replicates) ( e ), as measured by RT-qPCR following the α-amanitin treatment. The reporter contains the eGFP ORF, followed by the 3ʹUTR containing the RORC RNA switch sequence. Horizontal lines represent the mean of mRNA stability. Correlation of mean stability and the relative conformational ratio was measured using the Pearson correlation coefficient. f , Effect of ASOs on endogenous RORC mRNA expression, as measured by RT-qPCR. The targeting ASOs are complementary to Box 2 of the RNA switch; the control ASOs have the same nucleotide composition as the targeting ones but do not target the RORC RNA switch sequence. P  values were determined using the two-sided independent t -test, comparing the RORC-targeting and control ASOs, independent of the ASO chemistry. n  = 2 replicates. LNA, locked nucleic acids.

The RORC gene encodes the nuclear receptor ROR-γ that plays a crucial role in T-helper (Th)17 cell differentiation, a key process in the immune response, which is also implicated in autoimmune diseases 32 , 33 . To explore the functional impact of the RORC RNA switch in Th17 cells, we introduced into primary human CD4+ T cells a reporter construct carrying the RORC RNA switch sequence in the eGFP 3ʹUTR. We then differentiated these cells into Th17 cells (Extended Data Fig. 6 , ref. 34 ). Incorporating the native RORC RNA switch markedly reduced eGFP expression compared with a control with a scrambled sequence (Fig. 5c ). Additionally, altering the switch’s conformation with a 77-GA mutation (towards conformation 1) weakened this repression, confirming the activity of the RORC RNA switch in Th17 cells.

Having demonstrated the distinct regulatory effects of the RORC RNA switch’s two conformations, we next asked whether their relative proportions in different cell types would result in differential regulation of the RORC transcript. To assess this, we compared the stability of the reporter mRNA containing the RORC switch between cell lines following inhibition of RNA polymerase II with α-amanitin. We discovered a strong correlation between the conformational ratio and reporter mRNA stability, indicating that higher proportions of conformation 1 resulted in higher stability, whereas higher proportions of conformation 2 resulted in lower stability ( R  = 0.85, P  = 0.03, Fig. 5d ). We extended this analysis to the endogenous RORC mRNA, where we observed a similar strong correlation ( R  = 0.96, P  = 0.004, Fig. 5e ).

Next, we investigated whether, instead of sequence mutations, trans-acting agents such as antisense oligonucleotides (ASOs) complementary to parts of the RNA switch sequence could shift the equilibrium between the two conformations and thereby influence gene expression 35 . We designed two ASOs to target the Box 2 region, aiming to shift the equilibrium towards conformation 1, which we would expect to increase the levels of RORC mRNA expression. We transfected three cell lines, representing different conformational ratios (LNCaP, MCF-7 and LS174T), with these ASOs carrying either 2ʹ-O-(2-methoxyethyl) (2-MOE) oligoribonucleotides or locked nucleic acids. In both cases, ASO treatment led to a significant increase in RORC mRNA levels compared with nontargeting control ASO (Fig. 5f ). Notably, this effect was more pronounced in cell lines with a higher proportion of conformation 2 (LNCaP, P  = 0.006; MCF-7, P  = 0.005) compared with those with a lower proportion (LS174T, P  = 0.71). Together, these data further underscore the link between structural conformation and resultant gene expression, solidifying the role of the RORC element as a regulatory switch in its native gene context.

Genome-scale genetic screens reveal molecular mechanisms underlying the RORC RNA switch

To investigate how the RORC RNA switch influences gene expression at the molecular level, we performed genome-wide CRISPRi screens in Jurkat T cells expressing one of two eGFP reporter constructs: one with the native RORC switch and another with the 77-GA mutation that favors conformation 1 (Extended Data Fig. 7a ). These screens were intended to identify gene products, the depletion of which altered RORC RNA switch-mediated control of reporter gene expression, indicating their functional connection to the RNA switch mechanism 36 . We focused on identifying two gene groups: those essential for repression induced by the RORC switch (as indicated by an increase in reporter gene expression), and those affecting the conformational dynamics of the switch (as indicated by a change in the ratio of reporter expression between the native switch and the 77-GA mutant).

To identify factors influencing the RORC RNA switch’s repressive function, we analyzed the abundance of single-guide RNAs (sgRNAs) in cells with high versus low reporter gene expression in both screens. This analysis highlighted the NMD pathway, with top hits including core NMD factors such as SMG8, UPF1, UPF2 and UPF3B (Fig. 6a ). Pathways associated with general gene expression, including ribosome biogenesis and endoplasmic reticulum stress, were also notable (Extended Data Fig. 7b ). To pinpoint factors affecting the divergent activities of the switch’s two conformations, we compared the distribution of sgRNAs across the high and low reporter expression bins between cells expressing the native switch and the 77-GA mutant. This comparison reinforced the central role of the NMD pathway (Fig. 6b ), given that the knockdown of NMD components lessened the reporter expression difference between the native and mutant switch. Surprisingly, while knockdowns of SURF complex (that is, SMG1–UPF1–eRF1–eRF3; the complex that initiates NMD on stalled ribosomes 37 ) components produced strong effects, the exon–junction complex (EJC) components did not produce significant changes in either screen, suggesting that the RORC RNA switch operates via a noncanonical EJC-independent NMD pathway 38 , 39 . Moreover, our findings suggest that the NMD pathway acts preferentially on conformation 2 of the RORC RNA switch, as evidenced by the stronger increase in expression of the 77-GA mutant compared with the native RORC sequence.

figure 6

a , Top: Expression change: high versus low: comparison of sgRNA representation between the bottom and the top quantiles of reporter gene expression (across both reference and 77-GA mutant cell lines), represented as a volcano plot. Genes, annotated as part of the NMD pathway by gene ontology (GO), are colored in red. The core components of the canonical NMD pathway are colored in purple and labeled. All other genes are colored in green. Bottom: Gene set enrichment analysis (GSEA) plot for the NMD pathway for the above comparison. −logP: negative logarithm of P value. b , Differences between conformations: wild type versus the 77-GA mutant. Comparison of ratios between top and bottom expression quantiles for the two cell lines. Higher values on the x axis indicate that sgRNAs targeting this gene have a stronger effect on reporter gene expression in the reference cell line compared with the 77-GA mutant cell line. Top: ‘ratio of ratios’ comparison 57 represented as a volcano plot. Genes are colored as in a . Bottom: GSEA plot for the NMD pathway for the above comparison. −logP: negative logarithm of P value. c , d , The effect of knockdown of SURF ( c ) and EJC ( d ) member proteins on the RORC RNA switch reporter gene expression, relative to a scrambled sequence. The individual genes were knocked down using the CRISPRi system in both the reference and the scrambled cell lines, then the change of reporter gene expression was measured using flow cytometry ( n  = 2 replicates). The bar plots show the ratio of the expression of the scrambled sequence to that of the wild-type sequence of the RORC RNA switch. P  values were calculated using the two-sided Student’s t -test. e , Bar plots of the fractions of reads carrying the wild-type RORC switch sequence or 77-GA mutant variant in the UPF1 cross-linking and immunoprecipitation (CLIP) library. Left: input RNA libraries, extracted from the wild-type and 77-GA mutant-expressing Jurkat cells, mixed at a 1:1 ratio. 
Right: libraries after anti-UPF1 immunoprecipitation (IP). The fractions are normalized by the variant fractions in the input libraries. The P  value was calculated using the translation efficiency ratio test 58 . FC, fold change. n  = 2 replicates. f , The effect of NMDI14 on the accessibility of the Box 2 and the Box 3 regions of the RORC element, as measured by DMS-MaPseq. Changes in individual nucleotide accessibility are shown on the inner plot. Statistical significance was determined using a two-sided independent t -test. g , The effect of UPF1 knockdown on endogenous RORC mRNA expression, as measured by RT-qPCR (control, n  = 4 replicates; UPF1 knockdown, n  = 6 replicates). siCTRL, non-targeting dicer-substrate small interfering RNA; siUPF1, UPF1-targeting dicer-substrate small interfering RNA. P  values were calculated using the two-sided Student’s t -test. h , i , Effect of the proteasome inhibitors carfilzomib ( h ) and bortezomib ( i ) on the RNA switch-mediated expression change ( n  = 4 replicates). Data are given as the mean ± s.d. Statistical significance was determined using dose–response modeling followed by ANOVA, to compare the fitted models to assess differences in the effect of the inhibitors on the RNA switch-mediated expression.

To confirm these results, we applied CRISPRi to individually knock down NMD factors in cells expressing the reference switch, the 77-GA mutant, or a scrambled sequence. Knockdowns of SURF complex members, but not EJC components, significantly affected the switch’s repressive function, confirming our genome-wide screen results (Fig. 6c,d ). Furthermore, reducing SURF complex expression also diminished the expression difference between the reference and 77-GA mutant, primarily by increasing reporter expression in the mutant (Extended Data Fig. 7d ). This evidence indicates that NMD predominantly acts on conformation 2 of the RORC RNA switch.

Given its affinity for structured RNAs 40 , we reasoned that UPF1 might bind the two RORC RNA switch conformations with different affinities. To test this, we mixed together the reference and the Box 2 mutant (77-GA) reporter lines at a 1:1 ratio and measured UPF1 binding using CLIP-qPCR (cross-linking and immunoprecipitation followed by qPCR). The reference RORC UTR sequence (containing a mixture of conformations 1 and 2) had significantly stronger binding to UPF1 than its 77-GA mutant that could form only conformation 1 (Fig. 6e ). Similarly, UPF1 bound the 116-CCCTAAG mutant, which favors conformation 2, more strongly than the 77-GA mutant, and this effect was even more pronounced than the difference between reference and 77-GA (logarithm of fold change of 1.12 versus 0.41). Together, these results underscore the preference of UPF1 to bind to conformation 2 of the RORC switch (Extended Data Fig. 7e ).

We reasoned that conformation-specific NMD would deplete mRNA molecules with conformation 2, thereby resulting in a relative increase in the proportion of conformation 1; inhibiting NMD should therefore shift the balance back towards conformation 2. To test this, we used NMDI14, a molecule that disrupts SMG7–UPF1 interactions, to inhibit NMD 41 . Assessing the accessibility of Boxes 2 and 3 in endogenous RORC mRNA using DMS-MaPseq, we found a significant decrease in the accessibility of Box 2 upon NMD inhibition ( P  = 0.03, Fig. 6f ), indicative of a shift towards conformation 2, possibly due to slower decay and accumulation of mRNAs in this conformation. Hence, inhibiting NMD led to a shift in the relative proportions of the two conformations.

Having demonstrated the conformation-specific effect of NMD on the RORC switch in the reporter context, we sought to extend our analysis to the endogenous RORC mRNA. We knocked down UPF1 in various cell lines and assessed the levels of endogenous RORC mRNA using quantitative polymerase chain reaction with reverse transcription. UPF1 knockdown led to a substantial increase in RORC mRNA expression, which was notably more pronounced in cell lines with a higher prevalence of conformation 2 (LNCaP, P  = 0.005; MCF-7, P  = 0.02) than in those with a lower prevalence (LS174T, P  = 0.09) (Fig. 6g ). This result emphasizes the role of UPF1 in regulating endogenous RORC mRNA stability in a conformation-dependent manner.

Considering the NMD pathway’s role in directing proteins translated from aberrant mRNA to proteasomal degradation 42 , we reasoned that the RORC RNA switch might similarly target its gene product. To test this, we treated reporter cells with the proteasome inhibitors carfilzomib and bortezomib, each acting through different mechanisms. Proteasome inhibition resulted in a significantly greater increase in eGFP expression in cells expressing the RORC switch compared with the control (Fig. 6h,i ), indicating that NMD-induced proteasomal degradation of the switch-containing gene product contributes to the observed effect on gene expression.

We propose that UPF1 preferentially recognizes switch conformation 2 over conformation 1, and that the recruitment of the SURF complex by UPF1 consequently leads to decreased gene expression through proteasome-mediated degradation of translation products and mRNA decay, preventing repeated rounds of translation (Fig. 7b ). Moreover, sequence mutations that influence the conformational equilibrium not only alter the RNA’s energy landscape but also modulate SURF recruitment and RNA stability, reflecting the nuanced control of gene repression by the switch. The mechanisms underlying the switching between conformations, however, remain an area for further investigation.

figure 7

a , Schematic diagram of a shallow energy landscape for the RORC 3ʹ mRNA element. Shallow global minima characterizing the conformation 1 (cryo-EM Class A) and conformation 2 (cryo-EM Class B) structures themselves comprise multiple local minima in which various secondary structure elements fold or unfold while preserving overall tertiary structure and biological activity. These local minima are illustrated by secondary structure models for various DRACO cluster members. The two global minima are separated by a kinetic barrier that represents a partially folded intermediate (cryo-EM Class C). The two dashed lines indicate alterations to the global landscape exhibited by the mutant sequences, blue for the 77-GA mutant and red for the 117-AC mutant. These altered landscapes eliminate one of the global minima without disrupting the intermediate. b , Proposed mechanism of the RORC RNA switch. The RNA switch exists in an ensemble of two states. One of them is recognized by the SURF complex; such recognition triggers mRNA degradation (likely to be mediated by SMG5) and protein degradation (mediated by the proteasome), thus affecting gene expression.

Collectively, we show that the RORC RNA switch influences gene expression through conformation-specific engagement of NMD factors that lead to control of mRNA and protein stability. Importantly, the RORC switch is only one example out of 245 functionally validated human RNA switches identified in this work, emphasizing the power of our SwitchSeeker approach to illuminate new areas of eukaryotic RNA biology.

Historically, RNA switches were identified primarily through biochemical experimentation, measuring direct ligand interactions 43 , 44 , and comparative genomics to identify conserved noncoding regions that act as cis-regulatory elements in bacteria 45 , 46 . These methods, however, present challenges in eukaryotic contexts due to the dynamic nature of mRNA structures and the complexity of eukaryotic gene regulation 22 , 24 . Additionally, the vast genomic landscape and low sequence conservation in eukaryotes complicate the direct application of these approaches 47 , 48 , 49 . While numerous tools and algorithms exist for riboswitch prediction (reviewed in refs. 50 , 51 ), few of those focus on de novo discovery that is family-agnostic. The exceptions include SwiSpot 10 , which focuses on identifying the putative switching sequence, and the conditional probability-based method 52 . None of these algorithms has been shown to predict functional RNA switches from novel families in eukaryotic genomes. Addressing these challenges, SwitchSeeker integrates biochemistry, systems biology and functional genomics to create a comprehensive platform for RNA switch discovery and characterization in eukaryotes. By covering the entire discovery process, from de novo predictions to the annotation of mechanisms, SwitchSeeker overcomes the limitations of existing methods. Looking forward, its capability to scale across complete transcriptomes sets the stage for a thorough characterization of RNA switches across diverse cell types and organisms, enhancing our understanding of their roles across the tree of life.

Advancements in genomic technologies such as RNA secondary structure probing (DMS-seq, SHAPE-seq) and single-particle cryo-EM have been instrumental in our systematic exploration of RNA switches, enabling us to delve into the diverse conformations of RNA molecules and their three-dimensional structures despite challenges such as size and flexibility 28 , 29 , 53 . This has opened up opportunities to study the functional differences between alternative RNA conformations and their role in gene expression control. Our DMS-MaPseq and cryo-EM data suggest that the RORC 3ʹ mRNA element inhabits a shallow energy landscape with two rugged minima linked to two major molecular conformations (Fig. 7a ), thereby validating the SwitchSeeker approach to identifying RNA molecules with bi-stable energy landscapes. Genome-wide CRISPRi screens identified the EJC-independent NMD pathway as a key mediator of the gene regulatory mechanism of the RORC switch. Together, our studies of the RORC switch not only uncover new regulatory biology but also provide a blueprint on how the SwitchSeeker pipeline can enable rapid functional and mechanistic characterization of new RNA switches.

RNA structure is known to influence gene expression in health and disease 35 , as shown by our recent identification of specific RNA structures that influence splicing in metastatic cancers 54 . However, dynamic RNA structures such as RNA switches are a relatively unexplored aspect of gene expression control in eukaryotes. Our observations indicate a prevalence of RNA switches in the human transcriptome, suggesting that RNA conformation-dependent gene regulation is a widespread phenomenon. In our study we chose stringent criteria for selecting RNA switches, requiring them to be bi-stable in vivo, meaning that they populate two mutually exclusive structural conformations. However, it is important to note that not all RNA switches may conform to this binary model; some, such as the HIV-1 TAR RNA, have transient but functional conformations 55 , and others might present multistability, adding layers to regulatory control. Modifications to the SwitchSeeker platform will be necessary to explore these distinct classes of RNA structural elements.

While SwitchSeeker offers a robust framework for identifying functional RNA structural switches, there are several caveats and limitations to consider. First, identifying RNA switches that operate under specific cellular conditions requires structure probing assays to be conducted in those exact conditions, which can be challenging and resource intensive. Additionally, SwitchSeeker does not identify ligands for RNA switches; this necessitates complementary approaches to uncover the specific molecules interacting with these RNA elements. Future technological advancements could significantly enhance the tool’s efficacy. Currently, the absence of high-quality RNA structure datasets across full transcriptomes limits the comprehensive application of SwitchSeeker. The development of such datasets would enable more efficient and accurate RNA switch identification. Moreover, integrating additional functional assays, such as those targeting RNA switches that influence splicing, could broaden the scope and impact of SwitchSeeker.

The known examples of human RNA switch mechanisms include mutually exclusive binding of RNA-binding proteins by two different RNA conformations 8 and m6A modification-based switching 7 . In this study, we introduce a novel switch mechanism that operates via the NMD pathway, suggesting a vast potential for diverse metabolic pathways in RNA switch functionality. SwitchSeeker’s utility lies in its ability to identify and elucidate these mechanisms in high throughput, irrespective of their specific pathways. The modulation of gene expression through shifts in RNA conformation, as achieved with ASOs in this study, opens new possibilities for targeting RNA switches in future therapeutics. SwitchSeeker is available for use and adaptation, and we hope that it will pave the way for many new discoveries in RNA-based regulation in eukaryotes.

SwitchFinder: detailed description of the algorithm

Conflicting base pairs identification

Conflicting base pairs were detected using a modification of the MIBP algorithm developed by L. Lin and W. McKerrow 59 . First, a large number of folds (default N = 1,000) is sampled from the Boltzmann distribution. If structure probing data (such as DMS-seq or SHAPE-seq) are provided, the Boltzmann distribution modeling software (part of the RNAstructure package 56 ) incorporates the data as a pseudofree energy change term. Next, the base pairs are filtered: those that are present in almost all of the folds or absent from almost all of the folds are removed from further analysis. Mutual information is then estimated for every pair of the remaining base pairs. To do so, each base pair is represented as a binary vector of length N, in which the entry for a given fold is 1 if the base pair is present in that fold and 0 otherwise. Mutual information between two base pairs is calculated as in ref. 60 . This results in an M × M table of mutual information values, where M is the number of base pairs considered. The sum of each row of this square table is then calculated, giving a vector K of length M in which each base pair is represented by the sum of its mutual information values with all of the other base pairs. Finally, only the base pairs for which this sum passes the threshold of U × MAX(K) are retained, where U is a parameter (default value 0.5). We call the base pairs that pass this threshold the ‘conflicting base pairs’.
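
The steps above can be illustrated with a short Python sketch that computes the vector K from a set of sampled folds and applies the U × MAX(K) cutoff. This is a simplified reading of the procedure and not the SwitchFinder source code. The function names, the frequency cutoffs min_freq and max_freq (the text only says 'almost all'), the base-2 logarithm and the use of 'greater than or equal to' for 'passes the threshold' are all assumptions made for the example. Folds are assumed to be supplied as sets of (i, j) base pairs.

```python
import math
from itertools import combinations


def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length binary vectors."""
    n = len(x)
    mi = 0.0
    for a in (0, 1):
        p_a = sum(1 for xi in x if xi == a) / n
        for b in (0, 1):
            p_b = sum(1 for yi in y if yi == b) / n
            p_ab = sum(1 for xi, yi in zip(x, y) if xi == a and yi == b) / n
            if p_ab > 0:  # zero joint probability contributes nothing
                mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


def conflicting_base_pairs(folds, u=0.5, min_freq=0.05, max_freq=0.95):
    """Return {base_pair: K} for base pairs whose K passes u * max(K).

    folds: list of sets of (i, j) base pairs, one set per sampled fold.
    min_freq / max_freq: assumed cutoffs for 'absent from / present in
    almost all folds'; they are not specified in the text.
    """
    n = len(folds)
    all_pairs = sorted(set().union(*folds))
    # Binary presence vector of length N for every base pair.
    vectors = {bp: [1 if bp in fold else 0 for fold in folds] for bp in all_pairs}
    # Drop base pairs that are nearly always present or nearly always absent.
    kept = [bp for bp in all_pairs if min_freq <= sum(vectors[bp]) / n <= max_freq]
    # K: row sums of the M x M mutual information table (diagonal excluded).
    k = {bp: 0.0 for bp in kept}
    for p, q in combinations(kept, 2):
        mi = mutual_information(vectors[p], vectors[q])
        k[p] += mi
        k[q] += mi
    if not k:
        return {}
    cutoff = u * max(k.values())  # assumption: 'passes' means >= cutoff
    return {bp: score for bp, score in k.items() if score >= cutoff}
```

In this sketch a base pair that appears in every sampled fold carries no information about which conformation a fold belongs to and is removed before the mutual information table is built, which mirrors the filtering step in the text.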

Conflicting stems identification

Once the conflicting base pairs are identified, they are assembled into conflicting stems, or series of conflicting base pairs that directly follow each other and therefore could potentially form a stem-like RNA structure. More specifically, the base pairs (a, b) and (c, d) form a stem if either (a == c − 1) and (b == d + 1), or (a == c + 1) and (b == d − 1). The stem is defined as a pair of intervals ((u, v), (x, y)), where v − u == y − x. Then, the conflicting stems are filtered by length: only the stems that are longer than a certain threshold value (default value: 3) are considered. Among these stems, the stems that directly conflict with each other are identified. Two stems ((u 1 , v 1 ), (x 1 , y 1 )) and ((u 2 , v 2 ), (x 2 , y 2 )) conflict with each other if there is an overlap longer than a threshold value between either (u 1 , v 1 ) and (u 2 , v 2 ), or (u 1 , v 1 ) and (x 2 , y 2 ), or (x 1 , y 1 ) and (u 2 , v 2 ), or (x 1 , y 1 ) and (x 2 , y 2 ). The default threshold value is 3. The pairs of conflicting stems are sorted by the average value of their K values (sums of mutual information). The highest scoring pair of conflicting stems is considered the winning prediction, representing the major switch between two of the local minima present in the energy folding landscape of the given sequence. If no pairs of conflicting stems pass the threshold, SwitchFinder reports that no potential switch is identified for the given sequence.
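A short sketch of the stem assembly and conflict test described above. It assumes base pairs are (i, j) tuples with i < j, and all function names are hypothetical. Scoring a stem by the mean K of its base pairs is an assumption, since the text only states that pairs of stems are sorted by the average of their K values.

```python
from itertools import combinations


def assemble_stems(base_pairs, min_length=3):
    """Group stacked pairs (i, j), (i+1, j-1), ... into stems ((u, v), (x, y)).

    Only stems with more than `min_length` base pairs are returned.
    """
    pairs = set(base_pairs)
    used = set()
    stems = []
    for i, j in sorted(pairs):  # the outermost pair of each stem is visited first
        if (i, j) in used:
            continue
        length = 0
        while (i + length, j - length) in pairs:
            used.add((i + length, j - length))
            length += 1
        if length > min_length:
            stems.append(((i, i + length - 1), (j - length + 1, j)))
    return stems


def overlap(a, b):
    """Number of positions shared by two closed intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def stems_conflict(s1, s2, min_overlap=3):
    """True if any strand of s1 overlaps any strand of s2 by more than min_overlap."""
    return any(overlap(a, b) > min_overlap for a in s1 for b in s2)


def stem_score(stem, k):
    """Mean K over the base pairs of a stem (assumed aggregation)."""
    (u, v), (x, y) = stem
    values = [k.get((u + t, y - t), 0.0) for t in range(v - u + 1)]
    return sum(values) / len(values)


def best_conflicting_pair(stems, k, min_overlap=3):
    """Highest-scoring pair of conflicting stems, or None if there is none."""
    pairs = [(s1, s2) for s1, s2 in combinations(stems, 2)
             if stems_conflict(s1, s2, min_overlap)]
    if not pairs:
        return None
    return max(pairs, key=lambda p: (stem_score(p[0], k) + stem_score(p[1], k)) / 2)
```

Returning `None` mirrors the case in which no potential switch is reported for the sequence.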

Identifying the two conflicting structures

Given the prediction of the two conflicting stems, the folds that represent the two local minima of the energy folding landscape are predicted. Importantly, SwitchFinder focuses on optimizing the prediction accuracy, as opposed to the commonly used approach of energy minimization 61 . The MaxExpect program from the RNAstructure package 56 is used; the base pairings of each of the conflicting stems are provided as folding constraints (in Connectivity Table format). Hereafter, the two predicted structures are referred to as conformations 1 and 2.

Activation barrier estimation

The RNApathfinder software 62 is used to estimate the activation energy needed for a transition between the conformations 1 and 2.

Classifier for prediction of RNA switches

The curated representative alignments for each of the 50 known riboswitch families were downloaded from the Rfam database 9 . Each sequence is complemented by its shuffled counterpart (while preserving dinucleotide frequencies 63 ). For all of the sequences, the two conflicting conformations, their folding energies and their activation energies are predicted as above. To estimate the performance of SwitchFinder for a given riboswitch family, all of the sequences from this family are placed into the test set, while all of the sequences from the other families are placed into the training set. Then, a linear regression model is trained on the training set, in which the response variable is binary and indicates whether the sequence is a real riboswitch or is a shuffled counterpart, and the predictor variables are the average folding energy of the two conformations and the activation energy of the transition between them. The trained linear regression model is then run on the test set, and its performance is estimated using the receiver operating characteristic curve.
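The leave-one-family-out evaluation described above can be illustrated with a small, dependency-free sketch. The record layout, the function names and the use of normal equations for the least-squares fit are assumptions made for illustration; the folding and activation energies would in practice come from the prediction steps described earlier.

```python
def fit_linear(X, y):
    """Ordinary least squares with intercept, solved via the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(p):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [a - f * b for a, b in zip(m[r], m[c])]
    return [m[i][p] / m[i][i] for i in range(p)]


def predict(coef, x):
    """Model output for one feature vector."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def leave_one_family_out(records):
    """records: (family, (mean_folding_energy, activation_energy), label) tuples.

    label is 1 for a real riboswitch and 0 for its shuffled counterpart.
    Returns {family: AUC} with each family held out in turn.
    """
    results = {}
    for family in sorted({r[0] for r in records}):
        train = [r for r in records if r[0] != family]
        test = [r for r in records if r[0] == family]
        coef = fit_linear([r[1] for r in train], [r[2] for r in train])
        scores = [predict(coef, r[1]) for r in test]
        results[family] = auc(scores, [r[2] for r in test])
    return results
```

Holding out a whole family at a time, rather than random sequences, keeps close homologs of the test sequences out of the training set.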

Prediction of RNA switches in human transcriptome

The coordinates of 3ʹUTRs of the human transcriptome were downloaded from UCSC Table Browser 64 , table tb_wgEncodeGencodeBasicV28lift37. The sequences of 3ʹUTRs were cut into overlapping fragments of 186 nucleotides in length (with overlaps of 93 nucleotides). For all of the sequences, the two conflicting conformations, their folding energies and their activation energies were predicted as above. A linear regression model was trained as described above on all 50 known riboswitch families. The model was applied to the 3ʹUTR fragments from the human genome, and the fragments were sorted according to the model prediction scores. The top 3,750 predictions were selected for further investigation.
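The tiling of 3ʹUTRs into overlapping fragments can be sketched as below. The function name is hypothetical, and two behaviors are assumptions because the text does not specify them: a trailing remainder shorter than one full window is dropped, and a sequence shorter than the window is returned whole.

```python
def tile_sequence(seq, length=186, overlap=93):
    """Cut a sequence into (start, fragment) tuples of `length` nt.

    Consecutive fragments share `overlap` nucleotides.
    """
    step = length - overlap
    if len(seq) <= length:
        return [(0, seq)]
    return [(start, seq[start:start + length])
            for start in range(0, len(seq) - length + 1, step)]
```

With the default values each nucleotide away from the sequence ends is covered by two fragments, so a candidate switch that straddles one window boundary lies fully inside the neighboring window.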

Incorporation of in vivo probing data

In vivo probing data, such as DMS-MaPseq, is used to apply pseudoenergy restraints when sampling folds from the Boltzmann distribution (that is, using the –SHAPE parameter in RNAstructure package commands 56 ). To test the hypothesis of whether the in vivo probing data support the presence of two conflicting conformations in a given sequence, the following workflow was used. First, the two conflicting folds were predicted with SwitchFinder using in silico folding only. Then, SwitchFinder was run on the same sequence with the inclusion of in vivo probing data. If the same two conflicting folds were predicted among the top conflicting folds, the probing data were considered supportive of the presence of the two predicted conformations.

Mutation generation

To shift the RNA conformation ensemble towards one or another state, mutations of two types were introduced.

‘Strengthen a stem’ mutations: given two conflicting stems ((u 1 , v 1 ), (x 1 , y 1 )) and ((u 2 , v 2 ), (x 2 , y 2 )), one of the stems (for example, the first one) was changed in a way that would preserve its base pairing but deny the possibility of forming the second stem. To do so, the nucleotides in the interval (u 1 , v 1 ) were replaced with all possible sequences of equal length, and the nucleotides (x 1 , y 1 ) were replaced with the reverse complement sequence. Then, the newly generated sequences were filtered by two predetermined criteria: (i) the second stem cannot form more than a fraction of its original base pairs (default value 0.6), and (ii) the modified first stem cannot form long paired stems with any region of the existing sequence (default threshold length 4). The sequences that passed both criteria were ranked by the introduced change in the sequence nucleotide composition; the mutations that changed the nucleotide composition the least were chosen for further analysis. Each mutated sequence was additionally analyzed by SwitchFinder to ensure that the Boltzmann distribution is heavily shifted towards the desired conformation.
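The enumeration step of the ‘strengthen a stem’ design can be sketched as follows. This is an illustration only: coordinates are assumed to be 0-based and inclusive, the function names are hypothetical, the composition change is measured here as the summed absolute difference in nucleotide counts (one possible reading of the ranking criterion), and the two folding-based filters (i) and (ii) are omitted because they require a structure prediction step.

```python
from itertools import product

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Reverse complement of an RNA sequence (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[n] for n in reversed(seq))


def composition_change(a, b):
    """Summed absolute difference in A/C/G/U counts between two sequences."""
    return sum(abs(a.count(n) - b.count(n)) for n in "ACGU")


def strengthen_stem_candidates(seq, stem):
    """Enumerate mutants in which both strands of `stem` are replaced.

    `stem` is ((u, v), (x, y)) with v < x. The 5' strand takes every possible
    sequence of the same length and the 3' strand its reverse complement, so
    base pairing within the stem is preserved. Returns (change, mutant) tuples
    sorted by increasing composition change.
    """
    (u, v), (x, y) = stem
    candidates = []
    for letters in product("ACGU", repeat=v - u + 1):
        left = "".join(letters)
        right = reverse_complement(left)
        mutant = seq[:u] + left + seq[v + 1:x] + right + seq[y + 1:]
        if mutant != seq:
            candidates.append((composition_change(seq, mutant), mutant))
    return sorted(candidates)
```

The number of candidates grows as 4 to the power of the stem length, so exhaustive enumeration of this kind is only practical for short stems.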

‘Weaken a stem’ mutations: given two conflicting stems ((u 1 , v 1 ), (x 1 , y 1 )) and ((u 2 , v 2 ), (x 2 , y 2 )), one of the stems (for example, the second one) was changed in such a way that this stem would not be able to form base pairing, while the base pairing of the other stem (in this example, the first stem) would be preserved. To do so, the nucleotides in either of the intervals (u 2 , v 2 ) or (x 2 , y 2 ) were replaced with all possible sequences of equal length. The newly generated sequences were filtered by three predetermined criteria: (i) the first stem stays unchanged, (ii) the second stem cannot form more than a fraction of its original base pairs (default value 0.6), and (iii) the modified part of the sequence cannot form long paired stems with any region of the existing sequence (default threshold length 4). The sequences that passed all of the criteria were ranked by the introduced change in the sequence nucleotide composition: the mutations that changed the nucleotide composition the least were chosen for further analysis. Each mutated sequence was additionally analyzed using SwitchFinder to ensure that the Boltzmann distribution is heavily shifted towards the desired conformation.

Cell culture

All cells were cultured in a 37 °C 5% CO 2 humidified incubator. The HEK293 cells (purchased from ATCC, cat. no. CRL-3216) were cultured in DMEM high-glucose medium supplemented with 10% FBS, l -glutamine (4 mM), sodium pyruvate (1 mM), penicillin (100 units ml −1 ), streptomycin (100 μg ml −1 ) and amphotericin B (1 μg ml −1 ) (Gibco). The Jurkat cell line (purchased from ATCC, cat. no. TIB-152) was cultured in RPMI-1640 medium supplemented with 10% FBS, glucose (2 g l −1 ), l -glutamine (2 mM), 25 mM HEPES, penicillin (100 units ml −1 ), streptomycin (100 μg ml −1 ) and amphotericin B (1 μg ml −1 ) (Gibco). All cell lines were routinely screened for mycoplasma with a PCR-based assay.

Cryo-electron microscopy

Sample preparation and data collection.

A total of 3.5 µl target mRNA at an approximate concentration of 1.5 mg ml −1 was applied to gold, 300 mesh transmission electron microscopy grids with a holey carbon substrate of 1.2 µm and 1.3 µm spacing (Quantifoil). The grids were blotted with no. 4 filter papers (Whatman) and plunge frozen in liquid ethane using a Mark IV Vitrobot (Thermo Fisher), with blot times of 4–6 s, blot force of −2, at a temperature of 8 °C and 100% humidity. All grids were glow discharged in an easiGlo (Pelco) with rarefied air for 30 s at 15 mA, no more than 1 h prior to preparation. Duplicate wild-type and mutant RNA specimens were imaged under different conditions on several microscopes as per Data File S 8 ; all were equipped with K3 direct electron detector (DED) cameras (Gatan), and all data collection was performed using SerialEM 65 . Detailed data collection parameters are listed in Data File S 8 .

Image processing

Dose-weighted and motion-corrected sums were generated from raw DED movies during data collection using University of California, San Francisco (UCSF) MotionCor2 66 . Images from super-resolution datasets were downsampled to the physical pixel size before further processing. Estimation of the contrast transfer function (CTF) was performed in CTFFIND4 67 , followed by neural net-based particle picking in EMAN2 68 . Two-dimensional (2D) classification, ab initio three-dimensional (3D) classification, and gold-standard refinement were done in cryoSPARC 69 . CTFs were then re-estimated in cryoSPARC and particles repicked using low-resolution (20 Å) templates generated from chosen 3D classes. Extended datasets were pooled when appropriate, and particle processing was repeated through gold-standard refinement as before. All structure figures were created using UCSF ChimeraX (ref. 70 ). Further details are given in Data File S 7 and Extended Data Fig. 5 .

Reporter vector design and library cloning

First, mCherry-P2A-Puro fusion was cloned into the BTV arbovirus backbone (Addgene, cat. no. 84771). Then, the vector was digested with MluI-HF and PacI restriction enzymes (NEB), with the addition of Shrimp Alkaline Phosphatase (NEB). The digested vector was purified with the Zymo DNA Clean and Concentrator-5 kit.

DNA oligonucleotide libraries (one for the functional screen and one for massively parallel mutagenesis analysis) consisting of 7,500 sequences in total were synthesized by Agilent. The second strand was synthesized using Klenow Fragment (3ʹ → 5ʹ exo-) (NEB). The double-stranded DNA library was digested with MluI-HF and PacI restriction enzymes (NEB) and run on a 6% TBE (Tris base, boric acid, EDTA) polyacrylamide gel. The band of the corresponding size was cut out and the gel was dissolved in the DNA extraction buffer (10 mM Tris, pH 8, 300 mM NaCl, 1 mM EDTA). The DNA was precipitated with isopropanol. The digested DNA library and the digested vector were ligated with T4 DNA ligase (NEB). The ligation reaction was precipitated with isopropanol and transformed into MegaX DH10B T1R electrocompetent cells (Thermo Fisher). The library was purified with ZymoPURE II Plasmid Maxiprep Kit (Zymo). The representation of individual sequences in the library was verified by sequencing the resulting library on a MiSeq instrument (Illumina).

Massively parallel reporter assay

The DNA library was co-transfected with pCMV-dR8.91 and pMD2.G plasmids using TransIT-Lenti (Mirus) into HEK293 cells, following the manufacturer’s protocol. Virus was collected 48 h after transfection and passed through a 0.45 µm filter. HEK293 cells were then transduced overnight with the filtered virus in the presence of 8 µg ml −1 polybrene (Millipore); the amount of virus used was optimized to ensure an infection rate of ~20%, as determined by flow cytometry. The infected cells were selected with 2 µg ml −1 puromycin (Gibco). Cells were collected at 90–95% confluency for sorting and analysis on a BD FACSAria II sorter. The distribution of mCherry : GFP ratios was calculated. For sorting a library into subpopulations, we gated the population into eight bins each containing 12.5% of the total number of cells. A total of 1.2 million cells were collected for each bin to ensure sufficient representation of sequences in the population in two replicates each. For each subpopulation, we extracted genomic DNA and total RNA with the Quick-DNA/RNA Miniprep kit. gDNA was amplified by PCR with Phusion polymerase (NEB) using the primers CAAGCAGAAGACGGCATACGAGAT–i7–GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCACTGCTAGCTAGATGACTAAACGCG and AATGATACGGCGACCACCGAGATCTACAC–i5–ACACTCTTTCCCTACACGACGCTCTTCCGATCTGTGGTCTGGATCCACCGGTCC. Different i7 indexes were used for eight different bins, and different i5 indexes were used for the two replicates. RNA was reverse transcribed with Maxima H Minus Reverse Transcriptase (Thermo Fisher) using primer CTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNNNNNTGGTCTGGATCCACCGGTCCGG. The complementary DNA was amplified with Q5 polymerase (NEB) using primers CAAGCAGAAGACGGCATACGAGAT–i7–GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCCTGCTAGCTAGATGACTAAACGC and CAAGCAGAAGACGGCATACGAGAT–i5–GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTACCCGTCATTGGCTGTCCA. Different i7 indexes were used for eight different bins, and different i5 indexes were used for the two replicates.
The amplified DNA libraries were size purified with the Select-a-Size DNA Clean and Concentrator MagBead Kit (Zymo). Deep sequencing was performed using the HiSeq4000 platform (Illumina) at the UCSF Center for Advanced Technologies.

The adapter sequences were removed using cutadapt 71 . For RNA libraries, the unique molecular identifier (UMI) was then removed from the reads and appended to read names using UMI tools 72 . The reads were matched to the fragments using the bwa mem command. The reads were counted using featureCounts 73 . The read counts were normalized using median of ratios normalization 74 . For each fragment, a one-way chi-squared test was used to estimate how far its distribution across the sorting bins departs from the null hypothesis (that is, a uniform distribution). mRNA stability was estimated by comparing the RNA and DNA read counts with MPRAnalyze 75 .
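As an illustration of the test against a uniform distribution over sorting bins, the chi-squared statistic can be computed as below. The function name is hypothetical and the sketch returns only the statistic; a P value would be obtained from the chi-squared distribution with (number of bins − 1) degrees of freedom, for example with `scipy.stats.chisquare`, which assumes equal expected frequencies by default.

```python
def chi_squared_uniform(counts):
    """One-way chi-squared statistic of bin counts against a uniform expectation."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)
```

A fragment spread evenly over the eight bins gives a statistic of 0, whereas a fragment concentrated in a single bin gives a large value.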

Massively parallel mutagenesis analysis

Library design and measurement.

For each candidate switch, two alternative conformations were identified using SwitchFinder. Each conformation is defined by a stem structure: ((u1, v1), (x1, y1)) and ((u2, v2), (x2, y2)), representing two conflicting stems. The SwitchFinder mutation generation algorithm was used to design four mutations in the candidate switch sequence: A, ‘strengthen a stem’ mutation favoring conformation 1: the regions (u1, v1) and (x1, y1) are altered while preserving complementarity; B, ‘weaken a stem’ mutation favoring conformation 1: either the region (u2, v2) or (x2, y2) is modified, preserving the regions (u1, v1), (x1, y1); C, ‘strengthen a stem’ mutation favoring conformation 2: the regions (u2, v2), (x2, y2) are changed while maintaining complementarity; and D, ‘weaken a stem’ mutation favoring conformation 2: either the region (u1, v1) or (x1, y1) is altered, ensuring that the regions (u2, v2), (x2, y2) remain intact.

Subsequently, the mutated sequences for the selected candidate RNA switches, along with the reference sequences, were pooled into a single DNA oligonucleotide library. The impact of each sequence on reporter gene expression was evaluated in cells, as outlined in the Massively Parallel Reporter Assay section. Consequently, each candidate RNA switch in the library is represented by its reference sequence, two mutated sequences favoring conformation 1 (A and B), and two mutated sequences favoring conformation 2 (C and D).

Candidate RNA switch ranking

For each candidate RNA switch, its effect on reporter gene expression was assessed in cells, following the protocol described in the Massively Parallel Reporter Assay section. This resulted in 16 measurements, corresponding to normalized read counts in sorting bins 1 (lowest expression) to bin 8 (highest expression), across two replicates; these arrays of counts are referred to as ‘bin_counts’. Measurements were obtained for mutants A, B, C, D, and the reference sequence. Correlations between the effects of mutations designed to favor the same or opposite conformations were computed as follows: correlation_same_1 = Pearsonr(bin_counts(mutant A), bin_counts(mutant B)); correlation_same_2 = Pearsonr(bin_counts(mutant C), bin_counts(mutant D)); correlation_opposite_1 = Pearsonr(bin_counts(mutant A), bin_counts(mutant C)); and correlation_opposite_2 = Pearsonr(bin_counts(mutant A), bin_counts(mutant D)). The score of each candidate switch was then calculated as: score = mean(correlation_same_1, correlation_same_2) − mean(correlation_opposite_1, correlation_opposite_2). Candidate switches were ranked based on this score. Those with a score exceeding the mean + 1 s.d. were considered significant.
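The scoring scheme above translates directly into code. In this sketch `bin_counts` is assumed to be a dictionary keyed by mutant name, the function names are hypothetical, and the sample standard deviation is used for the significance cut-off because the text does not state which estimator was applied.

```python
import math
from statistics import mean, stdev


def pearsonr(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def switch_score(bin_counts):
    """Score one candidate from the bin counts of mutants 'A', 'B', 'C' and 'D'."""
    same = mean([pearsonr(bin_counts["A"], bin_counts["B"]),
                 pearsonr(bin_counts["C"], bin_counts["D"])])
    opposite = mean([pearsonr(bin_counts["A"], bin_counts["C"]),
                     pearsonr(bin_counts["A"], bin_counts["D"])])
    return same - opposite


def significant(scores):
    """Names of candidates whose score exceeds mean + 1 s.d. of all scores."""
    values = list(scores.values())
    cutoff = mean(values) + stdev(values)
    return [name for name, s in scores.items() if s > cutoff]
```

The score is largest, with a maximum of 2, when mutations favoring the same conformation shift the bin distribution in the same way and mutations favoring opposite conformations shift it in opposite ways.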

DMS-MaPseq was performed as described in ref. 54 . In brief, cells were incubated in culture with 1.5% DMS (Sigma) at room temperature for 7 min, the media was removed, and DMS was quenched with 30% BME (β-mercaptoethanol). Total RNA from DMS-treated cells and untreated cells was then isolated using Trizol (Invitrogen). RNA was reverse transcribed using TGIRT-III reverse transcriptase (InGex) and target-specific primers. PCR was then performed to amplify the desired sequences and to add Illumina-compatible adapters. The libraries were then sequenced on a HiSeq4000 instrument (Illumina).

Pear (v0.9.6) was used to merge the paired reads into a single combined read. The UMI was then removed from the reads and appended to read names using UMI tools (v1.0). The reads were then reverse complemented (fastx toolkit) and mapped to the amplicon sequences using bwa mem (v0.7). The resulting bam files were then sorted and deduplicated (umi_tools, with method flag set to unique). The alignments were then parsed for mutations using the CTK (CLIP Tool Kit) software. The mutation frequency at every position was then reported. The signal normalization was performed using boxplot normalization 76 . The top 10% of positions with the highest mutation rates were considered outliers 77 . The clustering of DMS-MaPseq signal was performed with DRACO 28 .

SHAPE chemical probing of RNAs

Chemical probing and mutate-and-map experiments were carried out as described previously 78 . In brief, 1.2 pmol RNA was denatured at 95 °C in 50 mM Na-HEPES, pH 8.0, for 3 min, and folded by cooling to room temperature over 20 min, and then adding MgCl 2 to a 10 mM concentration. RNA was aliquoted in 15 µl volumes into a 96-well plate and mixed with nuclease-free H 2 O (control), or chemically modified in the presence of 5 mM 1-methyl-7-nitroisatoic anhydride (1M7) 79 , for 10 min at room temperature. Chemical modification was stopped by adding 9.75 µl quench and purification mix (1.53 M NaCl, 1.5 µl washed oligo-dT beads, Ambion), 6.4 nM FAM-labeled, reverse-transcriptase primer (/56-FAM/AAAAAAAAAAAAAAAAAAAAGTTGTTCTTGTTGTTTCTTT), and 2.55 M Na-MES. RNA in each well was purified by bead immobilization on a magnetic rack and two washes with 100 µl 70% ethanol. RNA was then resuspended in 2.5 µl nuclease-free water prior to reverse transcription.

RNA was reverse transcribed from annealed fluorescent primer in a reaction containing 1× First Strand Buffer (Thermo Fisher), 5 mM dithiothreitol, 0.8 mM dNTP mix and 20 U SuperScript III Reverse Transcriptase (Thermo Fisher) at 48 °C for 30 min. RNA was hydrolyzed in the presence of 200 mM NaOH at 95 °C for 3 min, then placed on ice for 3 min and quenched with 1 volume 5 M NaCl, 1 volume 2 M HCl, and 1 volume 3 M sodium acetate. cDNA was purified on magnetic beads, then eluted by incubation for 20 min in 11 µl Formamide-ROX350 mix (1,000 µl Hi-Di Formamide (Thermo Fisher) and 8 µl ROX350 ladder (Thermo Fisher)). Samples were then transferred to a 96-well plate in ‘concentrated’ form (4 µl sample + 11 µl ROX mix) and ‘dilute’ form (1 µl sample + 14 µl ROX mix) for saturation correction in downstream analysis. Sample plates were sent to Elim Biopharmaceuticals for analysis by capillary electrophoresis.

Antisense oligonucleotide transfection

ASOs were purchased from Integrated DNA Technologies; the Morpholino ASOs were purchased from Gene Tools LLC (see sequences in Data File S 9 ). A total of 95,000 HEK cells were seeded into the wells of a 24-well cell culture-treated plate in a total volume of 500 µl. At 24 h later, either 1 nmol Morpholino ASO together with 3 µl EndoPorter reagent (Gene Tools LLC), or 6 pmol other ASO were added to each well. LNCaP, MCF-7 and LS174T cells were transfected with ASOs using Lonza SE Cell Line 4D-Nucleofector X Kit S (cat. no. V4XC-1032) according to the manufacturer’s protocol. At 48 h later, the mCherry and eGFP fluorescence was measured on a BD FACSCelesta Cell Analyzer, or RNA was isolated for RT-qPCR measurement with the Zymo QuickRNA Microprep isolation kit with in-column DNase treatment per the manufacturer’s protocol.

CRISPRi screen

Reporter screens were conducted using established flow cytometry screen protocols 80 (Horlbeck et al., 2016; Sidrauski et al., 2015). Jurkat cells with previously verified CRISPRi activity were used (Horlbeck et al., 2018). The CRISPRi-v2 (5 sgRNA/TSS, Addgene cat. no. 83969) sgRNA library was transduced into Jurkat cells at a multiplicity of infection of <0.3 (the percentage of blue fluorescent protein (BFP)-positive cells was ~30%). For the flow-based CRISPRi screen with the Jurkat cells, the sgRNA library virus was transfected at an average of 500-fold coverage after transduction (day 0). Puromycin (1 µg ml −1 ) selection for positively transduced cells was performed at 48 h (day 2) and 72 h (day 3) after transduction. On day 11, cells were collected in PBS and sorted with the BD FACSAria Fusion cell sorter. Cells were gated into the 25% of cells with the highest GFP : mCherry fluorescence intensity ratio, and the 25% of cells with the lowest ratio. The screens were performed with two conditions: cells with a reference RORC element–GFP reporter and a mutated 77-23 RORC element–GFP reporter. Screens were additionally performed in duplicate. After sorting, genomic DNA was collected (Macherey-Nagel Midi Prep kit) and amplified using NEB Next Ultra II Q5 master mix and primers containing TruSeq Indexes for next-generation sequencing. Sample libraries were prepared and sequenced on a HiSeq 4000. Guides were then quantified with the published ScreenProcessing (https://github.com/mhorlbeck/ScreenProcessing) method and phenotypes generated with an in-house processing pipeline, iAnalyzer (https://github.com/goodarzilab/iAnalyzer). In brief, iAnalyzer relies on fitting a generalized linear model to each gene. Coefficients from this generalized linear model were z-score normalized to the negative control guides and finally the largest coefficients were analyzed as potential hits.
For the comparison of gene phenotypes between the two cell lines, the DESeq2 ratio of ratios test was used 57 .
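The normalization of model coefficients to the negative control guides can be pictured with the sketch below. It is not taken from iAnalyzer: the function name, the input layout and the use of the sample standard deviation of the control coefficients are all assumptions made for illustration.

```python
from statistics import mean, stdev


def zscore_to_controls(coefs, control_coefs):
    """Express each gene coefficient as a z-score relative to negative controls.

    coefs: {gene: coefficient}; control_coefs: coefficients of control guides.
    """
    mu = mean(control_coefs)
    sd = stdev(control_coefs)
    return {gene: (c - mu) / sd for gene, c in coefs.items()}
```

Genes with the largest absolute z-scores would then be the ones inspected as potential hits.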

CRISPRi-mediated and small interfering RNA-mediated gene knockdown

Jurkat cells expressing the dCas9–KRAB fusion protein were constructed by lentiviral delivery of pMH0006 (Addgene, cat. no. 135448) and FACS isolation of BFP-positive cells.

Guide RNA sequences for CRISPRi-mediated gene knockdown were cloned into pCRISPRia-v2 (Addgene, cat. no. 84832) via BstXI-BlpI sites. After transduction with sgRNA lentivirus, Jurkat cells were selected with 2 µg ml −1 puromycin (Gibco). The fluorescence of eGFP and of mCherry was measured on a BD FACSCelesta Cell Analyzer.

For UPF1 siRNA-mediated knockdown, the TriFECTa DsiRNA Kit from Integrated DNA Technologies (cat. no. hs.Ri.UPF1.13) was used. LNCaP, MCF-7 and LS174T cells were transfected with siRNAs using the Lonza SE Cell Line 4D-Nucleofector X Kit S (cat. no. V4XC-1032) according to the manufacturer’s protocol. At 48 h later, RNA was collected using the Zymo QuickRNA Microprep isolation kit with in-column DNase treatment as per the manufacturer’s protocol.

Reporter cell line generation

Mutated or reference sequences of RORC 3ʹUTR were cloned into the dual GFP–mCherry reporter using the MluI-HF and PacI restriction enzymes (NEB) as described above. The reporters were lentivirally delivered to HEK293 and Jurkat cells and analyzed with flow cytometry as described above.

Drug treatment

Jurkat cells were seeded at a density of 0.25 × 10⁷ cells per ml. Either the proteasome inhibitors (Carfilzomib or Bortezomib, Cayman Chemical) or negative control (dimethyl sulfoxide, DMSO) were added at the given concentration. After 24 h of incubation, the fluorescence of eGFP and of mCherry was measured on a BD FACSCelesta Cell Analyzer.

MCF-7 cells were treated either with 50 µM NMDI14 (TargetMol), or with DMSO, for 24 h. Afterwards, cells were treated with DMS as described above and the RNA was collected as described above.

mRNA stability measurements

Jurkat cells were treated with 10 μg ml −1 α-amanitin (Sigma-Aldrich, cat. no. A2263) for 8–9 h prior to total RNA extractions. Total RNA was isolated using the Zymo QuickRNA Microprep isolation kit with in-column DNase treatment as per the manufacturer’s protocol. mRNA levels were measured with RT-PCR, using 18S ribosomal RNA (transcribed by RNA Pol I) as the control.

T-cell isolation, transduction and Th17 cell differentiation

Th17 cells were derived as described previously 34 . Plates were coated with 2 µg ml −1 anti-human CD3 (UCSF monoclonal antibody core, clone: OKT-3) and 4 µg ml −1 anti-human CD28 (UCSF monoclonal antibody core, clone: 9.3) in PBS with calcium and magnesium for at least 2 h at 37 °C or overnight at 4 °C with the plate wrapped in parafilm. Human CD4+ T cells were isolated from human peripheral blood using the EasySep human CD4+ T cell isolation kit (17952; STEMCELL) and stimulated in ImmunoCult-XF T-cell expansion medium (10981; STEMCELL) supplemented with 10 mM HEPES, 2 mM l -glutamine, 100 µM 2-MOE, 1 mM sodium pyruvate and 10 ng ml −1 transforming growth factor-β. At 24 h after T-cell isolation and initial stimulation on a 96-well plate, 7 µl lentivirus was added to each sample. After 24 h, the media was removed from each sample without disturbing the cells and replaced with 200 µl fresh media. After 48 h, cells were stimulated with 1.2 µM ionomycin, 25 nM phorbol 12-myristate 13-acetate (PMA) and 6 µg ml −1 brefeldin-A, resuspended by pipetting, incubated for 4 h at 37 °C, and collected for analysis. Half of each sample was stained for CD4, FoxP3, interleukin (IL)-13, IL-17A, interferon (IFN)-γ and analyzed on a BD LSRFortessa cell analyzer (see below). The other half of the sample was not stained and was analyzed for the expression of eGFP and mCherry on a BD LSRFortessa cell analyzer.

Cultured human T cells were collected, washed and stained with antibodies against cell surface proteins and transcription factors. Cells were fixed and permeabilized with the eBioscience Foxp3/Transcription Factor Staining Buffer Set or the Transcription Factor Buffer Set (BD Biosciences). Extracellular nonspecific binding was blocked with the anti-CD16/CD32 antibody (clone 2.4G2; UCSF Monoclonal Antibody Core). Intracellular nonspecific binding was blocked with anti-CD16/CD32 antibodies and 2% normal rat serum. Dead cells were stained with Fixable Viability Dye eFluor 780 (eBioscience) or Zombie Violet Fixable Viability Kit (BioLegend). Cells were stained with the following fluorochrome-conjugated anti-human antibodies: anti-CD4 (Invitrogen, cat. no. 17-0049-42), anti-FOXP3 (eBioscience, cat. no. 25-4777-61), anti-IL-13 (eBioscience, cat. no. 11-7136-41), anti-IL-17A (eBioscience, cat. no. 12-7179-42) and anti-IFNγ (BioLegend, cat. no. 502520). All of the antibodies were used at 1:200 dilution. Samples were analyzed on a BD LSRFortessa cell analyzer. Data were analyzed using FlowJo 10.7.1 and BD FACSDiva v9 software.

Analysis of capillary electrophoresis data with HiTRACE

Capillary electrophoresis runs from chemical probing and mutate-and-map experiments were analyzed with the HiTRACE MATLAB package 81 . Lanes were aligned, bands fitted to Gaussian peaks, background subtracted using the no-modification lane, corrected for signal attenuation, and normalized to the internal hairpin control. The end result of these steps is a numerical array of ‘reactivity’ values for each RNA nucleotide that can be used as weights in structure prediction.

UPF1 targeted CLIP-seq

Jurkat cells expressing RORC reporters (reference, 77-GA mutant variant or 116-CCCTAAG mutant variant) were collected and crosslinked by ultraviolet radiation (400 mJ cm −2 ). Cells were then lysed with low salt wash buffer (1x PBS, 0.1% SDS, 0.5% sodium deoxycholate, 0.5% IGEPAL). To probe preferential UPF1 binding towards different reporters, lysates from 77-GA mutant cells were mixed with lysates from either wild-type or 116-CCCTAAG mutant cells at a 1:1 ratio prior to immunoprecipitation. Samples were then treated with a high dose (1:3,000 RNase A and 1:100 RNase I) and a low dose (1:15,000 RNase A and 1:500 RNase I) of RNase A and RNase I separately and combined after treatment. To immunoprecipitate the UPF1–RNA complex, a UPF1 antibody (Thermo, cat. no. A301-902A) was first incubated with Protein A/G beads (Pierce) and then incubated with the mixed cell lysates for 2 h at 4 °C. Immunoprecipitated RNA fragments were then dephosphorylated (T4 PNK, NEB), polyadenylated and end-labeled with 3ʹ-azido-3ʹ-dUTP and IRDye 800CW DBCO Infrared Dye (LI-COR) on beads. SDS–PAGE was then performed to separate protein–RNA complexes, and RNA fragments were collected from nitrocellulose membrane by proteinase K digestion. cDNA was then synthesized using Takara SMARTer small RNA sequencing kit reagents with a custom UMI-oligoDT primer (CAAGCAGAAGACGGCATACGAGATNNNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTTTTTTTTTTTTTT). The RORC reporter locus was then amplified with a custom primer (ACACTCTTTCCCTACACGACGCTCTTCCGATCT TGGGGTGATCCAAATACCACC) and sequencing libraries were then prepared with SeqAmp DNA Polymerase (Takara). Libraries were then sequenced on an Illumina HiSeq 4000 sequencer.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

Sequencing data have been deposited in the Gene Expression Omnibus (GEO accession GSE266070). Cryo-EM density maps have been deposited in EMDB, accession numbers EMD-42275 (WT Class A), EMD-42276 (WT Class B), EMD-42277 (WT Class C), EMD-42400 (77-GA Class C), EMD-42401 (77-GA Class A), EMD-42403 (117-AC Class C) and EMD-42404 (117-AC Class B). Rfam database 14.10 (https://rfam.org/) was used in the study.

Code availability

SwitchFinder source code is available at https://github.com/goodarzilab/SwitchFinder.

Gilbert, W. Origin of life: the RNA world. Nature 319 , 618 (1986).

Saad, N. Y. A ribonucleopeptide world at the origin of life. J. Syst. Evol. 56 , 1–13 (2018).

Vitreschak, A. G., Rodionov, D. A., Mironov, A. A. & Gelfand, M. S. Riboswitches: the oldest mechanism for the regulation of gene expression? Trends Genet. 20 , 44–50 (2004).

Sun, E. I. et al. Comparative genomics of metabolic capacities of regulons controlled by cis-regulatory RNA motifs in bacteria. BMC Genomics 14 , 597 (2013).

Serganov, A. & Nudler, E. A decade of riboswitches. Cell 152 , 17–24 (2013).

Wachter, A. Riboswitch-mediated control of gene expression in eukaryotes. RNA Biol. 7 , 67–76 (2010).

Liu, N. et al. N(6)-methyladenosine-dependent RNA structural switches regulate RNA–protein interactions. Nature 518 , 560–564 (2015).

Ray, P. S. et al. A stress-responsive RNA switch regulates VEGFA expression. Nature 457 , 915–919 (2009).

Kalvari, I. et al. Rfam 14: expanded coverage of metagenomic, viral and microRNA families. Nucleic Acids Res. 49 , D192–D200 (2021).

Barsacchi, M., Novoa, E. M., Kellis, M. & Bechini, A. SwiSpot: modeling riboswitches by spotting out switching sequences. Bioinformatics 32 , 3252–3259 (2016).

Manzourolajdad, A. & Arnold, J. Secondary structural entropy in RNA switch (Riboswitch) identification. BMC Bioinformatics 16 , 133 (2015).

Wheeler, T. J. & Eddy, S. R. nhmmer: DNA homology search with profile HMMs. Bioinformatics 29 , 2487–2489 (2013).

Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29 , 2933–2935 (2013).

Bengert, P. & Dandekar, T. Riboswitch finder: a tool for identification of riboswitch RNAs. Nucleic Acids Res. 32 , W154–W159 (2004).

Abreu-Goodger, C. & Merino, E. RibEx: a web server for locating riboswitches and other conserved bacterial regulatory elements. Nucleic Acids Res. 33 , W690–W692 (2005).

Chang, T.-H. et al. Computational identification of riboswitches based on RNA conserved functional sequences and conformations. RNA 15 , 1426–1430 (2009).

Mukherjee, S. & Sengupta, S. Riboswitch Scanner: an efficient pHMM-based web-server to detect riboswitches in genomic sequences. Bioinformatics 32 , 776–778 (2016).

Ding, Y. & Lawrence, C. E. A statistical sampling algorithm for RNA secondary structure prediction. Nucleic Acids Res. 31 , 7280–7301 (2003).

Wang, X. et al. Systematic comparison and rational design of theophylline riboswitches for effective gene repression. Microbiol. Spectr. 11 , e0275222 (2023).

Vezeau, G. E., Gadila, L. R. & Salis, H. M. Automated design of protein-binding riboswitches for sensing human biomarkers in a cell-free expression system. Nat. Commun. 14 , 2416 (2023).

Mortimer, S. A., Kidwell, M. A. & Doudna, J. A. Insights into RNA structure and function from genome-wide studies. Nat. Rev. Genet. 15 , 469–479 (2014).

Rouskin, S., Zubradt, M., Washietl, S., Kellis, M. & Weissman, J. S. Genome-wide probing of RNA structure reveals active unfolding of mRNA structures in vivo. Nature 505 , 701–705 (2014).

Leamy, K. A., Assmann, S. M., Mathews, D. H. & Bevilacqua, P. C. Bridging the gap between in vitro and in vivo RNA folding. Q. Rev. Biophys. 49 , e10 (2016).

Sun, L. et al. RNA structure maps across mammalian cellular compartments. Nat. Struct. Mol. Biol. 26 , 322–330 (2019).

Beaudoin, J.-D. et al. Analyses of mRNA structure dynamics identify embryonic gene regulatory programs. Nat. Struct. Mol. Biol. 25 , 677–686 (2018).

Zubradt, M. et al. DMS-MaPseq for genome-wide or targeted RNA structure probing in vivo. Nat. Methods 14 , 75–82 (2017).

Mortimer, S. A., Trapnell, C., Aviran, S., Pachter, L. & Lucks, J. B. SHAPE-Seq: high-throughput RNA structure analysis. Curr. Protoc. Chem. Biol. 4 , 275–297 (2012).

Morandi, E. et al. Genome-scale deconvolution of RNA structure ensembles. Nat. Methods 18 , 249–252 (2021).

Tomezsko, P. J. et al. Determination of RNA structural diversity and its role in HIV-1 RNA splicing. Nature 582 , 438–442 (2020).

Oikonomou, P., Goodarzi, H. & Tavazoie, S. Systematic identification of regulatory elements in conserved 3ʹ UTRs of human transcripts. Cell Rep. 7 , 281–292 (2014).

Wilkinson, K. A., Merino, E. J. & Weeks, K. M. Selective 2ʹ-hydroxyl acylation analyzed by primer extension (SHAPE): quantitative RNA structure analysis at single nucleotide resolution. Nat. Protoc. 1 , 1610–1616 (2006).

Eberl, G. RORγt, a multitask nuclear receptor at mucosal surfaces. Mucosal Immunol. 10 , 27–34 (2017).

Zhong, C. & Zhu, J. Small-molecule RORγt antagonists: one stone kills two birds. Trends Immunol. 38 , 229–231 (2017).

Montoya, M. M. & Ansel, K. M. Small RNA transfection in primary human Th17 cells by next generation electroporation. J. Vis. Exp. (122), 55546

Bose, R., Saleem, I. & Mustoe, A. M. Causes, functions, and therapeutic possibilities of RNA secondary structure ensembles and alternative states. Cell Chem. Biol. 31 , 17–35 (2024).

de Boer, C. G., Ray, J. P., Hacohen, N. & Regev, A. MAUDE: inferring expression changes in sorting-based CRISPR screens. Genome Biol. 21 , 134 (2020).

López-Perrote, A. et al. Human nonsense-mediated mRNA decay factor UPF2 interacts directly with eRF3 and the SURF complex. Nucleic Acids Res. 44 , 1909–1923 (2016).

Yi, Z. et al. Mammalian UPF3A and UPF3B can activate nonsense-mediated mRNA decay independently of their exon junction complex binding. EMBO J. 41 , e109202 (2022).

Kurosaki, T., Popp, M. W. & Maquat, L. E. Quality and quantity control of gene expression by nonsense-mediated mRNA decay. Nat. Rev. Mol. Cell Biol. 20 , 406–420 (2019).

Fischer, J. W., Busa, V. F., Shao, Y. & Leung, A. K. L. Structure-mediated RNA decay by UPF1 and G3BP1. Mol. Cell 78 , 70–84 (2020).

Martin, L. et al. Identification and characterization of small molecules that inhibit nonsense-mediated RNA decay and suppress nonsense p53 mutations. Cancer Res. 74 , 3104–3113 (2014).

Kuroha, K., Tatematsu, T. & Inada, T. Upf1 stimulates degradation of the product derived from aberrant messenger RNA containing a specific nonsense mutation by the proteasome. EMBO Rep. 10 , 1265–1271 (2009).

Winkler, W., Nahvi, A. & Breaker, R. R. Thiamine derivatives bind messenger RNAs directly to regulate bacterial gene expression. Nature 419 , 952–956 (2002).

Mironov, A. S. et al. Sensing small molecules by nascent RNA: a mechanism to control transcription in bacteria. Cell 111 , 747–756 (2002).

Rodionov, D. A. Comparative genomic reconstruction of transcriptional regulatory networks in bacteria. Chem. Rev. 107 , 3467–3497 (2007).

Vitreschak, A. G., Rodionov, D. A., Mironov, A. A. & Gelfand, M. S. Regulation of the vitamin B12 metabolism and transport in bacteria by a conserved RNA structural element. RNA 9 , 1084–1097 (2003).

Backofen, R., Gorodkin, J., Hofacker, I. L. & Stadler, P. F. Comparative RNA genomics. Methods Mol. Biol. 2802 , 347–393 (2024).

Leypold, N. A. & Speicher, M. R. Evolutionary conservation in noncoding genomic regions. Trends Genet. 37 , 903–918 (2021).

Ureta-Vidal, A., Ettwiller, L. & Birney, E. Comparative genomics: genome-wide analysis in metazoan eukaryotes. Nat. Rev. Genet. 4 , 251–262 (2003).

Clote, P. Computational prediction of riboswitches. Methods Enzymol. 553 , 287–312 (2015).

Antunes, D., Jorge, N. A. N., Caffarena, E. R. & Passetti, F. Using RNA sequence and structure for the prediction of riboswitch aptamer: a comprehensive review of available software and tools. Front. Genet. 8 , 231 (2017).

Manzourolajdad, A. & Spouge, J. L. Structural prediction of RNA switches using conditional base-pair probabilities. PLoS One 14 , e0217625 (2019).

Kappel, K. et al. Accelerated cryo-EM-guided determination of three-dimensional RNA-only structures. Nat. Methods 17 , 699–707 (2020).

Fish, L. et al. A prometastatic splicing program regulated by SNRPA1 interactions with structured RNA elements. Science 372 , eabc7531 (2021).

Kelly, M. L. et al. RNA conformational propensities determine cellular activity. Preprint at https://doi.org/10.1101/2022.12.05.519207 (2022).

Reuter, J. S. & Mathews, D. H. RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinformatics 11 , 129 (2010).

DESeq2 testing ratio of ratios (RIP-Seq, CLIP-Seq, ribosomal profiling). https://support.bioconductor.org/p/61509/

Navickas, A. et al. An mRNA processing pathway suppresses metastasis by governing translational control from the nucleus. Nat. Cell Biol. 25 , 892–903 (2023).

Lin, L., McKerrow, W. H., Richards, B., Phonsom, C. & Lawrence, C. E. Characterization and visualization of RNA secondary structure Boltzmann ensemble via information theory. BMC Bioinformatics 19 , 82 (2018).

Cover, T. M. & Thomas, J. A. Elements of Information Theory (John Wiley & Sons, 2006).

Lu, Z. J., Gloor, J. W. & Mathews, D. H. Improved RNA secondary structure prediction by maximizing expected pair accuracy. RNA 15 , 1805–1813 (2009).

Dotu, I., Lorenz, W. A., Van Hentenryck, P. & Clote, P. Computing folding pathways between RNA secondary structures. Nucleic Acids Res. 38 , 1711–1722 (2010).

wassermanlab/BiasAway. altschulEriksonDinuclShuffle.py. GitHub https://github.com/wassermanlab/BiasAway/blob/master/altschulEriksonDinuclShuffle.py (2013).

Karolchik, D., Hinrichs, A. S. & Kent, W. J. The UCSC Genome Browser. Curr. Protoc. Bioinformatics Chapter 1 , 1.4.1–1.4.33 (2012).

Mastronarde, D. N. SerialEM: a program for automated tilt series acquisition on Tecnai microscopes using prediction of specimen position. Microsc. Microanal. 9 , 1182–1183 (2003).

Zheng, S. Q. et al. MotionCor2: anisotropic correction of beam-induced motion for improved cryo-electron microscopy. Nat. Methods 14 , 331–332 (2017).

Ctffind4. https://grigoriefflab.umassmed.edu/ctffind4

Tang, G. et al. EMAN2: an extensible image processing suite for electron microscopy. J. Struct. Biol. 157 , 38–46 (2007).

Punjani, A., Rubinstein, J. L., Fleet, D. J. & Brubaker, M. A. cryoSPARC: algorithms for rapid unsupervised cryo-EM structure determination. Nat. Methods 14 , 290–296 (2017).

Goddard, T. D. et al. UCSF ChimeraX: meeting modern challenges in visualization and analysis. Protein Sci. 27 , 14–25 (2018).

Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17 , 10–12 (2011).

Smith, T., Heger, A. & Sudbery, I. UMI-tools: modeling sequencing errors in unique molecular identifiers to improve quantification accuracy. Genome Res. 27 , 491–499 (2017).

Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30 , 923–930 (2014).

Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol. 11 , R106 (2010).

Ashuach, T. et al. MPRAnalyze: statistical framework for massively parallel reporter assays. Genome Biol. 20 , 183 (2019).

Low, J. T. & Weeks, K. M. SHAPE-directed RNA secondary structure prediction. Methods 52 , 150–158 (2010).

Hajdin, C. E. et al. Accurate SHAPE-directed RNA secondary structure modeling, including pseudoknots. Proc. Natl Acad. Sci. USA 110 , 5498–5503 (2013).

Palka, C., Forino, N. M., Hentschel, J., Das, R. & Stone, M. D. Folding heterogeneity in the essential human telomerase RNA three-way junction. RNA 26 , 1787–1800 (2020).

Turner, R., Shefer, K. & Ares, M. Jr. Safer one-pot synthesis of the ‘SHAPE’ reagent 1-methyl-7-nitroisatoic anhydride (1m7). RNA 19 , 1857–1863 (2013).

Gilbert, L. A. et al. Genome-scale CRISPR-mediated control of gene repression and activation. Cell 159 , 647–661 (2014).

Yoon, S. et al. HiTRACE: high-throughput robust analysis for capillary electrophoresis. Bioinformatics 27 , 1798–1805 (2011).

Acknowledgements

The authors thank C. Mathy, A. Natale, M. Imakaev, Y. Gomez, M. Zimanyi, A. Smith and A. Pawluk for helpful discussions. H.G. is an Era of Hope Scholar (W81XWH-2210121) and supported by R01CA240984 and R01CA244634. This work was partly supported by National Institutes of Health (NIH) grants 1R35GM140847 (Y.C.). L.A.G. is funded by an NIH New Innovator Award (DP2 CA239597), a Pew-Stewart Scholars for Cancer Research award and the Goldberg-Benioff Endowed Professorship in Prostate Cancer Translational Biology. Cryo-EM equipment at UCSF is partially supported by NIH grants S10OD020054, S10OD021741 and S10OD026881. Y.C. is an Investigator at Howard Hughes Medical Institute. Sequencing was performed at the UCSF CAT, supported by UCSF PBBR, RRP IMIA and NIH 1S10OD028511-01 grants. A.N. was supported by the DoD PRCRP Horizon Award W81XWH-19-1-0594. L.F. was supported by an NIH training grant T32CA108462-15.

Author information

Daniel Asarnow

Present address: Department of Biochemistry, University of Washington, Seattle, WA, USA

Albertas Navickas

Present address: Institut Curie, UMR3348 CNRS, U1278 Inserm, Orsay, France

Authors and Affiliations

Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA

Matvei Khoroshkin, Daniel Asarnow, Shaopu Zhou, Albertas Navickas, Jackson Goudreau, Johnny Yu, Lisa Fish, Ashir Borah, Kian Yousefi, Christopher Carpenter, Yifan Cheng & Hani Goodarzi

Department of Urology, University of California, San Francisco, San Francisco, CA, USA

Matvei Khoroshkin, Shaopu Zhou, Albertas Navickas, Aidan Winters, Jackson Goudreau, Johnny Yu, Lisa Fish, Ashir Borah, Kian Yousefi, Christopher Carpenter, Luke A. Gilbert & Hani Goodarzi

Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, USA

Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA

Matvei Khoroshkin, Shaopu Zhou, Albertas Navickas, Aidan Winters, Jackson Goudreau, Johnny Yu, Lisa Fish, Ashir Borah, Kian Yousefi, Christopher Carpenter & Hani Goodarzi

Department of Biological and Medical Informatics, University of California, San Francisco, San Francisco, CA, USA

Aidan Winters

Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA, USA

Aidan Winters & Luke A. Gilbert

Arc Institute, Palo Alto, CA, USA

Aidan Winters, Luke A. Gilbert & Hani Goodarzi

Sandler Asthma Basic Research Center, University of California, San Francisco, San Francisco, CA, USA

Simon K. Zhou & K. Mark Ansel

Department of Microbiology and Immunology, University of California, San Francisco, San Francisco, CA, USA

Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA

Christina Palka

Howard Hughes Medical Institute, University of California San Francisco, San Francisco, CA, USA

Yifan Cheng

Contributions

M.K. and H.G. designed the study. M.K. developed SwitchFinder. A.B. and C.C. designed a docker environment for SwitchFinder. M.K. and A.N. performed the massively parallel reporter assays. M.K., S.Z. and L.F. performed the DMS-MaPseq experiments. M.K. and C.P. performed the SHAPE experiments. D.A. and Y.C. performed the cryo-EM experiments. M.K. performed the mutagenesis experiments. M.K., K.Y. and J.G. performed the antisense oligonucleotide transfection experiments. M.K., S.K.Z. and K.M.A. performed the Th17 differentiation experiments. M.K., A.W. and L.A.G. performed the CRISPRi screens. M.K. performed the CRISPRi knockdown experiments. M.K. and J.Y. performed the proteasome inhibition experiments. S.Z. performed the CLIP-seq experiments. M.K. and H.G. wrote the manuscript with input from all of the authors.

Corresponding author

Correspondence to Hani Goodarzi.

Ethics declarations

Competing interests

M.K. and H.G. are inventors on a provisional patent related to this study. L.A.G. has filed patents on CRISPR functional genomics. The other authors have no competing interests.

Peer review

Peer review information

Nature Methods thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Rita Strack, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 SwitchFinder identifies saddles in RNA folding energy landscape.

a Example of SwitchFinder locating the thiamine pyrophosphate (TPP) RNA switch within an mRNA sequence. Top: arc representation of the RNA base pairs that change between the two conformations of the E. coli TPP RNA switch, as in (Barsacchi et al. 10). The two conformations are shown in red and blue, respectively. Bottom: the two conformations of the RNA switch as predicted by SwitchFinder. Middle: SwitchFinder score reflecting the likelihood of a given nucleotide to be involved in two mutually exclusive base pairings. b Scheme of the SwitchFinder model. SwitchFinder analyzes the RNA folding energy landscape of a given RNA sequence and assigns a higher score to landscapes that demonstrate riboswitch-like features. c The set-up for evaluating the ability of a model to find RNA switches from novel families. At the classifier training step, riboswitches from one of the Rfam families are separated into the ‘test set’, while the model is trained on the riboswitches from the other Rfam families. The test set is then used to evaluate the model performance.
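The evaluation set-up in panel c amounts to a leave-one-family-out split. A minimal sketch of such a split is given below; the dictionary layout, family labels and function name are hypothetical and serve only to illustrate the idea, not to reproduce the SwitchFinder code.

```python
def leave_one_family_out(families):
    """Yield (held_out_family, train_sequences, test_sequences) splits.

    families: dict mapping a family name to a list of sequences.
    Each family is held out once as the test set, and the model would be
    trained on the sequences from all remaining families.
    """
    for held_out in families:
        test = list(families[held_out])
        train = [
            seq
            for name, seqs in families.items()
            if name != held_out
            for seq in seqs
        ]
        yield held_out, train, test


if __name__ == "__main__":
    # Toy placeholder sequences; real inputs would be Rfam family members.
    toy = {"TPP": ["seq_a", "seq_b"], "SAM": ["seq_c"], "FMN": ["seq_d"]}
    for family, train, test in leave_one_family_out(toy):
        print(family, "train:", train, "test:", test)
```

Because no member of the held-out family is seen during training, performance on the test set reflects how well a model generalizes to a family it has never encountered.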

Extended Data Fig. 2 Overview of high-throughput screening approaches for improved RNA switch predictions.

a Overview of DMS-MaPseq workflow. Mammalian cells are treated with DMS. DMS-modified nucleotides cause mutations when cDNA is synthesized from RNA templates. The cDNA libraries are sequenced and the DMS-caused mutations are counted, providing Watson–Crick face accessibility estimates for each A or C nucleotide. b Cumulative mutation frequency in DMS-treated candidate RNA switches, separated by nucleotide. c Cumulative mutation frequency in nontreated candidate RNA switches, separated by nucleotide. d Overview of the library generation workflow for the massively parallel reporter assay (MPRA). Sequences of candidate RNA switches are synthesized as DNA oligonucleotides and cloned into the 3ʹ UTR of an eGFP cDNA in a reporter vector. The plasmid library is packaged into lentiviral particles and used to infect mammalian cells. The infection is performed at a low multiplicity of infection (MOI) to ensure that most cells get only a single plasmid copy. e Overview of the MPRA workflow. A population of mammalian cells is separated into bins based on GFP/mCherry fluorescence ratio. In the schematic, cells are colored according to the sequence they carry in the 3ʹ UTR of the GFP reporter. f Cumulative density plot of dysregulation values, comparing the candidate RNA switches predicted in the first and second (DMS-MaPseq informed) iterations of SwitchFinder. Dysregulation values are estimated using a chi-square test for every individual candidate RNA switch across 8 expression bins. Median difference (∆M) and P value (calculated using Mann–Whitney U-test) are shown. g Correlations of read counts of gDNA libraries between the biological replicates of massively parallel mutagenesis analysis. h Correlations of read counts of RNA libraries between the biological replicates of massively parallel mutagenesis analysis.
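The mutation counting described in panel a can be sketched as follows. This is a simplified illustration that assumes reads are already aligned to the reference, have the same length, contain no indels and use uppercase bases; the function name and input format are hypothetical, and real DMS-MaPseq pipelines handle alignment, quality filtering and normalization.

```python
def dms_mutation_rates(reference, reads):
    """Per-position mismatch rate at A and C positions of the reference.

    reference: reference sequence (DNA alphabet, as in cDNA reads).
    reads: list of aligned, uppercase reads, each the same length as the
        reference; 'N' marks an uncovered or masked base.
    Returns a list with a float rate for A/C positions and None for G/T
    positions, which DMS does not probe on the Watson-Crick face.
    """
    ref = reference.upper()
    rates = []
    for i, base in enumerate(ref):
        if base not in "AC":
            rates.append(None)
            continue
        covered = [read[i] for read in reads if read[i] != "N"]
        if not covered:
            rates.append(None)
            continue
        mismatches = sum(1 for b in covered if b != base)
        rates.append(mismatches / len(covered))
    return rates


if __name__ == "__main__":
    toy_reads = ["ACGT", "TCGT", "AGGT", "ACGT"]
    print(dms_mutation_rates("ACGT", toy_reads))
```

Running the same calculation on a nontreated control, as in panel c, gives the background mutation rate against which the DMS signal can be judged.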

Extended Data Fig. 3 In vitro SHAPE reactivity of the RORC RNA switch sequence.

a SHAPE reactivity profiles for the reference sequence and for the mutation–rescue pair of sequences (blue - ‘77-GA’, red - ‘63-TC,77-GA’). Shown is the average for 3 replicates with the respective error bars (SD). The SHAPE reactivity changes in the nonmutated regions are highlighted with bold arrows. b Barplots of cumulative SHAPE reactivity within the switching regions for the reference sequence (in gray) and for the mutation–rescue pair of sequences (blue - ‘77-GA’, red - ‘63-TC,77-GA’). N replicates = 3. c Scatter plot showing the reproducibility of the DMS signal between two replicates. Each dot represents a single nucleotide. Normalized DMS signal is shown on both axes. Correlation and P value are determined with Pearson correlation coefficient (P = 1.59e-42). d Scatter plots showing the reproducibility of the DRACO clusters between replicates (N = 2). Each replicate’s reads were clustered with DRACO, the DMS reactivity was calculated for each cluster; the clusters were subsequently matched between replicates. Shown are DMS reactivities for a given cluster in a given replicate; each dot represents a single nucleotide. Correlation and P value are determined with Pearson correlation coefficient. P values left to right: 2.60e-23, 3.62e-07, 0.18, 0.73. e DMS reactivities of the two clusters identified by the DRACO unsupervised deconvolution algorithm (Morandi et al. 28). The algorithm was run on two replicates independently, and identified the same clusters in both of them. The ratios of the clusters reported by DRACO are 22% to 78% in replicate 1 and 32% to 68% in replicate 2. The ratio shown is an average between the two replicates. The switching regions are shown in color. f The effect of sequence mutations in the ‘Box 2’ and ‘Box 3’ regions of the RORC element on their reactivity, as measured by DMS-MaPseq in a reporter cell line. P values were determined using the two-sided independent t-test. g Correlation of relative proportions of the two conformations between the reporter context and the endogenous RORC mRNA. Linear regression is shown with a line. The relative proportion of the conformations is defined as the ratio of reactivities of Box 2:Box 3.
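The Box 2:Box 3 reactivity ratio in panel g and the Pearson correlations in panels c and d can be sketched with the standard library alone. In this illustration the box coordinates are hypothetical, and averaging reactivities within each box (rather than summing them) is an assumption, since the caption does not specify how reactivities are aggregated.

```python
import math


def box_ratio(reactivity, box2, box3):
    """Ratio of mean reactivity in Box 2 to mean reactivity in Box 3.

    reactivity: per-nucleotide reactivity values.
    box2, box3: (start, end) half-open index ranges; hypothetical coordinates.
    """
    mean2 = sum(reactivity[box2[0]:box2[1]]) / (box2[1] - box2[0])
    mean3 = sum(reactivity[box3[0]:box3[1]]) / (box3[1] - box3[0])
    return mean2 / mean3


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    toy_profile = [0.2, 0.4, 0.1, 0.1]
    print(box_ratio(toy_profile, (0, 2), (2, 4)))
    # Ratios measured in two contexts (toy numbers) can then be correlated.
    print(pearson_r([1.0, 2.0, 3.0], [2.1, 3.9, 6.2]))
```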

Extended Data Fig. 4 Qualitative modeling of cryo-EM data.

(a–c) Source cryo-EM images for the example particles shown in Fig. 4a, with phase-flipping to correct contrast and CTF delocalization. The WT image (a) evinces a greater diversity of particles, while 77-GA (b) appears to contain primarily elongated particles and those of 117-AC (c) seem more compact. Data collection statistics are available in Data file S7. (d–f) Cryo-EM 3D classes A, B and C of the WT RORC RNA overlaid with stereotypical RNA tertiary structures from the PDB, including dsRNA B-helix and RNA hairpin. Features representing the major groove and a hairpin are visible in regions of the maps. (g, h) Pairs of high-scoring models created by DRRAFTER for WT 3D classes B and C with density overlaid. The pre-positioned, idealized RNA structures used as initial models are indicated by a bracket. Although the individual models are of low confidence, they demonstrate that the class densities likely represent all or the majority of the RNA molecule.

Extended Data Fig. 5 Cryo-EM image processing and validation.

(a–c) Representative micrographs and 2D class averages for the RORC RNA switch WT sequence (a), 77-GA (b) and 117-AC (c). Data collection statistics are available in Data file S7. (d) Schematic cryo-EM image processing pipeline for WT RORC RNA. During template picking, templates and micrographs were low-pass filtered to 20 Å. (e, f) Schematic cryo-EM image processing pipelines for the 77-GA (e) and 117-AC (f) mutants. During template picking, templates and micrographs were low-pass filtered to 20 Å. (g) Gold-standard half-map refinement volume, FSC curves and orientation distribution plot for 3D classes from the WT RNA sample. (h) Gold-standard half-map refinement volume, FSC curves and orientation distribution plot for 3D classes from the 77-GA sample. (i) Gold-standard half-map refinement volume, FSC curves and orientation distribution plot for 3D classes from the 117-AC sample.

Extended Data Fig. 6 Differentiation of Th17 cells from primary human CD4+ cells.

Representative fluorescence-activated cell sorting plots of human primary Th17 cells infected with the RORC RNA switch 3ʹ UTR reporter. On day 5 of differentiation, each sample was split in half; one half was analyzed for mCherry and GFP expression (shown in Fig. 5c), the other half was stained for the expression of CD4, FoxP3, IL-13, IL-17A and IFN-gamma. The cells expressing a given marker are highlighted with a frame and the fraction of the parental cellular population is given. Each sample was analyzed in 4 replicates; a single representative replicate is displayed. CD4 is a marker for T-helper cells, including Th17. FoxP3 is typically associated with regulatory T cells, contrasting with the pro-inflammatory role of Th17 cells. IL-13 and IL-17A are cytokines indicative of Th2 and Th17 cell activity, respectively, with IL-17A being a key marker for Th17 cell identity. IFN-gamma is a signature cytokine of Th1 cells.

Extended Data Fig. 7 CRISPRi screen highlights the pathways acting downstream of the RORC RNA switch.

a Overview of the flow cytometry-based CRISPRi screen workflow. b Gene set enrichment analysis of the data depicted in Fig. 6a (left) and Fig. 6b (right). The genes were distributed into equally populated bins based on their comparative abundance between high expression and low expression quartiles (left), or based on their comparative phenotype in the CRISPRi screens performed in WT or 77-GA mutant backgrounds (right). Then the enrichment of a given gene set was calculated in each bin using iPAGE, a mutual information-based algorithm (Goodarzi et al. 2009). c Experiment design table. d The effect of knockdown of SURF and EJC complex member proteins on the expression change upon the conformation equilibrium shift. The individual genes were knocked down using the CRISPRi system in both WT and 77-GA mutant cell lines, then the change of reporter gene expression was measured by flow cytometry (N replicates = 2). The bar plots show the expression ratios of WT to 77-GA mutant cell lines. e The bar plots show the fractions of reads carrying the Box 2 (77-GA) mutant sequence or Box 3 (116-CCCTAAG) mutant sequence in the UPF1 cross-linking and immunoprecipitation (CLIP) library. The Box 2 mutant favors conformation 1; the Box 3 mutant favors conformation 2. Left: input RNA libraries, extracted from the Box 3 and Box 2 mutant-expressing Jurkat cells, mixed at a 1:1 ratio. Right: libraries after anti-UPF1 immunoprecipitation. P value was calculated using the Translation Efficiency Ratio test as in (Navickas et al. 58). N replicates = 2. f Density plots showing the correlation of sgRNA counts between the replicates of the CRISPRi screens performed in the WT (left) and 77-GA mutant (right) backgrounds. g Density plots showing the correlation of gene counts between the replicates of the CRISPRi screens performed in the WT (left) and 77-GA mutant (right) backgrounds. The counts of all the sgRNAs targeting a given gene are pooled and reported as a single number (N = 5 sgRNAs per gene). h Scatter plots showing the correlation of sgRNA phenotypes between the replicates of the CRISPRi screens performed in the WT (left) and 77-GA mutant (right) backgrounds. Logarithmic fold changes of sgRNA abundance between the ‘high’ and ‘low’ expression bins are shown on both axes. Nontargeting sgRNAs are shown in orange; all the other sgRNAs are shown in blue. The correlation values are reported separately for nontargeting and targeting sgRNAs. i Density plots showing the correlation of gene phenotypes between the replicates of the CRISPRi screens performed in the WT (left) and 77-GA mutant (right) backgrounds. Logarithmic fold changes between the abundance of sgRNAs targeting a given gene in ‘high’ and ‘low’ expression bins are shown on both axes.
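The gene-level pooling in panel g and the logarithmic fold changes in panels h and i can be illustrated with a short sketch. The sgRNA identifiers, the pseudocount and the choice of base 2 for the logarithm are assumptions made for this example; the caption does not state them.

```python
import math
from collections import defaultdict


def pool_by_gene(sgrna_counts, sgrna_to_gene):
    """Sum the read counts of all sgRNAs that target the same gene."""
    pooled = defaultdict(int)
    for sgrna, count in sgrna_counts.items():
        pooled[sgrna_to_gene[sgrna]] += count
    return dict(pooled)


def log2_fold_change(high, low, pseudocount=1.0):
    """log2 ratio of counts in the 'high' bin to counts in the 'low' bin.

    The pseudocount (an assumption here) avoids division by zero.
    """
    return math.log2((high + pseudocount) / (low + pseudocount))


if __name__ == "__main__":
    # Hypothetical sgRNA names and counts for illustration only.
    mapping = {"sg1": "UPF1", "sg2": "UPF1", "sg3": "SMG1"}
    high_bin = pool_by_gene({"sg1": 40, "sg2": 23, "sg3": 7}, mapping)
    low_bin = pool_by_gene({"sg1": 10, "sg2": 5, "sg3": 7}, mapping)
    for gene in high_bin:
        print(gene, log2_fold_change(high_bin[gene], low_bin[gene]))
```

In practice, counts would usually be normalized for sequencing depth before the ratio is taken; that step is omitted here for brevity.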

Supplementary information

Supplementary Protocol

Reporting Summary

Peer review file

Supplementary data

Data file S1 (Microsoft Excel format): AUCs for SwitchFinder prediction of RNA switches from common riboswitch Rfam families.
Data file S2 (Microsoft Excel format): SwitchFinder predictions for the 3,750 RNA fragments selected for further in vivo screening.
Data file S3 (Microsoft Excel format): mRNA and gDNA read counts in the sorted bins of the functional screen.
Data file S4 (Microsoft Excel format): DMS-MaPseq reactivity profiles and the second iteration of SwitchFinder predictions for 1,454 high-confidence RNA switches.
Data file S5 (Microsoft Excel format): mRNA and gDNA read counts in the sorted bins for the massively parallel mutagenesis analysis.
Data file S6 (Microsoft Excel format): List of RORC mutant sequences referred to in the paper.
Data file S7 (Microsoft Excel format): Cryo-EM data collection, refinement and validation statistics.
Data file S8 (Microsoft Excel format): Cryo-EM data collection parameters.
Data file S9 (Microsoft Excel format): Antisense oligonucleotides (ASO) used for RORC structural ensemble perturbation.
Data file S10 (Microsoft Excel format): gDNA read counts for the CRISPRi screens.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Khoroshkin, M., Asarnow, D., Zhou, S. et al. A systematic search for RNA structural switches across the human transcriptome. Nat Methods (2024). https://doi.org/10.1038/s41592-024-02335-1

Received: 26 February 2023

Accepted: 29 May 2024

Published: 16 July 2024

DOI: https://doi.org/10.1038/s41592-024-02335-1

research on dna structure

IMAGES

  1. Figure 4 from Discovery of DNA Structure and Function: Watson and Crick

    research on dna structure

  2. DNA Structure

    research on dna structure

  3. DNA sequencing

    research on dna structure

  4. Chemical structure of DNA, displaying four nucleobase pairs made by

    research on dna structure

  5. Figure 4 from Discovery of DNA Structure and Function: Watson and Crick

    research on dna structure

  6. DNA

    research on dna structure

VIDEO

  1. Primary Structure of DNA

  2. Ever Wondered? DNA Length

  3. DNA Double Helix Discovery #shorts

  4. Unbelievable DNA Discovery! Decoding the Genetic Code #shorts

  5. Surprising DNA Links: Humans, Flies, and Chimps #interesting facts about dna

  6. Mind blowing facts about DNA

COMMENTS

  1. The structure of DNA

    The discovery of the helical structure of double-stranded DNA settled the matter — and changed biology forever. ... but also housed the Medical Research Council's Unit for Research on the ...

  2. DNA

    DNA, organic chemical of complex molecular structure found in all prokaryotic and eukaryotic cells. It codes genetic information for the transmission of inherited traits. The structure of DNA was described in 1953, leading to further understanding of DNA replication and hereditary control of cellular activities.

  3. Discovery of the structure of DNA (article)

    The structure of DNA, as represented in Watson and Crick's model, is a double-stranded, antiparallel, right-handed helix. The sugar-phosphate backbones of the DNA strands make up the outside of the helix, while the nitrogenous bases are found on the inside and form hydrogen-bonded pairs that hold the DNA strands together.

  4. The Structure and Function of DNA

    The Structure and Function of DNA. Biologists in the 1940s had difficulty in accepting DNA as the genetic material because of the apparent simplicity of its chemistry. DNA was known to be a long polymer composed of only four types of subunits, which resemble one another chemically. Early in the 1950s, DNA was first examined by x-ray diffraction ...

  5. DNA

    DNA (deoxyribonucleic acid) is the nucleic acid polymer that forms the genetic code for a cell or virus. Most DNA molecules consist of two polymers (double-stranded) of four nucleotides that each ...

  6. What Rosalind Franklin truly contributed to the discovery of DNA's

    Chemist Rosalind Franklin independently grasped how DNA's structure could specify proteins. James Watson and Francis Crick are two of the ...

  7. Biochemistry, DNA Structure

    The remarkable structure of deoxyribonucleic acid (DNA), from the nucleotide up to the chromosome, plays a crucial role in its biological function. The ability of DNA to function as the material through which genetic information is stored and transmitted is a direct result of its elegant structure. In their seminal 1953 paper, Watson and Crick unveiled two aspects of DNA structure: pairing the ...

  8. DNA

    Chemical structure of DNA; hydrogen bonds shown as dotted lines. Each end of the double helix has an exposed 5′ phosphate on one strand and an exposed 3′ hydroxyl group (-OH) on the other. DNA is a long polymer made from repeating units called nucleotides. The structure of DNA is dynamic along its length, being capable of coiling into tight loops and other shapes.

  9. DNA structure and function

    The information encoded by DNA is both digital - the precise base specifying, for example, amino acid sequences - and analogue. The latter determines the sequence-dependent physicochemical properties of DNA, for example, its stiffness and susceptibility to strand separation. Most importantly, DNA chirality enables the formation of supercoiling ...

  10. Understanding biochemistry: structure and function of nucleic acids

    The structure of DNA. (A) A nucleotide (guanosine triphosphate). The nitrogenous base (guanine in this example) is linked to the 1′ carbon of the deoxyribose and the phosphate groups are linked to the 5′ carbon. A nucleoside is a base linked to a sugar. A nucleotide is a nucleoside with one or more phosphate groups.

  11. DNA function & structure (with diagram) (article)

    DNA is the information molecule. It stores instructions for making other large molecules, called proteins. These instructions are stored inside each of your cells, distributed among 46 long structures called chromosomes. These chromosomes are made up of thousands of shorter segments of DNA, called genes.

  12. 9.1: The Structure of DNA

    The building blocks of DNA are nucleotides, which are made up of three parts: a deoxyribose (5-carbon sugar), a phosphate group, and a nitrogenous base (Figure 9.1.2). There are four types of nitrogenous bases in DNA. Adenine (A) and guanine (G) are double-ringed purines, and cytosine (C) and thymine (T) are smaller, single-ringed ...

  13. (PDF) DNA structure and function

    A) Structures of A-DNA and B-DNA. Note the difference in groove width and the relative displacements of the base pairs from the central axis. Reproduced with permission from Arnott [12].

  14. Classic experiments: DNA as the genetic material

    Scientists first thought that proteins, which are found in chromosomes along with DNA, would turn out to be the sought-after genetic material. Proteins were known to have diverse amino acid sequences, while DNA was thought to be a boring, repetitive polymer, due in part to an incorrect (but popular) model of its structure and composition [1].

  15. PDF Structure & History of DNA

    DNA structure: Maurice Wilkins and Rosalind Franklin at King's College in London; Linus Pauling, an American chemist at the California Institute of Technology ... Miscommunication about their roles in the DNA research led to tension between Wilkins and Franklin. Many think this opened the door to Watson and Crick.

  16. PDF DNA Structure & Chemistry

    tion from DNA to protein. But first, in this chapter we look closely at the structure and chemistry of DNA in order to learn how its double-helical architecture allows information to be stored, es and deoxyribose sugarsAs its name implies, the double helix is composed of two polynucleotide chains that are wrapped aro.

  17. Human Molecular Genetics and Genomics

    The IOM's early years coincided with paradigm-shifting discoveries related to DNA, as biologic research swiftly incorporated Boyer and Cohen's recombinant method, Sanger's DNA-sequencing ...

  18. Discovery of DNA Structure and Function: Watson and Crick

    Dahm, R. Discovering DNA: Friedrich Miescher and the early years of nucleic acid research. Human Genetics 122, 565-581 (2008) Levene, P. A. The structure of yeast nucleic acid. IV. Ammonia ...

  19. DNA: Properties, Structure, Composition, Types, Functions

    It is a double-stranded molecule and has a unique twisted helical structure. DNA is made up of nucleotides; each nucleotide has three components: a sugar (deoxyribose) and a phosphate group, which together form the backbone, and a nitrogen-containing base attached to the sugar. Each strand has many nucleotides, that is, numerous sugars, phosphate groups, and ...

  20. PDF No. 4356 April 25, 1953 NATURE 737

    This figure is purely diagrammatic. The two ribbons symbolize the two phosphate-sugar chains, and the horizontal rods the pairs of bases holding the chains together. The vertical line marks the fibre axis. radically different structure for the salt of deoxyribose nucleic acid. This structure has two helical chains each coiled round the same ...

  21. What Is DNA?- Meaning, DNA Types, Structure and Functions

    DNA was first recognized and identified by the Swiss biologist Johannes Friedrich Miescher in 1869 during his research on white blood cells. The double helix structure of the DNA molecule was later deduced from experimental data by James Watson and Francis Crick. Finally, it was proved that DNA is responsible for storing genetic ...

  22. A Mammoth DNA Discovery Helps Map an Ancient Genome in 3-D

    A "fossil chromosome" preserves the structure of a woolly mammoth's genome and offers a better grasp of how it once worked.

  23. Study deciphers intricate 3D structure of DNA aptamer for disease

    In a study published in PNAS, a research team has resolved the first high-resolution structure of the sgc8c DNA aptamer that targets protein tyrosine kinase 7 (PTK7), engineered two optimal sgc8c ...

  24. Australian research discovers DNA structure's role in forming memories

    CANBERRA, July 19 (Xinhua) -- A specific type of DNA structure could be key in regulating how the brain forms memories, Australian research has found. In a study published on Friday, international ...

  25. Thirty years of structural changes

    The concluding statement of Watson and Crick's historic paper on the structure of DNA [1] enshrines a key tenet of molecular mechanistic cell biology: "… the specific pairing we have postulated ...

  26. Structure-guided discovery of ancestral CRISPR-Cas13 ribonucleases

    Nevertheless, the CRISPR arrays appeared to be actively acquiring new spacers, which we found target double-stranded DNA (dsDNA) phages (fig. S5 and tables S4 and S5). Consistent with this, four of the ten Cas13an encoding genomes have cas1 and cas2 genes in trans belonging to Type II CRISPR-Cas systems (fig. S6).

  27. Australian research discovers DNA structure's role in ...

    CANBERRA, July 19 (Xinhua) -- A specific type of DNA structure could be key in regulating how the brain forms memories, Australian research has found. In a study published on Friday, international researchers led by a team from the Australian National University (ANU) discovered that G-quadruplex DNA (G4-DNA) plays a role in transcribing memories.

  28. Unique characteristics of previously unexplored protein discovered

    Mar. 7, 2024 — A research team has recently made a significant breakthrough in understanding how the DNA copying machine helps pass on epigenetic information to maintain gene traits at each cell ...

  29. Nursing aide turned sniper: Thomas Crooks plot to kill Donald Trump

    BUTLER, Pa. - Donald Trump and would-be assassin Thomas Crooks started on their violent collision course long before the former president's political rally ended in gunshots and death. Crooks ...

  30. A systematic search for RNA structural switches across the ...

    The small minority of tools enabling de novo prediction of RNA switches lack experimental verification of RNA structure and function [10,11]. Therefore, there is an unmet need for scalable methods ...