Most people associate protein mainly with its role as a major food group. But at a cellular level, there are millions of possible proteins: they are large, complex molecules that play crucial roles in biochemical processes.
A single protein molecule is made up of amino acids linked together in a long chain. The precise order of these amino acids is specified by the exact sequence of DNA in its corresponding gene.
Although the double helix structure of DNA was known in 1953, it remained a mystery for the rest of the decade how the sequence of a gene contains the “information” needed to transform the four chemicals given the shorthand A, T, C, and G into a protein.
The early metaphors to explain the biochemistry of this mysterious transformation were borrowed from linguistics and cryptography. The DNA molecule is said to be “transcribed” into a “messenger” called RNA, which is then itself “translated” into protein.
The “genetic code” is the relationship between DNA and amino acids. For scientists today, these words have lost most of their original meaning, and the genetic code is simply a given that you can look up in a table. But the code, and the details behind it, first had to be cracked by painstaking biochemistry.
A big announcement came in August 1961 at the International Congress of Biochemistry in Moscow, where a US scientist called Marshall Nirenberg presented his work at a small side session of about 30 scientists. He reported that adding a synthetic RNA called poly-uracil to E coli bacteria made them produce proteins containing only one sort of amino acid, phenylalanine.
Whereas most proteins could be compared to an interesting and unique story with lots of words, these proteins were instead a monotonous repetition of the same word over and over again.
This was crucial evidence for the first “word” from the genetic code that could be deciphered. Listening to the presentation, Francis Crick was so excited he arranged for Nirenberg to present it again to the main congress of 1,000 scientists.
The genetic code is a triplet code: each possible three-letter sequence of DNA, known as a “codon”, corresponds to one amino acid (or, in a few cases, to a signal to stop). A letter can be one of the four chemicals A, C, T or G, meaning there are 4 x 4 x 4 = 64 possible codons.
Nirenberg had found the first such relationship, and over the following years the remaining 63 codons were deciphered. A gene should be “read” as a sequence of codons, corresponding to a chain of amino acids that, once made, folds and scrunches up to make the final structure of the protein.
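For readers who like to see the idea in code, here is a minimal sketch of reading a gene codon by codon. It assumes the standard genetic code written in DNA letters, with `*` marking the stop signals; the names `CODON_TABLE` and `translate` are ours, chosen for illustration, and the one-letter amino acid abbreviations are the usual ones (F is phenylalanine).

```python
from itertools import product

# Standard genetic code in DNA letters, with bases ordered T, C, A, G.
# Each character is a one-letter amino acid code; "*" is a stop signal.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Pair every three-letter combination with its amino acid (4 x 4 x 4 entries).
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Read a DNA string three letters at a time, stopping at a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(len(CODON_TABLE))        # 64
print(translate("TTT" * 5))    # FFFFF
```

The second line of output mirrors Nirenberg's result: a sequence made of one repeated letter (U in RNA, T in the DNA shorthand used here) gives a chain of nothing but phenylalanine.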
All living organisms use an essentially identical genetic code (with a few small idiosyncratic exceptions). This is one of the strongest pieces of evidence for the universal shared ancestry of all life on Earth.
It was this shared ancestry that prompted the French scientist Jacques Monod to quip, “What is true for E coli is true for the elephant.” Studying bacteria therefore enables us to understand the basic biochemistry of more complex organisms that we can’t study so easily in the lab. E coli is the most studied of all bacteria.
We know that its genome normally has something like 4,500 genes, each encoding one protein. The average length of those proteins is around 300 amino acids (and therefore 900 “letters” of DNA). If the bacterial cell were the size of a bathtub, most proteins would be around the size of a small marble about 1cm in diameter.
However, there is evidence for the existence of bigger proteins, and much bigger ones at that. In a recent preprint published online, a team of researchers deliberately went looking for genes that encode proteins made of tens of thousands of amino acids, monsters a hundred or more times bigger than normal proteins.
In this recent study, the researchers looked at DNA taken from a variety of places including soil, wetlands and water from rocks deep below the Earth’s surface. This is a vast ocean of genetic data to search through for gigantic genes; the genetic equivalent of whale-watching.
The researchers looked in particular at a sort of bacteria called Omnitrophota. These bacteria were identified first from DNA sequencing of water and sediment, rather than by culturing them in a laboratory. Only two species of these bacteria have ever been observed using a microscope, and they have never been successfully grown in a Petri dish.
Despite this, they appear to be everywhere on Earth. Their DNA pops up again and again in genetic data of all sorts, revealing their presence even where the organism itself was never observed. They are hidden in plain sight.
Among the Omnitrophota, the researchers, who were out to catch a big one, found nearly 50 gene sequences that encoded proteins at least 30,000 amino acids long. In a world where most bacterial proteins are 100 times smaller, the existence of so many behemoth proteins in this group of bacteria is a huge surprise.
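The scale is easy to check with a little arithmetic. The sketch below uses only the figures quoted in this column (a typical protein of about 300 amino acids, a giant of at least 30,000) and assumes three DNA letters per amino acid, ignoring the stop signal at the end of a gene; the function name `coding_length` is ours.

```python
LETTERS_PER_CODON = 3  # three DNA letters encode one amino acid


def coding_length(n_amino_acids):
    """Approximate number of DNA letters needed to encode a protein."""
    return n_amino_acids * LETTERS_PER_CODON


typical_protein = 300     # average E coli protein, in amino acids
giant_protein = 30_000    # lower bound for the giants in the preprint

print(coding_length(typical_protein))     # 900
print(coding_length(giant_protein))       # 90000
print(giant_protein // typical_protein)   # 100
```

A single giant gene therefore takes up roughly as much DNA as 100 ordinary ones.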
What is ironic is that Omnitrophota themselves are extremely small: they have short genomes and tiny cell sizes. The fact that such large proteins are so common in their genomes suggests there may be some connection between their small size and their giant proteins.
Some of the Omnitrophota survive as predators on other bacteria (their name means “eaters of everything”). Last year, a different group of researchers showed that this predatory behaviour in one Omnitrophota species was linked to one of the huge proteins. This giant, around 40,000 amino acids long, can integrate into the cell wall of its prey, eventually dissolving the cell.
Whether the other proteins identified by genetic analysis share this function remains to be seen. In particular, although the gigantic proteins can be identified simply by reading them out from the DNA letters, what can’t be known with certainty is what happens after the protein is made. It may be that many of the gigantic proteins are in fact broken up into smaller pieces, rather than functioning as a huge single unit.
Hunting giant proteins might seem esoteric, but the tiny Omnitrophota bacteria undoubtedly play a vital role in processes of decay in sediments and wastewater. Beyond being a fun expedition of genetic discovery, understanding these giant proteins may give us new insight into these crucial processes.
The research is also a reminder of how much of the diversity of life is still unknown: in the soil and the water around you, there are trillions of tiny microbes that we know remarkably little about.
An update from the Science and Society team: When we started co-writing this column as an experiment in early 2019, we had little idea we would still be doing it almost five years later. Having written more than 100 articles together as a three-person writing team, we’re sad to say that Joel is taking a well-earned retirement from Science and Society — he will be missed. For future articles, we’re pleased to announce that Rox and Liam will be joined by Miriam Gauntlett, a geophysicist at the University of Oxford.