
Biology as an axiomatic process

The replication mechanisms of living beings can be compared with the self-replication of automata in the context of computability theory. In particular, DNA replication, analyzed from the perspective of the recursion theorem, indicates that its replication structure goes beyond biology and the quantum mechanisms that support it, as analyzed in the article Biology as an Axiomatic Process.

Physical chemistry establishes the principles by which atoms interact with each other to form molecules. In the inorganic world the resulting molecules are relatively simple and cannot form complex functional structures. In the organic world, on the other hand, molecules can be made up of thousands or even millions of atoms and exhibit complex functionality. Particularly noteworthy is what is known as molecular recognition, through which molecules interact with each other selectively; it is the basis of biology.

Molecular recognition plays a fundamental role in the structure of DNA, in the translation of the genetic code of DNA into proteins, and in the biochemical interaction of proteins, which together form the foundation of living beings.

The detailed study of these molecular interactions makes it possible to describe the functionality of the processes involved, to the point that formal models can be established and even used as a computing technology, as in DNA-based computing.

From this perspective, we may ask whether information processing is something deeper and whether it is, in fact, the foundation of biology itself, in accordance with the principle of reality.

For this purpose, this section analyzes the basic processes on which biology is based, in order to establish a link with axiomatic processing and thus investigate the nature of biological processes. It is not necessary to describe in detail the biological mechanisms documented in the literature; we will simply describe their functionality, so that they can be identified with the theoretical foundations of information processing. Accordingly, we will outline the mechanisms on which DNA replication and protein synthesis are based.

DNA and RNA molecules are polymers formed from deoxyribose and ribose nucleotides, respectively, linked by phosphate groups. To this nucleotide backbone, one of four possible nucleobases can be attached. There are five different bases: adenine (A), guanine (G), cytosine (C), thymine (T) and uracil (U). In the case of DNA, the bases that can be covalently coupled to the nucleotides are A, G, C and T, whereas in the case of RNA they are A, G, C and U. As a consequence of the shape of their electronic clouds, these molecules are structured as a helix, into which the bases fit in a precise and compact way.

The helix structure allows the bases of two different strands to be bound together by hydrogen bonds, forming the pairs A-T and G-C in the case of DNA, and A-U and G-C in the case of RNA, as shown in the following figure.

Base pairing in DNA

As a result, the DNA molecule is a double helix in which two polynucleotide chains wind around each other, held together by the hydrogen bonds between their bases. Each strand of the DNA molecule therefore contains the same genetic information, one strand being, so to speak, the negative of the other.

Double helix structure of DNA molecule

The genetic information of an organism, called its genome, is not contained in a single DNA molecule but is organized into chromosomes, which are made up of DNA strands bound together by proteins. In the case of humans, the genome comprises 46 chromosomes, and the number of bases in the DNA molecules that compose it is about 3×10⁹. Since each base can be encoded by 2 bits, the human genome, considered as an object of information, is equivalent to 6×10⁹ bits.
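As a sanity check of this arithmetic, here is a minimal Python sketch (the figures are the approximate ones quoted above):

```python
# Information content of the human genome, using the figures quoted above
# (the exact base count varies between references).
BITS_PER_BASE = 2        # 4 possible bases -> log2(4) = 2 bits per base
BASES_IN_GENOME = 3e9    # approximate number of bases

genome_bits = BASES_IN_GENOME * BITS_PER_BASE
print(f"{genome_bits:.0e} bits, or about {genome_bits / 8 / 1e9:.2f} GB")
# -> 6e+09 bits, or about 0.75 GB
```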

The information contained in the genes is the basis for the synthesis of proteins, which are responsible for executing and controlling the biochemistry of living beings. Proteins are formed by bonding amino acids through covalent bonds, according to the sequences of bases contained in the DNA. There are 20 amino acids and, since each base encodes 2 bits, 3 bases (6 bits, 64 combinations) are needed to encode each amino acid. This means that there is some redundancy in the assignment of base sequences to amino acids, in addition to control codes for the synthesis process (Stop), as shown in the following table.

Translation of base triplets (codons) into amino acids
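To make the redundancy tangible, here is a small fragment of the standard RNA codon table written as a Python dictionary (only a few of the 64 entries are shown):

```python
# Fragment of the standard RNA codon table. Several codons map to the same
# amino acid (redundancy), and three codons act as the Stop control code.
CODON_TABLE = {
    "UUU": "Phe", "UUC": "Phe",                   # 2 codons -> Phe
    "UUA": "Leu", "UUG": "Leu",
    "CUU": "Leu", "CUC": "Leu", "CUA": "Leu", "CUG": "Leu",  # 6 -> Leu
    "AUG": "Met",                                 # Met, also the Start codon
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",  # control codes
}

print(4 ** 3, "possible codons for 20 amino acids plus Stop")  # 64
```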

However, protein synthesis is not done directly from DNA; it requires the intermediation of RNA. This translation of the genetic code involves two different types of RNA molecules: messenger RNA (mRNA) and transfer RNA (tRNA). The first step is the synthesis of mRNA from DNA, a process called transcription, in which the information corresponding to a gene is copied into the mRNA molecule through a recognition process between the base molecules, carried out by hydrogen bonds, as shown in the following figure.

DNA transcription
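Functionally, transcription can be sketched as a symbol-by-symbol mapping from the DNA template strand to mRNA. The following Python fragment is an idealization that leaves the molecular machinery aside:

```python
# Transcription: each base of the DNA template strand is paired with its
# complementary RNA base (A-U, T-A, G-C, C-G) via hydrogen-bond recognition.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_strand: str) -> str:
    """Synthesize an mRNA sentence from a DNA template strand."""
    return "".join(DNA_TO_RNA[base] for base in template_strand)

print(transcribe("TACGGCATT"))  # -> AUGCCGUAA
```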

Once the mRNA molecule has been synthesized, the tRNA molecule mediates between the mRNA and the amino acids to synthesize proteins, for which it has two specific molecular mechanisms. At one end, tRNA has a sequence of three bases called the anticodon. At the opposite end, tRNA binds to a specific amino acid, according to the table translating codons into amino acids. In this way, tRNA is able to translate mRNA into a protein, as shown in the figure below.

Protein synthesis (mRNA translation)
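Translation can likewise be sketched as reading the mRNA sentence codon by codon and looking each triplet up in the codon table. A minimal Python sketch, using only a handful of codons:

```python
# Translation: the mRNA sentence is read in triplets (codons); each codon
# selects an amino acid until a Stop control code ends the synthesis.
CODON_TABLE = {"AUG": "Met", "CCG": "Pro", "UUC": "Phe",
               "UAA": "Stop", "UAG": "Stop", "UGA": "Stop"}

def translate(mrna: str) -> list[str]:
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "Stop":   # control code: end of synthesis
            break
        protein.append(amino_acid)
    return protein

print(translate("AUGCCGUAA"))  # -> ['Met', 'Pro']
```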

But the most complex process is undoubtedly DNA replication, by which each molecule produces two identical replicas. Replication is performed by unwinding the two strands of the molecule and pairing free nucleotides with each of them, in a manner similar to that shown for mRNA synthesis. DNA replication is controlled by enzymatic processes supported by proteins. Without going into detail, and in order to show its complexity, the table below lists the proteins involved in the replication process and their roles.

The role of proteins in the DNA replication process

The processes described above constitute the central dogma of molecular biology and are usually represented schematically as shown in the following figure, which also depicts the reverse transcription that occurs in retroviruses, whereby a DNA molecule is synthesized from RNA.

Central dogma of molecular biology

The biological process from the perspective of computability theory

Molecular processes supported by DNA, RNA and proteins can be considered, from an abstract point of view, as information processes: input sentences belonging to a language are processed, producing new output sentences. Thus, the following languages can be identified:

  • DNA molecule. Sentence consisting of a sequence of characters from a 4-symbol alphabet.
  • RNA molecule (protein synthesis). Sentence which, read at the codon level, belongs to a 21-symbol alphabet (the 20 amino acids plus the Stop signal).
  • RNA molecule (reverse transcription). Sentence consisting of a sequence of characters from a 4-symbol alphabet.
  • Protein molecule. Sentence consisting of a sequence of characters from a 20-symbol alphabet.

This information is processed by the machinery established by the physicochemical properties of the control molecules. To better understand this functional structure, it is convenient to modify the scheme corresponding to the central dogma of biology, representing the processes involved and the information that flows between them, as shown in the following block diagram.

Functional structure of DNA replication

This structure highlights the flow of information between processes, in the form of DNA and RNA sentences. The functional blocks of information processing are the following (sketched in code after the list):

  • PDNA. Replication process. Its functionality is determined by the proteins involved in DNA synthesis, and it produces two replicas of DNA from a single molecule.
  • PRNA. Transcription process. It synthesizes an RNA molecule from a gene encoded in the DNA.
  • PProt. Translation process. It synthesizes a protein from an RNA molecule.
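These three blocks can be sketched as Python function signatures (hypothetical names; the bodies are placeholders for the molecular machinery described above):

```python
# Block-diagram view of the central dogma as information processes.
def p_rna(gene_dna: str) -> str:
    """PRNA, transcription: DNA sentence -> RNA sentence."""
    ...

def p_prot(mrna: str) -> str:
    """PProt, translation: RNA sentence -> protein sentence."""
    ...

def p_dna(dna: str) -> tuple[str, str]:
    """PDNA, replication: one DNA sentence -> two identical replicas.

    Note the loop: the machinery implementing p_dna is itself made of
    proteins produced by p_prot(p_rna(...)).
    """
    return dna, dna
```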

This structure clearly shows how information emerges from biological processes, something that seems to be ubiquitous in natural models and that allows the implementation of computer systems. In all cases this capacity is ultimately supported by quantum physics; in the particular case of biology, it arises from the physicochemical properties of molecules, which are themselves determined by quantum physics. Therefore, information processing is something that emerges from an underlying reality and, ultimately, from quantum physics, as far as current knowledge goes.

This means that, although there is a strong link between reality and information, information would simply be an emergent product of reality. But biology provides a clue to the intimate relationship between reality and information, which are ultimately indistinguishable concepts. If we look at the DNA replication process, we see that DNA is produced in several stages of processing:

DNA → RNA → Proteins → DNA.

We could consider this to be a specific feature of the biological process. However, computability theory indicates that the replication process is subject to logical rules deeper than the physical processes that support it. In particular, the recursion theorem determines that the replication of information requires the intervention of at least two independent processes.
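A classical computational illustration of this requirement is a quine, a program that reproduces its own text. It necessarily splits into a passive description (data) and an active process that interprets that description, mirroring the roles played by DNA and by the protein machinery. A minimal Python sketch:

```python
# A minimal quine (these comments are not part of the replicated text).
# The passive description, s, plays the role of DNA, while the print
# statement is the active machinery that copies it; neither part can
# replicate on its own, as the recursion theorem requires.
s = 's = %r\nprint(s %% s)'
print(s % s)
```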

This shows that DNA replication is subject to abstract rules that must be satisfied not only by biology, but by every natural process. Therefore, the physical foundations that support biological processes must meet this requirement. Consequently, information processing is essential to what we understand by reality.

Natural language: A paradigm of axiomatic processing

The Theory of Computation (TC) aims to establish computational models and to determine the limits of what is computable, as well as the complexity of a problem when it is computable. The formal models established by TC are based on abstract systems ranging from simple models, such as automata, to the general computing model established by the Turing Machine (TM).

Formally, the concept of an algorithm is based on the TM, so that each of its possible implementations performs a specific function that we call an algorithm. TC demonstrates that it is possible to build an idealized machine, called the Universal Turing Machine (UTM), capable of executing every computable algorithm. Commercial computers are equivalent to the UTM, with the difference that their memory and runtime are limited, whereas in the UTM these resources are unlimited.

But the question we can ask is: what does this have to do with language? The answer is simple. In TC, a language L(TM) is defined as the set of bit sequences that a given TM "accepts", where the term "accept" means that the TM analyzes the input sequence and reaches the Halt state. Consequently, a language is the set of mathematical objects accepted by a given TM.

Without going into details that can be consulted in the specialized literature, TC classifies languages into two basic types, as shown in the figure. A language is Turing-decidable (DEC) when the TM accepts the sequences belonging to the language and rejects the rest, reaching the Halt state in both cases. In contrast, a language is Turing-recognizable (RE) if it is the language of some TM. This means that, for languages belonging to RE but not to DEC, the TM does not reach the Halt state when the input sequence does not belong to the language.
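The distinction can be caricatured in Python. In the sketch below, membership of a toy language (strings over the symbols 'a' and 'b') is tested by a decider, which always halts, and by a recognizer, which halts only to accept:

```python
# A decider halts on every input, accepting or rejecting.
def decider(w: str) -> bool:
    return all(c in "ab" for c in w)   # always reaches "Halt"

# A recognizer is only guaranteed to halt on members of its language;
# on non-members it may run forever (do not call it on one!).
def recognizer(w: str) -> bool:
    while not all(c in "ab" for c in w):
        pass                           # loops forever on non-members
    return True                        # halts only to accept
```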

It must be emphasized that there are sequences that are not recognized by any TM. Strictly speaking, according to the formal definition of language, they should not be considered languages, although in general they are referred to as non-RE languages. It is important to note that this concept is closely related to Gödel's incompleteness theorem. These correspond to the set of undecidable or unsolvable problems, whose cardinality is greater than that of the natural numbers.

Within DEC languages, two types can be identified: regular languages and context-free languages (CFL). Regular languages are those composed of a set of sequences on which the TM can decide individually, so they have no grammatical structure; examples are the languages of the automata we handle every day, such as elevators and device controls. CFLs are those that have a formal structure (grammar) in which language elements can be nested recursively. In general, we can liken CFLs to programming languages, such as Java or C++. This is not strictly true, but it will make the presentation of certain concepts easier.
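A toy Python sketch illustrates the difference: a regular pattern can be checked by a finite automaton (here via a regular expression), whereas balanced, recursively nested brackets require the unbounded counting that characterizes a context-free grammar:

```python
import re

# Regular language: any number of 'a's followed by any number of 'b's,
# recognizable by a finite automaton (expressed here as a regex).
def is_regular_ab(w: str) -> bool:
    return re.fullmatch(r"a*b*", w) is not None

# Context-free language: balanced parentheses, with recursive nesting
# (assumes w contains only '(' and ')').
def is_balanced(w: str) -> bool:
    depth = 0
    for c in w:
        depth += 1 if c == "(" else -1
        if depth < 0:
            return False
    return depth == 0

print(is_regular_ab("aaabb"), is_balanced("(()(()))"))  # True True
```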

But the question is: what does this have to do with natural language? The answer is again easy. Natural language is, in principle, a Turing-decidable language, and the argument is almost trivial. Perhaps a few decades ago this was not so clear, but nowadays information technology shows it to us plainly, without the need for theoretical knowledge. On the one hand, natural language is a sequence of bits, since both spoken and written language are coded as bit sequences in audio and text files, respectively. On the other hand, humans do not go into an endless loop when we receive a message, at least not permanently ;-).

However, it can be argued that we do not reach the Halt state either. In this context, though, Halt does not mean that we literally end our existence, although there are messages that kill! It means that the processing of the information concludes and that, as a result, we can make a decision and tackle a new task.

Therefore, from an operational or practical point of view, natural language is Turing-decidable. But arguments can be found that conflict with this view and that materialize in the form of contradictions. Although it may seem surprising, this also happens with programming languages, since their grammar may be context-sensitive (CSG). For now, however, we will leave this aspect aside in order to keep the reasoning simple.

What can be seen intuitively is a clear parallel between the TM model and the human communication model, as shown in the figure. This can be extended to other communication models, such as body language, the physicochemical language between molecules, and so on.

In TC, the input and output objects of the TM are language elements, which is very natural, since the practical objective is human-to-machine or machine-to-machine communication. But this terminology varies with the context. From an abstract point of view, objects have a purely mathematical nature; in other contexts, such as physics, we speak instead of concepts such as space-time, energy and momentum.

What seems clear from the observable models is that a model of reality is equivalent to bit sequences processed by a TM. In short, a model of reality is equivalent to an axiomatic processing of information, where the axioms are embedded in the TM. It should be clear that an axiom is not a self-evident truth that needs no proof; rather, an axiom is a proposition assumed within a theoretical body. Possibly this misunderstanding originates in the apparent simplicity of some axiomatic systems, produced by our perception of reality. This is obvious, for example, in Euclidean geometry, whose five postulates or axioms seem evident to us because of our perception of space. We will continue to insist on this point, since axiomatic processing is surely one of the great mysteries that nature encloses.

Returning to natural language, it should be possible to establish a parallelism between it and the axiomatic processing performed by a TM, as suggested in the figure. As with programming languages, the structure of natural language is defined by a grammar, which establishes a set of axiomatic rules determining the categories (verb, predicate) of the elements of the language (the lexicon) and how they combine to form expressions (sentences). Both the elements of the language and the resulting expressions have a meaning, known as the semantics of the language. The pertinent question is: what is the axiomatic structure of a natural language?

To answer this, let us reorient the question: how is the semantics of natural language defined? We can begin by analyzing the definitions of the lexicon of a language, as collected in the dictionary, where the meaning of each word is given for different contexts. But we soon run into a formal problem, since the definitions rest on one another in a circular fashion. In other words, the term being defined is part of its own definition, so it is not possible to establish the semantics of a language from linguistic information alone.

For example, according to the Oxford dictionary:

  • Word: A single distinct meaningful element of speech or writing, used with others (or sometimes alone) to form a sentence and typically shown with a space on either side when written or printed.
  • Write: Mark (letters, words, or other symbols) on a surface, typically paper, with a pen, pencil, or similar implement. 
  • Sentence: A set of words that is complete in itself, typically containing a subject and predicate, conveying a statement, question, exclamation, or command, and consisting of a main clause and sometimes one or more subordinate clauses. 
  • Statement: A definite or clear expression of something in speech or writing.
  • Expression: A word or phrase, especially an idiomatic one, used to convey an idea. 
  • Phrase: A small group of words standing together as a conceptual unit, typically forming a component of a clause.

Therefore:

  • Word: A single distinct … or marks (letters, words, or other symbols) on … to form a set of words that … conveying a definite or clear word or a small group of words standing together … or marking (letters, words, …. ) …

In this way, we could continue recursively replacing each component with its meaning, arriving at the conclusion that natural language, as an isolated entity, has no meaning, so it is necessary to establish an axiomatic basis external to the language itself. By the way: what would happen if we kept replacing each component of the sentence?
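This can be simulated with a toy Python sketch. The definitions below are abbreviated, hypothetical stand-ins for the Oxford entries quoted above; the expansion keeps growing without ever grounding the meaning outside the language:

```python
# Toy dictionary with circular definitions (abbreviated stand-ins for the
# entries quoted above). Expanding a word never bottoms out in anything
# external to the language itself.
DICTIONARY = {
    "word": "element of speech used to form a sentence",
    "sentence": "set of words conveying a statement",
    "statement": "expression of something in speech",
    "expression": "word or phrase used to convey an idea",
    "phrase": "small group of words",
}

def expand(text: str, steps: int) -> str:
    for _ in range(steps):
        text = " ".join(DICTIONARY.get(w, w) for w in text.split())
    return text

for n in range(3):
    print(len(expand("word", n).split()), "words after", n, "expansions")
# -> 1, 8, 13 ... the defined term keeps reappearing inside its definition
```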

Consequently, we can ask what the result would be of an experiment in which an artificial intelligence entity, disconnected from all reality except for the information on which written language is based, analyzes that information. That is, the entity would have access to the grammar, the dictionary, written works, and so on. What will be the result of the experiment? What conclusions will the entity reach?

If we perform this experiment mentally, we will see that the entity can come to understand the reality of language, and all the stories based on it, provided that it has an axiomatic basis. Otherwise, the entity will experience what in information theory is known as "information without meaning". This explains the impossibility of deciphering archaic scripts without cross-references to other languages or other forms of expression. In the case of humans, the axiomatic basis is acquired from cognitive experiences external to the language itself.

To clarify what axiomatic processing means, we can use simple examples related to programming languages. Let us analyze the semantics of the "if… then" statement. By consulting the programming manual we can determine its semantics, because our brain implements rules, or axioms, for deciphering the written message. This is equivalent to what happens in the execution of program sentences, where it is the TM that executes those expressions axiomatically. In the case of the brain and of the TM, the axioms are defined in the fields of biochemistry and physics, respectively, and therefore outside the realm of language.
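As a trivial Python instance, the meaning of the construct is supplied by the machinery that executes it, not by the characters themselves:

```python
# The characters "if ... then" carry no meaning on their own: their
# semantics is applied axiomatically by the interpreter and, ultimately,
# by the physics of the processor.
temperature = 25   # hypothetical input value
if temperature > 20:
    print("warm")  # this branch is chosen by rules external to the text
else:
    print("cold")
```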

This shows once again how reality is structured in functional layers, which can be seen as independent entities by means of axiomatic processing, as analyzed in previous posts.

But this issue, as well as the analysis of the existence of linguistic contradictions, will be addressed in later posts.