p251

Go’s understanding of protein folding (footnote 38)

His understanding was far ahead of the studies , e.g., in the US. The following summary is taken form Integrative Natural History entries 6.5.3 and 6.5.4.

6.5.3 Levinthal paradox and its structural resolution 

The essence of the Levinthal paradox is: if a target state is a very small subset of a phase space (high-dimensional search space), serial search is ineffective. The problem of obtaining the native structure from the primary sequence is a NP hard problem [1]. This implies that

(i) Biologically relevant polypeptides (= proteins) are special molecules.

(ii) Special proteins must at least be capable of avoiding serial search for its native state.

Zwanzig and coworkers [2] thought the problem is similar to evolution through accumulating small changes as discussed by Dawkins: how long will a random search take to produce Hamlet’s remark “Methinks it is like a weasel”? The example is a mere local search and is not relevant to the protein folding problem; as Karplus stresses [3], the problem becomes NP complete due to the long range order. Thus, biased search

is not necessarily a solution. Actually, there is a good reason to believe that it is not in the present case.

 N. G¯o [4] seems to be the first to recognize the keys to the resolution of paradox as Karplus admits [3]. He conceived the Consistency Principle; the consistency between the short and long range interactions responsible for the stability of the native state. In real proteins no parts are sacrificed and both types of interactions are optimized at the same time. Bryngelson and Wolynes’ ‘principle of minimal frustration’ means essentially the same (ten years later). The avoidance of conflicts between short and long range interactions may be reflected in the distribution of the ends of proteins [5]. Also orderly arrangements of

residues in proteins seem to be the rule.

[1] J.T. Ngo and J. Marks, Computational complexity of a problem in molecular structure prediction, Protein Eng 5 313 (1992); R. Unger and J. Moult, Finding the lowest free energy conformation of a protein is an NP-hard problem: proof and implications, Bull. Math. Biol 55, 1183 (1993); A.S. Fraenkel, Complexity of protein folding, Bull. Math. Biol 55, pp. 1199 (1993); J.T. Ngo, J. Marks and M. Karplus, Computational complexity, protein structure prediction, and the Levinthal paradox. In: K.  Merz Jr and S. Le Grand, Editors,  The Protein Folding Problem and Tertiary Structure Prediction , Birkh¨auser, Boston, MA 6.5.4 Optimizational resolution of Levinthal paradox  (1994), pp. 435.

Relation to undecidability It should be noted that, if we follow the argument of C. Goodman-Strauss, “Can’t Decide? Undecided!” NAMS 57, 343 (2010), the protein folding problem PRO(M, n) [an instance of a polypeptide that can be folded by an algorithm M (you may regard it as a TM) by n steps] can be related to the following undecidability statement: For a given M, whether infinitely many instances of PRO(M, n) are folded is undecidable.

[2] R. Zwanzig, A. Szabo and B. Bagchi, “Levinthal’s paradox,” P  86 , 20 (1992).

[3] M. Karplus, “The Levinthal paradox: yesterday and today” Folding and Design 2 S69 (1997).

[4] Summarized in N. G¯o, “Physics and biology of protein folding,” PTP Supp 170, 198 (2007). The original is N. G¯o, Ann. Rev. Biophys. Bioeng. 12,183 (1983); N. G¯o and H. Abe, Adv. Biophys., 18, 149 (1984) [As Karplus et al., N 373, 665 (1995) admits. G¯o models are introduced in Go, N. & Abe, H., Noninteracting local-structure model of folding and unfolding transition in globular proteins. I. Formulation. Biopolymers 20, 991 (1981).

[5]  Ends of proteins are excluded from the core See Grosberg et al., “Statistics of Knots, Geometry of Conformations, and Evolution of Proteins,” PLoS Comp. Biol. 2, e45 (2006). Although some rather complex knots are identified, native conformations of proteins have statistically fewer knots than random compact loops, and the local geometrical properties, such as the crumpled character of the conformations at a certain range of scales, are consistent with the rarity of knots. Proteins are ‘easily stretchable’, i.e., both ends of the polypeptide chains are excluded from the core and placed peripherally. See Unger, “A

 ale of two tails: why are terminal residues of proteins exposed?,” Bioinformatics 23, e225 (2007): For a single domain proteins 80% termini are outside, and they are much more distant from the center of mass of their proteins than other residues. 

6.5.4. Optimizational resolution of Levinthal paradox

The serial search of the native state takes eons simply because the folding problem is of combinatorial nature. Therefore, G¯o concludes [1] that, in the process of folding, some partially folded structures must be formed which in a later stage of folding work as structural units. As he points out, this is corroborated by the fair success of various methods of prediction of secondary structures in globular proteins. “Because all of the methods of secondary structure prediction are based on information about short range interactions, their air success appears to mean that the local and global structures do not interfere very much” (this does

not mean independent; this is the G¯o’s consistency principle) [1.4]. This allows parallel search of high-dimensional space.

 As is well known, Levinthal himself suggested that there was a virtually unique folding path (but this is a naive understanding of his idea as seen in [1.5]). However, to find such a path (or its starting point) [2] in the vast conformation space is just as difficult as to find the folded state. Thus, this proposal actually does not solve the difficulty, but simply relocates it. The same criticism applies to the nucleation theory and transition state theories; these require realization of a special state that is confined in a extremely small domain of the conformation space. Thus, the paradox is only shifted from the final folded state to some

other (precursor) states [3]. What G¯o suggested above is that decoupling of global and local structural organization allows reduction of the search space dimension considerably. This is the essence of parallel search. This reduction of dimension is the key element of the so-called funnel.

[1] Summarized in N. G¯o, “Physics and biology of protein folding,” PTP Supp 170, 198 (2007).

[1.4] Barry Honig says, “Indeed global hydrophobic collapse in the absence of secondary structure formation has not to my knowledge been observed.” in the paper quoted in [1.5]. Sometimes the folding nucleus has been described in terms of local secondary structure stabilized by tertiary interactions [Fersht, A. R. Nucleation mechanisms in protein folding,Curr. Opin. Struct. Biol., 7, 3 (1997); Fersht has used the term nucleation-condensation to describe this type of process and distinguishes it from classical nucleation in which structure grows from a strong localized nucleus.].

[1.5] This is the usual description, but actually his idea is not this simple: B. Honig writes [B. Honig, “Protein Folding: From the Levinthal Paradox to Structure Prediction,” JMB 293 283 (1999)], “Levinthal and I wrote [in Honig, B., Ray, A. & Levinthal, C. (1976). Conformational flexibility and protein folding: rigid structural fragments connected by flexible joints in subtilisin BPN. P 73, 1974 (1976).] that “We assume that proteins fold by following a multiply branched pathway in which the first stage is the

formation of local secondary structure governed by interactions that are near each other in the peptide chain. Subsequently, these structures, such as a-helices and antiparallel β-strands, would interact, perhaps being modified in the process, to produce larger structural fragments which then undergo further assembly to yield the native conformation.” “The phrase was intended to imply that even though there is no fixed sequence of events in folding, there is a general order of events that can be described

in a phenomenological sense in terms of the progressive accumulation of tertiary structure from smaller fragments.”

[2] Notice that a 1D manifold in high dimensional space is still extremely hard to locate, because the dimensionality of the search space is reduced only by one.

[3] Nucleation process corresponds to finding a small codimensional space, so one might say this objection cannot be applied. In the case of crystallization, once the problem is restricted to a subspace with a small codimension thanks to the formation of a nucleus, no further search is required. That is why nucleation is decisive; the secondary nucleation process is a mere low-dimensional local search. However, for general proteins, the folding process is not a secondary nucleation process, and another search in the remaining still extremely high dimensional space is required.