Space functions of groups

We consider space functions $s(n)$ of finitely presented groups $G = \langle A \mid R\rangle$. (These functions have a natural geometric analog.) To define $s(n)$ we start with a word $w$ over $A$ of length at most $n$ equal to 1 in $G$ and use relations from $R$ for elementary transformations to obtain the empty word; $s(n)$ bounds from above the tape space (or computer memory) one needs to transform any word of length at most $n$ vanishing in $G$ to the empty word. One of the main results is the following criterion: a finitely generated group $H$ has decidable word problem of polynomial space complexity if and only if $H$ is a subgroup of a finitely presented group $G$ with a polynomial space function.


Introduction
Time and space complexities are the main properties of algorithms. Their counterparts in Group Theory are the Dehn and filling length (or space) functions of finitely presented groups. In this paper, we study the interrelation of space functions of groups and the space complexity of the algorithmic word problem in groups.
Let $G = \langle A \mid R\rangle$ be a group presentation, where $A$ is a set of generators and $R$ is a set of defining relators. Recall that relators belong to the free group with basis $A$, and a group word $w$ in generators $A$ (i.e., a word over $A^{\pm 1}$) represents the identity of $G$ iff there is a derivation
$$w \equiv w_0 \to w_1 \to \dots \to w_{t-1} \to w_t \equiv 1 \qquad (1.1)$$
where 1 is the empty word, the sign $\equiv$ is used for letter-by-letter equality of words, and for every $i = 1, \dots, t$, the word $w_i$ results from $w_{i-1}$ after application of one of the elementary $R$-transformations.
As such transformations one can take free reductions of subwords $aa^{-1} \to 1$ ($a \in A^{\pm 1}$), removing subwords $r^{\pm 1}$, where $r \in R$, and the inverse transformations. The minimal non-decreasing function $f(n)\colon \mathbb{N} \to \mathbb{N}$ such that for every word $w$ vanishing in $G$ and having length $\|w\| \le n$, there exists a derivation (1.1) with $t \le f(n)$, is called the Dehn function of the presentation $G = \langle A \mid R\rangle$ [13]. For finitely presented groups (i.e., both sets $A$ and $R$ are finite) Dehn functions are usually taken up to equivalence to get rid of the dependence on a finite presentation for $G$ (see [18]). To introduce this equivalence $\sim$, we write $f \preceq g$ if there is a positive integer $c$ such that
$$f(n) \le c\,g(cn) + cn \quad \text{for any } n \in \mathbb{N}. \qquad (1.2)$$
For example, we say that a function $f$ is polynomial if $f \preceq g$ for a polynomial $g$. From now on we use the following equivalence for non-decreasing functions $f$ and $g$ on $\mathbb{N}$:
$$f \sim g \quad \text{if both } f \preceq g \text{ and } g \preceq f. \qquad (1.3)$$
It is not difficult to see that the Dehn function $f(n)$ of a finitely presented group $G$ is recursive (or bounded from above by a recursive function) iff the word problem is algorithmically decidable for $G$ (see [11], [6]). In this case, the word problem can be solved by a primitive algorithm that, given a word $w$ of length $n$, just checks if there exists a derivation (1.1) of length $\le f(n)$. Therefore the nondeterministic time complexity of the word problem in $G$ is bounded from above by $f(n)$. Moreover, if $H$ is a finitely generated subgroup of $G$, then one can use the rewriting procedure (1.1) for $H$, and so the nondeterministic time complexity of the word problem for $H$ is also bounded by $f(n)$.
It turns out that a converse statement is also true. Assume that the word problem can be solved in a finitely generated group $H$ by a nondeterministic Turing machine (NTM) with time complexity $T(n)$. Then $H$ is a subgroup of a finitely presented group $G$ with Dehn function equivalent to $n^2 T(n^2)^4$ [4]. As the main corollary, one concludes that the word problem of a finitely generated group $H$ has time complexity of class NP (i.e., there exists a non-deterministic algorithm of polynomial time complexity that solves the word problem for $H$) iff $H$ is a subgroup of a finitely presented group with polynomial Dehn function.
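The relation (1.2) can be explored numerically. The following is a minimal sketch, not part of the paper: the helper names `dominated` and `find_constant` are hypothetical, and a finite search can only give evidence for or against $f \preceq g$ on the tested range, never a proof.

```python
def dominated(f, g, c, n_max):
    """Return True if f(n) <= c*g(c*n) + c*n for every n in 0..n_max."""
    return all(f(n) <= c * g(c * n) + c * n for n in range(n_max + 1))


def find_constant(f, g, c_max, n_max):
    """Smallest c <= c_max satisfying (1.2) on the tested range, else None."""
    for c in range(1, c_max + 1):
        if dominated(f, g, c, n_max):
            return c
    return None


def quadratic(n):
    return 3 * n * n + 5 * n


def square(n):
    return n * n


# 3n^2 + 5n and n^2 dominate each other on the range, so they look equivalent.
c_up = find_constant(quadratic, square, 10, 500)
c_down = find_constant(square, quadratic, 10, 500)

# The summand cn in (1.2) absorbs linear growth: n is dominated by a constant.
c_linear = find_constant(lambda n: n, lambda n: 1, 10, 500)

# An exponential is not dominated by n^2 for any tested constant.
c_exp = find_constant(lambda n: 2 ** n, square, 10, 500)
```

The third call illustrates why functions taken up to this equivalence are regarded as having at least linear growth.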
We want to obtain similar statements for space functions. It is clear that to perform the rewriting (1.1) one needs space equal to $\max_{0 \le i \le t} \|w_i\|$, and this observation leads to the definition of the space function of a finitely presented group. However, when handling groups, one can enlarge the set of elementary $R$-transformations and obtain different definitions of space functions. One can either (1) consider only the transformations we defined above and obtain the filling length function introduced by Gromov [13] (see also [12], [3]); or (2) also allow replacement of words by their cyclic permutations, as was suggested by Bridson and Riley [7]; or (3) additionally use the replacement of a word $w \equiv uv$ by the pair $(u, v)$ if both $u$ and $v$ are trivial in $G$ (see [7] again). Starting with these different sets of transformations one comes to different space functions called in [7], respectively, the filling length function (FL), the free filling length function (FFL), and the fragmenting free filling length function (FFFL). Each of these functions has a visual geometric interpretation in terms of the transformations of loops in the Cayley complex of $G$ using, respectively, null-homotopy, free null-homotopy, and free null-homotopy with bifurcations. It is proved in [7] that these functions behave differently for the same finitely presented group $G$; for instance, FFFL can grow linearly while FL and FFL have exponential growth. There are many other features of these functions presented in [7] to justify their "inclusion in the pantheon of filling invariants".
In this paper, we choose the third version (FFFL), and this choice is justified by the theorems on the connections between such functions and the space complexity of the word problem for groups.$^1$ Thus we operate with finite sequences of words $W = (w_1, \dots, w_s)$ over a group alphabet $A$. Given a finitely presented group $G$, we say that a finite sequence $W' = (w'_1, \dots, w'_{s'})$ results from $W$ after application of an elementary $R$-transformation if $s' \in \{s-1, s, s+1\}$ and one of the following is done for some $w_i$ ($i = 1, \dots, s$):
• a subword $aa^{-1}$ is removed from or inserted into $w_i$ ($a \in A^{\pm 1}$);
• a subword $r$ or $r^{-1}$ is removed from or inserted into $w_i$ ($r \in R$);
• $w_i$ is replaced by a cyclic conjugate;
• $w_i \equiv uv$, and $w_i$ is replaced by the pair $u, v$, i.e. $W' = (w_1, \dots, w_{i-1}, u, v, w_{i+1}, \dots, w_s)$;
• $w_i$ is removed if it is empty, i.e. $W' = (w_1, \dots, w_{i-1}, w_{i+1}, \dots, w_s)$.
Clearly, we have w = 1 in the group G iff there exists an R-rewriting starting with (w) and ending with the empty string ( ).
For every finite sequence $W = (w_1, \dots, w_s)$ we set $\|W\| = \sum_{i=1}^{s} \|w_i\|$. By definition, the space of a rewriting $W_0 \to \dots \to W_t$ is $\max_{j=0}^{t} \|W_j\|$. If a word $w$ vanishes in $G$, then $\mathrm{space}(w) = \mathrm{space}_G(w)$ is the minimum of spaces of all rewritings starting with $(w)$ and ending with the empty string. The space function of the group presentation $G = \langle A \mid R\rangle$ (or briefly, of the group $G$) is the function
$$S_G(n) = \max\{\mathrm{space}(w) : w = 1 \text{ in } G,\ \|w\| \le n\}.$$
$^1$ An embedding statement for the FL-functions was conjectured by J.-C. Birget in [3].
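The elementary transformations and the space of a rewriting can be sketched in a few lines. This is only an illustration under stated assumptions: the group is taken to be $\mathbb{Z}^2 = \langle a, b \mid aba^{-1}b^{-1}\rangle$, uppercase letters stand for inverse generators, and all function names are hypothetical. The function `split` performs no triviality check on the two parts; in a rewriting that ends with the empty string both parts are necessarily trivial.

```python
def inv(word):
    """Inverse of a word; 'A' denotes the inverse of 'a'."""
    return word[::-1].swapcase()


def is_removable(sub, relators):
    """A cancelling pair x x^{-1}, a relator, or the inverse of a relator."""
    if len(sub) == 2 and sub[0] == sub[1].swapcase():
        return True
    return sub in relators or inv(sub) in relators


def remove_subword(W, i, pos, sub, relators):
    """Remove the occurrence of sub at position pos of the word W[i]."""
    w = W[i]
    assert w[pos:pos + len(sub)] == sub and is_removable(sub, relators)
    return W[:i] + (w[:pos] + w[pos + len(sub):],) + W[i + 1:]


def split(W, i, k):
    """Replace W[i] = uv by the pair u, v with |u| = k (fragmentation)."""
    w = W[i]
    return W[:i] + (w[:k], w[k:]) + W[i + 1:]


def drop_empty(W, i):
    """Remove the word W[i] if it is empty."""
    assert W[i] == ""
    return W[:i] + W[i + 1:]


def norm(W):
    """||W||: the sum of the lengths of the words of W."""
    return sum(len(w) for w in W)


def space(rewriting):
    """Space of a rewriting: the maximum of ||W_j|| over its members."""
    return max(norm(W) for W in rewriting)


R = {"abAB"}

# A rewriting of w = a (a b a^-1 b^-1) a^-1 to the empty sequence.
W0 = ("aabABA",)
W1 = remove_subword(W0, 0, 1, "abAB", R)
W2 = remove_subword(W1, 0, 0, "aA", R)
W3 = drop_empty(W2, 0)
first = [W0, W1, W2, W3]

# A rewriting that uses fragmentation: w = r r^{-1} is split into two words.
V0 = ("abABbaBA",)
V1 = split(V0, 0, 4)
V2 = remove_subword(V1, 0, 0, "abAB", R)
V3 = drop_empty(V2, 0)
V4 = remove_subword(V3, 0, 0, "baBA", R)
V5 = drop_empty(V4, 0)
second = [V0, V1, V2, V3, V4, V5]
```

Both rewritings only shrink their words, so their spaces equal the lengths of the starting words. The space of a word is the minimum over all rewritings, which this sketch does not search for.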
The space functions of finitely presented groups will be regarded up to the equivalence defined by (1.2) and (1.3), and so their growth will be at least linear. It is observed in [7] that the equivalence class of $S_G$ does not depend on a finite presentation of the group $G$; moreover, this class is invariant under quasi-isometries.
An accurate definition of the space complexity (function) $f(n)$ for a Turing machine (TM) will be recalled in Subsection 2.1. Now we just note that it is conventional that for a multi-tape TM, the function $f(n)$ counts only the space of work tapes. However, since the space complexities of machines are taken here up to the same equivalence as the space functions of groups, adding the space of the input tape does not change the equivalence class.
The sequence $W_0 \to \dots \to W_t$ can be easily produced by an NTM such that the computation needs at most $2\max_{i=1}^{t} \|W_i\| + \mathrm{const}$ tape squares. (See also Section 3 of [27] or Remark 2.4 in [7].) This immediately implies
Proposition 1.1. The space function of a finitely presented group $G$ is equivalent to the space complexity of a non-deterministic two-tape TM. The language accepted by this machine coincides with the set of words equal to 1 in the group.
In particular, the non-deterministic space complexity of the word problem in a finitely presented group $G$ does not exceed the space function of $G$. It follows from [18], [8] that the literal converse statement fails. Moreover, a counter-example can be given by Baumslag's 1-relator group [1] $G = \langle a, b \mid (aba^{-1})b(aba^{-1})^{-1} = b^2\rangle$, because the space function of $G$ is not bounded from above by any multiexponential function (see papers of S. Gersten [10] and A. Platonov [25]) while the space complexity of the word problem for $G$ is at most exponential. This was proved by M. Kapovich and Schupp (unpublished); moreover, it is polynomial (announced by A. G. Myasnikov, A. Ushakov, and Dong Wook Won). Therefore the correct formulation has to take into consideration that the algorithm from Proposition 1.1 solves the word problem not only for $G$ but also for every finitely generated subgroup of the group $G$. In the deterministic case we get a sharper formulation:
Theorem 1.2. Let $H$ be a finitely generated group such that the word problem for $H$ is decidable by a deterministic TM (DTM) with space complexity $f(n)$. Then $H$ is a subgroup of a finitely presented group $G$ with space function equivalent to $f(n)$.
One can choose the group G so that f (n) is also equivalent to the logarithm of the Dehn function of G.
The main corollary of this theorem applies to polynomial space complexity. We say that a finitely generated group $G$ belongs to the class PSPACE (to NPSPACE) if the word problem for $G$ is decidable by some DTM (some NTM) with polynomial space complexity. But NPSPACE = PSPACE by the remarkable theorem of Savitch (see [9], Corollary 1.31). (Since there is no similar equality for deterministic time complexity, it therefore has no natural algebraic counterpart.) Therefore, in contrast to the main result of [4], Proposition 1.1 and Theorem 1.2 eliminate any non-determinism in the following statement.
Corollary 1.3. A finitely generated group $H$ belongs to PSPACE iff $H$ is a subgroup of a finitely presented group $G$ having polynomial space function.
Thus, given a 'good' algorithm solving the word problem in $H$ (e.g., using a matrix representation of $H$, etc.), it is possible to find a bigger group $G$ whose deterministically modified ('silly') natural algorithm solves the word problem for both $H$ and $G$, and whose space complexity is not much worse than the space complexity of the original algorithm. Note that most common finitely generated groups belong to PSPACE, with a wide range from linear groups (two-tape machines need just logspace on the work tape to solve their word problem [16]) to free Burnside groups of large odd exponents (N. Boatman, unpublished).
Another natural question raised in this paper is the realization problem: which functions $f(n)\colon \mathbb{N} \to \mathbb{N}$ are, up to equivalence, the space functions of finitely presented groups? There are not many examples; linear and exponential ones can be found in [7], but it is not easy even to specify a group with space function $n^2$.
Theorem 1.4. The space complexity $f(n)$ of an arbitrary DTM $M$ is equivalent to the space function of some finitely presented group $G$.
In addition, one can choose the group $G$ so that $f(n)$ is also equivalent to the logarithm of the Dehn function of $G$, and there is a one-to-one linear time reduction of the decision problem of $M$ to the word problem of $G$.
This theorem gives a tremendous class of space functions of groups, including functions equivalent to $[\exp \sqrt{n}]$, $[n^k]$ ($k \in \mathbb{N}$), $[n^k \log^l n]$, $[n^k \log^l (\log\log n)^m]$, etc. Note that we do not assume in the formulation of Theorem 1.4 that the function $f(n)$ is superadditive (i.e., $f(m+n) \ge f(m) + f(n)$) or grows sufficiently fast. (Compare with Theorem 1.2 of [27] on Dehn functions of groups.) It follows, in particular, that there exists a finitely presented group whose space function is not equivalent to any superadditive function.
Recall that it is unknown if the Dehn function of an arbitrary finitely presented group is equivalent to a superadditive function; see [15]. J.-C. Birget observed (email communication) that, in addition, Theorem 1.4 implies a statement that does not involve space functions of groups (see also Remark 5.9). The first claim of Corollary 1.5 below follows, in fact, from each of the papers [28] (by B. A. Trakhtenbrot) and [29] (by M. K. Valiev). Let us start with a DTM $M$ solving a PSPACE-complete problem (for the definition and the existence see [9], Theorem 3.23). Then choose a finitely presented group $G = G(M)$ in accordance with Theorem 1.4, i.e., $G$ has a polynomial space function. By Proposition 1.1, $G$ belongs to NPSPACE = PSPACE, and so we have
Corollary 1.5. There is a finitely presented group $G$ with PSPACE-complete word problem. Moreover, the group $G$ can be chosen with polynomial space function.
Theorem 1.2 implies a non-deterministic corollary. To formulate it we recall that a function $f\colon \mathbb{N} \to \mathbb{N}$ is called fully space-constructible (FSC) if there exists a two-tape DTM that halts on any input $x$ of length $n$ after visiting exactly $f(n)$ tape squares of the work tape. Most common functions are FSC (see [9]).
Corollary 1.6. Let $H$ be a finitely generated group such that the word problem for $H$ is decidable by an NTM having FSC space complexity $f(n)$. Then $H$ is a subgroup of a finitely presented group $G$ with space function equivalent to $f(n)^2$.
Finally, we describe the functions $n^\alpha$ which are (up to equivalence) space functions of groups. Our approach is to modify the proof of Savitch's theorem from [9] and the proof from [27], where the similar problem was considered for Dehn functions if $\alpha \ge 4$, and close necessary and sufficient conditions were obtained. (See also a dense series of examples with $\alpha \ge 2$ presented by Brady and Bridson [5].) Now we have $\alpha \ge 1$ in Corollary 1.7 below. Also it is remarkable that for space functions the necessary and sufficient conditions just coincide. To formulate the criterion, we call a real number $\alpha$ computable with space $\le f(m)$ if there exists a DTM which, given a natural number $m$, computes a binary rational approximation of $\alpha$ with an error $O(2^{-m})$, and the space of this computation is $\le f(m)$.
We have obtained the following criterion.
Corollary 1.7. For a real number $\alpha \ge 1$, the function $[n^\alpha]$ is equivalent to the space function of a finitely presented group iff $\alpha$ is computable with space $\le 2^{2^m}$.
It follows that the functions $[n^\pi]$, $[n^{\sqrt{e}}]$, and $[n^\alpha]$ with any algebraic $\alpha \ge 1$ are all space functions of finitely presented groups. The space function is defined for a simply connected geodesic metric space under some weak restrictions, in particular, for the universal cover of any closed connected Riemannian manifold. (See [7] for details; we just note here that to define the space (= FFFL) function, one should consider free homotopy with the possibility of separating a loop into two loops in a bifurcation point.) It is proved in [7] (Theorem E) that if a finitely presented group $G$ acts properly and cocompactly by isometries on such a space $X$, then the space function of $G$ is equivalent to the space function of $X$. Since every finitely presented group is a fundamental group of a connected compact Riemannian manifold, we can use Theorem 1.4 and Corollary 1.7 to formulate one more
Corollary 1.8. For every space complexity $f(n)$ of a DTM, there exists a closed connected Riemannian manifold $M$ such that the space function of the universal cover $\tilde{M}$ is equivalent to $f(n)$.
In particular, if a real number $\alpha \ge 1$ is computable with space $\le 2^{2^m}$, then there exists such a universal cover $\tilde{M}$ with space function equivalent to $n^\alpha$.
To some extent, our constructions can be traced back to the works of P. Novikov, Boone, Britton and other authors who invented group-theoretical interpretations of TMs (see [26], ch. 12). The hub relator is a multiple copy of the accept configuration of a machine. Then we use the language of van Kampen diagrams and construct a disc diagram for an arbitrary accepting computation. A computational disc has the hub cell in the center, and this hub is surrounded by a number of similar sectors. At first we have to estimate the sizes of computational discs, and to do this we should know the generalized space complexity of a machine. It estimates the space of computations starting with an arbitrary accepted configuration, not only with input ones. So we should modify the initial machine to be able to control the generalized space complexity. (The modification from [27] helps to control the time complexity but corrupts the space complexity.) The next modification is due to the symmetry of algebraic relations: since $u = v$ always implies $v = u$, the algebraic version of a machine $M$ always interprets the symmetrization of $M$. Thus we are concerned that the symmetrization preserves the basic characteristics, e.g. the accepted language and the space functions. We are able to do this only if the initial machine is deterministic or can be transformed to a deterministic machine under the control of basic properties. (The known symmetrization trick from [2] or [27] does not work here since it does not preserve the space complexity.) This causes the restrictions in the formulations of Theorem 1.2 and Corollary 1.6.
The interpretation problem for groups remains much harder than for semigroups even after modifying the machine because the group theoretic simulation can execute unforeseen computations with non-positive words. Boone and Novikov secured the positiveness of admissible configurations in discs with the help of an additional 'quadratic letter' (see [26], ch. 12), but this involves a difficult control of parameters for the constructed group. A new approach was suggested in [27]. Invented by Sapir, S-machines can work with non-positive words on the tapes. Here we also construct an S-machine which is a somewhat modified composition of a convenient Turing machine with an 'adding machine' $Z(A)$ introduced in [24]. Fortunately, $Z(A)$ does not change the space of computations but controls positiveness of configurations.
Recall that we should not just simulate the work of a machine but construct an embedding of given group H into a finitely presented group G in the spirit of the Higman Embedding Theorem. (See [26], ch.13). To this end we use a version of the two-disc scheme applied in [23]. One can say more simply that the configurations on the boundary of discs of the first type are longer than the words written on the boundary of corresponding discs of the second type, and the surpluses are relations of the group H. Since every relation of H holds in G, we obtain a homomorphism H → G that turns out to be injective.
To estimate the space function of the group $G$ we use van Kampen diagrams and induct on the number of hubs. The basic problem is to cut a diagram $\Delta$ having at least two hubs into two subdiagrams with hubs, so that the perimeters of the subdiagrams do not exceed the perimeter of $\Delta$. We have to introduce a new metric where the length of a word depends on its syllable factorization. To find a shortcut we use the mirror symmetry of sectors in discs, but unfortunately for the two-disc scheme, one of the sectors has no (mirror) copies, which creates technical obstacles.$^2$ We study both exact and non-accurate copying for various types of complete and incomplete sectors. A number of concepts, e.g. bands, trapezia, discs, were introduced in algorithmic group theory early on, in papers [27], [20], [4] and subsequent ones, and we reproduce them in Section 3 (and partly in Sections 2 and 4) in a modern form. Some others (in particular, replica, unfinished diagram) are new.
In future work, we mean to consider similar problems for semigroups where the simulating of machines is easier, and where the definition of space function does not need cyclic shifts and fragmentation of words, i.e., the space functions are F L-functions. For groups, an F L-analogue of the results from this paper is still unachievable.

Machines
Suppose a DTM $M$ of space complexity $S(n)$ solves the word problem in a finitely generated group $H$. Then one can introduce a finitely presented group $G$ whose relations simulate the work of $M$ and construct a natural homomorphism $H \to G$. This will be done in Section 3. But as mentioned in the Introduction, one can prove neither that this homomorphism is injective nor that the space function of $G$ is bounded by $S(n)$, unless $M$ enjoys some additional properties. Therefore the machine $M$ has to be amended at the beginning. In this section we replace $M$ by a non-deterministic S-machine $S$, which is needed for the injectivity of the mapping $H \to G$, and prove that $S$ accepts the same language and has the same space complexity as the original $M$. Moreover, the main Lemma 2.10 of the section claims that the generalized space complexity of $S$ does not exceed its space complexity; this property will be exploited in Subsection 5.2 in order to get an upper estimate of the space function of $G$.

Definitions
We will use a model of recognizing TMs which is close to the model from [27]. Recall that a (multi-tape) TM with $k$ tapes and $k$ heads is a tuple $M = \langle X, Y, Q, \Theta, s_1, s_0\rangle$, where $X$ is the input alphabet, $Y = \sqcup_{i=1}^{k} Y_i$ is the tape alphabet, $Q = \sqcup_{i=1}^{k} Q_i$ is the set of states of the heads of the machine, $\Theta$ is a set of transitions (commands), $s_1$ is the $k$-vector of start states, and $s_0$ is the $k$-vector of accept states. ($\sqcup$ denotes the disjoint union.) The sets $X, Y, Q, \Theta$ are finite.
We assume that the machine normally starts working with states of the heads forming the vector $s_1$, with the head placed at the right end of each tape, and accepts if it reaches the state vector $s_0$. In general, the machine can be turned on in any configuration and turned off at any time.
A configuration of tape number $i$ of a TM is a word $uqv$ where $q \in Q_i$ is the current state of the head, $u$ is the word to the left of the head, and $v$ is the word to the right of the head. A tape is empty if $u, v$ are empty words.
A configuration $U$ of the machine $M$ is a word
$$U \equiv \alpha_1 U_1 \omega_1 \alpha_2 U_2 \omega_2 \dots \alpha_k U_k \omega_k,$$
where $U_i$ is the configuration of tape $i$, and $\alpha_i$, $\omega_i$ are special separating symbols. For unification of notation, we shall treat $\alpha_i$, $\omega_i$ as heads of the machine too. These heads correspond to tapes that are always empty and do not change during a computation. An input configuration is a configuration where all tapes except the first one are empty, the configuration of the first tape (let us call it the input tape) is of the form $uq$, $q \in Q_1$, $u$ is a word in the alphabet $X$, and the states form the start vector $s_1$. The accept configuration is the configuration where the state vector is $s_0$, the accept vector of the machine, and all tapes are empty. (The requirement that the tapes must be empty is often removed for auxiliary machines which are used in the construction of bigger machines.)
To every $\theta \in \Theta$, there corresponds a command (marked by the same letter $\theta$), i.e., a pair of sequences of words $[V_1, \dots, V_k]$ and $[V'_1, \dots, V'_k]$. In order to execute this command, the machine checks if $V_i$ is a subword of the configuration of tape $i$ for each $i \le k$, and if this condition holds the machine replaces $V_i$ by $V'_i$ for all $i = 1, \dots, k$. Therefore we also use the notation $\theta\colon [V_1 \to V'_1, \dots, V_k \to V'_k]$.
Suppose we have a sequence of configurations $w_0, \dots, w_t$ and a word $h \equiv \theta_1 \dots \theta_t$ in the alphabet $\Theta$, such that for every $i = 1, \dots, t$ the machine passes from $w_{i-1}$ to $w_i$ by applying the command $\theta_i$. Then the sequence $(w_0 \to w_1 \to \dots \to w_t)$ is said to be a computation with history $h$. In this case we shall write $w_0 \cdot h = w_t$. The number $t$ will be called the time or length of the computation.
A configuration $w$ is called accepted by a machine $M$ if there exists at least one computation which starts with $w$ and ends with the accept configuration. We do not consider only deterministic TMs; for example, we allow several transitions with the same left side. Moreover, for non-deterministic TMs, one may assign identical executions to different symbols $\theta, \theta' \in \Theta$.
A word $u$ in the input alphabet $X$ is said to be accepted by the machine if the corresponding input configuration is accepted. The set of all accepted words over the alphabet $X$ is called the language $L_M$ recognized by the machine $M$.
Let $|w_i|_a$ ($i = 0, \dots, t$) be the number of tape letters (or tape squares) in the configuration $w_i$. (As in [27], the tape letters are called $a$-letters.) Then the maximum of all $|w_i|_a$ will be called the space of the computation $C\colon w_0 \to w_1 \to \dots \to w_t$ and will be denoted by $\mathrm{space}_M(C)$. By $\mathrm{space}_M(w)$, we denote the minimal natural number $s$ such that there is an accepting computation of space at most $s$, starting with the configuration $w$. If $u \in L_M$, then, by definition, $\mathrm{space}_M(u)$ is the space of the corresponding input configuration $w$.
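The notions of command, computation with a given history, and space can be illustrated by a toy simulation. This is a simplified sketch, not the model of the paper: the separators $\alpha_i, \omega_i$ are omitted, uppercase letters are assumed to be state letters and lowercase letters tape letters, and the one-tape machine and all names below are hypothetical.

```python
def apply_command(config, command):
    """Apply a command [(V_1, V_1'), ..., (V_k, V_k')] to a configuration.

    A configuration is a tuple of tape words. The result is None if some
    V_i is not a subword of the configuration of tape i.
    """
    if any(v not in tape for tape, (v, _) in zip(config, command)):
        return None
    return tuple(tape.replace(v, v2, 1) for tape, (v, v2) in zip(config, command))


def run(config, history, commands):
    """Return the computation w_0 -> ... -> w_t with the given history."""
    computation = [config]
    for name in history:
        config = apply_command(config, commands[name])
        if config is None:
            raise ValueError(f"command {name} is not applicable")
        computation.append(config)
    return computation


def a_length(config):
    """|w|_a: the number of tape letters (lowercase here) in a configuration."""
    return sum(ch.islower() for tape in config for ch in tape)


def space_of(computation):
    """Space of a computation: the maximum of |w_i|_a over its configurations."""
    return max(a_length(w) for w in computation)


# Hypothetical one-tape machine: insert letters, switch state, erase letters.
commands = {
    "ins": [("P", "aP")],
    "switch": [("P", "Q")],
    "del": [("aQ", "Q")],
}

computation = run(("P",), ["ins", "ins", "switch", "del", "del"], commands)
```

Here the computation has length 5, and its space is attained at the two middle configurations, which carry two tape letters each.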
The number $S(n) = S_M(n)$ is the maximum of the numbers $\mathrm{space}_M(u)$ over all words $u \in L_M$ with $\|u\| \le n$. The function $S(n)$ will be called the space complexity of the Turing machine.
The definition of the generalized space complexity $S'(n) = S'_M(n)$ is similar to the definition of space complexity, but we consider arbitrary accepted configurations $w$ with $|w|_a = n$, not just input configurations as in the definition of $S(n)$. It is clear that $S(n) \le S'(n)$.
To obtain the definitions of $\mathrm{time}_M(w)$, $\mathrm{time}_M(u)$, time complexity $T_M(n)$ and generalized time complexity $T'_M(n)$, one should replace 'space' by 'time' in the previous definitions. One may assume that only input configurations involve the state letters from $s_1$ and only one command is applicable to the input configurations. (Indeed, given an NTM $M$, one can add additional states and add new commands changing the states from the new vector $s_1$.) Similarly, one may assume that there is a unique accept configuration $s_0$ and a unique accepting command. Under both of these assumptions, we will say that the machine satisfies the $s_{10}$-condition. These assumptions change neither the language $L$ nor the functions $S_M(n)$ and $S'_M(n)$.

Machines with equivalent space and generalized space complexities
In this subsection, we construct an NTM $M_2$ which depends on an NTM $M_1$, and prove Lemma 2.1, which allows us to replace the original machine $M_1$ by a machine $M_2$ inheriting the basic characteristics of $M_1$ and having equivalent generalized space complexity and space complexity. Let an NTM $M_1$ have $k$ tapes, and let its first tape be the input tape. Then we add a tape numbered $k+1$, which is empty for input configurations, and we organize the work of the 3-stage machine $M_2$ as sequential work of the following machines $M_{21}$, $M_{22}$, and $M_{23}$.
The machine $M_{21}$ uses only one command $\theta_*$ that does not change states and adds one square with an auxiliary letter $*$ to the $(k+1)$-st tape. The machine $M_{21}$ can execute this command arbitrarily many times while the tapes numbered $1, \dots, k$ keep the copy of an input configuration of $M_1$ unchanged. Then a connecting rule $\theta_{12}$ changes all states of the heads and switches on the machine $M_{22}$. The work of $M_{22}$ on the tapes with numbers $1, \dots, k$ copies the work of $M_1$. But the extension $\theta'$ of every command $\theta$ of $M_1$ to the $(k+1)$-st tape is defined so that its application does not change the current space. More precisely, if the application of $\theta$ inserts $m_1$ tape squares and deletes $m_2$ tape squares on $k$ tapes, then $\theta'$ inserts $m_2 - m_1$ (deletes $m_1 - m_2$) squares with letter $*$ on the $(k+1)$-st tape if $m_2 \ge m_1$ (respectively, if $m_1 > m_2$). Note that one cannot apply $\theta'$ if $m_1 - m_2$ exceeds the current number of squares on the tape numbered $k+1$.
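The bookkeeping on the $(k+1)$-st tape during the $M_{22}$ stage can be sketched as follows. This is an illustration only, under the assumption that each command is summarized by the pair $(m_1, m_2)$ of inserted and deleted squares on the first $k$ tapes; the function name `run_padded` is hypothetical.

```python
def run_padded(used, stars, steps):
    """Track the squares on tapes 1..k (used) and on tape k+1 (stars).

    Each step is a pair (m1, m2): the simulated command inserts m1 and
    deletes m2 squares on the first k tapes. Its extension compensates on
    tape k+1, so the total number of squares never changes.
    """
    trace = [(used, stars)]
    for m1, m2 in steps:
        delta = m1 - m2
        if delta > stars:
            # The extended command needs delta squares of tape k+1 to delete.
            raise ValueError("extended command is not applicable")
        used += delta
        stars -= delta
        trace.append((used, stars))
    return trace


# Three squares of input and two squares prepared by the first stage.
trace = run_padded(3, 2, [(1, 0), (1, 0), (0, 2)])
totals = {used + stars for used, stars in trace}

# With only one prepared square the second insertion cannot be simulated.
try:
    run_padded(3, 1, [(1, 0), (1, 0)])
    blocked = False
except ValueError:
    blocked = True
```

The sketch shows why the first stage must prepare enough squares in advance: the second stage can never enlarge the total space.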
The connecting command $\theta_{23}$ is applicable when $M_{22}$ obtains the accept configuration on the first $k$ tapes. It changes the states and switches on the machine $M_{23}$, which erases all squares on the $(k+1)$-st tape (one by one), and then $M_2$ accepts. (The tape alphabet of $M_{23}$ has only one letter $*$.) Let $w$ be a configuration of the machine $M_2$ such that $w \cdot \theta_*$ is defined, or such that $w$ is obtained after an application of the connecting command $\theta_{12}$. Then we have an input configuration on the tapes with numbers $1, \dots, k$ (plus several $*$-s on the $(k+1)$-st tape). We will denote by $u(w)$ the input word $u$ written on the first tape. It is an input word for the machine $M_1$ as well, and if it is accepted by $M_1$, the expression $\mathrm{space}_{M_1}(u(w))$ makes sense.
The connecting commands $\theta_{12}$ and $\theta_{23}$ are not invertible in $M_2$ by definition. Therefore every nonempty accepting computation of $M_2$ has history of the form $h_1 h_2 h_3$, or $h_2 h_3$, or $h_3$, where $h_l$ is the history for $M_{2l}$ ($l = 1, 2, 3$). (To simplify notation we attribute the command $\theta_{12}$ (the command $\theta_{23}$) to $h_2$ (to $h_3$).)
(c) If $w$ is an accepted configuration for $M_2$, and the command $\theta_*$ is applicable to $w$, then $\mathrm{space}_{M_2}(w) = \max(\mathrm{space}_{M_1}(u(w)), |w|_a)$.
Proof. Assume that $u \in L = L_{M_1}$. Then $u \in L' = L_{M_2}$ because the machine $M_{21}$ can insert sufficiently many squares (equal to $\mathrm{space}_{M_1}(u) - \|u\|$) so that the accepting computation of $M_1$ can be simulated by $M_{22}$. Also it is clear from the definition of $M_2$ that every accepting computation for $M_2$ having a history $h_1 h_2 h_3$ as above simulates, at stage 2, an accepting computation of $M_1$ with history $h_2$. Therefore $L' = L$ and $S_{M_1}(n) = S_{M_2}(n)$.
Assume now that $C\colon w = w_0 \to \dots \to w_n$ is an accepting computation of $M_2$ with $\mathrm{space}_{M_2}(C) = \mathrm{space}_{M_2}(w)$ and $h \equiv h_1 h_2 h_3$ is the history with the above factorization ($h_1$ or $h_1 h_2$ can be empty here). If the word $h_1$ is empty, then $\|w_0\| \ge \dots \ge \|w_n\|$ by the definition of the machines $M_{22}$ and $M_{23}$. Hence the space of this computation is equal to $|w|_a$.$^3$ Then let $h_1$ be non-empty. It follows that the machine $M_2$ starts working with a copy of an input configuration of the machine $M_1$, i.e., the input tape of this configuration contains an input word $u = u(w)$, and the additional $(k+1)$-st tape has $m$ squares for some $m \ge 0$. Moreover, $u \in L$ since the computation of $M_2$ is accepting. We consider two cases.
Case 1. Suppose $m \ge \mathrm{space}_{M_1}(u) - \|u\|$. This inequality says that the additional tape has enough squares to enable $M_{22}$ to simulate the accepting computation of $M_1$ with the input word $u$. Hence there is an $M_2$-computation $w_0 \to \dots \to w_{n'}$ with history of the form $h'_2 h'_3$, and so its space, as well as the space of our original accepting computation, is $|w|_a$.
Case 2. Suppose $m < \mathrm{space}_{M_1}(u) - \|u\|$. Then there is a computation $w_0 \to \dots \to w_{n'}$ such that the commands of its $M_{21}$-stage insert squares until the total number of squares of the $(k+1)$-st tape becomes equal to $\mathrm{space}_{M_1}(u) - \|u\|$, and then the machines $M_{22}$ and $M_{23}$ work in their standard manner. The space of this (and the original) computation is $\mathrm{space}_{M_1}(u)$.
The estimates obtained in Cases 1 and 2 prove statement (c) of the lemma. They also complete the proof of statement (b).

Symmetric machines
A simulation of the work of a machine $M$ by algebraic relations leads, in fact, to the simulation of the machine $M^{sym}$ capable of inverting every command of $M$. Therefore we must control the basic properties of such symmetrization, and this will be done in Lemma 2.4.
For every command $\theta$ of a TM, given by the vector $[V_1 \to V'_1, \dots, V_k \to V'_k]$, the vector $[V'_1 \to V_1, \dots, V'_k \to V_k]$ also gives a command $\theta^{-1}$ of some TM. These two commands $\theta$ and $\theta^{-1}$ are called mutually inverse.
From now on we will assume that the machine M 1 we started with in Subsection 2.2 is a DTM and satisfies the s 10 -condition.
Since the machine $M_1$ is deterministic, the machine $M_2$ has no invertible commands at all. The definition of the symmetric machine $M_3 = M_2^{sym}$ is the following. Suppose $M_2 = \langle X, Y, Q, \Theta, s_1, s_0\rangle$. Then by definition, $M_2^{sym} = \langle X, Y, Q, \Theta^{sym}, s_1, s_0\rangle$, where $\Theta^{sym}$ is the minimal symmetric set containing $\Theta$, that is, with every command $\theta$ it also contains the inverse command $\theta^{-1}$; in other words, $\Theta^{sym} = \Theta^+ \sqcup \Theta^-$, where $\Theta^+ = \Theta$ and $\Theta^- = \{\theta^{-1} \mid \theta \in \Theta\}$.
$^3$ Here and in what follows we keep in mind that the difference $\|w_i\| - |w_i|_a$ is a constant for any computation.
A computation w 0 → · · · → w t of M 3 (or of another machine) is called reduced if its history is a reduced word. If the history h ≡ θ 1 . . . θ t contains a subword θ i θ i+1 , where the commands θ i and θ i+1 are mutually inverse, then obviously there is a shorter computation w 0 → · · · → w i−1 ≡ w i+1 → · · · → w t whose space does not exceed the space of the original computation.
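The passage from an arbitrary history to a reduced one is just free reduction in the alphabet of commands. The following Python sketch illustrates this under an encoding of our own choosing: a command is a pair `(name, sign)`, with sign +1 for a command from Θ + and −1 for its inverse from Θ − . The names `Command`, `inverse`, `reduce_history` and `is_reduced` are hypothetical and do not come from the text.

```python
from typing import List, Tuple

# Illustrative encoding (an assumption, not the paper's notation):
# (name, +1) is a positive command, (name, -1) is its inverse.
Command = Tuple[str, int]


def inverse(cmd: Command) -> Command:
    """Return the mutually inverse command."""
    name, sign = cmd
    return (name, -sign)


def reduce_history(history: List[Command]) -> List[Command]:
    """Cancel adjacent mutually inverse commands until none remain."""
    stack: List[Command] = []
    for cmd in history:
        if stack and stack[-1] == inverse(cmd):
            stack.pop()          # theta_i theta_{i+1} cancel each other
        else:
            stack.append(cmd)
    return stack


def is_reduced(history: List[Command]) -> bool:
    """A history is reduced if no cancellation is possible."""
    return reduce_history(history) == list(history)
```

Each cancellation removes two consecutive steps of the computation and never introduces a new configuration, which is why the space cannot grow.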
Lemma 2.2. Let w ≡ w 0 → w 1 → · · · → w t be an accepted reduced computation of the machine M 3 , and the command θ * be applicable to w. Then (a) the word u(w) belongs to the language L recognized by M 1 and (b) the space of this computation is at least space M1 (u(w)).
is a command of M 23 , then one can modify our accepted computation so that, for j > i + 1, every command θ(j) is a command of M 23 and ||w j || ≤ ||w j−1 ||. Hence we may assume that h has exactly one letter θ 23 followed by the commands of for i < s, and the subwords h i -s contain no connecting commands. We may also assume that the subword h 0 is empty since the command θ * does not change the subword u(w).
Since, by the s 10 -condition, only one command of the machine M 1 (and of its analog M 22 ) accepts, the last command of h s−1 is this unique command of M 22 , and so this last command is positive. Therefore if h s−1 contains a letter θ −1 , where θ is a command of M 22 , then h has a 2-letter subword θ −1 1 θ 2 , where both θ 1 and θ 2 are commands of M 22 . Hence there is a configuration w i such that both θ 1 and θ 2 are applicable to w i . This is impossible since the machine M 1 is deterministic and the history h s−1 is a reduced word. Therefore h s−1 is entirely the history of a computation of Then the word w is−j−1 has similar properties if h s−j−1 consists of the commands of M 21 or their inverses since these commands do not change the content of the tapes numbered 1, . . . , k. Otherwise the commands of h s−j−1 are commands of M 22 (and their inverses), and since this machine is deterministic, the word h s−j−1 has no subwords θ −1 1 θ 2 with positive θ 1 and θ 2 . Therefore we have h s−j−1 ≡ g ′ g ′′−1 , where both g ′ and g ′′ are (positive) histories of M 22 -computations. This implies the equality ( where W is−j−1 and W is−j are the input configurations for the machine M 1 with inputs u(w is−j−1 ) and u(w is−j ), respectively. (Here we use identical letters for the corresponding commands of M 1 and M 22 .) The machine M 1 is deterministic, and so the accepted computation for W is−j must look like W is−j → · · · → W is−j · g ′′ → . . . , and consequently, the configuration W is−j · g ′′ is accepted by M 1 . Therefore we can construct the accepted computation W is−j−1 → · · · → W is−j−1 · g ′ = W is−j · g ′′ → . . . for M 1 , and so the word u(w is−j−1 ) belongs to L, as desired.
The constructed accepted computation of M 1 is decomposed in two parts. It follows from the definition of M 22 that the space of the first part is majorized by the space of the The second part occurs in a deterministic accepted M 1 -computation with input u(w is−j ), and so, by the inductive hypothesis, the space of this part does not exceed the space of the Since w ≡ w i1 , the lemma is proved by induction on j.
Lemma 2.3. The machines M 1 and M 3 recognize the same language. The generalized space complexities S ′ M2 (n) and S ′ M3 (n) are equivalent.

Proof. We recall that every computation of the machine M 2 is also a computation of M 3 . Therefore the first statement follows from Lemmas 2.1 (a) and 2.2 (a).
To prove the second part, it suffices to prove that for every accepted configuration w of (1) Consider an accepted computation w ≡ w 0 → · · · → w t of M 2 whose space is equal to space M2 (w). If the first command of this computation is not a command of M 21 , then ||w 0 || ≥ ||w 1 || ≥ · · · ≥ ||w t ||, and therefore space M2 (w) = |w| a ≤ space M3 (w), so one can choose w ′ ≡ w. If the first command is a command of M 21 , then by Lemmas 2.1(c) and 2.2 (2) Now consider a reduced accepted computation w ≡ w 0 → · · · → w t of M 3 whose space is equal to space M3 (w). If the first command (or its inverse) is a command of M 23 , then the commands of the shortest accepted computation with minimal space just erase squares. Hence space M3 (w) = |w| a = space M2 (w), and we can choose w ′ equal to w. If the first command is a command of M sym 21 , then by Lemma 2.2 (a), the word w is accepted by M 2 . Since every command of M 2 is a command of M 3 , we have space M3 (w) ≤ space M2 (w), and again it suffices to set w ′ ≡ w.
Thus, we may assume that the first command of our computation (or its inverse) is a command of M 22 . Therefore the history h of this computation has a prefix h ′ h ′′ with non-empty h ′′ , where every command of h ′ is a command of M sym 22 and either every command of h ′′ (or the inverse command) is a command of M 21 or every command of h ′′ is a command of M 23 . In the latter case, we may assume that h ≡ h ′ h ′′ and ||w 0 || ≥ ||w 1 || ≥ · · · ≥ ||w t ||, and so space M3 (w) = |w| a ≤ space M2 (w), and w ′ ≡ w.
In the former case, we set w ′ = w · h ′ and note that ||w|| = ||w 1 || = · · · = ||w ′ || since the commands of the computation w → · · · → w ′ with history h ′ do not change the number of tape squares. In , as desired; and the lemma is proved.

Lemma 2.4. For every DTM M recognizing a language L and having space complexity S(n) (for every NTM M recognizing a language L and having FSC space complexity f (n)), there exists an NTM M ′ with the following properties.

1. The machine M ′ recognizes the language L.

2. M ′ is symmetric.

3. The space and the generalized space complexities of M ′ are equivalent to S(n) (respectively, are equivalent to f (n) 2 ).

For every command
, and so all these functions are equivalent. Finally, we modify M 3 to obtain property 4. For example, if for a one-tape machine we have a command aq → bq ′ , then we introduce a new state letter q ′′ and replace this command by two commands aq → q ′′ and q ′′ → bq ′ . It is easy to see that the obtained machine M ′ satisfies property 4 and inherits properties 1 − 3 from M 3 .
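The splitting of a command aq → bq ′ into aq → q ′′ and q ′′ → bq ′ can be sketched as follows. This is a minimal Python illustration for one-tape commands only, under an encoding we assume for the example: a command is a tuple `(a, q, b, q_new)` meaning aq → bq_new, with the empty string standing for "no tape letter". The function name `split_commands` and the naming scheme for the fresh states are hypothetical.

```python
from itertools import count
from typing import List, Tuple

# (a, q, b, q_new) encodes the command  a q -> b q_new ;
# "" means that no tape letter is read (or written). Assumed encoding.
Command = Tuple[str, str, str, str]


def split_commands(commands: List[Command]) -> List[Command]:
    """Replace every command touching two tape letters by two commands,
    each of which involves at most one tape letter."""
    used = {c[1] for c in commands} | {c[3] for c in commands}
    fresh = count(1)
    result: List[Command] = []
    for a, q, b, q_new in commands:
        if a and b:
            # Pick a state letter that does not occur anywhere else.
            mid = f"q_mid{next(fresh)}"
            while mid in used:
                mid = f"q_mid{next(fresh)}"
            used.add(mid)
            result.append((a, q, "", mid))      # a q    -> q''
            result.append(("", mid, b, q_new))  # q''    -> b q'
        else:
            result.append((a, q, b, q_new))
    return result
```

For a symmetric machine one would apply the same splitting to the positive commands and then take inverses, so that the resulting command set stays symmetric.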
If M is non-deterministic, then we first use that the function f (n) is FSC, and therefore, by Savitch's theorem ([9], Theorem 1.30), there exists a DTM M 1 accepting the same language L with space complexity equivalent to f (n) 2 . So the replacement of S(n) by f (n) 2 in the previous paragraph provides the proof of the non-deterministic version of the lemma.

S-machines
By Lemma 3.9, computations of a machine are faithfully represented by special van Kampen diagrams (trapezia) over the group G. However, this statement is true for S-machines but false for standard TMs.
Ordinary Turing machines work with positive words and they can see some letters on the tape near the position where the head is. The command executed by the machine depends not only on the state of the head but also on the letter(s) observed by the head. In contrast, S-machines introduced in [SBR] work with words in group alphabets and they are almost "blind", i.e., the heads do not observe the tape letters. But the heads can "see" each other if there are no tape letters between them. We will use the following precise definition of an S-machine as a rewriting system.
Let k be a natural number. Consider a language of admissible words. It consists of words of the form q 1 u 1 q 2 . . . u k q k+1 , where q i are letters from disjoint sets Q i (i = 1, . . . , k + 1), u i are reduced words in a group alphabet Y i (i.e., every letter a belongs to it iff the inverse letter a −1 does), and the sets Y = ⊔Y i and Q = ⊔Q i are finite. The letters from Q are called state letters, and the letters from Y are tape letters. Notice that in every admissible word, there is exactly one representative of each Q i and these representatives appear in this word in the order of the indices of Q i (i.e., unlike [24], we consider only the regular order of Q i -s in admissible words).
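The conditions defining an admissible word can be checked mechanically. The Python sketch below is only an illustration under assumptions of ours: a tape letter is a pair `(letter, exponent)` with exponent ±1, each group alphabet Y i is given by its set of positive letters, and the state letters and tape words are passed as two separate lists. The names `is_freely_reduced` and `is_admissible` are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Letter = Tuple[str, int]   # (tape letter, exponent +1 or -1); assumed encoding


def is_freely_reduced(word: Sequence[Letter]) -> bool:
    """True if the word contains no subword a a^-1 or a^-1 a."""
    return all(not (x[0] == y[0] and x[1] == -y[1])
               for x, y in zip(word, word[1:]))


def is_admissible(states: List[str],
                  tapes: List[List[Letter]],
                  Q: List[Set[str]],
                  Y: List[Set[str]]) -> bool:
    """Check that q_1 u_1 q_2 ... u_k q_{k+1} is an admissible word:
    the i-th state letter lies in Q_i, and every u_i is a reduced word
    in the group alphabet built on Y_i."""
    if len(states) != len(Q) or len(tapes) != len(Q) - 1 or len(Y) != len(tapes):
        return False
    if any(q not in Qi for q, Qi in zip(states, Q)):
        return False
    for word, Yi in zip(tapes, Y):
        if any(letter not in Yi or exp not in (1, -1) for letter, exp in word):
            return False
        if not is_freely_reduced(word):
            return False
    return True
```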
There is a finite set of commands (or rules) Θ. To every θ ∈ Θ, we associate two sequences of reduced words from the free group The words U i , V i satisfy the following restriction: (*) For every i = 1, ..., k + 1, the words U i and V i have the form Sometimes we will denote the rule θ by [U 1 → V 1 , ..., U k+1 → V k+1 ]. This notation contains no information about the sets Y i (θ). In most cases it will be clear what these sets are. In the S-machines used in this paper, the sets Y i (θ) will mostly be equal to either Y i or ∅. By default Y i (θ) = Y i . We will use the notation i for a part of a rule when the corresponding Y i (θ) is empty (a similar notation has been used in [22]). In particular, u i and u ′ i are empty words in this case.
To apply an S-rule θ to an admissible word W ≡ q 1 w 1 q 2 . . . w k q k+1 means to check if every w i is a word in the alphabet Y i (θ) and then, if W satisfies this condition, to simultaneously replace subwords U i by subwords V i (i = 1, . . . , k + 1). This replacement can be performed in the form followed by reducing the resulting word. The following convention is important in the definition of an S-machine: After every application of a rewriting rule, the word is automatically reduced. The reducing is not considered a separate step of an S-machine.
The definitions of computation, its history, input admissible words, the accept word, the language of admissible words, space of a computation, space and generalized complexities, time and generalized time complexities of an S-machine are similar to those for a T M . (One should replace the word "configuration" by "admissible word" in the definitions.) Although S-machines are usually highly non-deterministic, they are better adapted for simulating by finitely presented groups than ordinary T M s. On the other hand, it is mentioned in [27] that every symmetric NTM M can be viewed as an S-machine S(M ): just interpret the commands of the Turing machine as S-rules. (For example, the part of a rule of the form aq → bq ′ is interpreted as q → a −1 bq ′ , and the part of the form α j q → α j q ′ is interpreted by the pair α j ℓ → α j , q → q ′ .) Unfortunately, the language recognized by S(M ) is in general much bigger than the language recognized by M since M works with a positive tape alphabet only. Nevertheless the following statement is true: Every positive admissible word W of S(M ) is a configuration of the Turing machine M . Assume that a ruleθ of S(M ) corresponding to a command θ of M is applicable to this word W and the word W ·θ is positive. Recall that by property 4 of Lemma 2.4, θ involves at most one tape letter (e.g., it cannot replace a tape letter by a tape letter or have a part of the form aq → aq ′ ). Therefore the positiveness of both W and W ·θ implies that the application ofθ just coincides with the application of θ. The second statement of the lemma follows.
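The application of an S-rule, and the interpretation of a Turing command aq → bq ′ as q → a −1 bq ′ , can be illustrated by a short Python sketch. It rests on a simplification that we assume for the example only: every part of a rule has the form q → left · q_new · right, where `left` and `right` are tape words inserted next to the state letter, and `allowed` plays the role of the sets Y i (θ). All names (`free_reduce`, `apply_rule`, `Part`) are hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple

Letter = Tuple[str, int]            # (tape letter, exponent +1 or -1)
Word = List[Letter]
# One part of a rule: q -> left . q_new . right (assumed simplified form).
Part = Tuple[str, Word, str, Word]


def free_reduce(word: Sequence[Letter]) -> Word:
    """Freely reduce a tape word."""
    out: Word = []
    for letter, exp in word:
        if out and out[-1] == (letter, -exp):
            out.pop()
        else:
            out.append((letter, exp))
    return out


def apply_rule(states: List[str],
               tapes: List[Word],
               parts: List[Part],
               allowed: Optional[List[Set[str]]] = None):
    """Apply a rule to the admissible word q_1 w_1 q_2 ... w_k q_{k+1}.
    Returns (new_states, new_tapes), or None if the rule is not applicable."""
    if len(tapes) != len(states) - 1:
        raise ValueError("expected one tape word between consecutive state letters")
    if len(parts) != len(states) or [p[0] for p in parts] != list(states):
        return None
    if allowed is not None:
        # Every w_i must be a word in the alphabet Y_i(theta).
        for word, alphabet in zip(tapes, allowed):
            if any(letter not in alphabet for letter, _ in word):
                return None
    if parts[0][1] or parts[-1][3]:
        raise ValueError("no tape to the left of q_1 or to the right of q_{k+1}")
    new_tapes = [list(word) for word in tapes]
    for i, (_, left, _, right) in enumerate(parts):
        if left:
            new_tapes[i - 1] = new_tapes[i - 1] + list(left)
        if right:
            new_tapes[i] = list(right) + new_tapes[i]
    # The word is reduced automatically; this is not a separate step.
    return [p[2] for p in parts], [free_reduce(w) for w in new_tapes]


# The Turing command  a q -> b q1  read as the S-rule  q -> a^-1 b q1.
rule = [("s", [], "s", []), ("q", [("a", -1), ("b", 1)], "q1", [])]
on_a = apply_rule(["s", "q"], [[("a", 1)]], rule)   # tape "a": a is replaced by b
on_c = apply_rule(["s", "q"], [[("c", 1)]], rule)   # tape "c": result is not positive
```

The second call shows the phenomenon mentioned above: the S-rule is applicable where the Turing command is not, and it produces a non-positive word.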

Composition with an adding machine
On the one hand, S-machines are much better suited to simulation by group relations than ordinary Turing machines (and moreover, S-machines are themselves treated in [24] as HNN-extensions of a free group with basis Y ∪ Q). On the other hand, when a symmetric Turing machine M starts working as an S-machine (see Subsection 2.4), the language of (positive) accepted words can enlarge uncontrollably. Therefore we will consider a composition S of a symmetric Turing machine M and an auxiliary S-machine Z(A) from [24]. Fortunately, this composition is an S-machine which inherits the language, the space function and the generalized space function of M.
In [24], the main duty of Z(A) was the exponential slowing down of basic computations, while now we will mainly use the capacity of Z(A) (observed in Lemma 3.25 (2) [24]) to check whether an admissible word is positive or not.
The tape alphabet of Z(A) consists of an alphabet A ±1 and two copies A ±1 For every letter a ∈ A let a 1 , a 2 denote its copies in A 1 , A 2 . Thus, the set of state letters is The machine Z(A) has the following positive rules (a is an arbitrary letter from A below). The following comments explain the meanings of these rules.
Comment. The state letter p(1) moves left searching for a letter from A and replacing letters from A 1 by their copies in A 2 .
Comment. When the first letter a of A is found, it is replaced by a 1 , and p(1) turns into p(2).
Comment. The state letter p(2) moves toward R.
Comment. p(2) and R meet, and the cycle starts again.
Comment. If p(1) never finds a letter from A, then the cycle ends, and p(1) turns into p(3); p and L must stay next to each other in order for this rule to be executable.
Here we will not explicitly use the form of these rules and will rather formulate, in Lemma 2.6, the required properties of Z(A) obtained in [24]. Nevertheless the reader can see how this simple S-machine works if it 'agrees' to apply the rules in the order recommended in the comments. In particular, if the input word is La n p(1)R (n > 0) for some a ∈ A, then every maximal subcomputation ending with r 21 and having no other rules r 21 in its history, changes the word over Y 1 in the manner that one changes a binary n-digit number when one adds 1 to it. (Therefore the machine Z(A) is called adding.) However Lemma 2.6 is not so obvious since (1) the machine Z(A) is non-deterministic and (2) it works with arbitrary reduced tape words (while Turing machines deal with positive words only).
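The binary-counter behaviour can be seen in a small simulation of the canonical computation on the input La n p(1)R. The Python sketch below follows the comments only, since the explicit rules are not reproduced here, and it makes one assumption that the comments do not state: while p(2) moves toward R, it turns the A 2 -letters it passes back into letters of A. Digits encode letters as 0 (a letter of A), 1 (its copy in A 1 ) and 2 (its copy in A 2 ); the function name and the returned tuple are our own choices.

```python
from typing import List, Tuple


def run_adding_machine(n: int) -> Tuple[int, int, List[int], List[int]]:
    """Simulate the canonical computation on L a^n p(1) R.

    Returns (steps, cycles, values, lengths), where `cycles` counts the
    applications of r_21, `values` lists the binary number written to the
    left of p at each r_21, and `lengths` records the tape length of every
    word of the computation.
    """
    left = [0] * n          # letters between L and p
    right: List[int] = []   # letters between p and R
    state, steps, cycles = 1, 0, 0
    values: List[int] = []
    lengths = [len(left) + len(right)]
    while state != 3:
        if state == 1:
            if not left:
                state = 3              # p(1) stands next to L: the cycle ends
            elif left[-1] == 1:
                left.pop()             # p(1) moves left over an A_1-letter ...
                right.insert(0, 2)     # ... and leaves its A_2-copy behind
            else:
                left[-1] = 1           # first letter of A found: a -> a_1
                state = 2
        else:
            if right:
                right.pop(0)           # assumption: p(2) restores A_2-letters to A
                left.append(0)
            else:
                state = 1              # p(2) and R meet: rule r_21
                cycles += 1
                values.append(int("".join(map(str, left)), 2))
        steps += 1
        lengths.append(len(left) + len(right))
    return steps, cycles, values, lengths
```

Under these assumptions the words between consecutive applications of r 21 run through the binary numbers 1, 2, . . . , 2 n − 1, all words have the same length, and the number of steps is exponential in n.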
If w is a word in the alphabet A ∪ A ±1 1 ∪ A ±1 2 , then its projection onto A ±1 takes every letter to its copy in A.
Lemma 2.6. The following properties of the machine Z(A) hold.
(1) Every positive input word u in the alphabet A is accepted by a canonical computation of Z(A) with a positive history and such that all words appearing in this computation have equal lengths. ( , the projections of the words uv and u ′ v ′ onto A are freely equal. In particular, u ≡ u ′ if the words v and v ′ are empty and the words u and u ′ contain no letters from A ±1 1 . (3)), w ≡ w 0 → · · · → w t is a reduced computation, w t contains the subword p(3)R (respectively, p(1)R), and all a-letters of w 0 and w t are from A ±1 , then u is a positive word and all words of the computation have the same length. The length of this computation is at least 2 ||u|| .
(6) There is no reduced computation w ≡ w 0 → · · · → w t of length t ≥ 1 such that both w 0 and w t contain p(1)R or both of them contain p(3)R and all a-letters of w 0 and w t belong to A ±1 .

Proof.
(1) To construct the canonical computation one should just use the above comments.
(2) This claim is the statement of Lemma 3.18 of [24].

Roughly speaking, the composition of machines M and Z(A) was constructed in [24] as follows. After an application of (the copy of) a positive command from M, the copies of Z(A) successively implement their accepting computations on each of the tapes, and then (the copy of) the next command of M is applied. Thus Z(A) just (exponentially) slows down the computations of M.
Here we somewhat modify the definition of the composition of a symmetric Turing machine M and the adding machine Z(A). The difference is that machines of the form Z(A) will work not only after the application of every command of M but before applications of commands from M as well. This makes it possible to simulate the work of any symmetric NTM, not only S-machines as was done in [24]. The aim of the following construction is to obtain an S-machine S which recognizes the same language and has the same space and generalized space complexity as the symmetric Turing machine M. The S-machine constructed in [SBR] cannot serve in the present paper since the space and the generalized space complexities of that machine are equivalent to its time complexity.

Consider a symmetric NTM
and with the s 10 -condition. The set Θ is a disjoint union of positive and negative commands: Θ = Θ + ⊔ Θ − . Let S(M ) be the associated S-machine defined before Lemma 2.5. We will assume that the admissible words of S(M ) are of the form To define the composition S = M • Z of M and Z(A), we will insert a p-letter between any two consecutive q-letters k i , k i+1 in an admissible word of S(M ), so as to treat any subword k i ...p...k i+1 as an admissible word for a copy of Z(A). In particular, the start and the accept words of S are obtained from, respectively, the start and accept words of M.
First, for every i = 1, ..., l, we make two copies of the alphabet Y i of S(M ) (i = 1, ..., l): Y i,1 and Y i,2 . The set of state letters of the new machine is The set of tape letters is the components of this union will be denoted byȲ 1 , . ..,Ȳ 2l . The set of positive rulesΘ of M • Z is a union of the set of modified positive rules of S(M ) and of the positive rules of Z i (θ, −) and Z i (θ, +) (θ ∈ Θ, i = 1, ..., l) which are copies of the machines Z(Y i ) (also suitably modified).
More precisely, suppose a positive command θ of S(M ) differs from the unique start and accept commands of S(M ) and has the form is replaced by the rule of the form respectively, by replacing p(j) with p i (θ, j + ), L with k ′ i and R with k ′ i+1 , and for s = i,Ȳ 2s−1 (τ i (θ, +)) = Y i . If θ is the start (the accept) command of S(M ), then we introduce only machines Z i (θ, +) (only Z i (θ, −), respectively), and replace the letters p j (θ, 3 − ) (replace p j (θ, 1 + ), resp.) by p j in the above definition of the commandθ.
In addition, we need the following transition rules ζ(θ, −) and ζ(θ, +) that transform all p-letters from and to their original forms.
Thus, in order to simulate a computation of the symmetric T M M (and of the S-machine S(M )) consisting of a sequence of applications of rules θ 1 , θ 2 , ..., θ s , we first apply all rules corresponding to θ 1 (before and afterθ), then all rules corresponding to θ 2 , then all rules corresponding to θ 3 , etc. The language L S of S consists of some words u in the alphabet Y 1 . In particular, every admissible input word of S is of the form Σ(u) ≡ k 1 up 1 k 2 p 2 k 3 . . . k l−1 p l k l . We denote the accept word of S by Σ 0 = Σ 0 (S).
The modified rulesθ of S(M ) will be called basic rules of the S-machine S. (2) If the history h of a reduced computation C : w 0 → · · · → w t of S contains no basic commands, then ||w i || ≤ max(||w 0 ||, ||w t ||) for every i (0 ≤ i ≤ t). If the only basic letter of h is the last one, then ||w i || ≤ ||w 0 || for i < t.

Computations of machine S
(3) If a computation C of S is reduced then C S(M) is also reduced.
(4) For any reduced computation C : w 0 → · · · → w t of S starting and ending with basic rules we have space S (C) = space S(M) (C S(M) ).
(5) For every positive reduced computation C : w 0 → · · · → w t of the machine S(M ), there is a canonical reduced computation C S of S whose history starts and ends with basic commands, such that  (2) We start with the first statement. If the history of the whole computation consists of the commands of one machine Z i (θ, ±), then the statement follows from Lemma 2.6(3). Otherwise it has s ≥ 1 admissible words w i1 , . . . , w is , such that the history of every subcomputation C 0 : w 0 → · · · → w i1 , . . . , C j : w ij → · · · → w ij+1 , . . . , C s : w is → · · · → w t either consists of the commands of some Z i (θ, ±), being a maximal subcomputation with this property, or is equal to a transition letter ζ(θ, ±) ±1 for some θ. In the former case, all the admissible words participating in C j have the same length for j ∈ [1, s − 1] by Lemma 2.6 (5,6). The same is clearly true in the latter case. Therefore it suffices to prove that the lengths of the admissible words do not decrease in the subcomputations C −1 0 and C s . Let us consider C s only, assuming that it is a computation of some machine Z i (θ, ±). It corresponds to a computation LupvR ≡ W 0 → · · · → W m ≡ Lu ′ pv ′ R of a machine of the form Z(A) with m = t − i s . Since s ≥ 1 and C s is a maximal subcomputation corresponding to Z i (θ, ±), we must have ||v|| = 0. (Otherwise only commands of Z i (θ, ±) could be applied to w is , and so w is−1 → w is → · · · → w t is a longer computation of the same Z i (θ, ±).) Hence the projection of the word uv onto A is reducible, and so ||W 0 || ≤ ||W i || for every i ∈ [1, m] by Lemma 2.6 (2). Therefore either all W i -s have equal lengths or ||W 0 || ≤ ||W 1 || ≤ · · · ≤ ||W m || by Lemma 2.6 (3). In any case, we have ||W 0 || ≤ · · · ≤ ||W m ||. This implies ||w is || ≤ · · · ≤ ||w t ||, as required.
The proof of the second claim is similar: The computation w i1 → · · · → w is is a product of subcomputations C j -s which preserve the lengths of w i -s, while C 0 and C s cannot increase the space.
(3) Assume that τ 1 . . . τ t is a history of a computation w 0 → · · · → w t of S, where τ 1 corresponds to a positive command θ of S(M ), τ t corresponds to θ −1 , and the other rules are not basic. Then the history h ′ of the computation w 1 → · · · → w t−1 is non-empty and h ′ ≡ h 1 is a maximal subword of h ′ corresponding to the work of some machine Z j(i) (θ i , ±). Since h 1 and h s correspond to Z 1 (θ, +), either s = 1 or there is i such that both h i−1 and h i+1 correspond to the same Z j(i)±1 (θ ′ , ±). It follows that the computation with history h i satisfies the assumption of Lemma 2.6 (6), a contradiction.
(4) Property (4) follows from (2) and the definition of the computation C S(M) .
(5) Given C, the computation C S with the same space and with property (C S ) S(M) = C is briefly described above at the end of subsection 2.5 and with more details (though with submachines Z i (θ, +) but without Z i (θ, −)) in subsection 3.7 of [24].
Lemma 2.8. Let C : w 0 → · · · → w t be a reduced computation of S such that the first command θ 1 and the last command θ t are basic ones, and C S(M) ≡ W 0 → · · · → W s with s ≥ 2. Then the subcomputation W 1 → · · · → W s−1 is positive and t − 2 ≥ $2^{|W_1|_a/l}$. If θ 1 (if θ t ) is a start (is an accept) command, then the word W 0 (respectively, W t ) is also positive.

Proof.
To justify the first claim of the lemma, it suffices to prove that the word W 1 is positive under the assumption that there are no basic commands in the computation w 1 → · · · → w t−1 , and the word W 1 corresponds to w 1 . Therefore it suffices to prove that the word w 1 is positive.
We first assume that the first command θ of the history of C is a positive basic command. Then θ switches on the machine Z 1 (θ, +). Since s ≥ 2, this machine must complete its work before the computation C ends. The computation of Z 1 (θ, +) cannot be empty since otherwise θ would be followed by θ −1 because w 1 involves the state letter p 1 (θ, 1 + ). But this would contradict the assumption that C is reduced.
Hence, by Lemma 2.6 (6), the work of Z 1 (θ, +) sooner or later leads to an admissible word w i containing the state letter p 1 (θ, 3 + ). By Lemma 2.6(5), the subword of w 1 of the form k 1 u 1 p 1 (θ, 1 + )k 2 is positive, and the time of the work of Z 1 (θ, +) is at least $2^{\|u_1\|}$. Then the machine Z 1 (θ, +) finishes working and switches on the machine Z 2 (θ, +), whose work similarly provides the positiveness of the subword of w 1 having the form k 2 u 2 p 2 (θ, 1 + )k 3 . Finally we obtain that w 1 is covered by positive subwords, and so it is positive itself. The time of the work of all Z 1 (θ, +), . . . , Z l (θ, +) is at least $2^{|W_1|_a/l}$ since $|W_1|_a = \sum_{j=1}^{l} \|u_j\|$. If the command θ −1 is positive, then it switches on the machine Z l (θ, −), and we first obtain the positiveness of k l u l p l (θ, 3 − )k l+1 , then the positiveness of k l−1 u l−1 p l−1 (θ, 3 − )k l , and so on.
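The exponential lower bound on the time can be spelled out as follows. This is a sketch of the arithmetic only; the symbol $T$ for the total working time of the machines Z 1 (θ, +), . . . , Z l (θ, +) is our notation, and we assume that each Z j (θ, +) works for at least $2^{\|u_j\|}$ steps, as stated above for j = 1.

```latex
\[
  T \;\ge\; \sum_{j=1}^{l} 2^{\|u_j\|}
    \;\ge\; 2^{\max_{1\le j\le l}\|u_j\|}
    \;\ge\; 2^{\frac{1}{l}\sum_{j=1}^{l}\|u_j\|}
    \;=\; 2^{|W_1|_a/l}.
\]
```

The third inequality uses only that the maximum of the lengths $\|u_j\|$ is at least their average.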
Since start and accept commands leave tape letters unchanged, the second statement follows from the positiveness of the word W 1 (of the word W t−1 ). The lemma is proved.

Proof.
(1) Assume that a word u belongs to the language L recognized by M . By Lemma 2.5, this word belongs to the language of S(M ), and the accepted computation C is positive. By Lemma 2.7(5), u belongs to the language L S of S. Now suppose u belongs to L S , and C is the accepted computation. Then the computation C S(M) is positive by Lemma 2.8, and therefore this computation is also an accepted computation of the machine M by Lemma 2.5, and so u ∈ L.
(2) The above argument shows that if a reduced computation C : w 0 → · · · → w t of S accepts an admissible input word w 0 , then the computation C S(M) is a positive accepting computation of both S(M ) and M, and so S M (n) ≤ S S (n) by Lemma 2.7 (4). On the other hand, every accepted input configuration W of M is accepted by S(M ). By Lemma 2.7(5), it has a copy accepted by S, and moreover, the accepting computations of M and S need the same space. Therefore S M (n) ≥ S S (n).
(3) Assume that C : W ≡ W 0 → · · · → W t is an accepting computation of M such that space M (C) = space M (W ), and C S : w ≡ w 0 → · · · → w s . For the given w and w s , we also consider a reduced computation C ′ : w → · · · → w s of S with minimal space and the computation ( If t ′ = 0 (no basic rules), then W 0 ≡ W t by Lemma 2.6 (2), and so space M (C) = space S (C ′ ). Now we assume that t ′ > 0 and note that ||W ′ 0 || = ||W 0 || and ||W ′ t ′ || = ||W t || by Lemma 2.6 (2) since the corresponding admissible words of S can be connected by computations without basic rules. Since (C ′ ) S(M) is positive by Lemma 2.8, it is also an accepted computation of the machine M by Lemma 2.5. Therefore, by Lemma 2.7 Since |w| a = |W | a , the last inequality proves that S ′ M (n) ≤ S ′ S (n) for every n.

Now we consider any accepting computation C : w ≡ w 0 → · · · → w t of S with |w| ≤ n, such that space S (C) = space S (w). Without loss of generality, we may also assume that space S (C[m]) = space S (w m ) for every subcomputation C[m] of the form w m → · · · → w t . Let ρ i1 ≡θ 1 , . . . , ρ t ≡ ρ is ≡θ s be the basic commands of the history h ≡ ρ 1 . . . ρ t of C, let C ′ be the subcomputation of C with history h ′ ≡ ρ 1 . . . ρ i1−1 , and let C ′′ have history h ′′ ≡ ρ i1 . . . ρ t ; and so h ≡ h ′ h ′′ and C = C ′ C ′′ .

n). This inequality together with the inequality S
, and C S : w ≡ w 0 → · · · → w s . For the given w and w s , we also consider a reduced computation C ′ : w → · · · → w s ′ of S with minimal time s ′ and the computation (C ′ ) S(M) :   3. Every command of S or its inverse inserts/deletes at most one letter on the left and at most one letter on the right of every state letter. 4. The machine S satisfies the s 10 -property. 5. The unique start command is of the form

Groups and diagrams
Recall that given a finitely generated group H, we want to construct an embedding of H into a finitely presented group G whose space function is equivalent to the deterministic space complexity of the algorithmic word problem in H. The relations of G are presented in this section.

Construction of the embedding
Let H be a finitely generated group with solvable word problem. To prove Theorem 1.2, we will suppose that a Turing machine M solves the word problem in H. This implies that H has a finite set of generators {a 1 , .., a m }, and a word w in these generators is accepted by M iff w = 1 in H. Here we consider only positive words in the generators since M can work with positive words only, and so we assume that the set of generators is symmetric: for every a i , there is a generator a j such that a i a j = 1 in H. Thus every relation holding in H follows from relations in a 1 , . . . , a m , with positive left-hand side.
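The reduction to positive words can be made explicit. In the Python sketch below, which is an illustration under our own encoding, a group word is a list of pairs `(generator, exponent)` and `inverse_of` is a table sending each generator a i to a generator a j with a i a j = 1 in H; such a table exists because the generating set is assumed symmetric. The function name `to_positive_word` is hypothetical.

```python
from typing import Dict, List, Tuple


def to_positive_word(word: List[Tuple[str, int]],
                     inverse_of: Dict[str, str]) -> List[str]:
    """Rewrite a group word as a positive word representing the same
    element of H, replacing each a_i^-1 by the generator a_j with
    a_i a_j = 1 in H."""
    positive: List[str] = []
    for generator, exponent in word:
        if exponent == 1:
            positive.append(generator)
        elif exponent == -1:
            positive.append(inverse_of[generator])
        else:
            raise ValueError("exponents must be +1 or -1")
    return positive
```

The positive word has the same length as the original one, so this rewriting does not affect the space estimates.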
Further we will assume that one of the hypotheses (a), (b) of Lemma 2.10 holds. Therefore we also have the S-machine S = ⟨X, Y, Q, Θ, s 1 , s 0 ⟩ provided by that lemma. The input alphabet of S is the system of generators a 1 , . . . , a m of the group H together with the symbols of the inverse letters {a −1 1 , .., a −1 m }. Let S ′ S (n) be the generalized space complexity of S. We denote by Ŝ = ⟨X̂, Ŷ, Q̂, Θ̂, ŝ 1 , ŝ 0 ⟩ a copy of the S-machine S. We will assume that X̂ = X, Y ∩ Ŷ = X, Q ∩ Q̂ consists of the state letters of the vectors s 1 and s 0 , and Θ ∩ Θ̂ = ∅. Therefore the machines S and Ŝ have the same admissible input words.
The copy of a command θ of S is called θ̂. Similar notation is used for the a-letters from Ŷ and q-letters from Q̂. The set of rules of the machine and admissible words of S ∪ Ŝ is, by definition, the union of the corresponding sets for S and for Ŝ. The following lemma is a clear consequence of these definitions.
Lemma 3.1. The sets of accepted admissible input words of the machines S, Ŝ and S ∪ Ŝ coincide, and so these machines recognize the same language L. They also have equal space complexities and equal generalized space complexities. For every accepting computation w 1 → . . . of S ∪ Ŝ, there is an accepting computation w 1 → . . . of either S or Ŝ whose length and space do not exceed the length and space of the original computation.
We consider a group G(S, L) associated with the machine S. Furthermore, as in [23], we need a very similar group Ĝ(S, L) to produce a group embedding required for the proof of Theorem 1.2. To define G(S, L) we need many copies of every letter used in the work of S; this enables us to apply a kind of hyperbolic argument for the hub structure of van Kampen diagrams. Moreover, the copies alternate with "mirror copies"; this trick will be used in Section 4.
Therefore we "multiply" the machine S as follows. For some even L ≥ 40, we introduce L/2 copies S = S 1 , S 3 , . . 10.
The finite set of generators of the group G(S, L) consists of q-letters corresponding to the states of S(L), a-letters corresponding to the tape letters of S(L), and θ-letters corresponding to the commands. Thus the set of generators consists of the set of (state) q-letters Q(S, L) = ⊔ N i=1 Q i , the set of (tape) a-letters Y = ⊔ N i=1 Y i including the input alphabet X = ⊔ L i=1 X i ; and the θ-letters from N copies of Θ + , i.e., for every θ ∈ Θ + , we have N generators θ 1 , . . . , θ N .
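The relations of the group G(S, L) correspond to the rules of the machine S(L); for every θ = [U 1 → V 1 , . . . , U N → V N ] ∈ Θ + , we have $U_i \theta_{i+1} = \theta_i V_i$, i = 1, ..., N, $\theta_j a = a \theta_j$ (3.4) for all a ∈ Ȳ j (θ). (Here θ N +1 ≡ θ 1 .) The relations of the first type will be called (θ, q)-relations, and those of the second type (θ, a)-relations.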
The definition of the machine Ŝ(L) is similar to that of S(L) but the admissible words are of the form k 1 Ŵ 1 k 2 Ŵ 2 . . . k L Ŵ L , where every Ŵ i is obtained from W i after replacement of every letter x by its copy x̂, and for i = 1, we, in addition, delete all a-letters, i.e., the word Ŵ 1 has no a-letters. (In other words, instead of the first copy of S, we use the "machine" with the same state letters but having no tape letters.) In particular, the word Σ̂(u, L) is obtained from Σ(u, L) by omitting the first occurrence of the input word u. Again, it is obvious that properties (1) - (5) from Lemma 2.10 hold for the machine Ŝ(L) as well. The relations of the group Ĝ(S, L) are $\hat U_i \hat\theta_{i+1} = \hat\theta_i \hat V_i$, i = 1, ..., N, $\hat\theta_j \hat a = \hat a \hat\theta_j$ (3.5) for all â ∈ Y j (θ) and j ∈ [K + 1, N ]. We will also use the combined S-machine S(L) ∪ Ŝ(L). Its admissible words are either the admissible words for S(L) or the admissible words for Ŝ(L), and the set of rules is the union of the rules for S(L) and Ŝ(L). Note that the machine S(L) ∪ Ŝ(L) does not satisfy the s 10 -condition. In Section 4 we show that this homomorphism is injective.

Minimal diagrams
As in [21], we enlarge the set of defining relations of the group G by adding some consequences of defining relations. Taking into account Lemma 3.2, we include all cyclically reduced relations of the group H generated by the set {a 1 , . . . , a m }, i.e., all non-empty cyclically reduced words in {a ±1 1 , . . . , a ±1 m } which are equal to 1 in the group H. These relations will be called H-relations.
We denote by G 1 the group given by all generators of the group G, by all H-relations, and by all defining relations of G except for the hub-relation (3.6).
Recall that a van Kampen diagram ∆ over a presentation P = B | R (or just over the group P ) is a finite oriented connected and simply-connected planar 2-complex endowed with a labeling function φ : E(∆) → B ±1 , where E(∆) denotes the set of oriented edges of ∆, such that φ(e −1 ) ≡ φ(e) −1 . Given a cell (that is a 2-cell) Π of ∆, we denote by ∂Π the boundary of Π; similarly, ∂∆ denotes the boundary of ∆. The labels of ∂Π and ∂∆ are defined up to cyclic permutations. An additional requirement is that the label of any cell Π of ∆ is equal to (a cyclic permutation of) a word R ±1 , where R ∈ R. Labels and lengths of paths are defined as for Cayley graphs.
The van Kampen Lemma states that a word W over the alphabet B ±1 represents the identity in the group P if and only if there exists a diagram ∆ over P such that φ(∂∆) ≡ W ( [17], Ch. 5, Theorem 1.1).
We will study diagrams over the groups G and G 1 . The edges labeled by state letters (= q-letters) will be called q-edges, the edges labeled by tape letters (= a-letters) will be called a-edges, and the edges labeled by the letters from Θ and Θ̂ (= θ-letters) are θ-edges. The cells corresponding to Relation (3.6) are called hubs, the cells corresponding to Relations (3.4) and (3.5) are called (θ, q)-cells if they involve q-letters, and they are called (θ, a)-cells otherwise. The cells corresponding to arbitrary relations of H are H-cells.
The resulting presentation and the diagrams over it are graded by the ranks of defining words and cells as follows. The hubs are the cells of the highest rank, the rank of (θ, q)-cells is higher than the rank of H-cells (and in Lemma 5.1, (θ, k i )-cells, with i = 1, 2 are higher than other (θ, q)-cells), and the (θ, a)-cells are of the lowest rank.
If ∆ and ∆ ′ are diagrams over G, then we say that ∆ has a higher type than ∆ ′ if ∆ has more hubs, or if the numbers of hubs are the same, but ∆ has more cells which are next in the hierarchy, and so on.
Clearly the defined partial order on the set of diagrams satisfies the descending chain condition, and so there is a diagram having the smallest type among all diagrams with the same boundary label. Such a diagram is called minimal.
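The comparison of types described above amounts to a lexicographic comparison of cell counts, listed from the highest rank to the lowest. The following is a minimal sketch in Python, not part of the paper's argument. The class name `DiagramType`, its field names, and the helper `has_higher_type` are hypothetical, and the refinement for (θ, k i )-cells mentioned in connection with Lemma 5.1 is ignored.

```python
from typing import NamedTuple


class DiagramType(NamedTuple):
    """Cell counts of a diagram, ordered from highest rank to lowest.

    Hypothetical helper for illustration only: the field order follows the
    grading in the text (hubs, then (theta, q)-cells, then H-cells, then
    (theta, a)-cells).
    """
    hubs: int
    theta_q_cells: int
    h_cells: int
    theta_a_cells: int


def has_higher_type(d1: DiagramType, d2: DiagramType) -> bool:
    """True if d1 has a higher type than d2 (tuples compare lexicographically)."""
    return d1 > d2


# One hub outweighs any number of cells of lower rank.
with_hub = DiagramType(hubs=1, theta_q_cells=0, h_cells=0, theta_a_cells=0)
no_hub = DiagramType(hubs=0, theta_q_cells=7, h_cells=3, theta_a_cells=50)

# Among candidate diagrams with the same boundary label, a minimal diagram
# is one whose type is smallest.
candidates = [with_hub, no_hub, DiagramType(0, 7, 2, 90)]
smallest = min(candidates)
```

Since the counts are non-negative integers, every strictly descending chain of such tuples is finite, which is the descending chain condition used in the text.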

Bands and trapezia
From now on, we shall mainly consider minimal van Kampen diagrams. In particular the diagrams are reduced, i.e., they contain no pair of cells that have a common edge e such that the boundary paths of these cells starting with e have the same label. As in [27], [4], to explore van Kampen diagrams over the groups G and G 1 we shall use their simpler subdiagrams such as bands and trapezia.
Here we repeat some necessary definitions.
Definition 3.3. Let Z be a subset of the set of generators X of the group G. A Z-band B is a sequence of cells π 1 , ..., π n in a van Kampen diagram such that
• Every two consecutive cells π i and π i+1 in this sequence have a common edge e i labeled by a letter from Z.
• Each cell π i , i = 1, ..., n, has exactly two Z-edges, e i−1 and e i (i.e., edges labeled by a letter from Z).
• If n = 0, then B is just a Z-edge.
The counterclockwise boundary of the subdiagram formed by the cells π 1 , ..., π n of B has the factorization $e^{-1} q_1 f q_2^{-1}$, where e = e 0 is a Z-edge of π 1 and f = e n is a Z-edge of π n . We call q 1 the bottom of B and q 2 the top of B, denoted bot(B) and top(B). Top/bottom paths and their inverses are also called the sides of the band. The Z-edges e and f are called the start and end edges of the band. If n ≥ 1 but e = f, then the Z-band is called a Z-annulus.
We say that a Z 1 -band and a Z 2 -band cross if they have a common cell and Z 1 ∩ Z 2 = ∅. We shall call a Z-band maximal if it is not contained in any other Z-band.
We will consider q-bands, where Z is one of the sets Q i of state letters for the machine S(L) ∪ Ŝ(L), θ-bands for every θ ∈ Θ, and a-bands, where Z = {a} ⊆ Y .
The convention is that a-bands do not contain q-cells, and so they consist of (θ, a)-cells only.
The papers [20], [4], [23] contain the proof of the following lemma in a more general setting. (In contrast to Lemmas 6.1 [20] and 3.11 [23], we have no x-cells here.)
Lemma 3.4. A minimal van Kampen diagram ∆ over G 1 has no q-annuli, no θ-annuli, and no a-annuli. Every θ-band of ∆ shares at most one cell with any q-band and with any a-band.
If W ≡ x 1 . . . x n is a word in an alphabet X, Y is another alphabet, and φ : X → Y ∪ {1} (where 1 is the empty word) is a map, then φ(W ) ≡ φ(x 1 ) . . . φ(x n ) is called the projection of W onto Y . We shall consider the projections of words in the generators of G onto Θ ⊔ Θ̂ (all θ-letters map to the corresponding element of Θ ⊔ Θ̂, all other letters map to 1), and the projection onto the alphabet {Q 1 ⊔ · · · ⊔ Q N } (every q-letter maps to the corresponding Q i , all other letters map to 1).
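The projection just defined can be illustrated by a short Python sketch. The function name `project`, the representation of words as lists of strings, and the toy letter names are assumptions made for this example only; they do not come from the paper.

```python
from typing import Dict, List, Optional


def project(word: List[str], phi: Dict[str, Optional[str]]) -> List[str]:
    """Apply phi letter by letter and drop every letter sent to the empty word.

    Letters missing from phi (or mapped to None) are treated as mapped to 1.
    """
    images = (phi.get(letter) for letter in word)
    return [image for image in images if image is not None]


# A toy word in state letters (k1, q2), tape letters (a1, a2) and theta-letters.
word = ["k1", "a1", "theta1", "q2", "a2", "theta2^-1"]

# Projection onto theta-letters: theta-letters map to themselves.
onto_theta = {"theta1": "theta1", "theta2^-1": "theta2^-1"}

# Projection onto the state alphabet: each q-letter maps to its set Q_i.
onto_states = {"k1": "Q1", "q2": "Q2"}

theta_projection = project(word, onto_theta)   # ["theta1", "theta2^-1"]
state_projection = project(word, onto_states)  # ["Q1", "Q2"]
```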
Definition 3.5. The projection of the label of a side of a q-band onto the alphabet Θ ±1 is called the history of the band. The projection of the label of a side of a θ-band onto the alphabet {Q 1 , ..., Q N } is called the base of the band. Similarly we define the history of a word and the base of a word. The base of a word W is denoted by base(W ). It will be convenient to use representatives of Q 1 , ..., Q N in base words. For example, if k ∈ Q 1 , q ∈ Q 2 , we shall say that the word kaq has base kq instead of Q 1 Q 2 .
Definition 3.6. Let ∆ be a minimal van Kampen diagram over G 1 which has a boundary path of the form $p_1^{-1} q_1 p_2 q_2^{-1}$, where:
(TR 1 ) p 1 and p 2 are sides of q-bands,
(TR 2 ) q 1 , q 2 are maximal parts of the sides of θ-bands such that φ(q 1 ), φ(q 2 ) start and end with q-letters,
(TR 3 ) for every θ-band T in ∆, the labels of top(T ) and bot(T ) are reduced.
Then ∆ is called a trapezium. The path q 1 is called the bottom, the path q 2 is called the top of the trapezium, the paths p 1 and p 2 are called the left and right sides of the trapezium. The history of the q-band whose side is p 2 is called the history of the trapezium; the length of the history is called the height of the trapezium. The base of q 1 is called the base of the trapezium.

Remark 3.7.
(1) Property (T R 3 ) is easy to achieve: by folding edges with the same labels having the same initial vertex, one can make the boundary label of a subdiagram in a van Kampen diagram reduced, see [27].
(2) Notice that the top (bottom) side of a θ-band T does not necessarily coincide with the top (bottom) side q 2 (side q 1 ) of the corresponding trapezium of height 1, and q 2 (q 1 ) is obtained from top(T ) (resp. bot(T )) by trimming the first and the last a-edges if these paths start and/or end with a-edges. We shall denote the trimmed top and bottom sides of T by ttop(T ) and tbot(T ). By definition, for an arbitrary θ-band T , ttop(T ) is obtained by such a trimming only if T starts and/or ends with a (θ, q)-cell; otherwise ttop(T ) ≡ top(T ). The definition of tbot(T ) is similar.
The trapezium ∆ is said to be an i-sector if the labels of its top and bottom paths are i-sector words.
Lemma 3.8. Let Γ be an i-sector, where i ≠ 1. Then the sides p 1 and p 2 are the sides of maximal k i - and k i+1 -bands K i and K i+1 of Γ, respectively. If an edge e of Γ belongs to neither K i nor K i+1 , then φ(e) ∈ A i . In particular, Γ has no H-cells.
Proof. The first assertion follows from Lemma 3.4, since the labels of the top and bottom of an i-sector are of the form k i . . . k i+1 . Then, by the same lemma, the second assertion is true for the edges of all θ-cells since every maximal θ-band must connect K i and K i+1 . Since i ≠ 1 and the labels of the boundary edges of H-cells belong to A 1 , the H-cells of Γ have no edges in common either with the θ-cells of Γ or with the boundary ∂Γ. Now the minimality of the diagram Γ implies that Γ has no H-cells at all, and so the lemma is proved.
The following lemma claims that every i-sector, i ≠ 1, simulates the work of S(L) ∪ Ŝ(L). It summarizes the assertions of Lemmas 6.1, 6.3, 6.9, and 6.16 from [23]. For the formulation (1) below, it is important that S (and S ∪ Ŝ) is an S-machine. The analog of this statement is false for Turing machines. (See [21] for a discussion.)
Lemma 3.9. (1) Let ∆ be an i-sector for some i ≠ 1 with history θ 1 . . . θ d . Assume that ∆ has consecutive maximal θ-bands T 1 , . . . , T d , and k i W j k i+1 and k i W ′ j k i+1 are the bottom and the top labels of T j (j = 1, . . . , d). Let U j (resp. V j , j = 1, . . . , d) be the copies of W j (resp. W ′ j ) in the alphabet of the machine S ∪ Ŝ. Then U j , V j are admissible words for S ∪ Ŝ, and
(2) For every reduced computation U · h ≡ V of S (of Ŝ) with |h| ≥ 1 and for every i ∈ [1, N ] (for every i ∈ [2, N ]), there exists an i-sector ∆ with history h and without H-cells, whose bottom and top labels are k i U ′ k i+1 and k i V ′ k i+1 , where U ′ (resp. V ′ ) is the copy or the mirror copy of the word U (of V ) in the alphabet A i . These copies are mirror copies iff i is even.
We call an i-sector accepted if its top label k i V ′ k i+1 is the i-sector subword of the word Σ 0 . For i ≠ 1, the computation V 1 → · · · → V d of the machine S ∪ Ŝ provided by Lemma 3.9 (1) for an accepted i-sector is accepting.

Replicas
Let ∆ be an i-sector, where i ≠ 1. Then by Lemma 3.8, for every i ′ ≠ 1, one can relabel the edges of ∆ (or of the mirror copy of ∆ if i − i ′ is odd) and obtain an i ′ -sector ∆ ′ , which is just a copy (or a mirror copy) of ∆. This symmetry and mirror symmetry are necessary for the diagram surgery we will utilize in the proof of Lemma 4.9 of Section 4. But one cannot construct such a copy of the i-sector if i ′ = 1 since there are no commutativity relations θ̂ j â = â θ̂ j if θ̂ j ∈ A 1 (see (3.5)). However, one can construct an ersatz-copy of ∆, called a replica, if ∆ is an accepted sector. This construction will be used in Sections 4 and 5.
For i ′ = 1, we construct the replica of ∆ as follows. (We assume below that i is odd; otherwise one first replaces ∆ by its mirror copy.) At first, the relations (3.4) and (3.5) make it possible to replace the maximal k i -band K i and k i+1 -band K i+1 of ∆ by their copies K 1 and K 2 , respectively. Similarly, we replace every maximal θ-band of ∆ by its copy if θ is a command of the machine S. Now let T be a maximal θ-band of ∆ and θ a command of the machine Ŝ. This T consists of (θ, q)- and (θ, a)-cells. To construct the replica T ′ of T we take the 'copies' of (θ, q)-cells only (but no a-edges in these 'copies'; the a-edges are contracted to vertices) and build T ′ from them by identifying θ-edges of neighboring cells in the order in which the original (θ, q)-cells appear in T .
It remains to close up the holes between θ-bands T ′ s and T ′ s+1 for consecutive T s and T s+1 . If the corresponding letters θ s and θ s+1 are both commands of S, then we just identify the top of the copy T ′ s and the bottom of the copy T ′ s+1 . This is possible since φ(topT s ) ≡ φ(botT s+1 ) by Lemma 3.9 (1). A similar identification works if θ s and θ s+1 are both the commands ofŜ.
Assume now that θ s is a command of S and θ s+1 is a command of Ŝ (or vice versa). Then φ(top(T s )) ≡ φ(bot(T s+1 )) ≡ k i W s k i+1 , and the copy W ′ s of W s in the alphabet A 1 is an admissible word for both machines S and Ŝ since both commands θ −1 s and θ s+1 are applicable to it. But the only common tape letters of these machines are the letters of the input alphabet of S, and the only common state letters of these machines are the letters from the start vector s 1 and the accept vector s 0 of S. Then by property (5) of Lemma 2.10, either θ −1 s is the start rule or θ s is the accept rule of S. In the latter case, we have φ(top(T ′ s )) ≡ φ(bot(T ′ s+1 )) ≡ k i W s k i+1 since the accept words have no a-letters at all, and so identification of the top of the copy T ′ s and the bottom of the copy T ′ s+1 is possible again.
Let us consider the former case. Then the rule θ −1 s . . . Therefore W ′ s can have tape letters only between q 1 and q 2 , i.e., W ′ s ≡ q 1 uq 2 q 3 . . . q K , where u is a word in the input alphabet. Since the sector ∆ is accepted and the rules of S-machines are invertible, we have that W ′ s is accepted by the machine S(L) ∪ Ŝ(L), and so it is accepted by S by Lemma 3.1. Hence the word u belongs to the language recognized by the machine S, and therefore u is a word in the generators of H, and u = 1 in H by the definition of that language. Now, to close up the hole between the bands T ′ s and T ′ s+1 , it suffices to paste in an H-cell π labeled by the cyclically reduced form of the word u between them because φ(top(T ′ s )) ≡ k 1 q 1 uq 2 q 3 . . . q K k 2 and φ(bot(T ′ s+1 )) ≡ k 1 q 1 q 2 q 3 . . . q K k 2 .
The replica ∆ ′ of the accepted i-sector ∆ is constructed. To summarize: for every command θ of the machine Ŝ, we cut off the (θ, a)-cells from ∆ and contract the a-edges of the (θ, q)-cells; we replace the maximal k i - and k i+1 -bands by their k 1 - and k 2 -copies; we replace all the labels of the remaining edges by their copies from the alphabet A 1 ; and then we close up all the holes by pasting in several H-cells. The result is the replica ∆ ′ , canonically obtained as above.
Remark 3.10. (1) The i-sector ∆ is a union of K + 1 subsectors Γ 1 , . . . , Γ K+1 , where every Γ j is a trapezium with a base of length 2 and with height equal to the height of ∆. The subsector Γ j has a common maximal q-band C = C j+1 with Γ j+1 for j = 1, . . . , K. If i is odd (even), then Γ 2 (resp., Γ K ) is the input subsector. When we construct a replica, H-cells appear only in the input subsector of the replica.
Every Γ j has its own replica Γ ′ j , which is a subsector of ∆ ′ . Similarly, every q- or θ-edge of C has a replica in the replica C ′ of C in ∆ ′ . Also every vertex o of C belongs to either a q-edge or a θ-edge since the boundary of a (θ, q)-cell has no two consecutive a-edges by Lemma 2.10(5); and so o has a replica o ′ .
(2) The replica of an i-sector is not necessarily a minimal diagram.

Discs
Given a diagram ∆, one can construct a planar graph whose vertices are the hubs of this diagram plus one improper vertex outside ∆, and the edges are maximal k-bands of ∆. Lemma 3.11 says that if the number L of sectors of a hub is large enough, then the degree of the improper vertex is positive provided the graph has at least one proper vertex; and therefore the homomorphism H → G turns out to be injective.
A disc diagram (or a disc) is a (sub)diagram ∆ such that (1) it has exactly one hub Π, (2) there are no θ-edges on the boundary ∂∆, and (3) there are no H-cells of ∆ having an edge on ∂∆. In particular, a hub is a disc diagram.
Let us consider a disc diagram ∆ with hub Π. Denote by K 1 , . . . , K L the k 1 , . . . , k L -bands starting on the hub Π. Since the hub relation has only one letter k i for every i, these k-bands have to end on ∂∆. It therefore follows from Lemma 3.4 that for every i, the bands K i and K i+1 bound, together with ∂∆ and ∂Π, either a subdiagram Ψ i having no cells corresponding to any non-trivial relation of G (in this case the bands K i and K i+1 are also trivial), or Ψ i is a trapezium, and the boundary label of ∆ has exactly one letter from Q i for every i ∈ [1, . . . , N ].
Similarly, consider two hubs Π 1 and Π 2 in a minimal diagram, connected by a k i -band K i and a k i+1 -band K i+1 , where (i, i + 1) ≠ (1, 2), and there are no other hubs between these k-bands. These bands, together with ∂Π 1 and ∂Π 2 , bound either a subdiagram Ψ i having no cells, or a trapezium Ψ i . The former case is impossible since in this case the hubs have a common k i -edge and they are mirror copies of each other, contrary to the reducibility of minimal diagrams. We want to show that the latter case is not possible either.
Indeed, in the latter case Ψ i is an accepted trapezium since the i-sector subword of Σ 0 is k i wk i+1 , where w is a (mirror) copy of the accept word of the machine S. Therefore, according to Subsection 3.4, a replica Ψ 1 of Ψ i (as well as the (mirror) copies Ψ j for every j ∈ [2, . . . , N ]) can be constructed. Then one can construct a spherical diagram Γ from Π 1 , Π 2 , and the diagrams Ψ j (j = 1, . . . , N ). There are two subdiagrams Γ ′ and Γ ′′ of Γ with common boundary: Γ ′ , being a copy of a subdiagram of the original diagram, is made of Π 1 , Π 2 , and Ψ = Ψ i , and Γ ′′ is a union of all Ψ j -s with j ≠ i. Hence the subdiagram Γ ′ of the original diagram can be replaced by a diagram of lower type with the same boundary label because Γ ′′ has no hubs. This contradicts the minimality of the original diagram.
Thus, any two hubs of a minimal diagram are connected by at most two k-bands, such that the subdiagram bounded by them contains no other hubs. This property makes the hub graph of a minimal diagram (where maximal k-bands play the role of edges connecting hubs) hyperbolic (in a sense) since the degree L of every vertex (= hub) is high (≥ 40). Below we give a more precise formulation (proved for diagrams with such a hub graph, in particular, in [20], Lemma 3.2).
Proof. Assume that a word w in the generators a 1 , ..., a m of the group H is equal to 1 in G. Then by van Kampen's Lemma, there is a minimal diagram ∆ over G whose boundary label is w. Since w has no q-letters, ∆ has no hubs by Lemma 3.11. By Lemma 3.4, ∆ contains neither q- nor θ-annuli, and so it has neither (θ, q)-cells nor (θ, a)-cells, because w has neither q- nor θ-letters. Hence this diagram can contain H-cells only. Since the boundary labels of H-cells are trivial in H, the boundary label w is trivial in H too by van Kampen's Lemma, and so the homomorphism is injective.

Comparison of paths in diagrams
The main lemma of this section is Lemma 4.9 which says that given a diagram ∆, one can cut off a subdiagram having one hub (and cells of lower rank) such that the perimeter of the remaining diagram does not exceed |∂∆|. This makes it possible to use induction on the number of hubs in the next section. To carry out the surgery of Lemma 4.9 we must be able to compare the lengths of paths in various particular diagrams and to 'finish' some 'unfinished' diagrams.

Paths in sectors
We will consider the words in the generators of the group G and will modify the length function on this set. This modification is helpful in subsequent subsections, in particular, we cannot prove an analog of Lemma 4.6 using the standard length || ||.
The standard length || || of a word (of a path) will be called its combinatorial length. From now on we use the word length for the modified length. We set the length of every q-letter equal to 1, and the length of every a-letter equal to a small enough number δ > 0 so that
$$\delta < (3N)^{-1}. \tag{4.7}$$
If a word v has s θ-letters, t a-letters, and no q-letters, then |v| = s + δ max(0, t − s) by definition. For example, the word read between two q-letters of a (θ, q)-relation has length 1 since it has one θ-letter and at most one a-letter by formulas (3.4), (3.5) and property (3) of Lemma 2.10.
An arbitrary word w is a product v 0 q 1 v 1 q 2 . . . q m v m , where q 1 , . . . , q m are q-letters and the words v 0 , . . . , v m have no q-letters. Then, by definition, $|w| = m + \sum_{j=0}^{m} |v_j|$. The length of a path in a diagram is the length of its label. The perimeter |∂∆| of a van Kampen diagram is similarly defined by a shortest cyclic decomposition of the boundary ∂∆. It follows from this definition that for any product s = s 1 s 2 of two words or paths, we have |s| ≤ |s 1 | + |s 2 |, and |s| = |s 1 | + |s 2 | if s 2 starts or s 1 ends with a q-letter.
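The modified length can be computed mechanically once every letter is classified as a q-, θ-, or a-letter. The sketch below is only an illustration under stated assumptions: the function names `segment_length` and `word_length`, the encoding of letters by the strings "q", "theta", "a", and the sample value δ = 1/13 (which satisfies (4.7) for N = 4) are all hypothetical. The perimeter of a diagram, which requires a shortest cyclic decomposition, is not covered.

```python
from fractions import Fraction
from typing import List


def segment_length(kinds: List[str], delta: Fraction) -> Fraction:
    """Length of a word without q-letters: s + delta * max(0, t - s),
    where s counts theta-letters and t counts a-letters."""
    s = kinds.count("theta")
    t = kinds.count("a")
    return s + delta * max(0, t - s)


def word_length(kinds: List[str], delta: Fraction) -> Fraction:
    """Length of w = v_0 q_1 v_1 ... q_m v_m: m plus the sum of all |v_j|."""
    total = Fraction(0)
    segment: List[str] = []
    for kind in kinds:
        if kind == "q":
            # Close the current q-free segment and count the q-letter as 1.
            total += 1 + segment_length(segment, delta)
            segment = []
        elif kind in ("theta", "a"):
            segment.append(kind)
        else:
            raise ValueError(f"unknown letter kind: {kind!r}")
    return total + segment_length(segment, delta)


# Illustrative choice: N = 4, so delta must be below 1/12.
delta = Fraction(1, 13)

# The word between the two q-letters has one theta-letter and one a-letter,
# so it contributes 1 and the whole word has length 2 + 1 = 3.
relation_side = word_length(["q", "theta", "a", "q"], delta)

# Subadditivity can be strict: |aa| + |theta theta| = 2*delta + 2,
# while the product has length 2.
product = word_length(["a", "a", "theta", "theta"], delta)
```

Using `Fraction` keeps the arithmetic exact, so comparisons of lengths such as |s| ≤ |s 1 | + |s 2 | are not affected by rounding.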
If a path p starts at a vertex o and ends at o ′ , we will write o = p − and o ′ = p + .
Proof. Since the path t has no q-edges, it follows from the assumption of the lemma that every maximal θ-band T of ∆ has exactly two (θ, q)-cells (the first one and the last one). Therefore C and C ′ can be connected along T by a path x consisting of a-edges only. We denote by t 2 a shortest path among such x-s. Then we define t 1 (t 3 ) as the shortest subpath of the side of C (of C ′ ) connecting o and (t 2 ) − ((t 2 ) + and o ′ ).
Assume that there is an a-band A starting with an a-edge of t 2 and ending with an a-edge e of ∂C ′ . Then e belongs to some path ye where y consists of a-edges and connects C and C ′ . Notice that every maximal a-band crossing the path y must cross t 2 because it cannot cross A, and C has no a-edges. Hence |y| a ≤ |t 2 | a − 1, contrary to the minimality in the choice of t 2 .
Thus every maximal a-band A crossing t 2 must connect the top and the bottom of the trapezium ∆, and therefore the path t must cross every such a-band A. Also t must cross every maximal a-band starting on t 3 , whence |t| a ≥ |t 2 | a + |t 3 | a = |t ′ | a . Since the path t must cross every maximal θ-band of ∆, we also have the inequality |t| θ ≥ |t 1 | θ + |t 3 | θ = |t ′ | θ . Now it follows from the definition of path length that |t ′ | ≤ |t|, as required.
Proof. First of all, one may assume that no maximal q-band C = C 1 , C 2 , . . . , C k+2 = C ′ is crossed by the path t twice. Indeed, otherwise t has a subpath s of the form ezf, where e and f are q-edges of some C j separated in this band by m (θ, q)-cells for some m ≥ 0. Therefore the path z must cross at least m maximal θ-bands, whence |ezf | ≥ m + 2. But the vertices e − and f + can be connected along C j by a path of length m (see the example after the definition of length | * |), and so the path t can be shortened.
Thus the path t is a product t = t 1 . . . t k+1 , where each t j connects a vertex o(j) lying on C j with a vertex o(j + 1) lying on C j+1 , and for every j = 1, . . . , k, either t j+1 starts or t j ends with a q-edge, and so $|t| = \sum_{j=1}^{k+1} |t_j|$. As in the previous paragraph, we have that each of the t j -s crosses every θ-band at most once. (Consider ezf , where e and f are θ-edges of the same θ-band.) Now using the notation of Remark 3.10, it suffices to consider the replica Γ ′ j of the subsector Γ j and to find a path t ′ j connecting the replicas o ′ (j) and o ′ (j + 1), with |t ′ j | ≤ |t j |. We may assume that i is odd. (If i is even one should use a mirror argument.)
We first consider the path t 2 crossing the input subsector Γ 2 , assuming that t 2 has no q-edges, since the q-edges (if any) can be attributed to the subpaths t 1 and t 3 . By property (6) of Lemma 2.10, C 2 has no a-edges. Hence, by Lemma 4.1 applied to a subtrapezium of Γ 2 containing t 2 , we may assume that t 2 = s 1 s 2 s 3 , where s 1 and s 3 are the subpaths of the top or bottom paths of q-bands C 2 and C 3 , respectively, and s 2 goes along the top or bottom of a maximal θ-band T . For both paths s 1 and s 3 , we have paths s ′ 1 and s ′ 3 of the same length lying on the boundaries of the q-bands C ′ 2 and C ′ 3 of the replica Γ ′ 2 and connecting the replicas of the vertices (s 1 ) ± and (s 3 ) ± , respectively. The vertices (s ′ 1 ) + and (s ′ 3 ) − are either connected by a copy s ′ 2 of s 2 (if the θ-band T was copied when we constructed the replica ∆ ′ ) or (s ′ 1 ) + = (s ′ 3 ) − (if the corresponding θ-band of ∆ ′ has no a-edges). It follows that in any case we have |t ′ 2 | ≤ |t 2 | for the resulting path t ′ 2 .
Assume now that j ≠ 2. The subsector Γ j has no H-cells by Lemma 3.8, and so it is a union of alternating subtrapezia Γ j1 , Γ j2 , . . . whose histories are words either in the alphabet Θ or in Θ̂. Let t j = x 1 . . . x d , where every x k belongs to Γ jk .
If the history of the trapezium Γ jk is a word over Θ, then we have the copy Γ ′ jk of Γ jk in ∆ ′ , and a subpath x = x k of t j lying in Γ jk has a copy x ′ in Γ ′ jk . If the history is a word over Θ̂, then Γ ′ jk has no a-edges, and for every subpath x of t j lying in Γ jk , we can construct a corresponding subpath x ′ in Γ ′ jk which copies only q- and θ-edges of x, but ignores the a-edges of x. Since there are no a-edges in the common boundaries of the neighbors Γ ′ jk and Γ ′ j,k+1 , we have (x ′ k ) + = (x ′ k+1 ) − for every k = 1, . . . , d − 1, and we obtain |t ′ j | ≤ |t j | for the path t ′ j = x ′ 1 . . . x ′ d . Now the required path t ′ is obtained, and the lemma is proved.
(5) the trimmed bottom path of T 1 is a subpath of y ±1 1 , and the trimmed bottom path of T l is a subpath of the top path of T l−1 for every l = 2, . . . , m;
(6) one can construct a diagram ∆̄ with boundary x̄ 1 ȳ 1 x̄ 2 ȳ 2 , and ∆̄ satisfies the analogs of properties (1)-(5), but φ(ȳ 1 ) ≡ k j−1 . . . k i in case 4(a) (φ(ȳ 1 ) ≡ k j . . . k i+1 in case 4(b)), and ∆ is embeddable in ∆̄ so that the k j -band K j and the k i -band K i remain maximal in ∆̄.
Then in case (4)(a) (in case 4(b)), there exists a diagram ∆ ′ over the group G 1 with boundary path are sides of the maximal k j -band K j and k 2j−i -band K 2j−i (of the maximal k i -band K i and k 2i−j -band K 2i−j starting on y ′ 1 ), the labels of x ′ 1 and x ′ 2 are copies of φ(x 1 ) and φ(x 2 ), resp., and |y ′ 2 | ≤ |y 2 |. (The subscripts of k-bands are taken modulo L.)
Proof. We consider the case (4)(b) only. The maximal k-bands K j , K j+1 , . . . , K i+1 subdivide the diagram ∆̄ into the subdiagrams Γ l -s, where Γ l is bounded by K l and K l+1 and Γ l includes these k-bands (l = j, . . . , i).
Every maximal θ-band of ∆ having a cell in Γ i−1 must cross both bands K i and K i+1 of ∆̄. Therefore the parts of these bands in Γ i−1 and Γ i form an unfinished diagram whose i-sector is a subdiagram ∆ i of Γ i . By Lemma 4.3, the subdiagram Γ i−1 is embedded into the mirror copy ∆ i−1 of ∆ i . Similarly, Γ i−2 is embeddable into a mirror copy of ∆ i−1 which is the copy of ∆ i (we denote this copy by ∆ i−2 ), . . . , Γ j is embeddable into the copy (or mirror copy) ∆ j of ∆ i .
where t l passes through Γ l (and ∆ l ) for l = i − 1, . . . , j. Since every subpath t l connects two vertices on the k-bands of the l-sector ∆ l , we can construct the mirror copy ∆ ′ 2i−l−1 (which is a 2i − l + 1-sector) of ∆ l or the replica of ∆ l (if 2i − l − 1 = 1), and the copies of the vertices (t l ) ± are connected in ∆ ′ 2i−l−1 by a path t ′ 2i−l−1 with |t ′ 2i−l−1 | ≤ |t l | by Lemma 4.2. The desired diagram ∆ ′ embeds in the union of these ∆ ′ 2i−l−1 -s, and y ′

Shortcuts
In this subsection we show that Lemma 4.4 helps cut off a hub from a diagram using a 'shortcut'. But at first we consider a few statements about simpler pieces (without hubs) of diagrams over the group G.
Lemma 4.5. Suppose a diagram ∆ over G has a q-band C starting and ending on ∂∆, and p is a side of C. Assume that no θ-band crosses C twice in ∆ and that there is a factorization xy of the boundary path of ∆ such that x − = p − , x + = p + . Then |p| ≤ |x|, |∂C| ≤ |∂∆|, and |xp −1 | ≤ |∂∆|.
Proof. On the one hand, the length |p| of p is equal to the number m of θ-cells in C, since every cell of C has one θ-edge and at most one a-edge on p by condition (3) of Lemma 2.10 and the definition of (θ, q)-relations. On the other hand, every maximal θ-band crossing C must terminate on x, since it does not cross C twice. It follows that |x| ≥ m, and so |p| ≤ |x|. Similarly, we obtain the inequalities |∂C| ≤ |∂∆| and |xp −1 | ≤ |∂∆|. (We take into account the definition of length and the fact that the sides of the band C are separated by two q-edges lying on ∂∆.)
A maximal θ-band of a diagram ∆ is called a rim band if its start and end θ-edges as well as its top or its bottom lie on the boundary path of ∆.
Proof. Let s be the top side of T and s ⊂ ∂∆. Note that by our assumptions the difference between the number of a-edges in the bottom s ′ of T and the number of a-edges in s cannot be greater than 2N since every (θ, q)-cell has at most two a-edges. However, ∆ ′ is obtained by cutting off T along s ′ , and its boundary contains two fewer θ-edges than ∆. Thus one can compare the boundaries of ∆ and ∆ ′ as follows. There is a one-to-one correspondence between the q-edges of these boundaries, and to extend this correspondence to the θ- and a-edges of the boundaries, one should remove two θ-edges from ∂∆ and add at most 2N a-edges. Therefore it follows from the definition of length and by Inequality (4.7), that
Lemma 4.7. Suppose a diagram ∆ has two cells: an H-cell Π and a (θ, a)-cell π which have a common a-edge e, ep is the boundary of Π and ef e ′ f ′ is the boundary of π, where φ(e) ≡ φ(e ′ ) −1 ≡ a and φ(f ) ≡ φ(f ′ ) −1 ≡ θ for a θ-letter θ. Then there is a diagram ∆ ′ with the same boundary label as ∆ composed of Π and a θ-band T , and p is the side of T .
The letter a commutes with θ, and therefore every letter of the boundary label of Π commutes with θ. Hence one can construct a θ-band T with boundary
If we attach the band T to Π along the path p and remove π, we obtain the required diagram ∆ ′ .
Let Π be a hub of a minimal diagram ∆ given by Lemma 3.11. Using the notation of that lemma, we recall that the subdiagrams Γ i and Γ i+1 intersect along the k-band B i+1 (i = 1, . . . , L − 5). We denote by Ψ the minimal subdiagram containing all the Γ i -s for i = 1, . . . , L − 4. The boundary path of Ψ is x ′ x ′′ , where x ′ is composed of the sides of B 1 , B L−3 , and a subpath of ∂Π, while x ′′ is a subpath of ∂∆.
Proof. Let Ψ ′ be a minimal diagram with boundary of the form x ′ x̄ which satisfies conditions (1) and (2) and has minimal |x̄|. Since Ψ satisfies conditions (1) and (2) with x̄ = x ′′ , such Ψ ′ exists. Clearly the path x̄ has no loops. If the diagram Ψ ′ has a maximal q-band C which does not start/terminate on Π, then one can cut off C and shorten x̄ by Lemma 4.5. So Ψ ′ satisfies condition (3).
Assume that the diagram Ψ ′ does not satisfy condition (4) of the lemma. Then by Lemma 3.4, we have a θ-band of Ψ ′ starting and terminating on x̄. It follows that there is a θ-band T starting and terminating on x̄, such that the subdiagram Φ bounded by T and a part y of x̄ has no non-trivial θ-bands, i.e., it contains only H-cells. Two H-cells of Φ cannot have a common edge since otherwise they could be replaced by one H-cell, contrary to the minimality of Ψ ′ . It follows that every H-cell π of Φ has a common edge with T because the path x̄ has no loops.
Thus we have a series of H-cells π 1 , . . . , π s in Φ with boundaries y i z i (i = 1, . . . , s), where y i is a part of y (or y i is empty) and z i belongs to the side z of T . If |φ(z i )| a ≤ 2, then |z i | ≤ 2δ since every z i is a product of a-edges. Therefore |z| ≤ |x̄| + 2δ. It follows from Lemma 4.6 that if we remove all π i -s and then cut off the band T , then we decrease the length of x̄ since 2δ < 1/2; a contradiction.
Hence |φ(z i )| a ≥ 3, and so at least one of the π i -s has a common edge with a (θ, a)-cell of T . (We recall that T intersects each of the maximal q-bands of Ψ ′ starting on x ′′ at most once by Lemma 3.4, and so at most two of the (θ, q)-cells of T have a-edges with a ∈ A 1 , and each of these two (θ, q)-cells can have at most one a-edge with label from A 1 .) Therefore we can apply Lemma 4.7 to replace the (θ, a)-cell by a θ-band passing around the cell π i . This modification of the band T decreases the number of H-cells in Φ, since one of the H-cells gets moved over the band T .
The modified diagram can be non-minimal, but our surgery preserves q-bands and keeps the property that every q-band and θ-band have at most one common (θ, q)-cell. So after finitely many such steps the inequality |φ(z i )| a ≤ 2 becomes true, and one can decrease |x̄|, as was explained above. If one replaces the obtained diagram by a minimal one, then condition (1) still holds since the maximal q-bands B 1 and B L−3 are completely determined by the boundary, as follows from Lemma 3.4. This contradicts the assumption on the minimality of |x̄|. Thus property (4) holds.
If Ψ ′ has H-cells in the subdiagram Γ ′ i between the pair of k-bands B ′ i and B ′ i+1 which are not a pair of k 1 - and k 2 -bands, then these H-cells (with labels over the alphabet A 1 ) cannot have common edges with the maximal θ-bands of Γ ′ i since the θ-bands of Γ ′ i must intersect either B ′ i or B ′ i+1 by (4). This implies that Γ ′ i has no H-cells at all because the path x̄ has no loops. The lemma is proved.
Lemma 4.9. If a minimal diagram ∆ has a hub, then the (cyclic shift of the) boundary path of ∆ can be factorized as pp ′ so that the subpath p starts and ends with q-edges and there is a simple path z in ∆ with z − = p − , z + = p + such that the subdiagram bounded by the loop pz −1 has exactly one hub Π and the label φ(z) is equal in the group G 1 to a word of length < |p|.
Proof. We may assume that a hub Π is chosen in ∆ according to Lemma 3.11. Let x ′ x ′′ be the boundary path of Ψ as in Lemma 4.8. We will look for the path z in the minimal subdiagram ∆ ′ obtained after removing Ψ and Π from ∆. Therefore to prove the lemma, one may assume, using the notation of Lemma 4.8, that Ψ ′ = Ψ and x̄ = x ′′ , i.e., the subdiagram Ψ itself has properties (3), (4), and (5)  Let T be the set of remaining maximal θ-bands of Ψ, i.e., every band of T intersects exactly one of the bands B 1 , B L−3 . It follows from Lemma 3.4 for Ψ that there is an integer l (1 ≤ l < L − 3) such that no θ-band of T crossing B 1 crosses B l+1 and no θ-band of T crossing B L−3 crosses B l . We have either (L − 3) − l < (L − 3)/2 or (l + 1) − 1 < (L − 3)/2 since L is even. Without loss of generality we choose the former inequality, and so l ≥ (L − 2)/2.
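The dichotomy for l at the end of this paragraph uses only the parity of L. A short worked version, written here as a reading aid rather than as part of the original proof, is:

```latex
\begin{align*}
L \text{ even} &\implies L-3 \text{ odd} \implies l \neq \tfrac{L-3}{2} \text{ for every integer } l,\\
l > \tfrac{L-3}{2} &\iff (L-3) - l < \tfrac{L-3}{2},\\
l < \tfrac{L-3}{2} &\iff (l+1) - 1 < \tfrac{L-3}{2},\\
l > \tfrac{L-3}{2},\ l \in \mathbb{Z} &\implies l \ge \tfrac{L-3}{2} + \tfrac{1}{2} = \tfrac{L-2}{2}.
\end{align*}
```

So exactly one of the two inequalities holds, and the first one yields the bound l ≥ (L − 2)/2 because (L − 3)/2 is a half-integer.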
If $B_i$ is a $k_1$-band for some $i \le 6$, then we will consider a smaller subdiagram bounded by $B_7$ and $B_{L-3}$ instead of $\Psi$. (Respectively, we change the complementary minimal subdiagram $\Delta'$.) If none of $B_1, \dots, B_6$ is a $k_1$-band, then we do not change $\Psi$. Thus, in any case we can reindex the $k$-bands and assume that $\Psi$ is bounded by $B_1$ and $B_{L-r}$ for some $r \le 9$ and that the bands $B_1, \dots, B_{r+3}$ are not $k_1$-bands. Let them be $k_i$-, $\dots$, $k_{i\pm(r+2)}$-bands for some $i$. We will assume that $B_{r+3}$ is a $k_{i-r-2}$-band.
Since $L \ge 40$, after such a reindexing $l \ge (L-2)/2 - 6 > 12 \ge r + 3$, and so no $\theta$-band from the set $T$ crossing the band $B_{L-3}$ crosses $B_{r+3}$. We denote by $\Phi$ the part of the diagram $\Psi$ bounded by $B_{r+3}$ and $B_1$. Let $T^\Phi_1, \dots, T^\Phi_s$ be the maximal $\theta$-bands of $\Phi$. Then by the choice of the subdiagrams $\Psi$ and $\Phi$, every cell of $\Phi$ belongs to one of these $\theta$-bands, and each of these bands crosses the $k_i$-band $B_1$. We will assume that $T^\Phi_1$ is the closest band to the hub $\Pi$, and so on. For every $j \ge 1$, at least one of the two $q$-edges of every $(\theta, q)$-cell of $T^\Phi_j$ belongs to the top of $T^\Phi_{j-1}$ (to $\partial\Pi$ if $j = 1$), because $\Psi$ has no maximal $q$-bands except for the bands starting on $\Pi$. Assume that a band $T^\Phi_j$, starting with a $(\theta, q)$-cell of $B_1$, terminates with a $(\theta, a)$-cell $\pi$ having no $a$-edges on $\partial T^\Phi_{j-1}$. Then an $a$-edge and one $\theta$-edge of $\pi$ lie on the boundary subpath $x''$ of $\Psi$, and so if one removes $\pi$ from $\Psi$, the length of $x''$ does not increase, because $\partial\pi$ has two $\theta$-edges and two $a$-edges, and all properties (3)-(5) hold for the remaining part of $\Psi$. Therefore we may assume that every edge $f$ of $\mathrm{tbot}(T^\Phi_j)$ belongs to $\mathrm{top}(T^\Phi_{j-1})$ (to $\partial\Pi$ for $j = 1$).
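The numerical chain at the beginning of the preceding paragraph can be checked directly. The sketch below assumes only $L \ge 40$, $r \le 9$, and the bound $l \ge (L-2)/2$ obtained before the reindexing; the subtraction of $6$ is read here as accounting for the at most six bands $B_1, \dots, B_6$ dropped by the reindexing.

```latex
\begin{align*}
l \;\ge\; \frac{L-2}{2} - 6 \;\ge\; \frac{40-2}{2} - 6 \;=\; 13 \;>\; 12 \;=\; 9 + 3 \;\ge\; r + 3 .
\end{align*}
```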
To construct the path $z$, we go along the side of the band $B_2$ which is closer to $B_1$, then go along the part of the boundary of the hub which is not part of $\partial\Psi$, and finally go to $\partial\Delta$ along the side of $B_{L-r}$ which is closer to $B_{L-r+1}$. Thus we have $z = z_1 z_2 z_3$ according to this definition, and respectively, $\varphi(z) \equiv Z \equiv Z_1 Z_2 Z_3$. Let $p$ be the subpath of $\partial\Delta$ and $\partial\Psi$ such that $p_- = z_-$, $p_+ = z_+$. Then the boundary path of $\Delta$ is of the form $pp'$, for an appropriate $p'$.
We denote by $\Gamma$ the part of the diagram $\Phi$ bounded by $B_{r+3}$ and $B_2$. Let $y$ be the maximal common subpath of $\partial\Gamma$ and $p$ with $y_- = p_- = z_-$ that does not cross the band $B_{r+3}$. We may apply Lemma 4.4 to the pair $(\Gamma, \Phi)$ and obtain a new diagram $\Gamma'$ over $G_1$ given by that lemma. One of the four boundary sections of $\Gamma'$ is a subword $k_{i-1} \dots k_{i+r+1}$ of the word $\Sigma_0$.
Note that $B_{L-r}$ is a $k_{i+r+1}$-band since we take the indices of the $k$-letters modulo $L$. By Lemma 4.4, $\Gamma'$ has a loop with label of the form $Z_1 Z_2 Z' Y'$, where $Z'$ is the copy of the word written along the band $B_{r+3}$, and $|Y'| \le |y|$. Hence $Z = (Y')^{-1}(Z')^{-1} Z_3$ in $G_1$. Therefore, to prove that $Z$ is equal in $G_1$ to a word of length $< |p|$, it suffices to prove that the word $(Z')^{-1} Z_3$ is equal in $G_1$ to a word of length $< |x|$, where $p = yx$.
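The reduction in the last sentence amounts to the following computation in $G_1$. Here $V$ is a hypothetical name (not used in the text) for a word equal to $(Z')^{-1} Z_3$ in $G_1$ with $|V| < |x|$.

```latex
\begin{align*}
Z_1 Z_2 Z' Y' = 1
  &\;\Longrightarrow\; Z_1 Z_2 = (Y')^{-1} (Z')^{-1}, \\
Z = Z_1 Z_2 Z_3
  &= (Y')^{-1} (Z')^{-1} Z_3 = (Y')^{-1} V
  && \text{in } G_1, \\
|(Y')^{-1} V| &\le |Y'| + |V| < |y| + |x| = |p|
  && \text{(as $|Y'| \le |y|$ and $p = yx$).}
\end{align*}
```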
We observe that the words $Z'$ and $Z_3$ have equal prefixes of length $d$, since $B_{L-r}[d]$ is a copy of $B_{r+3}[d]$. Therefore the word $(Z')^{-1} Z_3$ is equal to $(\bar Z')^{-1} \bar Z_3$, where $(\bar Z')^{-1}$ copies the label of the part of the side of $B_{r+3}$ crossed by the $\theta$-bands from the set $T$ only, and $\bar Z_3$ is the label of the part of $z_3$ crossed by the $\theta$-bands of $T$ only. We denote the union of these two subsets of $T$ by $\bar T$. The length of $(\bar Z')^{-1} \bar Z_3$ does not exceed the number $|\bar T|$ of bands in $\bar T$, since no band from $T$ crosses both $B_{r+3}$ and $B_{L-r}$. By Lemma 3.4, every band of $\bar T$ must end on the subpath $x$. Hence $|(\bar Z')^{-1} \bar Z_3| \le |x|$. In fact this inequality is strict, since $L - r > r + 3$ (as $r \le 9$) and so $x$ must include some $q$-edges as well. The lemma is proved.

Spaces of words
Let $W$ be a set of words vanishing in the group $G$. An easy observation (see Lemma 5.5 below) shows that under a natural condition, spaces of words from $W$ can be bounded from above by the same (up to equivalence) function as the spaces of words from a smaller set $W'$. In this section, we define several sets of words (as boundary labels of several sets of diagrams) and estimate from above the spaces of words from larger sets using the upper bounds for the spaces of words from smaller sets. To apply Lemma 5.5, we will cut up a diagram $\Delta$ into two pieces such that one of the pieces belongs to a smaller set of diagrams while the perimeter of the second piece is bounded by $|\partial\Delta|$. The most difficult statement is Lemma 5.8 (separating a one-hub subdiagram), whose proof is based on Lemma 4.9.

Spaces of boundary labels of some diagrams.
We call a disc simple if it has no $H$-cells and either all its $\theta$-edges have labels from $\Theta$ or all of them have labels from $\hat\Theta$.
Lemma 5.1. Let $\Delta$ be a diagram having exactly one hub $\Pi$. Then there is a diagram $\tilde\Delta$ with the same boundary label as $\Delta$ such that $\tilde\Delta$ has a simple disc subdiagram $D$, and the annular diagram $\Gamma = \tilde\Delta \setminus D$ is a minimal annular diagram without $\theta$-annuli.
Moreover, one may assume that the boundary label of $D$ is of the form $(k_1 W_1 k_2 W_2 \dots k_L W_L)^{\pm 1}$, where $k_1 W_1 k_2 W_2 \dots k_L W_L$ is accepted by either the machine $S(L)$ or the machine $\hat S(L)$, and the lengths of $\theta$-annuli in $D$ do not exceed $N + L\,\mathrm{space}_{S\cup\hat S}(W_2)$.
Proof. We may assume that $\Delta$ is a minimal diagram. Let $D_1$ be a maximal disc in $\Delta$, and denote by $K_1, \dots, K_L$ the maximal $k_1$-, $\dots$, $k_L$-bands of $D_1$ starting on the hub $\Pi$. We denote by $\Gamma_i$ the maximal accepted $i$-sector of $D_1$ bounded by $k_i$ and $k_{i+1}$ ($i = 1, \dots, L$). By Lemmas 3.8 and 3.9(1), these sectors have no $H$-cells for $i \ne 1$, and each of them is a copy or a mirror copy of the 2-sector $\Gamma_2$.
Now we replace the 1-sector $\Gamma_1$ by the replica $\Gamma'_2$ of $\Gamma_2$ in $D_1$. (To achieve this, one can make a cut along $K_1$, $K_2$ and the part of the boundary of $\Pi$ between these $k$-bands, and insert two mirror copies of $\Gamma'_2$ along this cut.) By the definition of replica, we obtain a modification $D_2$ of the disc diagram $D_1$ with boundary label of the form $k_1 W_1 k_2 W_2 \dots k_L W_L$, where $W_1$ is a mirror copy of $W_2$ or $W_1$ has no $a$-letters. Since $k_2 W_2 k_3$ is an accepted 2-sector word, the word $k_1 W_1 k_2 W_2 \dots k_L W_L$ is accepted by either the machine $S(L)$ or the machine $\hat S(L)$ by Lemma 3.1. Moreover, the length of this computation does not exceed the length of the computation of $S \cup \hat S$ by the same lemma. Therefore by Lemma 3.9(2), the disc $D_2$ can be replaced by a disc $D_3$ which has no $H$-cells, whose labels of $\theta$-edges either all belong to $\Theta$ or all belong to $\hat\Theta$, and whose number of $\theta$-annuli does not exceed that number for $D_2$.
Now, if necessary, the annular diagram $\Delta \setminus D_3$ can be replaced by a minimal diagram $\Gamma$ over the group $G_1$. Assume that $\Gamma$ has a $\theta$-annulus $T$. By Lemma 3.4, $T$ surrounds the disc diagram $D_3$, and so $D_3$ can be included in a larger disc subdiagram $D_4$ which contains more $(k_i, \theta)$-cells for $i = 1, 2$, since the extensions of the bands $K_i$ have to cross $T$. Then one can make the surgery as above and replace $D_3$ by a larger simple disc $D_4$. This procedure terminates, because we do not change the number of $(k_i, \theta)$-cells ($i = 1, 2$) in the complement of the discs when passing from $D_1$ to $D_2$ and from $D_2$ to $D_3$, and we reduce this number when passing from $D_3$ to $D_4$. (Recall that the rank of such cells is higher than the ranks of other cells in diagrams over $G_1$.) Thus the procedure terminates with a desired diagram $\Gamma$.
Finally, one can replace the disc by a disc corresponding to a computation of minimal space and use Lemma 3.1 to make the second claim of the lemma true.
Lemma 5.2. There are positive constants $c_1$ and $c_2$ with the following property. For the boundary label $w \equiv k_1 W_1 \dots k_L W_L$ of the simple disc $D$ from Lemma 5.1, there is a derivation $w \equiv w_0 \to w_1 \to \dots \to w_t = 1$ using the relations of $G(S \cup \hat S, L)$ and the hub relation (3.6), such that $|w_i| \le c_1 S'_S(|W_2|) + c_2$ ($i = 0, 1, \dots, t$).
Proof. Let us start with the words $w \equiv w(0), w(1), \dots, w(m) \equiv \Sigma_0$, written on the boundaries of the $\theta$-annuli of $D$. By Lemmas 5.1 and 3.1, there is $c_1 > 0$ such that $|w(i)| \le c_1 S'_S(|W_2|) + N$. Notice that $w(i)$ is written on the top of a $\theta$-annulus $T$ of $D$ and $w(i+1)$ is written on its bottom. The band $T$ has
Lemma 5.5. Let $C > 0$. Assume that for every $w$ from a vanishing set $W$, there is a derivation $w \equiv w_0 \to w_1 \to \dots \to w_t \equiv 1$ such that for every $i = 0, \dots, t-1$, a cyclic shift of the word $w_i$ is freely equal to a product of a cyclic shift of $w_{i+1}$ and a word $v_i$ from a vanishing set $W'$, where $\max(|v_i|, |w_i|) \le C|w|$ ($i = 0, \dots, t-1$). Then the function $f_{G,W}$ is bounded from above by a function equivalent to $f_{G,W'}$.
Proof. The hypothesis of the lemma implies that we can apply the following series of elementary transformations to $w_i$. A first series of transformations replaces the word by its cyclic shift; a second series deletes or inserts mutually inverse letters, operating with words of length $\le 2C|w|$; then we split the obtained word into a product of a cyclic shift of $w_{i+1}$ and the word $v_i$; then we keep $w_{i+1}$ unchanged and use the appropriate procedure reducing the word $v_i$ to the empty word; and finally we obtain the word $w_{i+1}$ using cyclic shifts. Clearly, the space of this procedure is at most $2C|w| + f_{G,W'}(C|w|)$. Thus, by induction on $i$, we have $\mathrm{Space}_G(w) \le 2Cn + f_{G,W'}(Cn)$ for an arbitrary word $w \in W$ of length at most $n$. The lemma is proved.
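To see that the bound just obtained is dominated by $f_{G,W'}$ in the sense of (1.2), one can argue as in the sketch below. It assumes that $f = f_{G,W'}$ is non-decreasing and non-negative (reading $f(Cn)$ as $f(\lceil Cn \rceil)$ if $C$ is not an integer); the symbol $c$ is an auxiliary positive integer, not a constant from the text, chosen so that $c \ge 2C$.

```latex
% Assumptions: f = f_{G,W'} is non-decreasing and f >= 0;
% c is a positive integer with c >= 2C (hence also c >= C and c >= 1).
\begin{align*}
2Cn + f(Cn) \;\le\; cn + f(cn) \;\le\; c\, f(cn) + cn
  \qquad \text{for every } n \in \mathbb{N},
\end{align*}
% which is inequality (1.2) for the functions n \mapsto 2Cn + f(Cn) and f.
```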
Proof of Corollary 1.7. Assume that $\alpha$ is computable with space $\le 2^{2^m}$. It follows that for $m = [\log_2 \log_2 n]$ we can recursively compute binary rationals $\alpha_m$ such that
$$|\alpha - \alpha_m| = O(2^{-m}) = O((\log_2 n)^{-1}) \qquad (5.8)$$
and the space of the computation of $\alpha_m$ is at most $n$. In addition, one may assume that the number of digits in the binary expansion of $\alpha_m$ is $O(m)$. Therefore the computation of $[\log_2 n]$ (in binary) and of the product $\alpha_m [\log_2 n]$ needs space at most $O((\log_2 n)^2)$. Then
Thus the subsequent application of the above-mentioned DTMs has space complexity equivalent to $n^\alpha$, and we can apply Corollary 1.4 to obtain a finitely presented group with space function equivalent to $n^\alpha$.
Now assume that the function $[n^\alpha]$ is equivalent to a space function of a finitely presented group $G$. Then by Proposition 1.1, there is an NTM $M$ whose space complexity $S_M(n)$ is equivalent to $[n^\alpha]$, that is,
$$c_1 n^\alpha < S_M(n) < c_2 n^\alpha \qquad (5.9)$$
for some positive $c_1$, positive integer $c_2$, and every sufficiently large $n$. In particular, we have $S_M(n) < c_2 n^d$ for some integer $d$ and every $n$. Since $c_2 n^d$ is an FSC function, we may apply Lemma 5.