CN117275583A

CN117275583A - Quantum technology-based gene search BLAST acceleration method and system

Info

Publication number: CN117275583A
Application number: CN202311254777.6A
Authority: CN
Inventors: 章乐; 喻扬超; 李冰
Original assignee: Sichuan University
Current assignee: Sichuan University
Priority date: 2023-09-27
Filing date: 2023-09-27
Publication date: 2023-12-22
Anticipated expiration: 2043-09-27
Also published as: CN117275583B

Abstract

The invention provides a gene search BLAST acceleration method and system based on quantum technology, wherein the method mainly comprises three parts: the first part combines quantum computation, so that the time complexity of the classical BLAST algorithm is reduced, and the computational bottleneck is broken through; the second part provides a scheme for simplifying the quantum circuit, which can simplify the quantum circuit corresponding to the BLAST acceleration algorithm based on the quantum technology under the condition of smaller time cost, and reduce the space complexity of the quantum circuit; the third provides a simple quantum wire mapping scheme for physical quantum computers. The method effectively improves the running efficiency of the BLAST method and reduces the space complexity of the quantum circuit in the corresponding quantum computation.

Description

Quantum technology-based gene search BLAST acceleration method and system

Technical Field

The invention relates to the field of quantum computing and biological information, in particular to a sequence comparison method, quantum line simplification and quantum line mapping based on quanta, and in particular relates to a gene search BLAST acceleration method and system based on quantum technology, which are used for accelerating classical BLAST algorithm and reducing the operation requirement of a corresponding quantum line in a physical quantum computer.

Background

Interferon is a key factor in the field of immunological research, which plays an important role in the development of various fields of research in immunology. E.g. it may modulate and enhance the immune response in humans; inhibiting replication and transmission of viruses; inhibit proliferation and growth of tumor cells. Current research on interferon is generally based on annotation information. For annotated species, its location on the chromosome can be obtained very conveniently; whereas for the unannotated species, its whole genome information is a sequence of millions of ATCG characters, which would present a significant impediment to research. For this reason, researchers need to use known interferon sequences to align with whole gene sequences to obtain specific information about the interferon.

Sequence alignment is a key analytical method commonly used in bioinformatics. Through sequence alignment, researchers can obtain information such as the location of interferon in the unexplored species. One of the most widely used sequence alignment algorithms currently is BLAST. It is used in a number of well known databases such as NCBI. Moreover, BLAST plays a key role in gene annotation, disease research, evolutionary biology, and discovery of new genes. With the rapid growth of the amount of biological data, BLAST is a necessary tool to ensure data integration, parsing and application.

The classical BLAST main flow is three steps, wherein the first step is to generate a field words with a fixed length according to a query sequence; secondly, scanning indexes in a database through words, and obtaining loci; and thirdly, extending towards two ends by taking the hits as the center so as to find high-score fragments exceeding a set threshold value. The time complexity of the first and third steps in the BLAST algorithm depends on the length L of the query sequence, which is O (L), and the time complexity of the second step depends on the length N of the subject sequence, which is O (N). Since the length of the query sequence is much smaller than the subject sequence, the main time overhead of the classical BLAST algorithm comes from the second step.

Currently, many optimization methods for BLAST have been proposed, such as pruning before the third step, and parallel operation using a multi-core CPU or GPU. Although these methods better improve the operating efficiency of classical BLAST, the operating efficiency of classical BLAST algorithms is not altered in terms of time complexity. In recent years, the rapid development of quantum computing has provided a new possibility for accelerating BLAST.

Because of the hardware defects of physical quantum computers (physical machines), gate operations in quantum algorithms introduce errors, simplification and equivalent substitution for quantum wires have an important role in reducing computational errors on physical machines. ZX-calculus is an optimization method of a general quantum circuit, however, the calculation time of the method is very complex.

In the existing physical machine, the entanglement effect between the qubits is weak, and the entanglement effect is further weakened with the increase of the distance. This means that the quantum gate only acts between two adjacent qubits. The far-apart qubits require multiple SWAP operations to be performed next to each other before performing the operations. However, this introduces additional gate operations, increasing quantum computation errors. Quantum wires need to be mapped to physical machines using quantum wire mapping algorithms. However, most physical machines at present lack a line mapping method, and mapping quantum lines needs manual mapping, which is difficult, and SWAP operation cannot be effectively reduced.

In view of this, it is of great importance to construct a quantum technology-based gene search BLAST acceleration method.

Disclosure of Invention

The invention aims at solving the problems that the traditional method cannot improve the time complexity of the classical BLAST algorithm, the general line simplifying method simplifies the quantum line of the BLAST algorithm based on quantum calculation, and how to conveniently map the quantum line, and provides a BLAST accelerating scheme based on quantum technology. The quantum gene sequence comparison method QGSA (Quantum Gene Sequence Alignment) based on the quantum technology improves the time complexity of the classical BLAST algorithm, the quantum circuit reduction method OQCBTT (Optimization of Quantum Circuit Based on Truth Table) based on the truth table reduces the time cost of the reduced QGSA quantum circuit, and the mapping model QCMM (Quantum Circuit Mapping Model) based on the nerve layout provides a simple circuit mapping method.

Specifically, the invention provides the following technical scheme:

in one aspect, the present invention provides a quantum technology-based gene search BLAST acceleration method, comprising:

s1, performing quantum bit coding on a target sequence and a field, and constructing an initial state, wherein the initial state comprises index string bits, target string bits and mode string bits; the target sequence forms target string bits, and the field forms pattern string bits; performing cyclic displacement operation, and converting the target string bits into superposition of states according to the index string bits; searching for a correct solution state, and amplifying the amplitude of the solution state;

s2, obtaining a truth table based on the corresponding relation between the index string bit state and the target string bit state in the S1 cyclic shift operation, extracting a simplified truth table, and obtaining the target string bit, the mode string bit and the index string bit according to the simplified truth table; traversing the index string (namely traversing the bit state of the index string) and updating the count table to complete the construction of the quantum circuit, thereby obtaining the simplified quantum circuit; each target bit maintains a count table, and the target bit refers to one bit in target string bits; the index string refers to a binary string corresponding to the bit state of the index string, and the index string can indicate the bit state of the index string;

S3, training a QCMM model, and mapping the simplified quantum circuit to a physical machine.

Preferably, in the step S1, the target sequence and the field are subjected to quantum bit encoding, and in the initial state of construction, the superposition state is constructed by using an H gate for the index string bit, and the quantum state is constructed by using an X gate for the target string bit and the mode string bit.

Preferably, after the initial state is built, the qubit state formula is:

wherein t is _i I bit, p, of the target string bit _j And the j-th bit of the mode string bit is T, the target string bit length is P, the mode string bit length is P, and k represents the quantum state corresponding to the index string bit.

Preferably, in the step S1, the cyclic shift operation is specifically: the target string bit is shifted by utilizing the superposition state of the index string bit, and the state of the index string bit indicates the number of shifted bits and corresponds to different target string bit states; the state of the qubit after the shift operation is finished is:

wherein t is _i I bit, p, of the target string bit _j For the j-th bit of the pattern string bit, T is the target string bit length, P is the pattern string bit length, k represents the quantum state corresponding to the index string bit,representing uniform superposition of index string bit formations |s>I.e. a superposition of states.

Preferably, in the step S1, the method for searching the correct solution state is: the CNOT gate is used for realizing the XOR operation so as to identify the correct solution state |w > in the uniform superposition state |s >, and the state of the quantum bit after the operation is finished is as follows:

then pass through U _w Is |s>In the correct solution state |w>Adding negative phase, U _w The following is shown:

wherein p is _i I bit is the pattern string bit, and P is the pattern string bit length.

Preferably, for state U _w |s>(i.e., for a uniform superimposed state |s>Applying U _w State after operation) uses U _s Amplitude amplification is performed, wherein:

U _s ＝2|s><s| -I (with respect toState |s>Is a reflection of (d) to state U _w |s>；

I represents a unit gate.

Preferably, the step S2 further includes:

for each target bit, maintaining an N×log N count table, where N represents the target sequence length;

adding a pointer to the i-th index string of the count table: if the number of 1 s in the index string bit is j, then a pointer is added starting from the ith row and jth column of the count table, the pointer points to the same column position in the count table, and the position corresponds to the position of 1 s in the index string bit, and the position of 1 s in the ith index string bit must be included.

Preferably, the S2 further includes: obtaining an index table according to a simplified truth table, wherein the first column of the index table is in different states of index string bits, and the second column of the index table is in the first P quantum bit corresponding states of target string bits corresponding to the index string bits in different states, wherein P represents the length of mode string bits;

The specific way of traversing the index string and updating the count table is as follows:

traversing an index table;

index represents the decimal value corresponding to the different states of the first column index string bit of the index table. x represents one of the qubits in the target string bit, referred to as the target bit. q _xi When index=i, the state of the target bit x corresponding to the index table is referred to as a target bit value. If index in index table is 0 and corresponds to q _x0 If the bit is 1, an X gate is applied to the quantum circuit corresponding to the target bit X; if index is not 0, calculating sum of all columns of the corresponding row of the row index in the count table of the target bit x; temp=q if sum/2=0 _x0 Otherwise, the device can be used to determine whether the current,comparing temp to target bit value q _xindex If temp. Q _xindex Applying a gate according to the index string; wherein temp is a variable for temporarily storing data;

if one gate is added when index=i, the position end whose corresponding index string bit is 1 is recorded, and the counter positions pointed to by pointers starting from the i+1th row end column of the counter table of the target bit x are all added by 1.

Preferably, the QCMM model structure comprises: an input layer, a shared layer, a Dropout layer, and a plurality of SLOT structures;

the input layer is connected with the sharing layer, the sharing layer is connected with the Dropout layer, and the Dropout layer is connected with a plurality of SLOT structures;

The single SLOT structure comprises a dense1 layer, a dense2 layer and a SLOT layer; the dense1 layer receives the output data of the Dropout layer; the dense1 layer is connected with the dense2 layer, and the dense2 layer is connected with the slot layer;

each SLOT structure generates a group of results, and the category with the highest probability in each SLOT structure is selected as the classification result of the SLOT structure.

Preferably, the step S3 further includes processing conflicts between the SLOT structures:

s3-1, recording the number num of SLOT structures, and storing the output result of each SLOT structure into a sList list, wherein sList= [ pList ₀ ,…,pList _num-1 ]Wherein pList _t Representing the result of the i+1st SLOT structure output; flag= [ flag ] ₀ ,…,flag _num-1 ]Wherein the flag is _i =1 indicates that the physical bit corresponding to the i+1th SLOT structure is determined, flag _i =0 indicates that the physical bit corresponding to the i+1st SLOT structure is uncertain; y= [ y ] ₀ ,…,y _num-1 ]Wherein y is _i Representing the physical bit corresponding to the i+1th SLOT structure, each position of the initial y is set to-1; initial oknum=0, and the value of oknum represents the number of SLOT structures for which the corresponding physical bit has been determined;

s3-2, outputting a result y if oknum=num, otherwise executing S3-3;

s3-3, initializing an empty list ntmp; traversing the list of sLists in turn, finding each pList therein _i Maximum probability pmax of (2) _i The method comprises the steps of carrying out a first treatment on the surface of the If a flag is _i Not equal to 1, pmax is to be _i Adding to the list of ntemp, otherwise adding-1 to the end of ntemp;

s3-4, find the maximum value of the ntempMax in the ntemp, and the ntempMax is atThe index in ntemp, ntempIndex, sets the value of yvalue to the value of ntempMax in pList _ntempIndex Index of (a);

s3-5, if yvalue is not in the output result y, y will be _ntempIndex The value of (2) is set to yvalue, and the flag is set _ntempIndex Setting 1, and adding 1 to the value of oknum; otherwise, pList is to be formed _ntempIndex [yvalue]The value of (2) is set to-1. Returning to S3-2.

On the other hand, the invention also provides a gene search BLAST acceleration system based on quantum technology, which comprises:

the QGSA acceleration module is used for carrying out quantum bit coding on the target sequence and the field and constructing an initial state, wherein the initial state comprises index string bits, target string bits and mode string bits; performing cyclic displacement operation, and converting the target string bits into superposition of states according to the index string bits; searching for a correct solution state, and amplifying the amplitude of the solution state;

the quantum circuit simplifying module is used for obtaining a truth table based on the corresponding relation between the index string bit state and the target string bit state in the cyclic displacement operation of the QGSA accelerating module, extracting a simplified truth table, and obtaining the target string bit, the mode string bit and the index string bit according to the simplified truth table; traversing the index string and updating the count table to complete the construction of the quantum circuit, thereby obtaining a simplified quantum circuit; each target bit maintains a count table, and the target bit refers to one bit in target string bits; the index string refers to a binary string corresponding to the bit state of the index string;

And the mapping module is used for training the QCMM model and mapping the simplified quantum circuit to a physical machine.

In yet another aspect, the present invention also provides an electronic apparatus including a processor and a storage device, the processor invoking instructions stored in the storage device to perform a quantum technology based gene search BLAST acceleration method as described above.

Compared with the prior art, the technical scheme has the following beneficial effects:

(1) The QGSA algorithm provided by the invention is realized by BLASTThe time complexity of the two steps is optimized from O (N) to(N represents the length of the target sequence) thereby increasing the efficiency of BLAST operation.

(2) Compared with the general ZX-calcul method (the worst time complexity is O(s) ² n ² ) S represents the number of nodes in the ZX graph, n represents the number of quantum bits), the simplified quantum circuit equivalent to the QGSA algorithm can be obtained under the condition of lower time complexity O (n), and the space complexity of the simplified quantum circuit can be reduced from O (n) to O (logn), so that the calculation scale is reduced, and the operation requirement of the simplified quantum circuit on a physical machine is reduced.

(3) The QCMM model provided by the invention provides a simple line mapping method for a physical machine.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a diagram of an example QGSA cyclic shift;

FIG. 2 is a diagram of QGSA two operating state changes;

FIG. 3 is a QGSA probability map;

FIG. 4 is a graph of the correspondence of qubit states in the OQCBTT index string register and the qubit states in the target string register;

FIG. 5 is an OQBTT count table initialization diagram;

FIG. 6 is an OQBTT count table append pointer diagram;

fig. 7 is an OQCBTT simplified line representation;

fig. 8 is an OQCBTT simplified line representation;

FIG. 9 is an OQBTT count table update diagram;

FIG. 10 is a graph of OQBTT reduction results;

FIG. 11 is a partial quantum topology structure diagram of a physical machine;

fig. 12 is a QCMM network structure diagram;

FIG. 13 is a SLOT block diagram of QCMM;

FIG. 14 is a physical bit encoding result;

FIG. 15 is a schematic diagram of various quantum gates;

FIG. 16 is a diagram of a quantum circuit diagram corresponding to an example of study;

FIG. 17 is a graph of QGSA versus BLAST calculation time;

FIG. 18 is a graph comparing QGSA to BLAST runtime;

FIG. 19 is a quantum wire comparison graph, wherein a is an un-simplified quantum wire; b is a quantum circuit after OQCBTT simplification;

FIG. 20 is a graph comparing the number of line quantum gates and the spatial complexity before and after using the OQBTT method, wherein a is the number of line quantum gates before and after using the OQBTT method, and b is the spatial complexity of the line before and after using the OQBTT method;

FIG. 21 is a graph comparing OQBTT and ZX-calcul time complexity;

FIG. 22 is a diagram of the actual operation result of the physical machine;

FIG. 23 is a graph comparing QCMM and Dense Layout methods mapping quantum wires to physical machine results, where a is QCMM mapping and b is Dense Layout mapping;

fig. 24 is a schematic of the main flow of the scheme of the present invention.

Detailed Description

Embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be understood that the described embodiments are only some, but not all, of the embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

It will be appreciated by those of skill in the art that the following specific embodiments or implementations are provided as a series of preferred arrangements of the present invention for further explanation of the specific disclosure, and that the arrangements may be used in conjunction or association with each other, unless it is specifically contemplated that some or some of the specific embodiments or implementations may not be associated or used with other embodiments or implementations. Meanwhile, the following specific examples or embodiments are merely provided as an optimized arrangement, and are not to be construed as limiting the scope of the present invention.

The invention mainly aims at solving the problems that the traditional method cannot improve the time complexity of the classical BLAST algorithm, the general line simplifying method simplifies the quantum line of the BLAST algorithm based on quantum calculation, and how to conveniently map the quantum line, and provides a BLAST accelerating scheme based on quantum technology. The quantum gene sequence comparison method QGSA (Quantum Gene Sequence Alignment) of the quantum technology is utilized to improve the time complexity of the classical BLAST algorithm, the quantum circuit simplifying method OQCBTT (Optimization of Quantum Circuit Based on Truth Table) based on the truth table reduces the time cost of the simplified QGSA quantum circuit, and the mapping model QCMM (Quantum Circuit Mapping Model) based on the nerve layout provides a simple circuit mapping method.

In a specific embodiment, as shown in connection with fig. 24, the present solution mainly comprises three parts. The first part is a BLAST acceleration method QGSA based on quantum computation for reducing the time complexity of the BLAST algorithm. The second part is a truth table-based quantum circuit reduction method OQCBTT method for reducing the time overhead of reducing QGSA quantum circuits. The third part is a neural layout-based mapping model QCMM, which provides a simple line mapping scheme for quantum computers. The following describes the present scheme in detail in connection with interferon gene search.

And 1, constructing a QGSA algorithm.

Step 1.1, coding. For character features of the target sequence (length N) and the field (length M), 4 bases are encoded using 2-bit qubits, for example, we can set their correspondence to [ ' a ': 00', ' C ': 01', ' G ': 10', ' T ': 11 ]. After encoding, both the target sequence and the field become a binary string.

And 1.2, initializing. When the initial state is constructed, 3 types of quantum bits are preferably set, namely index string bits, target string bits and mode string bits, and the 3 types of quantum bits are respectively stored in an index register, a target register and a mode register. After the target sequence and the field are qubit encoded, the target sequence forms a target string bit and the field forms a pattern string bit. The index string bit and the mode string bit construct a corresponding quantum state by applying an H gate in an initialization stage. After the initialization phase is finished, the state formula of the qubit is as follows:

Wherein t is _i I bit, p, of the target string bit _j The j-th bit is the pattern string bit. T is the target string bit length and P is the pattern string bit length. In total, log is needed ₂ N index string bits, T target string bits (i.e., the number of target string bits is equal to the target string bit length), and P pattern string bits, k representing the quantum state to which the index string bits correspond.

Step 1.3, cyclic displacement. And converting the target string bit into superposition of all states according to the index string bit by taking the index bit as a control bit. The state formula of the qubit after this stage is over is as follows:

the step essentially uses the superposition state of the index bits to shift the target string bits, and the state of the index string bits indicates the number of bits shifted and corresponds to different target string bit states. For example: when the index string bit (q ₁ q ₀ ) At 10, the cyclic shift line may be shifted by q ₁ CSWAP gate with bits as control bits to implement 2-bit tracking on target string bitsRing displacement, and so on. Therefore, log can be used ₂ N index bits, for the target string bit with length of T, realize arbitrary m bit displacement (m is less than or equal to T). An example of a cyclic shift of 2 bits for an 8-bit target string bit is shown in fig. 1. In fig. 1, two bits of a wired connection require switching operations using CSWAP, all of which have control bits q ₁ . By indexing the string bits q ₁ For the control bits, a first round of swapping is performed at intervals of 2, starting from the target bit 1. Each exchange may determine the position of the target bit downstream of the exchange, which is shown in fig. 1 as the position of the target bits 3,4,7,8 after the first round of exchange is fixed and does not participate in the next round of exchange. The second round of exchange is carried out with 4 as interval, and finally, the 2-bit cyclic displacement can be realized. More general description is for 2 ⁿ (n＝log ₂ T) bit target string bit shift 2 ^m Bit of 2 ^m ,2 ^m+1 ..2 ^n-1 Sequentially exchanged for intervals.

And 1.4, searching a solution state. The CNOT gate is used for realizing the XOR operation so as to identify the correct solution state |w > in the uniform superposition state |s >, and the state of the quantum bit after the operation is finished is as follows:

Step 1.5, use U _s Amplitude amplification is performed. Application U _s ＝2|s><s| -I (with respect to state |s>Is a reflection of (d) to state U _w |s>Wherein I represents a identity gate. Fig. 2 shows the state change process for two operations. State |s>Is (U) _s U _w ) Can be amplifiedSolving the amplitude of the state to let the initial state |s>More approximate solution state |w >。

FIG. 3 shows the use (U _s U _w ) Then, solve state |w>Probability amplification case of (a). In fact, each time a U is executed _s U _w Solution state |w>A portion of the probability is amplified. This step can maximize the solution state |w in the observed result by multiple applications>Is a probability of (2). The specific probability amplification is as follows:

|ψ _t >＝sin[(2t+1)θ]|w>+cos[(2t+1)θ]|s′>

let t times the above operations be applied _t >Representing the qubit state after t applications of the above operations. Then it is required at |psi _t >Is close enough to solution state |w>When the above operation is stopped to be repeatedly applied. When sin [ (2t+1) θ]Approaching 1, i ψ _t >Approach |w>At this time t approachesWhen T is sufficiently large, initial state |s>The amplitude of the solution states is small and when only one solution state is present, the +.>And t satisfies:

if there are R solution states,at this time t satisfies:

and 2, constructing an OQBTT algorithm. The method of this step is described in detail by taking the target sequence subject sequence= 'ACGT' (encoded as 00011011), words= 'a' (encoded as 00) as an example.

And 2.1, initializing. For a subject sequence of length N (i.e., the target sequence) and words of length M (i.e., the field), the encoded binary string lengths are T (t=2n) and P (p=2m), respectively. The register in which the subject sequence binary string is stored is referred to as a target string register (storing target string bits), the register in which the words binary string is stored is referred to as a pattern string register (storing pattern string bits), and the register in which the index string bits are stored is referred to as an index string register. The simplified truth table (see table 2) is extracted according to the cyclic shift corresponding truth table (see table 1).

Table 1 truth table for cyclic shift

Index string	Corresponding sequences in the target string register
		00	00011011
01	01101100
		10	10110001
11	11000110

According to table 1, in the find solution state step of the QGSA algorithm, the first 2 bits (2M) in the target string register and the sequence in the pattern register need to be compared sequentially using the CNOT gate. In fact, the last 6 bits (2N-2M) of the target string register do not participate in subsequent operations. To reduce the qubits in the line, this step may eliminate the last 6 bits in the destination register.

Therefore, the method requires log for building the quantum circuit ₂ N index string bits, 2M target string bits, 2M pattern string bits.

Thus, building a quantum wire in the above example requires a total of 2 index string bits, 2 target string bits, 2 pattern string bits.

I.e. the truth table can be reduced to the form in table 2:

table 2 simplified truth table

Index string	Corresponding sequences in the target string register
		00	00
01	01
		10	10
11	11

According to a simplified truth table (table 2), obtaining an index table, wherein the first column of the index table is different states of index string bits, the second column of the index table is the first 2M quantum bit corresponding states of target string bits corresponding to the index string bits in different states, and here, an index string is introduced, the index string refers to a binary string corresponding to the index string bit state, and the index string can indicate the state of the index string bits; the index table is constructed as shown in fig. 4.

In FIG. 4, the qubit in the index string register is q ₁ 、q ₀ . Qubit q in target string register ₂ 、q ₃ . For each target bit x (i.e., one bit in the target string bit), this step maintains an n×log count table, where each position in the count table has an initial value of 0 and the length of the target sequence is N. The value in the count table indicates the number of quantum gates added in the line corresponding to the target bit x. Fig. 5 shows the target bit q ₂ The initialized state of the correspondence count table.

And 2.2, adding a pointer. A pointer is added to the index string of the count table i. In a preferred embodiment, pointer addition may be in the following manner: if the number of 1 s in the index string is j, then a pointer is added starting from the ith row and jth column of the count table, the pointer points to the same column position in the count table, and the position corresponds to the position of 1 s in the index string, and the position of 1 s in the ith index string must be included. Fig. 6 shows the target bit q ₂ And adding pointers to the corresponding count table.

And 2.3, traversing the bit states of the index strings in the index table and updating the index table.

Step 2.3.1, index, represents the decimal values corresponding to the different states of the first column of index string bits of the index table, called the index. For each target bit x, q _xi When index=i, the state of the target bit x corresponding to the index table is referred to as a target bit value. The index is traversed in turn. The index string bit states 00, 01, 10, 11 have indices of 0,1,2,3, respectively.

Step 2.3.2 if index=0, and q _x0 =1 (here, q for convenience of explanation _x0 Refers to the state that the target bit X corresponds to when the index is 0), and an X gate is applied to the quantum wire to which the target bit X corresponds. If index is not equal to 0, the sum of all columns of the corresponding row in the count table is calculated. Temp=q if sum/2=0 _x0 Otherwise, the device can be used to determine whether the current,(it isMiddle temp is a variable that stores data). Comparing temp to the index corresponding target bit value q _xindex If temp. Q _xindex Then a gate is applied according to the index string. (if only one bit in the index string is 1, the CNOT gate is applied with that bit as the control bit, x as the target bit, and if the bits in the index string are 1, the Toffoli gate is applied with the bit as 1 as the control bit, x as the target bit).

Step 2.3.3, if a gate is added when the index is index, recording the position end of index bit 1 corresponding to index. The counter position pointed to by the pointer starting from the index row end column of the counter is all incremented by 1.

Here in target bits q ₂ For example (line reduction does not involve a pattern string register, only the lines of index string registers and target string registers are given here):

(1) Index string 00:

the index strings are all 0, and the corresponding value q of the target bit x in the truth table is observed _x0 If q _x0 If 1, then an X gate is applied to the target bit. The circuit after this process is completed is shown in fig. 7.

(2) Index string 01:

the process does not involve any parameter updates, so the line is unchanged.

(3) Index string 10:

current temp=0, and q _x2 ＝1。temp≠q _x2 Then a gate is applied according to the index string. The circuit after adding the CNOT gate at this time is shown in FIG. 8.

After adding the gate, the count table needs to be updated, and the current count table is updated as shown in fig. 9. FIG. 9 illustrates the count table update resulting from the add CNOT gate operation after traversing the index string 10. Traversing all indexes in turn according to the method to obtain the index added to q ₂ All gates on the bit. Then for the target bit q ₃ Traversing, finally obtaining the circuit as shown in fig. 10.

Fig. 10 shows the result of simplifying the sub-line using the OQCBTT method in the case of subject sequence= 'ACGT' (encoded as 00011011), words= 'a' (encoded as 00). For multiple target bits, this can be done in one traversal index, only requiring the maintenance of a count table for multiple target bits at the same time.

And step 3, training a QCMM model. The physical bit selected in the step is the 'ancestor' number of the physical machine "

The six qubits Q3, Q9, Q14, Q21, Q15 and Q10 on (ClosedBetaQC) and the topology of the physical bits on the physical machine selected for this step is shown in fig. 11.

And 3.1, modeling quantum circuit mapping. The quantum wire mapping problem is modeled as a classification problem. The essence of the line mapping is that each quantum bit (logical bit) in the logical line corresponds to a bit of a physical machine (physical bit), and it can be said that each logical bit is mapped onto a physical bit. This step treats the physical bits as a class, for m physical bits p, n logical bits l (m>n) mapping. To accomplish line mapping, this step may describe this classification task as f:l _j →p _i Where i=1, …, m. j=1, …, n. f is the mapping from logical bits to physical bits. Tag vector y= [ p ] for each mapped line ₁ ,…,p _m ]. The index of tag y indicates a logical bit whose corresponding value indicates the physical bit to which this logical bit should be mapped. In one embodiment, this step does not take into account various errors in the quantum wires, only the structure of the quantum wires, in order to reduce training parameters and model complexity.

This step uses the following features of the line as training data for the qcm model: the total number of logic bits on the line, the total number of CZ gates on the line, and specific information about the effect of CZ gates. Wherein the specific role information of the CZ gates is represented by an n×n matrix, matrix [ i ] [ j ] =1 indicating that bits i and j have a CZ gate. Based on the above features, this step extracts 38-dimensional data from each logic line in total.

And 3.2, generating data. Quantum wires for training are randomly generated according to the selected physical machine structure. { Ry, rz, CNOT } has been shown to be a common set of gates in quantum computing. Any quantum gate can be extended with this gate set. In one embodiment, for a CNOT gate in a common gate set, one CZ gate and two H gates may be used instead. Physical quantum computers provide methods of using Ry, rz, CZ, H quantum gates. Any quantum wire can be replaced on a physical quantum computer using wires formed by Ry, rz, CZ, H quantum gates. This step assumes that only four gates Ry, rz, CZ, H are present in the logic that needs to be mapped. Since only the double qubit CZ gate will introduce a SWAP operation during the mapping process among the four gate operations described above, this step only considers the structure of the double qubit gate when acquiring training data of the QCMM model. For this purpose this step randomly generates a 6-bit quantum wire containing only CZ gates.

And 3.3, marking data. Two algorithms provided by IBM are used to generate corresponding labels for line data. After generating the line data required for training, this step requires obtaining a corresponding tag for each line. To obtain the best label for each line, this step can be implemented using the well known mapping algorithms in IBM qisket, dense Layout and trivia Layout, etc. This step uses both algorithms to map the logic to the physical machines that have the specified constraints, using the mapping results of the least SWAP gates as labels. The process of marking each randomly generated logic is as follows:

step 3.3.1, calculate the mapping result of the logic circuit using the Dense Layout and Trivia Layout provided by qiskit.

Step 3.3.2, count the number of SWAP gates used in both mapping methods.

Step 3.3.3, selecting the mapping result using the least SWAP gate as the tag.

And 3.4, building a network. In this step, we build a network for training a model based on the neural layout idea. Based on the neural layout network, this step modifies the part of feature encoding, the structure of the neural layout output layer slot, and the structure of the output tag. For the characteristics of the input data, this step only considers the interaction information between the CNOT gates in the logic circuit; for the neural layout output layer, the step modifies the final output result, and the step does not consider the condition that the logic bits are not mapped any more; for the structure of the output label, the value in label y of the neural layout represents a logical bit, the index represents a physical bit, and the model label y value of this step is of opposite significance. Fig. 12 shows a network structure of a training QCMM, which includes an input layer, a shared layer, a Dropout layer, and individual SLOTs.

Fig. 13 shows a specific structure of a SLOT, which includes a dense1 layer, a dense2 layer, and a SLOT layer.

With reference to fig. 12 and 13, in the QCMM network designed in this embodiment, the input of the input layer is 38-dimensional, and 256 dimensions are output; the input layer is connected with the sharing layer, and the sharing layer outputs 1024-dimensional data; the shared layer is connected to a Dropout layer, which is connected to a plurality of SLOT layers. The single SLOT layer comprises a dense1 layer, a dense2 layer and a SLOT layer; the dense1 layer receives the output data of the Dropout layer, and the dense1 layer outputs 256-dimensional data; the dense1 layer is connected with the dense2 layer and outputs 128-dimensional data; the dense2 layer is connected with the slot layer and outputs 6-dimensional data.

Each SLOT structure in the modified neural layout network will generate a set of results, the step selects the category with the highest probability in each SLOT as the classifying result of the SLOT _i =j means that the ith logical bit should be mapped onto the jth physical bit. The classification results of all SLOTs together form an output result y. However, according to the above method, the same element may exist in y, which may cause two logical bits to be mapped onto the same physical bit, thereby generating a collision. In order to solve the collision problem, a collision processing layer is provided in this embodiment, and in order to facilitate calculation, the physical bits used in this step are encoded, and the structure after encoding is shown in fig. 14.

And 3.5, conflict processing. Conflicts generated by different SLOT structures are processed to obtain a mapping result.

Step 3.5.1, recording the number num of SLOTs, and storing the output result of each SLOT into a sList list, wherein sList= [ pList ₀ ,…,pList _num-1 ]Wherein pList _i Representing the result of the i+1st SLOT structure output; flag= [ flag ] ₀ ,…,flag _num-1 ]Wherein the flag is _i =1 indicates that the physical bit corresponding to the i+1th SLOT structure is determined, flag _i =0 represents uncertainty; y= [ y ] ₀ ,…,y _num-1 ]Wherein y is _i Representing the physical bit corresponding to the i+1th SLOT structure, each position of the initial y is set to-1; the initial oknum=0, and the value of oknum represents the number of SLOT structures for which the corresponding physical bit has been determined.

Step 3.5.2, if oknum=num, outputting a result y, otherwise, executing the next step.

Step 3.5.3, initialize an empty list, ntemp. Traversing sLists in turn, finding each pList therein _i Maximum probability pmax of (2) _i The method comprises the steps of carrying out a first treatment on the surface of the If a flag is _i Not equal to 1, pmax is to be _i Add to the list of ntemp, otherwise add-1 to the end of ntemp.

Step 3.5.4, ntempMax, represents the maximum value in ntemp, and ntempIndex represents the index of ntempMax in ntemp. yvalue represents a variable of the temporary data. The value of yvalue is set to be the value of the nterpMax in the pList _ntempIndex Is included in the index of (a).

Step 3.5.5 if yvalue is not in output result y, then y _ntempIndex The value of (2) is set to yvalue, and the flag is set _ntempIndex Setting 1, and adding 1 to the value of oknum; otherwise, pList is to be formed _ntempIndex [yvalue]The value of (2) is set to-1. Returning to step 3.5.2.

In this embodiment, we take the subject sequence= 'ACGT' (coded as 00011011), words= 'a' (coded as 00) as an example, and show the corresponding results of each part, so as to further illustrate the scheme of the present invention.

First portion QGSA:

the quantum gate used in this scheme is shown in fig. 15. In this embodiment, a quantum circuit is set up for the above-mentioned research case. The circuit diagram is shown in fig. 16.

Based on this case, the invention runs the QGSA algorithm on the virtual machine and compares with classical BLAST, and for the second step of classical BLAST, the time overhead mainly consists of two parts of database creation and database searching. In this embodiment, only the step of creating the database is considered, and the comparison of time consumption is shown in fig. 17.

In fig. 17, the horizontal axis corresponds to the number of runs, and the vertical axis corresponds to the time of each run. Wherein the dashed line corresponds to the run time of the second step of the classical BLAST algorithm and the solid line corresponds to the run time of the QGSA method.

As can be seen from fig. 17, 18, the QGSA method operates significantly more efficiently than the classical BLAST algorithm, where fig. 18 demonstrates that QGSA performs faster than the classical BLAST algorithm in statistical significance.

Second part OQCBTT:

the invention simplifies the quantum circuit diagram of the research case by using OQCBTT. Fig. 19 shows quantum wires that were simplified without using OQCBTT (part a in fig. 19) and after using OQCBTT (part b in fig. 19).

Part a of fig. 20 shows the number of quantum gates that simplify the front and back lines. Because the number of quantum gates in the line after the OQBTT is simplified is far lower than the number before the simplification, the method indicates that the OQBTT can remarkably reduce the calculation error of the quantum line corresponding to the current case on a physical machine. Since only logN qubits are needed to build a quantum line for a target sequence of length N, the OQCBTT algorithm can reduce the spatial complexity of the QGSA algorithm from O (N) to O (logN) (part b in fig. 20, solid lines represent reduced).

Finally, the present invention compares the time complexity of the OQCBTT method and the ZX-calculous method, and fig. 21 shows the time complexity of both reduction methods, where the horizontal axis represents the number of qubits in the line and the vertical axis represents the computation time of the algorithm. Fig. 21 shows that the time complexity of the OQCBTT method is much lower than the ZX-calcul method.

Third portion QCMM:

in this embodiment, a total of 5000 random 6-bit quantum wires are generated as a data set. Where the ratio of training set, test set and validation set is set to 0.8:0.1:0.1. Among 500 cases of test data, 364 cases of test data are obtained by introducing SWAP number less than or equal to the Dense method by the QCMM method, and the test data account for 72.8% of the total test data; 365 cases of SWAP number less than or equal to the result of the Trival method account for 73% of the total test data.

The logic circuit (part b in fig. 19) of the above study case is encoded and then sent to QCMM, and the output result is y= [2,5,4,1,0,3], that is, the qubits [0,1,2,3,4,5] of the logic circuit in part b in fig. 19 correspond to the bits [2,5,4,1,0,3] in the quantum computer, respectively. After the line mapping is completed and operated, the result is shown in fig. 22.

As can be seen from fig. 22, the hit probability of the correct matching position 00 does not reach 1, but is significantly higher than other positions due to the influence of noise. Thus, QCMM successfully maps logic onto quantum computers.

And as can be seen from part a of fig. 23, after QCMM quantum line mapping, only 1 SWAP operation is used in the quantum line (x-x in the figure indicates SWAP operation), while 2 SWAP operations are used in the line mapped in a general manner (e.g. Qiskit Dense Layout manner) (shown in part b of fig. 23). This demonstrates that QCMM effectively reduces SWAP operands in current study cases, significantly improving the accuracy of the final calculated results.

Any process or method descriptions herein in other ways may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of preferred embodiments of the present invention includes additional implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present embodiment. The processor performs the various methods and processes described above. For example, method embodiments in the present solution may be implemented as a software program tangibly embodied on a machine-readable medium, such as a memory. In some embodiments, part or all of the software program may be loaded and/or installed via memory and/or a communication interface. One or more of the steps of the methods described above may be performed when a software program is loaded into memory and executed by a processor. Alternatively, in other embodiments, the processor may be configured to perform one of the methods described above in any other suitable manner (e.g., by means of firmware).

The logic and/or steps described elsewhere herein may be embodied in any readable storage medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions.

The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present invention should be included in the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims

1. A quantum technology-based gene search BLAST acceleration method, comprising:

s1, performing quantum bit coding on a target sequence and a field, and constructing an initial state, wherein the initial state comprises index string bits, target string bits and mode string bits; performing cyclic displacement operation, and converting the target string bits into superposition of states according to the index string bits; searching for a correct solution state, and amplifying the amplitude of the solution state;

S2, obtaining a truth table based on the corresponding relation between the index string bit state and the target string bit state in the S1 cyclic shift operation, extracting a simplified truth table, and obtaining the target string bit, the mode string bit and the index string bit according to the simplified truth table; traversing the index string and updating the count table to complete the construction of the quantum circuit, thereby obtaining a simplified quantum circuit; each target bit maintains a count table, and the target bit refers to one bit in target string bits; the index string refers to a binary string corresponding to the bit state of the index string;

2. The method according to claim 1, wherein in S1, the target sequence and the field are qubit encoded, and in the initial state of construction, the superposition state is constructed with H gate for the index string bit, and the quantum state is constructed with X gate for the target string bit and the mode string bit.

3. The method of claim 2, wherein after the initial state is constructed, the qubit state formula is:

4. The method according to claim 1, wherein in S1, the cyclic shift operation is specifically: the target string bit is shifted by utilizing the superposition state of the index string bit, and the state of the index string bit indicates the number of shifted bits and corresponds to different target string bit states; the state of the qubit after the shift operation is finished is:

5. The method of claim 4, wherein in S1, the method of finding the correct solution state is: the CNOT gate is used for realizing the XOR operation so as to identify the correct solution state |w > in the uniform superposition state |s >, and the state of the quantum bit after the operation is finished is as follows:

then pass through U _w Is |s>In the correct solution state |w>Adding negative phase, U _w The following are provided:

6. The method of claim 5, wherein for state U _w |s>By U _s Amplitude amplification is performed, wherein:

U _s ＝2|s><s| -I (with respect to state |s>Is a reflection of (d) to state U _w |s>；

I represents a unit gate.

7. The method according to claim 1, wherein the step S2 further comprises:

8. The method of claim 7, wherein S2 further comprises: obtaining an index table according to a simplified truth table, wherein the first column of the index table is in different states of index string bits, and the second column of the index table is in the first P quantum bit corresponding states of target string bits corresponding to the index string bits in different states, wherein P represents the length of mode string bits;

traversing an index table;

index represents decimal values corresponding to different states of the first column index string bit of the index table; x represents a target bit; q _xi When index=i, the state corresponding to the target bit x in the index table is called a target bit value; if index in index table is 0 and corresponds to q _x0 If the bit is 1, an X gate is applied to the quantum circuit corresponding to the target bit X; if index is not 0, calculating sum of all columns of the corresponding row of the row index in the count table of the target bit x; temp=q if sum/2=0 _x0 Otherwise, the device can be used to determine whether the current,comparing temp to target bit value q _xindex If temp. Q _xindex Applying a gate according to the index string; wherein temp is a variable for temporarily storing data;

if one gate is added when index=i, the position end whose corresponding index string bit is 1 is recorded, and the count table positions pointed to by the pointers starting from the i+1th row end column of the count table of the target bit x are all incremented by 1.

9. The method of claim 1, wherein the QCMM model structure comprises: an input layer, a shared layer, a Dropout layer, and a plurality of SLOT structures;

10. The method of claim 9, wherein S3 further comprises processing conflicts between individual SLOT structures:

s3-1, recording the number num of SLOT structures, and storing the output result of each SLOT structure into a sList list, wherein sList= [ pList ₀ ,…,pList _num-1 ]Wherein pList _i Representing the result of the i+1st SLOT structure output; flag= [ flag ] ₀ ,…,flag _num-1 ]Wherein the flag is _i =1 indicates that the physical bit corresponding to the i+1th SLOT structure is determined, flag _i =0 represents uncertainty; y= [ y ] ₀ ,…,y _num-1 ]Wherein y is _i Representing the physical bit corresponding to the i+1th SLOT structure, each position of the initial y is set to-1; initial oknum=0, the value of oknum representing the number of SLOT structures for which the corresponding physical bit has been determined;

s3-2, outputting a result y if oknum=num, otherwise executing S3-3;

s3-3, initializing an empty list ntmp; traversing sList list in turn to find pList _i Maximum probability pmax of (2) _i The method comprises the steps of carrying out a first treatment on the surface of the If a flag is _i Not equal to 1, pmax is to be _i Adding to the list of ntemp, otherwise adding-1 to the end of ntemp;

s3-4, searching the maximum value of the ntempMax in the ntemp, and indexing the ntempMax in the ntemp, setting the value of yvalue to be the value of the ntempMax in pList _ntempIndex Index of (a);

s3-5, if yvalue is not in the output result y, y will be _ntempIndex The value of (2) is set to yvalue, and the flag is set _ntempIndex Setting 1, and adding 1 to the value of oknum; otherwise, pList is to be formed _ntempIndex [yvalue]The value of (1) is set to-1 and S3-2 is returned.

11. A quantum technology-based gene search BLAST acceleration system, comprising: