Hairpin Loop Method For Double Strand Polynucleotide Sequencing Using Transmembrane Pores

Abstract: The invention relates to a new method of sequencing a double stranded target polynucleotide. The two strands of the double stranded target polynucleotide are linked by a bridging moiety. The two strands of the target polynucleotide are separated using a polynucleotide binding protein and the target polynucleotide is sequenced using a transmembrane pore.


Patent Information

Application #

Filing Date

10 January 2014

Publication Number

23/2015

Publication Type

INA

Invention Field

MICRO BIOLOGY

Status

Email

sna@sna-ip.com

Parent Application

Patent Number

Legal Status

Grant Date

2019-11-15

Renewal Date

Applicants

OXFORD NANOPORE TECHNOLOGIES LIMITED

Edmund Cartwright House 4 Robert Robinson Avenue Oxford Science Park Oxford Oxfordshire OX4 4GA

Inventors

1. BROWN Clive

Oxford Nanopore Technologies Limited Edmund Cartwright House 4 Robert Robinson Avenue Oxford Science Park Oxford Oxfordshire OX4 4GA

2. CLARKE James

Oxford Nanopore Technologies Limited Edmund Cartwright House 4 Robert Robinson Avenue Oxford Science Park Oxford Oxfordshire OX4 4GA

3. HALL Graham

Oxford Nanopore Technologies Limited Edmund Cartwright House 4 Robert Robinson Avenue Oxford Science Park Oxford Oxfordshire OX4 4GA

4. HARPER Gavin

Oxford Nanopore Technologies Limited Edmund Cartwright House 4 Robert Robinson Avenue Oxford Science Park Oxford Oxfordshire OX4 4GA

5. HERON Andrew

Oxford Nanopore Technologies Limited Edmund Cartwright House 4 Robert Robinson Avenue Oxford Science Park Oxford Oxfordshire OX4 4GA

6. WHITE James

Oxford Nanopore Technologies Limited Edmund Cartwright House 4 Robert Robinson Avenue Oxford Science Park Oxford Oxfordshire OX4 4GA

Specification

Field of the invention
The invention relates to a new method of sequencing a double stranded target
polynucleotide. The two strands of the target polynucleotide are linked by a bridging moiety.
The two strands of the target polynucleotide are separated by a polynucleotide binding protein.
Sequencing of the target polynucleotide is carried out using a transmembrane pore.
Background of the invention
There is currently a need for rapid and cheap nucleic acid (e.g. DNA or RNA) sequencing
technologies across a wide range of applications. Existing technologies are slow and expensive
mainly because they rely on amplification techniques to produce large volumes of nucleic acid
and require a high quantity of specialist fluorescent chemicals for signal detection.
Transmembrane pores (nanopores) have great potential as direct, electrical biosensors for
polymers and a variety of small molecules. In particular, recent focus has been given to
nanopores as a potential DNA sequencing technology.
When a potential is applied across a nanopore, there is a drop in the current flow when an
analyte, such as a nucleotide, resides transiently in the barrel for a certain period of time.
Nanopore detection of the nucleotide gives a current blockade of known signature and duration.
The concentration of a nucleotide can then be determined by the number of blockade events
(where an event is the translocation of an analyte through the nanopore) per unit time to a single
pore.
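The relationship between blockade frequency and concentration can be illustrated with a minimal sketch. It assumes a simple linear calibration (event rate = capture constant x concentration); the capture constant, its units and the event times are hypothetical values chosen for illustration and do not come from the specification.

```python
def blockade_event_rate(event_times_s, window_s):
    """Return blockade events per second observed on a single pore."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    # Count only events that fall inside the observation window.
    n_events = sum(1 for t in event_times_s if 0 <= t < window_s)
    return n_events / window_s


def estimate_concentration(rate_hz, capture_constant_hz_per_um):
    """Invert the assumed linear calibration: rate = k * concentration."""
    if capture_constant_hz_per_um <= 0:
        raise ValueError("capture constant must be positive")
    return rate_hz / capture_constant_hz_per_um


# Hypothetical event times (seconds) recorded over a 10 s window.
times = [0.4, 1.1, 2.7, 3.0, 4.9, 6.2, 7.5, 9.8]
rate = blockade_event_rate(times, 10.0)   # 8 events in 10 s
conc = estimate_concentration(rate, 0.2)  # assumed k = 0.2 events/s per uM
```

In practice the calibration constant would have to be measured for the particular pore, analyte and applied potential.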
In the "Strand Sequencing" method, a single polynucleotide strand is passed through the
pore and the nucleotides are directly identified. Strand Sequencing can involve the use of a
nucleotide handling enzyme, such as Phi29 DNA polymerase, to control the movement of the
polynucleotide through the pore. Nanopore sequencing, using enzymes to control the
translocation of dsDNA through the nanopore, has in the past focused on only reading one strand
of a dsDNA construct. When the enzyme is used as polymerase, the portion to be sequenced is
single stranded. This is fed through the nanopore and the addition of dNTPs at a primer/template
junction on top of the strand pulls the single stranded portion through the nanopore in a
controlled fashion. The majority of the published literature uses this approach to control strand
movement (Lieberman et al. (2010) "Processive Replication of Single DNA Molecules in a
Nanopore Catalyzed by phi29 DNA Polymerase" J. Am. Chem. Soc. 132(50): 17961-17972). In
the polymerase mode, the complementary strand cannot be sequenced. When the enzyme is used
as a double stranded exonuclease as published (Lieberman et al. (2010) supra), the unzipping of
the complementary strand is accompanied by the digestion of this strand. It is therefore not
possible to sequence the complementary strand with this approach. The complementary strand
cannot therefore be captured and sequenced by the nanopore. Hence, only half of the DNA
information in dsDNA is sequenced.
In more detail, when both polymerase and exonuclease activity are inhibited (by running
without triphosphate bases and with an excess of EDTA), enzymes such as Phi29 DNA
polymerase have been shown to unzip dsDNA when pulled through a nanopore by a strong
applied field (Fig. 1) (Lieberman et al. (2010) supra). This has been termed unzipping mode.
Unzipping mode implies that the dsDNA is unzipped above or through the enzyme and,
importantly, that a force is required to disrupt the interactions of both strands with the
enzyme and to overcome the hydrogen bonds between the hybridised strands. In the past the
second complementary strand was considered to be essential for efficient enzyme binding. In
addition, it was thought that the requisite force required to unzip the strand above or in the
enzyme was a dominant braking effect slowing DNA through the pore. Herein we describe how
enzymes such as Phi29 DNA polymerase can act as a molecular brake for ssDNA, enabling
sufficient controlled movement through a nanopore for sequencing around the hairpin turns of
specially designed dsDNA constructs to sequence both the sense and anti-sense strands of
dsDNA (Fig. 2). Unzipping mode has in the past predominantly been performed using templates
where the distal part of the analyte is blunt ended (Fig. 1). Small hairpins have occasionally
been used, but were only included to simplify DNA design. Previous work has not considered
the use of hairpins on long dsDNA to provide the ability to read both strands. This is because the
unzipping movement model has not considered Phi29 DNA polymerase or related enzymes
capable of controlling the movement of the DNA when entering ssDNA regions (i.e. when
moving around the hairpin and along the anti-sense strand - Fig. 2).
Summary of the invention
The inventors have surprisingly demonstrated that both strands of a double stranded
target polynucleotide can be sequenced by a nanopore when the two strands are linked by a
bridging moiety and then separated. Furthermore, the inventors have also surprisingly shown
that an enzyme, such as Phi29 DNA polymerase, is capable of separating the two strands of a
double stranded polynucleotide, such as DNA, linked by a bridging moiety and controlling the
movement of the resulting single stranded polynucleotide through the transmembrane pore.
The ability to sequence both strands of a double stranded polynucleotide by linking the
two strands with a bridging moiety has a number of advantages, not least that both the sense and
anti-sense strands of the polynucleotide can be sequenced. These advantages are discussed in
more detail below.
Accordingly, the invention provides a method of sequencing a double stranded target
polynucleotide, comprising:
(a) providing a construct comprising the target polynucleotide, wherein the two
strands of the target polynucleotide are linked at or near one end of the target
polynucleotide by a bridging moiety;
(b) separating the two strands of the target polynucleotide to provide a single
stranded polynucleotide comprising one strand of the target polynucleotide
linked to the other strand of the target polynucleotide by the bridging moiety;
(c) moving the single stranded polynucleotide through a transmembrane pore such
that a proportion of the nucleotides in the single stranded polynucleotide
interact with the pore; and
(d) measuring the current passing through the pore during each interaction and
thereby determining or estimating the sequence of the target polynucleotide,
wherein the separating in step (b) comprises contacting the construct with a
polynucleotide binding protein which separates the two strands of the target
polynucleotide.
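As an illustration of steps (a) and (b), the single stranded polynucleotide that results from separating a hairpin-linked construct can be modelled as one continuous sequence. The sketch below is a simplified model, assuming plain DNA letters, a bridging moiety represented as a short sequence, and an optional leader; the function names and example sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the 5'->3' sequence of the strand complementary to seq."""
    return seq.translate(COMPLEMENT)[::-1]


def linearised_construct(sense, hairpin, leader=""):
    """Model the single strand obtained once the two strands are separated.

    Read 5'->3': leader, sense strand, bridging moiety (hairpin),
    then the anti-sense strand, which is the reverse complement of the sense.
    """
    return leader + sense + hairpin + reverse_complement(sense)


# Hypothetical 4-base target with a TTTT hairpin and a CC leader.
strand = linearised_construct("ATGC", "TTTT", leader="CC")
```

Both strands of the target therefore appear in a single pass through the pore, separated by the bridging moiety.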
The invention also provides:
a kit for preparing a double stranded target polynucleotide for sequencing comprising
(a) a bridging moiety capable of linking the two strands of the target polynucleotide
at or near one end and (b) at least one polymer;
a method of preparing a double stranded target polynucleotide for sequencing,
comprising:
(a) linking the two strands of the target polynucleotide at or near one end with a
bridging moiety; and
(b) attaching one polymer to one strand at the other end of the target polynucleotide
and thereby forming a construct that allows the target polynucleotide to be sequenced
using a transmembrane pore;
a method of sequencing a double stranded target polynucleotide, comprising:
(a) providing a construct comprising the target polynucleotide, wherein the two
strands of the target polynucleotide are linked at or near one end of the target
polynucleotide by a bridging moiety;
(b) separating the two strands of the target polynucleotide to provide a single
stranded polynucleotide comprising one strand of the target polynucleotide linked to
the other strand of the target polynucleotide by the bridging moiety;
(c) synthesising a complement of the single stranded polynucleotide, such that the
single stranded polynucleotide and complement form a double stranded
polynucleotide;
(d) linking the two strands of the double stranded polynucleotide at or near one end
of the double stranded polynucleotide using a bridging moiety;
(e) separating the two strands of the double stranded polynucleotide to provide a
further single stranded polynucleotide comprising the original single stranded
polynucleotide linked to the complement by the bridging moiety;
(f) moving the complement through a transmembrane pore such that a proportion of
the nucleotides in the complement interact with the pore; and
(g) measuring the current passing through the pore during each interaction and
thereby determining or estimating the sequence of the target polynucleotide,
wherein the separating in step (e) comprises contacting the construct with a
polynucleotide binding protein which separates the two strands of the target
polynucleotide;
an apparatus for sequencing a double stranded target polynucleotide, comprising: (a)
a membrane; (b) a plurality of transmembrane pores in the membrane; (c) a plurality
of polynucleotide binding proteins which are capable of separating the two strands of
the target polynucleotide; and (d) instructions for carrying out the method of the
invention; and
an apparatus for sequencing a double stranded target polynucleotide, comprising: (a)
a membrane; (b) a plurality of transmembrane pores in the membrane; and (c) a
plurality of polynucleotide binding proteins which are capable of separating the two
strands of the target polynucleotide, wherein the apparatus is set up to carry out the
method of the invention.
Description of the Figures
Fig. 1 shows a schematic of enzyme controlled dsDNA and ssDNA translocation through
a nanopore. An enzyme (e.g. Phi29 DNA polymerase) that is incubated with dsDNA having an
ssDNA leader binds at the ssDNA-dsDNA interface. DNA-enzyme complexes are captured by a
nanopore under an applied field. Under the field, the template strand of the DNA is slowly
stripped through the enzyme in a controlled base-by-base manner, in the process unzipping the
complementary primer strand of the dsDNA in or above the enzyme. Once the enzyme reaches
the end of the dsDNA it falls off the DNA, releasing it through the nanopore.
Fig. 2 shows another schematic of enzyme controlled dsDNA and ssDNA translocation
through a nanopore. The dsDNA has a hairpin turn linking the sense and anti-sense strands of the
dsDNA. Once the enzyme reaches the hairpin it remains bound to the DNA, proceeds around the
hairpin turn, and along the anti-sense strand. In the hairpin and antisense regions the enzyme
functions as an ssDNA molecular brake, continuing to sufficiently control translocation of the
DNA through the nanopore to sequence the DNA.
Fig. 3 shows a schematic overview of reading around a hairpin of dsDNA using the
ability of the enzyme to control movement in ssDNA regions. The dsDNA has a 5'-ssDNA
leader to allow capture by the nanopore. This is followed by a dsDNA section, where the sense
and anti-sense strands are connected by a hairpin. The hairpin can optionally contain markers
(e.g. abasic residues, shown in Fig. 3 as a cross) that are observed during sequencing, which
permit simple identification of the sense and anti-sense strands during sequencing. The 3'-end of
the anti-sense strand can optionally also have a 3'-ssDNA overhang, which if greater than ~20
bases allows full reading of the anti-sense strand (the read-head of the nanopore is ~20 bases
downstrand from the top of the DNA in the enzyme).
Fig. 4 shows a schematic of the DNA-Enzyme-nanopore complex (left) sequenced in
unzipping mode through MspA nanopores using Phi29 DNA polymerase, and the consensus
sequence obtained from them (right). Section 1 marks the sense section of DNA, and section 2
marks the anti-sense section. This figure shows DNA sequencing of a short dsDNA construct. In
this construct the dsDNA section is not connected by a hairpin, so the enzyme falls off the end of
the DNA, and only the template/sense strand is sequenced (except for the last ~20 bases).
Fig. 5 shows a schematic of the DNA-Enzyme-nanopore complex (left) sequenced in
unzipping mode through MspA nanopores using Phi29 DNA polymerase, and the consensus
sequence obtained from them (right). Section 1 marks the sense section of DNA, and section 2
the anti-sense section. This figure shows DNA sequencing of a short dsDNA construct with a hairpin. In this
construct the enzyme moves along the sense strand, around the hairpin loop, and down the anti-sense
strand, permitting sequencing of both the sense and the first part of the anti-sense strand.
Fig. 6 shows a schematic of the DNA-Enzyme-nanopore complex (left) sequenced in
unzipping mode through MspA nanopores using Phi29 DNA polymerase, and the consensus
sequence obtained from them (right). Section 1 marks the sense section of DNA, and section 2
the anti-sense section. Similar to Fig. 5, this construct permits sequencing of both the sense and
anti-sense strands, but the additional 3'-ssDNA overhang permits reading of the full length of the
anti-sense strand before the enzyme falls off the end of the DNA.
Fig. 7 shows a schematic of the DNA-Enzyme-nanopore complex (left) sequenced in
unzipping mode through MspA nanopores using Phi29 DNA polymerase, and the consensus
sequence obtained from them (right). Section 1 marks the sense section of DNA, and section 2
the anti-sense section. Similar to Fig. 5, this construct permits sequencing of both the sense and
anti-sense strands, however, this construct has a single abasic residue (shown as a cross) in the
hairpin, which provides a clear marker in the DNA sequence to identify the sense and anti-sense
sections.
Fig. 8 shows the consensus DNA sequence of UA02 through MspA. Section 1 marks the
homopolymeric 5'-overhang initially in the nanopore. Section 2 marks the sense section of the
DNA strand. Section 3 marks the turn. Section 4 marks the anti-sense region of the DNA strand.
The polynucleotide sequence that corresponds to each section is shown below that section
number.
Fig. 9 shows a schematic of a genomic template for unzipping through MspA nanopores
using Phi29 DNA polymerase. It shows a general design outline for creating dsDNA suitable for
reading around hairpins. The constructs have a leader sequence with optional marker (e.g. abasic
DNA) for capture in the nanopore, and hairpin with optional marker, and a tail for extended
reading into anti-sense strand with optional marker.
Fig. 10 shows a schematic of the adapter design for ligating ssDNA overhangs (left) and
hairpin turns (right) onto genomic dsDNA. X = abasic DNA. Chol = cholesterol-TEG DNA
modification.
Fig. 11 shows typical polymerase controlled DNA movement of a 400mer-hairpin
through MspA using Phi29 DNA polymerase. Sense region = abasic 1 to 2. Anti-sense region =
abasic 2 to 3.
Fig. 12 shows a consensus DNA sequence profile from multiple polymerase controlled
DNA movements of a 400mer-hairpin through MspA. Sense region = abasic 1 to 2. Anti-sense
region = abasic 2 to 3.
Fig. 13 shows a schematic of an alternative sample preparation for sequencing. A
construct is illustrated comprising the target polynucleotide and a bridging moiety (hairpin)
linking the two strands of the target polynucleotide. The construct also comprises a leader
polymer (a single stranded sequence), a tail polymer (also a single stranded sequence) and an
abasic marker region within the leader. The marker may prevent the enzyme from making the
template completely blunt ended i.e. filling in opposite the required leader ssDNA. A strand
displacing polymerase (nucleic acid binding protein) separates the two strands of the construct,
initiating either via a complementary primer or by protein primed amplification from the tail
polymer. A complement is generated to the resulting single stranded polynucleotide. The
complement and the original sense and antisense single stranded polynucleotide analyte can be
further modified by addition of a second bridging moiety (hairpin).
Fig. 14 shows a specific preparation of the construct comprising the target
polynucleotide.
Fig. 15 shows where amplification may be added as part of the sample preparation to aid
the detection of epigenetic information. A nucleotide has been constructed so that the following
information is read through the pore: sense (original), antisense (original), bridging moiety, sense
(replicate), antisense (replicate). Information on the methylated base (mC) is therefore obtained
four times.
Fig. 16 shows how RNA can be sequenced. A bridging moiety is attached to a piece of
RNA and the DNA reverse complement added to the RNA via a reverse transcriptase. The RNA
is read, followed by the DNA of the reverse complement.
Fig. 17 shows a schematic of helicase controlled dsDNA and ssDNA translocation
through a nanopore. The dsDNA has a hairpin turn linking the sense and anti-sense strands of the
dsDNA. Once the enzyme reaches the hairpin it remains bound to the DNA, proceeds around the
hairpin turn, and along the anti-sense strand. In the hairpin and antisense regions the enzyme
functions as an ssDNA molecular brake, continuing to sufficiently control translocation of the
DNA through the nanopore to sequence the DNA.
Fig. 18 shows the polynucleotide MONO hairpin construct (SEQ ID NOs: 29 to 35) used
in Example 4.
Fig. 19 shows a typical helicase controlled DNA movement of a 400 bp hairpin (SEQ ID
NOs: 29 to 35 connected as shown in Fig. 18) through an MspA nanopore
(MS(B1-G75S-G77S-L88N-Q126R)8 MspA (SEQ ID NO: 2 with the mutations G75S/G77S/L88N/Q126R)). Sense =
region 1. Anti-sense = region 2.
Fig. 20 shows the beginning of a typical helicase controlled DNA movement of a 400 bp
hairpin (SEQ ID NOs: 29 to 35 connected as shown in Fig. 18) through an MspA nanopore
(MS(B1-G75S-G77S-L88N-Q126R)8 MspA (SEQ ID NO: 2 with the mutations
G75S/G77S/L88N/Q126R)). The polyT region at the beginning of the sequence is highlighted
with a * and the abasic DNA bases as a #.
Fig. 21 shows the transition between the sense and antisense regions of a typical helicase
controlled DNA movement of a 400 bp-hairpin (SEQ ID NOs: 29 to 35 connected as shown in
Fig. 18) through an MspA nanopore (MS(B1-G75S-G77S-L88N-Q126R)8 MspA (SEQ ID NO:
2 with the mutations G75S/G77S/L88N/Q126R)). The transition region between the sense and
antisense regions of the sequence is highlighted by a *, the sense region labeled 1 and the
antisense region labeled 2.
Fig. 22 shows an example sample prep method for forming DUO hairpin constructs. The
double stranded DNA analyte is contacted by and modified to contain a Y-shaped adapter (the
sense strand (SEQ ID NO: 29 attached to SEQ ID NO: 30 via four abasic DNA bases) of this
adaptor contains the 5' leader, a sequence that is complementary to the tether (SEQ ID NO: 35,
which at the 3' end of the sequence has six iSp18 spacers attached to two thymine residues and a
3' cholesterol TEG) and 4 abasics and the antisense half of the adaptor contains a 3' hairpin
(SEQ ID NO: 31)) at one end of the duplex and a hairpin (SEQ ID NO: 32) at the other. The Y-shaped
adapter itself also carries a 3'-hairpin (SEQ ID NO: 31), which allows extension either by
a polymerase or by ligation. This extension is preferentially carried out by a mesophilic
polymerase that has strand displacement activity. As the polymerase extends from the 3' of the
Y-shaped adapter hairpin (SEQ ID NO: 31) it copies the antisense strand (SEQ ID NO: 34) and
so displaces the original sense strand (SEQ ID NO: 33). When the polymerase reaches the end of
the antisense strand (SEQ ID NO: 34) it fills-in opposite the hairpin (SEQ ID NO: 32) and then
begins to fill-in opposite the now single stranded and original sense strand (SEQ ID NO: 33).
Extension is then halted by a section of abasic or spacer modifications (other possible
modifications which could halt enzyme extension include RNA, PNA or morpholino bases and
iso-dC or iso-dG) to leave the 5'-region of the Y-shaped adapter single stranded (SEQ ID NO:
29).
Fig. 23 shows the specific preparation method used in Example 5 for preparing a DUO
hairpin construct (SEQ ID NOs: 29 to 36 connected as shown in Fig. 25). A ~400 bp region of
PhiX 174 was PCR amplified with primers containing SacI and KpnI restriction sites (SEQ ID
NOs: 27 and 28 respectively). Purified PCR product was then SacI and KpnI digested before a Y-shaped
adapter (sense strand sequence (SEQ ID NO: 29 attached to SEQ ID NO: 30 via four
abasic DNA bases) is ligated onto the 5' end of SEQ ID NO: 33 and the anti-sense strand (SEQ
ID NO: 31) is ligated onto the 3' end of the SEQ ID NO: 34) and a hairpin (SEQ ID NO: 32,
used to join SEQ ID NO's: 33 and 34) were ligated to either end, using T4 DNA ligase (See Fig.
18 for final DNA construct). The doubly ligated product was PAGE purified before addition of
Klenow DNA polymerase, SSB and nucleotides to allow extension from the Y-shaped adapter
hairpin (SEQ ID NO: 31). To screen for successful DUO product a series of mismatch restriction
sites were incorporated into the adapter sequences, whereby the enzyme will cut the analyte only
if the restriction site has been successfully replicated by the DUO extension process.
Fig. 24 shows that the adapter modified analyte (MONO, SEQ ID NOs: 29-35 connected
as shown in Fig. 18) in the absence of polymerase does not digest with the restriction enzymes
(see gel on the left, Key: M = MfeI, A = AgeI, X = XmaI, N = NgoMIV, B = BspEI), due to the
fact they are mismatched to one another. However, on incubation with polymerase there is a
noticeable size shift and the shifted product (DUO, SEQ ID NOs: 29-36 connected as shown in
Fig. 25) now digests as expected with each of the restriction enzymes (see gel on the right, Key:
M = MfeI, A = AgeI, X = XmaI, N = NgoMIV, B = BspEI).
Fig. 25 shows the polynucleotide DUO hairpin construct (SEQ ID NOs: 29 to 36) used in
Examples 6.
Fig. 26 shows two typical helicase controlled DNA movements for the DUO hairpin
construct (SEQ ID NOs: 29 to 36 connected as shown in Fig. 25) through an MspA nanopore
(MS(B1-G75S-G77S-L88N-Q126R)8 MspA (SEQ ID NO: 2 with the mutations
G75S/G77S/L88N/Q126R)). Sense original = region 1. Anti-sense original = region 2. Sense
replicate = region 3. Anti-sense replicate = region 4.
Fig. 27 shows an expanded view of a typical helicase controlled DNA movement for the
DUO hairpin construct (SEQ ID NOs: 29 to 36 connected as shown in Fig. 25) through an MspA
nanopore (MS(B1-G75S-G77S-L88N-Q126R)8 MspA (SEQ ID NO: 2 with the mutations
G75S/G77S/L88N/Q126R)). Sense original = region 1. Anti-sense original = region 2. Sense
replicate = region 3. Anti-sense replicate = region 4.
Fig. 28 shows an expanded view of a typical transition between the sense original and
antisense original regions of the DUO hairpin construct (SEQ ID NOs: 29 to 36 connected as
shown in Fig. 25) when under helicase controlled DNA movement through an MspA nanopore
(MS(B1-G75S-G77S-L88N-Q126R)8 MspA (SEQ ID NO: 2 with the mutations
G75S/G77S/L88N/Q126R)). Sense original = region 1. Anti-sense original = region 2.
Description of the Sequence Listing
SEQ ID NO: 1 shows the codon optimised polynucleotide sequence encoding the NNN-RRK
mutant MspA monomer.
SEQ ID NO: 2 (also referred to as "B1") shows the amino acid sequence of the mature
form of the NNN-RRK mutant of the MspA monomer. The mutant lacks the signal sequence
and includes the following mutations: D90N, D91N, D93N, D118R, D134R and E139K. These
mutations allow DNA transition through the MspA pore.
SEQ ID NO: 3 shows the polynucleotide sequence encoding one subunit of α-hemolysin-E111N/K147N
(α-HL-NN; Stoddart et al., PNAS, 2009; 106(19): 7702-7707).
SEQ ID NO: 4 shows the amino acid sequence of one subunit of α-HL-NN.
SEQ ID NO: 5 shows the codon optimised polynucleotide sequence encoding the Phi29
DNA polymerase.
SEQ ID NO: 6 shows the amino acid sequence of the Phi29 DNA polymerase.
SEQ ID NOs: 7 to 9 show the amino acid sequences of the mature forms of the MspB, C
and D mutants respectively. The mature forms lack the signal sequence.
SEQ ID NOs: 10 to 15 show the sequences used to illustrate homopolymer reads.
SEQ ID NOs: 16 to 36 show the sequences used in the Examples.
Detailed description of the invention
It is to be understood that different applications of the disclosed products and methods
may be tailored to the specific needs in the art. It is also to be understood that the terminology
used herein is for the purpose of describing particular embodiments of the invention only, and is
not intended to be limiting.
In addition as used in this specification and the appended claims, the singular forms "a",
"an", and "the" include plural referents unless the content clearly dictates otherwise. Thus, for
example, reference to "a pore" includes two or more such pores, reference to "a nucleic acid
sequence" includes two or more such sequences, and the like.
All publications, patents and patent applications cited herein, whether supra or infra, are
hereby incorporated by reference in their entirety.
Methods of the invention
The invention provides a method for sequencing a double stranded target polynucleotide.
The method comprises linking the two strands of the target polynucleotide by a bridging moiety.
The two strands of the target polynucleotide are separated to provide a single stranded
polynucleotide by contacting the construct comprising the target polynucleotide with a
polynucleotide binding protein. The single stranded polynucleotide is moved through a
transmembrane pore. A proportion of the nucleotides in the single stranded polynucleotide
interact with the pore. The current passing through the pore is measured during each interaction
and the sequence of the target polynucleotide is estimated or determined. The target
polynucleotide is therefore sequenced using Strand Sequencing. This method may be referred to
herein as the "MONO" method.
As discussed above, linking the two strands of the target polynucleotide by a bridging
moiety allows both strands of the target polynucleotide to be sequenced by the transmembrane
pore. This method is advantageous because it doubles the amount of information obtained from
a single double stranded target polynucleotide construct. Moreover, because the sequence in the
complementary 'anti-sense' strand is necessarily orthogonal to the sequence of the 'sense'
strand, the information from the two strands can be combined informatically. Thus, this
mechanism provides an orthogonal proof-reading capability that provides higher confidence
observations.
Furthermore, the other major advantages of the method of the invention are:
1) Coverage of missed nucleotides: the method substantially minimises issues of any
missed nucleotides or groups of nucleotides (e.g. due to movement issues such as the strand
slipping through the pore), since any states that might be missed in one strand are likely to be
covered by the orthogonal information obtained from its complement region.
2) Coverage of problematic sequence motifs: any difficult to sequence motifs are covered
by the orthogonal and opposite information in the complementary strand, which having a
different sequence will not have the same sequence dependent issues. For example, this is
particularly relevant for sequence motifs that produce only small changes in current, or have
similar current levels - i.e. consecutive base motifs that when moved through the nanopore
produce the same current block, and are therefore not observed as there is no step change in
current. Any similar current levels from one sequence motif will be covered by the entirely
different current levels obtained from its orthogonal sequence in the complement strand.
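The informatic combination of the two strands described above can be sketched as follows. This is a minimal illustration rather than the method of the specification: it assumes that each strand has already been base-called with a per-base confidence, that the two reads are the same length and perfectly aligned (no insertions or deletions), and that a missed nucleotide is reported as "N" with zero confidence. All names and values are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def combine_strand_calls(sense_calls, antisense_calls):
    """Merge sense and anti-sense base calls into one sense-strand sequence.

    Each call is a (base, confidence) pair given in the 5'->3' read order
    of its own strand. The anti-sense read is reversed and complemented so
    that it lines up with the sense strand; at every position the call
    with the higher confidence is kept.
    """
    if len(sense_calls) != len(antisense_calls):
        raise ValueError("this sketch assumes reads of equal length")
    projected = [(COMPLEMENT[b], q) for b, q in reversed(antisense_calls)]
    consensus = []
    for (b_sense, q_sense), (b_anti, q_anti) in zip(sense_calls, projected):
        consensus.append(b_sense if q_sense >= q_anti else b_anti)
    return "".join(consensus)


# True sense strand: AACG. The sense read misses base 2 and miscalls base 4.
sense_read = [("A", 0.9), ("N", 0.0), ("C", 0.8), ("T", 0.3)]
# Anti-sense read of the reverse complement CGTT.
antisense_read = [("C", 0.9), ("G", 0.7), ("T", 0.9), ("T", 0.6)]
consensus = combine_strand_calls(sense_read, antisense_read)
```

Here the missed nucleotide and the low-confidence miscall in the sense read are both recovered from the anti-sense read.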
In addition to the advantages discussed above there are a number of special cases where
the concept of reading both strands of the double stranded polynucleotide can be utilized to
provide further benefits.
1. Epigenetic information
Being able to identify epigenetic information (such as 5-methylcytosine or
5-hydroxymethylcytosine nucleotides) or damaged bases within a natural DNA strand is desirable
in a wide range of applications. At present, it is difficult to extract this information as the
majority of DNA sequencing technologies rely on DNA amplification as part of their sequencing
chemistry. This information can be extracted, but requires chemical treatment followed by
amplification, both of which can introduce errors.
Nanopore sequencing is also a single molecule sequencing technology and therefore can
be performed without the need of DNA amplification. It has been shown that nanopores can
detect modifications to the standard four DNA nucleotides. Reading both strands of the
polynucleotide can be useful in detecting DNA modifications in situations where a modified base
behaves in a similar way (generates a similar current signal) to another base. For example if
methylcytosine (mC) behaves in a similar way to thymidine there is an error associated with
assigning a mC to a T. In the sense strand, there is a probability of the base being called a mC or
a T. However, in the anti-sense strand, the corresponding base may appear as a G with a high
probability. Thus by "proof reading" the anti-sense strand, it is highly likely that the base in the
sense strand was a mC rather than a T.
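The "proof reading" argument can be made concrete with a small calculation. The sketch below assumes that the sense and anti-sense calls are independent, so that the probability of each candidate sense base is weighted by the probability of its pairing partner in the anti-sense read; this independence assumption, the function name and the example probabilities are illustrative and not taken from the specification.

```python
# Assumed pairing partners; mC pairs with G in the same way as C.
PARTNER = {"A": "T", "C": "G", "G": "C", "T": "A", "mC": "G"}


def combine_with_antisense(sense_probs, antisense_probs):
    """Re-weight candidate sense bases using the anti-sense call.

    sense_probs maps candidate sense bases to probabilities; antisense_probs
    maps the base observed opposite them to probabilities. The result is
    normalised so that the returned values sum to 1.
    """
    weights = {}
    for base, p_sense in sense_probs.items():
        p_anti = antisense_probs.get(PARTNER[base], 0.0)
        weights[base] = p_sense * p_anti
    total = sum(weights.values())
    if total == 0:
        raise ValueError("the two reads are mutually inconsistent")
    return {base: w / total for base, w in weights.items()}


# Sense read cannot separate mC from T; anti-sense read is confidently G.
posterior = combine_with_antisense({"mC": 0.55, "T": 0.45},
                                   {"G": 0.95, "A": 0.05})
```

With these example numbers an almost even split between mC and T in the sense strand becomes a strong preference for mC once the anti-sense G is taken into account.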
Reading the sense and the anti-sense strand can be performed without the need of
amplification or replication. However, amplification or replication may be added as part of the
sample preparation to aid the detection of epigenetic information. A nucleotide strand may be
constructed (described in detail below) where the following information is read through the
nanopore in the following order: sense (original), antisense (original), bridging moiety, sense
(complement), antisense (complement) (Fig. 15).
In this scheme, information on the methylated base will be obtained four times. If the
epigenetic base is in the original sense strand (in this case, mC), the following information will
be obtained with a high probability: sense (original)-mC, anti-sense (original)-G, sense
(complement)-C, and anti-sense (complement)-G. It is clear that the original sense read and the
replicated sense read will give different results, while both anti-sense reads will yield the
same base call. This information can be used to indicate the position of the epigenetic base in the
original sense strand.
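A minimal sketch of how the four reads might be compared is given below. It assumes that the four reads have already been base called and aligned position by position to the coordinates of the original sense strand; the function name and the example calls are hypothetical.

```python
def find_epigenetic_positions(sense_orig, anti_orig, sense_copy, anti_copy):
    """Return positions where the original and replicated sense reads differ
    while both anti-sense reads agree.

    Each argument is a list of base calls aligned to sense-strand coordinates.
    """
    reads = (sense_orig, anti_orig, sense_copy, anti_copy)
    if len({len(r) for r in reads}) != 1:
        raise ValueError("all four reads must be aligned to the same length")
    hits = []
    for i, (so, ao, sc, ac) in enumerate(zip(*reads)):
        # A modified base is lost on replication, so only the original sense
        # read reports it; the anti-sense partner is unchanged in both reads.
        if so != sc and ao == ac:
            hits.append(i)
    return hits


positions = find_epigenetic_positions(
    ["A", "mC", "G", "T"],  # sense (original)
    ["T", "G", "C", "A"],   # anti-sense (original)
    ["A", "C", "G", "T"],   # sense (complement)
    ["T", "G", "C", "A"],   # anti-sense (complement)
)
print(positions)
```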
2. RNA-DNA double reads
A similar scheme can be used to sequence RNA. A bridging moiety can be attached to a
piece of RNA and the DNA reverse complement added to the RNA via a reverse transcriptase
(resulting construct shown in Fig. 16).
In this scheme, the RNA is read followed by a DNA read of the reverse complement.
Information from both the RNA and the DNA reads can be combined to increase the accuracy of
determining or estimating the RNA sequence. For example, if a uracil base (U) in RNA gives a
similar read to a cytosine, then the corresponding base could be used to resolve this error. If the
corresponding DNA base is G, then it is highly likely that the RNA base was a C, however if the
DNA base is called as an A, then it is likely that the RNA base was a U.
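The U/C example can be written as a short sketch. It assumes that the RNA read yields a set of candidate bases for a position and that the DNA read yields a single call for the paired position; the names used are hypothetical.

```python
# DNA base expected in the reverse-transcribed strand opposite each RNA base.
DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}


def resolve_rna_base(rna_candidates, dna_call):
    """Return the RNA candidates that are consistent with the DNA call
    at the paired position."""
    return [base for base in rna_candidates if DNA_PARTNER[base] == dna_call]


print(resolve_rna_base(["U", "C"], "G"))
print(resolve_rna_base(["U", "C"], "A"))
```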
3. Homopolymer reads
Homopolymer reads may be a problem for single molecule nanopore sequencing. If the
homopolymer region is longer than the reading section of the pore, the length of the
homopolymer section will be difficult to determine.
It has already been shown that Phi29 can be used to read around a hairpin, allowing the
sense and the antisense strand to be read. Amplification can be used to generate the antisense
strand using a polymerase and a set of regular DNA triphosphates; dTTP, dGTP, dATP, dCTP.
To overcome the problem of homopolymer reads, the antisense strand can be synthesised with
the addition of a different base in combination with the original dTTP, dGTP, dATP, dCTP. This
could be a natural base analogue such as inosine (I). The base will have a random chance of
incorporating compared to the correct natural base and the insertion rates can be controlled by
varying the concentration of the triphosphate species.
Through the addition of the alternative base, there will be a probability of an alternative
base being inserted into the reverse complement of a homopolymer region. The result of this is
that the homopolymer run will be reduced in length to a point where it can be read by the reading
section of the nanopore. For example, a homopolymer group of AAAAAAAAAAAA (SEQ ID
NO: 10) will have random insertions of the alternative base and may give TTTITTIITTTI (SEQ
ID NO: 11) (where I is inosine).
Original DNA + Hairpin (SEQ ID NO: 12):
5'-TT TTTTTTTTTTTTTTTTTTXXXXXTGTACTGCCGTACGTAAAAAAATAGCTGATCGTACTT
ACTAGCATGT T
(abasic = X)
Regular Conversion (SEQ ID NO: 13):
5'-TT TTTTTTTTTTTTTTTTTTXXXXXTGTACTGCCGTACGTAAAAAAATAGCTGATCGTACTT
ACATGACGGCATGCAT TTTTTTATCGACTAGCATGT T
(abasic = X)
Proposed Scheme 1 (G, T, A, C is randomly replaced by analogue I) (SEQ ID NO: 14):
5'-TT TTTTTTTTTTTTTTTTTTXXXXXTGTACTGCCGTACGTAAAAAAATAGCTGATCGTACTT
AIAT IACG I CATGIAT T I T I A I GACTAGCATGT T
(abasic = X)
The base analogue could be generic (replace T, G, A, or C), or it could be specific to one base
(e.g. deoxyuridine (U) just replaces T).
Proposed Scheme 2 (T is randomly replaced by analogue U) (SEQ ID NO: 15):
5'-TT TTTTTTTTTTTTTTTTTTXXXXXTGTACTGCCGTACGTAAAAAAATAGCTGATCGTACTT
ACAUGACGGCATGCAUT TTUT TATCGACTAGCATGT T
(abasic = X)
In both scheme one and two, the homopolymer stretch has been reduced to allow
individual nucleotides or groups of nucleotides to be estimated or determined. The sense strand
will be a natural DNA strand, while the anti-sense will contain a mixture of natural bases and
base analogues. The combination of data from the sense and the antisense reads can be used to
estimate the length of the homopolymer run in the original DNA section.
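The effect of random analogue incorporation on run length can be simulated with a minimal sketch. It assumes that each eligible position is replaced independently with a fixed probability, which stands in for the insertion rate set by the triphosphate concentrations; the function names, the `rate` value and the `replaces` parameter are hypothetical.

```python
import random
from itertools import groupby

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize_antisense(template, analogue="I", rate=0.3, replaces=None, rng=None):
    """Return the reverse complement of template with random analogue insertions.

    replaces=None models a generic analogue (Scheme 1); a set such as {"T"}
    models an analogue specific to one natural base (Scheme 2).
    """
    if rng is None:
        rng = random.Random()
    out = []
    for base in reversed(template):
        natural = COMPLEMENT[base]
        eligible = replaces is None or natural in replaces
        # Replace the natural base with the analogue at the assumed rate.
        if eligible and rng.random() < rate:
            out.append(analogue)
        else:
            out.append(natural)
    return "".join(out)


def longest_run(seq):
    """Length of the longest stretch of identical symbols in seq."""
    return max((len(list(group)) for _, group in groupby(seq)), default=0)


antisense = synthesize_antisense("A" * 12, rate=0.3, rng=random.Random(0))
print(antisense, longest_run(antisense))
```

Comparing `longest_run` for the sense read and for the analogue-containing antisense read shows how the substitutions break a long run into shorter stretches.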
Double Stranded Target Polynucleotide
The method of the invention is for sequencing a double stranded polynucleotide. A
polynucleotide, such as a nucleic acid, is a macromolecule comprising two or more nucleotides.
The polynucleotide or nucleic acid may comprise any combination of any nucleotides. The
nucleotides can be naturally occurring or artificial. The nucleotide can be oxidized or
methylated. A nucleotide typically contains a nucleobase, a sugar and at least one phosphate
group. The nucleobase is typically heterocyclic. Nucleobases include, but are not limited to,
purines and pyrimidines and more specifically adenine, guanine, thymine, uracil and cytosine.
The sugar is typically a pentose sugar. Nucleotide sugars include, but are not limited to, ribose
and deoxyribose. The nucleotide is typically a ribonucleotide or deoxyribonucleotide. The
nucleotide typically contains a monophosphate, diphosphate or triphosphate. Phosphates may be
attached on the 5' or 3' side of a nucleotide.
Nucleotides include, but are not limited to, adenosine monophosphate (AMP), adenosine
diphosphate (ADP), adenosine triphosphate (ATP), guanosine monophosphate (GMP),
guanosine diphosphate (GDP), guanosine triphosphate (GTP), thymidine monophosphate (TMP),
thymidine diphosphate (TDP), thymidine triphosphate (TTP), uridine monophosphate (UMP),
uridine diphosphate (UDP), uridine triphosphate (UTP), cytidine monophosphate (CMP),
cytidine diphosphate (CDP), cytidine triphosphate (CTP), 5-methylcytidine monophosphate, 5-
methylcytidine diphosphate, 5-methylcytidine triphosphate, 5-hydroxymethylcytidine
monophosphate, 5-hydroxymethylcytidine diphosphate, 5-hydroxymethylcytidine triphosphate,
cyclic adenosine monophosphate (cAMP), cyclic guanosine monophosphate (cGMP),
deoxyadenosine monophosphate (dAMP), deoxyadenosine diphosphate (dADP),
deoxyadenosine triphosphate (dATP), deoxyguanosine monophosphate (dGMP),
deoxyguanosine diphosphate (dGDP), deoxyguanosine triphosphate (dGTP), deoxythymidine
monophosphate (dTMP), deoxythymidine diphosphate (dTDP), deoxythymidine triphosphate
(dTTP), deoxyuridine monophosphate (dUMP), deoxyuridine diphosphate (dUDP), deoxyuridine
triphosphate (dUTP), deoxycytidine monophosphate (dCMP), deoxycytidine diphosphate
(dCDP) and deoxycytidine triphosphate (dCTP), 5-methyl-2'-deoxycytidine monophosphate, 5-
methyl-2' -deoxycytidine diphosphate, 5-methyl-2' -deoxycytidine triphosphate, 5-
hydroxymethyl-2 '-deoxycytidine monophosphate, 5-hydroxymethyl-2'-deoxycytidine
diphosphate and 5-hydroxymethyl-2 '-deoxycytidine triphosphate. The nucleotides are
preferably selected from AMP, TMP, GMP, CMP, UMP, dAMP, dTMP, dGMP or dCMP.
A nucleotide may contain a sugar and at least one phosphate group (i.e. lack a
nucleobase).
The polynucleotide can be a nucleic acid, such as deoxyribonucleic acid (DNA) or
ribonucleic acid (RNA). The target polynucleotide can comprise one strand of RNA hybridized
to one strand of DNA. The polynucleotide may be any synthetic nucleic acid known in the art,
such as peptide nucleic acid (PNA), glycerol nucleic acid (GNA), threose nucleic acid (TNA),
locked nucleic acid (LNA) or other synthetic polymers with nucleotide side chains.
The target polynucleotide can be any length. For example, the polynucleotide can be at
least 10, at least 50, at least 100, at least 150, at least 200, at least 250, at least 300, at least 400
or at least 500 nucleotide pairs in length. The polynucleotide can be 1000 or more nucleotide
pairs, 5000 or more nucleotide pairs in length or 100000 or more nucleotide pairs in length.
The target polynucleotide is present in any suitable sample. The invention is typically
carried out on a sample that is known to contain or suspected to contain the target
polynucleotide. Alternatively, the invention may be carried out on a sample to confirm the
identity of one or more target polynucleotides whose presence in the sample is known or
expected.
The sample may be a biological sample. The invention may be carried out in vitro on a
sample obtained from or extracted from any organism or microorganism. The organism or
microorganism is typically archaean, prokaryotic or eukaryotic and typically belongs to one of the
five kingdoms: plantae, animalia, fungi, monera and protista. The invention may be carried out
in vitro on a sample obtained from or extracted from any virus. The sample is preferably a fluid
sample. The sample typically comprises a body fluid of the patient. The sample may be urine,
lymph, saliva, mucus or amniotic fluid but is preferably blood, plasma or serum. Typically, the
sample is human in origin, but alternatively it may be from another mammal, such as from
commercially farmed animals such as horses, cattle, sheep or pigs, or from pets
such as cats or dogs. Alternatively a sample of plant origin is typically obtained from a
commercial crop, such as a cereal, legume, fruit or vegetable, for example wheat, barley, oats,
canola, maize, soya, rice, bananas, apples, tomatoes, potatoes, grapes, tobacco, beans, lentils,
sugar cane, cocoa, cotton, tea, coffee.
The sample may be a non-biological sample. The non-biological sample is preferably a
fluid sample. Examples of a non-biological sample include surgical fluids, water such as
drinking water, sea water or river water, and reagents for laboratory tests.
The sample is typically processed prior to being assayed, for example by centrifugation
or by passage through a membrane that filters out unwanted molecules or cells, such as red blood
cells. The sample may be measured immediately upon being taken. The sample may also be
stored prior to assay, preferably below -70°C.
If the target polynucleotide is coupled to the membrane as discussed in more detail
below, the method of the invention is particularly advantageous for human DNA sequencing
because only small amounts of purified DNA can be obtained from human blood. The method
preferably allows sequencing of a target polynucleotide that is present at a concentration of from
about 0.1pM to about 1nM, such as less than 1pM, less than 10pM or less than 100pM.
Construct
The method of the invention involves providing a construct comprising the double
stranded target nucleotide to be sequenced. The construct typically allows both strands of the
target polynucleotide to be sequenced by a transmembrane pore.
The construct comprises a bridging moiety which is capable of linking the two strands of
the target polynucleotide. The bridging moiety typically covalently links the two strands of the
target polynucleotide. The bridging moiety can be anything that is capable of linking the two
strands of the target polynucleotide, provided that the bridging moiety does not interfere with
movement of the single stranded polynucleotide through the transmembrane pore. Suitable
bridging moieties include, but are not limited to, a polymeric linker, a chemical linker, a
polynucleotide or a polypeptide. Preferably, the bridging moiety comprises DNA, RNA,
modified DNA (such as abasic DNA), PNA, LNA or PEG. The bridging moiety is more
preferably DNA or RNA.
The bridging moiety is most preferably a hairpin loop. The hairpin loop is typically 4 to
100 nucleotides in length, preferably 4 to 8 nucleotides in length.
The bridging moiety is linked to the target polynucleotide construct by any suitable
means known in the art. The bridging moiety may be synthesized separately and chemically
attached or enzymatically ligated to the target polynucleotide. Alternatively, the bridging moiety
may be generated in the processing of the target polynucleotide.
The bridging moiety is linked to the target polynucleotide at or near one end of the target
polynucleotide. The bridging moiety is preferably linked to the target polynucleotide within 10
nucleotides of the end of the target polynucleotide.
The construct comprising the target polynucleotide also preferably comprises at least one
polymer at the opposite end of the target polynucleotide to the bridging moiety. Such polymer(s)
aid the sequencing method of the invention as discussed in more detail below. Suitable polymers
include polynucleotides (DNA/RNA), modified polynucleotides such as modified DNA, PNA,
LNA, PEG or polypeptides.
The construct preferably comprises a leader polymer. The leader polymer is linked to the
target polynucleotide at the opposite end to the bridging moiety. The leader polymer helps the
double stranded target polynucleotide to engage with the transmembrane pore or with a
polynucleotide binding protein, such as Phi29 DNA polymerase, that helps to separate the two
strands and/or controls the movement of the single stranded polynucleotide through the pore.
Transmembrane pores and polynucleotide binding proteins are discussed in more detail below.
The leader polymer can be a polynucleotide such as DNA or RNA, a modified
polynucleotide (such as abasic DNA), PNA, LNA, PEG or a polypeptide. The leader polymer is
preferably a polynucleotide and is more preferably a single stranded polynucleotide. The leader
polymer can be any of the polynucleotides discussed above. The single stranded leader polymer
is most preferably a single strand of DNA. The leader polymer can be any length, but is
typically 27 to 150 nucleotides in length, such as from 50 to 150 nucleotides in length.
The addition of sections of single stranded polynucleotide to a double stranded
polynucleotide can be performed in various ways. A chemical or enzymatic ligation can be
done. In addition, the Nextera method by Epicentre is suitable. The inventors have developed a
PCR method using a sense primer that, as usual, contains a complementary section to the start of
the target region of genomic DNA, but was additionally preceded with a 50 polyT section. To
prevent the polymerase from extending the complementary strand opposite the polyT section and
thereby create a blunt ended PCR product (as is normal), four abasic sites were added between
the polyT section and the complementary priming section. These abasic sites will prevent the
polymerase from extending beyond this region and so the polyT section will remain as 5' single
stranded DNA on each of the amplified copies. Other possible modifications which could also
stop polymerase extension include RNA, PNA or morpholino bases, iso-dC or iso-dG.
The construct preferably further comprises a polymer tail (also linked to the target
polynucleotide at the opposite end to the bridging moiety). The polymer tail aids sequencing of
the target construct by the transmembrane pore. In particular, the polymer tail typically ensures
that the entirety of the double stranded polynucleotide (i.e. all of both strands) can be read and
sequenced by the transmembrane pore. As discussed below, polynucleotide binding proteins,
such as Phi29 DNA polymerase, can control the movement of the single stranded
polynucleotides through the transmembrane pore. The protein typically slows the movement of
the polynucleotide through the pore. For instance, Phi29 DNA polymerase acts like a brake
slowing the movement of the polynucleotide through the pore along the potential applied across
the membrane. Once the polynucleotide is no longer in contact with the binding protein, it is free
to move through the pore at such a rate that sequence information is difficult to obtain. Since
there is normally a short distance from the protein to the pore, typically approximately 20
nucleotides, some sequence information (approximately equal to that distance) may be missed. A
tail polymer "extends" the length of the single stranded polynucleotide such that its movement
may be controlled by the nucleic acid binding protein while all of both strands of the target
polynucleotide pass through the pore and are sequenced. Such embodiments ensure that
sequence information can be obtained from the entirety of both strands in the target
polynucleotide. The tail polymer may also provide a site for a primer to bind, which allows the
nucleic acid binding protein to separate the two strands of the target polynucleotide.
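The role of the tail polymer can be expressed as simple arithmetic. The sketch below assumes that the uncontrolled stretch equals the protein-to-pore distance (approximately 20 nucleotides, as stated above) and that a tail shifts this stretch away from the target; the function name is hypothetical and the model is deliberately simplified.

```python
def unread_target_nucleotides(tail_length, protein_to_pore_gap=20):
    """Estimate how many target nucleotides pass the pore uncontrolled.

    Once the end of the strand leaves the binding protein, roughly
    protein_to_pore_gap nucleotides move freely through the pore. A tail
    of tail_length nucleotides absorbs part or all of that stretch.
    """
    if tail_length < 0 or protein_to_pore_gap < 0:
        raise ValueError("lengths must be non-negative")
    return max(0, protein_to_pore_gap - tail_length)


for tail in (0, 10, 20, 50):
    print(tail, unread_target_nucleotides(tail))
```

Under these assumptions, a tail at least as long as the protein-to-pore distance leaves no target nucleotides unread.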
The tail polymer can be a polynucleotide such as DNA or RNA, a modified
polynucleotide (such as abasic DNA), PNA, LNA, PEG or a polypeptide. The tail polymer is
preferably a polynucleotide and is more preferably a single stranded polynucleotide. The tail
polymer can be any of the polynucleotides discussed above.
The construct preferably also comprises one or more markers, which result in a
distinctive current (characteristic signature current) when passed through the transmembrane
pore. The markers are typically used to allow the position of the single stranded polynucleotide
in relation to the pore to be estimated or determined. For instance, the signal from a marker
positioned between both strands of the target polynucleotide indicates that one strand has been
sequenced and the other is about to enter the pore. Hence, such markers can be used to
differentiate between the sense and anti-sense strands of target DNA. The marker(s) may also be
used to identify the source of the target polynucleotide. Suitable markers include, but are not
limited to abasic regions, specific sequences of nucleotides, unnatural nucleotides, fluorophores
or cholesterol. The markers are preferably an abasic region or a specific sequence of
nucleotides.
The marker(s) may be positioned anywhere in the construct. The marker(s) can be
positioned in the bridging moiety. The marker(s) can also be positioned near the bridging
moiety. Near the bridging moiety preferably refers to within 10 to 100 nucleotides of the
bridging moiety.
The markers can also be positioned within the leader polymer or the tail polymer.
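As an illustration of how a marker positioned between the two strands might be used, the following sketch splits a string of base calls at a run of abasic calls. It assumes that abasic positions have already been called as the symbol X and that a run of at least three such calls identifies the marker; both the symbol and the threshold are hypothetical choices.

```python
import re


def split_at_marker(calls, marker="X", min_run=3):
    """Split base calls into (before, after) at the first marker run.

    Returns None if no run of at least min_run marker symbols is found.
    """
    pattern = re.escape(marker) + "{%d,}" % min_run
    match = re.search(pattern, calls)
    if match is None:
        return None
    # Everything before the marker is one strand, everything after is the other.
    return calls[:match.start()], calls[match.end():]


print(split_at_marker("TGTACTGCXXXXGCAGTACA"))
```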
The construct may be coupled to the membrane using any known method. If the
membrane is an amphiphilic layer, such as a lipid bilayer (as discussed in detail below), the
construct is preferably coupled to the membrane via a polypeptide present in the membrane or a
hydrophobic anchor present in the membrane. The hydrophobic anchor is preferably a lipid,
fatty acid, sterol, carbon nanotube or amino acid.
The construct may be coupled directly to the membrane. The construct is preferably
coupled to the membrane via a linker. Preferred linkers include, but are not limited to, polymers,
such as polynucleotides, polyethylene glycols (PEGs) and polypeptides. If a polynucleotide is
coupled directly to the membrane, then some sequence data will be lost as the sequencing run
cannot continue to the end of the polynucleotide due to the distance between the membrane and
the detector. If a linker is used, then the polynucleotide can be processed to completion. If a
linker is used, the linker may be attached to the construct at any position. The linker is
preferably attached to the polynucleotide at the tail polymer.
The coupling may be stable or transient. For certain applications, the transient nature of
the coupling is preferred. If a stable coupling molecule were attached directly to either the 5' or
3' end of a polynucleotide, then some sequence data will be lost as the sequencing run cannot
continue to the end of the polynucleotide due to the distance between the bilayer and the
enzyme's active site. If the coupling is transient, then when the coupled end randomly becomes
free of the bilayer, then the polynucleotide can be processed to completion. Chemical groups
that form stable or transient links with the membrane are discussed in more detail below. The
construct may be transiently coupled to an amphiphilic layer or lipid bilayer using cholesterol or
a fatty acyl chain. Any fatty acyl chain having a length of from 6 to 30 carbon atoms, such as
hexadecanoic acid, may be used.
In preferred embodiments, the construct is coupled to an amphiphilic layer such as a lipid
bilayer. Coupling of polynucleotides to synthetic lipid bilayers has been carried out previously
with various different tethering strategies. These are summarised in Table 1 below.
Table 1
Polynucleotides may be functionalized using a modified phosphoramidite in the synthesis
reaction, which is readily compatible with the addition of reactive groups, such as thiol, cholesterol,
lipid and biotin groups. These different attachment chemistries give a suite of attachment
options for polynucleotides. Each different modification group tethers the polynucleotide in a
slightly different way and coupling is not always permanent so giving different dwell times for
the polynucleotide to the bilayer. The advantages of transient coupling are discussed above.
Coupling of polynucleotides can also be achieved by a number of other means provided
that a reactive group can be added to the polynucleotide. The addition of reactive groups to
either end of DNA has been reported previously. A thiol group can be added to the 5' of ssDNA
using polynucleotide kinase and ATPγS (Grant, G. P. and P. Z. Qin (2007). "A facile method for
attaching nitroxide spin labels at the 5' terminus of nucleic acids." Nucleic Acids Res 35(10):
e77). A more diverse selection of chemical groups, such as biotin, thiols and fluorophores, can
be added using terminal transferase to incorporate modified oligonucleotides to the 3' of ssDNA
(Kumar, A., P. Tchen, et al. (1988). "Nonradioactive labeling of synthetic oligonucleotide probes
with terminal deoxynucleotidyl transferase." Anal Biochem 169(2): 376-82).
Alternatively, the reactive group could be considered to be the addition of a short piece of
DNA complementary to one already coupled to the bilayer, so that attachment can be achieved
via hybridisation. Ligation of short pieces of ssDNA has been reported using T4 RNA ligase I
(Troutt, A. B., M. G. McHeyzer-Williams, et al. (1992). "Ligation-anchored PCR: a simple
amplification technique with single-sided specificity." Proc Natl Acad Sci U S A 89(20): 9823-
5). Alternatively either ssDNA or dsDNA could be ligated to native dsDNA and then the two
strands separated by thermal or chemical denaturation. To native dsDNA, it is possible to add
either a piece of ssDNA to one or both of the ends of the duplex, or dsDNA to one or both ends.
Then, when the duplex is melted, each single strand will have either a 5' or 3' modification if
ssDNA was used for ligation or a modification at the 5' end, the 3' end or both if dsDNA was
used for ligation. If the polynucleotide is a synthetic strand, the coupling chemistry can be
incorporated during the chemical synthesis of the polynucleotide. For instance, the
polynucleotide can be synthesized using a primer with a reactive group attached to it.
A common technique for the amplification of sections of genomic DNA is using
polymerase chain reaction (PCR). Here, using two synthetic oligonucleotide primers, a number
of copies of the same section of DNA can be generated, where for each copy the 5' of each
strand in the duplex will be a synthetic polynucleotide. By using an antisense primer that has a
reactive group, such as a cholesterol, thiol, biotin or lipid, each copy of the target DNA amplified
will contain a reactive group for coupling.
Separating
The two strands of the target polynucleotide are separated using a polynucleotide binding
protein.
The polynucleotide binding protein is preferably derived from a polynucleotide handling
enzyme. However, the enzyme may be used under conditions in which it does not catalyze a
reaction. For instance, a protein derived from Phi29 DNA polymerase may be run in an
unzipping mode as discussed in more detail below.
A polynucleotide handling enzyme is a polypeptide that is capable of interacting with and
modifying at least one property of a polynucleotide. The enzyme may modify the polynucleotide
by cleaving it to form individual nucleotides or shorter chains of nucleotides, such as di- or
trinucleotides. The enzyme may modify the polynucleotide by orienting it or moving it to a
specific position. The polynucleotide handling enzyme does not need to display enzymatic
activity as long as it is capable of binding the target polynucleotide and preferably controlling its
movement through the pore. For instance, the enzyme may be modified to remove its enzymatic
activity or may be used under conditions which prevent it from acting as an enzyme. Such
conditions are discussed in more detail below.
The polynucleotide binding protein is typically derived from the Picovirinae virus
family. Suitable viruses include, but are not limited to, AHJD-like viruses and Phi29-like
viruses. The polynucleotide binding protein is preferably derived from Phi29 DNA polymerase
or a helicase.
A protein derived from Phi29 DNA polymerase comprises the sequence shown in SEQ
ID NO: 6 or a variant thereof. Wild-type Phi29 DNA polymerase has polymerase and
exonuclease activity. It may also unzip double stranded polynucleotides under the correct
conditions. Hence, the enzyme may work in three modes. This is discussed in more detail
below. A variant of SEQ ID NO: 6 is an enzyme that has an amino acid sequence which varies
from that of SEQ ID NO: 6 and which retains polynucleotide binding activity. The variant must
work in at least one of the three modes discussed below. Preferably, the variant works in all
three modes. The variant may include modifications that facilitate handling of the
polynucleotide and/or facilitate its activity at high salt concentrations and/or room temperature.
The variant may include Fidelity Systems' TOPO modification, which improves enzyme salt
tolerance.
Over the entire length of the amino acid sequence of SEQ ID NO: 6, a variant will
preferably be at least 40% homologous to that sequence based on amino acid identity. More
preferably, the variant polypeptide may be at least 50%, at least 55%, at least 60%, at least 65%,
at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least
95%, 97% or 99% homologous based on amino acid identity to the amino acid sequence of SEQ
ID NO: 6 over the entire sequence. There may be at least 80%, for example at least 85%, 90% or
95%, amino acid identity over a stretch of 200 or more, for example 230, 250, 270 or 280 or
more, contiguous amino acids ("hard homology"). Homology is determined as described below.
The variant may differ from the wild-type sequence in any of the ways discussed below with
reference to SEQ ID NO: 2. The enzyme may be covalently attached to the pore as discussed
below.
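The two identity thresholds can be illustrated with a simplified sketch. It assumes that the variant and SEQ ID NO: 6 have already been aligned to equal length, with "-" marking gaps, and it counts identical residues directly; it is not the homology determination described below, and the function names are hypothetical.

```python
def percent_identity(a, b):
    """Percentage of aligned positions with identical residues (gaps never match)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, equal length and non-empty")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def best_window_identity(a, b, window):
    """Highest percent identity over any contiguous stretch of `window` positions."""
    if len(a) != len(b) or not 0 < window <= len(a):
        raise ValueError("invalid alignment or window length")
    flags = [1 if x == y and x != "-" else 0 for x, y in zip(a, b)]
    current = best = sum(flags[:window])
    # Slide the window one position at a time, updating the match count.
    for i in range(window, len(flags)):
        current += flags[i] - flags[i - window]
        best = max(best, current)
    return 100.0 * best / window


print(percent_identity("ACDEFG", "ACDXFG"))
print(best_window_identity("ACDEFG", "ACDXFG", 3))
```

For the thresholds in the text, `percent_identity` would be compared against 40% or higher over the entire length, and `best_window_identity` against 80% or higher with a window of 200 or more residues.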
The method is preferably carried out using the protein derived from Phi29 DNA
polymerase in unzipping mode. In this embodiment, steps (b), (c) and (d) are carried out in the
absence of free nucleotides and the absence of an enzyme cofactor such that the polymerase
controls the movement of the single stranded polynucleotide through the pore with the field
resulting from the applied voltage (as it is unzipped). In this embodiment, the polymerase acts
like a brake preventing the single stranded polynucleotide from moving through the pore too
quickly under the influence of the applied voltage. The method preferably further comprises (e)
lowering the voltage applied across the pore such that the single stranded polynucleotide moves
through the pore in the opposite direction to that in steps (c) and (d) (i.e. as it re-anneals) and a
proportion of the nucleotides in the polynucleotide interacts with the pore and (f) measuring the
current passing through the pore during each interaction and thereby proof reading the sequence
of the target polynucleotide obtained in step (d), wherein steps (e) and (f) are also carried out
with a voltage applied across the pore.
The two strands of the target polynucleotide can be separated and duplicated at any stage
before sequencing is carried out and as many times as necessary. For example, after separating
the two strands of a first target polynucleotide construct as described above, a complementary
strand to the resulting single stranded polynucleotide can be generated to form another double
stranded polynucleotide. The two strands of this double stranded polynucleotide can then be
linked using a bridging moiety to form a second construct. This may be referred to herein as the
"DUO" method. This construct may then be used in the invention. In such an embodiment, one
strand of the double stranded polynucleotide in the resulting construct contains both strands of
the original target double stranded polynucleotide (in the first construct) linked by a bridging
moiety. The sequence of the original target double stranded polynucleotide or the complement
strand can be estimated or determined. This process of replication can be repeated as many
times as necessary and provides additional proof reading as the target polynucleotide is in effect
being read multiple times.
Membrane
Any membrane may be used in accordance with the invention. Suitable membranes are
well-known in the art. The membrane is preferably an amphiphilic layer. An amphiphilic layer
is a layer formed from amphiphilic molecules, such as phospholipids, which have both
hydrophilic and lipophilic properties. The amphiphilic molecules may be synthetic or naturally
occurring. Non-naturally occurring amphiphiles and amphiphiles which form a monolayer are
known in the art and include, for example, block copolymers (Gonzalez-Perez et al., Langmuir,
2009, 25, 10447-10450). Block copolymers are polymeric materials in which two or more
monomer sub-units are polymerized together to create a single polymer chain. Block
copolymers typically have properties that are contributed by each monomer sub-unit. However,
a block copolymer may have unique properties that polymers formed from the individual sub-units
do not possess. Block copolymers can be engineered such that one of the monomer sub-units
is hydrophobic (i.e. lipophilic), whilst the other sub-unit(s) are hydrophilic whilst in
aqueous media. In this case, the block copolymer may possess amphiphilic properties and may
form a structure that mimics a biological membrane. The block copolymer may be a diblock
(consisting of two monomer sub-units), but may also be constructed from more than two
monomer sub-units to form more complex arrangements that behave as amphiphiles. The
copolymer may be a triblock, tetrablock or pentablock copolymer.
Archaebacterial bipolar tetraether lipids are naturally occurring lipids that are constructed
such that the lipid forms a monolayer membrane. These lipids are generally found in
extremophiles that survive in harsh biological environments, such as thermophiles, halophiles and
acidophiles. Their stability is believed to derive from the fused nature of the final bilayer. It is
straightforward to construct block copolymer materials that mimic these biological entities by
creating a triblock polymer that has the general motif hydrophilic-hydrophobic-hydrophilic.
This material may form monomeric membranes that behave similarly to lipid bilayers and
encompass a range of phase behaviours from vesicles through to laminar membranes.
Membranes formed from these triblock copolymers hold several advantages over biological lipid
membranes. Because the triblock copolymer is synthesized, the exact construction can be
carefully controlled to provide the correct chain lengths and properties required to form
membranes and to interact with pores and other proteins.
Block copolymers may also be constructed from sub-units that are not classed as lipid
sub-materials; for example a hydrophobic polymer may be made from siloxane or other non-hydrocarbon
based monomers. The hydrophilic sub-section of block copolymer can also possess
low protein binding properties, which allows the creation of a membrane that is highly resistant
when exposed to raw biological samples. This head group unit may also be derived from non-classical
lipid head-groups.
Triblock copolymer membranes also have increased mechanical and environmental
stability compared with biological lipid membranes, for example a much higher operational
temperature or pH range. The synthetic nature of the block copolymers provides a platform to
customize polymer based membranes for a wide range of applications.
The amphiphilic molecules may be chemically-modified or functionalised to facilitate
coupling of the analyte.
The amphiphilic layer may be a monolayer or a bilayer. The amphiphilic layer is
typically planar. The amphiphilic layer may be non-planar such as curved.
The amphiphilic layer is typically a lipid bilayer. Lipid bilayers are models of cell
membranes and serve as excellent platforms for a range of experimental studies. For example,
lipid bilayers can be used for in vitro investigation of membrane proteins by single-channel
recording. Alternatively, lipid bilayers can be used as biosensors to detect the presence of a
range of substances. The lipid bilayer may be any lipid bilayer. Suitable lipid bilayers include,
but are not limited to, a planar lipid bilayer, a supported bilayer or a liposome. The lipid bilayer
is preferably a planar lipid bilayer. Suitable lipid bilayers are disclosed in International
Application No. PCT/GB08/000563 (published as WO 2008/102121), International Application
No. PCT/GB08/004127 (published as WO 2009/077734) and International Application No.
PCT/GB2006/001057 (published as WO 2006/100484).
Methods for forming lipid bilayers are known in the art. Suitable methods are disclosed
in the Example. Lipid bilayers are commonly formed by the method of Montal and Mueller
(Proc. Natl. Acad. Sci. USA., 1972; 69: 3561-3566), in which a lipid monolayer is carried on
an aqueous solution/air interface past either side of an aperture which is perpendicular to that
interface.
The method of Montal & Mueller is popular because it is a cost-effective and relatively
straightforward method of forming good quality lipid bilayers that are suitable for protein pore
insertion. Other common methods of bilayer formation include tip-dipping, painting bilayers and
patch-clamping of liposome bilayers.
In a preferred embodiment, the lipid bilayer is formed as described in International
Application No. PCT/GB08/004127 (published as WO 2009/077734). Advantageously in this
method, the lipid bilayer is formed from dried lipids. In a most preferred embodiment, the lipid
bilayer is formed across an opening as described in WO2009/077734 (PCT/GB08/004127).
In another preferred embodiment, the membrane is a solid state layer. A solid-state layer
is not of biological origin. In other words, a solid state layer is not derived from or isolated from
a biological environment such as an organism or cell, or a synthetically manufactured version of
a biologically available structure. Solid state layers can be formed from both organic and
inorganic materials including, but not limited to, microelectronic materials, insulating materials
such as Si3N4, Al2O3, and SiO, organic and inorganic polymers such as polyamide, plastics
such as Teflon® or elastomers such as two-component addition-cure silicone rubber, and glasses.
The solid state layer may be formed from graphene. Suitable graphene layers are disclosed in
International Application No. PCT/US2008/010637 (published as WO 2009/035647).
Transmembrane Pore
A transmembrane pore is a structure that permits hydrated ions driven by an applied
potential to flow from one side of the membrane to the other side of the membrane.
The transmembrane pore is preferably a transmembrane protein pore. A transmembrane
protein pore is a polypeptide or a collection of polypeptides that permits hydrated ions, such as
analyte, to flow from one side of a membrane to the other side of the membrane. In the present
invention, the transmembrane protein pore is capable of forming a pore that permits hydrated
ions driven by an applied potential to flow from one side of the membrane to the other. The
transmembrane protein pore preferably permits analyte such as nucleotides to flow from one side
of the membrane, such as a lipid bilayer, to the other. The transmembrane protein pore allows a
polynucleotide, such as DNA or RNA, to be moved through the pore.
The transmembrane protein pore may be a monomer or an oligomer. The pore is
preferably made up of several repeating subunits, such as 6, 7 or 8 subunits. The pore is more
preferably a heptameric or octameric pore.
The transmembrane protein pore typically comprises a barrel or channel through which
the ions may flow. The subunits of the pore typically surround a central axis and contribute
strands to a transmembrane β-barrel or channel or a transmembrane α-helix bundle or channel.
The barrel or channel of the transmembrane protein pore typically comprises amino acids
that facilitate interaction with analyte, such as nucleotides, polynucleotides or nucleic acids.
These amino acids are preferably located near a constriction of the barrel or channel. The
transmembrane protein pore typically comprises one or more positively charged amino acids,
such as arginine, lysine or histidine, or aromatic amino acids, such as tyrosine or tryptophan.
These amino acids typically facilitate the interaction between the pore and nucleotides,
polynucleotides or nucleic acids.
Transmembrane protein pores for use in accordance with the invention can be derived
from β-barrel pores or α-helix bundle pores. β-barrel pores comprise a barrel or channel that is
formed from β-strands. Suitable β-barrel pores include, but are not limited to, β-toxins, such as
α-hemolysin, anthrax toxin and leukocidins, and outer membrane proteins/porins of bacteria,
such as Mycobacterium smegmatis porin (Msp), for example MspA, outer membrane porin F
(OmpF), outer membrane porin G (OmpG), outer membrane phospholipase A and Neisseria
autotransporter lipoprotein (NalP). α-helix bundle pores comprise a barrel or channel that is
formed from α-helices. Suitable α-helix bundle pores include, but are not limited to, inner
membrane proteins and α outer membrane proteins, such as WZA and ClyA toxin. The
transmembrane pore may be derived from Msp or from α-hemolysin (α-HL).
The transmembrane protein pore is preferably derived from Msp, preferably from MspA.
Such a pore will be oligomeric and typically comprises 7, 8, 9 or 10 monomers derived from
Msp. The pore may be a homo-oligomeric pore derived from Msp comprising identical
monomers. Alternatively, the pore may be a hetero-oligomeric pore derived from Msp
comprising at least one monomer that differs from the others. The pore may also comprise one
or more constructs which comprise two or more covalently attached monomers derived from
Msp. Suitable pores are disclosed in US Provisional Application No. 61/441,718 (filed 11
February 2011). Preferably the pore is derived from MspA or a homolog or paralog thereof.
A monomer derived from Msp comprises the sequence shown in SEQ ID NO: 2 or a
variant thereof. SEQ ID NO: 2 is the NNN-RRK mutant of the MspA monomer. It includes the
following mutations: D90N, D91N, D93N, D118R, D134R and E139K. A variant of SEQ ID
NO: 2 is a polypeptide that has an amino acid sequence which varies from that of SEQ ID NO: 2
and which retains its ability to form a pore. The ability of a variant to form a pore can be
assayed using any method known in the art. For instance, the variant may be inserted into a lipid
bilayer along with other appropriate subunits and its ability to oligomerise to form a pore may be
determined. Methods are known in the art for inserting subunits into membranes, such as lipid
bilayers. For example, subunits may be suspended in a purified form in a solution containing a
lipid bilayer such that it diffuses to the lipid bilayer and is inserted by binding to the lipid bilayer
and assembling into a functional state. Alternatively, subunits may be directly inserted into the
membrane using the "pick and place" method described in M.A. Holden, H. Bayley. J. Am.
Chem. Soc. 2005, 127, 6502-6503 and International Application No. PCT/GB2006/001057
(published as WO 2006/100484).
Preferred variants are disclosed in International Application No. PCT/GB2012/050301
(claiming priority from US Provisional Application No. 61/441,718). Particularly preferred
variants include, but are not limited to, those comprising the following substitution(s): L88N;
L88S; L88Q; L88T; D90S; D90Q; D90Y; I105L; I105S; Q126R; G75S; G77S; G75S, G77S,
L88N and Q126R; G75S, G77S, L88N, D90Q and Q126R; D90Q and Q126R; L88N, D90Q and
Q126R; L88S and D90Q; L88N and D90Q; E59R; G75Q; G75N; G75S; G75T; G77Q; G77N;
G77S; G77T; I78L; S81N; T83N; N86S; N86T; I87F; I87V; I87L; L88N; L88S; L88Y; L88F;
L88V; L88Q; L88T; I89F; I89V; I89L; N90S; N90Q; N90L; N90Y; N91S; N91Q; N91L;
N91M; N91I; N91A; N91V; N91G; G92A; G92S; N93S; N93A; N93T; I94L; T95V; A96R;
A96D; A96V; A96N; A96S; A96T; P97S; P98S; F99S; G100S; L101F; N102 ; N102S; N102T;
S103A; S103Q; S103N; S103G; S103T; V104I; I105Y; I105L; I105A; I105G; I105Q; I105N;
I105S; I105T; T106F; T106I; T106V; T106S; N108P; N108S; D90Q and I105A; D90S and
G92S; L88T and D90S; I87Q and D90S; I89Y and D90S; L88N and I89F; L88N and I89Y;
D90S and G92A; D90S and I94N; D90S and V104I; L88D and I105K; L88N and Q126R; L88N,
D90Q and D91R; L88N, D90Q and D91S; L88N, D90Q and I105V; D90Q, D93S and I105A;
N91Y; N90Y and N91G; N90G and N91Y; N90G and N91G; I105G; N90R; N91R; N90R and
N91R; N90K; N91K; N90K and N91K; N90Q and N91G; N90G and N91Q; N90Q and N91Q;
R118N; N91C; N90C; N90W; N91W; N90K; N91K; N90R; N91R; N90S and N91S; N90Y and
I105A; N90G and I105A; N90Q and I105A; N90S and I105A; L88A and I105A; L88S and
I105S; L88N and I105N; N90G and N93G; N90G; N93G; N90G and N91A; I105K; I105R;
I105V; I105P; I105W; L88R; L88A; L88G; L88N; N90R and I105A; N90S and I105A; L88A
and I105A; L88S and I105S; L88N and I105N; L88C; S103C; I105C; D134R.
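By way of illustration only, the substitution notation used in the list above (for example D90N, meaning that the aspartic acid at position 90 is replaced by asparagine) can be applied mechanically to a sequence. The following is a minimal sketch; the function names and the short toy sequence are hypothetical and are not taken from SEQ ID NO: 2.

```python
import re

# One substitution label such as "D90N": original residue, 1-based
# position, replacement residue (one-letter amino acid codes).
_SUB = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label):
    """Split a label like 'D90N' into ('D', 90, 'N')."""
    match = _SUB.match(label.strip())
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    old, pos, new = match.groups()
    return old, int(pos), new


def apply_substitutions(sequence, labels):
    """Return a copy of `sequence` with every substitution applied.

    Raises ValueError if the residue found at a position does not match
    the residue named in the label, which catches numbering mistakes.
    """
    residues = list(sequence)
    for label in labels:
        old, pos, new = parse_substitution(label)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{label}: position outside sequence")
        if residues[pos - 1] != old:
            raise ValueError(
                f"{label}: expected {old} at {pos}, found {residues[pos - 1]}"
            )
        residues[pos - 1] = new
    return "".join(residues)


# Hypothetical 10-residue toy sequence, for illustration only.
toy = "GLDNELSLVD"
mutant = apply_substitutions(toy, ["D3N", "L8S"])
```

Checking the original residue before replacing it is a design choice of this sketch: it makes a label written against the wrong reference sequence fail loudly instead of silently producing an unintended variant.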
In addition to the specific mutations discussed above, the variant may include other
mutations. Over the entire length of the amino acid sequence of SEQ ID NO: 2, a variant will
preferably be at least 50% homologous to that sequence based on amino acid identity. More
preferably, the variant may be at least 55%, at least 60%, at least 65%, at least 70%, at least 75%,
at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99%
homologous based on amino acid identity to the amino acid sequence of SEQ ID NO: 2 over the
entire sequence. There may be at least 80%, for example at least 85%, 90% or 95%, amino acid
identity over a stretch of 100 or more, for example 125, 150, 175 or 200 or more, contiguous
amino acids ("hard homology").
Standard methods in the art may be used to determine homology. For example the
UWGCG Package provides the BESTFIT program which can be used to calculate homology, for
example used on its default settings (Devereux et al (1984) Nucleic Acids Research 12, p387-395).
The PILEUP and BLAST algorithms can be used to calculate homology or line up
sequences (such as identifying equivalent residues or corresponding sequences (typically on their
default settings)), for example as described in Altschul S. F. (1993) J Mol Evol 36:290-300;
Altschul, S.F et al (1990) J Mol Biol 215:403-10. Software for performing BLAST analyses is
publicly available through the National Center for Biotechnology Information
(http://www.ncbi.nlm.nih.gov/).
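As a simple illustration of homology expressed as amino acid identity, the sketch below computes percent identity for two sequences that have already been aligned, both over the entire alignment and over the best contiguous stretch of a given length (compare the "hard homology" criterion above). It is not the BESTFIT, PILEUP or BLAST algorithm; the function names and the treatment of gap columns as mismatches are assumptions made for this example.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned, equal-length sequences.

    Columns in which either sequence has a gap count as mismatches
    against the full alignment length. This is one convention among
    several and is an assumption of this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)


def best_window_identity(aligned_a, aligned_b, window):
    """Highest percent identity over any contiguous window of columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not 0 < window <= len(aligned_a):
        raise ValueError("window must be between 1 and the alignment length")
    return max(
        percent_identity(aligned_a[i:i + window], aligned_b[i:i + window])
        for i in range(len(aligned_a) - window + 1)
    )


# Hypothetical aligned toy sequences, for illustration only.
seq_a = "ACDEFGHIKL"
seq_b = "ACDEYGHIKV"
overall = percent_identity(seq_a, seq_b)
best_stretch = best_window_identity(seq_a, seq_b, 4)
```

In practice the alignment itself would be produced by a dedicated program on its default settings, and the identity figure depends on how that program scores gaps.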
SEQ ID NO: 2 is the NNN-RRK mutant of the MspA monomer. The variant may
comprise any of the mutations in the MspB, C or D monomers compared with MspA. The
mature forms of MspB, C and D are shown in SEQ ID NOs: 7 to 9. In particular, the variant
may comprise the following substitution present in MspB: A138P. The variant may comprise
one or more of the following substitutions present in MspC: A96G, N102E and A138P. The
variant may comprise one or more of the following mutations present in MspD: Deletion of G1,
L2V, E5Q, L8V, D13G, W21A, D22E, K47T, I49H, I68V, D91G, A96Q, N102D, S103T,
V104I, S136K and G141A. The variant may comprise combinations of one or more of the
mutations and substitutions from Msp B, C and D.
Amino acid substitutions may be made to the amino acid sequence of SEQ ID NO: 2 in
addition to those discussed above, for example up to 1, 2, 3, 4, 5, 10, 20 or 30 substitutions.
Conservative substitutions replace amino acids with other amino acids of similar chemical
structure, similar chemical properties or similar side-chain volume. The amino acids introduced
may have similar polarity, hydrophilicity, hydrophobicity, basicity, acidity, neutrality or charge
to the amino acids they replace. Alternatively, the conservative substitution may introduce
another amino acid that is aromatic or aliphatic in the place of a pre-existing aromatic or
aliphatic amino acid. Conservative amino acid changes are well-known in the art and may be
selected in accordance with the properties of the 20 main amino acids as defined in Table 2
below. Where amino acids have similar polarity, this can also be determined by reference to the
hydropathy scale for amino acid side chains in Table 3.
Table 2 - Chemical properties of amino acids
Table 3 - Hydropathy scale
Side Chain    Hydropathy
Ile            4.5
Val            4.2
Leu            3.8
Phe            2.8
Cys            2.5
Met            1.9
Ala            1.8
Gly           -0.4
Thr           -0.7
Ser           -0.8
Trp           -0.9
Tyr           -1.3
Pro           -1.6
His           -3.2
Glu           -3.5
Gln           -3.5
Asp           -3.5
Asn           -3.5
Lys           -3.9
Arg           -4.5
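For illustration, a hydropathy scale of this kind can be used to judge whether a proposed substitution keeps a similar polarity. The sketch below uses the widely used Kyte-Doolittle values, which agree with the entries of Table 3; the tolerance of 1.0 hydropathy units and the function names are hypothetical choices made for this example and are not specified in this document.

```python
# Kyte-Doolittle hydropathy values, keyed by three-letter code.
HYDROPATHY = {
    "Ile": 4.5, "Val": 4.2, "Leu": 3.8, "Phe": 2.8, "Cys": 2.5,
    "Met": 1.9, "Ala": 1.8, "Gly": -0.4, "Thr": -0.7, "Ser": -0.8,
    "Trp": -0.9, "Tyr": -1.3, "Pro": -1.6, "His": -3.2, "Glu": -3.5,
    "Gln": -3.5, "Asp": -3.5, "Asn": -3.5, "Lys": -3.9, "Arg": -4.5,
}


def hydropathy_difference(original, replacement):
    """Absolute difference in hydropathy between two amino acids."""
    return abs(HYDROPATHY[original] - HYDROPATHY[replacement])


def similar_polarity(original, replacement, tolerance=1.0):
    """True if the two side chains differ by at most `tolerance` units.

    The default tolerance is an arbitrary illustrative threshold.
    """
    return hydropathy_difference(original, replacement) <= tolerance
```

With these values, `similar_polarity("Asp", "Glu")` is true, whereas `similar_polarity("Leu", "Arg")` is false. Hydropathy is only one of the properties listed above; a substitution may be close on this scale and still differ in charge or side-chain volume.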
One or more amino acid residues of the amino acid sequence of SEQ ID NO: 2 may
additionally be deleted from the polypeptides described above. Up to 1, 2, 3, 4, 5, 10, 20 or 30
residues may be deleted, or more.
Variants may include fragments of SEQ ID NO: 2. Such fragments retain pore forming
activity. Fragments may be at least 50, 100, 150 or 200 amino acids in length. Such fragments
may be used to produce the pores. A fragment preferably comprises the pore forming domain of
SEQ ID NO: 2. Fragments must include one of residues 88, 90, 91, 105, 118 and 134 of SEQ ID
NO: 2. Typically, fragments include all of residues 88, 90, 91, 105, 118 and 134 of SEQ ID NO:
2.
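The residue requirement for fragments can be checked from the fragment boundaries alone. The sketch below is illustrative only; it assumes that a fragment is described by its first and last residue numbers in the numbering of SEQ ID NO: 2, and the function name and return labels are hypothetical.

```python
# Residues of SEQ ID NO: 2 named in the text above.
KEY_RESIDUES = (88, 90, 91, 105, 118, 134)


def classify_fragment(start, end, key_residues=KEY_RESIDUES):
    """Report how many key residues fall within [start, end] inclusive.

    Returns "all", "some" or "none". Positions are 1-based and use the
    numbering of the full-length sequence (an assumption of this sketch).
    """
    if start > end:
        raise ValueError("start must not exceed end")
    present = [r for r in key_residues if start <= r <= end]
    if len(present) == len(key_residues):
        return "all"
    return "some" if present else "none"
```

A fragment spanning residues 80 to 140 would be classified as "all", one spanning residues 1 to 100 as "some" (it contains 88, 90 and 91 only), and one spanning residues 140 to 184 as "none".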
One or more amino acids may be alternatively or additionally added to the polypeptides
described above. An extension may be provided at the amino terminal or carboxy terminal of the
amino acid sequence of SEQ ID NO: 2 or polypeptide variant or fragment thereof. The
extension may be quite short, for example from 1 to 10 amino acids in length. Alternatively, the
extension may be longer, for example up to 50 or 100 amino acids. A carrier protein may be
fused to an amino acid sequence according to the invention. Other fusion proteins are discussed
in more detail below.
As discussed above, a variant is a polypeptide that has an amino acid sequence which
varies from that of SEQ ID NO: 2 and which retains its ability to form a pore. A variant
typically contains the regions of SEQ ID NO: 2 that are responsible for pore formation. The
pore forming ability of Msp, which contains a β-barrel, is provided by β-sheets in each subunit.
A variant of SEQ ID NO: 2 typically comprises the regions in SEQ ID NO: 2 that form β-sheets.
One or more modifications can be made to the regions of SEQ ID NO: 2 that form β-sheets as
long as the resulting variant retains its ability to form a pore. A variant of SEQ ID NO: 2
preferably includes one or more modifications, such as substitutions, additions or deletions,
within its α-helices and/or loop regions.
The monomers derived from Msp may be modified to assist their identification or
purification, for example by the addition of histidine residues (a hist tag), aspartic acid residues
(an asp tag), a streptavidin tag or a flag tag, or by the addition of a signal sequence to promote
their secretion from a cell where the polypeptide does not naturally contain such a sequence. An
alternative to introducing a genetic tag is to chemically react a tag onto a native or engineered
position on the pore. An example of this would be to react a gel-shift reagent to a cysteine
engineered on the outside of the pore. This has been demonstrated as a method for separating
hemolysin hetero-oligomers (Chem Biol. 1997 Jul;4(7):497-505).
The monomer derived from Msp may be labelled with a revealing label. The revealing
label may be any suitable label which allows the pore to be detected. Suitable labels include, but
are not limited to, fluorescent molecules, radioisotopes, e.g. 125I, 35S, enzymes, antibodies,
antigens, polynucleotides and ligands such as biotin.
The monomer derived from Msp may also be produced using D-amino acids. For
instance, the monomer derived from Msp may comprise a mixture of L-amino acids and D-amino
acids. This is conventional in the art for producing such proteins or peptides.
The monomer derived from Msp contains one or more specific modifications to facilitate
nucleotide discrimination. The monomer derived from Msp may also contain other non-specific
modifications as long as they do not interfere with pore formation. A number of non-specific
side chain modifications are known in the art and may be made to the side chains of the
monomer derived from Msp. Such modifications include, for example, reductive alkylation of
amino acids by reaction with an aldehyde followed by reduction with NaBH4, amidination with
methylacetimidate or acylation with acetic anhydride.
The monomer derived from Msp can be produced using standard methods known in the
art. The monomer derived from Msp may be made synthetically or by recombinant means. For
example, the pore may be synthesized by in vitro translation and transcription (IVTT). Suitable
methods for producing pores are discussed in International Application Nos. PCT/GB09/001690
(published as WO 2010/004273), PCT/GB09/001679 (published as WO 2010/004265) or
PCT/GB10/000133 (published as WO 2010/086603). Methods for inserting pores into
membranes are discussed above.
The transmembrane protein pore is also preferably derived from α-hemolysin (α-HL).
The wild type α-HL pore is formed of seven identical monomers or subunits (i.e. it is
heptameric). The sequence of one monomer or subunit of α-hemolysin-NN is shown in SEQ ID
NO: 4. The transmembrane protein pore preferably comprises seven monomers each comprising
the sequence shown in SEQ ID NO: 4 or a variant thereof. Amino acids 1, 7 to 21, 31 to 34, 45
to 51, 63 to 66, 72, 92 to 97, 104 to 111, 124 to 136, 149 to 153, 160 to 164, 173 to 206, 210 to
213, 217, 218, 223 to 228, 236 to 242, 262 to 265, 272 to 274, 287 to 290 and 294 of SEQ ID
NO: 4 form loop regions. Residues 113 and 147 of SEQ ID NO: 4 form part of a constriction of
the barrel or channel of α-HL.
In such embodiments, a pore comprising seven proteins or monomers each comprising
the sequence shown in SEQ ID NO: 4 or a variant thereof is preferably used in the method of
the invention. The seven proteins may be the same (homoheptamer) or different
(heteroheptamer).
A variant of SEQ ID NO: 4 is a protein that has an amino acid sequence which varies
from that of SEQ ID NO: 4 and which retains its pore forming ability. The ability of a variant to
form a pore can be assayed using any method known in the art. For instance, the variant may be
inserted into a lipid bilayer along with other appropriate subunits and its ability to oligomerise to
form a pore may be determined. Methods are known in the art for inserting subunits into
membranes, such as lipid bilayers. Suitable methods are discussed above.
The variant may include modifications that facilitate covalent attachment to or interaction
with a nucleic acid binding protein. The variant preferably comprises one or more reactive
cysteine residues that facilitate attachment to the nucleic acid binding protein. For instance, the
variant may include a cysteine at one or more of positions 8, 9, 17, 18, 19, 44, 45, 50, 51, 237,
239 and 287 and/or on the amino or carboxy terminus of SEQ ID NO: 4. Preferred variants
comprise a substitution of the residue at position 8, 9, 17, 237, 239 and 287 of SEQ ID NO: 4
with cysteine (A8C, T9C, N17C, K237C, S239C or E287C). The variant is preferably any one
of the variants described in International Application No. PCT/GB09/001690 (published as WO
2010/004273), PCT/GB09/001679 (published as WO 2010/004265) or PCT/GB10/000133
(published as WO 2010/086603).
The variant may also include modifications that facilitate any interaction with
nucleotides.
The variant may be a naturally occurring variant which is expressed naturally by an
organism, for instance by a Staphylococcus bacterium. Alternatively, the variant may be
expressed in vitro or recombinantly by a bacterium such as Escherichia coli. Variants also
include non-naturally occurring variants produced by recombinant technology. Over the entire
length of the amino acid sequence of SEQ ID NO: 4, a variant will preferably be at least 50%
homologous to that sequence based on amino acid identity. More preferably, the variant
polypeptide may be at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least
80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% homologous
based on amino acid identity to the amino acid sequence of SEQ ID NO: 4 over the entire
sequence. There may be at least 80%, for example at least 85%, 90% or 95%, amino acid
identity over a stretch of 200 or more, for example 230, 250, 270 or 280 or more, contiguous
amino acids ("hard homology"). Homology can be determined as discussed above.
Amino acid substitutions may be made to the amino acid sequence of SEQ ID NO: 4 in
addition to those discussed above, for example up to 1, 2, 3, 4, 5, 10, 20 or 30 substitutions.
Conservative substitutions may be made as discussed above.
One or more amino acid residues of the amino acid sequence of SEQ ID NO: 4 may
additionally be deleted from the polypeptides described above. Up to 1, 2, 3, 4, 5, 10, 20 or 30
residues may be deleted, or more.
Variants may include fragments of SEQ ID NO: 4. Such fragments retain pore-forming
activity. Fragments may be at least 50, 100, 200 or 250 amino acids in length. A fragment
preferably comprises the pore-forming domain of SEQ ID NO: 4. Fragments typically include
residues 119, 121, 135, 113 and 139 of SEQ ID NO: 4.
One or more amino acids may be alternatively or additionally added to the polypeptides
described above. An extension may be provided at the amino terminus or carboxy terminus of
the amino acid sequence of SEQ ID NO: 4 or a variant or fragment thereof. The extension may
be quite short, for example from 1 to 10 amino acids in length. Alternatively, the extension may
be longer, for example up to 50 or 100 amino acids. A carrier protein may be fused to a pore or
variant.
As discussed above, a variant of SEQ ID NO: 4 is a subunit that has an amino acid
sequence which varies from that of SEQ ID NO: 4 and which retains its ability to form a pore. A
variant typically contains the regions of SEQ ID NO: 4 that are responsible for pore formation.
The pore forming ability of α-HL, which contains a β-barrel, is provided by β-strands in each
subunit. A variant of SEQ ID NO: 4 typically comprises the regions in SEQ ID NO: 4 that form
β-strands. The amino acids of SEQ ID NO: 4 that form β-strands are discussed above. One or
more modifications can be made to the regions of SEQ ID NO: 4 that form β-strands as long as
the resulting variant retains its ability to form a pore. Specific modifications that can be made to
the β-strand regions of SEQ ID NO: 4 are discussed above.
A variant of SEQ ID NO: 4 preferably includes one or more modifications, such as
substitutions, additions or deletions, within its α-helices and/or loop regions. Amino acids that
form α-helices and loops are discussed above.
The variant may be modified to assist its identification or purification as discussed above.
Pores derived from α-HL can be made as discussed above with reference to pores derived
from Msp.
In some embodiments, the transmembrane protein pore is chemically modified. The pore
can be chemically modified in any way and at any site. The transmembrane protein pore is
preferably chemically modified by attachment of a molecule to one or more cysteines (cysteine
linkage), attachment of a molecule to one or more lysines, attachment of a molecule to one or
more non-natural amino acids, enzyme modification of an epitope or modification of a terminus.
Suitable methods for carrying out such modifications are well-known in the art. The
transmembrane protein pore may be chemically modified by the attachment of any molecule.
For instance, the pore may be chemically modified by attachment of a dye or a fluorophore.
Any number of the monomers in the pore may be chemically modified. One or more,
such as 2, 3, 4, 5, 6, 7, 8, 9 or 10, of the monomers is preferably chemically modified as
discussed above.
The reactivity of cysteine residues may be enhanced by modification of the adjacent
residues. For instance, the basic groups of flanking arginine, histidine or lysine residues will
change the pKa of the cysteine's thiol group to that of the more reactive S⁻ group. The reactivity
of cysteine residues may be protected by thiol protective groups such as dTNB. These may be
reacted with one or more cysteine residues of the pore before a linker is attached.
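The effect of a shifted pKa on cysteine reactivity can be illustrated with the Henderson-Hasselbalch relationship, which gives the fraction of thiol groups present in the deprotonated (thiolate) form at a given pH. The sketch below is a worked example only; the pKa values of 8.5 and 7.0 are illustrative assumptions and are not measurements reported in this document.

```python
def thiolate_fraction(ph, pka):
    """Fraction of thiol groups deprotonated at the given pH.

    Derived from the Henderson-Hasselbalch equation:
    fraction = 1 / (1 + 10 ** (pKa - pH)).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Illustrative values only: an unperturbed thiol (assumed pKa 8.5) and a
# thiol whose pKa is lowered by basic neighbours (assumed pKa 7.0).
unperturbed = thiolate_fraction(ph=7.5, pka=8.5)
lowered = thiolate_fraction(ph=7.5, pka=7.0)
```

Under these assumed values the thiol with the lowered pKa is deprotonated to a much greater extent at pH 7.5, which is consistent with the enhanced reactivity described above.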
The molecule (with which the pore is chemically modified) may be attached directly to
the pore or attached via a linker as disclosed in International Application Nos.
PCT/GB09/001690 (published as WO 2010/004273), PCT/GB09/001679 (published as WO
2010/004265) or PCT/GB10/000133 (published as WO 2010/086603).
Moving
In the method of the invention, the single stranded polynucleotide is moved through the
transmembrane pore. Moving the single stranded polynucleotide through the transmembrane
pore refers to moving the polynucleotide from one side of the pore to the other. Movement of
the single stranded polynucleotide through the pore can be driven or controlled by potential or
enzymatic action or both. The movement can be unidirectional or can allow both backwards and
forwards movement.
A polynucleotide binding protein is preferably used to control the movement of the single
stranded polynucleotide through the pore. This protein is preferably the same protein that
separates the two strands of the polynucleotide. More preferably, this protein is Phi29 DNA
polymerase. The three modes of Phi29 DNA polymerase, as discussed above, can be used to
control the movement of the single stranded polynucleotide through the pore. Preferably, Phi29
DNA polymerase separates the target polynucleotide and controls the movement of the resulting
single stranded polynucleotide through the transmembrane pore.
In some embodiments, the entire target polynucleotide (as a single stranded
polynucleotide comprising the one strand of the target polynucleotide linked to the other strand
of the target polynucleotide by the bridging moiety) will move through the pore. Thus, the entire
target polynucleotide is moved through the pore and sequenced. In other embodiments, only part
of the target polynucleotide moves through the pore. Such embodiments where only part of the
target polynucleotide moves through the pore may be as follows:
(i) part of one strand of the target polynucleotide (for example part of the sense
strand of DNA)
(ii) all of one strand of the target polynucleotide (for example all of the sense strand
of DNA)
(iii) all of one strand (for example all of the sense strand of DNA) and part of the
second strand (for example part of the anti-sense strand of DNA)
In embodiments where only part of one strand, or all of one strand and part of the other
strand, moves through the pore it is irrelevant which of the original two strands (i.e. the sense
and anti-sense strands) is fully or partially moved through the pore. Furthermore, the order of
movement of the sense and antisense strands through the pore does not matter.
In some embodiments, as discussed above and shown in Fig 13, after linking of the two
strands of a double stranded analyte and separating the two linked strands into a single stranded
target polynucleotide, a complementary strand to the single stranded target polynucleotide is
generated to form a second construct. The two strands of the second construct may be linked
together as described herein. The complementary strand of the second construct is then separated
from the single stranded target polynucleotide. In this situation, the original single stranded
target polynucleotide may move through the pore and / or the complementary strand may move
through the pore. In some instances, only the complementary strand is sequenced. As described
above, this process of separation and complementary strand generation can be repeated as many
times as necessary. This may be referred to herein as the "DUO" method.
When the construct further comprises a leader polymer and a tail polymer, the single
stranded target polynucleotide created after separating the two strands of the target
polynucleotide preferably moves through the pore in the order of: (1) the leader polymer; (2) the
one strand of the target polynucleotide; (3) the bridging moiety; (4) the other strand of the target
polynucleotide; and (5) the tail polymer. This is an example of sequencing a construct made
according to the 'MONO' method.
In an alternative embodiment, a construct produced according to the DUO method may
pass through the pore in the order of: (1) the leader polymer; (2) the first strand of the target
polynucleotide; (3) the first bridging moiety; (4) the second strand of the target polynucleotide;
(5) the second bridging moiety; (6) the complement of the second strand of the target
polynucleotide; (7) the complement of the first bridging moiety; (8) the complement of the first
strand of the target polynucleotide; and (9) the tail polymer.
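The order in which the parts of a MONO construct pass through the pore can be sketched as a simple list of segments. The example below is illustrative only. It assumes a DNA target made of the four standard bases whose two strands are fully complementary, so that the other strand, read onward from the bridging moiety, is the reverse complement of the first strand; the placeholder strings for the leader polymer, bridging moiety and tail polymer, and the function names, are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand):
    """Reverse complement of a DNA strand written with A, C, G and T."""
    return strand.translate(_COMPLEMENT)[::-1]


def mono_construct(one_strand, leader, bridge, tail):
    """Segments of a MONO construct in the order they reach the pore."""
    return [
        ("leader polymer", leader),
        ("one strand", one_strand),
        ("bridging moiety", bridge),
        ("other strand", reverse_complement(one_strand)),
        ("tail polymer", tail),
    ]


# Hypothetical toy target and placeholder segments, for illustration only.
segments = mono_construct("ATGC", "<leader>", "<bridge>", "<tail>")
order = [name for name, _ in segments]
```

Because both strands of the target pass through the pore in a single pass, each position of the target is interrogated twice, once in each strand.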
Methods of sequencing a double stranded target polynucleotide
The method of the invention comprises moving the single stranded polynucleotide
through a transmembrane pore such that a proportion of the nucleotides in the single stranded
polynucleotide interact with the pore.
The method may be carried out using any suitable membrane as discussed above,
preferably a lipid bilayer system in which a pore is inserted into a lipid bilayer. The method is
typically carried out using (i) an artificial bilayer comprising a pore, (ii) an isolated, naturally-occurring
lipid bilayer comprising a pore, or (iii) a cell having a pore inserted therein. The
method is preferably carried out using an artificial lipid bilayer. The bilayer may comprise other
transmembrane and/or intramembrane proteins as well as other molecules in addition to the pore.
Suitable apparatus and conditions are discussed below with reference to the sequencing
embodiments of the invention. The method of the invention is typically carried out in vitro.
The present invention provides methods of sequencing a double stranded target
polynucleotide. As discussed above, a polynucleotide is a macromolecule comprising two or
more pairs of nucleotides. The nucleotides may be any of those discussed above. The
polynucleotide is preferably a nucleic acid.
These methods are possible because transmembrane protein pores can be used to
differentiate nucleotides of similar structure on the basis of the different effects they have on the
current passing through the pore. Individual nucleotides can be identified at the single molecule
level from their current amplitude when they interact with the pore. The nucleotide is present in
the pore if the current flows through the pore in a manner specific for the nucleotide (i.e. if a
distinctive current associated with the nucleotide is detected flowing through the pore).
Successive identification of the nucleotides in a target polynucleotide allows the sequence of the
polynucleotide to be estimated or determined.
The method comprises (a) providing a construct comprising the target polynucleotide,
wherein the two strands of the target polynucleotide are linked by the bridging moiety; (b)
separating the two strands of the target polynucleotide by contacting the construct with a nucleic
acid binding protein; (c) moving the resulting single stranded polynucleotide through the
transmembrane pore; and (d) measuring the current passing through the pore during each
interaction and thereby determining or estimating the sequence of the target polynucleotide.
Hence, the method involves transmembrane pore sensing of a proportion of the nucleotides in a
target polynucleotide as the nucleotides individually pass through the barrel or channel in order
to sequence the target polynucleotide. As discussed above, this is Strand Sequencing.
The whole or only part of the target polynucleotide may be sequenced using this method.
The polynucleotide can be any length. For example, the polynucleotide can be at least 10, at
least 50, at least 100, at least 150, at least 200, at least 250, at least 300, at least 400 or at least
500 nucleotide pairs in length. The polynucleotide can be 1000 or more nucleotide pairs, 5000
or more nucleotide pairs or 100000 or more nucleotide pairs in length. The polynucleotide can
be naturally occurring or artificial. For instance, the method may be used to verify the sequence
of a manufactured oligonucleotide. The methods are typically carried out in vitro.
The single stranded polynucleotide may interact with the pore on either side of the
membrane. The single stranded polynucleotide may interact with the pore in any manner and at
any site.
During the interaction between a nucleotide in the single stranded polynucleotide and the
pore, the nucleotide affects the current flowing through the pore in a manner specific for that
nucleotide. For example, a particular nucleotide will reduce the current flowing through the pore
for a particular mean time period and to a particular extent. In other words, the current flowing
through the pore is distinctive for a particular nucleotide. Control experiments may be carried
out to determine the effect a particular nucleotide has on the current flowing through the pore.
Results from carrying out the method of the invention on a test sample can then be compared
with those derived from such a control experiment in order to determine or estimate the sequence
of the target polynucleotide.
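The comparison with control experiments described above can be sketched as a nearest-level lookup. This is a minimal illustration only: the calibration values, the function names and the assumption that each event maps to a single nucleotide are hypothetical and are not taken from the specification.

```python
# Hypothetical calibration table: mean pore current (pA) recorded for each
# nucleotide in control experiments. The numbers are illustrative only.
CONTROL_LEVELS_PA = {"A": 52.0, "C": 47.5, "G": 44.0, "T": 40.5}


def call_nucleotide(mean_current_pa, levels=CONTROL_LEVELS_PA):
    """Return the nucleotide whose control current is closest to the measurement."""
    return min(levels, key=lambda nt: abs(levels[nt] - mean_current_pa))


def call_sequence(event_currents_pa, levels=CONTROL_LEVELS_PA):
    """Call one nucleotide per measured event and join the calls into a sequence."""
    return "".join(call_nucleotide(c, levels) for c in event_currents_pa)


if __name__ == "__main__":
    # Four successive events from a test sample (illustrative values).
    print(call_sequence([51.6, 40.9, 44.3, 47.0]))
```

In practice the duration of each event could be compared with the control data in the same way as the current amplitude.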
The sequencing methods may be carried out using any suitable membrane/pore system in
which a pore is inserted into a membrane. The methods are typically carried out using a
membrane comprising naturally-occurring or synthetic lipids. The membrane is typically formed
in vitro. The methods are preferably not carried out using an isolated, naturally occurring
membrane comprising a pore, or a cell expressing a pore. The methods are preferably carried
out using an artificial membrane. The membrane may comprise other transmembrane and/or
intramembrane proteins as well as other molecules in addition to the pore.
The membrane forms a barrier to the flow of ions, nucleotides and polynucleotides. The
membrane is preferably an amphiphilic layer such as a lipid bilayer. Lipid bilayers suitable for
use in accordance with the invention are described above.
The sequencing methods of the invention are typically carried out in vitro.
The sequencing methods may be carried out using any apparatus that is suitable for
investigating a membrane/pore system in which a pore is inserted into a membrane. The method
may be carried out using any apparatus that is suitable for transmembrane pore sensing. For
example, the apparatus comprises a chamber comprising an aqueous solution and a barrier that
separates the chamber into two sections. The barrier has an aperture in which the membrane
containing the pore is formed.
The sequencing methods may be carried out using the apparatus described in
International Application No. PCT/GB08/000562.
The methods of the invention involve measuring the current passing through the pore
during interaction with the nucleotide(s). Therefore the apparatus also comprises an electrical
circuit capable of applying a potential and measuring an electrical signal across the membrane
and pore. The methods may be carried out using a patch clamp or a voltage clamp. The methods
preferably involve the use of a voltage clamp.
The sequencing methods of the invention involve the measuring of a current passing
through the pore during interaction with the nucleotide. Suitable conditions for measuring ionic
currents through transmembrane protein pores are known in the art and disclosed in the Example.
The method is typically carried out with a voltage applied across the membrane and pore. The
voltage used is typically from -400 mV to +400 mV. The voltage used is preferably in a range
having a lower limit selected from -400 mV, -300 mV, -200 mV, -150 mV, -100 mV, -50 mV,
-20 mV and 0 mV and an upper limit independently selected from +10 mV, +20 mV, +50 mV,
+100 mV, +150 mV, +200 mV, +300 mV and +400 mV. The voltage used is more preferably in
the range 100 mV to 240 mV and most preferably in the range of 160 mV to 240 mV. It is possible
to increase discrimination between different nucleotides by a pore by using an increased applied
potential.
The sequencing methods are typically carried out in the presence of any alkali metal
chloride salt. In the exemplary apparatus discussed above, the salt is present in the aqueous
solution in the chamber. Potassium chloride (KCl), sodium chloride (NaCl), caesium chloride
(CsCl) or a mixture of potassium ferrocyanide and potassium ferricyanide is typically used.
KCl, NaCl and a mixture of potassium ferrocyanide and potassium ferricyanide are preferred.
The salt concentration may be at saturation. The salt concentration is typically from 0.1 to 2.5M,
from 0.3 to 1.9M, from 0.5 to 1.8M, from 0.7 to 1.7M, from 0.9 to 1.6M or from 1M to 1.4M.
The salt concentration is preferably from 150mM to 1M. High salt concentrations provide a high
signal to noise ratio and allow for currents indicative of the presence of a nucleotide to be
identified against the background of normal current fluctuations. Lower salt concentrations may
be used if nucleotide detection is carried out in the presence of an enzyme. This is discussed in
more detail below.
The methods are typically carried out in the presence of a buffer. In the exemplary
apparatus discussed above, the buffer is present in the aqueous solution in the chamber. Any
buffer may be used in the method of the invention. Typically, the buffer is HEPES. Another
suitable buffer is Tris-HCl buffer. The methods are typically carried out at a pH of from 4.0 to
12.0, from 4.5 to 10.0, from 5.0 to 9.0, from 5.5 to 8.8, from 6.0 to 8.7 or from 7.0 to 8.8 or 7.5
to 8.5. The pH used is preferably about 7.5.
The methods may be carried out at from 0°C to 100°C, from 15°C to 95°C, from 16°C to
90°C, from 17°C to 85°C, from 18°C to 80°C, 19°C to 70°C, or from 20°C to 60°C. The methods
are typically carried out at room temperature. The methods are optionally carried out at a
temperature that supports enzyme function, such as about 37°C.
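The broadest ranges quoted above for voltage, salt concentration, pH and temperature can be collected into a simple check of proposed run conditions. This is a sketch under stated assumptions: the class and field names are hypothetical, only the widest range for each parameter is encoded, and the statement that the salt concentration may be at saturation is not modelled.

```python
from dataclasses import dataclass

# Broadest ranges quoted in the text, as inclusive (low, high) pairs.
LIMITS = {
    "voltage_mv": (-400.0, 400.0),
    "salt_molar": (0.1, 2.5),
    "ph": (4.0, 12.0),
    "temperature_c": (0.0, 100.0),
}


@dataclass
class RunConditions:
    """Hypothetical container for one set of measurement conditions."""
    voltage_mv: float
    salt_molar: float
    ph: float
    temperature_c: float

    def out_of_range(self):
        """Return the names of the fields that fall outside the quoted ranges."""
        return [
            name
            for name, (low, high) in LIMITS.items()
            if not low <= getattr(self, name) <= high
        ]


if __name__ == "__main__":
    # Conditions resembling Example 1; 25 degrees C is an assumed room temperature.
    print(RunConditions(180.0, 0.4, 8.0, 25.0).out_of_range())
```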
As mentioned above, good nucleotide discrimination can be achieved at low salt
concentrations if the temperature is increased. In addition to increasing the solution temperature,
there are a number of other strategies that can be employed to increase the conductance of the
solution, while maintaining conditions that are suitable for enzyme activity. One such strategy is
to use the lipid bilayer to divide two different concentrations of salt solution, a low
concentration of salt on the enzyme side and a higher concentration on the opposite side. One
example of this approach is to use 200 mM of KCl on the cis side of the membrane and 500 mM
KCl in the trans chamber. At these conditions, the conductance through the pore is expected to
be roughly equivalent to 400 mM KCl under normal conditions, and the enzyme only
experiences 200 mM if placed on the cis side. Another possible benefit of using asymmetric salt
conditions is the osmotic gradient induced across the pore. This net flow of water could be used
to pull nucleotides into the pore for detection. A similar effect can be achieved using a neutral
osmolyte, such as sucrose, glycerol or PEG. Another possibility is to use a solution with
relatively low levels of KCl and rely on an additional charge carrying species that is less
disruptive to enzyme activity.
The target polynucleotide being analysed can be combined with known protecting
chemistries to protect the polynucleotide from being acted upon by the binding protein while in
the bulk solution. The pore can then be used to remove the protecting chemistry. This can be
achieved either by using protecting groups that are unhybridised by the pore, binding protein or
enzyme under an applied potential (WO 2008/124107) or by using protecting chemistries that are
removed by the binding protein or enzyme when held in close proximity to the pore (J Am Chem
Soc. 2010 Dec 22;132(50):17961-72).
The Strand Sequencing method of the invention uses a polynucleotide binding protein to
separate the two strands of the target polynucleotide. More preferably, the polynucleotide
binding protein also controls the movement of the target polynucleotide through the pore.
Examples of such proteins are given and discussed above.
The two strategies for single strand sequencing are the translocation of the single
stranded polynucleotide through the transmembrane pore, both cis to trans and trans to cis,
either with or against an applied potential. The most advantageous mechanism for strand
sequencing is the controlled translocation of a single stranded polynucleotide through the
nanopore under an applied potential. Exonucleases that act progressively or processively on
double stranded polynucleotides can be used on the cis side of the pore to feed the remaining
single strand through under an applied potential or the trans side under a reverse potential.
Likewise, a helicase that unwinds the double stranded polynucleotide can also be used in a
similar manner. There are also possibilities for sequencing applications that require strand
translocation against an applied potential, but the polynucleotide must be first "caught" by the
enzyme under a reverse or no potential. With the potential then switched back following binding
the strand will pass cis to trans through the pore and be held in an extended conformation by the
current flow. The single strand polynucleotide exonucleases or single strand polynucleotide
dependent polymerases can act as molecular motors to pull the recently translocated single strand
back through the pore in a controlled stepwise manner, trans to cis, against the applied potential.
Alternatively, the single strand DNA dependent polymerases can act as a molecular brake, slowing
down the movement of a polynucleotide through the pore.
In the most preferred embodiment, Strand Sequencing is carried out using a pore derived
from Msp and a Phi29 DNA polymerase. The method comprises (a) providing the double
stranded target polynucleotide construct; (b) allowing the target polynucleotide to interact with a
Phi29 DNA polymerase, such that the strands are separated and the polymerase controls the
movement of the target polynucleotide through the Msp pore and a proportion of the nucleotides
in the target polynucleotide interacts with the pore; and (c) measuring the current passing
through the pore during each interaction and thereby estimating or determining the sequence of
the target polynucleotide, wherein steps (b) and (c) are carried out with a voltage applied across
the pore.
When the target polynucleotide is contacted with a Phi29 DNA polymerase and a pore
derived from Msp, the target polynucleotide firstly forms a complex with the Phi29 DNA
polymerase. When the voltage is applied across the pore, the target polynucleotide/Phi29 DNA
polymerase complex forms a complex with the pore and controls the movement of the single
stranded polynucleotide through the pore.
This embodiment has three unexpected advantages. First, the target polynucleotide
moves through the pore at a rate that is commercially viable yet allows effective sequencing.
The target polynucleotide moves through the Msp pore more quickly than it does through a
hemolysin pore. Second, an increased current range is observed as the polynucleotide moves
through the pore allowing the sequence to be estimated or determined more easily. Third, a
decreased current variance is observed when the specific pore and polymerase are used together
thereby increasing the signal-to-noise ratio.
Any polynucleotide described above may be sequenced.
The pore may be any of the pores discussed above. The pore may comprise eight
monomers comprising the sequence shown in SEQ ID NO: 2 or a variant thereof.
As discussed above, wild-type Phi29 DNA polymerase has polymerase and exonuclease
activity. It may also unzip double stranded polynucleotides under the correct conditions. Hence,
the enzyme may work in three modes (as discussed above). The method of the invention
preferably involves an Msp pore and Phi29 DNA polymerase. The Phi29 DNA polymerase
preferably separates the double stranded target polynucleotide and controls the movement of the
resulting single stranded polynucleotide through the pore.
Any of the systems, apparatus or conditions discussed above may be used in accordance
with this preferred embodiment. The salt concentration is typically from 0.1 M to 0.6M. The
salt is preferably KCl.
Kits
The present invention also provides kits for preparing a double stranded target
polynucleotide for sequencing. The kit comprises (a) a bridging moiety capable of linking the
two strands of the target polynucleotide and (b) at least one polymer.
In a preferred embodiment, the kit further comprises a leader polymer and a tail polymer.
Leader polymers and tail polymers are described in detail above. If the leader and tail polymers
are polynucleotides, the leader and tail polymers can be provided as a single unit. In this unit, a
portion of the leader polymer and a portion of the tail polymer form a double strand. This double
stranded region may typically be from 5 to 20 nucleotide pairs in length. The end of the double
stranded portion of this unit is linked to the double stranded target polynucleotide. Suitable
methods for linking two double stranded polynucleotides are known in the art. The remainder of
the leader and tail polymer remain as single stranded polynucleotides.
The kit also preferably further comprises one or more markers that produce a distinctive
current when they interact with a transmembrane pore. Such markers are described in detail
above.
The kit preferably also comprises means to couple the target polynucleotide to a
membrane. Means of coupling the target polynucleotide to a membrane are described above.
The means of coupling preferably comprises a reactive group. Suitable groups include, but are
not limited to, thiol, cholesterol, lipid and biotin groups.
The kit may further comprise the components of a membrane, such as the phospholipids
needed to form a lipid bilayer.
Any of the embodiments discussed above with reference to the methods of the invention
are equally applicable to the kits of the invention.
The kits of the invention may additionally comprise one or more other reagents or
instruments which enable any of the embodiments mentioned above to be carried out. Such
reagents or instruments include one or more of the following: suitable buffer(s) (aqueous
solutions), means to obtain a sample from a subject (such as a vessel or an instrument comprising
a needle), means to amplify and/or express polynucleotides, a membrane as defined above or
voltage or patch clamp apparatus. Reagents may be present in the kit in a dry state such that a
fluid sample resuspends the reagents. The kit may also, optionally, comprise instructions to
enable the kit to be used in the method of the invention or details regarding which patients the
method may be used for. The kit may, optionally, comprise nucleotides.
Method of preparing a target polynucleotide for sequencing
The invention also provides a method of preparing a double stranded target
polynucleotide for sequencing. This method generates the construct that allows the target
polynucleotide to be sequenced. In this method, the two strands of the target polynucleotide are
linked by a bridging moiety and a polymer is attached to one strand at the other end of the target
polynucleotide. The polymer is preferably a leader polymer and the method also preferably
further comprises attaching a tail polymer to the other strand of the target polynucleotide (i.e. at
the same end as the leader polymer). Leader polymers and tail polymers are discussed in detail
above.
The method preferably also further comprises attaching to the construct a means to couple
the construct to the membrane. Such means are described above.
The bridging moiety may be synthesized separately and then chemically attached or
enzymatically ligated to the target polynucleotide. Means for doing so are known in the art.
Alternatively, the bridging moiety may be generated in the processing of the target
polynucleotide. Again, suitable means are known in the art.
A suitable means for preparing a target polynucleotide for sequencing is illustrated in
Example 3.
Apparatus
The invention also provides an apparatus for sequencing a double stranded target
polynucleotide. The apparatus comprises (a) a membrane, (b) a plurality of transmembrane
pores in the membrane, (c) a plurality of polynucleotide binding proteins capable of separating
the two strands of the target polynucleotide and (d) instructions for carrying out the method of
the invention. The apparatus may be any conventional apparatus for polynucleotide analysis,
such as an array or a chip. Any of the embodiments discussed above with reference to the
methods of the invention are equally applicable to the apparatus of the invention.
The apparatus is preferably set up to carry out the method of the invention.
Suitable nucleic acid binding proteins, such as Phi29 DNA polymerase, are described
above.
The apparatus preferably comprises:
a sensor device that is capable of supporting the membrane and plurality of pores and
being operable to perform polynucleotide sequencing using the pores and proteins;
at least one reservoir for holding material for performing the sequencing;
a fluidics system configured to controllably supply material from the at least one
reservoir to the sensor device; and
a plurality of containers for receiving respective samples, the fluidics system being
configured to supply the samples selectively from the containers to the sensor device. The
apparatus may be any of those described in International Application No.
PCT/GB08/004127 (published as WO 2009/077734), PCT/GB10/000789 (published as WO
2010/122293), International Application No. PCT/GB10/002206 (not yet published) or
International Application No. PCT/US99/25679 (published as WO 00/28312).
The following Examples illustrate the invention:
Example 1 - Reading around dsDNA hairpins:
The ability of an enzyme such as Phi29 DNA polymerase to act as a molecular brake
along ssDNA, but still also functionally pass along dsDNA sections, can be exploited to read
around the hairpin turn of dsDNA constructs, permitting DNA/RNA sequencing of both the
sense and anti-sense strands. Fig. 3 illustrates how both the sense and anti-sense strands of
dsDNA constructs with hairpin turns can be sequenced with an enzyme such as Phi29 DNA
polymerase. In this implementation based on Phi29 DNA polymerase, the dsDNA constructs
contain a 5'-ssDNA leader to enable capture under an applied field by a nanopore. This is
followed by a dsDNA section that is linked by a hairpin turn. The hairpin turn can optionally
contain a marker (X in Fig. 3) in the turn that creates a characteristic current signature to aid in
identification of the sense strand region from the anti-sense region. Since the last ~20 bases in the
current implementation are not sequenced because the read-head is ~20 bases downstrand of the
enzyme when it falls off the end of the DNA, the constructs could also optionally contain a 3'-
ssDNA extension to permit reading to the end of the anti-sense region.
The hairpin turns that link the two dsDNA sections could be made of, but are not limited
to, sections of DNA/RNA, modified DNA or RNA, PNA, LNA, PEG, other polymer linkers, or
short chemical linkers. The hairpin linkers could be synthesised separately and chemically
attached or enzymatically ligated to dsDNA, or could be generated in processing of the genomic
DNA.
Methods:
DNA: Four separate DNA constructs were prepared as shown in Figs. 4-7 from synthetic
DNA (Table 4).
Table 4 - Synthetic DNA used in experiments of Example 1
All DNA were purchased from Integrated DNA Technologies (IDT) as a PAGE purified
dry pellet, and were re-dissolved in pure water to a final concentration of 100 µM. The short
dsDNA construct was prepared by hybridizing UZ08 to UZ12 (Table 4). To hybridize, equal
quantities of the 100 µM DNA solutions were mixed together, heated to 95 °C on a hot plate,
held at 95 °C for 10 min, then allowed to slowly cool to room temperature over the course of
~2 hours. This yields a final solution of hybridized DNA complex at 50 µM. The UZ07, UA02
and MS23 DNA constructs are hairpins with 4T turns (Table 4). To hybridize the sense and antisense
regions, the 100 µM DNA solutions were heated to 95 °C on a hot plate, held at 95 °C for
10 min, then rapidly cooled to 4 °C by placing the samples in a refrigerator. The rapid cooling
enhances intra-molecular hairpin formation over inter-molecular hybridization. The process
yields a final solution of hybridized DNA hairpins at 100 µM.
MspA production: Purified MspA (NNNRRK) oligomers were made in a cell-free
Escherichia coli in vitro transcription translation system (Promega). Purified oligomers were
obtained by cutting the appropriate oligomer band from a gel after SDS-PAGE, then re-solvating
in TE buffer.
Unzipping experiments: Electrical measurements were acquired from single MspA
nanopores inserted in 1,2-diphytanoyl-glycero-3-phosphocholine lipid (Avanti Polar Lipids)
bilayers. Bilayers were formed across ~100 µm diameter apertures in 20 µm thick PTFE films
(in custom Delrin chambers) via the Montal-Mueller technique, separating two 1 mL buffered
solutions. All experiments were carried out in a Strand EP buffer of 400 mM KCl, 10 mM
Hepes, 1 mM EDTA, 1 mM DTT at pH 8.0. Single-channel currents were measured on
Axopatch 200B amplifiers (Molecular Devices) equipped with 1440A digitizers. Ag/AgCl
electrodes were connected to the buffered solutions so that the cis compartment (to which both
nanopore and enzyme/DNA are added) is connected to the ground of the Axopatch headstage,
and the trans compartment is connected to the active electrode of the headstage.
DNA construct and Phi29 DNA polymerase (Enzymatics, 150 µM) were added to 100 µL
of strand EP buffer and pre-incubated for 5 mins (DNA = 1 µM, Enzyme = 2 µM). This
pre-incubation mix was added to 900 µL of buffer in the cis compartment of the electrophysiology
chamber to initiate capture and unzipping of the complexes in the MspA nanopore (to give final
concentrations of DNA = 0.1 µM, Enzyme = 0.2 µM). Only one type of DNA was added into the
system in a single experiment. Unzipping experiments were carried out at a constant potential of
+180 mV.
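The final concentrations quoted above follow from a tenfold dilution of the pre-incubation mix into the cis compartment. The sketch below reproduces that arithmetic, reading the concentrations as µM and the volumes as µL; the function name is hypothetical.

```python
def diluted_concentration(stock_conc, stock_volume, added_volume):
    """Concentration after adding stock_volume of mix to added_volume of buffer.

    Uses C1 * V1 = C2 * V2 with V2 = stock_volume + added_volume.
    """
    return stock_conc * stock_volume / (stock_volume + added_volume)


if __name__ == "__main__":
    # Example 1 values: 100 uL of pre-incubation mix added to 900 uL of buffer.
    final_dna_um = diluted_concentration(1.0, 100.0, 900.0)     # DNA at 1 uM
    final_enzyme_um = diluted_concentration(2.0, 100.0, 900.0)  # enzyme at 2 uM
    print(final_dna_um, final_enzyme_um)
```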
Results:
Characteristic and consistent polymerase controlled DNA movements were observed
when the dsDNA constructs with and without hairpins were unzipped through MspA nanopores
using Phi29 DNA polymerase (Figs. 4-7). Figs. 4-7 show the consensus DNA sequence profiles
(obtained from multiple single translocations of an analyte through the nanopore) for each of the
DNA constructs shown.
The dsDNA construct (UZ08+UZ12) with no hairpin shows a small number of sequence
dependent states (typically ~10, Fig. 4). This is consistent with ~10-15 bases of the 31 in the
dsDNA section passing through the read-head of the nanopore before the enzyme falls off the 3'-
end of the DNA (~20 bases upstrand of the read-head), and the last ~20 bases translocate unbraked
through the nanopore, too fast to be resolved.
UZ07 (Table 4) contains the same DNA sequence as UZ08+UZ12, but is a hairpin
construct with a 4T turn connecting the sense and anti-sense strands. The consensus sequence
obtained from UZ07 (Fig. 5) shares the same initial profile as UZ08+UZ12, but shows many
more sequence states (typically >30) than that for UZ08+UZ12 dsDNA (Fig. 4). This shows that
the enzyme is proceeding around the hairpin of the sense strand, and along the anti-sense strand.
This allows downstrand reading of the entire sense strand, and part of the anti-sense strand
(except the last ~20 bases before the enzyme falls off the 3'-end).
UA02 (Table 4) has the same sequence as UZ07, but with the addition of an extra 25
bases of non-complementary ssDNA on the 3'-end of the construct. The consensus sequence
obtained from UA02 (Fig. 6) shows a closely matching sequence profile to UZ07 (Fig. 5), but
with an additional ~20 states at the end. The additional 25 bases on the 3'-end permits the full
length of anti-sense to be read before the up-strand enzyme falls off the end of the DNA (~20
bases upstrand of the read-head). Fig. 8 shows the consensus sequence from UA02 - the
homopolymeric 5'-overhang initially in the nanopore (section 1), the sense (section 2), turn
(section 3) and antisense (section 4) regions.
Markers can be placed in or near the hairpin turn, that when sequenced can produce a
characteristic signal that permits simple identification of the sense and anti-sense regions of
unknown DNA sequences. MS23 (Table 4) has the same sequence as UZ07, but with the
addition of an abasic marker in the 4T turn of the hairpin separating the sense and anti-sense
strands. The consensus sequence obtained from MS23 (Fig. 7) shows a closely matching
sequence profile to UZ07 (Fig. 5), but with an altered large upwards spike in current in the turn
region (marked with *) as a result of the abasic passing through the nanopore read-head at this
point. This large upwards spike in current is characteristic of the reduced ionic blocking of
abasic residues in the nanopore constriction relative to normal bases, and provides a clear signal
by which to separate the sense and anti-sense regions.
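The use of the abasic spike to separate the two regions can be sketched as a threshold split over the sequence of current states. This is an illustration only: the threshold, the current values and the function name are hypothetical, and a real marker may span more than one state.

```python
def split_at_marker(levels_pa, spike_threshold_pa):
    """Split per-state current levels at the first state above the threshold.

    Returns (sense_states, marker_index, antisense_states). If no state
    exceeds the threshold, marker_index is None and all states are returned
    as the first element.
    """
    for i, level in enumerate(levels_pa):
        if level > spike_threshold_pa:
            return levels_pa[:i], i, levels_pa[i + 1:]
    return levels_pa, None, []


if __name__ == "__main__":
    # Illustrative consensus profile with one large upward spike (the marker).
    profile = [40.0, 45.0, 42.0, 80.0, 44.0, 41.0]
    sense, marker, antisense = split_at_marker(profile, spike_threshold_pa=70.0)
    print(sense, marker, antisense)
```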
Summary:
These experiments demonstrate that Phi29 DNA polymerase is able to read around the
hairpin turn of dsDNA constructs, due to its ability to act as an efficient molecular brake along
the ssDNA section.
The read-head of MspA nanopores in this implementation is ~20 bases downstrand from
the DNA at the entrance to the Phi29 enzyme. As a result, when the enzyme gets to the end of a
DNA strand and releases the substrate, the remaining ~20 bases translocate in an uncontrolled
manner through the nanopore too quickly to be resolved/sequenced. However, optional 3'-
extensions (in this 5' to 3' reading direction) can be added to the DNA constructs to extend the
reading distance, which permits full sequencing of the entire anti-sense strand.
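The effect of the read-head offset and of a 3'-extension can be expressed as simple arithmetic. The sketch below assumes a fixed offset of 20 bases, which the text gives only as an approximate figure; the function and constant names are hypothetical.

```python
READ_HEAD_OFFSET = 20  # approximate bases between enzyme entrance and read-head


def unread_target_bases(extension_length, offset=READ_HEAD_OFFSET):
    """Target bases at the 3' end that pass the read-head uncontrolled.

    A non-target 3' extension absorbs part or all of the offset, so fewer
    target bases are lost once the enzyme releases the strand.
    """
    return max(offset - extension_length, 0)


if __name__ == "__main__":
    print(unread_target_bases(0))   # construct without a 3' extension
    print(unread_target_bases(25))  # construct with a 25-base 3' extension
```

With no extension the final ~20 target bases are lost, whereas an extension at least as long as the offset allows the whole anti-sense region to be read.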
Markers that produce characteristic current signatures can optionally be placed in or
near the hairpin turn of a DNA construct to aid in identification of the sense and anti-sense
regions of the sequence. The markers could be, but are not limited to, unique known sequence
motifs of normal bases, or unnatural or modified bases that produce alternative current
signatures.
Example 2 - Reading around dsDNA hairpins on genomic DNA:
Reading around hairpins using Phi29 DNA polymerase can be extended to long genomic
dsDNA with ligated hairpins (Fig. 9). Fig. 9 shows a general design outline for creating dsDNA
suitable for reading around hairpins. The constructs have a leader sequence with optional marker
(e.g. abasic DNA) for capture in the nanopore, and hairpin with optional marker, and a tail for
extended reading into anti-sense strand with optional marker.
Methods:
DNA: A 400 base-pair section of PhiX 174 RF1 genomic DNA was amplified using PCR
primers containing defined restriction sites. Following KasI and MluI restriction endonuclease
digestion of the 400 bp fragment, DNA adapters containing complementary ends were then
ligated (Fig. 10). The desired product was finally isolated by PAGE purification and quantified
by absorbance at 260 nm.
For ease of analysis each adapter piece contained set abasic markers so that the progress
of the DNA through the nanopore could be tracked. The sequences of all primers and adapters
are given in Table 5.
Table 5 - Primer and adapter DNA for creating the genomic hairpin DNA constructs of
Example 2. Abasic DNA bases (abasic = X) in the adapters provide markers for easily
identifying the start of the sense strand, the hairpin turn, and the end of the anti-sense strand.
MspA production: See Example 1.
Unzipping experiments: See Example 1.
The genomic DNA constructs were mixed with Phi29 DNA polymerase (Enzymatics,
150 µM) in 100 µL of strand EP buffer and pre-incubated for 5 mins (DNA = 5 nM,
Enzyme = 2 µM). This pre-incubation mix was added to 900 µL of buffer in the cis compartment
of the electrophysiology chamber to initiate capture and unzipping of the complexes in the MspA
nanopore (to give final concentrations of DNA = 0.5 nM, Enzyme = 0.2 µM). Only one type of
DNA was added into the system in a single experiment. Unzipping experiments were carried out
at a constant potential of +180 mV.
Results:
400mer-No3 (which has a 3' cholesterol TEG) (Table 6) added to MspA nanopores with
Phi29 DNA polymerase resulted in unzipping of the DNA and polymerase controlled DNA
movement lasting 1-3 mins with a large number of sequence dependent states (Figs. 11 and 12).
The abasic markers, at the start of the sense strand, in the middle of the hairpin turn, and at the
end of the anti-sense strand, permit easy identification of the separate sections of the sequence.
Figures 11 and 12 clearly show 3 abasic peaks, demonstrating the ability to read around hairpins
ligated to long genomic DNA, and thus sequence both the sense and anti-sense strands of the
dsDNA.
Table 6 - Full DNA sequence of genomic construct with ligated adapters used in this example
Example 3 - Sample preparation for sequencing
A 400 base-pair section of PhiX 174 RF1 genomic DNA was amplified using PCR
primers containing defined restriction sites. Following KasI and MluI restriction endonuclease
digestion of the 400 bp fragment, DNA adapters containing complementary ends were then
ligated. The desired product was finally isolated by PAGE purification and quantified by
absorbance at 260 nm.
For ease of analysis each adapter piece contained set abasic markers so that the progress
of the DNA through the nanopore could be tracked. The sequences of all primers and adapters
are given below (X = abasic modification):
KasI Sense Primer
TTTTTTTTTTGGCGCCCTGCCGTTTCTGATAAGTTGCTT (SEQ ID NO. 21)
MluI Antisense Primer
AAAAAAAAAAACGCGTAAACCTGCTGTTGCTTGGAAAG (SEQ ID NO. 22)
KasI Sense Adapter
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTXXXXTTTTTTTTT
TGGCTAACAAACAAGAAACATAAACAGAATAG (SEQ ID NO. 23)
KasI Antisense Adapter
GCGCCTATTCTGTTTATGTTTCTTGTTTGTTAGCCTTTTTTXXXXTTTTTTTTTTTTTTT
TTTTTTTTTTTTTTTT (SEQ ID NO. 24)
MluI HP
CGCGCTATTCTGTTTATGTTTCTTGTTTGTTAGCCXXXXGGCTAACAAACAAGAAAC
ATAAACAGAATAG (SEQ ID NO. 25)
The construct is shown in Fig. 10.
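The design of the primers and hairpin listed above can be checked with a short script. This is a sketch rather than part of the described method: the recognition sequences used for the two restriction enzymes (GGCGCC and ACGCGT) are the commonly listed sites and are an assumption not stated in the text, and the function names are hypothetical. The script confirms that each primer carries its restriction site and that the two arms of the hairpin on either side of the abasic (X) turn are reverse complements of each other.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Assumed recognition sequences for the two restriction enzymes.
KASI_SITE = "GGCGCC"
MLUI_SITE = "ACGCGT"

SENSE_PRIMER = "TTTTTTTTTTGGCGCCCTGCCGTTTCTGATAAGTTGCTT"      # SEQ ID NO. 21
ANTISENSE_PRIMER = "AAAAAAAAAAACGCGTAAACCTGCTGTTGCTTGGAAAG"    # SEQ ID NO. 22
HAIRPIN = (
    "CGCGCTATTCTGTTTATGTTTCTTGTTTGTTAGCC"
    "XXXX"
    "GGCTAACAAACAAGAAACATAAACAGAATAG"
)  # SEQ ID NO. 25, X = abasic


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence of A, C, G and T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def hairpin_arms_pair(hairpin, overhang_length=4, turn="XXXX"):
    """Check that the arms flanking the turn can base-pair with each other.

    The first overhang_length bases are treated as a single stranded
    overhang for ligation and are excluded from the comparison.
    """
    five_prime_arm, three_prime_arm = hairpin.split(turn)
    return reverse_complement(five_prime_arm[overhang_length:]) == three_prime_arm


if __name__ == "__main__":
    print(KASI_SITE in SENSE_PRIMER)
    print(MLUI_SITE in ANTISENSE_PRIMER)
    print(hairpin_arms_pair(HAIRPIN))
```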
If the above adapters contain non-complementary bases instead of abasic sites then it is
possible to open out the analyte using a primer complementary to the ssDNA section of the
antisense Y-shaped adapter and a strand displacing polymerase (such as Phi29 or Klenow). If the
primer and Y-shaped adapter complementary region also contain a restriction site then another
hairpin can be ligated, after the amplification, and the process repeated to expand the template
further. This gives the ability to sequence the molecule twice; the original sense-antisense in a
fully dsDNA unzipping mode and then the amplified sense-antisense strand in fully ssDNA
unzipping mode.
Ligation of DNA adapters to target DNA has been well described previously and this is
still the most widespread technique. Ligation of the adapters could occur either by single strand
ligation using T4 RNA ligase 1 or by double strand ligation of annealed adapters using T4 DNA
ligase. To prevent target dimer and adapter dimer formation the target can be first dA-tailed
using a polymerase such as Klenow exo-.
More recently, advances with artificial transposons (such as the Nextera system) have
begun to show promise in speeding up adapter attachment while simultaneously
fragmenting the DNA. In theory a similar approach might be achieved using homologous
recombination, such as the Cre LoxP system (NEB), provided compatible sequences lie within
the DNA. Chemical ligation has also improved recently, as demonstrated most notably
by the successful amplification of DNA strands containing an unnatural triazole linkage
(Sagheer and Brown, 2009). For chemical ligation, modification of either the 5' or 3' end of the
DNA is usually first required to include a suitable reactive group. Groups can be easily added to
the 3' end using a modified dNTP and terminal transferase. Modification of the 5' end of DNA has
also been demonstrated, but this has so far been limited to thiol groups using T4
polynucleotide kinase. Some success in directly coupling molecules to the 5' end of DNA by
chemical means has been demonstrated using carbodiimide coupling, and such kits are
commercially available. However, side products are a frequent problem with this chemistry.
Example 4 - Reading around dsDNA hairpins of Genomic DNA using a Helicase
Reading around DNA strands, which consist of connected long genomic dsDNA ligated
by hairpins, using a helicase enzyme was investigated (Fig. 17). The constructs used have a
leader sequence with optional marker (e.g. abasic DNA) for capture in the nanopore, a hairpin
with optional marker, and a tail which has an extended reading sequence and a cholesterol tether
attached to the end.
Methods:
To link the sense and antisense strands a bridging hairpin (SEQ ID NO: 32) was ligated
to one end. A synthetic Y-adaptor was ligated on to allow enzyme binding and threading into the
nanopore: the sense strand (SEQ ID NO: 29 attached to SEQ ID NO: 30 via four abasic DNA
bases, see Fig. 18) of this adaptor contains the 5' leader, a sequence that is complementary to the
tether sequence (SEQ ID NO: 35, which at the 3' end of the sequence has six iSp18 spacers
attached to two thymine residues and a 3' cholesterol TEG) and 4 abasics. The antisense half of
the adaptor also has a 3' hairpin which will act as an intramolecular primer for later conversion to
a DUO analyte (SEQ ID NO: 31, see Fig. 22 starter template). MONO analyte DNA was
prepared using a ~400 bp region of PhiX 174 (sense strand sequence = SEQ ID NO: 33 and
antisense strand sequence = SEQ ID NO: 34). The region of interest was PCR amplified with
primers containing SacI and KpnI restriction sites (SEQ ID NOs: 27 and 28 respectively).
Purified PCR product was then SacI and KpnI digested before a Y-shaped adapter (sense strand
sequence (SEQ ID NO: 29 attached to SEQ ID NO: 30 via four abasic DNA bases) ligated
onto the 5' end of SEQ ID NO: 33, and anti-sense strand (SEQ ID NO: 31) ligated onto the
3' end of SEQ ID NO: 34) and a hairpin (SEQ ID NO: 32, used to join SEQ ID NOs: 33 and
34) were ligated to either end using T4 DNA ligase (see Fig. 18 for the final DNA construct). The
product was purified from a 5% TBE PAGE gel and eluted by crush and soak method into TE
buffer.
MspA production: Purified MspA oligomers of the mutant MspA pore
MS(B1-G75S-G77S-L88N-Q126R)8 MspA (SEQ ID NO: 2 with the mutations G75S/G77S/L88N/Q126R)
were made in a cell-free Escherichia coli in vitro transcription translation system (Promega).
Purified oligomers were obtained by cutting the appropriate oligomer band from a gel after
SDS-PAGE, then re-solvating in TE buffer.
Helicase experiments - Electrical measurements were acquired from single MspA
nanopores (MS(B1-G75S-G77S-L88N-Q126R)8 MspA (SEQ ID NO: 2 with the mutations
G75S/G77S/L88N/Q126R)) inserted in 1,2-diphytanoyl-glycero-3-phosphocholine lipid (Avanti
Polar Lipids) bilayers. Bilayers were formed across ~100 µm diameter apertures in 20 µm thick
PTFE films (in custom Delrin chambers) via the Montal-Mueller technique, separating two 1 mL
buffered solutions. All experiments were carried out in a buffer of 400 mM NaCl, 100 mM
Hepes, 10 mM potassium ferrocyanide, 10 mM potassium ferricyanide, pH 8.0, at an applied
potential of +140 mV. Single-channel currents were measured on Axopatch 200B amplifiers
(Molecular Devices) equipped with 1440A digitizers. Platinum electrodes were connected to the
buffered solutions so that the cis compartment (to which both nanopore and enzyme/DNA are
added) is connected to the ground of the Axopatch headstage, and the trans compartment is
connected to the active electrode of the headstage.
A single pore was obtained before MgCl2 and dTTP were added to the cis chamber to
give final concentrations of 10 mM and 5 mM respectively. Data were obtained for 5 mins at
+140 mV before DNA (SEQ ID NOs: 29-35 connected as shown in Fig. 18) was added to the cis
chamber for a final concentration of 0.1 nM and data obtained for a further 5 mins. Helicase was
added to the cis chamber to a final concentration of 100 nM and any helicase controlled DNA
movements were recorded at +140 mV.
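The final concentrations quoted above follow from a simple dilution calculation. The sketch below is an illustration only: the 1 mL chamber volume and the target concentrations are taken from the text, whereas the stock concentrations and the function name are hypothetical, since the specification does not state them. Each addition is treated on its own, so the small volume contributed by the other additions is ignored.

```python
# Illustrative dilution arithmetic for additions to the cis chamber.
# Stock concentrations below are assumptions, not values from the text.

def stock_volume_to_add(stock_conc, final_conc, chamber_volume):
    """Volume of stock that gives final_conc once mixed into chamber_volume.

    Solves stock_conc * v / (chamber_volume + v) = final_conc for v, so the
    volume of the added stock itself is accounted for. Both concentrations
    must share one unit; the result has the unit of chamber_volume.
    """
    if not 0 < final_conc < stock_conc:
        raise ValueError("final concentration must be below the stock concentration")
    return final_conc * chamber_volume / (stock_conc - final_conc)


CHAMBER_UL = 1000.0  # 1 mL cis compartment, as stated in the text

# name: (hypothetical stock in mM, target final concentration in mM)
additions = {
    "MgCl2": (1000.0, 10.0),
    "dTTP": (100.0, 5.0),
}

for name, (stock_mm, final_mm) in additions.items():
    volume_ul = stock_volume_to_add(stock_mm, final_mm, CHAMBER_UL)
    print(f"{name}: add {volume_ul:.2f} uL of {stock_mm:g} mM stock")
```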
Results:
The 400 bp sense/antisense hairpin construct (SEQ ID NO's: 29-35 connected as shown
in Fig. 18) when added to an MspA nanopore (MS(B1-G75S-G77S-L88N-Q126R)8 MspA (SEQ
ID NO: 2 with the mutations G75S/G77S/L88N/Q126R)) with a helicase resulted in unzipping of
the DNA and helicase controlled DNA movement, with a large number of sequence dependent
states (Figs. 19 to 21). The 400 bp sense/antisense hairpin construct (SEQ ID NO's: 29-35
connected as shown in Fig. 18) produced helicase controlled DNA movement that permitted easy
identification of the start of the sequence, as the polyT region and the abasic DNA bases at the
start of the sense strand can be observed (highlighted with a * and a # respectively in Fig. 20).
Therefore, it was possible to show that the helicase could control the movement and unzipping of
a 400 bp hairpin. The clear change in speed, between the sense and antisense regions, highlights
the point where the enzyme passed around the corner and is a useful marker between these
regions (Fig. 21, the change from reading the sense region (1) to reading the anti-sense region (2)
is shown with a *). This alteration in speed eliminates the need for markers to mark the hairpin
turn. This demonstrates the ability to read around hairpins ligated to long genomic DNA with a
helicase, and thus sequence both the sense and anti-sense strands of the dsDNA.
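The change in speed at the hairpin turn could, in principle, be located automatically. The following Python sketch is an assumption-laden illustration rather than the analysis used here: it takes a list of per-state dwell times and finds the single split point that best separates them into two runs with different means (a least-squares change point). The function name and the dwell times are hypothetical, and the synthetic values do not imply which region is read faster.

```python
# Illustrative only: locate the single point where the mean dwell time shifts.
from statistics import fmean


def find_speed_change(dwell_times, min_segment=2):
    """Return the index that best splits dwell_times into two constant-mean runs."""
    if len(dwell_times) < 2 * min_segment:
        raise ValueError("not enough states to split")

    def squared_error(segment):
        mean = fmean(segment)
        return sum((value - mean) ** 2 for value in segment)

    best_index, best_cost = None, float("inf")
    for index in range(min_segment, len(dwell_times) - min_segment + 1):
        cost = squared_error(dwell_times[:index]) + squared_error(dwell_times[index:])
        if cost < best_cost:
            best_index, best_cost = index, cost
    return best_index


# Synthetic dwell times in seconds (arbitrary values, not measured data).
region_1 = [0.10, 0.12, 0.09, 0.11, 0.10]
region_2 = [0.30, 0.28, 0.33, 0.31, 0.29]

turn_index = find_speed_change(region_1 + region_2)
print(turn_index)
```

Real current traces would first need segmenting into states, and a single change point is only appropriate when one turn is expected in the read.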
Example 5 - Production of DUO Polynucleotide Hairpin Strands
It has been demonstrated already that linking the information from the sense and the
antisense strands is possible by ligating a synthetic hairpin to one end of the DNA. This serves to
give a read of the natural sense and antisense strands from one molecule at the same time, so
making base-calling more accurate as one gets two chances to call a single position.
To link the sense and antisense strands a bridging hairpin (SEQ ID NO: 32) can be
ligated to one end. It is also possible to ligate on a synthetic Y-adaptor to allow enzyme binding
and threading into the nanopore: the sense strand (SEQ ID NO: 29 attached to SEQ ID NO: 30
via four abasic DNA bases, see Fig. 22 starter template) of this adaptor contains the 5' leader, a
sequence that is complementary to the tether sequence (SEQ ID NO: 35, which at the 3' end of
the sequence has six iSp18 spacers attached to two thymine residues and a 3' cholesterol TEG)
and 4 abasics, the antisense half of the adaptor also has a 3' hairpin (SEQ ID NO: 31) which will
act as an intramolecular primer for later conversion to a DUO analyte (see Fig. 22 starter
template).
When the bridging hairpin (SEQ ID NO: 32) is ligated it is also possible to ligate on a
synthetic Y-adaptor: the sense strand (SEQ ID NO: 29 attached to SEQ ID NO: 30 via four
abasic DNA bases, see Fig. 22 starter template) of this adaptor contains the 5' leader, a sequence
that is complementary to the tether sequence (SEQ ID NO: 35, which at the 3' end of the
sequence has six iSp18 spacers attached to two thymine residues and a 3' cholesterol TEG) and 4
abasics, and the antisense half of the adaptor has a 3' hairpin which will act as an intramolecular
primer (SEQ ID NO: 31, see Fig. 22 starter template). This affords the opportunity to further
expand the template by copying the entire, now linked, sense and antisense strands using a strand
displacing polymerase that binds to the 3' end of the Y-adaptor (Step 2 of Fig. 22). The Y-shaped
and hairpin adaptors contain mis-matched restriction sites (not sensitive to restriction digest, see
top of Fig. 23). When the analyte is subsequently filled-in and expanded (see Fig. 22 steps 2 to
3), the restriction sites are completed (see bottom of Fig. 23). Therefore, the fully filled-in
analyte (SEQ ID NOs: 29-36 connected as shown in Fig. 25) can be digested using site specific
restriction endonucleases to confirm successful fill-in.
DUO analyte was prepared from the MONO analyte disclosed in Example 4 above. The
doubly ligated MONO PAGE purified analyte (SEQ ID NO's: 29-35 connected as shown in Fig.
18) was further incubated with Klenow DNA polymerase, SSB and nucleotides to allow
extension from the Y-shaped adapter hairpin (SEQ ID NO: 31). To screen for successful DUO
product (SEQ ID NOs: 29-36 connected as shown in Fig. 25) a series of mismatch restriction sites
were incorporated into the adapter sequences, whereby the enzyme will cut the analyte only if
the restriction site has been successfully replicated by the DUO extension process (See Fig. 23,
MONO analyte at the top and DUO analyte at the bottom).
Fig. 24 shows that the adapter modified analyte (MONO, SEQ ID NO: 29-35) in the
absence of polymerase does not digest with the restriction enzymes (see gel on the left in Fig. 24,
Key: M = MfeI, A = AgeI, X = XmaI, N = NgoMIV, B = BspEI), because the sites are
mismatched to one another, as shown in Fig. 23 top. However, on incubation with polymerase
there is a noticeable size shift and the shifted product (DUO) now digests as expected with each
of the restriction enzymes (see gel on the right in Fig. 24, Key: M = MfeI, A = AgeI, X = XmaI,
N = NgoMIV, B = BspEI). This shows that using the described method it is possible to produce
DUO product (SEQ ID NOs: 29-36 connected as shown in Fig. 25).
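The logic of the mismatch screen can be sketched in code. The Python example below is a simplified model, not the actual adapter design: an enzyme is treated as able to cut only where its recognition sequence appears on the top strand and the bottom strand is fully complementary across that site. The recognition sequences are the commonly documented ones for these enzymes; the toy duplexes and function names are hypothetical and are not the sequences of Fig. 23.

```python
# Simplified model of the mismatch restriction screen described above.
# Toy sequences and helper names are hypothetical.

RECOGNITION_SITES = {
    "MfeI": "CAATTG",
    "AgeI": "ACCGGT",
    "XmaI": "CCCGGG",
    "NgoMIV": "GCCGGC",
    "BspEI": "TCCGGA",
}

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Base-by-base complement, keeping the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in seq)


def is_cuttable(top: str, bottom_aligned: str, site: str) -> bool:
    """True if `site` occurs in `top` with a fully paired bottom strand.

    `bottom_aligned` is written 3'->5' so that position i pairs with top[i].
    """
    start = top.find(site)
    while start != -1:
        if bottom_aligned[start:start + len(site)] == complement(site):
            return True
        start = top.find(site, start + 1)
    return False


top_strand = "GGACAATTGTCC"                # toy strand containing an MfeI site
duo_bottom = complement(top_strand)        # filled-in: perfectly paired
mono_bottom = duo_bottom[:5] + "G" + duo_bottom[6:]  # one mismatch in the site

mono_cut = is_cuttable(top_strand, mono_bottom, RECOGNITION_SITES["MfeI"])
duo_cut = is_cuttable(top_strand, duo_bottom, RECOGNITION_SITES["MfeI"])
print(mono_cut, duo_cut)
```

In this model the mismatched (MONO-like) duplex is not cut while the filled-in (DUO-like) duplex is, which mirrors the gel result reported for Fig. 24.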
Example 6 - Reading around dsDNA DUO Polynucleotide Hairpins using a Helicase
Reading around DUO hairpin constructs (SEQ ID NO's: 29-36 connected as shown in
Fig. 25), which consist of original sense (SEQ ID NO: 33) and anti-sense strands (SEQ ID NO:
34) as well as replicate sense and replicate anti-sense strands (SEQ ID NO: 36), using a helicase enzyme
was investigated.
Methods: The DNA construct used in this experiment was produced by the method
disclosed in Example 5 above.
MspA production: The MspA pore MS(B1-G75S-G77S-L88N-Q126R)8 MspA (SEQ ID
NO: 2 with the mutations G75S/G77S/L88N/Q126R) was produced by the method described in
Example 4.
Unzipping experiments - Electrical measurements were acquired, as described in
Example 4, from single MspA nanopores (MS(B1-G75S-G77S-L88N-Q126R)8 MspA (SEQ ID
NO: 2 with the mutations G75S/G77S/L88N/Q126R)) inserted in 1,2-diphytanoyl-glycero-3-phosphocholine
lipid (Avanti Polar Lipids) bilayers in buffer solution (400 mM NaCl, 100 mM
Hepes, 10 mM potassium ferrocyanide, 10 mM potassium ferricyanide, pH 8.0) at an applied
potential of +140 mV.
Initially, MgCl2 (10 mM) and dTTP (5 mM) were added to the cis compartment and a
control experiment run for 5 mins. Secondly, the DNA construct (0.1 nM, SEQ ID NOs: 29-36
connected as shown in Fig. 25) was added to the cis compartment and a further control
experiment run for 5 mins. Finally, the helicase (100 nM) was added to the electrophysiology
chamber to initiate helicase activity. All unzipping experiments were carried out at a constant
potential of +140 mV.
Results:
The DUO hairpin construct (SEQ ID NOs: 29-36 connected as shown in Fig. 25) when
added to an MspA nanopore (MS(B1-G75S-G77S-L88N-Q126R)8 MspA (SEQ ID NO: 2 with
the mutations G75S/G77S/L88N/Q126R)) with a helicase resulted in unzipping of the DNA and
helicase controlled DNA movement, with a large number of sequence dependent states (Figs. 26
to 28). Fig. 26 shows two typical helicase controlled DNA movements, the regions which
correspond to the original sense section, original antisense section, the replicate sense region and
the replicate antisense sections are labeled 1 to 4 respectively. Fig. 27 shows a magnified view of
one of the helicase controlled DNA movements from Fig. 26 and Fig. 28 shows another
magnified view of the transition between the original sense and anti sense strands. The change in
speed between the helicase controlling the movement of the sense strand in comparison to the
antisense strand is clearly visible (Figs. 26-28). This alteration in speed eliminates the need for
markers to mark the hairpin turn. This demonstrates the ability to read around DUO hairpin
constructs (SEQ ID NOs: 29-36 connected as shown in Fig. 25), and thus sequence both the
sense and anti-sense strands of the dsDNA twice. This makes base-calling more accurate as one
gets four chances to call a single position.
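The benefit of four reads per position can be illustrated with a simple majority vote. The sketch below is not the base-calling procedure of the invention: it assumes that the four sections have already been base-called, that antisense sections are converted to the sense orientation by reverse complementing, and that all reads are the same length and already aligned. The function names and the toy reads are hypothetical.

```python
# Illustrative majority-vote consensus over four reads of one position.
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Convert an antisense read into the sense orientation."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def consensus(reads):
    """Per-position majority vote; ties are reported as 'N'."""
    if len({len(read) for read in reads}) != 1:
        raise ValueError("reads must be pre-aligned and of equal length")
    called = []
    for column in zip(*reads):
        ranked = Counter(column).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            called.append("N")
        else:
            called.append(ranked[0][0])
    return "".join(called)


# Toy reads (hypothetical): two sense-oriented and two antisense-oriented.
original_sense = "ACGGT"
original_antisense = "ACCGT"   # error-free antisense read
replicate_sense = "ACGCT"      # contains one miscall
replicate_antisense = "TCCGT"  # contains one miscall

oriented = [
    original_sense,
    reverse_complement(original_antisense),
    replicate_sense,
    reverse_complement(replicate_antisense),
]
call = consensus(oriented)
print(call)
```

With only the two original strands, a disagreement at a position cannot be resolved by voting; with four reads, a single miscall is outvoted.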

CLAIMS
1. A method of sequencing a double stranded target polynucleotide, comprising:
(a) providing a construct comprising the target polynucleotide, wherein the two
strands of the target polynucleotide are linked at or near one end of the target
polynucleotide by a bridging moiety;
(b) separating the two strands of the target polynucleotide to provide a single
stranded polynucleotide comprising one strand of the target polynucleotide
linked to the other strand of the target polynucleotide by the bridging moiety;
(c) moving the single stranded polynucleotide through a transmembrane pore such
that a proportion of the nucleotides in the single stranded polynucleotide
interact with the pore; and
(d) measuring the current passing through the pore during each interaction and
thereby determining the sequence of the target polynucleotide,
wherein the separating in step (b) comprises contacting the construct with a
polynucleotide binding protein which separates the two strands of the target
polynucleotide.
2. The method according to claim 1, wherein the bridging moiety comprises a polymeric
linker, a chemical linker, a polynucleotide or a polypeptide.
3. The method according to claim 2, wherein the bridging moiety comprises DNA, RNA,
modified DNA or RNA, PNA, LNA or PEG.
4. The method according to claim 2 or 3, wherein the bridging moiety is a hairpin loop.
5. The method according to any one of the preceding claims, wherein the entire single
stranded polynucleotide moves through the pore and the entire target polynucleotide is
sequenced.
6. The method according to any one of the preceding claims, wherein a polynucleotide
binding protein controls the movement of the single stranded polynucleotide through the
transmembrane pore.
7. The method according to any one of the preceding claims, wherein the polynucleotide
binding protein separates the two strands of the target polynucleotide and controls the movement
of the single stranded polynucleotide through the pore.
8. The method according to any one of the preceding claims, wherein the protein is:
(a) derived from a virus of the Picovirinae family;
(b) derived from Phi29 DNA polymerase; or
(c) derived from a helicase.
9. The method according to any one of the preceding claims, wherein the construct further
comprises at least one polymer at the opposite end of the target polynucleotide to the bridging
moiety.
10. The method according to claim 9, wherein the construct comprises a polymer leader on
the one strand of the target polynucleotide and a polymer tail on the other strand of the target
polynucleotide.
11. The method according to claim 10, wherein the single stranded polynucleotide moves
through the pore in the order of (1) the leader polymer, (2) the one strand of the target
polynucleotide, (3) the bridging moiety, (4) the other strand of the target polynucleotide and (5)
the tail polymer.
12. The method according to any one of the preceding claims, wherein the construct further
comprises means of coupling the construct to the membrane.
13. The method according to any one of the preceding claims, wherein the transmembrane
pore is a protein pore or a solid state pore.
14. The method according to claim 13, wherein the protein pore is derived from Msp or
α-hemolysin (α-HL).
15. The method according to any one of the preceding claims, wherein the membrane is an
amphiphilic layer or a solid state layer.
16. The method according to claim 15, wherein the membrane is a lipid bilayer.
17. The method according to any one of the preceding claims, wherein the construct further
comprises one or more markers which result in a distinctive current when they interact with the
pore.
18. The method according to claim 17, wherein the one or more markers are abasic or a
specific sequence of nucleotides.
19. The method according to claim 17 or 18, wherein the one or more markers are in or near
the bridging moiety.
20. The method according to claim 17 or 18, wherein the one or more markers are positioned
in the polymer leader or the polymer tail.
21. The method according to any one of claims 19 to 20, wherein the one or more markers
identify the source of the target polynucleotide.
22. A kit for preparing a double stranded target polynucleotide for sequencing comprising:
(a) a bridging moiety capable of linking the two strands of the target polynucleotide at or
near one end; and
(b) at least one polymer.
23. The kit according to claim 22, comprising a leader polymer and a tail polymer.
24. The kit according to claim 23, wherein the leader polymer and tail polymer are
polynucleotides and a portion of the leader polymer and a portion of the tail polymer form a
double stranded sequence.
25. The kit according to any one of claims 22 to 24, further comprising means of coupling
the target polynucleotide to a membrane.
26. The kit according to any one of claims 22 to 25, further comprising one or more markers
which result in a distinctive current when they interact with a transmembrane pore.
27. A method of preparing a double stranded target polynucleotide for sequencing,
comprising:
(a) linking the two strands of the target polynucleotide at or near one end with a
bridging moiety; and
(b) attaching one polymer to one strand at the other end of the target polynucleotide
and thereby forming a construct that allows the target polynucleotide to be
sequenced using a transmembrane pore.
28. The method according to claim 27, wherein the polymer is a leader polymer and the
method further comprises attaching a tail polymer to the other strand of the target polynucleotide
at the same end as the leader polymer.
29. The method according to claim 28, further comprising attaching to the construct means to
couple the construct to a membrane.
30. The method according to any one of claims 27 to 29 wherein:
(a) the bridging moiety is synthesised separately and is chemically attached or enzymatically
ligated to the target polynucleotide; or
(b) the bridging moiety is generated in the processing of the target polynucleotide.
31. A method of sequencing a double stranded target polynucleotide, comprising:
(a) providing a construct comprising the target polynucleotide, wherein the two strands of
the target polynucleotide are linked at or near one end of the target polynucleotide by a
bridging moiety;
(b) separating the two strands of the target polynucleotide to provide a single stranded
polynucleotide comprising one strand of the target polynucleotide linked to the other
strand of the target polynucleotide by the bridging moiety;
(c) synthesising a complement of the single stranded polynucleotide, such that the single
stranded polynucleotide and complement form a double stranded polynucleotide;
(d) linking the two strands of the double stranded polynucleotide at or near one end of the
double stranded polynucleotide using a bridging moiety;
(e) separating the two strands of the double stranded polynucleotide to provide a further
single stranded polynucleotide comprising the original single stranded polynucleotide
linked to the complement by the bridging moiety;
(f) moving the complement through a transmembrane pore such that a proportion of the
nucleotides in the complement interact with the pore; and
(g) measuring the current passing through the pore during each interaction and thereby
determining the sequence of the target polynucleotide,
wherein the separating in step (e) comprises contacting the construct with a polynucleotide
binding protein which separates the two strands of the target polynucleotide.
32. An apparatus for sequencing a double stranded target polynucleotide, comprising: (a) a
membrane; (b) a plurality of transmembrane pores in the membrane; (c) a plurality of
polynucleotide binding proteins which are capable of separating the two strands of the target
polynucleotide; and (d) instructions for carrying out the method of claim 1.
33. An apparatus for sequencing a double stranded target polynucleotide, comprising: (a) a
membrane; (b) a plurality of transmembrane pores in the membrane; and (c) a plurality of
polynucleotide binding proteins which are capable of separating the two strands of the target
polynucleotide, wherein the apparatus is set up to carry out the method of claim 1.
34. An apparatus according to claim 32 or 33, wherein the apparatus comprises:
a sensor device that is capable of supporting the membrane and plurality of pores and
being operable to perform polynucleotide sequencing using the pores and proteins;
at least one reservoir for holding material for performing the sequencing;
a fluidics system configured to controllably supply material from the at least one
reservoir to the sensor device; and
a plurality of containers for receiving respective samples, the fluidics system being configured to
supply the samples selectively from the containers to the sensor device.
35. A method according to any one of claims 1 to 21, further comprising optionally after step
(b) and before step (c):
(m) synthesising a complement of the single stranded polynucleotide, such that the single
stranded polynucleotide and complement form a double stranded polynucleotide;
(n) linking the two strands of the double stranded polynucleotide at or near one end of the
double stranded polynucleotide using a bridging moiety;
(o) separating the two strands of the double stranded polynucleotide to provide a further
single stranded polynucleotide comprising the original single stranded polynucleotide
linked to the complement by the bridging moiety.

Documents

Application Documents

#	Name	Date
1	221-DELNP-2014.pdf	2014-01-20
2	221-delnp-2014-GPA-(06-03-2014).pdf	2014-03-06
3	221-delnp-2014-Correspondence-Others-(06-03-2014).pdf	2014-03-06
4	221-delnp-2014-Form-5.pdf	2014-06-03
5	221-delnp-2014-Form-3.pdf	2014-06-03
6	221-delnp-2014-Form-2.pdf	2014-06-03
7	221-delnp-2014-Form-1.pdf	2014-06-03
8	221-delnp-2014-Correspondence-others.pdf	2014-06-03
9	221-delnp-2014-Claims.pdf	2014-06-03
10	221-delnp-2014-Form-3-(13-06-2014).pdf	2014-06-13
11	221-delnp-2014-Correspondence Others-(13-06-2014).pdf	2014-06-13
12	Revised claims.pdf	2015-07-27
13	Marked up claims.pdf	2015-07-27
14	Letter dated 24.07.2015.pdf	2015-07-27
15	Form-13.pdf	2015-07-27
16	221-DELNP-2014-FER.pdf	2018-12-19
17	221-DELNP-2014-SEQUENCE LISTING [06-06-2019(online)].txt	2019-06-06
18	221-DELNP-2014-OTHERS [06-06-2019(online)].pdf	2019-06-06
19	221-DELNP-2014-FORM 3 [06-06-2019(online)].pdf	2019-06-06
20	221-DELNP-2014-FER_SER_REPLY [06-06-2019(online)].pdf	2019-06-06
21	221-DELNP-2014-DRAWING [06-06-2019(online)].pdf	2019-06-06
22	221-DELNP-2014-CORRESPONDENCE [06-06-2019(online)].pdf	2019-06-06
23	221-DELNP-2014-COMPLETE SPECIFICATION [06-06-2019(online)].pdf	2019-06-06
24	221-DELNP-2014-CLAIMS [06-06-2019(online)].pdf	2019-06-06
25	221-DELNP-2014-ABSTRACT [06-06-2019(online)].pdf	2019-06-06
26	221-DELNP-2014-RELEVANT DOCUMENTS [07-06-2019(online)].pdf	2019-06-07
27	221-DELNP-2014-PETITION UNDER RULE 137 [07-06-2019(online)].pdf	2019-06-07
28	221-DELNP-2014-HearingNoticeLetter-(DateOfHearing-01-11-2019).pdf	2019-10-11
29	221-DELNP-2014-Written submissions and relevant documents (MANDATORY) [12-11-2019(online)].pdf	2019-11-12
30	221-DELNP-2014-RELEVANT DOCUMENTS [12-11-2019(online)].pdf	2019-11-12
31	221-DELNP-2014-PETITION UNDER RULE 137 [12-11-2019(online)].pdf	2019-11-12
32	221-DELNP-2014-PatentCertificate15-11-2019.pdf	2019-11-15
33	221-DELNP-2014-IntimationOfGrant15-11-2019.pdf	2019-11-15
34	221-DELNP-2014-RELEVANT DOCUMENTS [17-03-2020(online)].pdf	2020-03-17
35	221-DELNP-2014-RELEVANT DOCUMENTS [27-09-2021(online)].pdf	2021-09-27
36	221-DELNP-2014-RELEVANT DOCUMENTS [28-09-2022(online)].pdf	2022-09-28
37	221-DELNP-2014-RELEVANT DOCUMENTS [28-09-2022(online)]-1.pdf	2022-09-28
38	221-DELNP-2014-RELEVANT DOCUMENTS [23-12-2022(online)].pdf	2022-12-23
39	221-DELNP-2014-RELEVANT DOCUMENTS [08-09-2023(online)].pdf	2023-09-08

Search Strategy

1	searchstrategy_17-12-2018.pdf

ERegister / Renewals

3rd: 13 Jan 2020

From 25/07/2014 - To 25/07/2015

4th: 13 Jan 2020

From 25/07/2015 - To 25/07/2016

5th: 13 Jan 2020

From 25/07/2016 - To 25/07/2017

6th: 13 Jan 2020

From 25/07/2017 - To 25/07/2018

7th: 13 Jan 2020

From 25/07/2018 - To 25/07/2019

8th: 13 Jan 2020

From 25/07/2019 - To 25/07/2020

9th: 20 Jul 2020

From 25/07/2020 - To 25/07/2021

10th: 14 Jul 2021

From 25/07/2021 - To 25/07/2022

11th: 13 Jul 2022

From 25/07/2022 - To 25/07/2023

12th: 21 Jul 2023

From 25/07/2023 - To 25/07/2024

13th: 19 Jul 2024

From 25/07/2024 - To 25/07/2025

14th: 17 Jul 2025

From 25/07/2025 - To 25/07/2026