Q.43 The sequence of a 1 Mb long DNA is random. This DNA has all four bases
occurring in equal proportion. The number of nucleotides, on average, between two
successive EcoRI recognition sites (GAATTC) is ___________.
Calculation Method
The EcoRI site GAATTC spans 6 specific bases. In random DNA where A, T, C, and G each occur with probability 1/4, the chance that the 6-base window starting at any given position matches this exact sequence is (1/4)^6 = 1/4,096. A site therefore starts once every 4,096 bases on average, so 4,096 − 1 = 4,095 nucleotides lie between the start positions of two successive sites.
Step-by-Step Derivation
- Probability per base match: p = 1/4 = 0.25.
- For 6 bases: p^6 = (0.25)^6 ≈ 0.00024414, or 1 in 4,096 positions.
- In a long sequence like 1 Mb (1,000,000 bp), expected sites ≈ 1,000,000/4,096≈244.
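- Average spacing: total length divided by number of sites, 1,000,000/244 ≈ 4,098 bp, consistent with the expected 4,096 bp between successive site starts (the small difference comes from rounding 244.14 down to 244). Excluding the start position itself leaves ≈4,095 nucleotides between starts.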
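The arithmetic in the steps above can be checked with a few lines of Python. This is only a sketch using exact fractions; the variable names are illustrative and not part of the original question, and the count of candidate positions assumes a linear (non-circular) 1 Mb molecule.

```python
from fractions import Fraction

SITE = "GAATTC"            # EcoRI recognition sequence from the question
GENOME_LENGTH = 1_000_000  # 1 Mb, as stated in the question

# Probability that one specific base matches, with all four bases equally likely
p_base = Fraction(1, 4)

# Probability that a 6-base window matches the whole site
p_site = p_base ** len(SITE)

# Mean start-to-start interval between successive sites, in bp
mean_start_to_start = 1 / p_site

# Number of 6-base windows in a linear molecule (assumption: not circular)
window_positions = GENOME_LENGTH - len(SITE) + 1
expected_sites = window_positions * p_site

print("p(site)              =", p_site)
print("mean start-to-start  =", mean_start_to_start, "bp")
print("expected sites       =", round(float(expected_sites), 2))
```

Using `Fraction` keeps the values exact, so no rounding is introduced before the final comparison.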
No options are provided in the query, but common distractors include 256 (for 4-base cutters like TaqI), 1,024 (for a 5-base site), or 4,096 (the start-to-start interval 4^6, reported without subtracting 1).
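The distractor values all come from the same rule, 4^n for an n-base site. A minimal sketch, assuming uniform random DNA and a fixed (non-degenerate) recognition sequence; the function name `mean_site_interval` is hypothetical:

```python
def mean_site_interval(site_length: int, alphabet_size: int = 4) -> int:
    """Expected start-to-start spacing (bp) of a fixed site in uniform random DNA."""
    return alphabet_size ** site_length

# Compare 4-, 5-, and 6-base recognition sites
for n in (4, 5, 6):
    print(f"{n}-base site: one site per {mean_site_interval(n)} bp on average")
```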
Introduction to EcoRI Site Frequency
The EcoRI recognition site GAATTC is a classic 6-base palindrome in molecular biology, central to restriction digestion in cloning and genomics. In random DNA where all four bases occur equally, the distance between two successive EcoRI sites is approximately geometrically distributed with a mean of 4^6 = 4,096 bp, which gives a definite expected spacing for exam problems. This topic appears frequently in IIT JAM Biotechnology and CSIR NET Life Sciences.
Detailed Probability Calculation
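Each of the six positions has a 1/4 chance of matching its required base (G, A, A, T, T, C, in that order). Thus, the full site probability is (1/4)^6 = 1/4,096. For a 1 Mb DNA, a site starts every 4,096 bp on average, so nucleotides between successive site starts = 4,095 (subtract 1 from the start-to-start interval; subtracting the full 6-base site length instead would give 4,090 nucleotides from the end of one site to the start of the next). Overlap is not a concern here, because GAATTC cannot overlap a shifted copy of itself.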
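The expected spacing can also be checked empirically. The following is a rough simulation sketch, not part of the original solution: it generates uniform random DNA with Python's standard `random` module, locates every GAATTC, and averages the start-to-start gaps. The helper names and the seed are arbitrary choices, and a single 1 Mb run contains only a few hundred sites, so the result should land near 4,096 bp with noticeable sampling noise.

```python
import random

def find_site_starts(seq: str, site: str) -> list[int]:
    """Return the 0-based start index of every occurrence of site in seq."""
    starts = []
    pos = seq.find(site)
    while pos != -1:
        starts.append(pos)
        pos = seq.find(site, pos + 1)
    return starts

def simulate_mean_spacing(length: int = 1_000_000, site: str = "GAATTC", seed: int = 0):
    """Simulate uniform random DNA; return (site count, mean start-to-start gap)."""
    rng = random.Random(seed)
    seq = "".join(rng.choices("ACGT", k=length))  # each base has probability 1/4
    starts = find_site_starts(seq, site)
    if len(starts) < 2:
        return len(starts), None  # no gap can be measured
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    return len(starts), sum(gaps) / len(gaps)

if __name__ == "__main__":
    n_sites, mean_gap = simulate_mean_spacing()
    print(f"sites found: {n_sites}, mean start-to-start gap: {mean_gap:.0f} bp")
```

Averaging over several seeds or a longer sequence tightens the estimate around the theoretical value.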
Exam Relevance and Common Errors
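In competitive exams, common errors are confusing 6-cutters with 4-cutters (a site every 256 bp, spacing 255) or forgetting to subtract 1 and picking 4,096. A quick check: 1,000,000 bp / 244 sites ≈ 4,098 bp per site, consistent with the 4,096 bp start-to-start interval. The palindromic, double-stranded nature of the site (the complementary strand, 3′-CTTAAG-5′, also reads GAATTC in the 5′→3′ direction) doesn’t alter its frequency in a random sequence.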