23. The transcription factor X binds a 10 base pair DNA stretch. In the DNA of an organism, X was found to bind at 20 distinct sites. An analysis of these 20 binding sites showed the following distribution:
| Base | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 11 | 0 | 0 | 0 | 16 | 2 | 4 | 0 | 4 | 3 |
| T | 3 | 0 | 19 | 0 | 1 | 3 | 4 | 20 | 2 | 4 |
| G | 4 | 20 | 0 | 0 | 2 | 4 | 6 | 0 | 12 | 2 |
| C | 2 | 0 | 1 | 20 | 1 | 11 | 6 | 0 | 2 | 11 |
What is the consensus sequence for the binding site of X?
- (a) NGTCNNNTNN
- (b) AGTCACNTGC
- (c) CACCTANCTG
- (d) ANNNACGNGC
Introduction
Consensus sequences are widely used in bioinformatics and molecular biology to summarize conserved DNA motifs recognized by proteins such as transcription factors. This solved example explains how to extract a consensus sequence from a binding site table, interpret base frequencies, and analyze multiple-choice answers to select the correct DNA motif.
Question Recap
A transcription factor X binds a 10-bp DNA stretch.
Twenty binding sites were observed, and base frequencies at each position are given. We identify the most frequent nucleotide at every position to derive the consensus.
Step-by-Step Consensus Derivation
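| Position | A | T | G | C | Consensus |
|---|---|---|---|---|---|
| 1 | 11 | 3 | 4 | 2 | A |
| 2 | 0 | 0 | 20 | 0 | G |
| 3 | 0 | 19 | 0 | 1 | T |
| 4 | 0 | 0 | 0 | 20 | C |
| 5 | 16 | 1 | 2 | 1 | A |
| 6 | 2 | 3 | 4 | 11 | C |
| 7 | 4 | 4 | 6 | 6 | Ambiguous: C/G tie (N or S) |
| 8 | 0 | 20 | 0 | 0 | T |
| 9 | 4 | 2 | 12 | 2 | G |
| 10 | 3 | 4 | 2 | 11 | C |
At position 7, no base dominates: C and G tie at 6 each (A=4, T=4), so the position is ambiguous.
Most exams assign N where no nucleotide clearly dominates.
Final Consensus
AGTCACNTGC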
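The derivation above can be sketched in a few lines of Python. This is a minimal illustration, not a standard library routine: the names `COUNTS` and `consensus` are invented here, and the tie rule (emit N when two or more bases share the top count) is the exam convention described above, taken as an assumption.

```python
# Counts per position (1..10) for each base; every column sums to the 20 sites.
# Position 3 uses A = 0, as given in the question table.
COUNTS = {
    "A": [11, 0, 0, 0, 16, 2, 4, 0, 4, 3],
    "T": [3, 0, 19, 0, 1, 3, 4, 20, 2, 4],
    "G": [4, 20, 0, 0, 2, 4, 6, 0, 12, 2],
    "C": [2, 0, 1, 20, 1, 11, 6, 0, 2, 11],
}


def consensus(counts, ambiguous="N"):
    """Return the most frequent base per position, or `ambiguous` on a tie for first place."""
    length = len(next(iter(counts.values())))
    result = []
    for i in range(length):
        # Collect the four counts for this position.
        column = {base: row[i] for base, row in counts.items()}
        top = max(column.values())
        winners = [base for base, n in column.items() if n == top]
        result.append(winners[0] if len(winners) == 1 else ambiguous)
    return "".join(result)


print(consensus(COUNTS))  # AGTCACNTGC
```

Only position 7 produces a tie (C = G = 6), so it is the only N in the output.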
Correct Answer
✔ (b) AGTCACNTGC
Option-by-Option Analysis
(a) NGTCNNNTNN
❌ Incorrect
- First base should be A (11 of 20 sites), not N
- Too many Ns: positions 5, 6, 9, and 10 also have a clear single-base majority
(b) AGTCACNTGC
✔ Correct
- Matches the majority base at 9 positions
- Uses N at position 7 where no base strongly dominates
- Exactly reflects the binding frequency table
(c) CACCTANCTG
❌ Incorrect
- Position 1 starts with C instead of A
- Position 2 is A although G occurs at all 20 sites; positions 3, 5, 6, 8, 9, and 10 also disagree with the majority base
(d) ANNNACGNGC
❌ Incorrect
- Replaces the most strongly conserved positions (2, 3, 4, and 8) with N
- Assigns G at position 7, where C and G tie
- Data support strong consensus at most positions; only one should be ambiguous
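The option-by-option check can also be done mechanically. The sketch below is illustrative only: `mismatches` and `OPTIONS` are hypothetical names, option (d) is taken in its 10-letter form as listed in this analysis, and the expected letter at a tied position is assumed to be N.

```python
# Counts per position for each base; every column sums to the 20 sites.
COUNTS = {
    "A": [11, 0, 0, 0, 16, 2, 4, 0, 4, 3],
    "T": [3, 0, 19, 0, 1, 3, 4, 20, 2, 4],
    "G": [4, 20, 0, 0, 2, 4, 6, 0, 12, 2],
    "C": [2, 0, 1, 20, 1, 11, 6, 0, 2, 11],
}

OPTIONS = {
    "a": "NGTCNNNTNN",
    "b": "AGTCACNTGC",
    "c": "CACCTANCTG",
    "d": "ANNNACGNGC",
}


def mismatches(option, counts):
    """List the 1-based positions where `option` departs from the most-frequent-base rule."""
    bad = []
    for i, letter in enumerate(option):
        column = {base: row[i] for base, row in counts.items()}
        top = max(column.values())
        winners = [base for base, n in column.items() if n == top]
        # A unique winner is expected as-is; a tie is expected to be written as N.
        expected = winners[0] if len(winners) == 1 else "N"
        if letter != expected:
            bad.append(i + 1)
    return bad


for label, seq in OPTIONS.items():
    print(label, seq, mismatches(seq, COUNTS))
# a NGTCNNNTNN [1, 5, 6, 9, 10]
# b AGTCACNTGC []
# c CACCTANCTG [1, 2, 3, 5, 6, 8, 9, 10]
# d ANNNACGNGC [2, 3, 4, 7, 8]
```

Only option (b) has an empty mismatch list, which agrees with the analysis above.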
Conclusion
Interpreting nucleotide frequency tables is a fundamental bioinformatics skill. By selecting the most frequent base at each position and using N only for ambiguous positions, we derive the correct consensus sequence:
⭐ AGTCACNTGC
This approach helps identify regulatory motifs in DNA, predict transcription factor binding sites, and analyze genetic control networks.
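As a rough illustration of the binding-site prediction mentioned above, a consensus can be turned into a pattern and searched for in a sequence. This is a simplified sketch under stated assumptions: `find_sites` is a hypothetical helper, the test sequence is invented, N is treated as "any base", and only exact matches on the given strand are reported.

```python
import re

CONSENSUS = "AGTCACNTGC"


def find_sites(sequence, consensus=CONSENSUS):
    """Return 1-based start positions where `sequence` matches `consensus` (N = any base)."""
    pattern = "".join("[ACGT]" if base == "N" else base for base in consensus)
    # The lookahead lets overlapping matches be reported as well.
    return [m.start() + 1 for m in re.finditer(f"(?={pattern})", sequence.upper())]


# Hypothetical sequence, invented for illustration only.
example = "TTAGTCACGTGCAAAGTCACTTGCGG"
print(find_sites(example))  # [3, 15]
```

Exact matching is stricter than real binding behavior: sites that differ from the consensus at a weakly conserved position would be missed by this approach.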


