23. The transcription factor X binds a 10 base pair DNA stretch. In the DNA of an organism, X was found to bind at 20 distinct sites. An analysis of these 20 binding sites showed the following distribution:

Base  1   2   3   4   5   6   7   8   9   10
A     11  0   0   0   16  2   4   0   4   3
T     3   0   19  0   0   1   3   20  2   4
G     4   20  0   0   2   4   6   0   12  2
C     2   0   1   20  1   11  6   0   2   11

What is the consensus sequence for the binding site of X?

  • (a) NGTCNNNTNN
  • (b) AGTCACNTGC
  • (c) CACCTANCTG
  • (d) ANNNACGNGC

    Introduction

    Consensus sequences are widely used in bioinformatics and molecular biology to summarize conserved DNA motifs recognized by proteins such as transcription factors. This solved example explains how to extract a consensus sequence from a binding site table, interpret base frequencies, and analyze multiple-choice answers to select the correct DNA motif.


    Question Recap

    A transcription factor X binds a 10-bp DNA stretch.
    Twenty binding sites were observed, and base frequencies at each position are given.

    We identify the most frequent nucleotide at every position to derive the consensus.


    Step-by-Step Consensus Derivation

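    Position  A   T   G   C   Consensus
    1         11  3   4   2   A
    2         0   0   20  0   G
    3         0   19  0   1   T
    4         0   0   0   20  C
    5         16  0   2   1   A
    6         2   1   4   11  C
    7         4   3   6   6   N (G/C tie)
    8         0   20  0   0   T
    9         4   2   12  2   G
    10        3   4   2   11  C

    At position 7, no base dominates (G=6, C=6, A=4, T=3): G and C tie for the highest count, so the position is ambiguous.
    Most exams assign N where no nucleotide clearly dominates; in IUPAC notation, a G-or-C position would be written S.
    The counts given for positions 5, 6, and 7 add up to 19, 18, and 19 rather than 20, which does not change the majority base at any position.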
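
The majority rule used above can be sketched in a few lines of Python. This is a minimal illustration, not a standard tool: the name `consensus` and the choice to write N for any tie in the top count are assumptions made for this example, and the counts are copied from the table in the question.

```python
# Counts per position (1-10), copied from the question's table.
COUNTS = {
    "A": [11, 0, 0, 0, 16, 2, 4, 0, 4, 3],
    "T": [3, 0, 19, 0, 0, 1, 3, 20, 2, 4],
    "G": [4, 20, 0, 0, 2, 4, 6, 0, 12, 2],
    "C": [2, 0, 1, 20, 1, 11, 6, 0, 2, 11],
}


def consensus(counts):
    """Return the most frequent base per position, or N when the top count is tied.

    Hypothetical helper for this example; the tie-to-N rule is an assumption
    that mirrors the exam convention described in the text.
    """
    length = len(next(iter(counts.values())))
    result = []
    for i in range(length):
        column = {base: row[i] for base, row in counts.items()}
        top = max(column.values())
        winners = [base for base, n in column.items() if n == top]
        result.append(winners[0] if len(winners) == 1 else "N")
    return "".join(result)


print(consensus(COUNTS))  # prints AGTCACNTGC
```

Only position 7 produces a tie (G and C at 6 each), so it is the only position written as N.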

    Final Consensus

    AGTCACNTGC


    Correct Answer

    (b) AGTCACNTGC


    Option-by-Option Analysis

    (a) NGTCNNNTNN

    ❌ Incorrect

    • First base should be A, not N

    • Too many Ns: positions 5, 6, 9, and 10 also have a clear single-base majority (A, C, G, and C)

    (b) AGTCACNTGC

    ✅ Correct

    • Matches the majority base at 9 positions

    • Uses N at position 7 where no base strongly dominates

    • Exactly reflects the binding frequency table

    (c) CACCTANCTG

    ❌ Incorrect

    • Position 1 starts with C instead of A

    • Places A at position 2 and C at position 8, where G and T respectively occur in all 20 sites; it agrees with the consensus only at positions 4 and 7

    (d) ANNNACGNGC

    ❌ Incorrect

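    • Writes N at positions 2, 3, 4, and 8, although each has a single base in 19 or 20 of the 20 sites

    • Assigns G at position 7, where G and C tie at 6 each

    • Data support strong consensus at most positions; only one should be ambiguous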
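
The option-by-option comparison can also be checked mechanically. The sketch below is illustrative only: `mismatches` is a hypothetical helper, the consensus string is the one derived above, and option (d) is taken in the 10-letter form used in this analysis.

```python
# Consensus derived from the frequency table, and the four answer options.
CONSENSUS = "AGTCACNTGC"
OPTIONS = {
    "a": "NGTCNNNTNN",
    "b": "AGTCACNTGC",
    "c": "CACCTANCTG",
    "d": "ANNNACGNGC",  # 10-letter form, as written in the analysis above
}


def mismatches(candidate, reference):
    """Return the 1-based positions where candidate differs from reference."""
    return [
        i + 1
        for i, (x, y) in enumerate(zip(candidate, reference))
        if x != y
    ]


for label, seq in OPTIONS.items():
    print(label, mismatches(seq, CONSENSUS))
```

Only option (b) yields an empty list. Option (a) differs at positions 1, 5, 6, 9, and 10; option (c) at every position except 4 and 7; and option (d) at positions 2, 3, 4, 7, and 8.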


    Conclusion

    Interpreting nucleotide frequency tables is a fundamental bioinformatics skill. By selecting the most frequent base at each position and using N only for ambiguous positions, we derive the correct consensus sequence:

    AGTCACNTGC

    This approach helps identify regulatory motifs in DNA, predict transcription factor binding sites, and analyze genetic control networks.
