Q.22 While searching a database for similar sequences, E value does NOT depend on the
(A) sequence length (B) number of sequences in the database
(C) scoring system (D) probability from a normal distribution
The E-value in a sequence database search is the number of alignments scoring at least as high as the observed one that are expected to occur by chance. The correct answer is (D): the E-value is not derived from a probability under a normal distribution.
Correct Answer
D) probability from a normal distribution.
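Scores of random local alignments follow an extreme value (Gumbel) distribution, not a normal distribution. The E-value is calculated as $E = K m n e^{-\lambda S}$, where S is the raw alignment score, m is the query length, n is the total database length, and K and λ are statistical parameters fixed by the scoring system.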
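The formula above can be evaluated directly. The following is a minimal sketch, not BLAST's actual implementation: the function name `e_value` is hypothetical, and the values of λ and K are illustrative assumptions in the range typically reported for BLOSUM62 with gapped alignments. It also ignores the effective-length corrections that real tools apply.

```python
import math

# Illustrative Karlin-Altschul parameters (assumed values, not read from any tool).
LAMBDA = 0.267
K = 0.041


def e_value(raw_score, query_len, db_len, lam=LAMBDA, k=K):
    """Expected number of chance hits: E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


# Hypothetical search: a 300-residue query against 1,000,000 database residues.
e_low_score = e_value(raw_score=60, query_len=300, db_len=1_000_000)
e_high_score = e_value(raw_score=100, query_len=300, db_len=1_000_000)

print(f"S = 60  -> E = {e_low_score:.3g}")
print(f"S = 100 -> E = {e_high_score:.3g}")
```

Because the score sits in a negative exponent, a higher raw score always gives a smaller E-value for the same search space.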
Option Explanations
A) Sequence Length
The query length (m) and the database or subject length (n) multiply to give the search space in the E-value formula.
Longer sequences increase opportunities for chance hits, raising E-values for the same score.
B) Number of Sequences in the Database
Every sequence added to the database increases the total database length (n), expanding the search space.
Larger databases yield higher E-values for equivalent alignments due to more potential random matches.
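The dependence on options (A) and (B) is linear: for a fixed score, doubling either the query length or the total database length doubles the E-value. A small sketch of that scaling follows, using a hypothetical helper `e_value` and assumed illustrative values for λ and K.

```python
import math


def e_value(raw_score, m, n, lam=0.267, k=0.041):
    """E = K * m * n * exp(-lambda * S); lam and k are assumed example values."""
    return k * m * n * math.exp(-lam * raw_score)


score = 80  # same raw score in every case below

base = e_value(score, m=250, n=5_000_000)
doubled_db = e_value(score, m=250, n=10_000_000)   # database twice as large
doubled_query = e_value(score, m=500, n=5_000_000)  # query twice as long

ratio_db = doubled_db / base
ratio_query = doubled_query / base

print(f"E ratio after doubling database length: {ratio_db:.2f}")
print(f"E ratio after doubling query length:    {ratio_query:.2f}")
```

This is why the same alignment can look significant against a small curated database and unremarkable against a large one.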
C) Scoring System
The scoring matrix (e.g., BLOSUM62) and gap penalties determine the raw score S of an alignment and also fix the statistical parameters λ and K.
All three enter the E-value computation directly, and λ and K are what convert a raw score into a normalized bit score.
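The link between the scoring system and the E-value can be made explicit through the bit score, S' = (λS − ln K) / ln 2, which gives E = m·n·2^(−S'). The sketch below checks that both routes give the same E-value and shows how one raw score maps to different E-values under two parameter sets. The function names are hypothetical, and both (λ, K) pairs are assumed illustrative values rather than output from a real search.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalized bit score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_from_raw(raw_score, m, n, lam, k):
    """E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw_score)


def e_from_bits(bits, m, n):
    """E = m * n * 2^(-S'), with the scoring system folded into the bit score."""
    return m * n * 2.0 ** (-bits)


m, n, raw = 300, 1_000_000, 100

# Two assumed scoring systems, each with its own (lambda, K).
system_a = {"lam": 0.267, "k": 0.041}
system_b = {"lam": 0.318, "k": 0.134}

e_a = e_from_raw(raw, m, n, **system_a)
e_b = e_from_raw(raw, m, n, **system_b)
e_a_via_bits = e_from_bits(bit_score(raw, **system_a), m, n)

print(f"System A: E = {e_a:.3g} (via bit score: {e_a_via_bits:.3g})")
print(f"System B: E = {e_b:.3g}")
```

Raw scores from different scoring systems are therefore not directly comparable, whereas bit scores and E-values are.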
Bioinformatics Applications
E-values guide homology detection in tools like BLAST, where cutoffs such as 1e-5 are commonly used to treat a hit as significant in genomics and protein studies.
Biotechnologists use low E-values to identify orthologs in microbial engineering or gene function prediction.