Q.22 While searching a database for similar sequences, E value does NOT depend on the (A) sequence length (B) number of sequences in the database (C) scoring system (D) probability from a normal distribution

In a sequence database search, the E-value is the number of alignments with a score at least as high as the observed one that are expected to occur by chance alone. It depends on sequence length, database size and the scoring system, but not on a normal distribution, so the correct answer is (D).

Correct Answer

D) probability from a normal distribution.
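The E-value comes from Karlin-Altschul statistics, in which local alignment scores follow an extreme value (Gumbel) distribution rather than a normal distribution: $E = K\,m\,n\,e^{-\lambda S}$, where $S$ is the raw alignment score, $m$ and $n$ are the query and database lengths, and $\lambda$ and $K$ are statistical parameters set by the scoring system.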
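The Karlin-Altschul formula E = K·m·n·e^(−λS) can be evaluated directly. The sketch below also converts E into a P-value, P = 1 − e^(−E), the probability of at least one chance hit scoring S or more; this is the Gumbel tail, not a normal-distribution tail. The values of λ, K, m and n are illustrative placeholders (assumptions), not parameters from any real search.

```python
import math

def e_value(score: float, m: float, n: float, lam: float, k: float) -> float:
    """Karlin-Altschul expected number of chance hits: E = K * m * n * exp(-lam * S)."""
    return k * m * n * math.exp(-lam * score)

def p_value(e: float) -> float:
    """Chance of at least one random hit scoring >= S.

    Random high-scoring hits are treated as Poisson with mean E, so
    P = 1 - exp(-E), the upper tail of the extreme value (Gumbel) distribution.
    """
    return 1.0 - math.exp(-e)

# Placeholder parameters (assumptions): real lambda and K come from the
# search tool's statistics for the chosen matrix and gap penalties.
LAM, K = 0.3, 0.1
m, n = 250, 50_000_000  # hypothetical query length and total database residues

for s in (60, 80, 100):
    e = e_value(s, m, n, LAM, K)
    print(f"S={s:3d}  E={e:.3g}  P={p_value(e):.3g}")
```

For small E, P is almost equal to E, which is why the two are often used interchangeably for strong hits.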

Option Explanations

A) Sequence Length

The query length (m) and the total database length (n) together set the search space m·n in the E-value formula.
Longer sequences increase opportunities for chance hits, raising E-values for the same score.
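Because E grows linearly with m·n, a longer query needs a higher raw score to reach the same E-value. Solving E = K·m·n·e^(−λS) for S gives S = (ln(K·m·n) − ln E)/λ, so doubling m raises the required score by ln 2/λ. The λ, K and database size below are placeholder assumptions.

```python
import math

def score_for_e(target_e: float, m: float, n: float, lam: float, k: float) -> float:
    """Raw score S needed for E = target_e, from solving E = K*m*n*exp(-lam*S) for S."""
    return (math.log(k * m * n) - math.log(target_e)) / lam

LAM, K = 0.3, 0.1  # illustrative placeholder parameters (assumptions)
n = 50_000_000     # hypothetical total database residues

for m in (100, 200, 400):
    needed = score_for_e(1e-5, m, n, LAM, K)
    print(f"query length {m}: need S >= {needed:.1f} for E <= 1e-5")
```

The required score rises only logarithmically with query length, while E itself rises linearly.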

B) Number of Sequences in the Database

Database size sets n, the total (effective) number of residues summed over all database sequences, so adding sequences enlarges the search space.
Larger databases yield higher E-values for equivalent alignments due to more potential random matches.
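A minimal sketch of how database size enters the formula: n is taken as the plain sum of sequence lengths, ignoring the effective-length (edge-effect) corrections that tools such as BLAST apply. The toy database, score and the λ and K defaults are hypothetical placeholders.

```python
import math

def total_residues(database: dict[str, str]) -> int:
    """n in the E-value formula: total residues across all database sequences."""
    return sum(len(seq) for seq in database.values())

def e_value(score: float, m: float, n: float, lam: float = 0.3, k: float = 0.1) -> float:
    # lam and k defaults are placeholder assumptions, not real matrix parameters.
    return k * m * n * math.exp(-lam * score)

# Toy, hypothetical database; real databases hold millions of sequences.
db = {"seq1": "MKTAYIAKQR", "seq2": "GSHMLEDPVA", "seq3": "MSTNPKPQRK"}
m, score = 50, 40

e_small = e_value(score, m, total_residues(db))
# Same sequences repeated 100 times: 100x more residues, about 100x larger E.
db_big = {f"{name}_copy{i}": seq for i in range(100) for name, seq in db.items()}
e_big = e_value(score, m, total_residues(db_big))
print(e_small, e_big, e_big / e_small)
```

The same alignment score is therefore less significant when reported against a larger database.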

C) Scoring System

The scoring matrix (e.g., BLOSUM62) and gap penalties determine both the raw score S and the statistical parameters λ and K.
Because λ and K convert raw scores into normalized bit scores, the scoring system directly affects the E-value.
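The bit score S' = (λS − ln K)/ln 2 absorbs the scoring-system parameters, after which E = m·n·2^(−S'). The sketch checks that this gives the same E as the raw-score formula; λ, K, m, n and the raw score are placeholder assumptions.

```python
import math

LAM, K = 0.3, 0.1  # illustrative placeholder parameters (assumptions)

def bit_score(raw: float, lam: float = LAM, k: float = K) -> float:
    """Normalized bit score: S' = (lam * S - ln K) / ln 2."""
    return (lam * raw - math.log(k)) / math.log(2)

def e_from_bits(bits: float, m: float, n: float) -> float:
    """E-value from a bit score: E = m * n * 2**(-S')."""
    return m * n * 2.0 ** (-bits)

def e_from_raw(raw: float, m: float, n: float, lam: float = LAM, k: float = K) -> float:
    """E-value from a raw score: E = K * m * n * exp(-lam * S)."""
    return k * m * n * math.exp(-lam * raw)

m, n, raw = 250, 50_000_000, 80
bits = bit_score(raw)
print(f"raw={raw} bits={bits:.1f} "
      f"E(bits)={e_from_bits(bits, m, n):.3g} E(raw)={e_from_raw(raw, m, n):.3g}")
```

Since λ and K are folded into S', bit scores can be compared across scoring systems, whereas raw scores cannot.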

Bioinformatics Applications

E-values guide homology detection in tools like BLAST; cutoffs such as 1e-5 are commonly used to call a hit significant in genomics and protein studies.
Biotechnologists use low E-values to flag candidate homologs, such as putative orthologs, in microbial engineering or gene function prediction.
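Applying an E-value cutoff is a simple filter over search hits. This sketch uses a hypothetical `Hit` record and made-up hit values; the 1e-5 cutoff is only a common convention, and the right threshold depends on the study.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    subject_id: str  # hypothetical identifier of the matched database sequence
    evalue: float

E_CUTOFF = 1e-5  # common convention, not a universal rule

def significant_hits(hits: list[Hit], cutoff: float = E_CUTOFF) -> list[Hit]:
    """Keep hits with E <= cutoff, lowest (most significant) E first."""
    return sorted((h for h in hits if h.evalue <= cutoff), key=lambda h: h.evalue)

# Hypothetical hits, e.g. as they might be read from a search report.
hits = [Hit("protA", 3e-58), Hit("protB", 2e-4), Hit("protC", 1e-22)]
for h in significant_hits(hits):
    print(h.subject_id, h.evalue)
```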
