We compared Affymetrix and Bioconductor annotations for the MOE430A (mouse) GeneChip® array. The mappings of probe sets to LocusLink identifiers (LocusIDs) proved to be dynamic, with many changes between successive annotation releases from both Affymetrix and Bioconductor. From mid-2004 onwards, 49 probe sets are assigned to one LocusID by Affymetrix and to a different LocusID by Bioconductor. For virtually all of these probe sets, the Affymetrix annotation is the one that agrees with the current gene prediction.
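The comparison described above can be illustrated with a minimal sketch, assuming each annotation release has already been loaded into a dictionary from probe set identifier to LocusID. The function name, the dictionary layout, and all identifiers in the example are hypothetical placeholders, not values from the study or from either annotation source.

```python
from typing import Dict, Optional, Tuple

# Hypothetical layout: probe set ID -> LocusID (None when unannotated).
Mapping = Dict[str, Optional[str]]


def discordant_probe_sets(affy: Mapping, bioc: Mapping) -> Dict[str, Tuple[str, str]]:
    """Return probe sets that both sources annotate, but with different LocusIDs."""
    discordant: Dict[str, Tuple[str, str]] = {}
    # Only probe sets present in both sources can be compared.
    for probe_set in sorted(affy.keys() & bioc.keys()):
        affy_locus = affy[probe_set]
        bioc_locus = bioc[probe_set]
        # Skip probe sets that either source leaves unannotated.
        if affy_locus is None or bioc_locus is None:
            continue
        if affy_locus != bioc_locus:
            discordant[probe_set] = (affy_locus, bioc_locus)
    return discordant


# Made-up placeholder data, for illustration only.
affy_example = {"ps_1": "1001", "ps_2": "1002", "ps_3": None, "ps_4": "1004"}
bioc_example = {"ps_1": "1001", "ps_2": "2002", "ps_3": "1003", "ps_5": "1005"}

for probe_set, (a, b) in discordant_probe_sets(affy_example, bioc_example).items():
    print(f"{probe_set}: Affymetrix={a} Bioconductor={b}")
```

Running the same comparison on each pair of successive releases would also show how the set of discordant probe sets changes over time.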
Reference sequence (RefSeq) identifiers are considered to be the gold standard of annotations. However, we could not use these identifiers to judge whether Bioconductor or Affymetrix is more accurate, because not all of the probes in a probe set map to the RefSeq transcript to which that probe set is assigned. Moreover, in some cases, probes align to regions downstream of the 3′ end of a RefSeq transcript.
Adjacent genes were found to be a major cause of discrepancies between the Bioconductor and Affymetrix assignments. Case studies of several probe sets indicated that incorrect assignments are caused by the UniGene cluster assignments of expressed sequence tags representing the probe sets, and by errors in GenBank® sequences.
Our results indicate that a number of errors remain in the annotation sources used by the microarray community.