• btrim

    A fast and accurate program, written in C, that trims adapters, barcodes, and low-quality regions from next-generation sequencing reads and bins the reads. The search algorithm is based on Eugene Myers' fast bit-vector algorithm.

    • Yong Kong (2011) Btrim: A fast, lightweight adapter and quality trimming program for next-generation sequencing technologies, Genomics, 98, 152-153. [doi] [pubmed] [Elsevier] [arXiv]
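
    To illustrate the kind of search that Myers' bit-vector algorithm performs, here is a minimal Python sketch of the textbook formulation for approximate pattern matching. It is not btrim's C source; the function name myers_search and its parameters are hypothetical, and the adapter and read in the usage lines are made-up strings.

```python
def myers_search(pattern, text, max_errors):
    """Report (end_index, distance) wherever `pattern` matches a substring
    of `text` ending at end_index with edit distance <= max_errors.

    Illustrative sketch of Myers' bit-vector recurrence; not btrim's code.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")

    # Bit i of peq[c] is set when pattern[i] == c.
    peq = {}
    for i, c in enumerate(pattern):
        peq[c] = peq.get(c, 0) | (1 << i)

    mask = (1 << m) - 1   # emulate an m-bit machine word
    high = 1 << (m - 1)   # bit for the last pattern row
    pv, mv = mask, 0      # vertical deltas: +1 everywhere, -1 nowhere
    score = m             # distance in the last row of the current column
    hits = []

    for j, c in enumerate(text):
        eq = peq.get(c, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask   # horizontal +1 deltas
        mh = pv & xh                    # horizontal -1 deltas
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        # Shifting in a 0 lets a match start anywhere in the text.
        ph = (ph << 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
        if score <= max_errors:
            hits.append((j, score))
    return hits


# Hypothetical adapter and read, for illustration only.
adapter = "AGATCGG"
read = "TTTTAGATCGGAAA"
hits = myers_search(adapter, read, 1)
```

    Each text character costs a constant number of word operations, which is why the approach suits scanning many reads for a short adapter. Python integers are unbounded, so the sketch masks every result to the pattern length to mimic a fixed-width word.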

  • Genotyping using chromatograms

    Low sequencing quality inevitably leads to base-calling errors, which in turn produce incorrect genotypes. In addition, heterozygous alleles are co-amplified and sequenced in the same Sanger sequencing reaction, so their sequences often contain ambiguous bases, which usually have a higher error rate at the base-calling stage. The ultimate source of resolution is the chromatogram from which the sequence is called. Reading chromatograms manually, especially those of heterozygous sequences, is laborious and error-prone.

    I developed a program that genotypes automatically and directly from chromatograms. The program is highly accurate. An online version of the program is available here. The program needs two files:

    • a text file that contains the names of the genotypes and the corresponding sequences,
    • and the chromatogram file.
    The program searches every sequence in the first file against the chromatogram file to find the best match.
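
    The actual algorithm is unpublished, so the following Python sketch is only a toy illustration of the general idea of scoring each named candidate sequence against chromatogram signal and keeping the best one. Everything in it is an assumption: the function name best_genotype, the representation of the trace as one dictionary of channel intensities per base position, and the scoring rule (mean fraction of signal in the expected channel).

```python
def best_genotype(candidates, trace):
    """Return (score, name, offset) of the best-scoring candidate, or None.

    candidates: dict mapping genotype name -> sequence (hypothetical format)
    trace: list of dicts mapping base -> peak intensity at each position
           (a simplified stand-in for a real chromatogram file)
    """
    best = None
    for name, seq in candidates.items():
        if not seq:
            continue  # nothing to score
        # Slide the candidate along the trace; too-long candidates are skipped.
        for offset in range(len(trace) - len(seq) + 1):
            score = 0.0
            for i, base in enumerate(seq):
                peaks = trace[offset + i]
                total = sum(peaks.values())
                if total > 0:
                    # Fraction of the signal carried by the expected base.
                    score += peaks.get(base, 0.0) / total
            score /= len(seq)
            if best is None or score > best[0]:
                best = (score, name, offset)
    return best


# Made-up data: a clean trace spelling ACGTAC and two hypothetical alleles.
trace = [{b: 100.0} for b in "ACGTAC"]
candidates = {"allele_1": "GTA", "allele_2": "GGA"}
result = best_genotype(candidates, trace)
```

    A real implementation would also have to parse the chromatogram format, cope with peak spacing, and model the overlapping peaks of heterozygous samples, none of which this sketch attempts.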

    The algorithm itself has not been published. It was used in the following publications:

    • Natalie R Powers, John D Eicher, Falk Butter, Yong Kong, Laura L Miller, Susan M Ring, Matthias Mann, Jeffrey R Gruen (2013) Alleles of a polymorphic ETV6 binding site in DCDC2 confer risk of reading and language impairment, The American Journal of Human Genetics, 93, 19-28. [doi] [pubmed] [Cell]
    • Natalie R Powers, John D Eicher, Laura L Miller, Yong Kong, Shelley D Smith, Bruce F Pennington, Erik G Willcutt, Richard K Olson, Susan M Ring, Jeffrey R Gruen (2016) The regulatory element READ1 epistatically influences reading and language, with both deleterious and protective alleles, Journal of Medical Genetics, 53, 163-171. [doi]

  • Convert gene symbols to Ensembl IDs

    An online program that converts gene symbols to Ensembl IDs.

  • Maple code for Type III runs


    • Yong Kong (2015) Number of appearances of events in random sequences: a new generating function approach to Type II and Type III runs, Annals of the Institute of Statistical Mathematics, 69, 489-495. [doi] [Springer] [Maple code]

  • Distributions of positive signals in pyrosequencing


  • Length distribution of sequencing by synthesis: fixed flow cycle model


  • Calculating complexity of large randomized libraries



Publications from independent research

Publications from collaborative research

All publications

Last updated: 2017/7/20