Software

  • btrim

    A fast and accurate program, written in C, for trimming adapters, barcodes, and low-quality regions from next-generation sequencing reads and for binning the reads. The search algorithm is based on Eugene Myers' fast bit-vector algorithm.

    Reference:
    • Yong Kong (2011) Btrim: A fast, lightweight adapter and quality trimming program for next-generation sequencing technologies, Genomics, 98, 152-153. [doi] [pubmed] [Elsevier] [arXiv]
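
    To illustrate the kind of search that a bit-vector algorithm performs, here is a minimal Python sketch of Myers' bit-parallel approximate matching. It is not btrim's C source: the function name `myers_search`, its arguments, and its return format are assumptions made for this example. The sketch reports every text position where the pattern matches with at most `k` edits (substitutions, insertions, or deletions).

```python
def myers_search(pattern, text, k):
    """Return (end_index, edit_distance) for every position in `text`
    where `pattern` ends with edit distance <= k.

    Illustrative sketch only; names and return format are assumptions.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must not be empty")

    # One bit mask per character: bit i is set if pattern[i] == character.
    peq = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    mask = (1 << m) - 1   # keep only the m low bits (Python ints are unbounded)
    high = 1 << (m - 1)   # bit that corresponds to the last pattern row

    pv, mv = mask, 0      # vertical +1 / -1 differences of the DP column
    score = m             # distance in the last row of the current column
    hits = []

    for j, ch in enumerate(text):
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask   # horizontal +1 differences
        mh = pv & xh                    # horizontal -1 differences

        if ph & high:
            score += 1
        elif mh & high:
            score -= 1

        # Shifting in a 0 bit lets a match start at any text position.
        ph = (ph << 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv

        if score <= k:
            hits.append((j, score))
    return hits


if __name__ == "__main__":
    # Hypothetical adapter and read, for demonstration only.
    print(myers_search("ACGT", "TTACGTTT", 1))
```

    Each text character is processed with a constant number of word operations when the pattern fits in a machine word, which is what makes the approach fast for short adapters and barcodes.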

  • Genotyping using chromatograms

    Low sequencing quality inevitably leads to errors in base calling, which in turn result in wrong genotypes. In addition, heterozygous alleles are co-amplified and sequenced in the same Sanger sequencing reaction, so the resulting sequences often contain ambiguous bases, which usually have a higher error rate at the base-calling stage. The ultimate source of resolution is the chromatograms from which the sequences are called. Reading chromatograms manually, especially those of heterozygous sequences, is laborious and error-prone.

    I developed a program that genotypes automatically, working directly from the chromatograms. The program is highly accurate. An online version of the program is here. The program needs two files:

    • a text file that contains the names of the genotypes and the corresponding sequences,
    • and the chromatogram file.
    The program searches every sequence in the first file against the chromatogram file to find the best match.
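
    The author's algorithm is unpublished, so the following is not that algorithm. It is only a toy Python sketch of the general idea of scoring candidate sequences against trace intensities instead of against called bases. The data layout (one dictionary of base intensities per trace position), the function names, and the scoring rule are all assumptions made for illustration.

```python
def score_at(trace, seq, offset):
    """Mean fraction of the trace signal carried by the expected base
    when `seq` is laid over `trace` starting at `offset`."""
    total = 0.0
    for i, base in enumerate(seq):
        peaks = trace[offset + i]
        signal = sum(peaks.values())
        if signal > 0:
            total += peaks.get(base, 0.0) / signal
    return total / len(seq)


def best_genotype(trace, genotypes):
    """Return (name, offset, score) of the candidate that best explains
    the trace, or None if no candidate fits inside it."""
    best = None
    for name, seq in genotypes.items():
        if not seq or len(seq) > len(trace):
            continue
        for offset in range(len(trace) - len(seq) + 1):
            s = score_at(trace, seq, offset)
            if best is None or s > best[2]:
                best = (name, offset, s)
    return best


if __name__ == "__main__":
    # Made-up intensities, for demonstration only.
    trace = [
        {"A": 90, "C": 5, "G": 3, "T": 2},
        {"A": 10, "C": 80, "G": 5, "T": 5},
        {"A": 0, "C": 0, "G": 70, "T": 30},
        {"A": 1, "C": 2, "G": 2, "T": 95},
    ]
    genotypes = {"allele_1": "ACGT", "allele_2": "ACTT"}
    print(best_genotype(trace, genotypes))
```

    Because the score uses the full intensity profile, a position with two overlapping peaks still contributes partial support to each candidate base instead of being reduced to a single, possibly wrong, base call.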

    The algorithm itself has not been published. It was used in the following publications:

    References:
    • Natalie R Powers, John D Eicher, Falk Butter, Yong Kong, Laura L Miller, Susan M Ring, Matthias Mann, Jeffrey R Gruen (2013) Alleles of a polymorphic ETV6 binding site in DCDC2 confer risk of reading and language impairment, The American Journal of Human Genetics, 93, 19-28. [doi] [pubmed] [Cell]
    • Natalie R Powers, John D Eicher, Laura L Miller, Yong Kong, Shelley D Smith, Bruce F Pennington, Erik G Willcutt, Richard K Olson, Susan M Ring, Jeffrey R Gruen (2016) The regulatory element READ1 epistatically influences reading and language, with both deleterious and protective alleles, Journal of Medical Genetics, 53, 163-171. [doi]

  • Convert gene symbols to Ensembl IDs

    Online program to convert gene symbols to Ensembl IDs

  • Gene symbols to synonyms and aliases

    Online program to find synonyms and aliases for a list of gene symbols

  • Pattern search in multiple FASTA files with specified error limit

    A program written in C that searches for patterns in multiple FASTA files, allowing up to a specified maximum number of errors (edit distance).
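
    The C program itself is not shown on this page. As a rough illustration of the task, the Python sketch below parses FASTA records and reports where a pattern matches within a maximum edit distance, using the classic dynamic-programming recurrence for approximate matching (Sellers' method). The function names and output format are assumptions for this example, and the real program may work differently.

```python
import io


def parse_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def approx_ends(pattern, text, max_errors):
    """Return (end_index, edit_distance) for each position in `text`
    where `pattern` ends with at most `max_errors` edits."""
    m = len(pattern)
    col = list(range(m + 1))          # column for the empty text prefix
    hits = []
    for j, ch in enumerate(text):
        diag = col[0]                 # value of D[i-1][j-1]
        col[0] = 0                    # a match may start anywhere in the text
        for i in range(1, m + 1):
            cur = min(diag + (pattern[i - 1] != ch),  # match or substitution
                      col[i] + 1,                     # extra text character
                      col[i - 1] + 1)                 # skipped pattern character
            diag = col[i]
            col[i] = cur
        if col[m] <= max_errors:
            hits.append((j, col[m]))
    return hits


def search_fasta(handles, pattern, max_errors):
    """Search every record of every open FASTA handle."""
    results = []
    for handle in handles:
        for name, seq in parse_fasta(handle):
            for end, dist in approx_ends(pattern, seq, max_errors):
                results.append((name, end, dist))
    return results


if __name__ == "__main__":
    # In-memory stand-ins for two FASTA files (made-up sequences).
    f1 = io.StringIO(">seq1\nTTAC\nGTTT\n")
    f2 = io.StringIO(">seq2 example\nGGGGGGGG\n")
    for row in search_fasta([f1, f2], "ACGT", 1):
        print(row)
```

    Real files can be passed in the same way by opening them with `open(path)`, since the parser only needs an iterable of lines.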

  • Maple code for Type III runs

    Reference:

    • Yong Kong (2015) Number of appearances of events in random sequences: a new generating function approach to Type II and Type III runs, Annals of the Institute of Statistical Mathematics, 69, 489-495. [doi] [Springer] [Maple code]

  • Distributions of positive signals in pyrosequencing

    Reference:

  • Length distribution of sequencing by synthesis: fixed flow cycle model

    Reference:

  • Calculating complexity of large randomized libraries

    Reference:


Publications

Publications from independent research

Publications from collaborative research

All publications



Last updated: 2017/7/20