Btrim is a fast and lightweight tool for trimming adapters and low-quality regions from reads produced by ultra-high-throughput next-generation sequencing machines. It was used in the largest genetic sequencing study of human diseases (Nature, 2013, 498, 232-235: "Negligible impact of rare autoimmune-locus coding-region variants on missing heritability").

Two Linux executables can be downloaded: Btrim32 for 32-bit machines and Btrim64 for 64-bit machines. If the program does not run on your Linux machine, try btrim32-static or btrim64-static.

If you use these programs in your research, please cite:

Kong, Y (2011) Btrim: A fast, lightweight adapter and quality trimming program for next-generation sequencing technologies, Genomics, 98, 152-153. http://dx.doi.org/10.1016/j.ygeno.2011.05.009

Some major changes were introduced in version 0.3.0; see the file "changelog" for details. Some examples are listed below. More instructions and examples can be found in the file "howto", including a better way to handle barcodes and the "N" letter in Illumina sequences, and how to trim the Illumina AGATCGGAAGAGC adapter.
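As a minimal sketch of preparing the adapter file passed to -p: example (2) below describes it as one adapter pair per line, in two tab-delimited columns (5'-adapter first, 3'-adapter second). The file name adapters.txt is borrowed from that example. The 5' sequence used here is a made-up placeholder and must be replaced with your own; only the 3' sequence (AGATCGGAAGAGC) comes from this document. The "howto" file remains the authoritative guide.

```shell
# Build a one-line adapter file: 5'-adapter <TAB> 3'-adapter.
# ASSUMPTION: this 5' sequence is a hypothetical placeholder, not a real adapter.
five_prime_adapter="ACGTACGTACGT"
# The 3' sequence is the Illumina adapter named in this document.
three_prime_adapter="AGATCGGAAGAGC"

# printf expands \t to a real tab, which is easy to get wrong in an editor.
printf '%s\t%s\n' "$five_prime_adapter" "$three_prime_adapter" > adapters.txt

# Sanity check: every line must have exactly two tab-separated columns.
awk -F'\t' 'BEGIN { bad = 0 } NF != 2 { bad = 1 } END { exit bad }' adapters.txt \
    && echo "adapters.txt: two tab-delimited columns per line"
```

The resulting file could then be passed as -p adapters.txt, as in example (2). Using printf avoids editors that silently convert tabs to spaces.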
Options (the same will be printed if the program is run without any options):

btrim: -q -p -t -o [-u 5'-error -v 3'-error -l minlen -b <5'-cut> -e <3'-cut> -w -a -f <5'-trim> -I]

Required for pattern trimming:
    -p              each line contains one pair of 5'- and 3'-adaptors; ignored if -q in effect
    -t              fastq file to be trimmed
    -o              fastq file of trimmed sequences

Required for quality trimming (-q in effect):
    -t              fastq file to be trimmed
    -o              fastq file of trimmed sequences

Optional:
    -q              toggle to quality trimming [default=adaptor trimming]
    -3              3'-adaptor trimming only [default=off]
    -P              pass if no adaptor is found [default=off]
    -Q              do a quality trimming even if adaptor is found [default=off]
    -s              detailed trimming info for each sequence
    -u <5'-error>   maximum number of errors in 5'-adaptor [default=3]
    -v <3'-error>   maximum number of errors in 3'-adaptor [default=4]
    -l              minimal insert size [default=25]
    -b <5'-range>   the length of sequence to look for 5'-adaptor at the beginning of the sequence [default=1.3 x adaptor length]
    -e <3'-range>   the starting position to look for 3'-adaptor at the end of the sequence [default: the 5'-trimming point]
    -w              size of moving window for quality trimming [default=5]
    -a              cutoff for average quality scores within the moving window for quality trimming [default=15]
    -f <5'-trim>    number of bases to be trimmed at 5'-end [default=0]
    -I              toggle to case sensitive search [default=case insensitive]
    -c              toggle to check fastq file [default=no check]
    -i              toggle to fastq format with phred_offset=64 [default=phred_offset=33]
    -B              barcode assignment
    -k              keep all reads in the same output file specified by -o, even for failed reads (if -B is used, the reads are put in a file named "failed_reads_pid.fastq" (where pid is the process id) in the current directory; -K can be used for another file name) [default=no]
    -K              keep failed reads in a separate output file (overwrites -k)
    -z              compress the output file [default=no]
    -Z              zip command and options (put the entire command and option within quotes) [default="/bin/gzip -f"]
    -T              3'-end search first, then the best match's 5'-adaptor is used for 5'-end search [default=5'-end search first]
    -C              when this option is used, don't check input sequence file and assume it is not zipped; useful if named pipe is used, such as -t <(gunzip -c *.gz) [default: check zipped or unzipped automatically]

Examples:

(1) Btrim -w 10 -a 25 -p illumina_adapter.txt -3 -P -o output.fastq -l 40 \
        -t <(gunzip -c path_to_your_fastq/*.gz ) -C -z

    Use the "pattern" file on this same web site to trim the Illumina AGATCGGAAGAGC adapter. This setting can routinely achieve a 97-99% mapping rate.

(2) Btrim64 -t input_sequence.txt -p adapters.txt -o output.txt

    Trim the FASTQ file "input_sequence.txt" using the adapters in "adapters.txt" and write the output to "output.txt". Each line in "adapters.txt" contains two tab-delimited columns: the first is the 5'-adapter, the second the 3'-adapter.

(3) Btrim64 -p adapters.txt -t s_1_sequence.txt -o s_1.out -s s_1.sum -P -3 -Q -v 1

    Trim the FASTQ file "s_1_sequence.txt" using the adapters in "adapters.txt", writing the output to "s_1.out" and the detailed trimming information to "s_1.sum". Only the 3'-adapters are used (the 5'-adapters in "adapters.txt" are ignored), and the maximum number of errors in the 3'-adapter is set to 1. Whether or not an adapter is found, the read is passed on to quality trimming (-P -Q).

Yong Kong
Yale University
Contact: yong.kong@yale.edu