I'll add more to this file when I have time, but the following are some 
important tips.

- Since the program is fast, trial and error is the best strategy when a new
trimming situation comes up; even a run over the whole fastq file does not take
much time.  For an instant response, you can select a subset of your input
sequences:

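> zcat fastq.gz | head -n 1000 > test.fq

Each FASTQ record takes 4 lines, so this keeps the first 250 reads.  To get a
randomly selected subset instead, join each record onto one line before
shuffling, so that "sort -R" moves whole reads rather than individual lines
(shuffling the raw lines would scramble headers, sequences and qualities):

> zcat fastq.gz | paste - - - - | sort -R | head -n 250 | tr '\t' '\n' > test.fq

Then use test.fq for testing.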
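Shuffling the raw lines of a FASTQ file, rather than whole 4-line records, silently produces an invalid file, so it is worth checking a test subset before using it.  Below is a minimal sketch in bash + awk; check_fastq is a hypothetical helper name, and it only checks the basic 4-line layout, not every detail of the format.

```shell
# check_fastq FILE (hypothetical helper): exit status 0 only if FILE consists
# of complete 4-line FASTQ records with matching sequence/quality lengths.
check_fastq() {
  awk 'NR % 4 == 1 && !/^@/   { bad = 1 }           # header must start with @
       NR % 4 == 2            { len = length($0) }  # remember sequence length
       NR % 4 == 3 && !/^[+]/ { bad = 1 }           # separator must start with +
       NR % 4 == 0 && length($0) != len { bad = 1 } # quality length = sequence length
       END { exit (bad || NR == 0 || NR % 4 != 0) }' "$1"
}
```

Usage: `check_fastq test.fq && echo "test.fq looks like valid FASTQ"`.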

- Pay attention to the lengths of the adaptors: the longer an adaptor, the
more mismatches can be allowed.  The maximum numbers of mismatches are
controlled by the "-u" and "-v" options.  Allowing 4 mismatches for an adaptor
of length 6 makes no sense, because then only 2 of its 6 bases have to match.
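To see why, it helps to estimate how often an unrelated stretch of a read matches an adaptor by chance.  The sketch below (bash + awk; random_match_prob is a hypothetical helper) computes the probability that L independent, uniformly random bases lie within k mismatches of a fixed L-base adaptor.  The uniform-base model is an idealization, and it counts substitutions only, which may not be exactly how Btrim counts mismatches.

```shell
# random_match_prob LEN MAXERR (hypothetical helper): probability that LEN
# independent, uniformly random bases are within MAXERR substitutions of a
# fixed LEN-base adaptor: sum over i=0..MAXERR of C(LEN,i) (3/4)^i (1/4)^(LEN-i).
random_match_prob() {
  awk -v n="$1" -v k="$2" 'BEGIN {
    p = 0
    c = 1                                # binomial coefficient C(n, i)
    for (i = 0; i <= k; i++) {
      if (i > 0) c = c * (n - i + 1) / i
      p += c * (0.75 ^ i) * (0.25 ^ (n - i))
    }
    printf "%.4g\n", p
  }'
}
```

`random_match_prob 6 4` prints 0.4661, so under this model almost every other position of a read would look like a 6-base adaptor, while `random_match_prob 13 2` (a 13-base adaptor such as AGATCGGAAGAGC with 2 mismatches) gives about 1.1e-05.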

- Also pay attention to where the adaptors are located in your reads.  The
options "-b" and "-e" restrict the part of the read in which adaptors are
searched for, so they can be used to trim adaptors at different locations.

- The cutoff threshold of average quality score inside the moving window 
(option "-a") and the size of the window (option "-w") should also be adjusted
to meet your needs.
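Btrim's exact window rule is not spelled out here, so the sketch below uses one common variant of moving-window trimming as an assumption: scan a Phred+33 quality string from the 5' end and cut at the start of the first window of W bases whose average quality is below the cutoff.  It is only meant to show how "-w" and "-a" interact (larger windows smooth out isolated low-quality bases, higher cutoffs trim more); window_cut is a hypothetical helper, and the Phred+33 encoding is also an assumption about your data.

```shell
# window_cut QUAL W CUTOFF (hypothetical helper): scan a Phred+33 quality
# string from the 5' end and print how many bases to keep, cutting at the start
# of the first W-base window whose average quality is below CUTOFF.
window_cut() {
  QUAL="$1" awk -v w="$2" -v a="$3" 'BEGIN {
    q = ENVIRON["QUAL"]                  # via the environment: awk -v would
                                         # process backslash escapes
    for (i = 33; i < 127; i++) score[sprintf("%c", i)] = i - 33
    n = length(q)
    for (s = 1; s + w - 1 <= n; s++) {
      sum = 0
      for (j = s; j < s + w; j++) sum += score[substr(q, j, 1)]
      if (sum / w < a) { print s - 1; exit }
    }
    print n                              # no window failed: keep everything
  }'
}
```

For example, `window_cut 'IIIIIIIIII####' 4 25` prints 8: the window starting at base 9 averages (40+40+2+2)/4 = 21, below the cutoff of 25.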

- The program can search for patterns containing degenerate or wildcard letters:
* Use [] to include degenerate letters in the "pattern" file specified by 
	option "-p".  For example, AT[CG]TAC will match either C or G in
	the third position.
* Use "." as a wildcard that matches any letter.
* Use "^" to negate the letter following it. For example, AT^TGTAC will 
	match anything that is not a T in the third position.
* The degenerate letters can appear multiple times in the patterns, such as
	AT[CG]T[AT]C.
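Because "[..]" and "." mean the same thing in POSIX extended regular expressions, and "^X" corresponds to "[^X]", grep can give a rough preview of how many reads contain a pattern before running Btrim.  A minimal bash sketch follows; to_ere and count_exact are hypothetical helper names.  It counts exact occurrences anywhere in the sequence lines, without the mismatches and position limits Btrim applies, and it assumes "^" is only used before a single letter, as in the examples above.

```shell
# to_ere PATTERN (hypothetical helper): rewrite "^X" as the ERE class "[^X]";
# "[..]" classes and "." need no change.
to_ere() {
  printf '%s\n' "$1" | sed -E 's/\^(.)/[^\1]/g'
}

# count_exact PATTERN FASTQ (hypothetical helper): number of reads whose
# sequence line (line 2 of each 4-line record) contains an exact match.
count_exact() {
  awk 'NR % 4 == 2' "$2" | grep -c -E -e "$(to_ere "$1")"
}
```

Usage: `count_exact 'AT^TGTAC' test.fq`.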

- Due to the parallel nature of the algorithm, the "regular expression" kind
of search described above incurs no extra computational cost: the search time
is the same as for a plain pattern such as ATCGTAC.
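The algorithm itself is not described here.  A classic bit-parallel method with this property is Shift-And matching, sketched below in bash as an assumption rather than as Btrim's implementation.  Each pattern position becomes a bit in a per-letter mask, so a degenerate class or wildcard only sets bits for more letters during preprocessing, and the scan does the same one shift, OR and AND per read base whether the pattern is AT[CG]T[AT]C or ATCGTAC.  This sketch finds exact matches only (tolerating mismatches needs an extension of the same bit-parallel idea), and shift_and is a hypothetical helper name.

```shell
# shift_and PATTERN TEXT (hypothetical helper): print the 1-based end position
# of every exact match of PATTERN in TEXT.  PATTERN may use "[..]", "." and
# "^X" as described above; patterns longer than 63 bases are not supported.
shift_and() {
  local pat=$1 text=$2 alpha=ACGTN
  local -a mask=(0 0 0 0 0)   # mask[x]: pattern positions that accept ${alpha:x:1}
  local i=0 j=0 x k c allowed pre m d=0

  # Preprocessing: set bit j of mask[x] if pattern position j accepts letter x.
  while (( i < ${#pat} )); do
    c=${pat:i:1}
    if [[ $c == "[" ]]; then              # degenerate class, e.g. [CG]
      allowed=${pat:i+1}
      allowed=${allowed%%]*}
      i=$(( i + ${#allowed} + 2 ))
    elif [[ $c == "^" ]]; then            # negation, e.g. ^T
      allowed=${alpha//"${pat:i+1:1}"/}
      i=$(( i + 2 ))
    elif [[ $c == "." ]]; then            # wildcard
      allowed=$alpha
      i=$(( i + 1 ))
    else                                  # plain letter
      allowed=$c
      i=$(( i + 1 ))
    fi
    for (( x = 0; x < ${#alpha}; x++ )); do
      if [[ $allowed == *"${alpha:x:1}"* ]]; then
        mask[x]=$(( mask[x] | (1 << j) ))
      fi
    done
    j=$(( j + 1 ))
  done
  if (( j == 0 )); then return 1; fi

  # Scanning: one shift, OR and AND per text letter, whatever the pattern holds.
  for (( k = 0; k < ${#text}; k++ )); do
    pre=${alpha%%"${text:k:1}"*}          # index of the letter in $alpha
    if (( ${#pre} < ${#alpha} )); then m=${mask[${#pre}]}; else m=0; fi
    d=$(( ((d << 1) | 1) & m ))
    if (( d & (1 << (j - 1)) )); then
      echo $(( k + 1 ))
    fi
  done
}
```

For example, `shift_and 'AT[CG]TAC' 'GGATGTACAA'` prints 8.  Preprocessing cost grows slightly with the number of letters a class allows, but the scanning loop, which dominates for large read files, is unchanged.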

- We can take advantage of this "regular expression" search in real situations.
One example is barcode trimming and assignment for Illumina sequences.  In many
Illumina reads the first base is an "N", and in this case the regular
expression search can be used.  For example, if you have four 6-bp barcodes
CGGAAT, CGTGGC, TGCGTA, and TTCTGG, you can set up the pattern file as follows
(assuming the barcodes are only at the 5'-end):

[CN]GGAAT	ZZZZZZZ
[NC]GTGGC	ZZZZZZZ
[NT]GCGTA	ZZZZZZZ
[NT]TCTGG	ZZZZZZZ

This "regular expression" search method is better than a plain pattern search.
With plain patterns you would have to ignore the first base and use only the
remaining 5 bases, ending up with a pattern file like this:

GGAAT	ZZZZZZZ
GTGGC	ZZZZZZZ
GCGTA	ZZZZZZZ
TCTGG	ZZZZZZZ
 
The 6-base search is better because a 6-base pattern has higher specificity
than a 5-base pattern: assuming uniformly random bases, an unrelated sequence
matches a given 6-base pattern exactly with probability 1/4096, versus 1/1024
for a 5-base pattern.  For short patterns like these you should also use the
"-b" and "-e" options discussed above to restrict the range of barcode
locations, and don't forget to adjust the "-u" and "-v" values.
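Degenerate barcode patterns like the ones in the example above can also be generated from a plain barcode list.  A small bash sketch (make_barcode_patterns is a hypothetical helper) that replaces the first base of each barcode with a class allowing that base or N, and copies the "ZZZZZZZ" second column from the example:

```shell
# make_barcode_patterns BARCODE... (hypothetical helper): print one tab-delimited
# "pattern" line per barcode, allowing the first base to be read as N.
make_barcode_patterns() {
  local bc
  for bc in "$@"; do
    printf '[%sN]%s\tZZZZZZZ\n' "${bc:0:1}" "${bc:1}"
  done
}
```

Usage: `make_barcode_patterns CGGAAT CGTGGC TGCGTA TTCTGG > barcodes.txt` (the file name is only an example).  The letters inside [] may appear in either order, as in the example above.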

- For paired-end sequences, refer to "readme.paired_end" on this web site.

- As a specific example, to trim the Illumina adapter AGATCGGAAGAGC, use

Btrim -w 10 -a 25 -p illumina_adapter.txt -3 -P -o output.fastq -l 40 -t <(gunzip -c path_to_your_fastq/*.gz ) -C -z
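The command above decompresses all *.gz files in the directory into a single input stream, so every sample ends up in one output.fastq.  If you want one output per input file, a loop like the following is a minimal bash sketch.  It reuses the flags above unchanged, assumes files named *.fastq.gz, and trim_each is a hypothetical helper name; check the flags against your own data.

```shell
# trim_each DIR (hypothetical helper): run Btrim separately on every
# DIR/*.fastq.gz file, writing DIR/<name>.trimmed.fastq next to each input.
trim_each() {
  local f out
  for f in "$1"/*.fastq.gz; do
    [ -e "$f" ] || continue                 # the glob matched nothing
    out=${f%.fastq.gz}.trimmed.fastq
    Btrim -w 10 -a 25 -p illumina_adapter.txt -3 -P -o "$out" -l 40 \
      -t <(gunzip -c "$f") -C -z
  done
}
```

Usage: `trim_each path_to_your_fastq`.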

The "pattern" file "illumina_adapter.txt" can be downloaded from this same web
site.  It also serves as an example of the new 6-column "pattern" file
introduced in version 0.3.0.  This feature provides finer control of the
maximum number of errors allowed at both the 5'-end and the 3'-end, and of the
regions of the sequence where the 5'-end and 3'-end adapters are expected.
These controls are specified by extra columns in the "pattern" file given by
the "-p" option.

In detail, each line of the "pattern" file can now contain 6 tab-delimited
columns in the following format:

5'-adapter 3'-adapter 5'-max-err 3'-max-err 5'-pos 3'-pos

A negative 3'-pos is counted from the end of the sequence.  For example,
5'-pos=20 means the search for the 5'-adapter is only carried out in the 0-20
base region, while 3'-pos=-10 means the search for the 3'-adapter covers the
region from sequence_length-10 to the end of the sequence (bases 90-100 of a
100-base read).

The setting in the example above (the Btrim command with illumina_adapter.txt)
can routinely achieve a 97-99% mapping rate.
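A space-separated or incomplete line in a 6-column "pattern" file is easy to miss, so a quick check before running Btrim can help.  The sketch below (bash + awk; check_patterns is a hypothetical helper) assumes every non-blank line is meant to use the 6-column format described above, with the two max-err columns non-negative integers and the two position columns integers (negative values allowed, as for 3'-pos).  The example line in the usage note is hypothetical, not the contents of illumina_adapter.txt.

```shell
# check_patterns FILE (hypothetical helper): report lines that do not follow
# the 6-column tab-delimited layout; non-zero exit status if any are found.
check_patterns() {
  awk -F '\t' '
    NF == 0 { next }                              # ignore blank lines
    {
      ok = (NF == 6 && $1 != "" && $2 != "" &&
            $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ &&
            $5 ~ /^-?[0-9]+$/ && $6 ~ /^-?[0-9]+$/)
      if (!ok) { printf "line %d: not in 6-column format\n", NR; bad = 1 }
    }
    END { exit bad }' "$1"
}
```

For example, a line written as `printf 'ZZZZZZZ\tAGATCGGAAGAGC\t2\t2\t20\t-10\n'` passes, while the same values separated by spaces are reported.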