bed.words¶
plot nucleotide words frequencies over the length of intervals
description¶
bed.words plots the frequency of nucleotide words along intervals aligned by their 5’ end or their midpoint. Currently only plots dinucleotide frequencies.
usage¶
bed.words ( bedfiles, genomefa , pdfname , wordlengths = 2 , numbases=50, sizerange=c(1,Inf), numfrags=NULL, reference="center", symmetric=TRUE , cores="max", lspan=0, slop=0 , strand=FALSE )
arguments¶
Main options | Description |
---|---|
bedfiles | a vector of bed file names from which to draw nucleotide word frequencies. |
genomefa | path to a whole-genome fasta from which to obtain sequences from intervals. chromosome names in fasta and bed must match. |
pdfname | name of output pdf. must contain .pdf extension |
numbases | number of bases to examine in the 3’ direction from the 5’ end of the extracted sequences. Must be an even number. |
sizerange | numeric vector of length 2 specifying the minimum and maximum sizes of intervals to examine. Default is c(1,Inf), which causes all fragment sizes to be examined. |
numfrags | positive integer specifying the number of intervals to sample to calculate nucleotide word frequencies. Default is NULL, which causes all of intervals within constraints of ‘sizerange’ to be examined. |
reference | string defining if intervals should be aligned by their “center” or by their 5’ “end”. If “center”, intervals will be aligned by their center and numbases/2 bp on each side of the midpoint will be examined. |
strand | boolean indicating if strand should be taken into account if available in bed file. Default is FALSE. |
slop | integer specifying how many bases to shift the 5’ base to the 5’ (positive integer) or 3’ (negative integer) direction before extracting sequences. useful to examine regions flanking (positive) intervals in addition to the interval itself. |
lspan | loess span to use when smoothing nucleotide word frequency plots. Default is 0 (no smoothing). |
symmetric | boolean value inidicating if the plots shoudl be symmetrized by averaging each side. Default is TRUE. |
cores | positive integer specifying the number of files to examine simultaneously. |
output¶
bed.words will output a pdf of nucleotide word plots.
examples¶
generate dinucleotide frequency plots of 70 bp on each side of the midpoint of 140-150 bp intervals from two files¶
make a list of bed file names
> beds <- c( "untreated-MNaseSeqFragments.bed" , "treated-MNaseSeqFragments.bed" )
generate dinucleotide frequency plots
> bed.words ( beds , "/path/to/genome.fa" , "output.pdf" , numbases = 140 , sizerange = c(140,150) , reference = "center" )