bed.words

plot nucleotide word frequencies along the length of intervals

description

bed.words plots the frequency of nucleotide words along intervals aligned by their 5’ end or their midpoint. Currently, only dinucleotide frequencies are plotted.
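
The core computation can be pictured with the base-R sketch below. It is not the bed.words source: the helper name positional_dinuc_freq is hypothetical, and it assumes the sequences have already been extracted from the genome and aligned to a common reference point, all with the same length. For each position it tabulates the dinucleotide starting there across all sequences and converts the counts to frequencies.

```r
# Hypothetical helper (not part of bed.words): positional dinucleotide
# frequencies for equal-length sequences aligned at a common reference.
positional_dinuc_freq <- function(seqs) {
  seqs <- toupper(seqs)  # treat soft-masked (lowercase) bases like uppercase
  len <- unique(nchar(seqs))
  stopifnot(length(len) == 1, len >= 2)

  bases <- c("A", "C", "G", "T")
  # all 16 dinucleotides in alphabetical order: AA, AC, ..., TT
  dinucs <- paste0(rep(bases, each = 4), rep(bases, times = 4))

  # one column per position i, built from the words seqs[i..i+1]
  freq <- sapply(seq_len(len - 1), function(i) {
    words <- substr(seqs, i, i + 1)
    # fixed factor levels keep all 16 rows; words containing N or other
    # symbols become NA and are not counted
    counts <- as.vector(table(factor(words, levels = dinucs)))
    counts / sum(counts)
  })
  rownames(freq) <- dinucs
  colnames(freq) <- seq_len(len - 1)
  freq
}

# toy example: three aligned 6-bp sequences
seqs <- c("ACGTAC", "ACGTTT", "TTGTAC")
f <- positional_dinuc_freq(seqs)
f["AC", 1]  # 2/3: two of the three sequences start with AC
```

Each row of the resulting matrix is one dinucleotide's frequency track across positions.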

usage

bed.words(bedfiles, genomefa, pdfname, wordlengths = 2, numbases = 50, sizerange = c(1, Inf), numfrags = NULL, reference = "center", symmetric = TRUE, cores = "max", lspan = 0, slop = 0, strand = FALSE)

arguments

Main options	Description
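bedfiles	a character vector of BED file names from which to compute nucleotide word frequencies.
genomefa	path to a whole-genome FASTA file from which to extract the interval sequences. Chromosome names in the FASTA and BED files must match.
pdfname	name of the output PDF. Must include the .pdf extension.
wordlengths	length of the nucleotide words to examine. Default is 2 (dinucleotides), currently the only supported value.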
numbases	number of bases to examine in the 3’ direction from the 5’ end of the extracted sequences. Must be an even number.
sizerange	numeric vector of length 2 specifying the minimum and maximum sizes of intervals to examine. Default is c(1,Inf), which includes intervals of all sizes.
numfrags	positive integer specifying the number of intervals to sample when calculating nucleotide word frequencies. Default is NULL, which causes all intervals within the ‘sizerange’ constraints to be examined.
reference	string specifying whether intervals should be aligned by their “center” or by their 5’ “end”. If “center”, intervals are aligned by their midpoint and numbases/2 bp on each side of the midpoint are examined.
strand	logical indicating whether strand should be taken into account, if it is available in the BED file. Default is FALSE.
slop	integer specifying how many bases to shift the 5’ base in the 5’ (positive integer) or 3’ (negative integer) direction before extracting sequences. Useful for examining regions flanking the intervals (positive values) in addition to the intervals themselves.
lspan	loess span to use when smoothing nucleotide word frequency plots. Default is 0 (no smoothing).
symmetric	logical indicating whether the plots should be symmetrized by averaging the two sides. Default is TRUE.
cores	positive integer specifying the number of files to examine simultaneously. Default is “max”.
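
The interplay of reference, numbases, slop and strand can be illustrated with a small window calculation. This is a sketch under stated assumptions, not the bed.words implementation: extraction_windows is a hypothetical name, coordinates are assumed to be 0-based and half-open as in BED, the 5’ end of a minus-strand interval is taken to be its BED end coordinate, and slop is applied only when reference = "end", since the slop description ties it to the 5’ base.

```r
# Hypothetical helper (not the bed.words source): compute the BED-style
# (0-based, half-open) window to extract for each interval.
# Assumptions: slop shifts the window toward the 5' end (positive) or
# 3' end (negative) and is only applied for reference = "end".
extraction_windows <- function(start, end, strand = rep("+", length(start)),
                               numbases = 50, reference = c("center", "end"),
                               slop = 0) {
  reference <- match.arg(reference)
  stopifnot(numbases %% 2 == 0)  # numbases must be even
  minus <- strand == "-"

  if (reference == "center") {
    # numbases / 2 bases on each side of the (integer) midpoint
    mid <- (start + end) %/% 2
    win_start <- mid - numbases / 2
  } else {
    # on the minus strand the 5' end is the BED end coordinate, and the
    # 3' direction means decreasing coordinates
    win_start <- ifelse(minus, end - numbases + slop, start - slop)
  }
  data.frame(start = win_start, end = win_start + numbases, strand = strand)
}

# two intervals of 140 and 145 bp, aligned by their midpoints
w1 <- extraction_windows(start = c(100, 1000), end = c(240, 1145),
                         numbases = 140, reference = "center")
w1  # windows [100, 240) and [1002, 1142)

# a minus-strand interval aligned by its 5' end, shifted 10 bp upstream
w2 <- extraction_windows(start = 100, end = 240, strand = "-",
                         numbases = 50, reference = "end", slop = 10)
w2  # window [200, 250)
```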

output

bed.words writes a PDF of nucleotide word frequency plots to the file named by ‘pdfname’.
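
The symmetric and lspan options act on the frequency tracks before they are drawn. The sketch below shows one plausible reading of each, plus writing a plot to a PDF with base R graphics. symmetrize and smooth_track are hypothetical names, mirroring each position about the window centre is an assumed interpretation of “averaging the two sides”, and stats::loess is used because lspan is described as a loess span.

```r
# Hypothetical sketches (not the bed.words source).

# Average each position with its mirror image about the window centre
# (assumed meaning of symmetric = TRUE).
symmetrize <- function(freq) {
  (freq + rev(freq)) / 2
}

# Loess-smooth a frequency track; lspan = 0 means no smoothing.
smooth_track <- function(freq, lspan = 0) {
  if (lspan <= 0) return(freq)
  pos <- seq_along(freq)
  fit <- stats::loess(freq ~ pos, span = lspan)
  as.numeric(stats::predict(fit))  # fitted values at each position
}

# toy track: 40 positions of made-up frequencies
set.seed(1)
track <- runif(40, min = 0, max = 0.2)
sym <- symmetrize(track)
smoothed <- smooth_track(sym, lspan = 0.5)

# write the processed track to a PDF file
out <- tempfile(fileext = ".pdf")
grDevices::pdf(out)
graphics::plot(seq_along(smoothed) - length(smoothed) / 2, smoothed,
               type = "l", xlab = "position relative to midpoint",
               ylab = "frequency")
invisible(grDevices::dev.off())
```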

examples

generate dinucleotide frequency plots of 70 bp on each side of the midpoint of 140-150 bp intervals from two files

make a vector of BED file names

> beds <- c("untreated-MNaseSeqFragments.bed", "treated-MNaseSeqFragments.bed")

generate dinucleotide frequency plots

> bed.words(beds, "/path/to/genome.fa", "output.pdf", numbases = 140, sizerange = c(140, 150), reference = "center")
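
The sizerange and numfrags arguments amount to filtering intervals by size and optionally subsampling them. The base-R sketch below mimics that step for the 140-150 bp selection in the example above. select_intervals is a hypothetical name, and the sketch assumes interval size is end - start (BED convention) with both limits of sizerange inclusive.

```r
# Hypothetical helper (not the bed.words source): keep intervals whose
# size lies within sizerange (limits assumed inclusive), then optionally
# sample numfrags of them without replacement.
select_intervals <- function(bed, sizerange = c(1, Inf), numfrags = NULL) {
  size <- bed$end - bed$start  # BED: 0-based, half-open coordinates
  keep <- bed[size >= sizerange[1] & size <= sizerange[2], , drop = FALSE]
  if (!is.null(numfrags) && numfrags < nrow(keep)) {
    keep <- keep[sample.int(nrow(keep), numfrags), , drop = FALSE]
  }
  keep
}

# toy intervals of 145, 120, 150 and 140 bp
bed <- data.frame(chrom = "chr1",
                  start = c(100, 500, 900, 1300),
                  end   = c(245, 620, 1050, 1440))

sel <- select_intervals(bed, sizerange = c(140, 150))  # rows 1, 3 and 4
set.seed(42)
sub <- select_intervals(bed, sizerange = c(140, 150), numfrags = 2)
```

A real BED file could be loaded into this shape with read.table(..., sep = "\t"), naming the first three columns chrom, start and end via col.names.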