maf_tile(1)    BioRuby Manual    maf_tile(1)

NAME

maf_tile - synthesize an alignment for a given region

SYNOPSIS

maf_tile [options] -i [SEQ:]BEGIN:END [-s SPECIES[:NAME] ...] maf [index]

maf_tile [options] --bed BED -o BASE [-s SPECIES[:NAME] ...] maf [index]

DESCRIPTION

maf_tile takes a MAF file, with optional index, or directory of indexed MAF files, extracts alignment blocks overlapping the given genomic interval, and constructs a single alignment block covering the entire interval for the specified species. Optionally, any gaps in coverage of the MAF file's reference sequence can be filled in from a FASTA sequence file.
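
As a rough illustration of the tiling idea, and not maf_tile's actual implementation, the Ruby sketch below lays sequence fragments for one species over an interval and marks uncovered positions with a fill character. The tile method and the [start, text] fragment representation are hypothetical simplifications; real MAF blocks also carry alignment gaps and strand information, which are ignored here.

```ruby
# Hypothetical sketch: tile sequence fragments over [interval_start, interval_end).
# fragments is an array of [zero_based_start, text] pairs for one species.
def tile(interval_start, interval_end, fragments, fill_char = '*')
  # Start with every position marked as uncovered.
  tiled = fill_char * (interval_end - interval_start)
  fragments.each do |frag_start, text|
    text.each_char.with_index do |ch, i|
      pos = frag_start + i
      # Skip any part of the fragment that falls outside the interval.
      next if pos < interval_start || pos >= interval_end
      tiled[pos - interval_start] = ch
    end
  end
  tiled
end

puts tile(10, 20, [[8, 'ACGT'], [15, 'ttg']])  # prints GT***ttg**
```

Positions 12 to 14 and 18 to 19 have no fragment covering them, so they keep the fill character.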

If a single interval is specified, the output will be written to stdout in FASTA format. If a directory of MAF files is supplied as the maf parameter, the interval must include the sequence identifier in the form sequence:begin:end. If the --output-base option is specified, _<begin>:<end>.fa will be appended to the given parameter and used to construct the output path. If a BED file is specified with --bed, --output-base is also required.
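
The output naming rule described above can be sketched in Ruby as follows. The output_path helper is hypothetical and only mirrors the wording of this manual, not the program's code.

```ruby
# Hypothetical helper mirroring the naming rule: BASE + "_<begin>:<end>.fa".
def output_path(base, interval_begin, interval_end)
  "#{base}_#{interval_begin}:#{interval_end}.fa"
end

puts output_path('/tmp/mm8', 14400, 15000)  # prints /tmp/mm8_14400:15000.fa
```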

Species can be renamed for output by specifying them as SPECIES:NAME; the first component will be used to select the species from the MAF file, and the second will be used in the FASTA description line for output.

OPTIONS

-r, --reference SEQ

The given FASTA reference sequence file, which may be gzipped, will be used to fill in any gaps between alignment blocks.

-i, --interval [CHR:]BEGIN:END

The given zero-based genomic interval will be used to select alignment blocks from the MAF file. If the chromosome is not specified, it will be taken from the first species specified with --species or --species-file.
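
A minimal Ruby sketch of how such an interval argument might be split, assuming the colon-separated form used in the SYNOPSIS and EXAMPLES. The parse_interval method is hypothetical and is not taken from maf_tile itself.

```ruby
# Hypothetical parser for an interval given as [SEQ:]BEGIN:END.
# Returns [seq_or_nil, begin, end] with integer coordinates.
def parse_interval(spec)
  parts = spec.split(':')
  unless [2, 3].include?(parts.size)
    raise ArgumentError, "expected [SEQ:]BEGIN:END, got #{spec.inspect}"
  end
  seq = parts.size == 3 ? parts.first : nil
  # Integer() rejects non-numeric text instead of silently returning 0.
  interval_begin, interval_end = parts.last(2).map { |s| Integer(s, 10) }
  [seq, interval_begin, interval_end]
end

p parse_interval('14400:15000')            # prints [nil, 14400, 15000]
p parse_interval('hg19.chrY:14400:15000')  # prints ["hg19.chrY", 14400, 15000]
```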

-s, --species SPECIES[:NAME]

The given species will be selected for output. If given as species:name, it will appear in the FASTA output as name.

--species-file FILE

Species to select, and optional mapping names, will be read from the given file, one species per line. If the species name is followed by whitespace and an additional name, this will be taken as the output name. Lines beginning with # will be ignored.
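
The two ways of naming species can be sketched in Ruby as below. Both methods are hypothetical. They assume that a species without an explicit output name keeps its own name, as in the EXAMPLES output, and that blank lines in a species file are skipped, which this manual does not state.

```ruby
# Hypothetical parsers for species selections; both yield [species, output_name].

# Command-line form: SPECIES or SPECIES:NAME.
def parse_species_arg(arg)
  species, name = arg.split(':', 2)
  [species, name || species]
end

# Species-file form: one species per line, optional output name after
# whitespace, lines beginning with '#' ignored (blank lines: an assumption).
def parse_species_lines(lines)
  kept = lines.map(&:strip).reject { |line| line.empty? || line.start_with?('#') }
  kept.map do |line|
    species, name = line.split(/\s+/, 2)
    [species, name || species]
  end
end

p parse_species_arg('hg19:human')
p parse_species_lines(["# primates\n", "hg19 human\n", "petMar1\n"])
```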

-b, --bed BED

The given BED file will be used to provide a list of intervals to process. If present, --interval will be ignored and --output-base must be given as well.

--bed-species SPECIES

The given species name will be prepended to the chromosome name indicated in the BED file, separated by a period. This is necessary if the BED file simply indicates chr12, but the sequence identifiers in the MAF file are e.g. hg19.chr12.
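
The prefixing step can be sketched in Ruby as follows. The bed_interval method is hypothetical; it assumes tab-separated BED lines in the standard column order (chromosome, zero-based start, end).

```ruby
# Hypothetical sketch: turn one BED line into [sequence_id, start, end].
# Standard BED columns are chrom, chromStart (zero-based), chromEnd.
def bed_interval(line, bed_species = nil)
  chrom, chrom_start, chrom_end = line.chomp.split("\t")
  # Prepend the species, separated by a period, when one is given.
  seq = bed_species ? "#{bed_species}.#{chrom}" : chrom
  [seq, Integer(chrom_start, 10), Integer(chrom_end, 10)]
end

p bed_interval("chr12\t100\t250\n", 'hg19')  # prints ["hg19.chr12", 100, 250]
```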

--concat

The alignments specified in the BED file will be individually tiled and concatenated.

-o, --output-base BASE

The given path will be used as the base name for output files, as described above.

--fill-char C

Gaps where no aligning sequence data exists will be filled with the given character instead of the default, *.

--upcase

All sequence data will be folded to upper case.

-q, --quiet

Run quietly, with warnings suppressed.

-v, --verbose

Run verbosely, with additional informational messages.

--debug

Log debugging information.

EXAMPLES

Generate an alignment of the hg19, petMar1, and ornAna1 sequences from chrY.maf over the interval 14400 to 15000 on the reference sequence of the MAF file, filling in gaps from chrY.refseq.fa.gz and writing FASTA output to stdout:

$ maf_tile --reference ~/maf/chrY.refseq.fa.gz \
  --interval 14400:15000 \
  -s hg19:human -s petMar1 -s ornAna1 \
  chrY.maf chrY.kct
>human
GGGTGACGAAAAGAGCCGA-----[...]
>petMar1
gagtgccggggagtgccggggagt[...]
>ornAna1
AGGGATCTGGGAATTCTGG-----[...]

Write out a FASTA file for each interval in the given BED file, prefixed with /tmp/mm8, and without filling in data from a reference sequence:

$ maf_tile --bed /tmp/mm8.bed --output-base /tmp/mm8 \
  -s mm8:mouse -s rn4:rat -s hg18:human \
  mm8_chr7_tiny.maf mm8_chr7_tiny.kct

FILES

The output is generated in FASTA format, with one sequence per species.
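
For orientation, a FASTA file with one record per species can be produced along these lines in Ruby. The to_fasta method is hypothetical, and the fixed line width is an assumption; this manual does not say whether maf_tile wraps sequence lines.

```ruby
# Hypothetical sketch: format [name, sequence] pairs as FASTA text,
# one record per species, wrapping sequence lines at a fixed width.
def to_fasta(records, width = 70)
  text = records.map do |name, seq|
    lines = seq.scan(/.{1,#{width}}/)
    ([">#{name}"] + lines).join("\n")
  end
  text.join("\n") + "\n"
end

print to_fasta([['human', 'GGGTGACG'], ['petMar1', 'gagtgccg']], 4)
```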

The maf parameter must specify either a Multiple Alignment Format (MAF) file or a directory of such files, with indexes.

MAF files can optionally be BGZF-compressed, as produced by bgzip(1) from samtools.

The index must be a MAF index built with maf_index(1). This parameter is ignored if the maf parameter is a directory. It can be omitted if a single MAF file is given, but in this case the entire file will be parsed to build a temporary index. For large files which will be reused, this is not advisable.

If --bed BED is specified, its argument must be a BED file. Only the second and third columns will be used, to specify the zero-based start and end positions of intervals.

ENVIRONMENT

maf_tile is a Ruby program and relies on ordinary Ruby environment variables.

COPYRIGHT

maf_tile is copyright (C) 2012 Clayton Wheeler.

SEE ALSO

maf_index(1), ruby(1), bgzip(1)

BioRuby    August 2012    maf_tile(1)