PIP Tutorial
John Conery
April 26, 2005
The pipeline interface program (PIP) is a workflow system for managing complex projects.
Although it was designed primarily for bioinformatics projects, the system should work well
for any project that manages a large number of applications that are invoked by Unix command
lines.
PIP uses a rule-based workflow paradigm. Each individual step in the workflow is defined by
a rule that tells the system what inputs are required by the step, the application(s) to run to
execute the step, and the outputs produced by the step. The initial inputs, intermediate work
products, and final outputs are all stored in a database, and PIP uses timestamps on database
tables to automatically schedule the workflow steps.
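
The idea of scheduling steps from table timestamps can be illustrated with a small sketch. This is not PIP's rule syntax or its implementation; it is a minimal Python model written for this tutorial, and the names Rule, is_stale, run_workflow, and the table names chromosomes, genes, and tandem_pairs are all hypothetical. It assumes a step should run when all of its input tables exist and any output table is missing or older than the newest input, much as make compares file modification times.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """One workflow step (hypothetical model, not PIP's actual rule format)."""
    name: str
    inputs: List[str]           # tables the step reads
    outputs: List[str]          # tables the step writes
    action: Callable[[], None]  # stands in for the application(s) to run


def inputs_ready(rule: Rule, stamps: Dict[str, int]) -> bool:
    """A step can only run once every input table exists."""
    return all(table in stamps for table in rule.inputs)


def is_stale(rule: Rule, stamps: Dict[str, int]) -> bool:
    """A step is stale if an output is missing or older than the newest input."""
    if any(table not in stamps for table in rule.outputs):
        return True
    newest_input = max((stamps[t] for t in rule.inputs), default=0)
    oldest_output = min((stamps[t] for t in rule.outputs), default=float("inf"))
    return newest_input > oldest_output


def run_workflow(rules: List[Rule], stamps: Dict[str, int]) -> List[str]:
    """Run stale steps until nothing changes; return the names in run order.

    `stamps` maps table name to a last-modified timestamp and is updated in
    place. The rule graph is assumed to be acyclic.
    """
    clock = max(stamps.values(), default=0)
    executed = []
    progress = True
    while progress:
        progress = False
        for rule in rules:
            if inputs_ready(rule, stamps) and is_stale(rule, stamps):
                rule.action()
                clock += 1  # logical clock standing in for real table timestamps
                for table in rule.outputs:
                    stamps[table] = clock
                executed.append(rule.name)
                progress = True
    return executed


# Rules are listed out of order on purpose: the timestamps decide the order.
rules = [
    Rule("find_tandems", ["genes"], ["tandem_pairs"], lambda: None),
    Rule("find_genes", ["chromosomes"], ["genes"], lambda: None),
]
stamps = {"chromosomes": 1}  # only the initial input exists so far

print(run_workflow(rules, stamps))  # ['find_genes', 'find_tandems']
print(run_workflow(rules, stamps))  # [] because everything is up to date
```

In this model, changing the timestamp of the chromosomes table (for example, after a fresh download) makes both downstream steps stale again, so the next call to run_workflow reruns them in dependency order.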
This tutorial will explain how to create a PIP workflow by going through the steps in the
development of a project that downloads yeast chromosomes from NCBI and searches for
pairs of genes that may be recent tandem duplicates. The first section is a project overview.
Remaining sections show how to set up the database connection and initial PIP file, and then
how to add rules to the workflow in order to implement each step.
To do all the steps in the tutorial you will need to have Perl installed on your workstation,
along with the Bio::Perl library and CPAN modules for accessing a MySQL database and
downloading files via FTP (see the Software Environment section for details). It is possible to
do the tutorial if you do not have all the necessary Perl modules; at the end of the description
of each step there are ...