Docop pipelines explained
Pipelines are defined in YAML files. Each one specifies an ordered sequence of tasks to run.
How to review available pipelines
Use the pipes command to list them:
Assuming the 'mypipe' pipeline from earlier exists, the command would output:
How to create a task pipeline
Create a YAML file with a descriptive name, add a comment line describing the pipeline, and list its tasks in the order they should run:
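The following Python sketch assumes a pipeline file consists of one descriptive comment line followed by a plain YAML list of task names, as in the hypothetical mypipe.yaml below. The task names are made up. The sketch reads the file the way the step above describes: the comment becomes the pipeline's description, and list order becomes run order. The hand-rolled parser covers only this tiny subset of YAML. A real loader would use a YAML library.

```python
# Hypothetical contents of mypipe.yaml: a description comment and an ordered task list.
EXAMPLE_PIPE = """\
# Fetch web pages, summarize them and export the summaries
- fetch_pages
- summarize
- export_markdown
"""


def parse_pipe(text: str) -> tuple[str, list[str]]:
    """Return (description, tasks) from a minimal comment-plus-list YAML text."""
    description = ""
    tasks: list[str] = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # The first comment line is taken as the pipeline description.
            if not description:
                description = stripped.lstrip("#").strip()
        elif stripped.startswith("- "):
            # Each list item names one task; file order is run order.
            tasks.append(stripped[2:].strip())
    return description, tasks


if __name__ == "__main__":
    description, tasks = parse_pipe(EXAMPLE_PIPE)
    print(description)
    for position, task in enumerate(tasks, start=1):
        print(position, task)
```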
How to run a pipeline
Use the run command. Docop finds the named pipeline and runs it, so there is no need to give a path to the pipeline definition file or to include the .yaml suffix.
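The following Python sketch shows a name lookup that lets you omit the path and the suffix. This section does not say where Docop actually searches for pipeline files. The search_dirs parameter, the first-match rule and the find_pipe helper are therefore assumptions made for illustration.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_pipe(name: str, search_dirs: list[Path]) -> Optional[Path]:
    """Resolve a bare pipeline name such as 'mypipe' to its YAML definition file."""
    # Accept the name with or without the .yaml suffix.
    stem = name[: -len(".yaml")] if name.endswith(".yaml") else name
    for directory in search_dirs:
        candidate = directory / f"{stem}.yaml"
        if candidate.is_file():
            return candidate  # first match wins (assumed rule)
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pipes_dir = Path(tmp)
        (pipes_dir / "mypipe.yaml").write_text("# demo pipe\n- some_task\n", encoding="utf-8")
        print(find_pipe("mypipe", [pipes_dir]))
        print(find_pipe("unknown", [pipes_dir]))
```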
Using the --help option gives more details:
```
Usage: docop run [OPTIONS] TASKNAME or PIPENAME [EXTRAS]...

  Run a task or pipeline.

Options:
  -s, --source TEXT   Sources that will be fetched and stored as documents.
  -c, --content PATH  Stored documents to process.
  -t, --target TEXT   Targets to export document content to
  -a, --account TEXT  Account to use (source or target)
  --help              Show this message and exit.
```
Docop provides ample status information while it runs the pipeline.
How Docop runs task pipelines
The following diagram shows how Docop loops over sources, content and targets, running a pipe of tasks that fetch, process and export content.
```mermaid
graph LR
S((Start)) --> QS{Sources\nfetched?};
QS -- No --> RT(⚡ Run 1st task\nto fetch);
RT --> QS;
QS -- Yes --> QL{Next\ntask\nlast?};
QL -- No --> RP(⚡ Run next task\nto process);
RP --> QL;
QL -- Yes --> RE(⚡ Run last task\nto export);
RE --> QE{Docs\nexported?};
QE -- Yes --> E((End));
QE -- No --> RE;
```
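The following Python sketch follows the diagram's control flow step by step. It is not Docop's implementation. The Task signature, the context keys (sources, documents, exported) and the example tasks fetch_one, shout and export_one are all hypothetical. The first task runs again while sources remain. Each middle task then runs once, in order. The last task runs, and runs again until every document has been exported. The max_runs limit is an added safeguard against a task that never makes progress.

```python
from typing import Callable

# Hypothetical task signature: a task receives a shared context dict and updates it.
Task = Callable[[dict], None]


def run_pipe(tasks: list[Task], context: dict, max_runs: int = 100) -> dict:
    """Follow the diagram: fetch until sources are done, process, export until docs are done."""
    if len(tasks) < 2:
        raise ValueError("this sketch needs at least a fetch task and an export task")
    fetch, *process, export = tasks

    # "Sources fetched?" is checked before each run of the first task.
    runs = 0
    while context["sources"]:
        if runs == max_runs:
            raise RuntimeError("fetch task did not drain the sources")
        fetch(context)
        runs += 1

    # Middle tasks run once each, in pipe order, until the next task is the last one.
    for task in process:
        task(context)

    # The last task runs, then "Docs exported?" decides whether to run it again.
    runs = 0
    while True:
        if runs == max_runs:
            raise RuntimeError("export task did not export every document")
        export(context)
        runs += 1
        if len(context["exported"]) == len(context["documents"]):
            return context


def fetch_one(ctx: dict) -> None:
    # Fetch one source per run and store it as a document.
    source = ctx["sources"].pop(0)
    ctx["documents"].append(f"content of {source}")


def shout(ctx: dict) -> None:
    # Process every stored document in a single run.
    ctx["documents"] = [doc.upper() for doc in ctx["documents"]]


def export_one(ctx: dict) -> None:
    # Export at most one not-yet-exported document per run.
    done = len(ctx["exported"])
    if done < len(ctx["documents"]):
        ctx["exported"].append(ctx["documents"][done])


if __name__ == "__main__":
    ctx = {"sources": ["a.html", "b.html"], "documents": [], "exported": []}
    print(run_pipe([fetch_one, shout, export_one], ctx)["exported"])
    # -> ['CONTENT OF A.HTML', 'CONTENT OF B.HTML']
```

The fetch loop checks its condition before running. The export loop checks after running, which matches the diagram: the "Docs exported?" decision follows the export step, not the other way round.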
To recap:
- When a task runs, it receives a set of execution context variables.
- The first task in a pipe should check the sources and fetch them.
- The next tasks should process the fetched content.
- The last task should export content to targets.
- Each task can process one or more sources, collections, documents or targets per run.
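The following Python sketch illustrates the recap's points about context variables and multiple items per run. It assumes the execution context carries the values passed through the run options. Under that assumption, a hypothetical TaskContext holds those values, and a single export-style task handles several documents and targets in one run. The field names and export_all are illustrative only. They are not Docop's API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaskContext:
    # Hypothetical context fields, named after the run options; not Docop's actual API.
    sources: list[str] = field(default_factory=list)  # like -s/--source
    content: list[str] = field(default_factory=list)  # like -c/--content
    targets: list[str] = field(default_factory=list)  # like -t/--target
    account: Optional[str] = None  # like -a/--account; unused in this sketch


def export_all(ctx: TaskContext) -> list[str]:
    """An export-style task that handles several documents and targets in one run."""
    if not ctx.targets:
        raise ValueError("an export task needs at least one target")
    # Pair every stored document with every target; a real task would upload here.
    return [f"{doc} -> {target}" for target in ctx.targets for doc in ctx.content]


if __name__ == "__main__":
    ctx = TaskContext(content=["notes.md", "report.md"], targets=["wiki"], account="team")
    for line in export_all(ctx):
        print(line)
```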