Raw data (FASTA, GFF, and masterVar files) are stored in Raw Input Data. They are the input to Tiling – Raw Data to FASTJ, which outputs files in FASTJ, a verbose tiling format described in that project. The FASTJ files are in turn the input to Tiling – FASTJ to tile library and pythonic tilings, which outputs pythonic tilings and a tile library. Both outputs are stored in this Project and are used as inputs to Blood Type Classifiers and Principal Component Analysis.

Blood Type Classifiers is an example of supervised learning and uses ABO blood type phenotypes. Principal Component Analysis is an example of unsupervised learning and uses ethnicity phenotypes. These and other phenotypes for the PGP participants can be found in Harvard PGP Database Snapshot.

Two Sub-projects, Log files and Docker images, are kept for provenance so that users can rerun jobs. VCF-based Precision Medicine contains CAVA annotations on the BRCA regions for the 1000 Genomes and Harvard PGP genomes and cross-references each annotation with ClinVar and ExAC. Specific documentation for each step is given within each project.
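
The data flow described above can be summarized as a small dependency graph. The following is only an illustrative sketch, not part of any of the projects: the project names come from the description, while the dictionary layout and the helper functions `run_order` and `upstream` are hypothetical. It assumes Python 3.9 or later (for the standard-library `graphlib` module) and, for simplicity, treats the pythonic tilings and tile library as coming directly from the second tiling step. Phenotype inputs and the provenance Sub-projects are left out.

```python
from graphlib import TopologicalSorter

# Project names as given in the description above.
RAW = "Raw Input Data"
FASTJ = "Tiling – Raw Data to FASTJ"
TILES = "Tiling – FASTJ to tile library and pythonic tilings"
BLOOD = "Blood Type Classifiers"
PCA = "Principal Component Analysis"

# Hypothetical layout: each project maps to the set of projects
# whose outputs it takes as input.
DEPENDS_ON = {
    RAW: set(),
    FASTJ: {RAW},      # FASTA, GFF, masterVar -> FASTJ
    TILES: {FASTJ},    # FASTJ -> pythonic tilings + tile library
    BLOOD: {TILES},    # supervised learning on the tilings
    PCA: {TILES},      # unsupervised learning on the tilings
}


def run_order():
    """Return one valid order in which the steps could be rerun."""
    return list(TopologicalSorter(DEPENDS_ON).static_order())


def upstream(project):
    """Return every project that `project` depends on, directly or not."""
    seen = set()
    stack = list(DEPENDS_ON[project])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(DEPENDS_ON[current])
    return seen


if __name__ == "__main__":
    for step in run_order():
        print(step)
```

Calling `upstream("Blood Type Classifiers")` lists everything that would have to be rerun, in the order given by `run_order()`, to regenerate that project's inputs from the raw data.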