This document discusses open-source software tools, developed by Anubhav Jain and collaborators, for generating and analyzing large materials data sets. It summarizes several software packages, including pymatgen for materials analysis, FireWorks for scientific workflows, custodian for error recovery in calculations, and matminer for data mining. Applications of the tools include generating the Materials Project database, which contains properties of over 65,000 compounds calculated using high-performance computing resources. The document emphasizes the importance of open-source, collaborative software development and automation in accelerating materials discovery.