Codebase containing all the implementations for the Udemy course on Build Text Mining Applications with Live-Coding in Python
Follow the instructions to create a conda environment here, then activate it:

    conda activate YOUR_ENV_NAME

Install all the dependencies using the following command:

    pip install -r requirements.txt

The modules are named by section number so that you can find the relevant code for each section.
- Introduction: You will get a general introduction to the structure and teaching style of the course.
- Unstructured Data: You will see motivating examples of the power of unstructured data and learn about the challenges in processing it.
- Python Programming Primer: You will learn the basic programming constructs you need to follow along with the course. You can use this section to understand the basics and prepare yourself to learn the advanced Python needed to write production-quality code.
- Text Mining Basics: You will learn the basics of text processing, document representation using the vector space model, and ranking documents for a given query. You will learn to implement these algorithms in Python.
- Build a Search Engine: You will build your own search engine using the implementations from the previous section. Your search engine will be wrapped as a data service for potential deployment as a product. You will also have the option of adding a user search interface to your search engine!
- Deploy your Text Mining Application: You will go from a student of text mining to a professional who can build real-world applications and services with the skills you have picked up in this course.
- Conclusion: You will gain awareness of some advanced topics, and a bonus section covers potential products you can build with your text mining skills!
- Build a Text Summarization Tool: You will learn basic text summarization techniques that are crucial for exploring large document collections, and you will implement code to compute N-grams and generate tag clouds in Python. You will also have the option of adding an interactive tag cloud so that you can quickly drill down to the documents underlying it just by clicking on the tags.
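
To give a flavor of the Text Mining Basics and Build a Search Engine sections, here is a minimal sketch of vector space ranking using only the standard library. It is not the course's code: the function names (`tokenize`, `build_idf`, `tfidf_vector`, `cosine`, `rank`), the whitespace tokenizer, and the smoothed IDF formula are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_idf(tokenized_docs):
    """Compute a smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # count each term once per document
    return {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }


def tfidf_vector(tokens, idf):
    """Build a sparse {term: weight} vector; terms unseen in the corpus are dropped."""
    counts = Counter(tokens)
    return {term: count * idf[term] for term, count in counts.items() if term in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return (doc_index, score) pairs sorted from most to least similar to the query."""
    tokenized = [tokenize(doc) for doc in docs]
    idf = build_idf(tokenized)
    query_vec = tfidf_vector(tokenize(query), idf)
    scores = [
        (index, cosine(query_vec, tfidf_vector(tokens, idf)))
        for index, tokens in enumerate(tokenized)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy corpus, used only to exercise the functions above.
docs = [
    "text mining extracts structure from unstructured text",
    "a search engine ranks documents for a query",
    "python is a programming language",
]
ranking = rank("search engine query", docs)
```

Here `ranking` is a list of `(doc_index, score)` pairs, so `ranking[0][0]` is the index of the best-matching document. A real search engine would add better tokenization, stop-word handling, and an inverted index rather than scoring every document.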
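
The N-gram and tag-cloud idea from the Text Summarization section can be sketched in a few lines as well. This is an illustrative assumption rather than the course's implementation: `ngrams` and `tag_weights` are hypothetical names, tokenization is a plain whitespace split, and the "tag cloud" is reduced to a mapping from phrase to relative weight that a front end could turn into font sizes.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tag_weights(docs, n=2, top_k=5):
    """Map the top_k most frequent n-grams to a weight in (0, 1]."""
    counts = Counter()
    for doc in docs:
        counts.update(ngrams(doc.lower().split(), n))
    top = counts.most_common(top_k)
    if not top:
        return {}
    max_count = top[0][1]  # most_common sorts by count, so the first entry is the largest
    return {" ".join(gram): count / max_count for gram, count in top}


# Hypothetical toy corpus, used only to exercise the functions above.
docs = [
    "text mining is fun",
    "text mining with python",
    "python is fun",
]
weights = tag_weights(docs, n=2, top_k=2)
```

Normalizing by the largest count keeps the weights independent of corpus size, which makes them easy to scale into display sizes.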