Lars Vogel, © 2009 - 2025 vogella GmbH, version 3.0, 28.09.2025

Learn Python programming with practical examples including web crawling and AsciiDoc validation. This comprehensive tutorial covers modern Python development using Python 3.12+ features. You’ll build real-world applications including a web crawler and a document validation tool.

Python is a high-level, interpreted programming language renowned for its simplicity, readability, and powerful capabilities. Created by Guido van Rossum and first released in 1991, Python has become one of the world’s most popular programming languages.

The name Python comes from the British comedy series Monty Python’s Flying Circus, reflecting the language’s emphasis on fun and accessibility. The standard interpreter first compiles source code into bytecode, which the Python virtual machine then executes.

1.1.1. Why Python is Popular

Python’s popularity stems from several key strengths:

Readable syntax: Code looks almost like natural English
Versatile applications: Web development, data science, automation, AI/ML, and more
Rich ecosystem: Extensive standard library and third-party packages via PyPI
Cross-platform: Runs on Windows, macOS, Linux, and many other platforms
Strong community: Excellent documentation, tutorials, and community support
Rapid development: Faster to write and maintain than many other languages

1.2. Modern Python Features (3.12+)

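Python continues to evolve. Python 3.12 builds on features introduced over several releases (the version that introduced each one is shown in parentheses):

F-strings (3.6, with relaxed syntax rules in 3.12): string formatting with embedded expressions
Type hints (3.5): optional annotations that document types and let static checkers such as mypy find errors
Async/await (3.5): built-in syntax for asynchronous programming
Pattern matching (3.10): structural pattern matching with match/case statements
Performance improvements (notably 3.11): faster startup and execution
Better error messages (3.10 and later): more precise error locations and suggestions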

1.3. Real-World Applications

In this tutorial, you’ll build practical applications that demonstrate Python’s capabilities:

Web Crawler: Extract and process data from websites using requests and BeautifulSoup
Document Validator: Check AsciiDoc files for formatting issues using the LanguageTool API
Data Processing: Handle files, APIs, and structured data with modern Python techniques

1.4. About this tutorial

This tutorial provides a hands-on approach to learning Python through practical examples. You’ll start with Python fundamentals, then progress to building real applications. By the end, you’ll have the skills to create your own Python projects and understand modern development practices.

2. Setting Up Your Python Development Environment

2.1. Installing Python

Python 3.12 or later is recommended for this tutorial. macOS and many Linux distributions include a Python 3 interpreter, but it may be older than 3.12. Windows does not include Python by default. Check the installed version before you start.

2.1.1. Windows

Download Python from https://www.python.org/downloads/ and run the installer. Important: Check "Add Python to PATH" during installation.

```shell
# Verify installation
python --version
# or
python3 --version
```

2.1.2. macOS

Use Homebrew (recommended) or download from python.org:

```shell
# Install Homebrew first, then:
brew install python
# Verify
python3 --version
```

2.1.3. Linux (Ubuntu/Debian)

```shell
sudo apt update
sudo apt install python3 python3-pip python3-venv
# Verify
python3 --version
```

2.2. Virtual Environments

Virtual environments isolate your project dependencies. Always use virtual environments for Python projects.

```shell
# Create a virtual environment
python3 -m venv myproject_env

# Activate it
# On Windows:
myproject_env\Scripts\activate
# On macOS/Linux:
source myproject_env/bin/activate

# Install packages
pip install requests beautifulsoup4 language-tool-python

# Deactivate when done
deactivate
```
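You can also check from inside Python whether a script is running in a virtual environment. The `venv` documentation describes one way to do this: compare `sys.prefix` with `sys.base_prefix`. The following is a minimal sketch of that check; the function name `in_virtual_environment` is our own.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a venv virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"Interpreter: {sys.executable}")
    print(f"Virtual environment active: {in_virtual_environment()}")
```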

2.3. Recommended Development Tools

While you can use any text editor, these tools enhance productivity:

Visual Studio Code: Free, powerful, with excellent Python extension
PyCharm: Full-featured IDE (Community edition is free)
Jupyter Notebooks: Great for data analysis and learning
Terminal/Command Line: Essential for running Python scripts

2.3.1. Visual Studio Code Setup

Download from https://code.visualstudio.com/
Install the Python extension by Microsoft
Open a Python file - VS Code will help you select the right interpreter

2.4. Package Management with pip

pip is Python’s package installer. Use requirements.txt files to manage dependencies:

```shell
# Install a package
pip install requests
# Install from requirements file
pip install -r requirements.txt
# List installed packages
pip list
# Generate requirements file
pip freeze > requirements.txt
```
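Sometimes a script needs to know which packages are installed, similar to `pip list`. The standard library module `importlib.metadata` (available since Python 3.8) can answer that without calling pip. The following is a minimal sketch; the helper names `installed_version` and `list_installed` are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, distributions, version
from typing import List, Optional, Tuple


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def list_installed() -> List[Tuple[str, str]]:
    """Return (name, version) pairs sorted by name, roughly like `pip list`."""
    # The set removes duplicates if a distribution is visible on sys.path twice;
    # entries without a Name field in their metadata are skipped.
    pairs = {
        (dist.metadata["Name"], dist.version)
        for dist in distributions()
        if dist.metadata["Name"]
    }
    return sorted(pairs, key=lambda pair: pair[0].lower())


if __name__ == "__main__":
    print(f"requests: {installed_version('requests') or 'not installed'}")
    for name, ver in list_installed():
        print(f"{name}=={ver}")
```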

3. Your First Python Program

Let’s start with a simple but practical Python program that demonstrates modern syntax and best practices.

3.1. Hello, World! - Modern Style

Create a new file called hello_world.py:

```python
#!/usr/bin/env python3
"""
A simple Hello World program demonstrating modern Python syntax.
"""


def greet(name: str) -> str:
    """Return a personalized greeting."""
    return f"Hello, {name}! Welcome to Python programming."


def main():
    """Main program entry point."""
    # Get user input
    user_name = input("What's your name? ")

    # Create and display greeting
    greeting = greet(user_name)
    print(greeting)

    # Show some Python features
    languages = ["Python", "Java", "JavaScript", "Go"]
    print("\nHere are some popular programming languages:")
    for i, lang in enumerate(languages, 1):
        print(f"{i}. {lang}")


if __name__ == "__main__":
    main()
```

Run it from your terminal:

python3 hello_world.py

3.2. A More Practical Example

Let’s create a simple file analyzer that demonstrates Python’s strengths:

```python
#!/usr/bin/env python3
"""
File analyzer that demonstrates modern Python features.
"""
from pathlib import Path
from typing import Any, Dict, List


def analyze_file(file_path: Path) -> Dict[str, Any]:
    """Analyze a text file and return statistics."""
    try:
        with open(file_path, 'r', encoding='utf-8') as file:
            content = file.read()
        # splitlines() does not count an empty "line" after a trailing newline
        lines = content.splitlines()
        return {
            'filename': file_path.name,
            'size_bytes': file_path.stat().st_size,
            'line_count': len(lines),
            'word_count': len(content.split()),
            'char_count': len(content),
            'extension': file_path.suffix,
        }
    except FileNotFoundError:
        return {'error': f'File not found: {file_path}'}
    except Exception as e:
        return {'error': f'Error reading file: {e}'}


def analyze_directory(directory: str) -> List[Dict[str, Any]]:
    """Analyze all text files in a directory."""
    path = Path(directory)
    results = []

    if not path.exists():
        return [{'error': f'Directory not found: {directory}'}]

    # Find text files
    text_extensions = {'.txt', '.py', '.md', '.adoc', '.rst'}
    for file_path in path.iterdir():
        if file_path.is_file() and file_path.suffix in text_extensions:
            results.append(analyze_file(file_path))

    return results


def main():
    """Main program demonstrating file analysis."""
    print("🔍 File Analyzer - Modern Python Example")
    print("=" * 40)

    # Analyze current directory
    directory = "."
    results = analyze_directory(directory)

    if not results:
        print("No text files found in current directory.")
        return

    # Display results
    total_files = len(results)
    total_lines = sum(r.get('line_count', 0) for r in results if 'error' not in r)

    print(f"\nFound {total_files} text files:")
    print(f"Total lines: {total_lines:,}")
    print("\nFile details:")
    print("-" * 60)

    for result in results:
        if 'error' in result:
            print(f"❌ {result['error']}")
        else:
            print(f"📄 {result['filename']:20} | "
                  f"{result['line_count']:4} lines | "
                  f"{result['size_bytes']:6} bytes")


if __name__ == "__main__":
    main()
```

This example shows:

Modern f-string formatting
Type hints for better code documentation
Exception handling
Working with files and paths
Using Python’s standard library

3.3. Interactive Python Development

Python includes an interactive interpreter perfect for experimentation:

```
# Start interactive Python
python3

# Try some expressions
>>> name = "Python"
>>> print(f"Hello, {name}!")
Hello, Python!
>>> numbers = [1, 2, 3, 4, 5]
>>> sum(numbers)
15
>>> exit()
```

3.4. Organizing Your First Project

For larger projects, a layout like the following keeps source code, tests, and project metadata separate:

```
my_python_project/
├── requirements.txt      # Project dependencies
├── README.md             # Project documentation
├── main.py               # Entry point
├── src/                  # Source code
│   ├── __init__.py
│   └── my_module.py
└── tests/                # Test files
    ├── __init__.py
    └── test_my_module.py
```

4. Python Programming Fundamentals

4.1. Python Syntax Overview

Python 3.12+ includes many features that make code more readable and maintainable. Let’s explore the most important concepts.

4.2. Variables and Type Hints

Python is dynamically typed, but you can add type hints for better code documentation:

```python
#!/usr/bin/env python3
"""
Modern Python variables and type hints examples.
"""
from typing import Dict, List, Optional

# Basic variables with type hints
name: str = "Alice"
age: int = 30
height: float = 5.6
is_student: bool = True

# Collections with type hints
numbers: List[int] = [1, 2, 3, 4, 5]
scores: Dict[str, int] = {"math": 95, "science": 88, "history": 92}
middle_name: Optional[str] = None  # Can be None or a string

# Dynamic typing still works
dynamic_var = "starts as string"
dynamic_var = 42  # now it's an integer
dynamic_var = ["now", "it's", "a", "list"]

# Multiple assignment
x, y, z = 1, 2, 3
first, *rest = [1, 2, 3, 4, 5]  # first=1, rest=[2, 3, 4, 5]

# Constants (by convention, use UPPER_CASE)
MAX_CONNECTIONS: int = 100
API_URL: str = "https://api.example.com"

print(f"Hello, {name}! You are {age} years old.")
print(f"Your scores: {scores}")
print(f"First number: {first}, rest: {rest}")
```

4.3. String Operations

Python offers powerful string manipulation, and f-strings are the preferred way to format strings:

```python
#!/usr/bin/env python3
"""
Modern Python string operations with f-strings and advanced techniques.
"""
import re

# F-string formatting (preferred since Python 3.6)
name = "Python"
version = "3.12"  # a string, so a version like "3.10" is not shortened to 3.1
print(f"Welcome to {name} {version}!")

# Multi-line f-strings
user = {"name": "Alice", "age": 30, "city": "New York"}
message = f"""
Hello {user['name']}!
You are {user['age']} years old and live in {user['city']}.
"""
print(message)

# F-strings with expressions and formatting
numbers = [1, 2, 3, 4, 5]
print(f"Sum of {numbers} = {sum(numbers)}")
print(f"Pi to 3 decimal places: {3.14159:.3f}")

# String methods and operations
text = " Hello, World! "
print(f"Original: '{text}'")
print(f"Stripped: '{text.strip()}'")
print(f"Uppercase: '{text.upper()}'")
print(f"Lowercase: '{text.lower()}'")
print(f"Title Case: '{text.title()}'")

# String slicing and indexing
sentence = "Python programming is fun"
print(f"First word: {sentence[:6]}")
print(f"Last word: {sentence.split()[-1]}")
print(f"Every 2nd character: {sentence[::2]}")

# String checking methods
email = "user@example.com"
print(f"Contains @: {'@' in email}")
print(f"Starts with 'user': {email.startswith('user')}")
print(f"Ends with '.com': {email.endswith('.com')}")

# Joining and splitting
words = ["Python", "is", "awesome"]
joined = " ".join(words)
print(f"Joined: {joined}")
print(f"Split back: {joined.split()}")

# Raw strings for regex patterns
pattern = r"\d{3}-\d{3}-\d{4}"  # Phone number pattern
phone = "123-456-7890"
# fullmatch() requires the whole string to match; match() only anchors at the start
print(f"Phone match: {bool(re.fullmatch(pattern, phone))}")
```
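In an f-string, the format specification after the colon controls alignment, width, precision, and separators. The following short sketch collects common specifications using arbitrary sample values. The `=` specifier requires Python 3.8 or later.

```python
name = "Python"
downloads = 1234567
ratio = 0.256
price = 4.5

examples = [
    f"{name:<8}|",       # 'Python  |'    left-aligned in a field of 8
    f"{name:>8}|",       # '  Python|'    right-aligned
    f"{name:^10}|",      # '  Python  |'  centered in a field of 10
    f"{downloads:,}",    # '1,234,567'    thousands separator
    f"{ratio:.1%}",      # '25.6%'        percentage, one decimal place
    f"{price:08.2f}",    # '00004.50'     zero-padded to width 8, two decimals
    f"{downloads=}",     # 'downloads=1234567'    self-documenting expression
    f"{downloads=:,}",   # 'downloads=1,234,567'  combined with a format spec
]

for line in examples:
    print(line)
```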

4.4. Working with Collections

Python provides rich data structures for organizing and manipulating data:

```python
#!/usr/bin/env python3
"""
Working with Python collections: lists, dictionaries, sets, and tuples.
"""
from collections import Counter, defaultdict, namedtuple
from typing import Any, Dict, List, Set, Tuple

# Lists - ordered, mutable collections
fruits: List[str] = ["apple", "banana", "cherry", "date"]
print(f"Fruits: {fruits}")

# List comprehensions (Pythonic way to create lists)
squares = [x**2 for x in range(1, 6)]
even_numbers = [x for x in range(20) if x % 2 == 0]
print(f"Squares: {squares}")
print(f"Even numbers: {even_numbers}")

# List operations
fruits.append("elderberry")
fruits.extend(["fig", "grape"])
print(f"After additions: {fruits}")

# Dictionaries - key-value pairs
person: Dict[str, Any] = {
    "name": "Alice",
    "age": 30,
    "skills": ["Python", "Java", "JavaScript"],
    "is_employed": True,
}

# Dictionary comprehension
word_lengths = {word: len(word) for word in fruits}
print(f"Word lengths: {word_lengths}")

# Safe dictionary access
age = person.get("age", 0)  # Returns 0 if "age" is not found
print(f"Age: {age}")

# Sets - unique, unordered collections
unique_numbers: Set[int] = {1, 2, 3, 3, 4, 4, 5}
print(f"Unique numbers: {unique_numbers}")

# Set operations
set1 = {1, 2, 3, 4}
set2 = {3, 4, 5, 6}
print(f"Union: {set1 | set2}")
print(f"Intersection: {set1 & set2}")
print(f"Difference: {set1 - set2}")

# Tuples - immutable sequences
coordinates: Tuple[float, float] = (10.5, 20.3)
rgb_color: Tuple[int, int, int] = (255, 128, 0)
print(f"Coordinates: {coordinates}")

# Named tuples for better structure
Point = namedtuple('Point', ['x', 'y'])
p = Point(10, 20)
print(f"Point: x={p.x}, y={p.y}")

# Advanced collections
# defaultdict - provides default values for missing keys
word_count = defaultdict(int)
text = "hello world hello python world"
for word in text.split():
    word_count[word] += 1
print(f"Word count: {dict(word_count)}")

# Counter - counts occurrences
counter = Counter(text.split())
print(f"Most common word: {counter.most_common(1)}")

# Unpacking and packing
numbers = [1, 2, 3, 4, 5]
first, second, *rest = numbers
print(f"First: {first}, Second: {second}, Rest: {rest}")

# Zip for parallel iteration
names = ["Alice", "Bob", "Charlie"]
ages = [25, 30, 35]
for name, age in zip(names, ages):
    print(f"{name} is {age} years old")
```
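The word counter above uses `defaultdict(int)`. The same pattern with `defaultdict(list)` groups items under a shared key, with no need to check whether the key already exists. The following is a minimal sketch; the `group_by_first_letter` helper is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_first_letter(words: List[str]) -> Dict[str, List[str]]:
    """Group words by their lowercase first letter (hypothetical helper)."""
    # defaultdict(list) creates an empty list the first time a key is used.
    groups: defaultdict[str, List[str]] = defaultdict(list)
    for word in words:
        if not word:
            continue  # skip empty strings, which have no first letter
        groups[word[0].lower()].append(word)
    return dict(groups)


if __name__ == "__main__":
    print(group_by_first_letter(["apple", "Avocado", "banana", "cherry", "blueberry"]))
```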

4.5. Functions with Modern Features

Functions in Python support default arguments, type hints, and advanced features:

```python
#!/usr/bin/env python3
"""
Modern Python functions with type hints and advanced features.
"""
import time
from functools import wraps
from typing import Any, Callable, Iterator, List, Optional


# Basic function with type hints
def greet(name: str, age: int = 25) -> str:
    """Return a personalized greeting."""
    return f"Hello, {name}! You are {age} years old."


# Function with optional parameters
def create_user(name: str, email: str, age: Optional[int] = None) -> dict:
    """Create a user dictionary with optional age."""
    user = {"name": name, "email": email}
    if age is not None:
        user["age"] = age
    return user


# Function with variable arguments
def calculate_average(*numbers: float) -> float:
    """Calculate average of any number of values."""
    if not numbers:
        return 0.0
    return sum(numbers) / len(numbers)


# Function with keyword arguments
def create_config(**kwargs: Any) -> dict:
    """Create configuration dictionary from keyword arguments."""
    defaults = {"debug": False, "port": 8080}
    defaults.update(kwargs)
    return defaults


# Lambda functions (anonymous functions)
square = lambda x: x**2
add = lambda x, y: x + y


# Higher-order functions
def apply_operation(numbers: List[int], operation: Callable[[int], int]) -> List[int]:
    """Apply an operation to each number in the list."""
    return [operation(n) for n in numbers]


# Decorator function
def timer(func):
    """Decorator to time function execution."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        # perf_counter() is the recommended clock for measuring durations
        start = time.perf_counter()
        result = func(*args, **kwargs)
        end = time.perf_counter()
        print(f"{func.__name__} took {end - start:.4f} seconds")
        return result
    return wrapper


@timer
def slow_function():
    """A function that takes some time."""
    time.sleep(0.1)
    return "Done!"


# Generator function
def fibonacci(n: int) -> Iterator[int]:
    """Yield the first n Fibonacci numbers."""
    a, b = 0, 1
    count = 0
    while count < n:
        yield a
        a, b = b, a + b
        count += 1


# Example usage
if __name__ == "__main__":
    # Basic functions
    print(greet("Alice"))
    print(greet("Bob", 30))

    # User creation
    user1 = create_user("Alice", "alice@example.com")
    user2 = create_user("Bob", "bob@example.com", 30)
    print(f"User 1: {user1}")
    print(f"User 2: {user2}")

    # Variable arguments
    avg = calculate_average(10, 20, 30, 40, 50)
    print(f"Average: {avg}")

    # Keyword arguments
    config = create_config(debug=True, host="localhost", port=3000)
    print(f"Config: {config}")

    # Lambda and higher-order functions
    numbers = [1, 2, 3, 4, 5]
    squared = apply_operation(numbers, square)
    print(f"Squared: {squared}")

    # Decorator
    slow_function()

    # Generator
    fib_numbers = list(fibonacci(10))
    print(f"Fibonacci: {fib_numbers}")
```
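Python evaluates default argument values once, when the function is defined, not on every call. That is harmless for the `str` and `int` defaults above, but surprising for a mutable default such as a list. The following sketch uses hypothetical function names to contrast the pitfall with the usual fix. The fix is the same `Optional[...] = None` pattern that `create_user` uses.

```python
from typing import List, Optional


def append_shared(item: str, bucket: List[str] = []) -> List[str]:
    """Buggy: the default list is created once, when the def statement runs."""
    bucket.append(item)
    return bucket


def append_fresh(item: str, bucket: Optional[List[str]] = None) -> List[str]:
    """Fixed: use None as a sentinel and build a new list on each call."""
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_shared("a")
second = append_shared("b")
print(first, second)  # both names refer to the same, shared list

print(append_fresh("a"), append_fresh("b"))
```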

4.6. Modern Class Design

Object-oriented programming in Python with modern best practices:

```python
#!/usr/bin/env python3
"""
Modern Python classes with type hints, dataclasses, and properties.
"""
import math
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import ClassVar, List, Optional


# Modern class with type hints and properties
class Person:
    """A person with name, age, and email."""

    # Class variable
    species: ClassVar[str] = "Homo sapiens"

    def __init__(self, name: str, age: int, email: str) -> None:
        self._name = name
        self._age = age
        self._email = email
        self._friends: List[str] = []

    @property
    def name(self) -> str:
        """Get the person's name."""
        return self._name

    @property
    def age(self) -> int:
        """Get the person's age."""
        return self._age

    @age.setter
    def age(self, value: int) -> None:
        """Set the person's age with validation."""
        if value < 0:
            raise ValueError("Age cannot be negative")
        self._age = value

    @property
    def email(self) -> str:
        """Get the person's email."""
        return self._email

    def add_friend(self, friend_name: str) -> None:
        """Add a friend to the person's friend list."""
        if friend_name not in self._friends:
            self._friends.append(friend_name)

    def get_friends(self) -> List[str]:
        """Get a copy of the friend list."""
        return self._friends.copy()

    def __str__(self) -> str:
        return f"Person(name='{self.name}', age={self.age}, email='{self.email}')"

    def __repr__(self) -> str:
        return self.__str__()


# Dataclass - automatically generates __init__, __repr__, __eq__, etc.
@dataclass
class Product:
    """A product with name, price, and quantity."""
    name: str
    price: float
    quantity: int = 0
    category: Optional[str] = None

    def total_value(self) -> float:
        """Calculate total value of this product."""
        return self.price * self.quantity

    def __post_init__(self):
        """Validate data after initialization."""
        if self.price < 0:
            raise ValueError("Price cannot be negative")


# Abstract base class
class Animal(ABC):
    """Abstract animal class."""

    def __init__(self, name: str, species: str):
        self.name = name
        self.species = species

    @abstractmethod
    def make_sound(self) -> str:
        """Make a sound - must be implemented by subclasses."""
        pass

    def sleep(self) -> str:
        """All animals can sleep."""
        return f"{self.name} is sleeping..."


# Concrete implementation
class Dog(Animal):
    """A dog that inherits from Animal."""

    def __init__(self, name: str, breed: str):
        super().__init__(name, "Canis lupus familiaris")
        self.breed = breed

    def make_sound(self) -> str:
        return f"{self.name} says Woof!"

    def fetch(self, item: str) -> str:
        return f"{self.name} fetches the {item}!"


# Class with static and class methods
class MathUtils:
    """Utility class for mathematical operations."""

    PI: ClassVar[float] = math.pi

    @staticmethod
    def add(a: float, b: float) -> float:
        """Add two numbers."""
        return a + b

    @classmethod
    def circle_area(cls, radius: float) -> float:
        """Calculate circle area using the class constant."""
        return cls.PI * radius * radius


# Example usage
if __name__ == "__main__":
    # Regular class
    person = Person("Alice", 30, "alice@example.com")
    person.add_friend("Bob")
    person.add_friend("Charlie")
    print(person)
    print(f"Friends: {person.get_friends()}")

    # Dataclass
    product = Product("Laptop", 999.99, 5, "Electronics")
    print(f"Product: {product}")
    print(f"Total value: ${product.total_value():.2f}")

    # Inheritance and polymorphism
    dog = Dog("Buddy", "Golden Retriever")
    print(dog.make_sound())
    print(dog.fetch("ball"))
    print(dog.sleep())

    # Static and class methods
    result = MathUtils.add(5, 3)
    area = MathUtils.circle_area(10)
    print(f"5 + 3 = {result}")
    print(f"Circle area (r=10): {area:.2f}")
```
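Two dataclass options come up often. `frozen=True` makes instances immutable, and `field(default_factory=...)` gives each instance its own mutable default. The `@dataclass` decorator rejects a bare list default with a `ValueError`, because every instance would share that one list. The following is a minimal sketch; the `Point` and `Inventory` classes are hypothetical.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from typing import List


@dataclass(frozen=True)
class Point:
    """Immutable point: assigning to a field raises FrozenInstanceError."""
    x: float
    y: float


@dataclass
class Inventory:
    """Each instance gets its own list via default_factory."""
    # Writing `items: List[str] = []` here would raise ValueError at class creation.
    items: List[str] = field(default_factory=list)


if __name__ == "__main__":
    p = Point(1.0, 2.0)
    print(p)                     # generated __repr__
    print(p == Point(1.0, 2.0))  # generated __eq__ compares field values
    try:
        p.x = 5.0
    except FrozenInstanceError as e:
        print(f"Cannot modify: {e}")

    a, b = Inventory(), Inventory()
    a.items.append("laptop")
    print(a, b)                  # b.items is still empty
```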

4.7. Error Handling and Exceptions

Robust error handling is essential for reliable applications:

```python
#!/usr/bin/env python3
"""
Modern error handling and exception management in Python.
"""
import logging
from pathlib import Path
from typing import Any, Dict, List, Optional

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


# Custom exceptions
class ValidationError(Exception):
    """Raised when data validation fails."""
    pass


class NetworkError(Exception):
    """Raised when network operations fail."""

    def __init__(self, message: str, status_code: Optional[int] = None):
        super().__init__(message)
        self.status_code = status_code


# Basic exception handling
def safe_divide(a: float, b: float) -> Optional[float]:
    """Safely divide two numbers."""
    try:
        result = a / b
        return result
    except ZeroDivisionError:
        logger.error("Cannot divide by zero")
        return None
    except TypeError as e:
        logger.error(f"Type error: {e}")
        return None


# Multiple exception handling
def process_user_input(user_input: str) -> Optional[int]:
    """Process user input and return a non-negative integer."""
    try:
        # Try to convert to integer
        number = int(user_input)
        # Validate range (0 is allowed)
        if number < 0:
            raise ValidationError("Number must not be negative")
        return number
    except ValueError:
        logger.error(f"'{user_input}' is not a valid number")
        return None
    except ValidationError as e:
        logger.error(f"Validation error: {e}")
        return None


# File operations with exception handling
def read_config_file(filename: str) -> Dict[str, Any]:
    """Read configuration from a file with proper error handling."""
    config = {}
    file_path = Path(filename)
    try:
        # Check if file exists
        if not file_path.exists():
            raise FileNotFoundError(f"Config file {filename} not found")
        # Read and parse file
        with open(file_path, 'r', encoding='utf-8') as file:
            for line_num, line in enumerate(file, 1):
                line = line.strip()
                if line and not line.startswith('#'):
                    try:
                        key, value = line.split('=', 1)
                        config[key.strip()] = value.strip()
                    except ValueError:
                        logger.warning(f"Invalid line {line_num}: {line}")
        return config
    except FileNotFoundError as e:
        logger.error(f"File error: {e}")
        return {"error": "file_not_found"}
    except PermissionError:
        logger.error(f"Permission denied reading {filename}")
        return {"error": "permission_denied"}
    except Exception as e:
        logger.error(f"Unexpected error reading {filename}: {e}")
        return {"error": "unexpected_error"}


# Context manager for resource handling
class DatabaseConnection:
    """Mock database connection with proper cleanup."""

    def __init__(self, connection_string: str):
        self.connection_string = connection_string
        self.connected = False

    def __enter__(self):
        """Enter context - establish connection."""
        logger.info("Connecting to database...")
        self.connected = True
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        """Exit context - clean up connection."""
        if self.connected:
            logger.info("Closing database connection...")
            self.connected = False
        # Handle exceptions that occurred in the context
        if exc_type is not None:
            logger.error(f"Exception in context: {exc_type.__name__}: {exc_val}")
        # Return False to propagate exceptions
        return False

    def query(self, sql: str) -> List[Dict]:
        """Execute a database query."""
        if not self.connected:
            raise ConnectionError("Not connected to database")
        # Simulate database operation
        logger.info(f"Executing query: {sql}")
        return [{"id": 1, "name": "example"}]


# Finally block example
def process_data_with_cleanup(data_file: str) -> bool:
    """Process data file with guaranteed cleanup."""
    temp_file = None
    try:
        # Open temporary file
        temp_file = open("temp_processing.txt", "w", encoding="utf-8")
        # Process data (might raise exceptions)
        with open(data_file, "r", encoding="utf-8") as file:
            data = file.read()
        temp_file.write(data.upper())
        logger.info("Data processed successfully")
        return True
    except FileNotFoundError:
        logger.error(f"Data file {data_file} not found")
        return False
    except Exception as e:
        logger.error(f"Error processing data: {e}")
        return False
    finally:
        # This always runs, even if an exception occurred
        if temp_file and not temp_file.closed:
            temp_file.close()
            logger.info("Temporary file closed")


# Example usage
if __name__ == "__main__":
    # Safe division
    print(f"10 / 2 = {safe_divide(10, 2)}")
    print(f"10 / 0 = {safe_divide(10, 0)}")

    # User input processing
    test_inputs = ["42", "-5", "not_a_number", "100"]
    for inp in test_inputs:
        result = process_user_input(inp)
        print(f"Input '{inp}' -> {result}")

    # Context manager usage
    try:
        with DatabaseConnection("sqlite://memory") as db:
            results = db.query("SELECT * FROM users")
            print(f"Query results: {results}")
    except Exception as e:
        print(f"Database operation failed: {e}")

    # Configuration file reading
    config = read_config_file("nonexistent_config.txt")
    print(f"Config: {config}")

    # Processing with cleanup
    success = process_data_with_cleanup("nonexistent_data.txt")
    print(f"Processing successful: {success}")
```
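For simple cases, `contextlib.contextmanager` turns a generator into a context manager, so you do not need a class with `__enter__` and `__exit__`. The following is a minimal sketch. The `managed_resource` function and its event log are hypothetical stand-ins for real setup and cleanup code, such as connecting and disconnecting the mock `DatabaseConnection` above.

```python
from contextlib import contextmanager
from typing import Iterator, List


@contextmanager
def managed_resource(name: str, log: List[str]) -> Iterator[str]:
    """Hypothetical resource; `log` records what happens and when."""
    log.append(f"open {name}")       # plays the role of __enter__
    try:
        yield name                   # the value bound by `with ... as`
    finally:
        log.append(f"close {name}")  # plays the role of __exit__, even on errors


events: List[str] = []
with managed_resource("db", events) as resource:
    events.append(f"use {resource}")

try:
    with managed_resource("cache", events):
        raise ValueError("boom")
except ValueError:
    # The exception is not suppressed; it reaches the caller after cleanup
    events.append("handled")

print(events)
```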

4.8. File Operations and Context Managers

Working with files safely using context managers:

```python
#!/usr/bin/env python3
"""
Modern file operations using context managers and pathlib.
"""
import csv
import json
import logging
from pathlib import Path
from typing import Any, Dict, List, Optional

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


# Reading files with context managers
def read_text_file(filename: str) -> Optional[str]:
    """Read a text file safely using a context manager."""
    try:
        file_path = Path(filename)
        with open(file_path, 'r', encoding='utf-8') as file:
            content = file.read()
        logger.info(f"Successfully read {file_path.name} ({len(content)} characters)")
        return content
    except FileNotFoundError:
        logger.error(f"File {filename} not found")
        return None
    except Exception as e:
        logger.error(f"Error reading {filename}: {e}")
        return None


# Writing files with automatic cleanup
def write_text_file(filename: str, content: str) -> bool:
    """Write content to a text file."""
    try:
        file_path = Path(filename)
        # Create parent directories if they don't exist
        file_path.parent.mkdir(parents=True, exist_ok=True)
        with open(file_path, 'w', encoding='utf-8') as file:
            file.write(content)
        logger.info(f"Successfully wrote to {file_path.name}")
        return True
    except Exception as e:
        logger.error(f"Error writing to {filename}: {e}")
        return False


# Working with JSON files
def read_json_file(filename: str) -> Optional[Dict]:
    """Read and parse a JSON file."""
    try:
        file_path = Path(filename)
        with open(file_path, 'r', encoding='utf-8') as file:
            data = json.load(file)
        logger.info(f"Successfully loaded JSON from {file_path.name}")
        return data
    except json.JSONDecodeError as e:
        logger.error(f"Invalid JSON in {filename}: {e}")
        return None
    except FileNotFoundError:
        logger.error(f"JSON file {filename} not found")
        return None


def write_json_file(filename: str, data: Dict) -> bool:
    """Write data to a JSON file with proper formatting."""
    try:
        file_path = Path(filename)
        file_path.parent.mkdir(parents=True, exist_ok=True)
        with open(file_path, 'w', encoding='utf-8') as file:
            json.dump(data, file, indent=2, ensure_ascii=False)
        logger.info(f"Successfully wrote JSON to {file_path.name}")
        return True
    except Exception as e:
        logger.error(f"Error writing JSON to {filename}: {e}")
        return False


# Working with CSV files
def read_csv_file(filename: str) -> Optional[List[Dict]]:
    """Read a CSV file and return a list of dictionaries."""
    try:
        file_path = Path(filename)
        data = []
        with open(file_path, 'r', encoding='utf-8', newline='') as file:
            reader = csv.DictReader(file)
            for row in reader:
                data.append(row)
        logger.info(f"Successfully read {len(data)} rows from {file_path.name}")
        return data
    except Exception as e:
        logger.error(f"Error reading CSV {filename}: {e}")
        return None


def write_csv_file(filename: str, data: List[Dict], fieldnames: List[str]) -> bool:
    """Write data to a CSV file."""
    try:
        file_path = Path(filename)
        file_path.parent.mkdir(parents=True, exist_ok=True)
        with open(file_path, 'w', encoding='utf-8', newline='') as file:
            writer = csv.DictWriter(file, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(data)
        logger.info(f"Successfully wrote {len(data)} rows to {file_path.name}")
        return True
    except Exception as e:
        logger.error(f"Error writing CSV to {filename}: {e}")
        return False


# Working with paths using pathlib
def analyze_directory(directory: str) -> Dict[str, Any]:
    """Analyze directory contents using pathlib."""
    try:
        dir_path = Path(directory)
        if not dir_path.exists():
            return {"error": f"Directory {directory} does not exist"}
        if not dir_path.is_dir():
            return {"error": f"{directory} is not a directory"}

        files = []
        total_size = 0
        for file_path in dir_path.iterdir():
            if file_path.is_file():
                stat = file_path.stat()  # one stat() call per file
                files.append({
                    "name": file_path.name,
                    "size": stat.st_size,
                    "extension": file_path.suffix,
                    "modified": stat.st_mtime,
                })
                total_size += stat.st_size

        return {
            "directory": str(dir_path),
            "file_count": len(files),
            "total_size": total_size,
            "files": files,
        }
    except Exception as e:
        return {"error": f"Error analyzing directory: {e}"}


# Processing lines from large files
def process_large_file(filename: str, line_processor=None) -> int:
    """Process a large file line by line to save memory."""
    if line_processor is None:
        line_processor = lambda line, num: print(f"Line {num}: {line.strip()}")
    try:
        file_path = Path(filename)
        line_count = 0
        with open(file_path, 'r', encoding='utf-8') as file:
            for line_num, line in enumerate(file, 1):
                line_processor(line, line_num)
                line_count += 1
        logger.info(f"Processed {line_count} lines from {file_path.name}")
        return line_count
    except Exception as e:
        logger.error(f"Error processing file {filename}: {e}")
        return 0


# Example usage and demonstrations
def create_sample_files():
    """Create sample files for demonstration."""
    # Sample text file
    text_content = """This is a sample text file.
It contains multiple lines.
Each line demonstrates file handling capabilities."""
    write_text_file("sample_data/sample.txt", text_content)

    # Sample JSON file
    json_data = {
        "name": "Python Tutorial",
        "version": "3.12",
        "features": ["modern syntax", "type hints", "async support"],
        "author": {"name": "Alice", "email": "alice@example.com"},
    }
    write_json_file("sample_data/config.json", json_data)

    # Sample CSV file
    csv_data = [
        {"name": "Alice", "age": "30", "city": "New York"},
        {"name": "Bob", "age": "25", "city": "San Francisco"},
        {"name": "Charlie", "age": "35", "city": "Chicago"},
    ]
    write_csv_file("sample_data/users.csv", csv_data, ["name", "age", "city"])


if __name__ == "__main__":
    # Create sample files
    create_sample_files()

    # Read and display files
    text = read_text_file("sample_data/sample.txt")
    if text:
        print("Text file content:")
        print(text)

    json_data = read_json_file("sample_data/config.json")
    if json_data:
        print(f"JSON data: {json_data}")

    csv_data = read_csv_file("sample_data/users.csv")
    if csv_data:
        print(f"CSV data: {csv_data}")

    # Analyze directory
    analysis = analyze_directory("sample_data")
    print(f"Directory analysis: {analysis}")

    # Process file line by line
    def word_counter(line, line_num):
        words = len(line.split())
        print(f"Line {line_num} has {words} words")

    process_large_file("sample_data/sample.txt", word_counter)
```
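The functions above pair `open()` with a `with` statement. For small files, `pathlib` also offers `Path.read_text()` and `Path.write_text()`, which open and close the file internally. `Path.rglob()` searches directories recursively. The following is a minimal sketch; the `count_lines_by_suffix` helper is hypothetical. The demo uses a temporary directory, so it leaves nothing behind.

```python
import tempfile
from pathlib import Path
from typing import Dict


def count_lines_by_suffix(root: Path, suffix: str) -> Dict[str, int]:
    """Count lines in every file below `root` whose name ends with `suffix`."""
    counts: Dict[str, int] = {}
    # rglob() searches all subdirectories; sorted() gives a predictable order
    for path in sorted(root.rglob(f"*{suffix}")):
        if path.is_file():
            text = path.read_text(encoding="utf-8")  # whole file: small files only
            counts[path.relative_to(root).as_posix()] = len(text.splitlines())
    return counts


# Demonstrate on a throwaway directory that is deleted afterwards
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").write_text("one\ntwo\n", encoding="utf-8")
    (root / "sub").mkdir()
    (root / "sub" / "b.txt").write_text("single line", encoding="utf-8")
    (root / "notes.md").write_text("not counted", encoding="utf-8")
    demo_counts = count_lines_by_suffix(root, ".txt")

print(demo_counts)
```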

5. Modern Python Deployment

Today’s Python applications can be deployed in many ways, from traditional web hosting to modern cloud platforms.

5.1. Popular Deployment Platforms

Cloud Platforms:
* Heroku: Simple deployment with Git integration
* Google Cloud Platform: Powerful infrastructure with App Engine and Cloud Run
* AWS: Comprehensive services including Lambda, EC2, and Elastic Beanstalk
* Microsoft Azure: Full-featured cloud platform with App Service
* DigitalOcean: Developer-friendly with App Platform

Containerization:
* Docker: Package applications with all dependencies
* Kubernetes: Orchestrate containers at scale

5.2. Modern Web Frameworks

For web development, consider these popular Python frameworks:

FastAPI: Modern, fast API framework with automatic documentation
Django: Full-featured web framework with admin interface
Flask: Lightweight and flexible micro-framework
Streamlit: Quick data science web apps

5.3. Simple Deployment Example

Here’s how to create a basic web API with FastAPI:

```python
# main.py
from fastapi import FastAPI

app = FastAPI(title="Python Tutorial API")


@app.get("/")
def read_root():
    return {"message": "Hello from Python!"}


@app.get("/crawl-status")
def crawl_status():
    return {"status": "Web crawler ready", "version": "1.0"}
```

Install dependencies:

pip install fastapi uvicorn

Run locally:

uvicorn main:app --reload

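This creates a small REST API. FastAPI also serves interactive API documentation at http://127.0.0.1:8000/docs, and the --reload flag restarts the server whenever the code changes, which is meant for development only. The same application can be deployed to any of the platforms listed above that can run a Python web server, for example inside a Docker container.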

6. Building a Web Crawler

Web crawling is a common task in data science, SEO analysis, and content aggregation. Our web crawler will extract data from web pages using modern Python libraries.

6.1. Installation and Setup

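First, install the required libraries. The crawler below uses Python’s built-in html.parser; lxml is optional and only needed if you switch BeautifulSoup to the faster lxml parser: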

pip install requests beautifulsoup4 lxml

6.2. Web Crawler Implementation

Save the following code as web_crawler.py; the usage example in section 6.3 imports WebCrawler from this module.

```python
#!/usr/bin/env python3
"""
Modern web crawler using requests and BeautifulSoup.
Demonstrates best practices for web scraping in Python.
"""

import json
import logging
import time
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Set
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup, Tag

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)


@dataclass
class CrawlResult:
    """Data structure for crawl results."""
    url: str
    title: str
    status_code: int
    links: List[str]
    text_content: str
    meta_description: str
    word_count: int
    crawl_time: float


class WebCrawler:
    """A respectful web crawler with rate limiting and error handling."""

    def __init__(self, delay: float = 1.0, max_retries: int = 3):
        """
        Initialize the web crawler.

        Args:
            delay: Delay between requests in seconds
            max_retries: Maximum number of retry attempts
        """
        self.delay = delay
        self.max_retries = max_retries
        self.session = requests.Session()

        # Set a reasonable user agent
        self.session.headers.update({
            'User-Agent': 'Python Web Crawler Tutorial Bot 1.0 (+https://example.com/bot)'
        })

        self.crawled_urls: Set[str] = set()
        self.results: List[CrawlResult] = []

    def crawl_page(self, url: str) -> Optional[CrawlResult]:
        """
        Crawl a single web page and extract information.

        Args:
            url: The URL to crawl

        Returns:
            CrawlResult object or None if crawl failed
        """
        if url in self.crawled_urls:
            logger.info(f"Already crawled: {url}")
            return None

        logger.info(f"Crawling: {url}")
        start_time = time.time()

        try:
            # Make request with retry logic
            response = self._make_request(url)
            if response is None:
                return None

            # Parse HTML content
            soup = BeautifulSoup(response.content, 'html.parser')

            # Extract page information (links before text, because
            # _extract_text removes <script> and <style> elements from soup)
            title = self._extract_title(soup)
            links = self._extract_links(soup, url)
            text_content = self._extract_text(soup)
            meta_description = self._extract_meta_description(soup)
            word_count = len(text_content.split())

            crawl_time = time.time() - start_time

            result = CrawlResult(
                url=url,
                title=title,
                status_code=response.status_code,
                links=links,
                text_content=text_content[:500] + "..." if len(text_content) > 500 else text_content,
                meta_description=meta_description,
                word_count=word_count,
                crawl_time=crawl_time
            )

            self.crawled_urls.add(url)
            self.results.append(result)

            # Respect the website with delay
            time.sleep(self.delay)

            return result

        except Exception as e:
            logger.error(f"Error crawling {url}: {e}")
            return None

    def _make_request(self, url: str) -> Optional[requests.Response]:
        """Make HTTP request with retry logic."""
        for attempt in range(self.max_retries):
            try:
                response = self.session.get(url, timeout=10)
                response.raise_for_status()
                return response
            except requests.exceptions.RequestException as e:
                logger.warning(f"Attempt {attempt + 1} failed for {url}: {e}")
                if attempt < self.max_retries - 1:
                    time.sleep(2 ** attempt)  # Exponential backoff
                else:
                    logger.error(f"All retry attempts failed for {url}")
        return None

    def _extract_title(self, soup: BeautifulSoup) -> str:
        """Extract page title."""
        title_tag = soup.find('title')
        if title_tag and isinstance(title_tag, Tag):
            return title_tag.get_text().strip()
        return "No title found"

    def _extract_links(self, soup: BeautifulSoup, base_url: str) -> List[str]:
        """Extract all links from the page."""
        links = []
        for link in soup.find_all('a', href=True):
            if isinstance(link, Tag):
                href = link['href']
                absolute_url = urljoin(base_url, href)
                links.append(absolute_url)
        return links

    def _extract_text(self, soup: BeautifulSoup) -> str:
        """Extract visible text content from the page."""
        # Remove script and style elements (this modifies soup in place)
        for script in soup(["script", "style"]):
            script.decompose()

        # Get text and clean it up
        text = soup.get_text()
        lines = (line.strip() for line in text.splitlines())
        chunks = (phrase.strip() for line in lines for phrase in line.split(" "))
        text = ' '.join(chunk for chunk in chunks if chunk)

        return text

    def _extract_meta_description(self, soup: BeautifulSoup) -> str:
        """Extract meta description."""
        meta_desc = soup.find('meta', attrs={'name': 'description'})
        if meta_desc and isinstance(meta_desc, Tag):
            return meta_desc.get('content', '')
        return ""

    def crawl_multiple_pages(self, urls: List[str]) -> List[CrawlResult]:
        """Crawl multiple pages and return results."""
        logger.info(f"Starting crawl of {len(urls)} pages")

        results = []
        for url in urls:
            result = self.crawl_page(url)
            if result:
                results.append(result)

        logger.info(f"Crawling completed. Successfully crawled {len(results)} pages")
        return results

    def save_results(self, filename: str = "crawl_results.json") -> bool:
        """Save crawl results to JSON file."""
        try:
            # Convert dataclass objects to dictionaries
            results_data = [
                {
                    'url': result.url,
                    'title': result.title,
                    'status_code': result.status_code,
                    'links_count': len(result.links),
                    'first_10_links': result.links[:10],  # Save only first 10 links
                    'text_preview': result.text_content,
                    'meta_description': result.meta_description,
                    'word_count': result.word_count,
                    'crawl_time': result.crawl_time
                }
                for result in self.results
            ]

            with open(filename, 'w', encoding='utf-8') as f:
                json.dump(results_data, f, indent=2, ensure_ascii=False)

            logger.info(f"Results saved to {filename}")
            return True

        except Exception as e:
            logger.error(f"Error saving results: {e}")
            return False

    def get_statistics(self) -> Dict[str, Any]:
        """Get crawling statistics."""
        if not self.results:
            return {"error": "No crawling results available"}

        total_words = sum(result.word_count for result in self.results)
        avg_crawl_time = sum(result.crawl_time for result in self.results) / len(self.results)
        total_links = sum(len(result.links) for result in self.results)

        return {
            "pages_crawled": len(self.results),
            "total_words": total_words,
            "average_words_per_page": total_words // len(self.results),
            "total_links_found": total_links,
            "average_crawl_time": round(avg_crawl_time, 2),
            "successful_crawls": len(self.results),
            "domains_crawled": len(set(urlparse(result.url).netloc for result in self.results))
        }


if __name__ == "__main__":
    # Example usage
    crawler = WebCrawler(delay=1.0)

    # Example URLs (using HTTP examples that are safe to crawl)
    test_urls = [
        "http://httpbin.org/html",
        "http://httpbin.org/robots.txt",
    ]

    # Crawl the pages
    results = crawler.crawl_multiple_pages(test_urls)

    # Display results
    for result in results:
        print(f"\n{'=' * 60}")
        print(f"URL: {result.url}")
        print(f"Title: {result.title}")
        print(f"Status: {result.status_code}")
        print(f"Word Count: {result.word_count}")
        print(f"Links Found: {len(result.links)}")
        print(f"Crawl Time: {result.crawl_time:.2f}s")
        print(f"Text Preview: {result.text_content[:100]}...")

    # Show statistics
    stats = crawler.get_statistics()
    print(f"\n{'=' * 60}")
    print("CRAWLING STATISTICS")
    print(f"{'=' * 60}")
    for key, value in stats.items():
        print(f"{key.replace('_', ' ').title()}: {value}")

    # Save results
    crawler.save_results("web_crawl_results.json")
```

This web crawler demonstrates:

HTTP requests with proper error handling
HTML parsing with BeautifulSoup
Rate limiting to be respectful to websites
Data extraction and structuring
Robustness through retry logic
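_extract_links keeps every href it finds, so the result can contain duplicates, in-page anchors such as #top, and non-web links such as mailto:. A minimal sketch of a post-processing step, assuming you only want unique http(s) URLs without fragments; normalize_links is a hypothetical helper, not part of the crawler above:

```python
from urllib.parse import urldefrag, urljoin, urlparse


def normalize_links(base_url: str, hrefs: list[str]) -> list[str]:
    """Resolve hrefs against base_url, drop fragments and non-HTTP(S) links, de-duplicate in order."""
    seen: set[str] = set()
    result: list[str] = []
    for href in hrefs:
        absolute = urljoin(base_url, href)         # resolve relative hrefs such as "/about"
        absolute, _fragment = urldefrag(absolute)  # strip "#section" anchors
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, ftp:, ...
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```

Inside crawl_page you could pass the output of _extract_links through normalize_links(url, links) before storing it; absolute URLs are left unchanged by urljoin, so applying it twice is harmless.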

6.3. Using the Web Crawler

```python
#!/usr/bin/env python3
"""
Example usage of the web crawler.
"""

import logging

from web_crawler import WebCrawler

# Configure logging to see crawler activity
logging.basicConfig(level=logging.INFO)


def main():
    """Demonstrate web crawler usage."""
    print("🕷️ Web Crawler Example")
    print("=" * 50)

    # Create crawler with 2-second delay between requests
    crawler = WebCrawler(delay=2.0, max_retries=2)

    # URLs to crawl (using safe test websites)
    urls_to_crawl = [
        "http://httpbin.org/html",
        "http://httpbin.org/robots.txt",
        "https://jsonplaceholder.typicode.com/",  # API service with HTML
    ]

    print(f"Crawling {len(urls_to_crawl)} URLs...")

    # Perform the crawl
    results = crawler.crawl_multiple_pages(urls_to_crawl)

    # Display detailed results
    print(f"\n✅ Successfully crawled {len(results)} pages\n")

    for i, result in enumerate(results, 1):
        print(f"📄 Page {i}: {result.url}")
        print(f"   Title: {result.title}")
        print(f"   Status: {result.status_code}")
        print(f"   Words: {result.word_count}")
        print(f"   Links: {len(result.links)}")
        print(f"   Time: {result.crawl_time:.2f}s")
        if result.meta_description:
            print(f"   Description: {result.meta_description}")
        print(f"   Preview: {result.text_content[:100]}...\n")

    # Show overall statistics
    stats = crawler.get_statistics()
    print("📊 Crawling Statistics:")
    print("-" * 30)
    for key, value in stats.items():
        print(f"{key.replace('_', ' ').title()}: {value}")

    # Save results to file
    saved = crawler.save_results("example_crawl_results.json")
    if saved:
        print("\n💾 Results saved to example_crawl_results.json")


if __name__ == "__main__":
    main()
```
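save_results writes a JSON list in which every entry has keys such as url, title, and word_count. A minimal sketch for loading that file later and listing the longest pages; top_pages_by_words is a hypothetical helper, and it assumes the file was produced by save_results:

```python
import json
from pathlib import Path


def top_pages_by_words(filename: str, n: int = 3) -> list[tuple[str, int]]:
    """Return up to n (url, word_count) pairs from a save_results JSON file, largest first."""
    data = json.loads(Path(filename).read_text(encoding='utf-8'))
    ranked = sorted(data, key=lambda page: page['word_count'], reverse=True)
    return [(page['url'], page['word_count']) for page in ranked[:n]]
```

After running the example above, top_pages_by_words("example_crawl_results.json") gives a quick overview without crawling the pages again.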

6.4. Best Practices Demonstrated

The web crawler example showcases important Python development practices:

Type hints: Make code more maintainable and self-documenting
Error handling: Graceful failure handling with informative messages
Logging: Proper logging for debugging and monitoring
Modular design: Functions and classes with single responsibilities
Documentation: Clear docstrings and comments
External libraries: Leveraging the Python ecosystem
Resource management: Proper cleanup and context managers
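The error handling in _make_request retries a failed request and waits 2 ** attempt seconds between attempts (1 s, then 2 s, and so on). A minimal, library-independent sketch of that retry-with-exponential-backoff pattern; retry_with_backoff is hypothetical, and its sleep parameter is injectable so the logic can be tested without actually waiting:

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call operation() up to max_retries times, sleeping base_delay * 2**attempt after each failure.

    Returns the first successful result, or None if every attempt failed.
    """
    for attempt in range(max_retries):
        try:
            return operation()
        except retry_on:
            if attempt < max_retries - 1:
                sleep(base_delay * 2 ** attempt)  # 1x, 2x, 4x, ... base_delay
    return None
```

With requests you would wrap a function that calls session.get() and then raise_for_status(), and pass retry_on=(requests.exceptions.RequestException,) so that only network and HTTP errors trigger a retry.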

6.5. Extending the Web Crawler

Consider these enhancements to deepen your learning:

Add support for different content types (PDF, images)
Implement concurrent crawling with asyncio
Add data storage to databases or files
Create a web interface with Flask or FastAPI
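For the asyncio item, one low-effort approach is to keep blocking requests code and run it in worker threads with asyncio.to_thread (Python 3.9+), while an asyncio.Semaphore caps how many requests run at the same time. A minimal sketch under those assumptions; crawl_concurrently is hypothetical, fetch stands for any blocking function you pass in, and the sketch does not apply WebCrawler’s per-request delay:

```python
import asyncio
from typing import Callable, TypeVar

T = TypeVar("T")


async def crawl_concurrently(
    urls: list[str],
    fetch: Callable[[str], T],
    max_concurrency: int = 5,
) -> dict[str, T]:
    """Run the blocking fetch(url) for every URL in worker threads, at most max_concurrency at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(url: str) -> tuple[str, T]:
        async with semaphore:  # wait for a free slot
            result = await asyncio.to_thread(fetch, url)  # blocking call runs off the event loop
            return url, result

    pairs = await asyncio.gather(*(fetch_one(url) for url in urls))
    return dict(pairs)
```

For example, asyncio.run(crawl_concurrently(urls, lambda u: requests.get(u, timeout=10).status_code)) fetches the status codes of several pages in parallel. A fully asynchronous crawler would instead use an async HTTP client, which is a larger change.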

7. AsciiDoc Validation with LanguageTool

LanguageTool is an open-source grammar, style, and spell checker. We’ll use it through the language-tool-python package to build a tool that checks AsciiDoc files for writing issues.

7.1. Installation and Setup

Install the required library:

pip install language-tool-python

Note: On first use, language-tool-python downloads the LanguageTool server, which is a Java application, so a Java runtime must be installed.

7.2. AsciiDoc Validator Implementation

Create a file named asciidoc_validator.py with the following content:

```python
#!/usr/bin/env python3
"""
AsciiDoc validator using LanguageTool to check grammar and style.
Demonstrates file processing and external tool integration.
"""

import argparse
import json
import logging
import re
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, NamedTuple, Optional, Tuple

import language_tool_python

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)


class ValidationIssue(NamedTuple):
    """Structure for validation issues."""
    line_number: int
    column: int
    message: str
    rule_id: str
    suggestions: List[str]
    context: str


@dataclass
class FileReport:
    """Report for a single file."""
    file_path: str
    total_lines: int
    issues_found: int
    issues: List[ValidationIssue]
    processing_time: float


class AsciiDocValidator:
    """Validator for AsciiDoc files using LanguageTool."""

    def __init__(self, language: str = 'en-US'):
        """
        Initialize the validator.

        Args:
            language: Language code for LanguageTool (default: en-US)
        """
        self.language = language
        self.tool: Optional[language_tool_python.LanguageTool] = None
        self.reports: List[FileReport] = []

        # Lines to skip; re.match() only matches at the start of the stripped line
        self.ignore_patterns = [
            r'include::',  # Include directives
            r'image::',    # Image directives
            r'\[source,',  # Source block attributes
            r'----',       # Listing block delimiters
            r'====',       # Example block delimiters
            r'\|',         # Table rows
            r'\+',         # List continuation
            r"'''",        # Thematic breaks
            r'=+\s',       # Section titles (= Title, == Section, ...)
            r'//',         # Comment lines
            r'\[',         # Block attribute lines
            r':',          # Attribute entries
        ]

    def _initialize_language_tool(self):
        """Initialize LanguageTool (lazy loading)."""
        if self.tool is None:
            logger.info("Initializing LanguageTool... (this may take a moment)")
            try:
                self.tool = language_tool_python.LanguageTool(self.language)
                logger.info("LanguageTool initialized successfully")
            except Exception as e:
                logger.error(f"Failed to initialize LanguageTool: {e}")
                raise

    def close(self):
        """Shut down the local LanguageTool server, if it was started."""
        if self.tool is not None:
            self.tool.close()
            self.tool = None

    def _should_check_line(self, line: str) -> bool:
        """Determine if a line should be checked for language issues."""
        line_stripped = line.strip()

        # Skip empty lines
        if not line_stripped:
            return False

        # Check against ignore patterns
        for pattern in self.ignore_patterns:
            if re.match(pattern, line_stripped):
                return False

        return True

    def _extract_text_content(self, file_path: Path) -> List[Tuple[int, str]]:
        """Return (line_number, text) pairs for the prose lines of an AsciiDoc file."""
        try:
            with open(file_path, 'r', encoding='utf-8') as file:
                lines = file.readlines()
        except Exception as e:
            logger.error(f"Error reading file {file_path}: {e}")
            return []

        text_lines = []
        in_listing_block = False
        for line_num, line in enumerate(lines, 1):
            # Skip everything between ---- or .... delimiters (source code, console output)
            if re.fullmatch(r'-{4,}|\.{4,}', line.strip()):
                in_listing_block = not in_listing_block
                continue
            if not in_listing_block and self._should_check_line(line):
                text_lines.append((line_num, line.strip()))

        return text_lines

    def validate_file(self, file_path: Path) -> FileReport:
        """
        Validate a single AsciiDoc file.

        Args:
            file_path: Path to the AsciiDoc file

        Returns:
            FileReport with validation results
        """
        start_time = time.time()
        logger.info(f"Validating: {file_path}")

        # Initialize LanguageTool if needed
        self._initialize_language_tool()

        # Extract text content (files without prose still get an empty report)
        text_lines = self._extract_text_content(file_path)

        # Check each line for issues
        all_issues = []
        for line_number, text in text_lines:
            try:
                matches = self.tool.check(text)
                for match in matches:
                    issue = ValidationIssue(
                        line_number=line_number,
                        column=match.offset,
                        message=match.message,
                        rule_id=match.ruleId,
                        suggestions=list(match.replacements[:3]),  # First 3 suggestions
                        context=text[max(0, match.offset - 10):match.offset + match.errorLength + 10]
                    )
                    all_issues.append(issue)
            except Exception as e:
                logger.warning(f"Error checking line {line_number}: {e}")

        processing_time = time.time() - start_time

        report = FileReport(
            file_path=str(file_path),
            total_lines=len(text_lines),
            issues_found=len(all_issues),
            issues=all_issues,
            processing_time=processing_time
        )

        self.reports.append(report)
        return report

    def validate_directory(self, directory: Path, pattern: str = "*.adoc") -> List[FileReport]:
        """
        Validate all AsciiDoc files in a directory.

        Args:
            directory: Directory to scan
            pattern: File pattern to match (default: *.adoc)

        Returns:
            List of FileReport objects
        """
        logger.info(f"Scanning directory: {directory}")

        if not directory.exists():
            logger.error(f"Directory not found: {directory}")
            return []

        # Find all matching files (top level only; use "**/*.adoc" to recurse)
        adoc_files = sorted(directory.glob(pattern))

        if not adoc_files:
            logger.warning(f"No {pattern} files found in {directory}")
            return []

        logger.info(f"Found {len(adoc_files)} files to validate")

        # Validate each file
        reports = []
        for file_path in adoc_files:
            report = self.validate_file(file_path)
            reports.append(report)

        return reports

    def generate_report(self, output_file: Optional[str] = None) -> Dict:
        """Generate summary report of all validations."""
        if not self.reports:
            return {"error": "No validation reports available"}

        total_files = len(self.reports)
        total_issues = sum(report.issues_found for report in self.reports)
        files_with_issues = sum(1 for report in self.reports if report.issues_found > 0)

        # Group issues by rule ID
        issue_types = {}
        for report in self.reports:
            for issue in report.issues:
                rule_id = issue.rule_id
                if rule_id not in issue_types:
                    issue_types[rule_id] = {"count": 0, "message": issue.message}
                issue_types[rule_id]["count"] += 1

        # Create summary report
        summary = {
            "validation_summary": {
                "total_files": total_files,
                "files_with_issues": files_with_issues,
                "total_issues": total_issues,
                "average_issues_per_file": round(total_issues / total_files, 2),
            },
            "issue_breakdown": issue_types,
            "file_reports": [
                {
                    "file": report.file_path,
                    "lines_checked": report.total_lines,
                    "issues": report.issues_found,
                    "processing_time": round(report.processing_time, 2)
                }
                for report in self.reports
            ]
        }

        # Save to file if requested
        if output_file:
            try:
                with open(output_file, 'w', encoding='utf-8') as f:
                    json.dump(summary, f, indent=2, ensure_ascii=False)
                logger.info(f"Report saved to {output_file}")
            except Exception as e:
                logger.error(f"Error saving report: {e}")

        return summary

    def print_detailed_report(self):
        """Print detailed validation report to console."""
        if not self.reports:
            print("No validation reports available.")
            return

        print("\n" + "=" * 60)
        print("ASCIIDOC VALIDATION REPORT")
        print("=" * 60)

        total_issues = sum(report.issues_found for report in self.reports)
        files_with_issues = [r for r in self.reports if r.issues_found > 0]

        print(f"Files scanned: {len(self.reports)}")
        print(f"Files with issues: {len(files_with_issues)}")
        print(f"Total issues found: {total_issues}")
        print("\n" + "-" * 60)

        # Show issues by file
        for report in self.reports:
            if report.issues_found > 0:
                print(f"\n📄 {Path(report.file_path).name}")
                print(f"   Issues: {report.issues_found}")

                for issue in report.issues[:5]:  # Show first 5 issues
                    print(f"   Line {issue.line_number}: {issue.message}")
                    if issue.suggestions:
                        suggestions = ", ".join(issue.suggestions)
                        print(f"      Suggestions: {suggestions}")

                if len(report.issues) > 5:
                    print(f"   ... and {len(report.issues) - 5} more issues")


def main():
    """Command-line interface for the validator."""
    parser = argparse.ArgumentParser(description="Validate AsciiDoc files using LanguageTool")
    parser.add_argument("path", help="File or directory path to validate")
    parser.add_argument("--language", default="en-US", help="Language code (default: en-US)")
    parser.add_argument("--output", help="Output file for JSON report")
    parser.add_argument("--pattern", default="*.adoc", help="File pattern for directory scanning")

    args = parser.parse_args()

    validator = AsciiDocValidator(language=args.language)
    path = Path(args.path)

    try:
        if path.is_file():
            # Validate single file
            validator.validate_file(path)
        elif path.is_dir():
            # Validate directory
            validator.validate_directory(path, args.pattern)
        else:
            print(f"Error: Path {path} does not exist")
            return

        # Generate and display report
        validator.print_detailed_report()

        if args.output:
            validator.generate_report(args.output)
    finally:
        # Stop the LanguageTool server that runs in the background
        validator.close()


if __name__ == "__main__":
    main()
```

This validator demonstrates:

Working with file system paths
Text processing and filtering
Integration with external tools
Report generation
Command-line interface design
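The text filtering hinges on _should_check_line: a line is skipped when it is empty or when an ignore pattern matches at its start, because re.match only matches at the beginning of a string. A standalone sketch that shows the effect on a few typical AsciiDoc lines; IGNORE_PATTERNS is an illustrative, hypothetical pattern list (section titles, attribute entries, comments, and so on), not the exact list from the validator:

```python
import re

# Illustrative patterns for AsciiDoc markup lines (matched at the start of the stripped line)
IGNORE_PATTERNS = [r'include::', r'image::', r'\[', r':', r'=+\s', r'//', r'\|', r'----']


def should_check_line(line: str, patterns: list[str] = IGNORE_PATTERNS) -> bool:
    """Return False for empty lines and lines whose start matches an ignore pattern."""
    stripped = line.strip()
    if not stripped:
        return False
    return not any(re.match(pattern, stripped) for pattern in patterns)


sample = [
    "= Document Title",
    ":toc: left",
    "image::diagram.png[]",
    "// a comment",
    "This sentence will be checked by LanguageTool.",
    "",
]
checked = [line for line in sample if should_check_line(line)]
```

Only the prose sentence survives the filter, so LanguageTool never sees markup that it would otherwise report as spelling or grammar errors.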

7.3. Using the AsciiDoc Validator

Create a file named Test.adoc with a few paragraphs of sample text to validate; a couple of deliberate spelling or grammar mistakes make the output easier to follow.

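Pass the validator a single AsciiDoc file or a directory. For a directory, it checks every file that matches --pattern (default: *.adoc) at the top level only; use --pattern "**/*.adoc" to include subdirectories. Add --output report.json to also save the JSON summary.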

python asciidoc_validator.py ~/git/content/TestContent

7.4. Ignoring Specific Words

When working with technical documentation, you often have specialized terms, product names, or abbreviations that should be ignored by the spell checker. You can create an external file to maintain a list of words to exclude from spell checking.

7.4.1. Creating an Ignore List

Create a file named ignored_words.txt with one word per line:

```text
# Technical terms
JFace
SWT
OSGi
Maven
Tycho
IDE
APIs
AsciiDoc
AsciiDoctor

# Company and product names
vogella
Eclipse
IntelliJ
VSCode

# Programming terms
foreach
classpath
runtime
workflow
```

7.4.2. Updated Validator Implementation

Here’s how to modify the validator to use the ignore list:

```python
def load_ignored_words(file_path: str) -> set[str]:
    """Load words to ignore from a file (one word per line), lowercased."""
    try:
        with open(file_path, 'r', encoding='utf-8') as f:
            # Strip whitespace; skip empty lines and '#' comment lines
            return {
                line.strip().lower()
                for line in f
                if line.strip() and not line.strip().startswith('#')
            }
    except FileNotFoundError:
        print(f"Warning: Ignore file {file_path} not found. No words will be ignored.")
        return set()


def is_valid_word(word: str, ignored_words: set[str]) -> bool:
    """Return True if the word should be checked, i.e. it is not on the ignore list."""
    # ignored_words is already lowercased, so one set lookup is enough
    return word.lower() not in ignored_words


# In AsciiDocValidator.__init__, load the list once:
#     self.ignored_words = load_ignored_words('ignored_words.txt')
#
# In validate_file, skip LanguageTool matches that flag an ignored word:
#     for match in matches:
#         word = text[match.offset:match.offset + match.errorLength]
#         if not is_valid_word(word, self.ignored_words):
#             continue  # Skip this match
#         ...
```
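To see the filter in isolation, the following sketch applies the same idea to a list of matches. It uses FakeMatch, a hypothetical stand-in that only carries the offset and errorLength fields that validate_file reads from real language_tool_python matches, so it runs without starting LanguageTool:

```python
from typing import NamedTuple


class FakeMatch(NamedTuple):
    """Hypothetical stand-in for a LanguageTool match with only the fields used here."""
    offset: int
    errorLength: int
    message: str


def filter_ignored(text: str, matches: list, ignored_words: set[str]) -> list:
    """Drop matches whose flagged text (compared case-insensitively) is in ignored_words."""
    lowered = {w.lower() for w in ignored_words}
    return [
        m for m in matches
        if text[m.offset:m.offset + m.errorLength].lower() not in lowered
    ]


text = "Configure JFace and Tycho in teh build."
matches = [
    FakeMatch(10, 5, "Possible spelling mistake"),  # "JFace"
    FakeMatch(20, 5, "Possible spelling mistake"),  # "Tycho"
    FakeMatch(29, 3, "Possible spelling mistake"),  # "teh"
]
remaining = filter_ignored(text, matches, {"jface", "Tycho"})
```

Only the genuine typo "teh" remains; the two technical terms from the ignore list are filtered out.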

7.4.3. Usage with Ignored Words

Run the validator with the ignore list:

```shell
# The validator reads ignored_words.txt from the current working directory
python asciidoc_validator.py ~/git/content/TestContent
```

The validator will now skip any words found in the ignore list. This is particularly useful for:

Technical terms (e.g., JFace, OSGi)
Product names (e.g., Eclipse, IntelliJ)
Programming terminology
Company names and trademarks

Keep your ignored_words.txt under version control to share it with your team and maintain consistency across your documentation.

7.5. Best Practices Demonstrated

The validator example showcases important Python development practices:

Type hints: Make code more maintainable and self-documenting
Error handling: Graceful failure handling with informative messages
Logging: Proper logging for debugging and monitoring
Modular design: Functions and classes with single responsibilities
Documentation: Clear docstrings and comments
External libraries: Leveraging the Python ecosystem
Resource management: Proper cleanup and context managers
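For the resource-management point: language_tool_python starts a local LanguageTool server, and calling close() on the LanguageTool object shuts it down. contextlib.closing from the standard library guarantees that close() runs even if validation raises an exception. A minimal sketch with DummyTool, a hypothetical stand-in for language_tool_python.LanguageTool, so it runs without Java:

```python
from contextlib import closing


class DummyTool:
    """Hypothetical stand-in for language_tool_python.LanguageTool."""

    def __init__(self) -> None:
        self.closed = False

    def check(self, text: str) -> list:
        if "crash" in text:
            raise RuntimeError("simulated failure")
        return []

    def close(self) -> None:
        self.closed = True


tool = DummyTool()
try:
    with closing(tool) as t:  # closing() calls t.close() on exit, even after an exception
        t.check("This line is fine.")
        t.check("This line makes it crash.")
except RuntimeError:
    pass
```

With the real library the same pattern reads with closing(language_tool_python.LanguageTool('en-US')) as tool: ... and needs no changes to the checking code.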

7.6. Extending the Validator

Consider these enhancements to deepen your learning:

Support for multiple document formats
Integration with CI/CD pipelines
Custom rule definitions
Batch processing of multiple directories
HTML report generation
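For HTML report generation, a minimal sketch that turns the dictionary returned by generate_report into a standalone HTML page; render_html_report is hypothetical and assumes the validation_summary and file_reports keys that generate_report produces. html.escape keeps file names containing characters such as & or < from breaking the markup:

```python
import html


def render_html_report(summary: dict) -> str:
    """Render the 'validation_summary' and 'file_reports' parts of a report dict as HTML."""
    totals = summary["validation_summary"]
    rows = "\n".join(
        f"<tr><td>{html.escape(r['file'])}</td><td>{r['lines_checked']}</td>"
        f"<td>{r['issues']}</td></tr>"
        for r in summary["file_reports"]
    )
    return (
        "<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\">"
        "<title>AsciiDoc validation report</title></head><body>\n"
        "<h1>AsciiDoc validation report</h1>\n"
        f"<p>{totals['total_files']} files, {totals['total_issues']} issues</p>\n"
        "<table>\n<tr><th>File</th><th>Lines checked</th><th>Issues</th></tr>\n"
        f"{rows}\n</table>\n</body></html>\n"
    )
```

For example, Path("report.html").write_text(render_html_report(validator.generate_report()), encoding="utf-8") writes a page you can open in any browser or publish as a CI artifact.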


See License for license information.


Python Programming - Tutorial with Practical Examples
