A MediaWiki wikitext parser for Elixir that converts wikitext markup into structured AST nodes. WikitextEx supports templates, links, formatting, tables, and other wikitext elements commonly found in MediaWiki content.
⚠️ Current Status: Beta - WikitextEx successfully parses many common wikitext patterns but has known limitations with certain edge cases. See Known Limitations for details.
- Broad Wikitext Support: Parse headers, links, templates, formatting (bold/italic), lists, tables, and more
- Structured AST: Clean, typed AST nodes for easy manipulation and analysis
- Template Parsing: Full support for MediaWiki template syntax with named and positional arguments
- HTML Tag Support: Parse HTML tags, comments, and special tags like `<ref>` and `<nowiki>`
- Table Parsing: Complete MediaWiki table syntax support with headers and data cells
- Robust Formatting: Handles complex bold/italic combinations and nested formatting
- Link Types: Support for internal links, categories, files, and interlanguage links
Add `wikitext_ex` to your list of dependencies in `mix.exs`:

```elixir
def deps do
  [
    {:wikitext_ex, "~> 0.1.0"}
  ]
end
```

```elixir
# Parse simple wikitext
{:ok, ast, _, _, _, _} = WikitextEx.Parser.parse("'''Bold text''' and ''italic text''")

# The result is a list of AST nodes
[
  %WikitextEx.AST{type: :bold, children: [%WikitextEx.AST{type: :text, value: %WikitextEx.AST.Text{content: "Bold text"}}]},
  %WikitextEx.AST{type: :text, value: %WikitextEx.AST.Text{content: " and "}},
  %WikitextEx.AST{type: :italic, children: [%WikitextEx.AST{type: :text, value: %WikitextEx.AST.Text{content: "italic text"}}]}
]
```

```elixir
# Parse templates with arguments
{:ok, ast, _, _, _, _} = WikitextEx.Parser.parse("{{template|arg1|key=value}}")

# Results in a template AST node
%WikitextEx.AST{
  type: :template,
  value: %WikitextEx.AST.Template{
    name: "template",
    args: [
      {:positional, "arg1"},
      {:named, %{"key" => "value"}}
    ]
  }
}
```

```elixir
# Parse various link types
{:ok, ast, _, _, _, _} = WikitextEx.Parser.parse("[[Article]] [[Category:Example]] [[File:image.jpg|thumb]]")

# Results in different AST node types:
# - :link for regular internal links
# - :category for category links
# - :file for file/media links
```

```elixir
# Parse headers
{:ok, ast, _, _, _, _} = WikitextEx.Parser.parse("== Section Header ==")

# Results in a header AST node
%WikitextEx.AST{
  type: :header,
  value: %WikitextEx.AST.Header{level: 2},
  children: [%WikitextEx.AST{type: :text, value: %WikitextEx.AST.Text{content: "Section Header"}}]
}
```

```elixir
# Parse tables
wikitext = """
{|
! Header 1 !! Header 2
|-
| Cell 1 || Cell 2
|}
"""

{:ok, ast, _, _, _, _} = WikitextEx.Parser.parse(wikitext)
# Results in a table AST with rows and cells
```

WikitextEx produces a structured AST where each node follows this pattern:
```elixir
%WikitextEx.AST{
  type: atom(),          # The type of element (:text, :template, :link, etc.)
  value: struct() | nil, # Type-specific data (e.g., %AST.Text{content: "..."})
  children: [%AST{}]     # Nested AST nodes
}
```

The `type` field takes one of the following values:

- `:text` - Plain text content
- `:header` - Headers (`=`, `==`, `===`, etc.)
- `:template` - Template invocations (`{{template|args}}`)
- `:link` - Internal wiki links (`[[Page]]`)
- `:category` - Category links (`[[Category:Name]]`)
- `:file` - File/media links (`[[File:image.jpg]]`)
- `:interlang_link` - Interlanguage links (`[[de:Page]]`)
- `:bold` - Bold formatting (`'''text'''`)
- `:italic` - Italic formatting (`''text''`)
- `:list_item` - List items (`*` or `#`)
- `:table` - Tables (`{| ... |}`)
- `:table_row` - Table rows
- `:table_cell` - Table cells (header or data)
- `:html_tag` - HTML tags
- `:ref` - Reference tags (`<ref>`)
- `:comment` - HTML comments (`<!-- ... -->`)
- `:nowiki` - Nowiki sections (`<nowiki>`)
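Because every node carries a `type` and a `children` list, a whole document can be summarized in one recursive pass. The sketch below tallies how often each node type occurs. It is not part of WikitextEx: the module name `AstStats` is hypothetical, and it matches on plain map keys (`%{type: ...}`), which also match `%WikitextEx.AST{}` structs, so it can be tried with hand-built maps before the library is installed.

```elixir
defmodule AstStats do
  @moduledoc """
  Hypothetical helper (not part of WikitextEx): counts AST nodes by `type`.

  Patterns use plain map keys, so they also match `%WikitextEx.AST{}` structs.
  """

  @doc "Returns a map of node type => occurrence count, descending into children."
  def count_types(nodes) when is_list(nodes) do
    Enum.reduce(nodes, %{}, &count_node/2)
  end

  # Count this node, then fold its children (nil treated as []) into the same map.
  defp count_node(%{type: type} = node, acc) do
    acc = Map.update(acc, type, 1, &(&1 + 1))
    children = Map.get(node, :children) || []
    Enum.reduce(children, acc, &count_node/2)
  end

  # Ignore anything that is not shaped like a node.
  defp count_node(_other, acc), do: acc
end

# Hand-built nodes mirroring the bold/italic example from the quick start
nodes = [
  %{type: :bold, children: [%{type: :text, children: []}]},
  %{type: :text, children: []},
  %{type: :italic, children: [%{type: :text, children: []}]}
]

AstStats.count_types(nodes)
# => %{bold: 1, italic: 1, text: 3}
```

With the real parser, the same function can be applied to the `ast` list returned by `WikitextEx.Parser.parse/1`.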
```elixir
# Extract text content from headers or other containers
WikitextEx.AST.text_content(ast_node.children)

# Navigate the AST tree
defmodule WikitextWalker do
  def find_templates(ast_nodes) do
    Enum.flat_map(ast_nodes, fn
      %WikitextEx.AST{type: :template} = node -> [node]
      %WikitextEx.AST{children: children} -> find_templates(children)
      _ -> []
    end)
  end
end
```

```elixir
defmodule WikitextProcessor do
  # Returns the targets of all internal links in `wikitext`,
  # or an empty list if the text cannot be parsed.
  def extract_links(wikitext) do
    case WikitextEx.Parser.parse(wikitext) do
      {:ok, ast, _rest, _context, _line, _offset} ->
        ast
        |> find_links()
        |> Enum.map(& &1.value.target)

      # NimbleParsec parsers return a 6-tuple on failure as well
      {:error, _reason, _rest, _context, _line, _offset} ->
        []
    end
  end

  defp find_links(ast_nodes) do
    Enum.flat_map(ast_nodes, fn
      %WikitextEx.AST{type: :link} = node -> [node]
      %WikitextEx.AST{children: children} -> find_links(children)
      _ -> []
    end)
  end
end
```

```bash
# Clone the repository
git clone https://github.com/your-username/wikitext_ex.git
cd wikitext_ex

# Install dependencies
mix deps.get

# Run tests
mix test

# Generate documentation
mix docs
```

WikitextEx works well for typical wiki content, but has some known limitations:
- Complex whitespace handling: Some complex whitespace patterns may not parse correctly
- Deeply nested structures: Very deeply nested content may cause parsing issues
- Advanced MediaWiki syntax: Some advanced or rarely-used MediaWiki features are not yet supported
- Large content blocks: Performance may degrade with extremely large wikitext files
- The parser may return partial results, leaving unparsed content in the `rest` field, for complex edge cases
- Most common wikitext patterns parse successfully
- Test with your content: Always test WikitextEx with your specific wikitext before production use
- Handle partial parsing: Check the `rest` field in parse results for unparsed content
- Report issues: Please report parsing failures with examples to help improve the parser
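The partial-parsing check recommended above can be wrapped in a small helper so callers never silently drop unparsed input. This is a sketch, not a WikitextEx API: the module `SafeParse` is hypothetical, and it assumes `WikitextEx.Parser.parse/1` returns the standard NimbleParsec 6-tuples, `{:ok, ast, rest, context, position, byte_offset}` on success (consistent with the `{:ok, ast, _, _, _, _}` pattern used above) and `{:error, reason, rest, context, position, byte_offset}` on failure.

```elixir
defmodule SafeParse do
  @moduledoc """
  Hypothetical wrapper (not part of WikitextEx). Assumes the parser returns
  NimbleParsec-style 6-tuples, where the third element (`rest`) holds any
  input that was not consumed.
  """

  @doc """
  Returns `{:ok, ast}` when all input was consumed, `{:partial, ast, rest}`
  when trailing input was left unparsed, and `{:error, reason}` on failure.
  """
  def classify({:ok, ast, "", _context, _position, _offset}), do: {:ok, ast}

  def classify({:ok, ast, rest, _context, _position, _offset}) when is_binary(rest),
    do: {:partial, ast, rest}

  def classify({:error, reason, _rest, _context, _position, _offset}),
    do: {:error, reason}
end

# Usage with the real parser (requires the wikitext_ex dependency):
#
#   case wikitext |> WikitextEx.Parser.parse() |> SafeParse.classify() do
#     {:ok, ast} -> ast
#     {:partial, ast, rest} -> IO.warn("unparsed input: #{inspect(rest)}"); ast
#     {:error, reason} -> raise ArgumentError, reason
#   end
```

Returning `:partial` as a distinct result, rather than folding it into `:ok`, makes it explicit at each call site whether a truncated AST is acceptable.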
WikitextEx includes a comprehensive test suite with 58 tests covering various wikitext patterns and edge cases. Run the tests with:
```bash
mix test
```

- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Write tests for your changes
- Ensure all tests pass (`mix test`)
- Commit your changes (`git commit -am 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with NimbleParsec for robust parsing
- Inspired by MediaWiki's wikitext specification
- Designed for use with Wikipedia and other MediaWiki-based wikis