I have a huge XML file that I want to split into chunks based on the product type attribute.
I don't know how to use XSLT. I found xml_split, but I can't figure out how to use it with a regex or an XPath expression to split the document by the type attribute.
<?xml version="1.0"?>
<!DOCTYPE catalog SYSTEM "catalog.dtd">
<catalog>
  <product type="cloths" product_image="cardigan.jpg">
    <catalog_item gender="Men's">
      <item_number>QWZ5671</item_number>
      <price>39.95</price>
      <size description="Medium">
        <color_swatch image="red_cardigan.jpg">Red</color_swatch>
        <color_swatch image="burgundy_cardigan.jpg">Burgundy</color_swatch>
      </size>
      <size description="Large">
        <color_swatch image="red_cardigan.jpg">Red</color_swatch>
        <color_swatch image="burgundy_cardigan.jpg">Burgundy</color_swatch>
      </size>
    </catalog_item>
    <catalog_item gender="Women's">
      <item_number>RRX9856</item_number>
      <price>42.50</price>
      <size description="Small">
        <color_swatch image="red_cardigan.jpg">Red</color_swatch>
        <color_swatch image="navy_cardigan.jpg">Navy</color_swatch>
        <color_swatch image="burgundy_cardigan.jpg">Burgundy</color_swatch>
      </size>
      <size description="Medium">
        <color_swatch image="red_cardigan.jpg">Red</color_swatch>
        <color_swatch image="navy_cardigan.jpg">Navy</color_swatch>
        <color_swatch image="burgundy_cardigan.jpg">Burgundy</color_swatch>
        <color_swatch image="black_cardigan.jpg">Black</color_swatch>
      </size>
      <size description="Large">
        <color_swatch image="navy_cardigan.jpg">Navy</color_swatch>
        <color_swatch image="black_cardigan.jpg">Black</color_swatch>
      </size>
      <size description="Extra Large">
        <color_swatch image="burgundy_cardigan.jpg">Burgundy</color_swatch>
        <color_swatch image="black_cardigan.jpg">Black</color_swatch>
      </size>
    </catalog_item>
  </product>
</catalog>

I used this command:

xml_split -c /catalog/product[@type='cloths'] products.xml

but it reproduces the complete XML document without the XPath filtering.
product' and that's everything in your XML. Which elements are you trying to separate out?
<product type="X"> where X is cloths, electronics, etc. So this is just a single part of one product type, but I want to split this 400k into many chunks based on the type attribute.
product element? And are they unique? (Is there only one product element of each type?)
grep -c '<product type="cloths"' products.xml // output: 8039, and it is 400 MB :)
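
For what it's worth, here is a minimal sketch of one way to do the split outside of xml_split, using Python's standard `xml.etree.ElementTree.iterparse` to stream the file so the 400 MB document is never held in memory at once. It is only a sketch under some assumptions: every `product` element is a direct child of the root, the number of distinct type values is small enough to keep one output file open per type, and the function name `split_by_type` and the output file naming are made up for illustration. The DOCTYPE line is not copied into the output files.

```python
import os
import re
import xml.etree.ElementTree as ET


def split_by_type(source, out_dir, tag="product", attr="type"):
    """Stream `source` and write each <product> into out_dir/<type>.xml.

    Hypothetical helper: assumes <product> elements are direct children
    of the root and that there are only a handful of distinct types.
    Returns a dict mapping each type value to the number of products written.
    """
    os.makedirs(out_dir, exist_ok=True)
    outputs = {}  # type value -> open file handle
    counts = {}   # type value -> number of products written
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    try:
        for event, elem in context:
            if event != "end" or elem.tag != tag:
                continue
            value = elem.get(attr, "untyped")
            if value not in outputs:
                # Build a file name from the attribute value (assumed scheme).
                safe = re.sub(r"[^A-Za-z0-9_-]", "_", value)
                path = os.path.join(out_dir, safe + ".xml")
                fh = open(path, "w", encoding="utf-8")
                fh.write('<?xml version="1.0" encoding="utf-8"?>\n')
                fh.write("<%s>\n" % root.tag)
                outputs[value] = fh
            elem.tail = None  # drop whitespace that followed the element
            outputs[value].write(ET.tostring(elem, encoding="unicode") + "\n")
            counts[value] = counts.get(value, 0) + 1
            root.clear()  # discard processed children to keep memory flat
    finally:
        # Close the wrapper element in every chunk, even on a parse error.
        for fh in outputs.values():
            fh.write("</%s>\n" % root.tag)
            fh.close()
    return counts
```

Calling `split_by_type("products.xml", "chunks")` would, under those assumptions, produce one file per type (for example `chunks/cloths.xml`), each wrapping its products in a `catalog` root so the chunk is well-formed on its own.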