
I need to store XML data such as

    <pathway name="path:ko00010" org="ko" number="00010" title="Glycolysis / Gluconeogenesis"
             image="http://www.kegg.jp/kegg/pathway/ko/ko00010.png"
             link="http://www.kegg.jp/kegg-bin/show_pathway?ko00010">
      <entry id="13" name="ko:K01623 ko:K01624 ko:K01622 ko:K11645 ko:K16305 ko:K16306"
             type="ortholog" reaction="rn:R01070"
             link="http://www.kegg.jp/dbget-bin/www_bget?K01623+K01624+K01622+K11645+K16305+K16306">
        <graphics name="K01623..." fgcolor="#000000" bgcolor="#BFBFFF"
                  type="rectangle" x="483" y="404" width="46" height="17"/>
      </entry>
    </pathway>

into data structures (hashes and arrays) for further use. This is my code:

    #!/usr/bin/perl
    use XML::LibXML;
    use strict;
    use warnings;

    my $parser = new XML::LibXML;
    my $xmlp = $parser->parse_file("ko00010.xml");
    my $rootel = $xmlp->getDocumentElement();
    my $elname = $rootel->getName();
    my @rootelements = $rootel->getAttributes();
    foreach my $rootatt (@rootelements) {
        my $name  = $rootatt->getName();
        my $value = $rootatt->getValue();
        print " ${name}[$value]\n ";
    }
    my @kids = $rootel->childNodes();
    foreach my $child (@kids) {
        my $elname = $child->getName();
        my @atts = $child->getAttributes();
        foreach my $at (@atts) {
            my $name  = $at->getName();
            my $value = $at->getValue();
            print " ${name}[$value]\n ";
        }
    }

So far I can access all the elements except the graphics nodes and their children.

  • Cross-posted from PerlMonks. (commented Jul 18, 2013 at 17:48)

4 Answers


Another completely different approach: use an XML schema together with the CPAN module XML::Compile to convert the XML data automatically. In contrast to other XML-to-data tools such as XML::Simple, XML::Compile does not have to guess, or be tweaked with options like ForceArray, so there are no surprises where a subelement turns into an array in one document and into a scalar in another.

If you don't have an XML schema for your data, you can create one automatically with trang:

trang testdata.xml schema.xsd 

XML::Compile comes with the command-line tool xml2yaml for quick conversion:

xml2yaml testdata.xml schema.xsd > testdata.yaml 

2 Comments

I really don't know how to proceed. Before posting this question I tried many different modules, and I'd prefer to stick with XML::LibXML.
@shaq - Well, for this purpose you are sticking with the wrong (i.e. more difficult) module. XML::LibXML is good for some things, but for returning the data structure you want, there are more appropriate modules.

It's not clear to me exactly what data structure you want to create, or why you'd want to create data structures at all when you could use XPath to get the data you need without mapping the XML into something else.
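To illustrate the XPath route, here is a minimal sketch that pulls each entry, and the graphics elements under it, straight out of the document with findnodes. It assumes the sample XML from the question, inlined as a string so it runs without ko00010.xml; the %entries layout (keyed by entry id) and the attrs_of helper are hypothetical choices for illustration, not anything XML::LibXML prescribes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# The sample document from the question, inlined so the sketch is self-contained.
my $xml = <<'XML';
<pathway name="path:ko00010" org="ko" number="00010" title="Glycolysis / Gluconeogenesis"
         image="http://www.kegg.jp/kegg/pathway/ko/ko00010.png"
         link="http://www.kegg.jp/kegg-bin/show_pathway?ko00010">
  <entry id="13" name="ko:K01623 ko:K01624 ko:K01622 ko:K11645 ko:K16305 ko:K16306"
         type="ortholog" reaction="rn:R01070"
         link="http://www.kegg.jp/dbget-bin/www_bget?K01623+K01624+K01622+K11645+K16305+K16306">
    <graphics name="K01623..." fgcolor="#000000" bgcolor="#BFBFFF"
              type="rectangle" x="483" y="404" width="46" height="17"/>
  </entry>
</pathway>
XML

# Hypothetical helper: collect an element's attributes into a plain hash reference.
sub attrs_of {
    my ($node) = @_;
    return +{ map { $_->nodeName => $_->value } $node->attributes };
}

my $doc = XML::LibXML->load_xml(string => $xml);

# Hypothetical layout: entry id => { entry attributes..., graphics => [ {...}, ... ] }
my %entries;
for my $entry ($doc->findnodes('/pathway/entry')) {
    my $rec = attrs_of($entry);
    # './graphics' is evaluated relative to this <entry>, so it reaches the
    # grandchildren of <pathway> that childNodes() on the root never returns
    $rec->{graphics} = [ map { attrs_of($_) } $entry->findnodes('./graphics') ];
    $entries{ $entry->getAttribute('id') } = $rec;
}

for my $id (sort keys %entries) {
    for my $g (@{ $entries{$id}{graphics} }) {
        print "entry $id ($entries{$id}{type}): $g->{type} at x=$g->{x}, y=$g->{y}\n";
    }
}
```

Because the XPath expressions name exactly the elements wanted, whitespace between tags never shows up in the results.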

It looks to me like you're trying to emulate what XML::Simple does. In that case, why not use XML::Simple directly? I know it's not recommended in general for complex XML, but if your XML is simple and the data structure XML::Simple creates works for you, then it's probably safer to use a widely used module than to rewrite it yourself (I should know: I rewrote it in XML::Twig. It's not especially difficult, but it's not completely trivial either).
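If you do go the XML::Simple route, a couple of options matter for this data. Here is a minimal sketch, assuming XML::Simple is installed and again using the question's sample XML inlined as a string. ForceArray keeps entry and graphics as arrays even when only one of each occurs, and KeyAttr => [] turns off XML::Simple's default folding of elements on their name, key or id attributes (otherwise the entries would come back as a hash keyed by their name attribute).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# The sample document from the question, inlined so the sketch is self-contained.
my $xml = <<'XML';
<pathway name="path:ko00010" org="ko" number="00010" title="Glycolysis / Gluconeogenesis"
         image="http://www.kegg.jp/kegg/pathway/ko/ko00010.png"
         link="http://www.kegg.jp/kegg-bin/show_pathway?ko00010">
  <entry id="13" name="ko:K01623 ko:K01624 ko:K01622 ko:K11645 ko:K16305 ko:K16306"
         type="ortholog" reaction="rn:R01070"
         link="http://www.kegg.jp/dbget-bin/www_bget?K01623+K01624+K01622+K11645+K16305+K16306">
    <graphics name="K01623..." fgcolor="#000000" bgcolor="#BFBFFF"
              type="rectangle" x="483" y="404" width="46" height="17"/>
  </entry>
</pathway>
XML

# ForceArray: <entry> and <graphics> always become array refs, even if only one occurs.
# KeyAttr => []: disable folding on the default name/key/id attributes.
my $data = XMLin($xml, ForceArray => [qw(entry graphics)], KeyAttr => []);

# Attributes of the root element end up as plain hash keys.
print "$data->{name}: $data->{title}\n";

for my $entry (@{ $data->{entry} }) {
    for my $g (@{ $entry->{graphics} }) {
        print "entry $entry->{id}: $g->{type} at ($g->{x}, $g->{y})\n";
    }
}
```

Setting ForceArray explicitly avoids the scalar-versus-array surprise mentioned in the XML::Compile answer, at least for the element names you list.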



You need to call

    my @grand_kids = $child->childNodes();

inside your second foreach loop, and then loop through the attributes of each grandchild as well.

I have worked through the example for you:

    #!/usr/bin/perl
    use XML::LibXML;
    use strict;
    use warnings;

    my $parser = new XML::LibXML;
    my $xmlp = $parser->parse_file("ko00010.xml");
    my $rootel = $xmlp->getDocumentElement();
    my $elname = $rootel->getName();

    # attributes of the root <pathway> element
    my @rootelements = $rootel->getAttributes();
    foreach my $rootatt (@rootelements) {
        printf "R {%s}[%s]\t", $rootatt->getName(), $rootatt->getValue();
    }

    # children of the root (e.g. <entry>)
    my @kids = $rootel->childNodes();
    foreach my $child (@kids) {
        printf "\nCH = %s\n", $child->getName();
        my @atts = $child->getAttributes();
        foreach my $at (@atts) {
            printf "C {%s}[%s]\t", $at->getName(), $at->getValue();
        }

        # grandchildren of the root (e.g. <graphics>): one level deeper
        my @grand_kids = $child->childNodes();
        foreach my $grand_child (@grand_kids) {
            printf "\nGR CH = %s\n", $grand_child->getName();
            my @atts2 = $grand_child->getAttributes();
            foreach my $at2 (@atts2) {
                printf "GC {%s}[%s]\t", $at2->getName(), $at2->getValue();
            }
        }
    }

This gives the following output (I'm not sure where the #text nodes are coming from):

    R {name}[path:ko00010]  R {org}[ko]  R {number}[00010]  R {title}[Glycolysis / Gluconeogenesis]  R {image}[http://www.kegg.jp/kegg/pathway/ko/ko00010.png]  R {link}[http://www.kegg.jp/kegg-bin/show_pathway?ko00010]
    CH = #text

    CH = entry
    C {id}[13]  C {name}[ko:K01623 ko:K01624 ko:K01622 ko:K11645 ko:K16305 ko:K16306]  C {type}[ortholog]  C {reaction}[rn:R01070]  C {link}[http://www.kegg.jp/dbget-bin/www_bget?K01623+K01624+K01622+K11645+K16305+K16306]
    GR CH = #text

    GR CH = graphics
    GC {name}[K01623...]  GC {fgcolor}[#000000]  GC {bgcolor}[#BFBFFF]  GC {type}[rectangle]  GC {x}[483]  GC {y}[404]  GC {width}[46]  GC {height}[17]
    GR CH = #text

    CH = #text
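The #text nodes in that output are the whitespace (spaces and newlines) between the tags: childNodes() returns text nodes as well as elements. Below is a sketch of a recursive variant that handles any depth instead of hard-coding a child and grandchild loop, and that skips the whitespace by selecting only element children with the XPath expression '*'. It assumes the question's sample XML inlined as a string; the element_to_data name and its tag/attributes/children layout are hypothetical choices for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use Data::Dumper;

# The sample document from the question, inlined so the sketch is self-contained.
my $xml = <<'XML';
<pathway name="path:ko00010" org="ko" number="00010" title="Glycolysis / Gluconeogenesis"
         image="http://www.kegg.jp/kegg/pathway/ko/ko00010.png"
         link="http://www.kegg.jp/kegg-bin/show_pathway?ko00010">
  <entry id="13" name="ko:K01623 ko:K01624 ko:K01622 ko:K11645 ko:K16305 ko:K16306"
         type="ortholog" reaction="rn:R01070"
         link="http://www.kegg.jp/dbget-bin/www_bget?K01623+K01624+K01622+K11645+K16305+K16306">
    <graphics name="K01623..." fgcolor="#000000" bgcolor="#BFBFFF"
              type="rectangle" x="483" y="404" width="46" height="17"/>
  </entry>
</pathway>
XML

# Hypothetical helper: recursively turn an element into
# { tag => ..., attributes => { ... }, children => [ ... ] }.
sub element_to_data {
    my ($elem) = @_;
    return +{
        tag        => $elem->nodeName,
        attributes => +{ map { $_->nodeName => $_->value } $elem->attributes },
        # '*' selects only child elements, so whitespace #text nodes are skipped
        children   => [ map { element_to_data($_) } $elem->findnodes('*') ],
    };
}

my $doc  = XML::LibXML->load_xml(string => $xml);
my $tree = element_to_data($doc->documentElement);

$Data::Dumper::Sortkeys = 1;
print Dumper($tree);

# The <graphics> element of the first <entry>, two levels below the root.
my $graphics = $tree->{children}[0]{children}[0];
print "$graphics->{tag}: x=$graphics->{attributes}{x}, y=$graphics->{attributes}{y}\n";
```

If you would rather keep the original childNodes() loops, testing each node with `$node->nodeType == XML_ELEMENT_NODE` before processing it filters out the same whitespace nodes.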

2 Comments

I tried this, but graphics is never one of the $child values in foreach my $child (@kids). I mean that childNodes() in my @kids = $rootel->childNodes(); does not return the graphics nodes.
Thanks for your answer. I tried XML::Simple and it was indeed simple; I managed to parse all the elements.

XML::Simple will work, but XML::LibXML is generally the recommended choice. Here is a PerlMonks article on some notable differences between the two and on converting from XML::Simple to XML::LibXML.

One way to do it with XML::LibXML, using XPathContext and findnodes:

    use strict;
    use warnings;
    use XML::LibXML;
    use Data::Dumper;

    my $parser = XML::LibXML->new();
    my $doc    = $parser->parse_file("ko00010.xml");
    my $root   = $doc->getDocumentElement();

    my %nodeHash = ();

    # receives the node list selected by the XPath expression and stores each
    # nodeName (key) and textContent (value) in %nodeHash
    my $perlmatch = sub {
        die "Not a nodelist"
            unless $_[0]->isa('XML::LibXML::NodeList');
        die "Missing a regular expression"
            unless defined $_[1];

        foreach my $node ( $_[0]->get_nodelist ) {
            push @{ $nodeHash{ $node->nodeName } }, $node->textContent;
        }
        return $_[0];    # hand the node list back so findnodes() gets a node set
    };

    # Create an XPathContext and register the 'perlmatch' function
    my $xc = XML::LibXML::XPathContext->new($root);
    $xc->registerFunction( 'perlmatch', $perlmatch );

    # //* goes through all elements at any depth; the "." argument is only checked
    # for presence, since this version does not filter on a regular expression
    $xc->findnodes('perlmatch(//*, ".")')
        or die "Error retrieving nodes.";

    # print the contents of %nodeHash (you can see the final hash structure here)
    print Dumper(\%nodeHash);

Adapted from the registerFunction example in the CPAN documentation for XML::LibXML::XPathContext (using a hash instead of an array, and "." to select all nodes).

