12
$\begingroup$

I have an XML file with song data from iTunes (https://dl.dropboxusercontent.com/u/1012958/iTunes%20Library.xml) and I'd like to import it into Mathematica to do some statistics. I get an XMLObject like this:

    XMLObject["Document"][
      {XMLObject["Declaration"]["Version" -> "1.0", "Encoding" -> "UTF-8"],
       XMLObject["Doctype"]["plist",
         "Public" -> "-//Apple Computer//DTD PLIST 1.0//EN",
         "System" -> "http://www.apple.com/DTDs/PropertyList-1.0.dtd"]},
      XMLElement["plist", {"version" -> "1.0"},
        {XMLElement["dict", {}, {...}]}],
      {},
      "Valid" -> True]

How can I convert it to a Mathematica 10 dataset or table?

$\endgroup$
2
  • $\begingroup$ Have you tried SemanticImport? $\endgroup$ Commented Aug 30, 2014 at 21:08
  • $\begingroup$ Yes. It doesn't work. $\endgroup$ Commented Aug 31, 2014 at 18:18

2 Answers 2

19
$\begingroup$

We can start by importing the file as an XMLObject:

    $url = "https://dl.dropboxusercontent.com/u/1012958/iTunes%20Library.xml";
    $xml = Import[$url, {"XML", "XMLObject"}];
    Short[$xml, 4]

    (*
    XMLObject[Document][
      { XMLObject[Declaration][Version->1.0,Encoding->UTF-8]
      , XMLObject[Doctype][plist,Public->-//Apple Computer//DTD PLIST 1.0//EN,<<1>>}
      , XMLElement[plist,{version->1.0},{<<1>>}], {}, Valid->True
    ]
    *)

The result is the XML document transformed into a Mathematica expression that is amenable to further transformation. Let's define a transformation from the XMLObject into a dataset. The iTunes file (an Apple plist file) is essentially a big nested hierarchy of associations, with the odd list thrown in:

    itunesXmlToDataset[xml_] :=
      Block[{XMLElement}  (* XMLElement gets temporary definitions inside this Block *)
      , XMLElement["plist", _, {c_}] := Dataset @ c                    (* root element *)
      ; XMLElement["dict", _, c_] := <| Rule @@@ Partition[c, 2] |>    (* key/value pairs *)
      ; XMLElement["array", _, c_] := c                                (* plain list *)
      ; XMLElement["key"|"string"|"data", _, {c_}] := c                (* keep the text *)
      ; XMLElement["integer", _, {c_}] := FromDigits @ c
      ; XMLElement["date", _, {c_}] := DateObject @ c
      ; XMLElement["true", _, {}] := True
      ; XMLElement["false", _, {}] := False
      ; XMLElement[t_, _, {c_, ___}] := (Message[itunesXmlToDataset::ignored, t]; c)  (* catch-all *)
      ; xml[[2]]  (* evaluate the root element under the definitions above *)
      ]

    itunesXmlToDataset::ignored = "Ignored unexpected XML element: ``";

This will create the dataset we want:

$dataset = itunesXmlToDataset[$xml] 

[Screenshot: the resulting dataset]
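To see how the temporary XMLElement definitions do their work, here is a minimal sketch that applies a cut-down version of the same rules to a small hand-built fragment. The names plistFragmentToExpression and $toy, and the fragment's contents, are made up for illustration and are not part of the iTunes file:

```mathematica
(* Hypothetical fragment shaped like one <dict> of a plist; the values are invented. *)
$toy = XMLElement["dict", {}, {
    XMLElement["key", {}, {"Name"}],       XMLElement["string", {}, {"Song A"}],
    XMLElement["key", {}, {"Play Count"}], XMLElement["integer", {}, {"3"}],
    XMLElement["key", {}, {"Loved"}],      XMLElement["true", {}, {}]}];

(* Cut-down variant of the rules above: inside the Block, XMLElement
   has definitions, so evaluating the fragment rewrites it from the inside out. *)
plistFragmentToExpression[xml_] :=
  Block[{XMLElement},
    XMLElement["dict", _, c_] := <|Rule @@@ Partition[c, 2]|>;
    XMLElement["key" | "string", _, {c_}] := c;
    XMLElement["integer", _, {c_}] := FromDigits[c];
    XMLElement["true", _, {}] := True;
    xml
  ]

plistFragmentToExpression[$toy]
(* <|"Name" -> "Song A", "Play Count" -> 3, "Loved" -> True|> *)
```

The inner elements are evaluated first, so by the time the "dict" rule fires its children are already plain keys and values, and Partition[c, 2] only has to pair them up.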

We can then query, say, for a list of all tracks along with their album and artist:

$dataset["Tracks", All, {"Name", "Album", "Artist"}] 

[Screenshot: tracks with name, album, and artist]

... or perhaps for the playlists along with the number of songs in each:

    $dataset[
      "Playlists"
    , All
    , <| "Name" -> "Name", "Songs" -> "Playlist Items" /* Length |>
    ]

[Screenshot: playlists with song counts]
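Since the question mentions statistics, the same query syntax also accepts an aggregation function in place of All. The following is only a sketch on a made-up miniature dataset; $toyDataset and its contents are hypothetical, and the "Total Time" key is assumed to be present in the real track entries:

```mathematica
(* Hypothetical miniature of the structure produced above; all values are invented. *)
$toyDataset = Dataset[<|
    "Tracks" -> <|
      "1" -> <|"Name" -> "Song A", "Artist" -> "Artist P", "Total Time" -> 1000|>,
      "2" -> <|"Name" -> "Song B", "Artist" -> "Artist Q", "Total Time" -> 2000|>|>|>];

(* Select one column for every track *)
Normal @ $toyDataset["Tracks", All, "Name"]
(* <|"1" -> "Song A", "2" -> "Song B"|> *)

(* Aggregate a column: Total is applied after "Total Time" has been extracted *)
$toyDataset["Tracks", Total, "Total Time"]
(* 3000 *)
```

Other aggregating functions should be usable in the same position as Total.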

$\endgroup$
5
$\begingroup$

I had to add the "ReadDTD" -> False option to the Import call to get WReach's solution to work with Mathematica 11.0:

    $xml = Import[
      "/Volumes/WDC3TBRAID/Downloads/iTunes Music Library.xml",
      {"XML", "XMLObject"},
      "ReadDTD" -> False]

The rest of his solution works fine after that.
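If the same code has to run on versions with and without this problem, one possible approach is to try the plain import first and fall back to "ReadDTD" -> False only when a message is issued. This is a sketch, not something tested against the iTunes file; importXmlObject is a hypothetical helper name:

```mathematica
(* Hypothetical helper, not part of either answer. *)
importXmlObject[source_String] :=
  Quiet @ Check[
    (* first attempt: default options, DTD is read *)
    Import[source, {"XML", "XMLObject"}],
    (* evaluated only if the first attempt generated a message *)
    Import[source, {"XML", "XMLObject"}, "ReadDTD" -> False]
  ]
```

Note that Quiet also hides any message from the second attempt, so it is worth checking that the result is not $Failed before passing it on.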

$\endgroup$
1
  • $\begingroup$ Just hit the same issue with trying to import a MusicXML document in 10.4. Bug in the MMA code for reading DTDs? I got: XMLParserXMLGet::prserr: MalformedURLException: The URL used an unsupported protocol at Line: 2 Character: 127 in /Users/matthew/Documents/MuseScore2/Scores/WachetAufInD.xml. >> $\endgroup$ Commented Oct 8, 2016 at 9:44
