$\begingroup$

When calling the Stack Exchange API v2.2 with URLRead (new in Mathematica 11), the response body is gzip-encoded and can be requested in three alternative forms, "Body", "BodyByteArray" and "BodyBytes", depending on the options.

How can we deal with each of these formats in order to decode the body in Mathematica?

I am particularly interested in operating on ByteArray.

I know I could get the content using

Import[URLBuild[{"https://api.stackexchange.com", "2.2", "info"}, {"site" -> "mathematica"}], "RawJSON"] 

My question is about dealing explicitly with an encoded list of byte values, as well as an encoded ByteArray, preferably without writing a temporary file and reading it back.

The code I'm using:

reply=URLRead[ URLBuild[{"https://api.stackexchange.com", "2.2", "info"}, {"site" -> "mathematica"}] , {"Headers", "StatusCode", "StatusCodeDescription", "ContentType", "BodyByteArray"}] 
<|"Headers" -> { "cache-control" -> "private" , "content-type" -> "application/json; charset=utf-8" , "content-encoding" -> "gzip" , "access-control-allow-origin" -> "*" , "access-control-allow-methods" -> "GET, POST" , "access-control-allow-credentials" -> "false" , "x-content-type-options" -> "nosniff" , "date" -> "Thu, 11 May 2017 10:55:51 GMT" , "content-length" -> "234" } , "StatusCode" -> 200 , "StatusCodeDescription" -> "OK" , "ContentType" -> "application/json; charset=utf-8" , "BodyByteArray" -> ByteArray[< 234 >] |> 

Incidentally, and surprisingly to me, Import[URLRead[url], "RawJSON"] doesn't work.

$\endgroup$
    $\begingroup$ related: 45282 $\endgroup$ Commented May 11, 2017 at 12:53

1 Answer

$\begingroup$

"BodyBytes"

  • A list of the raw bytes of the HTTP response body. It does not mean much without encoding information.

"BodyByteArray"

  • As far as I can tell, "BodyBytes" wrapped in ByteArray. (1.)

"Body"

  • As far as I can tell, a String: FromCharacterCode[#BodyBytes, encoding], where encoding is read from the charset parameter of the Content-Type header.

    That is a problem for us. First, it ignores Content-Encoding, so the gzip-compressed bytes are treated as text. Second, JSON does not need a charset parameter, but without one the body won't be recognized as UTF-8 (in the case without gzip). I don't know whether that is expected; it probably deserves a separate question.

So the safe way (4.) is to go through the bytes, e.g.:

URLRead[ "https://api.stackexchange.com/2.2/info?site=mathematica" , "BodyBytes" ] // FromCharacterCode // ImportString[#, {"gzip", "RawJSON"}] & (*3.*) 
<|"items" -> {<|"new_active_users" -> 0, "total_users" -> 31928, "badges_per_minute" -> 0.03, "total_badges" -> 92485, "total_votes" -> 567164, "total_comments" -> 325306, "answers_per_minute" -> 0.02, "questions_per_minute" -> 0.01, "total_answers" -> 63637, "total_accepted" -> 23610, "total_unanswered" -> 4679, "total_questions" -> 43005, "api_revision" -> "2017.5.3.25597"|>}, "has_more" -> False, "quota_max" -> ..., "quota_remaining" -> ...|> 
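For the "BodyByteArray" element that the question asks about, the same route should work once the ByteArray is unpacked into a list of integers. A minimal sketch, assuming the response is gzip-compressed (as the content-encoding header in the question indicates) and relying on Normal to turn a ByteArray into a plain list of byte values; the names reply, bytes and json are only illustrative:

```mathematica
(* Request the body as a ByteArray, as in the question *)
reply = URLRead[
   "https://api.stackexchange.com/2.2/info?site=mathematica",
   {"Headers", "BodyByteArray"}];

(* Normal unpacks a ByteArray into a list of integers in the range 0..255 *)
bytes = Normal[reply["BodyByteArray"]];

(* FromCharacterCode maps each byte to one character, giving the
   "binary string" that ImportString can gunzip and parse as JSON *)
json = ImportString[FromCharacterCode[bytes], {"gzip", "RawJSON"}];

(* The API wraps its payload in "items" *)
json["items"]
```

With this approach no temporary file has to be written explicitly; everything goes through in-memory strings. If the server ever answers without gzip, the {"gzip", "RawJSON"} step would presumably fail, so checking the content-encoding header first may be worthwhile.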

  1. What is the intended purpose of ByteArray

  2. Importing a Base64 encoded string

  3. SO: Content-Encoding vs charset

  4. Who is to blame: parsing UTF8 encoded JSON HTTPResponse fails

$\endgroup$
