added 600 characters in body
Source Link
Kuba
  • 138.9k
  • 13
  • 297
  • 803

"BodyBytes"

  • a list of bytes from the HTTP response. It does not mean much without encoding information.

"BodyByteArray"

  • as far as I can tell, "BodyBytes" wrapped with ByteArray. (1.)

"Body"

  • as far as I can tell, a String: FromCharacterCode[#BodyBytes, encoding], where encoding is read from the charset parameter of the Content-Type header.

That is a problem for us. First of all, it ignores Content-Encoding. Additionally, JSON does not need a charset parameter, but without one the body won't be recognized as UTF-8 (in the case without gzip). I don't know if that is expected; it probably deserves a separate question.
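
As a minimal offline sketch of the charset half of this, decoding the same bytes with and without an explicit encoding gives different strings. The payload below is made up for illustration and is not taken from any real response; `text` and `bytes` are just names chosen for this sketch.

```wolfram
(* Hypothetical payload, assumed for illustration only *)
text = "{\"city\": \"Kraków\"}";

(* What a server would send for charset=utf-8: a list of integers 0..255 *)
bytes = ToCharacterCode[text, "UTF-8"];

(* A ByteArray is a wrapper around the same numbers *)
Normal[ByteArray[bytes]] === bytes

(* With no encoding given, every byte becomes one character, so "ó" is garbled *)
FromCharacterCode[bytes]

(* With the right encoding the original string comes back *)
FromCharacterCode[bytes, "UTF-8"] === text
```

Content-Encoding (gzip) is a separate layer on top of the charset, so it has to be undone before any charset decoding makes sense.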

So, the safe way (4.) is through bytes, e.g.:

URLRead["https://api.stackexchange.com/2.2/info?site=mathematica", "BodyBytes"] // FromCharacterCode // ImportString[#, {"gzip", "RawJSON"}] & (*3.*)
<|"items" -> {<|"new_active_users" -> 0, "total_users" -> 31928, 
 "badges_per_minute" -> 0.03, "total_badges" -> 92485, "total_votes" -> 567164, "total_comments" -> 325306, "answers_per_minute" -> 0.02, "questions_per_minute" -> 0.01, "total_answers" -> 63637, "total_accepted" -> 23610, "total_unanswered" -> 4679, "total_questions" -> 43005, "api_revision" -> "2017.5.3.25597"|>}, "has_more" -> False, "quota_max" -> ..., "quota_remaining" -> ...|> 

  1. What is the intended purpose of ByteArray

  2. Importing a Base64 encoded string

  3. SO: Content-Encoding vs charset

  4. Who is to blame: parsing UTF8 encoded JSON HTTPResponse fails

added 118 characters in body
Kuba

"BodyBytes" is a binary representation of gzip response json

"BodyByteArray" is ByteArray @ #BodyBytes

"Body" is ~ ToCharacterCode[#BodyBytes, encoding] but forcing encoding to be UTF-8 as specified in a header throws errors, as well as directly fetching "Body". Don't know whose fault is that.

This seems to work:

ImportString[FromCharacterCode[Normal @ reply["BodyByteArray"]], {"gzip", "RawJSON"}]
<|"items" -> {<|"new_active_users" -> 0, "total_users" -> 31928, 
 "badges_per_minute" -> 0.03, "total_badges" -> 92485, "total_votes" -> 567164, "total_comments" -> 325306, "answers_per_minute" -> 0.02, "questions_per_minute" -> 0.01, "total_answers" -> 63637, "total_accepted" -> 23610, "total_unanswered" -> 4679, "total_questions" -> 43005, "api_revision" -> "2017.5.3.25597"|>}, "has_more" -> False, "quota_max" -> ..., "quota_remaining" -> ...|> 

But as soon as I try to enforce UTF-8, it throws errors that I don't have time to investigate.

Kuba

This seems to work:

ImportString[FromCharacterCode[Normal @ reply["BodyByteArray"]], {"gzip", "RawJSON"}]
<|"items" -> {<|"new_active_users" -> 0, "total_users" -> 31928, 
 "badges_per_minute" -> 0.03, "total_badges" -> 92485, "total_votes" -> 567164, "total_comments" -> 325306, "answers_per_minute" -> 0.02, "questions_per_minute" -> 0.01, "total_answers" -> 63637, "total_accepted" -> 23610, "total_unanswered" -> 4679, "total_questions" -> 43005, "api_revision" -> "2017.5.3.25597"|>}, "has_more" -> False, "quota_max" -> ..., "quota_remaining" -> ...|> 

But as soon as I try to enforce UTF-8, it throws errors that I don't have time to investigate.