Friday 12 January 2018

M - Microsoft Power Query for Excel Formula Language

It seems that Power Query allows all manner of data sources to be queried and participate in a join. This gives the salivating prospect of joining an SQL relational query to a parsed JSON document. If true then that is truly remarkable. There is a query syntax language called Microsoft Power Query for Excel Formula Language for which a 115 page specification is available though a quick tour is also available.

But there is nothing like some examples, and I will focus here on JSON parsing examples because they arise from calling REST APIs.

Giving a JSON literal and drill into structure

So this a classic piece of JSON from http://json.org/example.html and we will drill into the structure. So below one can see the let clause and the in clause .

The let clause

The let clause is for the workings/evaluation of the logic. It is a series of one or more steps; each step is comma separated, one stores the output of a step into a variable. In the example given we can see that navigating a JSON node's child is done by using square brackets, e.g. [glossary].

The in clause
The in clause is strange because its specifies what to output! Anyway, one specifies a variable found in the let clause. Typically one would need to use the Record.ToTable() function to output something tabular.


let
    Source = Json.Document("{ ""glossary"": { ""title"": ""example glossary"", ""GlossDiv"": { ""title"": ""S"", ""GlossList"": { ""GlossEntry"": 
        { ""ID"": ""SGML"", ""SortAs"": ""SGML"", ""GlossTerm"": ""Standard Generalized Markup Language"", ""Acronym"": ""SGML"",
          ""Abbrev"": ""ISO 8879:1986"", ""GlossDef"": { ""para"": ""A meta-markup language, used to create markup languages such as DocBook."",
          ""GlossSeeAlso"": [""GML"", ""XML""] }, ""GlossSee"": ""markup"" } } } } }"),
    glossary = Source[glossary],
    GlossDiv = glossary[GlossDiv],
    GlossList = GlossDiv[GlossList],
    GlossEntry = GlossList[GlossEntry],
    ConvertedToTable = Record.ToTable(GlossEntry)
in
    ConvertedToTable


Attempting to extract sub properties

Here we extract the details of owners from a Stack Overflow REST API fragment.


let
    Source = "{""items"":[
       {""tags"":[""vba"",""permissions""],""owner"":
             {""reputation"":49,""user_id"":9073241,""user_type"":""registered"",""accept_rate"":86,""display_name"":""Kam""},
        ""is_answered"":false,""view_count"":4,""answer_count"":0,""score"":0,""question_id"":48229549},
       {""tags"":[""excel"",""vba"",""excel-vba""],""owner"": 
             {""reputation"":18,""user_id"":9057704,""user_type"":""registered"",""accept_rate"":29,""display_name"":""Gregory""},
        ""is_answered"":false,""view_count"":6,""answer_count"":0,""score"":0,""question_id"":48229590}
       ]}",
    #"Parsed JSON" = Json.Document(Source),
    items = #"Parsed JSON"[items],
    #"Converted to Table" = Table.FromList(items, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"owner"}, {"owner"}),
    #"Expanded owner" = Table.ExpandRecordColumn(#"Expanded Column1", "owner", {"reputation", "user_id", "user_type", "display_name"}, 
       {"reputation", "user_id", "user_type", "display_name"})
in
    #"Expanded owner"

Use a live REST API as source of data

So finally we can use a live source of data.


let
    Source = Web.Contents("https://api.stackexchange.com/2.2/questions?order=desc&sort=activity&tagged=VBA&site=stackoverflow"),
    #"Parsed JSON" = Json.Document(Source),
    items = #"Parsed JSON"[items],
    #"Converted to Table" = Table.FromList(items, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"owner"}, {"Column1.owner"}),
    #"Expanded Column1.owner" = Table.ExpandRecordColumn(#"Expanded Column1", "Column1.owner", {"reputation", "user_id", 
                         "user_type", "profile_image", "display_name", "link", "accept_rate"}, {"Column1.owner.reputation", 
                          "Column1.owner.user_id", "Column1.owner.user_type", "Column1.owner.profile_image", 
                          "Column1.owner.display_name", "Column1.owner.link", "Column1.owner.accept_rate"})
in
    #"Expanded Column1.owner"

Investigation Notes

Someone set a huge bounty of 500 for a SO question which seems an outsized reward for solving an HTML download formatting glitch. I put my answer in here with a solution based on HTML DOM navigation. Clearly keen to see how my answers was faring I looked at the other answers. One other answer had a syntax that was new and strange to me, it was using a query with an OLE DB Provider of Microsoft.Mashup.OleDb.1

Googling for Microsoft.Mashup.OleDb.1 turns up path names of C:\Program Files (x86)\Microsoft Power Query for Excel which I did not have; so clearly I needed an install. Instructions for install are here at The Complete Guide to Installing Power Query - Excel Campus and here is the Microsoft download page.

After installing, I had a new PowerQuery tab on my Excel Menu. Unfortunately, changes to the Excel object model look to be from Excel 2016 versions onwards. So I cannot play with this in VBA code. No matter, I can use the GUI for the time being but where to start?

No comments:

Post a Comment