added 715 characters in body

Addendum 2: It has been pointed out that setting up the price data as a dataset as above creates a problem. A better arrangement is to define the stock dataset as follows:

stockDataset = Dataset[<|"AAP" -> <|"Name" -> "Advance Auto Parts Inc.", "Prices" -> Normal@priceDataset|>|>]

Then queries in the "natural" form work fine, for example:

stockDataset["AAP", "Prices", Max, "Close"]
200.023

This is better than my first attempt, but I still think the price data needs to be indexed by date, so that cross-sectional analysis can be carried out more easily.

I need to amend this code before creating the price dataset:

AssociationThread[colheads -> #] & /@ pricedata 
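One way the amendment might look is sketched below. It assumes the date is the first entry of each row in pricedata and the first entry of colheads; the sample values are made up purely for illustration, and pricesByDate is a hypothetical name.

```mathematica
(* Hypothetical sample data; the real colheads and pricedata come from the import step *)
colheads = {"Date", "Open", "High", "Low", "Close"};
pricedata = {
   {"2015-01-02", 158.5, 159.9, 156.8, 158.6},
   {"2015-01-05", 157.9, 158.7, 155.6, 156.5}
   };

(* Key each row by its date (assumed to be the first column) and keep the remaining columns as the row's fields *)
pricesByDate = Association[
   (First[#] -> AssociationThread[Rest[colheads] -> Rest[#]]) & /@ pricedata
   ];

(* Same hierarchical layout as before, with date-keyed prices *)
stockDataset = Dataset[<|"AAP" -> <|"Name" -> "Advance Auto Parts Inc.", "Prices" -> pricesByDate|>|>];

(* A "natural" query that now descends ticker -> prices -> date -> field *)
stockDataset["AAP", "Prices", "2015-01-05", "Close"]
```

Because the prices are stored as a plain association rather than a nested Dataset, no Normal is needed when building stockDataset.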

added 1333 characters in body

I could add the stock ticker as another column in the above dataset and then concatenate the datasets for all the stocks to create one very large rectangular dataset. This would be inefficient and, besides, there is other information one would like to attach to each ticker symbol, such as the company name, sector, etc. So a hierarchical format appears more appropriate, with the stock ticker as the primary key.

If I succeed in getting this done I'll share the final code here, as I expect others may have a similar interest to my own.

Addendum: What's slightly tricky about this is that I can't find much in the documentation about how to create datasets - most of the examples relate to querying an already existing dataset, or creating very simple non-hierarchical datasets.

I worked on the planets dataset example using

planets//Normal 

to better understand the structure.
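For anyone following along, the inspection might look like the sketch below. It assumes planets refers to the example dataset shipped with the documentation, loaded via ExampleData; raw is just an illustrative name.

```mathematica
(* Load the documentation's planets example and strip the Dataset wrapper *)
planets = ExampleData[{"Dataset", "Planets"}];
raw = Normal[planets];

(* Top-level keys of the underlying association *)
Keys[raw]

(* Fields stored under the first top-level key *)
Keys[First[raw]]
```

Looking at the nested associations this way shows how each level of keys maps onto one level of the hierarchy.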

From this, the following structure suggests itself:

stockDataset = Dataset[<|"AAP" -> <|"Name" -> "Advance Auto Parts Inc.", "Prices" -> priceDataset|>|>]

This does indeed produce a hierarchical structure.

This is sort of ok. But natural queries fail:

stockDataset["AAP", "Prices"]
Dataset[]

or:

stockDataset["AAP", "Prices", 1, "Close"]
Missing["PartInvalid", "Close"]

You have to use queries in this kind of format:

stockDataset["AAP", "Prices"][Max, "Close"]
200.023

The reason is that the prices dataset is a simple table, without a key.

Thinking about it, it would surely be better to use the date as a key when constructing priceDataset. Then natural queries in the first format would work. More importantly, you need to be able to key on dates in order to construct cross-sectional datasets.
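To illustrate what date keys would buy, here is a minimal sketch with two tickers. The second ticker "XYZ", its company name, and all the prices are made up for the example; stocks is a hypothetical name.

```mathematica
(* Hypothetical prices for two tickers, keyed by date; "XYZ" and all numbers are invented *)
stocks = Dataset[<|
    "AAP" -> <|"Name" -> "Advance Auto Parts Inc.",
      "Prices" -> <|
        "2015-01-02" -> <|"Close" -> 158.6|>,
        "2015-01-05" -> <|"Close" -> 156.5|>|>|>,
    "XYZ" -> <|"Name" -> "Hypothetical Corp.",
      "Prices" -> <|
        "2015-01-02" -> <|"Close" -> 42.1|>,
        "2015-01-05" -> <|"Close" -> 41.7|>|>|>
    |>];

(* Cross-section: the closing price of every ticker on one date *)
stocks[All, "Prices", "2015-01-05", "Close"]

(* Time series: the closing price of one ticker on every date *)
stocks["AAP", "Prices", All, "Close"]
```

With the date as a key, the same hierarchical structure serves both per-ticker and cross-sectional queries; only the position of All changes.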
