Revisions to Creating a Stock Dataset

edited tags

edited Apr 20, 2017 at 1:02

J. M.'s missing motivation


added 715 characters in body

edited Nov 5, 2016 at 16:59

Jonathan Kinlay


Addendum 2: It has been pointed out that nesting the price data as a Dataset inside the outer Dataset, as above, is what creates the problem. A better arrangement is to define the stock dataset as follows:

stockDataset = Dataset[<|"AAP" -> <|"Name" -> "Advance Auto parts Inc.", "Prices" -> Normal@priceDataset|>|>]

Then queries in the "natural" form work fine, for example:

stockDataset["AAP", "Prices", Max, "Close"]
200.023

This is better than my first attempt, but I still think the price data needs to be indexed by date, so that cross-sectional analysis can be carried out more easily.

I need to amend this code before creating the price dataset:

AssociationThread[colheads -> #] & /@ pricedata
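
One way the amendment might look, as a rough sketch only: build each row with AssociationThread as before, then re-key the rows by their date. The column names in colheads, the two sample rows in pricedata, and the name pricesByDate are all made up for illustration; the real data will differ.

```wolfram
(* Hypothetical column heads and sample rows; the real data will differ *)
colheads = {"Date", "Open", "High", "Low", "Close"};
pricedata = {
   {"2016-11-03", 139.2, 141.0, 138.7, 140.5},
   {"2016-11-04", 140.6, 142.3, 140.1, 141.2}};

(* One association per row, as in the original code *)
rows = AssociationThread[colheads -> #] & /@ pricedata;

(* Re-key by date: <|date -> <|"Open" -> ..., "Close" -> ...|>, ...|> *)
pricesByDate = Association[(#["Date"] -> KeyDrop[#, "Date"]) & /@ rows];

stockDataset = Dataset[<|"AAP" -> <|"Name" -> "Advance Auto parts Inc.",
     "Prices" -> pricesByDate|>|>];

(* Look up a single day, or aggregate over all days *)
stockDataset["AAP", "Prices", "2016-11-04", "Close"]
stockDataset["AAP", "Prices", Max, "Close"]
```

Dates are kept as plain strings here purely for simplicity; DateObject keys may be a better fit if date arithmetic is needed.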


added 1333 characters in body

edited Nov 5, 2016 at 13:58

Jonathan Kinlay


I could add the stock ticker in another column to the above dataset and then concatenate the datasets for all the stocks together to create a very large rectangular dataset. This would be inefficient and, besides, there is other information one would like to append for each ticker symbol, such as, perhaps, the company name, sector, etc. So a hierarchical format appears more appropriate, with the stock ticker as the primary key.

If I succeed in getting this done I'll share the final code here, as I expect others may have a similar interest to my own.

Addendum: What's slightly tricky about this is that I can't find much in the documentation about how to create datasets - most of the examples relate to querying an already existing dataset, or creating very simple non-hierarchical datasets.

I worked on the planets dataset example using

planets//Normal

to better understand the structure.

From which, the following suggests itself:

stockDataset = Dataset[<|"AAP" -> <|"Name" -> "Advance Auto parts Inc.", "Prices" -> priceDataset|>|>]

This does indeed produce a hierarchical structure, which is sort of OK. But natural queries fail:

stockDataset["AAP", "Prices"]
Dataset[]

or:

stockDataset["AAP", "Prices", 1, "Close"]
Missing["PartInvalid", "Close"]

You have to use queries in this kind of format:

stockDataset["AAP", "Prices"][Max, "Close"]
200.023

The reason is that the prices dataset is a simple table, without a key.

Thinking about it, it would surely be better to use the date as a key when constructing priceDataset. Then natural queries in the first format would work. More importantly, you need to be able to key on dates in order to construct cross-sectional datasets.
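
A rough sketch of what such a cross-section might look like once prices are keyed by date. The second ticker "XYZ", its company name, and all of the prices here are invented for illustration.

```wolfram
(* Hypothetical two-stock dataset with prices keyed by date *)
stockDataset = Dataset[<|
    "AAP" -> <|"Name" -> "Advance Auto parts Inc.",
      "Prices" -> <|
        "2016-11-03" -> <|"Close" -> 140.5|>,
        "2016-11-04" -> <|"Close" -> 141.2|>|>|>,
    "XYZ" -> <|"Name" -> "Hypothetical Corp.",
      "Prices" -> <|
        "2016-11-03" -> <|"Close" -> 25.1|>,
        "2016-11-04" -> <|"Close" -> 24.8|>|>|>|>];

(* Cross-section: closing price of every ticker on one date *)
stockDataset[All, "Prices", "2016-11-04", "Close"]
```

The result should be keyed by ticker, which is the shape a cross-sectional analysis would want to start from.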


asked Nov 4, 2016 at 13:47

Jonathan Kinlay
