This question relates to the question here, but I'll generalise it so that you can answer it without reading all of that.


Context:

Imagine you have a set of data larger than your available RAM, partitioned into chunks (sized such that access is handled efficiently by the virtual memory manager, etc.).

A GUI application prompts the user to record a sequential process that will require access to sections of this data over time.

In real time this would be managed by a kind of LRU cache, so the user gets feedback (perhaps at a lower resolution to account for latency). Data that is required but not in RAM would be loaded by evicting the data tagged 'least recently used'.
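
For reference, the real-time behaviour I have at the moment is roughly the following (a minimal sketch in Python, with illustrative names rather than my actual code):

```python
from collections import OrderedDict

class LRUChunkCache:
    """Keeps at most max_chunks chunks in RAM, evicting the least recently used."""
    def __init__(self, max_chunks, load_chunk):
        self.max_chunks = max_chunks
        self.load_chunk = load_chunk   # callable: chunk_id -> chunk data (e.g. read from disk)
        self.chunks = OrderedDict()    # chunk_id -> data, ordered oldest -> newest

    def get(self, chunk_id):
        if chunk_id in self.chunks:
            self.chunks.move_to_end(chunk_id)   # mark as most recently used
            return self.chunks[chunk_id]
        data = self.load_chunk(chunk_id)        # miss: slow load while the user waits
        self.chunks[chunk_id] = data
        if len(self.chunks) > self.max_chunks:
            self.chunks.popitem(last=False)     # evict the least recently used chunk
        return data
```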

But now imagine instead that I know the sequence in advance - i.e. I effectively have look-ahead/clairvoyance of future memory access requirements.

Assume that the total size (in GB or whatever) of the unique data required over the full sequence is larger than the available RAM (at least double it).

Questions:

What optimal algorithms/strategies are there for managing this when:

  1. The sequence needs to be 'played' back in a kind of pseudo-realtime (sequential) manner for a user.
  2. It just needs to be processed as fast as possible, in a non-sequential 'offline' fashion.

Imagine a worst case where a chunk is required 'on and off', but often, throughout the sequence - i.e. the gaps between its uses are just long enough that an LRU or LFU strategy would deem it 'not required' and evict it. But by most definitions of optimal, we'd rather just keep it in RAM, right?
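
For concreteness, here is a rough sketch (again Python, with hypothetical chunk ids) of the kind of look-ahead-driven eviction I imagine the known sequence makes possible: on a miss, evict the resident chunk whose next use is furthest in the future (or that is never used again):

```python
def count_misses_with_lookahead(sequence, max_chunks):
    """Simulate a cache that exploits full knowledge of the future access sequence."""
    resident = set()
    misses = 0
    for i, chunk_id in enumerate(sequence):
        if chunk_id in resident:
            continue
        misses += 1
        if len(resident) >= max_chunks:
            def next_use(c):
                try:
                    return sequence.index(c, i + 1)   # position of the chunk's next use
                except ValueError:
                    return float('inf')               # never needed again: ideal victim
            resident.remove(max(resident, key=next_use))
        resident.add(chunk_id)
    return misses

# e.g. count_misses_with_lookahead(['a', 'b', 'a', 'c', 'b', 'a'], max_chunks=2)
```

(A real implementation would precompute the next-use positions instead of calling index() in a loop, and would load upcoming chunks ahead of time rather than on the miss, but this is the strategy I'd like to compare against LRU/LFU.)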


Update: I should note that I'm caching input data - the outputs of the actual application function have no (helpful) relation between iterations.
