This question relates to the question here, but I'll generalise it so that you can answer it without reading all of that.


Context:

Imagine you have a set of data larger than your available RAM, partitioned into chunks (sized such that access is handled efficiently by the virtual memory manager, etc.).

A GUI application prompts the user to record a sequential process that will require access to sections of this data over time.

In real time this would be managed by a kind of LRU cache, so the user gets feedback (perhaps at a lower resolution to account for latency). Data that is required but not in RAM would be loaded by evicting the data tagged 'least recently used'.
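
For reference, the real-time behaviour I have at the moment is roughly the following (a minimal sketch in Python, with illustrative names rather than my actual code):

```python
from collections import OrderedDict

class LRUChunkCache:
    """Keeps at most max_chunks chunks in RAM, evicting the least recently used."""
    def __init__(self, max_chunks, load_chunk):
        self.max_chunks = max_chunks
        self.load_chunk = load_chunk   # callable: chunk_id -> chunk data (e.g. read from disk)
        self.chunks = OrderedDict()    # chunk_id -> data, ordered oldest -> newest

    def get(self, chunk_id):
        if chunk_id in self.chunks:
            self.chunks.move_to_end(chunk_id)   # mark as most recently used
            return self.chunks[chunk_id]
        data = self.load_chunk(chunk_id)        # miss: slow load while the user waits
        self.chunks[chunk_id] = data
        if len(self.chunks) > self.max_chunks:
            self.chunks.popitem(last=False)     # evict the least recently used chunk
        return data
```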

But now imagine instead that I know the sequence in advance - i.e. I effectively have look-ahead/clairvoyance of future memory access requirements.

Assume that the total size (in GB or whatever) of the unique data required over the full sequence is larger than the available RAM (at least double it).

Questions:

What optimal algorithms/strategies are there for managing this when:

  1. The sequence needs to be 'played' back in a kind of pseudo-realtime (sequential) manner for a user.
  2. It just needs to be processed as fast as possible, in a non-sequential 'offline' fashion.

Imagine a worst case where a chunk is required 'on and off', but often, throughout the sequence - i.e. the gaps between its uses are just long enough that an LRU or LFU strategy would deem it 'not required' and evict it. But by most definitions of optimal, we'd rather just keep it in RAM, right?
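
For concreteness, here is a rough sketch (again Python, with hypothetical chunk ids) of the kind of look-ahead-driven eviction I imagine the known sequence makes possible: on a miss, evict the resident chunk whose next use is furthest in the future (or that is never used again):

```python
def count_misses_with_lookahead(sequence, max_chunks):
    """Simulate a cache that exploits full knowledge of the future access sequence."""
    resident = set()
    misses = 0
    for i, chunk_id in enumerate(sequence):
        if chunk_id in resident:
            continue
        misses += 1
        if len(resident) >= max_chunks:
            def next_use(c):
                try:
                    return sequence.index(c, i + 1)   # position of the chunk's next use
                except ValueError:
                    return float('inf')               # never needed again: ideal victim
            resident.remove(max(resident, key=next_use))
        resident.add(chunk_id)
    return misses

# e.g. count_misses_with_lookahead(['a', 'b', 'a', 'c', 'b', 'a'], max_chunks=2)
```

(A real implementation would precompute the next-use positions instead of calling index() in a loop, and would load upcoming chunks ahead of time rather than on the miss, but this is the strategy I'd like to compare against LRU/LFU.)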


Update: I should note that I'm caching input data - the outputs of the actual application function have no (helpful) relation between iterations.
