0

Please allow me to use this example/metaphor to describe an algorithm I need.

Objects

  • There are 5 thousand pennies.

  • There are 50 cups.

  • There is a tracking history (Passport "stamp" etc) that is associated with each penny as it moves between cups.

Definition

I'll define a "highly diffused" penny as one that passes through many cups.

A "poorly diffused" penny is one that either passes back and forth between 2 cups

Question

How can I objectively measure the diffusion of a penny as:

  1. The number of moves the penny has gone through
  2. The number of cups the penny has been in
  3. A unit of time (day, week, month)

Why am I doing this? I want to detect if a cup is hoarding pennies.

Resistance from bad actors

Since hoarding is bad, the "bad cup" may simply solicit a partner and simply move pennies between each other. This will reduce the amount of time a coin isn't in transit, and would skew hoarding detection.

A solution might be to detect if a cup (or set of cups) are common "partners" with each other, though I'm not sure how to think though this problem.

Broad applicability

Any assistance would be helpful, since I would think that this algorithm is common to

  • Economics
  • The study of migration patterns of animals, citizens of a country
  • Other natural occurring phenomena

... and probably exists as a term or concept I'm unfamiliar with.

2
  • 1
    I wasn't the one that downvoted, but I think the downvote may be because you are asking for an algorithm, but your requirements look more like a framework. There's too many details missing (such as what causes pennies to move in the first place). You also mention actors, suggesting the implementation you are looking for is an actor based simulation. The actor based "algorithm" is about as trivial as they come. Those two statements suggest that you are really going to need assistance with actor based modeling, not the algorithm itself. Commented Jun 12, 2014 at 1:48
  • 1
    As for the algorithm itself, I think you might find the metrics to detect hoarding are trickier than you would like. If you are facing an intelligent adversary, they will identify the edges of your algorithms (such as tracking 2 cup cycles but not 3), and defeat it. These sorts of models lead very quickly to arms races between developers of metrics and the adversaries trying to overcome the metrics. Commented Jun 12, 2014 at 1:49

1 Answer 1

2

Simplest is often the best. Or at least a really good start.

You already have the history of which cups each penny has visited, hence you know the number of cups visited by each penny. We might expect to see a normal distribution of the number of cups visited vs the number of visits per penny.

A similar approach can be used to observe the distribution of pennies in each cup.

A penny that is poorly diffused has visited few cups, but one that is being hoarded (in your definition) is one that also has had many visits. Patterns of hoarding would skew the distributions.

So I would suggest looking for pennies where

number-of-visits / number-of-cups-visited 

is greater than one, as a starting point.

This analysis can be performed for each time period you are interested in.

The cutoffs you choose (I suggested one, above) should be able to be adjusted; the warnings posted by @Cort Ammon should be taken seriously.

2
  • I like this. Also, I would rather ask this on CrossValidated. I would add standard deviation and eventually a significance test to determine which cases are unusual (under- or overperforming). A boxplot might be a great way to visualize the distribution Commented Jun 12, 2014 at 6:53
  • Yes; my statistics is not strong enough to take the argument about expected distributions further. As you say, a visual representation (or several) of the data would help with understanding it, and hence with producing a viable algorithm. Commented Jun 12, 2014 at 7:18

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.