
Suppose I have a database of entities with the columns name, min A, max A, min B, and max B, where the min/max columns hold the minimum and maximum values of two variables, A and B, for that entity. It is unknown how many samples were used to produce these min/max values.

We have a system that takes field samples, but the only output available for each sample is min A, max A, min B, and max B. I want to identify which entity each field sample came from by matching it against the entity values in the database.

Testing that $\text{sample}_{\min} A \ge db_{\min} A$ and $\text{sample}_{\max} A \le db_{\max} A$ is not sufficient: for example, a sample's min A can be lower than the stored min A of the entity it actually came from.
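A minimal sketch of that containment test, using made-up numbers (hypothetical, for illustration only), shows both ways it can fail: the true entity can be rejected, and an unrelated entity with a wide stored range can accept the sample.

```python
def contained(sample_min: float, sample_max: float, db_min: float, db_max: float) -> bool:
    """Naive check: the sample's [min, max] lies inside the entity's stored [min, max]."""
    return sample_min >= db_min and sample_max <= db_max


# Hypothetical values, for illustration only.
true_entity = (10.0, 20.0)  # stored (min A, max A) of the entity that produced the sample
wide_entity = (0.0, 100.0)  # stored (min A, max A) of an unrelated entity with a wide range
sample = (9.5, 18.0)        # sample min A dips slightly below the true entity's stored min

print(contained(*sample, *true_entity))  # False: the correct entity is rejected
print(contained(*sample, *wide_entity))  # True: the wrong entity is accepted
```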

I am considering the following approach:

Let $\alpha = (db_{\min} A - \text{sample}_{\min} A)^2 + (db_{\max} A - \text{sample}_{\max} A)^2$.

Let $\beta = (db_{\max} A - db_{\min} A)^2$.

Let $p = \exp(1 - \alpha/\beta) / e$.
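The constant factors in $p$ cancel, which makes its range easier to see:

$$p = \frac{\exp(1 - \alpha/\beta)}{e} = \exp\!\left(-\frac{\alpha}{\beta}\right),$$

so $0 < p \le 1$, with $p = 1$ exactly when the sample min and max equal the stored ones ($\alpha = 0$), while $p$ is undefined when $db_{\min} A = db_{\max} A$ ($\beta = 0$).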

I can repeat this for B and then take the product of the two $p$ values.

Then, the entity that produces the largest $p$ for a sample is the most likely match.
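A minimal sketch of the scoring and matching steps described above, assuming the database fits in a Python dict mapping each entity name to its stored ((min A, max A), (min B, max B)). The names `variable_score`, `best_match`, and the sample data are hypothetical, and the handling of a zero-width stored range is an added assumption, since the formula is undefined there.

```python
import math
from typing import Dict, Tuple

Range = Tuple[float, float]  # (min, max) of one variable


def variable_score(db: Range, sample: Range) -> float:
    """p for one variable; exp(1 - alpha/beta) / e simplifies to exp(-alpha/beta)."""
    db_min, db_max = db
    s_min, s_max = sample
    alpha = (db_min - s_min) ** 2 + (db_max - s_max) ** 2
    beta = (db_max - db_min) ** 2
    if beta == 0.0:
        # Assumption (not in the question): beta = 0 makes alpha/beta undefined, so a
        # zero-width stored range scores 1.0 on an exact match and 0.0 otherwise.
        return 1.0 if alpha == 0.0 else 0.0
    return math.exp(-alpha / beta)


def best_match(database: Dict[str, Tuple[Range, Range]],
               sample_a: Range, sample_b: Range) -> Tuple[str, float]:
    """Score every entity by p_A * p_B and return the (name, score) with the largest product."""
    scores = {
        name: variable_score(db_a, sample_a) * variable_score(db_b, sample_b)
        for name, (db_a, db_b) in database.items()
    }
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical database: name -> ((min A, max A), (min B, max B)).
database = {
    "X": ((10.0, 20.0), (0.0, 5.0)),
    "Y": ((0.0, 100.0), (0.0, 50.0)),
}
name, score = best_match(database, sample_a=(9.5, 18.0), sample_b=(0.5, 4.5))
print(name, round(score, 3))  # X 0.939
```

Because $\beta$ is the squared width of the stored range, the same absolute deviation costs less for an entity with a wide range than for one with a narrow range, and the product is a relative score for ranking entities rather than a probability normalized over them.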

Is there a better approach to this problem?

  • Welcome to CV. I am afraid you will need to provide a lot more details (please edit the original post; readers may not read all the comments). To start, what are the A and B variables? And why choose min/max as the statistics of choice? They are ill-behaved, at the mercy of any extreme value, so there must be a domain-specific reason. Are the min/max values in the db really the min/maxes of the min/max over multiple samples (for a given entity)? Or the average of the min/maxes? Or a CI? Or something else? The same goes for a given sample: is its min/max the extremes of multiple readings? How is it computed? ...ctd Commented Mar 19 at 19:30
  • ..ctd. If you got a sample from entity X, what is the expected behavior of its min/max values with respect to the values in the db for that entity? Very close to the db values, or well within the [min, max] interval? Or something else? And is there a lot of overlap between the min/max intervals in the db for the various entities, or are they well separated? (A graph would really help.) Then, how would you validate your classifier? Do you know the true entity for many samples? Etc. I am afraid that, as it stands, no one will be able to provide a cogent answer to the question. Commented Mar 19 at 19:34
