One way in which prior information can be incorporated into the estimator is through the likelihood (or model, depending on how you look at it). That is to say, when we build a standard parametric model, we constrain ourselves to a very specific functional form, which we know up to the values of the parameters themselves. If we are approximately correct about this form, we should have more efficient estimation than with a more general model with more parameters. On the other hand, if our "prior knowledge" is grossly inadequate and this constraint is overly restrictive, we are likely to introduce a lot of bias into our model.
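
To make that efficiency/bias trade-off concrete, here is a minimal simulation sketch (my own illustration, not part of the original answer): when the data really do come from a nearly linear curve, a constrained linear fit estimates the curve at a given point with much lower variance than a flexible higher-degree polynomial fit of the same data; if the constraint were badly misspecified, the same comparison would instead show up as bias in the linear fit.

```python
# Sketch: efficiency of a constrained parametric model vs. a more flexible one,
# when the constrained form is approximately correct (assumed linear "truth").
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 20)

def true_f(x):
    return 1.0 + 2.0 * x  # assumed (nearly) linear truth for this simulation

lin_preds, poly_preds = [], []
for _ in range(500):
    y = true_f(x) + rng.normal(scale=0.3, size=x.size)
    lin_preds.append(np.polyval(np.polyfit(x, y, deg=1), 0.5))   # constrained model
    poly_preds.append(np.polyval(np.polyfit(x, y, deg=5), 0.5))  # flexible model

print("variance of estimate at x=0.5, linear fit:  ", np.var(lin_preds))
print("variance of estimate at x=0.5, degree-5 fit:", np.var(poly_preds))
```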

As a fairly modern example, Convolutional Neural Networks (CNNs) are currently the state of the art for image classification, doing considerably better than vanilla fully connected NNs. The only difference between a CNN and a standard NN is that in the top layers, a CNN only allows for local interactions, whereas a fully connected NN doesn't care about how close two pixels are to each other. In other words, the CNN models are a proper subset of the vanilla NNs, with many of the top-level parameters set to 0. This is based on the prior knowledge that nearby pixels are very likely to be related, so by constraining the fully connected model, we get more efficient estimation. Empirically, using this prior information about how we think interactions between pixels should work has improved our predictions for image classification.
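
The "proper subset with many weights set to 0" point can be seen directly in code. Below is a small NumPy sketch (my illustration, using a 1-D signal rather than a full image for brevity): a convolution with a 3-tap filter produces exactly the same output as a fully connected layer whose dense weight matrix has every entry outside a local window fixed at 0, with the in-window weights shared across positions.

```python
# Sketch: a 1-D convolution is a fully connected layer with a constrained
# (banded, weight-shared) weight matrix.
import numpy as np

rng = np.random.default_rng(0)
signal = rng.normal(size=10)            # stand-in for a row of pixels
kernel = np.array([0.25, 0.5, 0.25])    # local, 3-tap filter

# Convolution view: each output only looks at 3 neighbouring inputs.
conv_out = np.convolve(signal, kernel, mode="valid")

# Fully connected view: the equivalent dense weight matrix.
n_out = signal.size - kernel.size + 1
W = np.zeros((n_out, signal.size))
for i in range(n_out):
    # Same (flipped) weights in every row, shifted one step; zeros elsewhere.
    W[i, i:i + kernel.size] = kernel[::-1]

fc_out = W @ signal
print(np.allclose(conv_out, fc_out))    # True: the conv is a constrained dense layer
```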