
I've been researching neural networks and boolean formulae. From my efforts, it doesn't seem that neural networks can generally learn boolean formulae using backpropagation. This makes sense intuitively, since a boolean formula's output can change dramatically in response to a small change in its inputs, so the resulting loss landscape will have many discontinuities and consequently many local optima.
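
To make "learn" concrete, here is the kind of experiment I have in mind: fit a small network to the full truth table of a formula with plain backpropagation and check whether it reaches zero error. This is only a minimal sketch, assuming PyTorch; parity is just an illustrative choice of formula, and whether it succeeds depends on the width, initialization, and number of steps.

```python
# Minimal sketch: try to fit a small MLP to the full truth table of 6-bit parity
# with plain backpropagation. Parity is only an illustrative formula; success
# depends on width, initialization, and the number of training steps.
import itertools
import torch
import torch.nn as nn

n = 6
X = torch.tensor(list(itertools.product([0.0, 1.0], repeat=n)))
y = (X.sum(dim=1) % 2).unsqueeze(1)          # parity label for each row

model = nn.Sequential(nn.Linear(n, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(5000):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

acc = ((model(X) > 0).float() == y).float().mean()
print(f"final loss {loss.item():.4f}, accuracy on the truth table {acc.item():.0%}")
```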

On the other hand, I also understand that any boolean formula can be represented by a neural network, as per the Universal Approximation Theorem, so there is no inherent reason why a neural network cannot represent, and consequently possibly learn, an arbitrary boolean formula. The problem seems to be the learning algorithm: every general machine learning algorithm looks like it will get stuck in local optima, regardless of whether I use gradient descent, evolutionary algorithms, expectation maximization, and so on, since they are all based on the premise that local incremental improvements are the route to a globally optimal solution.
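
For example, XOR itself can be written down exactly with a single hidden layer and hand-set weights, with no learning at all. A small sketch using a hard threshold (a steep sigmoid would approximate the same thing arbitrarily well):

```python
# Sketch: hand-constructed single-hidden-layer network computing XOR,
# illustrating that representation (as opposed to learning) is easy.
import numpy as np

def step(z):
    return (z > 0).astype(float)

# Hidden units: h1 = x1 OR x2, h2 = x1 AND x2; output = h1 AND NOT h2.
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])
W2 = np.array([1.0, -1.0])
b2 = -0.5

def xor_net(x):
    h = step(x @ W1 + b1)
    return step(h @ W2 + b2)

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, int(xor_net(np.array(x, dtype=float))))
```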

That being said, I also know that there are other sorts of algorithms, like Quine-McCluskey and Espresso, that can derive minimal boolean formulae from truth tables. It would be possible to use these algorithms to generate a neural network that embeds the minimal boolean formula derived from a truth table. Or, even more simply, just translate the truth table directly into a neural network. However, these are very specific algorithms that target boolean formulae in particular, and as far as I am aware they are not used in a more general machine learning context.
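
As a concrete sketch of the "just translate the truth table" route: build one hidden unit per satisfying row, which is essentially the formula's DNF. This is my own illustrative construction, not what Quine-McCluskey or Espresso would produce, and it assumes the formula has at least one satisfying row.

```python
# Sketch: compile an arbitrary truth table into a one-hidden-layer network,
# one hidden unit per satisfying row (essentially the formula's DNF).
# No learning involved. Assumes the formula is satisfiable.
import itertools
import numpy as np

def step(z):
    return (z > 0).astype(float)

def truth_table_to_net(n, formula):
    rows = [x for x in itertools.product([0, 1], repeat=n) if formula(x)]
    # Hidden unit k fires only on its own row: +1 weight for 1-bits, -1 for 0-bits.
    W1 = np.array([[1.0 if bit else -1.0 for bit in row] for row in rows]).T
    b1 = np.array([0.5 - sum(row) for row in rows])
    W2 = np.ones(len(rows))          # the output unit ORs the hidden units together
    b2 = -0.5
    return lambda x: step(step(np.array(x, dtype=float) @ W1 + b1) @ W2 + b2)

# Example: majority of three inputs.
maj = truth_table_to_net(3, lambda x: sum(x) >= 2)
print([int(maj(x)) for x in itertools.product([0, 1], repeat=3)])
```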

So, this brings me to my question. Are there any proofs demonstrating that neural networks are capable or incapable of learning arbitrary boolean formulae using backpropagation and gradient descent, or any other general-purpose machine learning algorithm?

I have checked Cross Validated and Googled the question, but have been unable to find anything definitive.


1 Answer


According to the Universal Approximation Theorem (or another well-articulated source, and another), yes. The result is not restricted to boolean inputs, though you may see boolean-input formulations in several places (not in the Wikipedia article). Even a neural network with a single hidden layer is a universal approximator, given enough neurons. The theorem does not say how (i.e. via gradient descent, backpropagation, or any other optimization method); it only says that it is possible.

  • Yes, I understand that; it's addressed in my second paragraph. I added a link to the UAT to make the reference clearer. I also describe one approach using logic minimization that can represent truth tables with minimal neural networks. The optimization method is the crux of the issue: whether learning is possible using one of the standard general methods. Commented Nov 29, 2024 at 2:37
  • Okay, from the paragraph before your edit, it looked like it was your own deduction; that's why I specifically mentioned the UAT. I doubt you'll find a theorem for an arbitrary non-convex optimization problem (let alone neural nets) guaranteeing that an algorithm will find you a solution. These algorithms are all heuristics, and finding the optimum of a general function is an NP-hard problem. Commented Nov 29, 2024 at 19:22
  • Thanks, that is a good point about it generally being an NP-hard problem. I do know that if a landscape is smooth enough, there is a good probability that the optimization algorithm will find the optimum. Do you know of any analytic tools for determining how bumpy a function's landscape is, and consequently how good a chance a general-purpose optimization algorithm has of finding the optimum? Commented Nov 29, 2024 at 19:38
  • Normally, a function's number of critical points should signal something, but at the scale of a neural network with millions or billions of parameters, I do not think you'll find a tool that measures the bumpiness of the loss surface (a rough 1-D slice of the loss, sketched below, is about as close as it gets). However, there are known techniques that promote smoothness, such as residual connections. Commented Nov 29, 2024 at 23:00
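
As a very rough illustration of the "bumpiness probe" mentioned in the last comment: one can evaluate the loss along a single random direction in parameter space, in the spirit of loss-landscape visualizations. This is only a 1-D slice of an enormously high-dimensional surface, so it is suggestive at best. The sketch below assumes PyTorch; the `model`, `X`, and `y` in the usage comment are placeholders (e.g. the parity setup from the question).

```python
# Rough probe: evaluate the loss along one random direction in parameter space.
# This is a 1-D slice of a very high-dimensional surface, so it is suggestive
# at best, not a measurement of "bumpiness".
import torch

def loss_slice(model, loss_fn, X, y, radius=1.0, steps=51):
    theta = [p.detach().clone() for p in model.parameters()]
    # Random direction, rescaled per parameter tensor so layers contribute comparably.
    direction = []
    for t in theta:
        d = torch.randn_like(t)
        direction.append(d * t.norm() / (d.norm() + 1e-12))
    losses = []
    with torch.no_grad():
        for alpha in torch.linspace(-radius, radius, steps):
            for p, t, d in zip(model.parameters(), theta, direction):
                p.copy_(t + alpha * d)
            losses.append(loss_fn(model(X), y).item())
        for p, t in zip(model.parameters(), theta):   # restore the original weights
            p.copy_(t)
    return losses

# Hypothetical usage with the model/X/y from the parity sketch in the question:
# print(loss_slice(model, torch.nn.BCEWithLogitsLoss(), X, y))
```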
