The basic idea is a proof by induction on $p$, the number of powers of $\phi$ in the connected correlation function (equivalently, the number of derivatives with respect to $J$ applied to $W[J] = \log(Z[J])$), together with the observation that peeling off a single factor of $J$ (which then appears as an external leg) from a connected Feynman diagram results in another connected Feynman diagram. The base case ($p=1$) follows from tracing the logic of the proof that, in the Feynman diagrams of expectation values (i.e. $\delta_J Z[J]/Z[J]$), every factor is connected to at least one external leg (here, the single external leg in question) and the bubble diagrams cancel out. (This, however, is itself a non-trivial result.)
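As a sketch of the inductive step, in notation that is my own assumption rather than anything fixed above: write $G_c^{(p)}(1,\dots,p;J)$ for the $p$-th derivative of $W[J]$ at nonzero source, and suppose it is a sum of connected diagrams with $p$ marked external legs and any number of $J$ insertions. Then
\begin{align*}
G_c^{(p)}(1,\dots,p;J) &= \frac{\delta^p W[J]}{\delta J_p \cdots \delta J_1}, \\
G_c^{(p+1)}(1,\dots,p+1;J) &= \frac{\delta}{\delta J_{p+1}}\, G_c^{(p)}(1,\dots,p;J).
\end{align*}
By the product rule, the derivative acts on one $J$ insertion at a time, replacing it with the marked leg $p+1$ and leaving the rest of the diagram untouched, so each resulting term is still a connected diagram. Setting $J=0$ at the end gives the usual connected $(p+1)$-point function.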
Note that if you treat the quadratic and higher terms in the Lagrangian (or Hamiltonian, in statistical field theory) as "nonlinear sources" (i.e. as attached to external 'knobs' such as the mass $m$ and the couplings $\Gamma$, $\lambda$, etc., which can be tuned/adjusted), the "bubble" diagrams that appear in $W[m, \lambda, \Gamma, \dots]$ are also connected.
Another way of seeing the result is through the inclusion-exclusion principle, expressing derivatives of $\log(Z[J])$ in terms of derivatives of $Z[J]$. For example,
\begin{align*}
\delta_{J_2}\delta_{J_1} W[J] &= \frac{\delta}{\delta J_2} \Big[\frac{1}{Z[J]}\frac{\delta Z[J]}{\delta J_1}\Big] \\
&= \frac{1}{Z[J]}\frac{\delta^2 Z[J]}{\delta J_2\delta J_1} - \frac{1}{Z[J]^2}\frac{\delta Z[J]}{\delta J_2}\frac{\delta Z[J]}{\delta J_1}\\
&= \langle \phi_1\phi_2\rangle - \langle \phi_1\rangle\langle\phi_2\rangle
\end{align*}
which you might recognize as the sum of all (possibly disconnected) diagrams attached to the two external legs, minus the sum of all diagrams in which the two external legs lie in separate components (each component attached to one leg but not both). This sort of reasoning/book-keeping gets a little trickier with more derivatives, but the inclusion-exclusion principle still applies and results in a sum over connected diagrams.
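To illustrate how the book-keeping grows, here is a sketch of the next case, obtained by applying one more derivative $\delta_{J_3}$ to the two-derivative example and using the same conventions ($\langle\cdots\rangle$ denotes derivatives of $Z[J]$ divided by $Z[J]$):
\begin{align*}
\delta_{J_3}\delta_{J_2}\delta_{J_1} W[J] &= \langle \phi_1\phi_2\phi_3\rangle - \langle \phi_1\phi_2\rangle\langle\phi_3\rangle - \langle \phi_1\phi_3\rangle\langle\phi_2\rangle - \langle \phi_2\phi_3\rangle\langle\phi_1\rangle \\
&\quad + 2\,\langle \phi_1\rangle\langle\phi_2\rangle\langle\phi_3\rangle .
\end{align*}
The three subtracted terms remove diagrams in which one leg sits in its own component. Those subtractions over-count the fully factorized diagrams (each is removed three times, although it appears only once in $\langle \phi_1\phi_2\phi_3\rangle$), and the final term with coefficient $+2$ corrects for that, which is the inclusion-exclusion pattern at work.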