I think the confusion arises precisely because of the computer graphics context. The notions of basis, coordinate system, axes, and so on have mathematical definitions, and what you see in graphics is just an application of those concepts with extra jargon attached on top.
The reply to the first two points is fairly lengthy, since I do not know how to rigorously explain this without resorting to linear algebra. However once this is understood the rest comes almost for free.
1. Axis, Basis Vectors, Coordinate System
The formal definition of an axis is an oriented line: "In real coordinate space, an oriented line is also known as an axis." Unlike a basis vector it does not prescribe lengths. On the other hand it doesn't necessarily pass through the origin. However, in graphics I believe that it is sometimes used interchangeably with the basis vectors when those are positioned at the origin of your coordinate system. This is likely because the origin of the coordinate system $\vec{a}_0$ along with a basis vector $\vec{a}_i$ uniquely determines an oriented line as $\vec{l}_i(t) = \vec{a}_0 + t\vec{a}_i$, where the positive orientation is along $\vec{a}_i$ and the opposite is along $-\vec{a}_i$. Note that formally the converse is not true despite the interchangeable usage in graphics, i.e., an oriented line does not uniquely determine a basis vector or an origin.
2.1. Basis: Mathematical Definition
A basis $A$ (in a finite-dimensional space) is just an $n$-tuple of linearly independent vectors $$A = \begin{bmatrix} | & & | \\ \vec{a}_1 & \ldots & \vec{a}_n \\ | & & | \end{bmatrix}\in V^{1\times n},$$ that spans the vector space $(V,\mathbb{F},+,\cdot)$. If $V=\mathbb{R}^n, \mathbb{F} = \mathbb{R},$ then the above is an invertible matrix from $\mathbb{R}^{n\times n}$. It gives you unique coordinates $[\vec{v}]_A\in\mathbb{R}^{n\times 1}$ for any vector $\vec{v}\in V$: $$\vec{v} = \sum_{i=1}^n [\vec{v}]_A^i \vec{a}_i =\begin{bmatrix} | & & | \\ \vec{a}_1 & \ldots & \vec{a}_n \\ | & & | \end{bmatrix} \begin{bmatrix} [\vec{v}]_A^1 \\ \vdots \\ [\vec{v}]_A^n \end{bmatrix} = A [\vec{v}]_A.$$ A different basis $B$ for the same space may give you different coordinates $[\vec{v}]_B\ne[\vec{v}]_A$ for the same vector $\vec{v}$: $$B[\vec{v}]_B = \sum_{i=1}^n [\vec{v}]^i_B \vec{b}_i = \vec{v} = \sum_{i=1}^n [\vec{v}]_A^i \vec{a}_i = A [\vec{v}]_A.$$ So the only thing a basis influences is the coordinates.
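To make this concrete, here is a small numpy sketch (the particular bases are made up for illustration): coordinates w.r.t. a basis are obtained by solving the linear system $A[\vec{v}]_A = \vec{v}$, and different bases give different coordinates for the same vector.

```python
import numpy as np

# A hypothetical basis A of R^2: columns are the basis vectors a_1, a_2.
A = np.array([[1.0, 1.0],
              [0.0, 2.0]])

v = np.array([3.0, 4.0])

# Coordinates [v]_A solve A @ [v]_A = v.
coords_A = np.linalg.solve(A, v)
assert np.allclose(A @ coords_A, v)       # reconstruction v = A [v]_A

# A different basis B gives different coordinates for the same vector v.
B = np.array([[2.0, 0.0],
              [0.0, 1.0]])
coords_B = np.linalg.solve(B, v)
assert np.allclose(B @ coords_B, v)
assert not np.allclose(coords_A, coords_B)
```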
2.1a. A Note on the Necessity for Abstraction
A natural question is: what is the difference between vectors and elements of $\mathbb{R}^n$? The point is that a vector space can be more abstract, e.g., $V = \{p:\mathbb{R}\to\mathbb{R}\,|\, p(t) = \sum_{j=0}^{n-1} c_j t^j, \,c_j\in\mathbb{R}\}$, with the standard function addition and real number multiplication, is the vector space of real polynomials of degree at most $(n-1)$. Here vectors $\vec{v}\in V$ are polynomial functions. However, the idea of coordinates remains virtually the same:
$$\vec{v} = \sum_{i=1}^n [\vec{v}]_A^i \vec{a}_i = A [\vec{v}]_A.$$
The only thing that changes is that $A$ is not an $n\times n$ matrix anymore, but it is instead a matrix from $V^{1\times n}$ where the elements are the polynomial basis functions, e.g. $A = \begin{bmatrix} 1 & t & t^2 & \ldots & t^{n-1}\end{bmatrix}$, then the coordinates are the polynomial coefficients in the monomial basis. So abstraction allows for more generality, where the above definitions do not only apply to arrows in Euclidean space.
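As a quick sketch of this abstraction (the polynomial and helper function are made up for illustration): in the monomial basis $A = \begin{bmatrix} 1 & t & \ldots & t^{n-1}\end{bmatrix}$ the coordinates of a polynomial are exactly its coefficients, and "reconstructing the vector" means evaluating the linear combination of basis functions.

```python
import numpy as np

# Coordinates of p(t) = 2 + 3t + t^2 in the monomial basis [1, t, t^2].
coeffs = np.array([2.0, 3.0, 1.0])

def eval_in_monomial_basis(coeffs, t):
    """Reconstruct v = A [v]_A, i.e. p(t) = sum_i coeffs[i] * t**i."""
    return sum(c * t**i for i, c in enumerate(coeffs))

# The "vector" here is the whole function p, not a tuple of numbers;
# its coordinates (2, 3, 1) only make sense relative to the chosen basis.
value = eval_in_monomial_basis(coeffs, 2.0)  # p(2) = 2 + 6 + 4
assert np.isclose(value, 12.0)
```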
2.2. Coordinate System
If you also add an origin $\vec{a}_0$ then you end up with a coordinate system $$A_0 = \begin{bmatrix} | & | & & | \\ \vec{a}_0 & \vec{a}_1 & \ldots & \vec{a}_n \\ | & | & & | \end{bmatrix}\in V^{1\times (1+n)}.$$ The coordinates $[\vec{v}]_{A_0}$ of any vector $\vec{v}$ are related to its coordinates $[\vec{v}]_{A}$ in a simple manner: $$\vec{v} = \vec{a}_0 + \sum_{i=1}^n[\vec{v}]^i_{A_0}\vec{a}_i \implies \vec{v}- \vec{a}_0 = \sum_{i=1}^n[\vec{v}]^i_{A_0}\vec{a}_i = \sum_{i=1}^n[\vec{v}-\vec{a}_0]^i_{A}\vec{a}_i \implies [\vec{v}]_{A_0} = [\vec{v}]_A-[\vec{a}_0]_A.$$ Thus the coordinates $[\vec{v}]_{A_0}$ are just $[\vec{v}]_A$ with a shift by the negated coordinates $-[\vec{a}_0]_A$ of $\vec{a}_0$ in $A$.
2.3. Basis Coordinates vs Coordinate System Coordinates
In graphics you typically work with $[\vec{v}]_{A_0}$ whenever a coordinate system is given. A compact notation similar to $\vec{v} = A[\vec{v}]_A$ is:
$$\vec{v} = \vec{a}_0 + \sum_{i=1}^n [\vec{v}]_{A_0}^i\vec{a}_i = \begin{bmatrix} | & | & & | \\ \vec{a}_0 & \vec{a}_1 & \ldots & \vec{a}_n \\ | & | & & | \end{bmatrix}\begin{bmatrix} 1 \\ [\vec{v}]^1_{A_0} \\ \vdots \\ [\vec{v}]^n_{A_0}\end{bmatrix} = A_0 \langle\vec{v}\rangle_{A_0}, \quad \langle\vec{v}\rangle_{A_0} := \begin{bmatrix} 1 \\ [\vec{v}]_{A_0} \end{bmatrix}. $$
For lack of better notation I have used $\langle \vec{v} \rangle_{A_0}$ for the extension of the coordinates $[\vec{v}]_{A_0}$ with a $1$; note however that this is not standard notation. The above is similar to what is done in CG for homogeneous coordinates, although I put the $1$ in the $0$-th slot, while in CG they like to put it in the $(n+1)$-st slot. On the other hand you cannot put an arbitrary non-zero number instead of the $1$ here, as that would mess up $\vec{v}$, so the above is just for conciseness of notation. You could put a zero, however, if you use $[\vec{v}]_A$, which results in: $$\vec{v} = A_0\langle \vec{v}\rangle_{A} = A[\vec{v}]_A, \quad \langle \vec{v}\rangle_A := \begin{bmatrix} 0 \\ [\vec{v}]_A\end{bmatrix}.$$
You will notice that I defined $\langle\cdot\rangle$ in two ways: whenever it was w.r.t. $A$ I put a zero, and when it was w.r.t. $A_0$ I put a $1$. This allows you to distinguish whether a vector is in $A$ or $A_0$ just by looking at "the homogeneous coordinate". In CG, coordinates w.r.t. $A$ (i.e. using a zero) are typically used when working with "direction vectors" that should not be affected by translations, while coordinates w.r.t. $A_0$ (i.e. using a one) are used when working with points that should be affected by translations.
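A small numpy sketch of this convention (with the $1$/$0$ in the $0$-th slot as above; the translation values are made up): a leading $1$ picks up the translation, a leading $0$ ignores it.

```python
import numpy as np

# Affine map in homogeneous form, with the 1 in the 0-th slot as in the text:
# first column carries the translation, lower-right block is the linear part.
t = np.array([5.0, -2.0])            # translation
M = np.eye(2)                        # linear part (identity here)

T = np.eye(3)
T[1:, 0] = t                         # translation in the first column
T[1:, 1:] = M

point     = np.array([1.0, 1.0, 2.0])   # <v>_{A_0}: leading 1 -> affected by t
direction = np.array([0.0, 1.0, 2.0])   # <v>_A: leading 0 -> ignores t

assert np.allclose(T @ point,     [1.0, 6.0, 0.0])   # translated
assert np.allclose(T @ direction, [0.0, 1.0, 2.0])   # unchanged
```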
2.3a Coordinate System Coordinates
From the previous we know that if we have a point $\vec{p}$ we would typically store its coordinates $[\vec{p}]_{A_0}$ w.r.t. $A_0$, while if we have a "direction vector" $\vec{d}$ we would typically store its coordinates $[\vec{d}]_{A}$ w.r.t. $A$, with the reconstruction rules: $$ \vec{p} = A_0\langle \vec{p}\rangle_{A_0}, \quad \vec{d} = A_0\langle \vec{d}\rangle_{A} $$
In the next section I will need reconstruction rules for the coordinate systems from their coordinates, so I generalise the above to this setting. A coordinate system itself is a collection of $n+1$ vectors, where we treat the zero-th as a point, and the rest as "directional vectors"; in other words $\vec{a}_0$ is expressed in $A_0$ and $\vec{a}_i$ are expressed in $A$ for $1\leq i\leq n$. For convenience and to reflect the above I generalise the notation $\langle\cdot\rangle$ to the coordinates of $B_0$ as follows: $$\langle B_0\rangle_{A_0} = \begin{bmatrix} | & | & & | \\ \langle \vec{b}_0 \rangle_{A_0} & \langle \vec{b}_1\rangle_A & \ldots & \langle \vec{b}_n\rangle_A \\ | & | & & |\end{bmatrix} = \begin{bmatrix} 1 & 0 & \ldots & 0 \\ | & | & & | \\ [\vec{b}_0]_{A_0} & [\vec{b}_1]_{A} & \ldots & [\vec{b}_n]_{A} \\ | & | & & | \end{bmatrix}. $$ Note that this is what a typical transformation matrix looks like, up to swapping the first and last columns and rows. In the above the translation factor is in the first column and the basis transformation is in the lower right part.
Now you can reconstruct $B_0$ as follows:
\begin{align}B_0 &= A_0\langle B_0\rangle_{A_0} \\ \begin{bmatrix} | & | & & | \\ \vec{b}_0 & \vec{b}_1 & \ldots & \vec{b}_n \\ | & | & & | \end{bmatrix} &= \begin{bmatrix} | & | & & | \\ \vec{a}_0 & \vec{a}_1 & \ldots & \vec{a}_n \\ | & | & & | \end{bmatrix}\begin{bmatrix} | & | & & | \\ \langle\vec{b}_0\rangle_{A_0} & \langle \vec{b}_1\rangle_{A} & \ldots& \langle \vec{b}_n\rangle_{A} \\ | & | & & |\end{bmatrix}. \end{align}
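To make the reconstruction concrete, here is a numpy sketch where $V=\mathbb{R}^2$, so $A_0$ can be stored as a $2\times 3$ matrix whose columns are $\vec{a}_0, \vec{a}_1, \vec{a}_2$ (all numbers are made up for illustration):

```python
import numpy as np

# Coordinate system A_0 in R^2: first column is the origin a_0,
# remaining columns are the basis vectors a_1, a_2.
A0 = np.array([[1.0, 1.0, 0.0],
               [2.0, 0.0, 1.0]])

# <B_0>_{A_0}: leading row [1, 0, ..., 0], first column holds the
# coordinates of the child origin, the rest the child basis vectors.
coords_B0 = np.array([[1.0, 0.0,  0.0],
                      [3.0, 0.0, -1.0],
                      [1.0, 1.0,  0.0]])

# Reconstruction B_0 = A_0 <B_0>_{A_0}: columns are b_0, b_1, b_2.
B0 = A0 @ coords_B0

assert np.allclose(B0[:, 0], [4.0, 3.0])    # b_0 = a_0 + 3 a_1 + a_2
assert np.allclose(B0[:, 1], [0.0, 1.0])    # b_1 = a_2
assert np.allclose(B0[:, 2], [-1.0, 0.0])   # b_2 = -a_1
```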
2.4. "Nested" Coordinate Systems
Now suppose I have $m+1$ coordinate systems $C_0,\ldots,C_m$ and that I store the coordinates $[C_{k+1}]_{C_k}$ of $C_{k+1}$ expressed in the basis $C_k$, i.e., $C_{k+1} = C_k \langle C_{k+1}\rangle_{C_k}$. Then in graphics $C_{k+1}$ is termed a child of $C_k$ and $C_k$ is termed the parent of $C_{k+1}$. The nesting becomes obvious if one expands $C_{k+1} = C_k \langle C_{k+1}\rangle _{C_k}$ as follows: $$C_{k+1} = C_k \langle C_{k+1}\rangle_{C_k} = C_{k-1} \langle C_{k}\rangle_{C_{k-1}}\langle C_{k+1}\rangle_{C_k} = \ldots = C_0 \langle C_1\rangle_{C_0} \langle C_2\rangle_{C_1} \ldots \langle C_{k+1}\rangle_{C_k} = C_0 \langle C_{k+1}\rangle_{C_0}.$$
Now if you have the coordinates $[\vec{v}]_{C_m}$ of a vector $\vec{v}$ w.r.t. $C_m$ then you can reconstruct the vector as:
$$\vec{v} = C_m \langle\vec{v}\rangle_{C_m} = C_0\langle C_1\rangle_{C_0}\ldots \langle C_m\rangle_{C_{m-1}} \langle \vec{v}\rangle_{C_m} = C_0 \langle \vec{v}\rangle_{C_0}.$$
The coordinate systems themselves are not nested, as that wouldn't make sense. It's the coordinate representations of those that are nested, and you can see above that given the coordinates $[\vec{v}]_{C_m}$ you can get the coordinates $[\vec{v}]_{C_k}$ by taking $m-k$ steps up the chain.
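A numpy sketch of the chaining (the helper and the specific translations are made up for illustration): composing the coordinate matrices down the chain takes coordinates in the deepest child all the way to $C_0$.

```python
import numpy as np

def make_coords(t, M):
    """<C_{k+1}>_{C_k} with the 1 in the 0-th slot: translation in the
    first column, basis transform in the lower-right block."""
    n = len(t)
    T = np.eye(n + 1)
    T[1:, 0] = t
    T[1:, 1:] = M
    return T

# A hypothetical chain C_0 -> C_1 -> C_2 of nested coordinate systems.
C1_in_C0 = make_coords([1.0, 0.0], np.eye(2))
C2_in_C1 = make_coords([0.0, 2.0], np.eye(2))

# <C_2>_{C_0} = <C_1>_{C_0} <C_2>_{C_1}: one step up the chain per factor.
C2_in_C0 = C1_in_C0 @ C2_in_C1

# A point stored in C_2 coordinates, lifted with a leading 1.
v_in_C2 = np.array([1.0, 0.5, 0.5])
v_in_C0 = C2_in_C0 @ v_in_C2
assert np.allclose(v_in_C0, [1.0, 1.5, 2.5])
```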
2.4a. Examples
A typical setting would have $C_0$ as the world space coordinate system and $C_1$ as model space, with $[C_1]_{C_0}$ being the coordinates of $C_1$ w.r.t. $C_0$. In fact the model to world matrix $P_{M\to W}$ in homogeneous coordinates is then: $$P_{M\to W} = \begin{bmatrix} | & & | & | \\ [\vec{c}_{1,1}]_{C_0} & \ldots & [\vec{c}_{1,n}]_{C_0} & [\vec{c}_{1,0}]_{C_0} \\ | & & | & | \\ 0 & \ldots & 0 & 1\end{bmatrix},$$ which is the same as $\langle C_1\rangle_{C_0}$ up to a swap of the first and last columns and rows.
A more complex example has the world space coordinate system $W$, the camera coordinate system's coordinates $[C]_W$ expressed in world space, and some model's coordinate system's coordinates $[M]_W$, also expressed in world space. Typically here you have $[\vec{v}]_M$ and you want to find $[\vec{v}]_C$. Then you can use
\begin{align} \vec{v} &= C\langle \vec{v}\rangle_C = W\langle C\rangle_W\langle\vec{v}\rangle_C \\ \vec{v} &= M\langle\vec{v}\rangle_M = W\langle M\rangle_W\langle \vec{v}\rangle_M. \end{align}
Since both sides are equal to the same thing we can just identify the coordinates: $$ \langle C\rangle_W\langle\vec{v}\rangle_C =\langle M\rangle_W\langle \vec{v}\rangle_M \implies \langle\vec{v}\rangle_C = \langle C\rangle_W^{-1}\langle M\rangle_W\langle \vec{v}\rangle_M. $$
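A numpy sketch of this identification (hypothetical camera and model placements; the helper mirrors the $\langle\cdot\rangle$ convention with the $1$ in the $0$-th slot): the model-to-camera map is $\langle C\rangle_W^{-1}\langle M\rangle_W$.

```python
import numpy as np

def make_coords(t, M):
    """Coordinates of a system in the <.> convention: 1 in the 0-th slot."""
    n = len(t)
    T = np.eye(n + 1)
    T[1:, 0] = t
    T[1:, 1:] = M
    return T

# Hypothetical camera and model systems given in world coordinates.
C_in_W = make_coords([0.0, 5.0], np.eye(2))
M_in_W = make_coords([3.0, 1.0], np.eye(2))

# <v>_C = <C>_W^{-1} <M>_W <v>_M : model -> world -> camera.
model_to_camera = np.linalg.inv(C_in_W) @ M_in_W

v_in_M = np.array([1.0, 1.0, 1.0])      # a point (leading 1) in model space
v_in_C = model_to_camera @ v_in_M
assert np.allclose(v_in_C, [1.0, 4.0, -3.0])
```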
3. Child Coordinate System Transformation
Suppose you have the coordinates $[C_0]_{P_0}$ of a "child" coordinate system $C_0$ expressed in the coordinate system $P_0$, and consider the transformation matrix:
$$T = \begin{bmatrix} 1 & \vec{0}^T \\ \vec{t} & M \end{bmatrix},$$
that encodes a translation $\vec{t}$ and a linear transformation $M$. The results depend on what side you multiply with $\langle C_0\rangle_{P_0}$:
\begin{align} T\langle C_0\rangle_{P_0} &= \begin{bmatrix} 1 & \vec{0}^T \\ \vec{t} & M \end{bmatrix}\begin{bmatrix} 1 & \vec{0}^T \\ \vec{c}_0 & C \end{bmatrix} = \begin{bmatrix} 1 & \vec{0}^T \\ \vec{t} + M\vec{c}_0 & MC \end{bmatrix} \\ \langle C_0\rangle_{P_0}T &= \begin{bmatrix} 1 & \vec{0}^T \\ \vec{c}_0 & C \end{bmatrix}\begin{bmatrix} 1 & \vec{0}^T \\ \vec{t} & M \end{bmatrix} = \begin{bmatrix} 1 & \vec{0}^T \\ \vec{c}_0 + C\vec{t} & CM \end{bmatrix}. \end{align}
Suppose that you apply a translation only, i.e., $M=I_n$. Then the first translates w.r.t. the basis of the coordinate system $P$, while the second translates w.r.t. the basis of the coordinate system $C$. That is, you can regard this as $\vec{t}\equiv[\vec{t}]_P$ in the first case and $\vec{t}\equiv[\vec{t}]_C$ in the second. Similarly, suppose that $\vec{t} = \vec{0}$ and $M$ is a rotation matrix. Then the first rotates $\vec{c}_0$ and $C$ around the origin of $P$, i.e., you can interpret $M\equiv [M]_P$ in the first case. In the second case the rotation happens around $\vec{c}_0$ and w.r.t. the basis $C$, i.e., $M\equiv [M]_{C}$.
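A numpy sketch contrasting the two multiplication orders (a $90^\circ$-rotated child system and a made-up translation): left-multiplying moves the child origin along the parent's axes, right-multiplying moves it along the child's own axes.

```python
import numpy as np

def make_coords(t, M):
    """System coordinates in the <.> convention: 1 in the 0-th slot."""
    T = np.eye(3)
    T[1:, 0] = t
    T[1:, 1:] = M
    return T

# Child system C_0 in parent P_0: rotated 90 degrees, origin at (1, 0).
R90 = np.array([[0.0, -1.0],
                [1.0,  0.0]])
C0_in_P0 = make_coords([1.0, 0.0], R90)

# Pure translation by t = (1, 0).
T = make_coords([1.0, 0.0], np.eye(2))

left  = T @ C0_in_P0   # t read in the parent basis: origin slides to (2, 0)
right = C0_in_P0 @ T   # t read in the child basis: origin moves along c_1

assert np.allclose(left[1:, 0],  [2.0, 0.0])
assert np.allclose(right[1:, 0], [1.0, 1.0])   # c_0 + C t, with C t = (0, 1)
```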
4. Coordinates Representation
The examples from 2.4a with model to world and model to camera space should have answered this question.
5. Coordinates w.r.t. the System Itself
You know that $B_0 = A_0\langle B_0\rangle_{A_0}$, now set $B_0=A_0$, then $A_0 = A_0\langle A_0\rangle_{A_0}$, and the matrix must necessarily be the identity. So yes, the coordinates of the basis vectors w.r.t. the basis itself are the canonical vectors. Similarly the coordinates of the origin with respect to the system itself are zero since $$[\vec{a}_0]_{A_0} = [\vec{a}_0]_A - [\vec{a}_0]_A = \vec{0}.$$
6. Coordinates w.r.t. Some Other System
Suppose $[C_0]_{P_0}$ is given. It is the identity matrix only if $C_0=P_0$. If $\vec{c}_0\ne\vec{p}_0$ but $C=P$ then the origin coordinates $[\vec{c}_0]_{P_0}$ are non-zero, but the rest is the identity. Vice versa if $\vec{c}_0 =\vec{p}_0$ but $C\ne P$ then the origin coordinates $[\vec{c}_0]_{P_0}$ are zero, but the rest is arbitrary.
7. What does the "Parent" Coordinate System Know About the "Child" System
If $[C_0]_{P_0}$ is available the parent $P_0$ knows everything about the child, and it can reconstruct it as $C_0 = P_0\langle C_0\rangle_{P_0}$.
8. Transformations Applicable to the Origin and Basis
The basis is not affected by translational transformations, i.e. it accepts only linear transformations, similar to "direction vectors". The origin can be transformed with any affine transformation (translation + linear transformation), similar to how points transform. Refer to section 3.
9. Metrics in Different Systems
If you interpret everything as passive transformations then necessarily the inner product and thus norm are preserved because the vectors themselves do not change. The trick is that whenever you transform a coordinate system with some matrix $T$ you are also supposed to transform its coordinates with the opposite transformation in order for the coordinates to remain correct.
Suppose you have the coordinate system $A_0$ and the "direction vectors" $\vec{u}, \vec{v}$. Then you can compute their coordinates $[\vec{u}]_{A},[\vec{v}]_{A}$ and you have that $\vec{u} = A[\vec{u}]_{A}$ and $\vec{v} = A[\vec{v}]_{A}$. You can compute the inner product $\langle \vec{u},\vec{v}\rangle$ of the vectors as follows:
\begin{align} \langle\vec{u}, \vec{v}\rangle &= \langle A[\vec{u}]_{A}, A[\vec{v}]_{A}\rangle = [\vec{u}]_A^T \begin{bmatrix} \langle \vec{a}_1, \vec{a}_1\rangle & \ldots & \langle \vec{a}_1,\vec{a}_n \rangle \\ \vdots & & \vdots \\ \langle \vec{a}_n, \vec{a}_1\rangle & \ldots & \langle \vec{a}_n, \vec{a}_n\rangle\end{bmatrix}[\vec{v}]_{A} = [\vec{u}]_{A}^TG_A[\vec{v}]_{A}. \end{align}
The matrix $G_A$ is known as the Gramian. Suppose that $B = A[B]_A$, then \begin{align} \vec{v} = B[\vec{v}]_B = A[B]_A[\vec{v}]_B = A[\vec{v}]_A \implies [B]_A[\vec{v}]_B=[\vec{v}]_A \implies [\vec{v}]_B = [B]_A^{-1}[\vec{v}]_A. \end{align}
The inner product is then consistent:
\begin{align} \langle\vec{u}, \vec{v}\rangle &= \langle B[\vec{u}]_{B}, B[\vec{v}]_{B}\rangle = \langle A[B]_A[B]_A^{-1}[\vec{u}]_A, A[B]_A[B]_A^{-1}[\vec{v}]_A\rangle = \langle A[\vec{u}]_A,A[\vec{v}]_A\rangle = [\vec{u}]_A^TG_A[\vec{v}]_A. \end{align}
If the Gramian is not the identity then generally:
$$\langle \vec{u},\vec{v}\rangle = [\vec{u}]_A^TG_A[\vec{v}]_A \ne [\vec{u}]_A^T[\vec{v}]_A.$$
So if you treat $[\vec{u}]_A = (3,0,0)$ as being of length 3 for an arbitrary basis $A$, that is incorrect; its length is computed as $$\|\vec{u}\|_2 = \sqrt{\begin{bmatrix} 3 & 0 & 0 \end{bmatrix} A^TA \begin{bmatrix} 3 \\ 0 \\ 0 \end{bmatrix}}.$$
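A numpy sketch of this pitfall (with a made-up non-orthonormal basis): the raw coordinate norm and the Gramian-weighted norm disagree as soon as the basis vectors are not orthonormal.

```python
import numpy as np

# A non-orthonormal basis: the second basis vector has length 2.
A = np.array([[1.0, 0.0],
              [0.0, 2.0]])

G = A.T @ A                       # Gramian G_A of the basis vectors

coords = np.array([3.0, 0.0])     # [u]_A
length = np.sqrt(coords @ G @ coords)
assert np.isclose(length, 3.0)    # a_1 happens to be unit length, so 3 is right

coords2 = np.array([0.0, 3.0])    # 3 steps along the length-2 basis vector
length2 = np.sqrt(coords2 @ G @ coords2)
assert np.isclose(length2, 6.0)   # the naive ||coords2|| = 3 would be wrong
```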
If you're performing an active transformation then you're not actually transforming the basis but the vector/point itself. In that case the above does not hold, and the vector can indeed change length and angles w.r.t. other vectors. Typically world space is interpreted as the "top space", so any transformation that is not merely a change of the space a vector/point is expressed in should be treated as an active transformation there. That is, even though things are muddled in CG, I believe that all transformations are passive except for the model to world transformation. Those transformations ought to be considered active, since you would actually like to measure different angles and lengths in world space.
10. Is my conception of these subjects completely off?
I don't know. That's up to you to judge. I should note that the above applies to how contravariant vectors transform. The transformation of covariant vectors, higher-order tensors, pseudovectors, multivectors, tensor densities and so on is not covered by my explanation above. Normals for instance need to be transformed with either $M^{-T}$ or $\operatorname{cof}(M)$, where the latter is the cofactor matrix. This is because under rescaling, shearing, or other transformations that are not similarity transforms, normals need to be treated as covectors or bivectors if you want orthogonality to the geometry to be preserved.
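A numpy sketch of the normal-transformation point (a made-up non-uniform scale): transforming a normal with $M$ itself breaks orthogonality to the surface, while $M^{-T}$ preserves it.

```python
import numpy as np

# Non-uniform scaling: shrinks the y direction by half.
M = np.array([[1.0, 0.0],
              [0.0, 0.5]])

tangent = np.array([2.0, 1.0])    # lies along the surface
normal  = np.array([-1.0, 2.0])   # perpendicular to the tangent
assert np.isclose(tangent @ normal, 0.0)

# Transforming the normal with M breaks orthogonality...
assert not np.isclose((M @ tangent) @ (M @ normal), 0.0)

# ...while the inverse transpose preserves it.
n_correct = np.linalg.inv(M).T @ normal
assert np.isclose((M @ tangent) @ n_correct, 0.0)
```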