Notation Convention in Linear Models: Why $\theta^\top x$ instead of $\theta x$?
Question:
I'm working with CMU 10-414 Lecture 2 and I'm curious about the notation convention used to represent the model parameters.
Our hypothesis function maps inputs $x \in \mathbb{R}^n$ to $k$-dimensional vectors
$$ h: \mathbb{R}^n \to \mathbb{R}^k $$
where $h_{i}(x)$ indicates some measure of "belief" in how likely the label is to be class $i$.
A linear hypothesis function uses a linear operator for this transformation
$$ h_{\theta}(x) = \theta^\top x $$
for parameters $\theta \in \mathbb{R}^{n \times k}$.
Given a linear model with $k$ classifiers, why is the notation $\theta^\top x$ commonly used, where $\theta \in \mathbb{R}^{n \times k}$? Is there a specific reason for using $\theta^\top x$ instead of $\theta x$ with $\theta \in \mathbb{R}^{k \times n}$?
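To make the two options concrete, here is a minimal NumPy sketch. The sizes `n = 4`, `k = 3`, the random data, and the name `W` for the $k \times n$ alternative are my own assumptions for illustration, not from the lecture. It only checks that both conventions describe the same map when `W = theta.T`.

```python
import numpy as np

n, k = 4, 3                      # hypothetical sizes: n features, k classes
rng = np.random.default_rng(0)

theta = rng.normal(size=(n, k))  # lecture convention: one column per class
x = rng.normal(size=n)           # a single input vector in R^n

W = theta.T                      # alternative convention: one row per class, shape (k, n)

h_lecture = theta.T @ x          # theta^T x, shape (k,)
h_alt = W @ x                    # W x, shape (k,)

same = np.allclose(h_lecture, h_alt)
print(h_lecture.shape, same)
```

So the two notations differ only in how the parameters are stored, not in the hypothesis itself.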
Moreover, in matrix batch form the convention stores each example as a row vector rather than a column vector.
It is often more convenient to write the data and operations in matrix batch form: $$ X = \begin{bmatrix} \left( x^{(1)} \right) ^\top \\ \vdots \\ \left( x^{(m)} \right) ^\top \end{bmatrix} \in \mathbb{R}^{m \times n},\quad y = \begin{bmatrix} y^{(1)} \\ \vdots \\ y^{(m)} \end{bmatrix} \in \{ 1,\dots,k \}^m $$
Then the linear hypothesis applied to this batch can be written as
$$ h_{\theta}(X) = \begin{bmatrix} h_{\theta}\left( x^{(1)} \right)^\top \\ \vdots \\ h_{\theta}\left( x^{(m)} \right)^\top \end{bmatrix} $$
As a result, given that $h_{\theta}(X) = X \theta$, it seems to me that the batch convention (parameters on the right) is at odds with the unbatched form $\theta^\top x$ (parameters on the left).
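For reference, this is how I checked numerically that the two forms agree, using the identity $(\theta^\top x)^\top = x^\top \theta$ row by row. It is only a sketch; the sizes `m = 5`, `n = 4`, `k = 3` and the random data are assumptions I made up for the check.

```python
import numpy as np

m, n, k = 5, 4, 3                # hypothetical sizes: m examples, n features, k classes
rng = np.random.default_rng(0)

X = rng.normal(size=(m, n))      # row i is (x^{(i)})^T
theta = rng.normal(size=(n, k))  # same parameter shape as in the unbatched form

# Batch form: a single matrix product, shape (m, k)
batched = X @ theta

# Unbatched form applied per example, then stacked as rows
stacked = np.stack([theta.T @ X[i] for i in range(m)])

agree = np.allclose(batched, stacked)
print(batched.shape, agree)
```

So with $\theta \in \mathbb{R}^{n \times k}$ the same matrix is used unchanged in both $\theta^\top x$ and $X\theta$; my question is whether that is the actual motivation for the convention.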
Background:
- I'm familiar with the basics of linear regression and linear models.
- I've seen the notation $\theta^\top x$ used in various machine learning resources, but I'm not sure why it's preferred over $\theta x$.
Goal:
- Understand the reasoning behind the notation convention.
- Clarify any potential benefits or drawbacks of using $\theta^\top x$ versus $\theta x$.
- Explain the apparent discrepancy in conventions between the matrix batch form and the unbatched form.