And if you go back further, e.g. to the IAS machine, you'll see a word size of 40 bits; the ENIAC before it worked directly in ten decimal digits.
And if you go back even further, to mechanical calculators, you'll see word sizes determined by the number of decimal digits they can represent.
And that explains the approach: computers were originally meant to automate calculations. So you want to represent numbers, with enough digits to do meaningful calculations.
Then you decide if you want a binary or a decimal representation.
That's how you end up with something like 10 decimal digits, or between 33 and 40 bits: roughly 34 bits if you store the value in pure binary, 40 bits if you spend 4 bits on each digit (binary-coded decimal).
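To make that arithmetic concrete, here's a quick Python sketch (nothing more than the back-of-the-envelope calculation):

    import math

    # Bits needed to hold any n-digit decimal number in pure binary.
    def bits_for_decimal_digits(n):
        return math.ceil(n * math.log2(10))

    print(bits_for_decimal_digits(10))  # 34 -- pure binary
    print(10 * 4)                       # 40 -- 4 bits per digit (BCD)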
Then you discover that this is too many bits for a single instruction. So you stuff several instructions into one word (or you use the spare bits for a generous address field in the instruction).
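As an illustration only (a hypothetical format, not any real machine's instruction set): with 36-bit words you could pack two 18-bit instructions per word, each with a 6-bit opcode and a 12-bit address.

    # Hypothetical: two 18-bit instructions packed into one 36-bit word.
    def unpack_halves(word):
        hi = (word >> 18) & 0o777777   # upper 18 bits
        lo = word & 0o777777           # lower 18 bits
        return hi, lo

    # Hypothetical 18-bit instruction: 6-bit opcode, 12-bit address.
    def decode(instr):
        return (instr >> 12) & 0o77, instr & 0o7777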
And you think about representing characters, which were 6 bits on teletypes. So word sizes that are multiples of 6 make a lot of sense.
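A 36-bit word is convenient here because it holds exactly six 6-bit characters. A small sketch, assuming a generic 6-bit code rather than any particular character set:

    # Pack six 6-bit character codes into one 36-bit word.
    def pack_sixbit(codes):
        assert len(codes) == 6 and all(0 <= c < 64 for c in codes)
        word = 0
        for c in codes:
            word = (word << 6) | c
        return word

    print(oct(pack_sixbit([0o11, 0o22, 0o33, 0o44, 0o55, 0o66])))  # 0o112233445566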
Then you want to make computers cheaper. If you are DEC, have a 36-bit machine, and are thinking in octal, then 4 octal digits * 3 bits = 12 bits is an obvious choice, because it's a clean fraction of 36 bits. So you get the 12-bit PDP-8.
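The octal arithmetic, spelled out:

    print(oct(2**12 - 1))  # 0o7777 -- a 12-bit word is exactly 4 octal digits
    print(36 // 12)        # 3      -- and exactly a third of a 36-bit word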
And further on, you get the PDP-11, microcomputers, and word sizes that are multiples of 8 bits.
So starting out with large word sizes to represent numbers is the natural thing to do. The really interesting question is the process by which they became smaller.