This is a FAQ in the C# tag, hard to find the duplicate. The need for the cast is relevant first. The underlying reason is that the CLI only specifies a limited number of valid types for the Opcodes.Add IL instruction. Only operands of type Int32, Int64, Single, Double and IntPtr are supported. IntPtr is special as well, the C# language forbids using that one.
So the C# compiler has to use an implicit conversion to uplift the byte to a type that the operator supports. It will pick Int32 as the closest compatible type. The result of the addition is Int32. Which does not fit back into a byte without truncating the result, throwing away the extra bits. An obvious example is 255 + 1, the result is 256 in Int32 but doesn't fit a Byte and yields 0 when stored.
That's a problem, the language designers didn't like that truncation to happen without you explicitly acknowledging that you are aware of the consequences. A cast is required to convince the compiler that you're aware. Bit of a cop-out of course, you tend to produce the cast mechanically without thinking much about the consequences. But that makes it your problem, not Microsoft's :)
The rub was the += operator, a very nice operator to write condense code. Resembles the brevity of the var keyword. Rock and a hard place however, where do you put the cast? It just doesn't work so they punted the problem and allowed truncation without a cast.
Notable is the way VB.NET works, it doesn't require a cast. But it gives a guarantee that C# doesn't provide by default, it will generate an OverflowException when the result doesn't fit. Pretty nice, but that check doesn't come for free.
Designing clean languages is a very hard problem. The C# team did an excellent job, warts not withstanding. Otherwise the kind of warts brought on by processor design. IL has these type restrictions because that's what real 32-bit processors have too, particularly the RISC designs that were popular in the 90s. Their internal registers can only handle 32-bit integers and IEEE-754 floating point. And only permit smaller types in loads and stores. The Intel x86 core is very popular and actually permits basic operations on smaller types. But that's mostly a historical accident due to Intel keeping the design compatible through the 8-bit 8080 and 16-bit 8086 generations. It doesn't come for free, 16-bit operations costs an extra cpu cycle. To be avoided.