Basic implementation
Here is a function BlockTridiagonalSolve that takes three lists of blocks (diag, lower and upper) and a list of vector pieces (vec) and solves the corresponding linear system:
BlockTridiagonalSolve[diag_?(ArrayQ[#, 3] &), lower_?(ArrayQ[#, 3] &), upper_?(ArrayQ[#, 3] &), vec_?MatrixQ] := Module[{a, i, n = Length[diag], d = diag, v = vec}, For[i = 1, i < n, i++, a = lower[[i]].Inverse[d[[i]]]; d[[i + 1]] -= a.upper[[i]]; v[[i + 1]] -= a.v[[i]]]; v[[n]] = Inverse[d[[n]]].v[[n]]; For[i = n - 1, i > 0, i--, v[[i]] = Inverse[d[[i]]].(v[[i]] - upper[[i]].v[[i + 1]])]; v]
To see it in action you can use the following piece of code:
nBlocks = 50; blockSize = 10; diag = DiagonalMatrix /@ RandomReal[{-1, 1}, {nBlocks, blockSize}]; lower = RandomReal[1, {nBlocks - 1, blockSize, blockSize}]; upper = RandomReal[-1, {nBlocks - 1, blockSize, blockSize}]; vec = RandomReal[{-1, 1}, {nBlocks, blockSize}]; trimat = SparseArray[{ Band[{1, 1}] -> diag, Band[{blockSize + 1, 1}] -> lower, Band[{1, blockSize + 1}] -> upper}]; bts = BlockTridiagonalSolve[diag, lower, upper, vec]; ls = LinearSolve[trimat, Flatten[vec]]; maxError = Max[Abs[Flatten[bts] - ls]]
Note that BlockTridiagonalSolve is slower than LinearSolve for small inputs, and grows comparable only for large blockSize values.
However, the following fact has to be taken into consideration: if you happen to be able to naturally construct the matrices diag, lower, and upper the way BlockTridiagonalSolve requires them, then BlockTridiagonalSolve may be the way to go, because a huge amount of time is spent (i.e. wasted...) contructing trimat, which is required to call LinearSolve. You can experience this for example with nBlocks = 100 and blockSize = 100, in which case trimat = SparseArray[...]; takes roughly 10 seconds on my Intel-i7 PC.
This is an important general observation: the best algorithm may depend on the type of input you are able to provide, because conversion can be time consuming too.
Considerations
Of course you may try and compile BlockTridiagonalSolve or perform other tricks (such as replacing Inverse with LinearSolve at lines 7 and 9). Last but not least, you should consider how BlockTridiagonalSolve is meant to be used: will you sporadically call it with different matrices, or will you repeatedly call it with the same matrix always? This makes a huge difference, because in the latter case you may want to "precompile" a specific solver as you would do by calling LinearSolve with one argument so that it generates a LinearSolveFunction.