At a high level it's not too difficult. Most laptop/desktop displays split each pixel into three subpixels shaped as vertical stripes, in RGB order from left to right. Given that, in your fragment shader you will need to evaluate the glyph outlines three times, using a sample point at the center of each subpixel. Since the usual sample point for a fragment shader is the pixel center (which is also the center of the green subpixel), you would offset it by 1/3 of a pixel to the left and to the right to get the other two sample points.
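To make the sample-point arithmetic concrete, here is a small CPU-side sketch of what the shader would compute per fragment. It assumes the RGB left-to-right layout described above and pixel centers at half-integer coordinates (as with gl_FragCoord). The `coverage` callback stands in for whatever glyph outline evaluation you already have; it and the function names are hypothetical.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]


def subpixel_sample_points(center: Point) -> Tuple[Point, Point, Point]:
    """Return the (R, G, B) sample points for a pixel whose subpixels are
    vertical stripes in RGB order from left to right (assumed layout)."""
    x, y = center
    third = 1.0 / 3.0
    # R is 1/3 px left of the pixel center, G sits at the center, B is 1/3 px right.
    return ((x - third, y), (x, y), (x + third, y))


def evaluate_rgb(coverage: Callable[[float, float], float],
                 center: Point) -> Tuple[float, float, float]:
    """Evaluate the (hypothetical) glyph coverage function once per subpixel."""
    r, g, b = (coverage(px, py) for px, py in subpixel_sample_points(center))
    return (r, g, b)
```

For example, with a point-sampled glyph edge at x = 10.4 (inside to the left), the pixel centered at (10.5, 0.5) gets `evaluate_rgb(lambda x, y: 1.0 if x < 10.4 else 0.0, (10.5, 0.5))`, which lights only the red subpixel: `(1.0, 0.0, 0.0)`.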
If you do any antialiasing as part of your glyph outline evaluation (e.g. estimating how much of the pixel area is covered), you would likewise want to narrow the antialiasing kernel to cover only 1/3 of the width of the pixel, instead of the whole pixel.
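As an illustration of narrowing the kernel, the sketch below assumes the simplest possible glyph feature, a single vertical edge with the glyph interior at x < `edge_x`, and estimates coverage with a box filter. The box is 1/3 px wide and centered on each subpixel's sample point instead of spanning the whole pixel. A real glyph evaluator would integrate over curves; the names here are hypothetical.

```python
from typing import Tuple


def box_coverage(edge_x: float, sample_x: float, kernel_width: float) -> float:
    """Fraction of a box filter of the given width, centered at sample_x,
    that lies inside the half-plane x < edge_x (a hypothetical vertical edge)."""
    left = sample_x - kernel_width / 2.0
    return min(max((edge_x - left) / kernel_width, 0.0), 1.0)


def subpixel_coverage(edge_x: float, pixel_center_x: float) -> Tuple[float, float, float]:
    """Per-subpixel coverage: each box is 1/3 px wide and centered on
    the R, G or B sample point (RGB order left to right assumed)."""
    third = 1.0 / 3.0
    r, g, b = (box_coverage(edge_x, pixel_center_x + dx, third)
               for dx in (-third, 0.0, third))
    return (r, g, b)
```

With the edge running through the middle of the pixel (`edge_x = 10.5`, pixel center 10.5), a whole-pixel kernel (`box_coverage(10.5, 10.5, 1.0)`) gives a single 0.5, whereas `subpixel_coverage(10.5, 10.5)` gives roughly `(1.0, 0.5, 0.0)`: red fully covered, green half covered, blue empty. Keeping a full-pixel kernel at each sample point would blur that distinction away.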
A subtlety here is that if you want to render colored text, or text over varying backgrounds, then you will need to use dual-source blending in order to blend all three subpixels independently, using the three glyph evaluations as per-channel alpha values.
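The arithmetic that dual-source blending performs can be simulated per channel as below. This is a sketch assuming an OpenGL-style setup with `glBlendFunc(GL_SRC1_COLOR, GL_ONE_MINUS_SRC1_COLOR)`, where the fragment shader writes the text color to output index 0 and the per-channel coverage (times the text's own alpha) to output index 1. The Python function and its parameter names are illustrative, not part of any API.

```python
from typing import Tuple

RGB = Tuple[float, float, float]


def blend_subpixel(text_rgb: RGB, text_alpha: float,
                   coverage_rgb: RGB, dst_rgb: RGB) -> RGB:
    """Per-channel 'over' blend, mirroring
    out = src0 * src1 + dst * (1 - src1), with src0 = text color and
    src1 = per-channel coverage scaled by the text alpha."""
    out = []
    for src, cov, dst in zip(text_rgb, coverage_rgb, dst_rgb):
        a = cov * text_alpha  # each channel gets its own alpha
        out.append(src * a + dst * (1.0 - a))
    r, g, b = out
    return (r, g, b)
```

For black text on a white background with coverage `(1.0, 0.5, 0.0)`, `blend_subpixel((0.0, 0.0, 0.0), 1.0, (1.0, 0.5, 0.0), (1.0, 1.0, 1.0))` yields `(0.0, 0.5, 1.0)`. Ordinary blending only has one alpha per fragment, so it cannot give the three channels different weights like this; that is why the second source output is needed.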
However, note that displays might have their subpixels in other orientations (which can change dynamically as the user rotates the display, in the case of phones and tablets), and some displays use PenTile layouts, which are more complex. For subpixel rendering to work, your sample points need to adapt to the layout of the actual display you're running on, but I'm not sure there is any reasonable way to query this information. For this reason, and because rising display DPIs make subpixel rendering less advantageous, people have generally moved away from subpixel text rendering in recent years.
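If you did know the layout, adapting the sample points is mostly a table lookup. The sketch below covers only the four simple stripe arrangements (PenTile does not fit this model). The `SubpixelLayout` enum and its names are hypothetical, how to detect the layout is left open, and the code assumes y increases downward.

```python
from enum import Enum
from typing import Tuple

Offset = Tuple[float, float]


class SubpixelLayout(Enum):
    # Hypothetical layout names; detecting which one applies is not addressed here.
    RGB = "rgb"    # stripes side by side, R on the left
    BGR = "bgr"    # stripes side by side, B on the left
    VRGB = "vrgb"  # stripes stacked, R on top
    VBGR = "vbgr"  # stripes stacked, B on top


def subpixel_offsets(layout: SubpixelLayout) -> Tuple[Offset, Offset, Offset]:
    """Return (dx, dy) offsets from the pixel center for the R, G, B samples.

    Assumes y increases downward; flip the sign of dy for a y-up convention
    such as OpenGL's default gl_FragCoord origin."""
    t = 1.0 / 3.0
    if layout is SubpixelLayout.RGB:
        return ((-t, 0.0), (0.0, 0.0), (t, 0.0))
    if layout is SubpixelLayout.BGR:
        return ((t, 0.0), (0.0, 0.0), (-t, 0.0))
    if layout is SubpixelLayout.VRGB:
        return ((0.0, -t), (0.0, 0.0), (0.0, t))
    if layout is SubpixelLayout.VBGR:
        return ((0.0, t), (0.0, 0.0), (0.0, -t))
    raise ValueError(f"unsupported layout: {layout}")
```

For the stacked layouts, the antialiasing kernel would also need to be narrowed vertically rather than horizontally, and a rotation event would mean switching layouts at runtime.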