'Training' data is not something you collect separately: you take data you have already collected and split it into training and test sets. For example, if you want to build a classifier for handwritten digits, you collect thousands of samples of handwritten digits, as in the MNIST database. When you think you have enough data to build a model, you split it into training and test sets (usually by randomly assigning individual samples to one group or the other at a fixed ratio).
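To make the random assignment concrete, here is a minimal sketch using only the Python standard library. The function name `split_train_test`, the 30% test fraction, and the integer stand-ins for samples are all assumptions for illustration, not something from a particular library.

```python
import random

def split_train_test(samples, test_fraction=0.3, seed=0):
    """Randomly assign each sample to either the training or the test set."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(samples)       # copy, so the caller's data is not reordered
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled samples become the test set, the rest training.
    return shuffled[n_test:], shuffled[:n_test]

# Stand-in for 10,000 collected samples (hypothetical data).
samples = list(range(10_000))
train, test = split_train_test(samples)
print(len(train), len(test))  # 7000 3000
```

Every sample ends up in exactly one of the two sets, which is the point: the test set is carved out of the same collection, not gathered separately.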
I think your confusion lies in the idea of collecting a 'training' set first, as if it were gathered independently of the test set. When collecting handwritten digits, the researchers did not say, "Well, we have 10,000 samples, so let's build a model with all 10,000 and then run it on future data that we have not collected yet." In fact, that strategy is particularly bad: with nothing held out, you cannot tell whether the model has overfit the samples it was trained on.
What you would do is take those 10,000 samples and split them, say 7,000 for training the model and 3,000 for testing it. You might even repeat the random 7,000/3,000 split many times, build a model on each split, and average the test accuracy (and, for simple models, the fitted parameters) across those models. Then you can say: our model predicts our test set with an accuracy of 97%, so we think it will work well on data we have not yet collected.
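The repeated-split idea can be sketched as follows. Everything here is an assumption made for illustration: the data is synthetic (one feature, two classes), the "model" is a toy nearest-centroid classifier, and the helper names are hypothetical. Note that this sketch averages the held-out accuracy over the splits, which is the quantity you would report.

```python
import random
import statistics

def make_toy_data(n=1000, seed=0):
    """Synthetic 1-D data: class 0 centred at 0.0, class 1 centred at 2.0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(2.0 * label, 1.0), label))
    return data

def fit_centroids(train):
    """'Train' the toy model: the mean feature value of each class."""
    return {label: statistics.mean(x for x, y in train if y == label)
            for label in (0, 1)}

def accuracy(centroids, test):
    """Fraction of test samples whose nearest centroid matches their label."""
    correct = sum(
        1 for x, y in test
        if min(centroids, key=lambda c: abs(x - centroids[c])) == y
    )
    return correct / len(test)

def repeated_holdout(data, n_repeats=20, test_fraction=0.3, seed=1):
    """Repeat a random train/test split and collect the test accuracies."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test, train = shuffled[:n_test], shuffled[n_test:]
        scores.append(accuracy(fit_centroids(train), test))
    return statistics.mean(scores), statistics.stdev(scores)

mean_acc, sd_acc = repeated_holdout(make_toy_data())
print(f"mean test accuracy {mean_acc:.3f} (sd {sd_acc:.3f})")
```

The spread across splits is as informative as the mean: a large standard deviation suggests the accuracy estimate depends heavily on which samples happened to land in the test set.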
How you collect that initial data set is specific to the process you are trying to understand. Maybe it's clicks on a website, images from a satellite, or electrical recordings from an ensemble of neurons. Sometimes you pay money to collect data, for example by running a census or survey, or even by buying another company that has collected a lot of user data you want. More typically, data collection is inherent to what you are already doing, and you use statistical methods to build models and make inferences about your population of interest.