Discrimination (in its proper sense) occurs when a variable is used in the decision process, not merely when the outcome is correlated with that variable. Formally, we discriminate with respect to a variable if the decision function in the process (i.e., the rating in this case) is a function of that variable.
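To put that formal statement in symbols (the notation here is mine, purely for illustration): if $\mathbf{x}$ denotes the other input characteristics and $z$ the variable of interest, then a decision rule $f$ discriminates with respect to $z$ just in case changing $z$ alone can change the output:

$$f(\mathbf{x}, z) \neq f(\mathbf{x}, z') \quad \text{for some input } \mathbf{x} \text{ and some } z \neq z'.$$

If $f$ does not take $z$ as an argument at all, this condition cannot hold, no matter how strongly $z$ is correlated with the other characteristics.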
Disparities in outcome with respect to a particular variable often occur even when there is no discrimination on that variable. This occurs when other characteristics in the decision function are correlated with the excluded variable. In cases where the excluded variable is a demographic variable (e.g., gender, race, age, etc.) correlation with other characteristics is ubiquitous, so disparities in outcome across demographic groups are to be expected.
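Here is a minimal simulation sketch of this point (the variable names and numbers are illustrative only, not taken from your question): the decision rule below never looks at group membership, yet the acceptance rates differ across groups because the characteristic it does use is correlated with the group.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Demographic group membership -- never used by the decision rule below.
group = rng.integers(0, 2, size=n)

# A legitimate characteristic that happens to be correlated with the group
# (its mean differs by group).
experience = rng.normal(loc=5 + 2 * group, scale=2, size=n)

# Decision rule: a function of experience only, so no discrimination on group.
accepted = experience > 6

# Outcomes nevertheless differ by group, purely via the correlation.
for g in (0, 1):
    print(f"acceptance rate, group {g}: {accepted[group == g].mean():.2%}")
```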
It is possible to try to reduce disparities in outcomes across demographic groups through affirmative action, which is a form of discrimination. If there are disparities in process-outcomes with respect to a variable, it is possible to narrow those disparities by using the variable as a decision-variable (i.e., by discriminating on that variable) in a way that favours groups that are "underrepresented" (i.e., groups with lower proportions of positive outcomes in the decision process).
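Continuing the simulation sketch above (again, the numbers are purely illustrative): making the decision rule depend on the group directly, via a score offset favouring the group with the lower acceptance rate, narrows the disparity precisely because it discriminates on that variable.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
group = rng.integers(0, 2, size=n)
experience = rng.normal(loc=5 + 2 * group, scale=2, size=n)

# Affirmative-action rule: the decision now uses group membership directly,
# adding an offset for the group with the lower acceptance rate.
offset = 1.0 * (group == 0)               # illustrative magnitude only
accepted = (experience + offset) > 6

for g in (0, 1):
    print(f"acceptance rate, group {g}: {accepted[group == g].mean():.2%}")
```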
You can't have it both ways: either you want to avoid discrimination with respect to a particular characteristic, or you want to equalise process-outcomes with respect to that characteristic. If your goal is to "correct" disparities in outcomes with respect to a particular characteristic, then don't kid yourself about what you are doing: you are engaging in discrimination for the purposes of affirmative action.
Many of the more complex difficulties you raise arise when you model things with the kinds of "black box" models sometimes found in machine-learning applications, where it is harder to see how the variables are taken into account in the model. In these cases there are some genuine issues in interpreting and controlling the way that variables are used, so the problem can become non-trivial. Nevertheless, it is useful to bear in mind that using these models is a choice made by the user. If control over the use of variables is important, then it is better to use traditional statistical models (e.g., regression, GLMs, etc.) where there is greater clarity as to how variables are used in predictions and decisions.
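As a sketch of what this looks like with a traditional model (using the statsmodels formula interface; the file and column names are assumptions for illustration), the predictors appear explicitly in the formula, so it is transparent that gender plays no role in the fitted decision rule:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical applicant data with columns: hired, experience, education, gender.
df = pd.read_csv("applicants.csv")

# A logistic regression (a GLM) whose predictors are stated explicitly,
# so gender demonstrably does not enter the fitted model.
model = smf.logit("hired ~ experience + education", data=df).fit()
print(model.summary())
```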
In regard to this issue, it is useful to distinguish between characteristics that are inherent gender characteristics (e.g., pees standing up) and characteristics that are merely correlated with gender (e.g., has an engineering degree). If you wish to avoid gender discrimination, this would usually entail removing gender as a predictor, and also removing any other characteristic that you consider to be an inherent gender characteristic. For example, if it happened to be the case that job applicants specify whether they pee standing up or sitting down, then that characteristic is not strictly equivalent to gender, but one of its values effectively determines gender, so you would probably remove it as a predictor in the model.
Again, we need to note that certain complex machine-learning models may have a "black box" element that makes it more difficult to understand and control the use of variables in predictions and decisions. Using these models is a choice, so if you value control highly (e.g., to constrain the use of variables in accordance with some ethical principle), then it is best to eschew these types of models and use traditional statistical models where it is simple to understand and control the use of input variables.
If you are talking about actual discrimination, as opposed to mere disparities in outcome, this is easy to constrain and check. All you need to do is formulate your model in such a way that it does not use gender (or inherent gender characteristics) as predictors. Computers cannot make decisions on the basis of characteristics that you do not input into the model, so if you have control over this, it should be quite simple to check the absence of discrimination. One way to test this is to take a new test data point (e.g., a new applicant resume), create two versions that differ only in the gender of the applicant, and input both into your model to make predictions/decisions: if the model is operating in a non-discriminatory way, it will make the same prediction/decision for the new data point irrespective of which version you enter.
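Here is a sketch of that flip test in code (assuming a fitted model that can predict from a one-row data frame, e.g., a statsmodels formula model; the field names are assumptions for illustration):

```python
import pandas as pd

def flip_gender(applicant: dict) -> dict:
    """Return a copy of the applicant record with only the gender field changed."""
    flipped = dict(applicant)
    flipped["gender"] = "F" if applicant["gender"] == "M" else "M"
    return flipped

def same_decision_after_flip(model, applicant: dict) -> bool:
    """True if the model's prediction is unchanged when only gender is flipped."""
    original = model.predict(pd.DataFrame([applicant]))
    counterfactual = model.predict(pd.DataFrame([flip_gender(applicant)]))
    return bool((original == counterfactual).all())

# Example usage with a hypothetical fitted model and applicant record:
# applicant = {"gender": "M", "experience": 7, "education": 16}
# print(same_decision_after_flip(model, applicant))
```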
Things become a bit harder when you use machine-learning models that try to figure out the relevant characteristics themselves, without your input. Even in this case, it should be possible to set up your model so that it excludes the predictors you specify (e.g., gender), and it is certainly possible to test for discrimination using test data.
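For example, with a flexible machine-learning model the exclusion can still be made explicit by dropping the specified predictors before the model ever sees the data. This is a sketch using scikit-learn; the column names (including the tongue-in-cheek one from earlier) are assumptions for illustration, and the remaining features are assumed to be numeric:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

EXCLUDED = ["gender", "pees_standing_up"]   # predictors you have decided not to use

def fit_without_excluded(df: pd.DataFrame, outcome: str, excluded=EXCLUDED):
    """Drop the excluded predictors so the model cannot use them, then fit."""
    features = df.drop(columns=[outcome] + [c for c in excluded if c in df.columns])
    return GradientBoostingClassifier().fit(features, df[outcome])

# clf = fit_without_excluded(df, "hired")
# The flip test above can then be applied to clf's predictions on test data
# (after dropping the same columns from the test inputs).
```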