I want to fine-tune a seq2seq model to rephrase input text according to these rules:
1- Always follow the pattern "Entity Verb Entity".
2- Only use simple sentences: never combine sentences.
3- Don't replace existing words.
4- Don't lose the overall meaning of the text or any information in it.
For example:
text = "Project Risk Management includes the processes of conducting risk management planning, identification, analysis, response planning, response implementation, and monitoring risk on a project"
Standardized Text = "Project Risk Management conducts risk management planning. Project Risk Management conducts risk identification. Project Risk Management conducts risk analysis. Project Risk Management plans responses. Project Risk Management implements responses. Project Risk Management monitors risk on a project."
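A pair like this could be stored as one training record. The following is only a minimal sketch, assuming a JSON Lines layout (one JSON object per line) with the hypothetical field names `source` and `target`; no particular library requires these names.

```python
import json

# One training example: the raw text and its standardized rewrite.
# The keys "source" and "target" are hypothetical; any consistent pair works.
record = {
    "source": (
        "Project Risk Management includes the processes of conducting risk "
        "management planning, identification, analysis, response planning, "
        "response implementation, and monitoring risk on a project"
    ),
    "target": (
        "Project Risk Management conducts risk management planning. "
        "Project Risk Management conducts risk identification. "
        "Project Risk Management conducts risk analysis. "
        "Project Risk Management plans responses. "
        "Project Risk Management implements responses. "
        "Project Risk Management monitors risk on a project."
    ),
}


def to_jsonl(records):
    """Serialize a list of dicts to JSON Lines text, one object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


def from_jsonl(text):
    """Parse JSON Lines text back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


jsonl_text = to_jsonl([record])
restored = from_jsonl(jsonl_text)
```

Each line of the resulting text holds one input/output pair, so more examples can be appended without touching the existing ones.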
ChatGPT gives very good results on this task, but I want to know whether I can fine-tune a model (BERT, T5, or any other LM) locally. What format should the training data have, and which evaluation metrics should I use?
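To make the metrics part concrete, one rough check I can think of is a sentence-level F1 between the model output and the reference, since the target is a set of short sentences. This is only a sketch under my own assumptions (naive sentence splitting on `.`, `!`, `?`, and exact match after lowercasing); it is not an established metric for this task, and the function names are hypothetical.

```python
import re


def split_sentences(text):
    """Split text into a set of normalized sentences.

    Splits on '.', '!' or '?' followed by whitespace or end of text,
    lowercases each sentence, and collapses repeated spaces.
    Duplicated sentences count once because a set is returned.
    """
    parts = re.split(r"[.!?]+(?:\s+|$)", text)
    return {" ".join(p.lower().split()) for p in parts if p.strip()}


def sentence_f1(prediction, reference):
    """F1 over exactly matching sentences, ignoring sentence order."""
    pred = split_sentences(prediction)
    ref = split_sentences(reference)
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


reference = (
    "Project Risk Management plans responses. "
    "Project Risk Management implements responses."
)
prediction = "Project Risk Management plans responses."
score = sentence_f1(prediction, reference)
```

Here the prediction contains one of the two reference sentences, so precision is 1 and recall is 0.5. Exact matching is strict, so a score like this would probably need to be combined with a softer overlap measure.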