This is a collection of data in which the Self-Instruct eval dataset has been translated so that it can be used with Korean language models.
- 2023.04.14: Processing is now asynchronous, which saves a bit more time!
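The asynchronous processing mentioned above can be pictured with a minimal sketch. It is not taken from this repository: `fake_translate` is a hypothetical stand-in for a real API call, and the concurrency limit of 8 is an arbitrary assumption.

```python
import asyncio


async def fake_translate(text: str) -> str:
    # Hypothetical stand-in for a network call to a translation API.
    await asyncio.sleep(0.01)
    return f"[ko] {text}"


async def translate_all(texts: list[str], max_concurrency: int = 8) -> list[str]:
    # The semaphore caps the number of in-flight requests (assumed limit).
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(text: str) -> str:
        async with semaphore:
            return await fake_translate(text)

    # gather() runs the coroutines concurrently and keeps the input order.
    return await asyncio.gather(*(bounded(t) for t in texts))


results = asyncio.run(translate_all(["hello world!", "good morning"]))
```

Because the requests overlap while waiting on the network, total time is governed by the slowest batch rather than by the sum of all calls.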
- user_oriented_instructions_deepl_ko.jsonl: evaluation dataset translated with DeepL
- user_oriented_instructions_chatgpt_ko.jsonl: evaluation dataset translated with GPT-3.5-turbo
- user_oriented_instructions_gpt4_ko.jsonl: evaluation dataset translated with GPT-4
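As a rough illustration of how these files might be consumed, the sketch below reads a JSONL file and flattens it into one tuple per instance. The field names (`id`, `instruction`, `instances`, `input`, `output`) are the ones visible in the sample log in this README. The helper names `load_jsonl` and `iter_examples` are hypothetical.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read one JSON object per non-empty line."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def iter_examples(records):
    """Yield (id, instruction, input, output) for every instance of every task."""
    for record in records:
        for instance in record["instances"]:
            yield (record["id"], record["instruction"],
                   instance["input"], instance["output"])
```

For example, `list(iter_examples(load_jsonl("user_oriented_instructions_gpt4_ko.jsonl")))` would give a flat list that is convenient for feeding prompts to a model.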
- OpenAI API
- DeepL API
- Dependencies are managed with the poetry package.

      # install poetry
      pip install poetry
      # install the dependency packages
      poetry install
      # install the poetry dotenv plugin
      poetry self add poetry-dotenv-plugin
      # set the OPENAI_API_KEY environment variable
      poetry run dotenv set OPENAI_API_KEY {OPENAI_API_KEY}
      # DEEPL_API_KEY
      poetry run dotenv set DEEPL_API_KEY {DEEPL_API_KEY}

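Once the keys are stored, a script started through `poetry run` can usually read them from the process environment, assuming the dotenv plugin has loaded the `.env` file. The following is a minimal sketch of such a check. `require_env` and `load_api_keys` are hypothetical helpers, not functions from this repository.

```python
import os


def require_env(name, environ=os.environ):
    """Return the value of an environment variable or fail with a clear message."""
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; run `poetry run dotenv set {name} <value>` first"
        )
    return value


def load_api_keys(environ=os.environ):
    # Both variable names come from the setup commands in this README.
    return {name: require_env(name, environ)
            for name in ("OPENAI_API_KEY", "DEEPL_API_KEY")}
```

Failing early with the name of the missing variable tends to be easier to debug than an authentication error raised deep inside an API client.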
- Translates the entire dataset and saves the result.
- run

      poetry run python dataset_prepare.py

- log

      (base) persuade@nlp-server-10:/mnt/md0/persuade/self-instruct-eval-ko$ poetry run python dataset_prepare.py
      in_filepath: user_oriented_instructions.jsonl
      out_filepath: user_oriented_instructions_deepl_ko.jsonl
      100%|████████████| 252/252 [00:07<00:00, 32.58it/s]
      in_filepath: user_oriented_instructions.jsonl
      out_filepath: user_oriented_instructions_chatgpt_ko.jsonl
      100%|████████████| 252/252 [00:43<00:00, 5.81it/s]
      in_filepath: user_oriented_instructions.jsonl
      out_filepath: user_oriented_instructions_gpt4_ko.jsonl
      100%|████████████| 252/252 [02:40<00:00, 1.57it/s]

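The per-record work behind a run like this can be sketched as follows. This is an assumption about the general shape of the step, not the actual contents of `dataset_prepare.py`: `translate_record` and `to_jsonl_line` are hypothetical names, `str.upper` stands in for a real translator, and `sample` is a shortened, made-up record that only mirrors the field layout of the dataset.

```python
import copy
import json


def translate_record(record, translate):
    """Return a copy with instruction, input, and output passed through `translate`."""
    translated = copy.deepcopy(record)
    translated["instruction"] = translate(record["instruction"])
    for instance in translated["instances"]:
        for key in ("input", "output"):
            # Empty fields (some tasks have no input) are left as they are.
            if instance[key]:
                instance[key] = translate(instance[key])
    return translated


def to_jsonl_line(record):
    # ensure_ascii=False keeps Korean text readable instead of \uXXXX escapes.
    return json.dumps(record, ensure_ascii=False)


# Hypothetical, shortened record used only for illustration.
sample = {
    "id": "user_oriented_task_5",
    "motivation_app": "Gmail",
    "instruction": "Write an email.",
    "instances": [{"input": "", "output": "Hi there,"}],
}
translated = translate_record(sample, str.upper)  # str.upper is a placeholder translator
line = to_jsonl_line(translated)
```

Copying the record first keeps `id` and `motivation_app` untouched and leaves the source data unmodified, so the same input can be passed to several translators in turn.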
- Used for testing the unit functions.
- run example

      poetry run python translate.py

- log

      (base) persuade@nlp-server-10:/mnt/md0/persuade/self-instruct-eval-ko$ poetry run python translate.py
      원문: hello world!
      DEEPL: 안녕하세요!
      ChatGPT 3.5: 안녕, 세상!
      GPT-4: 안녕하세요, 세상!
      DEV MODE IS ON, only 5 objs are converted
      in_filepath: user_oriented_instructions.jsonl
      out_filepath: user_oriented_instructions_ko.jsonl
      0it [00:00, ?it/s]
      ORIGINAL: {'id': 'user_oriented_task_0', 'motivation_app': 'Grammarly', 'instruction': 'The sentence you are given might be too wordy, complicated, or unclear. Rewrite the sentence and make your writing clearer by keeping it concise. Whenever possible, break complex sentences into multiple sentences and eliminate unnecessary words.', 'instances': [{'input': 'If you have any questions about my rate or if you find it necessary to increase or decrease the scope for this project, please let me know.', 'output': "If you have any questions about my rate or find it necessary to increase or decrease this project's scope, please let me know."}]}
      TRANSLATED: {'id': 'user_oriented_task_0', 'motivation_app': 'Grammarly', 'instruction': '주어진 문장이 너무 길거나 복잡하거나 불분명할 수 있습니다. 문장을 다시 쓰고 간결하게 유지하여 글을 명확하게 만드세요. 가능한 한 복잡한 문장을 여러 문장으로 나누고 불필요한 단어를 제거하세요.', 'instances': [{'input': '제 요금에 대해 궁금한 점이 있거나 이 프로젝트의 범위를 늘리거나 줄일 필요가 있다고 생각되면 알려주세요.', 'output': '제가 제시한 금액에 대한 질문이 있거나 이 프로젝트의 범위를 늘리거나 줄일 필요가 있다고 생각하시면 알려주세요.'}]}
      1it [00:33, 33.88s/it]
      ORIGINAL: {'id': 'user_oriented_task_1', 'motivation_app': 'Grammarly', 'instruction': 'Analyze the word choice, phrasing, punctuation, and capitalization in the given email. How may the writer of this email sound to the reader? These tones include Disheartening, Accusatory, Worried, Curious, Surprised, Disapproving, Unassuming, Formal, Assertive, Confident, Appreciative, Concerned, Sad, Informal, Regretful, Encouraging, Egocentric, Joyful, Optimistic, and Excited.', 'instances': [{'input': "Hi Jen, \nI hope you're well. Can we catch up today? I'd appreciate your input on my presentation for tomorrow's meeting. I'd especially love it if you could double-check the sales numbers with me. There's a coffee in it for you!", 'output': 'Confident'}]}
      TRANSLATED: {'id': 'user_oriented_task_1', 'motivation_app': 'Grammarly', 'instruction': '주어진 이메일에서 단어 선택, 표현, 구두점, 대소문자 사용을 분석하세요. 이 이메일의 작성자가 독자에게 어떤 느낌을 줄 수 있나요? 이러한 느낌들은 낙심감, 비난적, 걱정스러운, 호기심, 놀란, 불승인, 겸손한, 공식적, 단호한, 자신감 있는, 감사하는, 우려하는, 슬픈, 비공식적, 후회하는, 격려적, 자기 중심적, 기쁜, 낙관적, 그리고 흥분한 것들이 포함됩니다.', 'instances': [{'input': '안녕 Jen,\n잘 지내고 있는지 궁금해. 오늘 만나서 얘기 좀 할 수 있을까? 내일 회의를 위한 발표자료에 대한 의견 좀 듣고 싶어. 특히나 매출 숫자를 같이 확인해줄 수 있다면 정말 좋겠어. 커피 한잔 사줄게!', 'output': '자신감 있는'}]}
      2it [01:24, 43.92s/it]
      ORIGINAL: {'id': 'user_oriented_task_2', 'motivation_app': 'Grammarly', 'instruction': 'Rewrite the given text and correct grammar, spelling, and punctuation errors.', 'instances': [{'input': "If you'd told me year ago that today I would finish a marathon, I would of laughed. Your support had a huge affect on me!", 'output': "If you'd told me a year ago that today I would finish a marathon, I would have laughed. Your support had a huge effect on me!"}]}
      TRANSLATED: {'id': 'user_oriented_task_2', 'motivation_app': 'Grammarly', 'instruction': '주어진 텍스트를 다시 작성하고, 문법, 철자 및 구두점 오류를 수정하십시오.', 'instances': [{'input': '만약 작년에 오늘 마라톤을 완주할 거라고 말해줬다면, 웃었을 것이다. 너의 지지가 나에게 큰 영향을 미쳤어!', 'output': '만약 작년에 오늘 나는 마라톤을 완주할 거라고 말해줬다면, 웃었을 거예요. 당신의 지지가 저에게 엄청난 영향을 미쳤어요!'}]}
      3it [01:55, 37.71s/it]
      ORIGINAL: {'id': 'user_oriented_task_3', 'motivation_app': 'Google Scholar', 'instruction': 'You are given a paper citation, convert it to the requested citation style.', 'instances': [{'input': 'Chicago: Vaswani, Ashish, Shazeer, Noam, Parmar, Niki, Uszkoreit, Jakob, Jones, Llion, Gomez, Aidan N., Kaiser, Lukasz, and Illia Polosukhin. "Attention Is All You Need." arXiv, (2017). https://doi.org/10.48550/arXiv.1706.03762.\nMLA:', 'output': 'Vaswani, Ashish, et al. "Attention Is All You Need." arXiv, 2017, https://doi.org/10.48550/arXiv.1706.03762.'}]}
      TRANSLATED: {'id': 'user_oriented_task_3', 'motivation_app': 'Google Scholar', 'instruction': '당신은 논문 인용문을 받았습니다. 요청된 인용 스타일로 변경해주세요.', 'instances': [{'input': '시카고: 바스와니, 아시시, 샤지어, 노암, 파마르, 니키, 우슈코레이트, 야콥, 존스, 릴리온, 고메즈, 에이단 엔., 카이저, 루카슈, 및 일리아 폴로수킨. "주목만 있으면 충분합니다." arXiv, (2017). https://doi.org/10.48550/arXiv.1706.03762.\nMLA:', 'output': '바스와니, 아시시 등. "주의가 전부입니다." arXiv, 2017, https://doi.org/10.48550/arXiv.1706.03762.'}]}
      4it [02:34, 38.46s/it]
      ORIGINAL: {'id': 'user_oriented_task_4', 'motivation_app': 'Grammarly', 'instruction': "Desk jobs require writing a lot of emails, so it isn't surprising we get tired of repeating ourselves. Come up with several synonyms for the given word.", 'instances': [{'input': 'Sincerely', 'output': 'Best regards, All the best, Cheers, Best'}]}
      TRANSLATED: {'id': 'user_oriented_task_4', 'motivation_app': 'Grammarly', 'instruction': '책상 일은 많은 이메일을 작성해야 하기 때문에, 우리가 계속 반복해서 지치는 것은 놀랍지 않다. 주어진 단어에 대한 몇 가지 동의어를 생각해보세요.', 'instances': [{'input': '진심으로', 'output': '감사합니다, 모든 좋은 일 있으시길, 건배, 최고'}]}
      5it [02:52, 31.04s/it]
      ORIGINAL: {'id': 'user_oriented_task_5', 'motivation_app': 'Gmail', 'instruction': 'If you could help me write an email to my friends inviting them to dinner on Friday, it would be greatly appreciated.', 'instances': [{'input': '', 'output': "Hi there,\n\nI hope you're all doing well. I'm inviting you over for dinner on Friday night. Please let me know if you can make it. I'll be cooking your favorite dishes!\n\nLooking forward to seeing you,"}]}
      TRANSLATED: {'id': 'user_oriented_task_5', 'motivation_app': 'Gmail', 'instruction': '만약 금요일 저녁 식사에 친구들을 초대하는 이메일을 작성하는 데 도와주신다면 정말 감사하겠습니다.', 'instances': [{'input': '', 'output': '안녕하세요,\n\n여러분 모두 잘 지내고 계시길 바랍니다. 금요일 밤에 저희 집에서 저녁 식사를 위해 여러분을 초대하고 싶습니다. 가능하다면 알려주세요. 여러분이 좋아하는 음식을 만들어볼게요!\n\n뵙기를 기대하며,'}]}
      5it [03:17, 39.56s/it]

TBD