Generates a Python script to fine-tune a DistilBert model for sequence classification on a custom JSONL dataset with 'question' and 'answer' columns, using custom label encoding (no sklearn), progress logging, and error handling.
You are a Machine Learning Engineer. Write a Python script to fine-tune a DistilBert model on a custom JSONL dataset for a sequence classification task.
- Use transformers, datasets, and torch. Do not use sklearn.
- Load DistilBertForSequenceClassification from 'distilbert-base-uncased'.
- Encode labels with a plain dictionary: answer_to_id = {answer: idx for idx, answer in enumerate(unique_answers)}
- Use DistilBertTokenizerFast. Tokenize the 'question' column with padding='max_length' and truncation=True.
- Train with the Trainer API.
- Configure TrainingArguments with output_dir='./results', num_train_epochs=2, per_device_train_batch_size=32, evaluation_strategy='epoch', save_strategy='epoch', load_best_model_at_end=True, and logging_dir='./logs'.
- Set num_labels equal to the number of unique answers.
- Use a try...except block to catch and print exceptions.
- Do not use sklearn.preprocessing.LabelEncoder.
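The label-encoding requirement can be illustrated without any ML dependencies. The sketch below is a minimal example, not the full fine-tuning script. It parses JSONL records, builds answer_to_id with the dictionary comprehension given above, and derives num_labels from it. The helper names (parse_jsonl, build_label_mapping, encode_labels) and the sample records are hypothetical. Collecting unique answers in first-seen order is an assumption made here so that the ids are reproducible.

```python
import json


def parse_jsonl(text):
    """Parse JSONL text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def build_label_mapping(records):
    """Map each distinct 'answer' to an integer id (no sklearn involved)."""
    # dict.fromkeys keeps first-seen order, so the ids are deterministic.
    unique_answers = list(dict.fromkeys(r["answer"] for r in records))
    return {answer: idx for idx, answer in enumerate(unique_answers)}


def encode_labels(records, answer_to_id):
    """Attach an integer 'label' to each record, keeping the 'question' text."""
    return [
        {"question": r["question"], "label": answer_to_id[r["answer"]]}
        for r in records
    ]


# Hypothetical sample data in the expected 'question' / 'answer' layout.
sample_jsonl = """
{"question": "What color is the sky?", "answer": "blue"}
{"question": "What color is grass?", "answer": "green"}
{"question": "What color is the sea?", "answer": "blue"}
"""

try:
    records = parse_jsonl(sample_jsonl)
    answer_to_id = build_label_mapping(records)
    encoded = encode_labels(records, answer_to_id)
    num_labels = len(answer_to_id)  # value to pass as num_labels to the model
    print(f"{len(encoded)} examples, {num_labels} labels: {answer_to_id}")
except Exception as exc:  # catch and print, as the constraints require
    print(f"Label encoding failed: {exc}")
```

In a full script, num_labels would be passed when loading DistilBertForSequenceClassification, and the encoded records would be tokenized on the 'question' field before being handed to the Trainer.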