Extract structured benchmark information from academic papers when the input includes multiple PDFs, abstracts, or links and the user needs Chinese notes or table-ready fields for tasks, datasets, metrics, baselines, SOTA claims, code release, or evaluation settings.