stanford-crfm/helm: v0.5.2 ...

Bibliographic Details
Main Authors: Tony Lee, Yifan Mai, Percy Liang, Dilara Soylu, Rishi Bommasani, Dimitris Tsipras, Yian Zhang, Deepak Narayanan, Ryan Chi, Josselin Somerville Roberts, Eric Zelikman, Xuechen Li, Frieda Rong, Brian W. Goldman, Farzaan Kaiyom, Drew Arad Hudson, Ananya Kumar, Yuhui Zhang, Ben Newman, Nathan Kim, Keshav Santhanam, fladhak, Huaxiu Yao, Tianyi, Qian Huang, Michi Yasunaga, AshwinParanjape, Mert Yuksekgonul, Hongyu Ren
Format: Software
Language: unknown
Published: Zenodo 2024
Subjects:
Online Access: https://dx.doi.org/10.5281/zenodo.12018094
https://zenodo.org/doi/10.5281/zenodo.12018094
Description
Summary:
Scenarios
- Updated VHELM scenarios for VLMs (#2719, #2684, #2685, #2641, #2691)
- Updated Image2Struct scenarios (#2608, #2640, #2660, #2661)
- Added Automatic GPT4V Evaluation for VLM Originality Evaluation
- Added FinQA scenario (#2588)
- Added AIR-Bench 2024 (#2698, #2706, #2710, #2712, #2713)
- Fixed entity_data_imputation scenario breakage by mirroring source data files (#2750)
Models
- Added google-cloud-aiplatform~=1.48 dependency requirement for Vertex AI client (#2628)
- Fixed bug with Vertex AI client error handling (#2614)
- Fixed bug with Arctic tokenizer (#2615)
- Added Qwen1.5 110B Chat (#2621)
- Added TogetherCompletionClient (#2629)
- Fixed bugs with Yi Chat and Llama 3 Chat on Together (#2636)
- Added Optimum Intel (#2609, #2674)
- Added GPT-4o model (#2649, #2656)
- Added SEA-LION 7B and SEA-LION 7B Instruct (#2647)
- Added more Gemini 1.5 Flash and Pro versions (#2653, #2664, #2718, #2718)
- Added Gemini 1.0 Pro 002 (#2664)
- Added Command R and Command R+ models (#2548)
- Fixed GPT4V Evaluator Out of Option Range Issue ...