Internships

Speech and Semantic Group, Huawei Noah's Ark Lab
Research Intern
Sep. 2023 -- Feb. 2024, Hong Kong
Mentored by Dr. Yufei Wang

Instruction-Following Evaluation of LLMs: Constructed FollowBench, a multi-level, fine-grained constraint-following benchmark for systematically and precisely evaluating the instruction-following capability of LLMs.
Multi-Turn Evaluation of LLMs: Introduced MT-Eval, a comprehensive benchmark designed to evaluate multi-turn conversational abilities of LLMs.
Long-Context Evaluation of LLMs: Proposed M4LE, a Multi-ability, Multi-range, Multi-task, Multi-domain long-context evaluation benchmark, covering a wide range of tasks and domains across five context length buckets up to 12k.
Knowledge Editing of LLMs: Proposed a Learning to Edit (LTE) framework for effective and efficient knowledge editing of LLMs.

Beike Finance
Modeling & Algorithm Intern
Mar. 2020 -- Aug. 2020, Shanghai

Model Building: Built a credit scorecard model for business end users by integrating Logistic Regression (base model) with LightGBM, XGBoost, and a DNN, achieving a final AUC of 0.82 and a KS of 0.49.
Data Development: Wrote 1,500+ lines of HiveQL code for data querying and feature derivation on the Zeppelin big data platform.
Indicators Monitoring: Independently designed an A/B test of broker churn and recall to meet the product department's needs, tracking the churn rate over 7 days. Launched 1 data table and completed 2 data dashboards.
Customer Personas: Built multi-angle user profiles from 20+ features, including identity, wealth, and behavior, to support further data mining and feature selection.
Analysis Reports: Extracted third-party test data, such as UnionPay reports, and analyzed the correlation between the scorecard model results and the third-party records to provide a reference for model iteration and for determining the target (Y) labels.

Institute of Neuroscience, Chinese Academy of Sciences
Data Analysis Intern
Apr. 2019 -- Sep. 2019, Shanghai
Mentored by Prof. Jun Yan

Researched the systematic classification of the mouse suprachiasmatic nucleus; independently completed the full process from data cleaning to K-Means modeling in Python.
Iterated on and optimized the clustering model: manually analyzed part of the neuron data to obtain supervision information, then built a semi-supervised model that improved clustering performance by 8%.

Ingersoll Rand
System & Data Intern
Jan. 2019 -- Feb. 2019, Shanghai

Organized the company’s human resources data for the Asia-Pacific region; used Excel’s VLOOKUP function and pivot tables to visually analyze the personnel data of 18,000+ employees over the past 5 years by gender, age, education, and other dimensions.
Turned the analysis results into slides and provided suggestions for the company’s talent recruitment and staffing adjustments.