NeurIPS Paper Checklist

IRC-Bench: Recognizing Entities from Contextual Cues in First-Person Reminiscences
Alexander Apartsin, Eden Moran, Yehudit Aperstein

The following checklist is required for all NeurIPS submissions. Answers are provided with justifications relevant to this paper.

1. Claims

Question: Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope?

Answer: Yes.

Justification: All claims (dataset size, non-locality property, 19 configurations, specific performance numbers) are supported by the experimental results in Section 5. Limitations are discussed in Section 6.

2. Limitations

Question: Does the paper discuss the limitations of the work performed by the authors?

Answer: Yes.

Justification: Section 6 discusses limitations including English-only scope, LLM-generated elision, alias-aware evaluation gaps, and QLoRA sequence length truncation.

3. Theory Assumptions and Proofs

Question: For each theoretical result, does the paper provide the full set of assumptions and a complete (or correct) proof?

Answer: NA.

Justification: The paper does not include formal theorems. The non-locality property (Section 4.2) is defined formally and validated empirically rather than proved theoretically.

4. Experimental Result Reproducibility

Question: Does the paper fully disclose all the information needed to reproduce the main experimental results of the paper to the extent that it affects the main claims and/or conclusions of the paper?

Answer: Yes.

Justification: All model names, hyperparameters (Section 4.3, 4.4), evaluation metrics (Section 4.6), data splits (Table 2), and prompt templates are fully specified. Code and data will be publicly released.

5. Open Access to Data and Code

Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results?

Answer: Yes.

Justification: All data, code, and evaluation tools will be publicly released under an open license, with instructions for reproducing the main experimental results.

6. Experimental Setting/Details

Question: Does the paper specify all the training and test details necessary to understand the results?

Answer: Yes.

Justification: Training details (QLoRA: 4-bit NF4 quantization, rank 16, alpha 32, 2 epochs; DPR: 3 epochs, batch size 48, learning rate 2e-5), the evaluation protocol (four-tier matching), and the statistical tests (McNemar's) are fully specified.
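The reported settings can be collected into a single configuration sketch. This is an illustrative summary of the hyperparameters stated above, not the paper's actual training code; the key names (e.g., `bnb_4bit_quant_type`, `lora_r`) follow common Hugging Face peft/bitsandbytes conventions and are assumptions.

```python
# Hedged sketch of the fine-tuning settings reported in the checklist.
# Key names mirror common Hugging Face peft/bitsandbytes argument names
# and are assumptions; only the values come from the paper.
QLORA_CONFIG = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",  # 4-bit NormalFloat quantization
    "lora_r": 16,                  # LoRA adapter rank
    "lora_alpha": 32,              # LoRA scaling factor
    "num_train_epochs": 2,
}

DPR_CONFIG = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 48,
    "learning_rate": 2e-5,
}
```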

7. Experiment Statistical Significance

Question: Does the paper report error bars, confidence intervals, or statistical significance tests?

Answer: Yes.

Justification: All key comparisons use McNemar's test with continuity correction (p < 0.001). Bootstrap confidence intervals (1,000 resamples) are computed for main results.
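The two statistical procedures named above can be sketched in a few lines of standard-library Python. This is a generic illustration of McNemar's test with continuity correction and a percentile bootstrap over per-example correctness, assuming paired 0/1 outcomes; it is not the paper's evaluation code, and the function names are hypothetical.

```python
import random

def mcnemar_chi2(b, c):
    """McNemar's chi-squared statistic with continuity correction.

    b: count of examples model A got right and model B got wrong;
    c: the reverse. Only the discordant pairs enter the statistic.
    """
    return (abs(b - c) - 1) ** 2 / (b + c)

def bootstrap_ci(correct, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy over 0/1 per-example outcomes."""
    rng = random.Random(seed)
    n = len(correct)
    accs = sorted(
        sum(rng.choices(correct, k=n)) / n for _ in range(n_resamples)
    )
    lo = accs[int(alpha / 2 * n_resamples)]
    hi = accs[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi
```

For example, with 40 discordant pairs favoring one model and 10 favoring the other, `mcnemar_chi2(40, 10)` gives 16.82, well above the chi-squared critical value for p < 0.001 with one degree of freedom (10.83).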

8. Experiments Compute Resources

Question: For each experiment, does the paper provide sufficient information on the computer resources needed to reproduce them?

Answer: Yes.

Justification: QLoRA fine-tuning was performed on a single NVIDIA RTX GPU with 6GB VRAM. DPR training used similar consumer-grade hardware. API-based models (GPT-4o, GPT-4.1-mini) were accessed via OpenAI Batch API.

9. NeurIPS Code of Ethics

Question: Does the research conform to the NeurIPS Code of Ethics?

Answer: Yes.

Justification: The research uses publicly available oral history transcripts from institutional archives. No human subjects were recruited for this study.

10. Broader Impacts

Question: Does the paper discuss both potential positive societal impacts and negative societal impacts of the work?

Answer: Yes.

Justification: Positive impacts include improved accessibility of oral history archives and support for reminiscence therapy. On the negative side, the English-only, American-focused scope limits generalizability; however, the benchmark does not enable surveillance or profiling capabilities beyond what existing NER systems already provide.

11. Safeguards

Question: Does the paper describe safeguards that have been put in place for responsible release of data or models?

Answer: Yes.

Justification: The dataset is derived from publicly available oral history archives that were already consented and released by their respective institutions. No private or sensitive information beyond what is in the public transcripts is included.

12. Licenses for Existing Assets

Question: Are the creators of existing assets (data, code, models) properly credited and are the license terms respected?

Answer: Yes.

Justification: All source collections are cited (Table 1). Models (Llama 3.1, BGE-base) are cited with their respective papers. All assets are used under their original licenses.

13. New Assets

Question: Are new assets introduced in the paper well documented and is the documentation provided alongside the assets?

Answer: Yes.

Justification: IRC-Bench is documented with full statistics (Tables 1, 2), construction pipeline (Section 3, Figure 1), quality validation (Section 6), and will be released with a datasheet and usage documentation.

14. Crowdsourcing and Research with Human Subjects

Question: For crowdsourcing experiments and research with human subjects, does the paper include the full text of instructions given to participants and screenshots, if applicable?

Answer: NA.

Justification: No crowdsourcing or human subjects research was conducted. Quality validation used automated GPT-4o assessment rather than human annotators.

15. IRB Approvals or Equivalent for Research with Human Subjects

Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval based on the requirements of your country or institution) were obtained?

Answer: NA.

Justification: No human subjects were involved. The data consists of previously published oral history transcripts from institutional archives.

16. Dataset License

Question: Does the dataset have a license consistent with its intended use?

Answer: Yes.

Justification: IRC-Bench will be released under a permissive open-source license for research use. Source transcripts are from public domain and openly licensed institutional archives.

17. Consent

Question: If the dataset relates to people, was consent obtained?

Answer: Yes.

Justification: All oral history transcripts were collected and published by their respective institutional archives (Library of Congress, universities, memorial projects) with informed consent from participants as part of standard oral history collection protocols.

18. Personal Identifiable Information (PII)

Question: Does the dataset contain data that might be considered personal, sensitive, or private?

Answer: Yes.

Justification: The oral history transcripts contain personal narratives and names of individuals. However, all transcripts were previously published by their respective archives with participant consent, and the benchmark samples are derivative summaries rather than verbatim transcript excerpts.