LaTeX Reference Checker: Bash Script to Find Unused Labels and Missing Citations
A robust bash script that audits LaTeX documents to find unreferenced figures, tables, equations, and citation issues while ignoring commented code
Overview
When preparing research manuscripts, it’s common to accumulate unused labels, orphaned citations, and commented-out content. This bash script provides a comprehensive audit of your LaTeX document, identifying:
- Unreferenced figures, tables, and equations
- Unused bibliography entries
- Missing citations (cited but not in .bib file)
- All while properly ignoring commented lines
Key Feature: The script preserves the sequence in which labels appear in your document, making it easier to locate and manage them.
The Problem
During manuscript development, you might encounter:
| Issue | Example | Impact |
|---|---|---|
| Unreferenced labels | \label{fig:old_analysis} never used | Clutters document, confuses reviewers |
| Commented labels counted | % \label{tab:removed} still detected | False positives in checks |
| Missing citations | \cite{smith2023} but not in .bib | Compilation errors |
| Unused bibliography | References added but never cited | Inflates reference count |
Manual checking across 50+ pages with multiple revisions becomes impractical.
The Solution
Features
✅ Comment-aware: Ignores both full-line and inline comments
✅ Sequence-preserving: Shows labels in document order
✅ Comprehensive: Checks figures, tables, equations, and citations
✅ Multiple reference styles: Supports \ref, \autoref, and \eqref
✅ Lightweight: Pure bash, no dependencies
✅ Fast: Processes typical manuscripts in <1 second
Installation and Usage
Step 1: Create the Script
Copy the following code and save it as check_latex_refs.sh:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
#!/bin/bash
if [ $# -eq 0 ]; then
echo "Usage: $0 <latex_file.tex>"
exit 1
fi
TEXFILE=$1
# Remove comments in two steps:
# Step 1: Remove lines that start with % (including whitespace before %)
# Step 2: Remove inline comments (everything after % that's not \%)
TEXCONTENT=$(grep -v '^\s*%' "$TEXFILE" | sed 's/\([^\\]\)%.*$/\1/')
echo "Checking references in: $TEXFILE"
echo "========================================"
# Tables (preserving order, excluding comments)
echo -e "\n=== TABLES ==="
TAB_LABELS=$(echo "$TEXCONTENT" | grep -oP '\\label\{tab:\K[^}]+')
TAB_LABELS_UNIQ=$(echo "$TAB_LABELS" | awk '!seen[$0]++')
# Catches both \ref and \autoref
TAB_REFS=$(echo "$TEXCONTENT" | grep -oP '\\(auto)?ref\{tab:\K[^}]+' | awk '!seen[$0]++')
TAB_COUNT=$(echo "$TAB_LABELS_UNIQ" | grep -v '^$' | wc -l)
REF_COUNT=$(echo "$TAB_REFS" | grep -v '^$' | wc -l)
echo "Total labels: $TAB_COUNT"
echo "Total refs: $REF_COUNT"
echo "Unreferenced tables:"
while IFS= read -r label; do
if ! echo "$TAB_REFS" | grep -qx "$label"; then
echo " tab:$label"
fi
done <<< "$TAB_LABELS_UNIQ"
# Figures (preserving order, excluding comments)
echo -e "\n=== FIGURES ==="
FIG_LABELS=$(echo "$TEXCONTENT" | grep -oP '\\label\{fig:\K[^}]+')
FIG_LABELS_UNIQ=$(echo "$FIG_LABELS" | awk '!seen[$0]++')
# Catches both \ref and \autoref
FIG_REFS=$(echo "$TEXCONTENT" | grep -oP '\\(auto)?ref\{fig:\K[^}]+' | awk '!seen[$0]++')
FIG_COUNT=$(echo "$FIG_LABELS_UNIQ" | grep -v '^$' | wc -l)
FIGREF_COUNT=$(echo "$FIG_REFS" | grep -v '^$' | wc -l)
echo "Total labels: $FIG_COUNT"
echo "Total refs: $FIGREF_COUNT"
echo "Unreferenced figures:"
while IFS= read -r label; do
if ! echo "$FIG_REFS" | grep -qx "$label"; then
echo " fig:$label"
fi
done <<< "$FIG_LABELS_UNIQ"
# Equations (preserving order, excluding comments)
echo -e "\n=== EQUATIONS ==="
EQ_LABELS=$(echo "$TEXCONTENT" | grep -oP '\\label\{eq:\K[^}]+')
EQ_LABELS_UNIQ=$(echo "$EQ_LABELS" | awk '!seen[$0]++')
# Catches \ref, \autoref, and \eqref
EQ_REFS=$(echo "$TEXCONTENT" | grep -oP '\\((auto|eq))?ref\{eq:\K[^}]+' | awk '!seen[$0]++')
EQ_COUNT=$(echo "$EQ_LABELS_UNIQ" | grep -v '^$' | wc -l)
EQREF_COUNT=$(echo "$EQ_REFS" | grep -v '^$' | wc -l)
echo "Total labels: $EQ_COUNT"
echo "Total refs: $EQREF_COUNT"
echo "Unreferenced equations:"
while IFS= read -r label; do
if ! echo "$EQ_REFS" | grep -qx "$label"; then
echo " eq:$label"
fi
done <<< "$EQ_LABELS_UNIQ"
# Citations (excluding comments)
echo -e "\n=== CITATIONS ==="
BIBFILE=$(echo "$TEXCONTENT" | grep -oP '\\bibliography\{\K[^}]+' | head -1)
if [ -n "$BIBFILE" ]; then
[[ "$BIBFILE" != *.bib ]] && BIBFILE="${BIBFILE}.bib"
if [ -f "$BIBFILE" ]; then
echo "Using bibliography file: $BIBFILE"
BIB_ENTRIES=$(grep -oP '@\w+\{\K[^,]+' "$BIBFILE" | awk '!seen[$0]++')
CITATIONS=$(echo "$TEXCONTENT" | grep -oP '\\cite[tp]?\{\K[^}]+' | tr ',' '\n' | sed 's/^[[:space:]]*//' | awk '!seen[$0]++')
BIB_COUNT=$(echo "$BIB_ENTRIES" | grep -v '^$' | wc -l)
CITE_TOTAL=$(echo "$TEXCONTENT" | grep -oP '\\cite[tp]?\{[^}]+\}' | wc -l)
CITE_UNIQ=$(echo "$CITATIONS" | grep -v '^$' | wc -l)
echo "Total bib entries: $BIB_COUNT"
echo "Total citation commands: $CITE_TOTAL"
echo "Unique cited keys: $CITE_UNIQ"
echo "Uncited bibliography entries:"
while IFS= read -r entry; do
if ! echo "$CITATIONS" | grep -qx "$entry"; then
echo " $entry"
fi
done <<< "$BIB_ENTRIES"
echo "Missing bibliography entries (cited but not in .bib):"
while IFS= read -r cite; do
if ! echo "$BIB_ENTRIES" | grep -qx "$cite"; then
echo " $cite"
fi
done <<< "$CITATIONS"
else
echo "Bibliography file not found: $BIBFILE"
fi
else
echo "No bibliography file found in document"
fi
echo -e "\n========================================"
echo "Check complete!"
Step 2: Make it Executable
1
chmod +x check_latex_refs.sh
Step 3: Run the Checker
1
./check_latex_refs.sh manuscript.tex
Example Output
Here’s what the script reports for a typical manuscript:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
Checking references in: paper.tex
========================================
=== TABLES ===
Total labels: 11
Total refs: 8
Unreferenced tables:
tab:appendix_data
tab:extra_results
tab:summary_stats
=== FIGURES ===
Total labels: 6
Total refs: 6
Unreferenced figures:
=== EQUATIONS ===
Total labels: 15
Total refs: 12
Unreferenced equations:
eq:supplementary
eq:variance
eq:alt_form
=== CITATIONS ===
Using bibliography file: references.bib
Total bib entries: 45
Total citation commands: 52
Unique cited keys: 38
Uncited bibliography entries:
smith2020old
jones2019unused
brown2018extra
Missing bibliography entries (cited but not in .bib):
nguyen2023missing
========================================
Check complete!
Note: The script detects references made with \ref{fig:one}, \autoref{tab:results}, or \eqref{eq:main} - all are counted correctly.
Technical Deep Dive
How It Works
1. Comment Removal
The script uses a two-step approach to handle comments:
1
2
3
# Step 1: Remove lines starting with %
# Step 2: Remove inline comments but preserve \%
TEXCONTENT=$(grep -v '^\s*%' "$TEXFILE" | sed 's/\([^\\]\)%.*$/\1/')
Important: The order matters! We must remove full-line comments first, then handle inline comments.
Test cases:
| LaTeX Code | Processed As |
|---|---|
\label{fig:test} | ✓ Included |
% \label{fig:old} | ✗ Excluded |
Text \ref{fig:a} % comment | ✓ \ref{fig:a} included |
50\% efficiency | ✓ \% preserved |
2. Sequence Preservation
Uses awk '!seen[$0]++' to remove duplicates while maintaining order:
1
2
3
4
5
# Traditional approach (loses order)
sort -u
# Our approach (preserves order)
awk '!seen[$0]++'
Why this matters:
If your document has:
1
2
3
4
5
6
7
8
\section{Methods} % Line 50
\label{fig:workflow} % Line 65
\section{Results} % Line 150
\label{fig:accuracy} % Line 170
\section{Appendix} % Line 300
\label{fig:extra} % Line 310 (unreferenced)
The output shows fig:extra in document order (not alphabetically sorted), making it easier to locate.
3. Pattern Matching
Uses Perl-compatible regex with grep -oP:
1
2
3
4
5
# For tables and figures: catches both \ref and \autoref
grep -oP '\\(auto)?ref\{tab:\K[^}]+'
# For equations: catches \ref, \autoref, and \eqref
grep -oP '\\((auto|eq))?ref\{eq:\K[^}]+'
Breakdown:
\\(auto)?ref\{tab:- Match\ref{tab:or\autoref{tab:\\((auto|eq))?ref\{eq:- Match\ref{eq:,\autoref{eq:, or\eqref{eq:\K- Discard everything matched so far[^}]+- Capture everything until}
Supported reference commands:
\ref{...}- Standard LaTeX reference\autoref{...}- Automatic reference from hyperref package\eqref{...}- Equation reference (for equations only)
Examples: Examples:
Labels detected:
\label{tab:results}→ extractsresults\label{fig:analysis_2023}→ extractsanalysis_2023\label{eq:main_theorem}→ extractsmain_theorem
References detected:
\ref{fig:diagram}→ matchesdiagram\autoref{tab:results}→ matchesresults\eqref{eq:einstein}→ matcheseinstein
Not matched:
% \label{tab:old}(filtered by comment removal)\label{sec:intro}(different prefix - use section extension)
Advanced Usage
1. Check Multiple Files
Create check_all.sh:
1
2
3
4
5
6
7
8
9
#!/bin/bash
for file in *.tex; do
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "File: $file"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
./check_latex_refs.sh "$file"
echo ""
done
2. Generate Timestamped Reports
1
2
3
4
5
6
7
8
# Create dated log
./check_latex_refs.sh paper.tex > "audit_$(date +%Y%m%d_%H%M%S).log"
# Compare before/after revision
./check_latex_refs.sh paper.tex > before_revision.log
# ... make changes ...
./check_latex_refs.sh paper.tex > after_revision.log
diff before_revision.log after_revision.log
3. Git Pre-commit Hook
Add to .git/hooks/pre-commit:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
#!/bin/bash
# Run check
./check_latex_refs.sh main.tex > ref_check.log
# Count unreferenced items
UNREFERENCED=$(grep -c "^ " ref_check.log)
if [ $UNREFERENCED -gt 0 ]; then
echo "⚠️ Warning: $UNREFERENCED unreferenced items found"
cat ref_check.log
read -p "Continue commit? (y/n) " -n 1 -r
echo
if [[ ! $REPLY =~ ^[Yy]$ ]]; then
exit 1
fi
fi
4. Makefile Integration
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
# Makefile
.PHONY: check clean all
all: manuscript.pdf check
manuscript.pdf: manuscript.tex
pdflatex manuscript.tex
bibtex manuscript
pdflatex manuscript.tex
pdflatex manuscript.tex
check:
@./check_latex_refs.sh manuscript.tex
clean:
rm -f *.aux *.log *.bbl *.blg *.out *.toc
audit:
@./check_latex_refs.sh manuscript.tex > audit_$(shell date +%Y%m%d).log
@cat audit_$(shell date +%Y%m%d).log
Usage:
1
2
3
make # Compile and check
make check # Run reference check only
make audit # Generate timestamped audit
5. Pre-submission Checklist Script
Create pre_submit.sh:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
#!/bin/bash
echo "📋 Pre-submission Checklist"
echo "═══════════════════════════"
# 1. Reference check
echo "1️⃣ Checking references..."
./check_latex_refs.sh manuscript.tex > ref_audit.log
# 2. Count issues
ISSUES=$(grep -c "^ " ref_audit.log)
if [ $ISSUES -eq 0 ]; then
echo " ✓ No unreferenced items"
else
echo " ⚠️ $ISSUES unreferenced items found"
cat ref_audit.log
fi
# 3. Check compilation
echo "2️⃣ Checking compilation..."
pdflatex -interaction=nonstopmode manuscript.tex > /dev/null 2>&1
if [ $? -eq 0 ]; then
echo " ✓ Compiles successfully"
else
echo " ✗ Compilation errors found"
fi
# 4. Check bibliography
echo "3️⃣ Checking bibliography..."
bibtex manuscript > /dev/null 2>&1
echo " ✓ Bibliography processed"
echo "═══════════════════════════"
echo "✅ Checklist complete!"
Extensions and Customizations
Add Line Numbers
Modify the unreferenced item display to show line numbers:
1
2
3
4
5
6
while IFS= read -r label; do
if ! echo "$TAB_REFS" | grep -qx "$label"; then
LINE=$(grep -n "\\label{tab:$label}" "$TEXFILE" | cut -d: -f1)
echo " tab:$label (line $LINE)"
fi
done <<< "$TAB_LABELS_UNIQ"
Output:
1
2
3
Unreferenced tables:
tab:appendix_data (line 145)
tab:extra_results (line 203)
Add Section Labels
Add this after the equations section:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
# Sections (preserving order, excluding comments)
echo -e "\n=== SECTIONS ==="
SEC_LABELS=$(echo "$TEXCONTENT" | grep -oP '\\label\{sec:\K[^}]+')
SEC_LABELS_UNIQ=$(echo "$SEC_LABELS" | awk '!seen[$0]++')
SEC_REFS=$(echo "$TEXCONTENT" | grep -oP '\\ref\{sec:\K[^}]+' | awk '!seen[$0]++')
SEC_COUNT=$(echo "$SEC_LABELS_UNIQ" | grep -v '^$' | wc -l)
SECREF_COUNT=$(echo "$SEC_REFS" | grep -v '^$' | wc -l)
echo "Total section labels: $SEC_COUNT"
echo "Total section refs: $SECREF_COUNT"
echo "Unreferenced sections:"
while IFS= read -r label; do
if ! echo "$SEC_REFS" | grep -qx "$label"; then
echo " sec:$label"
fi
done <<< "$SEC_LABELS_UNIQ"
Color Output
Add ANSI color codes for better visibility:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
# Add at top of script (after shebang)
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color
# Use in output
echo -e "${BLUE}=== TABLES ===${NC}"
echo "Total labels: $TAB_COUNT"
echo "Total refs: $REF_COUNT"
if [ "$TAB_COUNT" -eq "$REF_COUNT" ]; then
echo -e "${GREEN}✓ All tables referenced!${NC}"
else
echo -e "${RED}Unreferenced tables:${NC}"
while IFS= read -r label; do
if ! echo "$TAB_REFS" | grep -qx "$label"; then
echo -e " ${YELLOW}tab:$label${NC}"
fi
done <<< "$TAB_LABELS_UNIQ"
fi
Export to CSV
Create a CSV export function:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
#!/bin/bash
# Export unreferenced items to CSV
TEXFILE=$1
OUTFILE="${TEXFILE%.tex}_unreferenced.csv"
# Run the check and extract unreferenced items
./check_latex_refs.sh "$TEXFILE" > /tmp/ref_check.log
# Create CSV header
echo "Type,Label,Section" > "$OUTFILE"
# Extract and format
grep "^ tab:" /tmp/ref_check.log | sed 's/^ /table,/' >> "$OUTFILE"
grep "^ fig:" /tmp/ref_check.log | sed 's/^ /figure,/' >> "$OUTFILE"
grep "^ eq:" /tmp/ref_check.log | sed 's/^ /equation,/' >> "$OUTFILE"
echo "CSV exported to: $OUTFILE"
JSON Output
Add JSON export capability:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
#!/bin/bash
# Add --json flag support
if [ "$2" == "--json" ]; then
# Extract data
TAB_LABELS=$(echo "$TEXCONTENT" | grep -oP '\\label\{tab:\K[^}]+' | awk '!seen[$0]++')
TAB_REFS=$(echo "$TEXCONTENT" | grep -oP '\\ref\{tab:\K[^}]+' | awk '!seen[$0]++')
# Build unreferenced array
UNREF_TABS=$(comm -23 <(echo "$TAB_LABELS" | sort) <(echo "$TAB_REFS" | sort) | \
awk '{printf "\"%s\",", $0}' | sed 's/,$//')
# Output JSON
cat << EOF
{
"file": "$TEXFILE",
"timestamp": "$(date -Iseconds)",
"tables": {
"total_labels": $(echo "$TAB_LABELS" | grep -v '^$' | wc -l),
"total_refs": $(echo "$TAB_REFS" | grep -v '^$' | wc -l),
"unreferenced": [$UNREF_TABS]
}
}
EOF
fi
Workflow Integration Examples
1. Continuous Integration (CI)
For GitHub Actions (.github/workflows/latex-check.yml):
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
name: LaTeX Reference Check
on: [push, pull_request]
jobs:
check-refs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Run reference checker
run: |
chmod +x check_latex_refs.sh
./check_latex_refs.sh manuscript.tex
- name: Count issues
run: |
ISSUES=$(./check_latex_refs.sh manuscript.tex | grep "^ " | wc -l)
echo "Found $ISSUES unreferenced items"
if [ $ISSUES -gt 5 ]; then
echo "Too many unreferenced items!"
exit 1
fi
2. VS Code Task
Add to .vscode/tasks.json:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
{
"version": "2.0.0",
"tasks": [
{
"label": "Check LaTeX References",
"type": "shell",
"command": "./check_latex_refs.sh",
"args": ["${file}"],
"problemMatcher": [],
"presentation": {
"reveal": "always",
"panel": "new"
}
}
]
}
3. Overleaf/Git Sync
For projects synced between Overleaf and Git:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
#!/bin/bash
# sync_and_check.sh
# Pull from Overleaf
git pull origin master
# Run check
./check_latex_refs.sh main.tex > ref_check.log
# If clean, push changes
if [ $(grep -c "^ " ref_check.log) -eq 0 ]; then
git add .
git commit -m "Update: references checked"
git push origin master
else
echo "⚠️ Unreferenced items found. Fix before pushing."
cat ref_check.log
fi
Comparison with Alternatives
| Tool | Pros | Cons | Best For |
|---|---|---|---|
| This script | Fast, customizable, comment-aware, preserves order | Single file only | Quick audits, CI/CD |
refcheck package | LaTeX-integrated, visual markers | Must recompile, clutters PDF | During writing |
chktex | Comprehensive linting, many checks | Verbose, complex output | Deep analysis |
| VS Code LaTeX Workshop | Real-time, editor-integrated | Editor-specific, no batch | Active editing |
| Python scripts | Very flexible, HTML reports | Slower, requires Python | Custom workflows |
Real-World Case Study
Initial State
Manuscript for journal submission with 8 months of revisions:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
=== TABLES ===
Total labels: 8
Total refs: 7
Unreferenced tables:
tab:correlation_matrix
=== FIGURES ===
Total labels: 12
Total refs: 12
=== EQUATIONS ===
Total labels: 6
Total refs: 6
=== CITATIONS ===
Total bib entries: 67
Uncited bibliography entries:
lecun2015deep
goodfellow2016deep
chollet2017deep
bishop2006pattern
murphy2012machine
hastie2009elements
james2013introduction
vapnik1995nature
cover2006elements
Actions Taken
- Moved
tab:correlation_matrixto supplementary materials - Removed 9 unused deep learning references (leftover from earlier drafts)
- Cleaned up bibliography from 67 → 58 entries
- Verified all remaining citations were relevant
Final State
1
2
3
4
5
6
7
8
=== TABLES ===
Total labels: 7
Total refs: 7
Unreferenced tables:
=== CITATIONS ===
Total bib entries: 58
Uncited bibliography entries:
Time saved: ~30 minutes of manual checking
Outcome: Clean submission, no reviewer comments about references
Limitations and Workarounds
Current Limitations
- Single file only: Doesn’t traverse
\input{}or\include{}commands - Standard prefixes: Assumes
tab:,fig:,eq:,sec:conventions - Reference commands: Detects
\ref,\autoref, and\eqref(for equations) - Citation commands: Only detects
\cite,\citep,\citet - Bibliography format: Requires standard
\bibliography{}command
Workarounds
For multi-file projects:
1
2
3
4
# Combine files first
cat main.tex chapter*.tex appendix.tex > combined.tex
./check_latex_refs.sh combined.tex
rm combined.tex
For custom prefixes:
1
2
3
# Modify the grep patterns in the script
# Change tab: to tbl:
grep -oP '\\label\{tbl:\K[^}]+'
For additional citation commands:
1
2
# Add to CITATIONS line:
CITATIONS=$(echo "$TEXCONTENT" | grep -oP '\\cite(p|t|author|year|)?\{\K[^}]+' | ...)
For biblatex:
1
2
# Change BIBFILE detection:
BIBFILE=$(echo "$TEXCONTENT" | grep -oP '\\addbibresource\{\K[^}]+' | head -1)
Troubleshooting
Issue: Script shows no output
Solution:
1
2
3
4
5
6
7
8
9
# Check file exists and has content
ls -la manuscript.tex
wc -l manuscript.tex
# Check for labels
grep "\\label{" manuscript.tex | head -5
# Run with debug mode
bash -x check_latex_refs.sh manuscript.tex
Issue: Bibliography not found
Solution:
1
2
3
4
5
6
7
8
# Check bibliography command exists
grep "bibliography" manuscript.tex
# Verify .bib file location
ls -la *.bib
# Check file permissions
ls -la references.bib
Issue: Commented lines still appear
Solution:
1
2
3
4
5
6
7
8
# Check sed version
sed --version
# Try alternative comment removal
TEXCONTENT=$(grep -v '^\s*%' "$TEXFILE" | sed 's/[^\\]%.*$//')
# Or use Perl
TEXCONTENT=$(perl -pe 's/([^\\])%.*$/$1/' "$TEXFILE")
Issue: Special characters in labels
Solution:
1
2
3
4
5
# Escape special characters in grep
grep -oP '\\label\{tab:\K[^}]+' | sed 's/[._-]//g'
# Or use simpler pattern
grep "\\label{tab:" | cut -d'{' -f2 | cut -d'}' -f1
Issue: False positives
Solution:
1
2
3
4
5
# Check if label actually exists in source
grep -n "label{tab:problematic}" manuscript.tex
# Verify it's not in a comment
grep "tab:problematic" manuscript.tex | grep -v "^%"
Performance Optimization
For very large documents (>10,000 lines):
1
2
3
4
5
6
7
8
9
# Use parallel processing for multiple files
find . -name "*.tex" | parallel ./check_latex_refs.sh {}
# Cache intermediate results
TEXCONTENT=$(sed 's/\([^\\]\)%.*$/\1/' "$TEXFILE" | tee /tmp/cleaned.tex)
# Use faster grep alternatives
# Install ripgrep: sudo apt install ripgrep
rg -oP '\\label\{tab:\K[^}]+' "$TEXFILE"
Contributing and Feedback
This tool is open for improvements! If you:
- Find bugs or edge cases
- Have feature suggestions
- Want to add support for other label types
- Created useful extensions
Please share your feedback or contribute to the project.
Version History
v1.0 (2026-01-02)
- Initial release
- Support for tables, figures, equations, citations
- Comment filtering (inline and full-line)
- Sequence-preserving output
Planned features:
- Support for
\input{}and\include{} - Algorithm and listing labels
- HTML output format
- Configurable label prefixes
- Multi-language support
License
MIT License - Free to use, modify, and distribute.
1
2
3
4
5
6
7
8
9
10
11
12
13
MIT License
Copyright (c) 2026 ScholarNote
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
Citation
If you use this tool in your research workflow:
1
2
3
4
5
6
7
@misc{latex_ref_checker_2026,
title={LaTeX Reference Checker: Bash Script for Document Auditing},
author={ScholarNote},
year={2026},
url={https://scholarsnote.org/latex-reference-checker-bash/},
note={Accessed: 2026-01-02}
}
Last updated: January 2, 2026
Tested on: Ubuntu 20.04+, macOS 12+, Git Bash (Windows)
Script version: 1.0
ScholarNote provides free tools and resources for academic researchers. Follow us for updates on new tools and LaTeX productivity tips.