Generating Ant Care Sheets from Scientific Literature
Introduction
In my previous post, I described how I built a structured knowledge dataset by extracting facts from ~13,000 scientific papers into machine-readable JSON. That dataset gives LLMs access to verified, cited information about ant species. But raw facts alone don’t help an antkeeper trying to raise a colony. What they need is practical guidance: temperature ranges, humidity levels, what to feed, which nest setup works, and what can go wrong.
So I built a pipeline that takes the knowledge dataset and AntWiki as input, and produces complete, structured care sheets for each species. The result: 12,388 care sheets covering species from common beginner ants like Lasius niger to obscure specialists like Acanthognathus brevicornis, each grounded in cited research rather than forum hearsay.
The Inputs: Knowledge Dataset + AntWiki
The care sheet generator combines two data sources:
1. The AntScout Knowledge V1.0 dataset
For each species, all papers mentioning that species are collected into a context block. This includes the paper’s findings (extracted as structured facts), the paper’s title, authors, year, and research focus. These are formatted in TOON (Token-Oriented Object Notation) for efficient token usage.
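As a rough illustration of what such a context block might look like, here is a minimal sketch that renders paper records in a TOON-style tabular layout: a header declaring the row count and field names, followed by one comma-separated row per paper. The `to_toon_table` helper, the field names, and the second record are assumptions made up for this example. The encoder is deliberately simplified (it only quotes values that would be ambiguous in a row) and is neither the full TOON specification nor the pipeline's actual formatter.

```python
import json

def to_toon_table(name, rows, fields):
    """Render a list of flat dicts as a TOON-style tabular array (simplified)."""
    def cell(value):
        text = str(value)
        # Quote only when the value would be ambiguous in a comma-separated row.
        if any(ch in text for ch in ',":') or text != text.strip():
            return json.dumps(text, ensure_ascii=False)
        return text

    # Header: name[row count]{field,field,...}:
    header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
    lines = [header]
    for row in rows:
        lines.append("  " + ",".join(cell(row[f]) for f in fields))
    return "\n".join(lines)

# Hypothetical records; the real dataset's field names may differ.
papers = [
    {"id": 1, "year": 2011,
     "title": "Predation by ants on arthropods and other animals"},
    {"id": 2, "year": 2005,
     "title": "Nest architecture, revisited"},  # invented title, for the quoting case
]

print(to_toon_table("papers", papers, ["id", "year", "title"]))
```

Declaring the field names once in the header, instead of repeating them as JSON keys for every paper, is where the token savings come from.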
2. AntWiki pages
AntWiki supplies basic natural history data that many research papers don’t cover. It serves as a supplementary source alongside the primary research citations and is always cited when used.
The key design decision: research papers always take priority over AntWiki. If a paper directly contradicts an AntWiki claim, the paper (treated as the newer, higher-reliability source) wins. AntWiki fills in gaps where no research exists; it doesn’t override direct evidence.
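However the pipeline enforces it, the precedence rule itself is simple enough to write down. The sketch below is illustrative only: `merge_claims`, the field names, and the sample values are hypothetical, and it assumes each source has already been reduced to one value per field.

```python
def merge_claims(paper_claims, antwiki_claims):
    """Combine per-field claims; a paper-backed value always overrides AntWiki.

    paper_claims:   {field: (value, paper_id)}
    antwiki_claims: {field: value}
    """
    merged = {}
    # Start with AntWiki so it only survives where no paper speaks.
    for field, value in antwiki_claims.items():
        merged[field] = {"value": value, "source": "[AntWiki]"}
    # Paper evidence overwrites any AntWiki value for the same field.
    for field, (value, paper_id) in paper_claims.items():
        merged[field] = {"value": value, "source": f"[{paper_id}]"}
    return merged

# Hypothetical values for illustration.
antwiki = {"founding_type": "claustral", "distribution": "India and Sri Lanka"}
papers = {"founding_type": ("semi-claustral", 6)}

merged = merge_claims(papers, antwiki)
print(merged["founding_type"])   # the paper's value, cited as [6]
print(merged["distribution"])    # gap filled from AntWiki
```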
Old taxa handling
Many papers reference species under names that are no longer current (synonyms, changed classifications). The pipeline uses a taxa renaming map to resolve old names to their current taxonomy, so papers written about Leptothorax makora in 1850 can still be found when generating the care sheet for its current name, Temnothorax makora.
Step 1: Generate the care sheet data JSON
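A minimal sketch of how such a renaming map could be used, assuming it is a plain dictionary from old name to newer name. The function names are hypothetical; the two entries are the examples mentioned in this post.

```python
def resolve_name(name, rename_map):
    """Follow old-name -> new-name links until a current name is reached."""
    seen = set()
    # The `seen` guard stops the loop if the map accidentally contains a cycle.
    while name in rename_map and name not in seen:
        seen.add(name)
        name = rename_map[name]
    return name

def names_for_search(current_name, rename_map):
    """Return the current name plus every old name that resolves to it."""
    return {current_name} | {
        old for old in rename_map
        if resolve_name(old, rename_map) == current_name
    }

rename_map = {
    "Leptothorax makora": "Temnothorax makora",
    "Myrmica laevinodis": "Myrmica rubra",
}

print(resolve_name("Leptothorax makora", rename_map))
print(names_for_search("Myrmica rubra", rename_map))
```

Searching with the full set from `names_for_search` is what lets a paper that only ever uses the old name still match the current species.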
This script handles the full care sheet generation for a single species:
- Paper discovery: Searches through all ~13,000 parsed JSON papers for any that mention the target species (including old synonyms)
- Context assembly: Formats matching papers into TOON notation with numbered IDs and author attribution
- AntWiki retrieval: Loads the corresponding AntWiki page as supplementary context
- AI generation: Sends the assembled context to GLM 5.1 (8 bit, high quality), which generates a complete care sheet in structured JSON
- Post-processing: Cleans the JSON output, renumbers citations sequentially, builds the references list, strips HTML, standardizes text
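The citation renumbering in the post-processing step can be sketched as follows. This is an assumption-laden illustration rather than the actual script: `renumber_citations` and the `papers_by_id` lookup are hypothetical names, and it handles one text string, whereas a real care sheet would need the same numbering shared across every field.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")

def renumber_citations(text, papers_by_id):
    """Renumber [ID] markers by first appearance and build the reference list."""
    new_number = {}  # original context ID -> sequential number

    def replace(match):
        old_id = int(match.group(1))
        if old_id not in new_number:
            new_number[old_id] = len(new_number) + 1
        return f"[{new_number[old_id]}]"

    renumbered = CITATION.sub(replace, text)
    # Dicts keep insertion order, so references come out sorted by new number.
    # An ID missing from papers_by_id raises KeyError, surfacing invalid citations.
    references = [
        {"number": number, **papers_by_id[old_id]}
        for old_id, number in new_number.items()
    ]
    return renumbered, references

# Hypothetical context IDs and placeholder citations.
papers_by_id = {4: {"citation": "Paper B"}, 17: {"citation": "Paper A"}}
text = "Workers hunt by sight [17]. Nests have an atrium [4][17]."

new_text, refs = renumber_citations(text, papers_by_id)
print(new_text)
print(refs)
```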
The system prompt (view here) is extensive: over 460 lines of instructions that cover:
- Writing style: Conversational, plain English, second person (“you/your”), active voice. Scientific terms must be explained on first use.
- Evidence tiers: Confirmed (directly stated in research), Inferred (derived from related evidence), Estimated (rough guess), Unknown (no basis for a claim). The prompt strictly prohibits fabricating biology.
- Citation rules: Every factual claim must be cited with [ID] from the provided TOON context or [AntWiki]. Numbers always need citations. Citation IDs that don’t appear in the context are forbidden.
- Structural requirements: Fixed JSON schema with required fields for summary, colony information, detailed sections, FAQs, and references.
- Practical advice: Temperature and humidity guidance must match evidence precision. Nest recommendations must use established antkeeping setups. Common issues must focus on actual colony-killing problems, not behavioral observations.
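Prompt rules like the citation requirements above can also be checked mechanically after generation. The following is a rough sketch of such a check, not something this post describes the pipeline as doing; both function names are hypothetical, and the sentence splitter is a crude heuristic that will misfire on abbreviations.

```python
import re

CITATION = re.compile(r"\[(\d+|AntWiki)\]")

def find_invalid_citations(text, allowed_ids):
    """Return cited numeric IDs that were never part of the provided context."""
    invalid = []
    for token in CITATION.findall(text):
        if token == "AntWiki":
            continue  # [AntWiki] is always an allowed marker
        cited = int(token)
        if cited not in allowed_ids and cited not in invalid:
            invalid.append(cited)
    return invalid

def uncited_number_sentences(text):
    """Flag sentences that contain a digit but no citation marker."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences
            if re.search(r"\d", s) and not CITATION.search(s)]

text = ("Keep at 24-26°C. Colonies are maintained at 25°C in the lab [3]. "
        "Workers sting [99]. See the wiki [AntWiki].")

print(find_invalid_citations(text, allowed_ids={1, 2, 3}))
print(uncited_number_sentences(text))
```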
The model outputs JSON with this structure:
{
  "species_name": "Harpegnathos saltator",
  "common_name": "Jerdon's jumping ant",
  "general_description": "Harpegnathos saltator is a large (14-17mm) predatory ant from India and Sri Lanka, famous for its remarkable jumping ability and complex social system. Workers have elongated mandibles equipped with trigger hairs and large compound eyes with about 1,600 ommatidia for binocular vision, allowing them to hunt by sight and leap up to 20cm to capture prey or escape predators [1]. This species builds unusually complex nests for a ponerine ant, featuring stacked chambers lined with discarded cocoon fragments ('wall-papering') and a distinctive atrium that helps prevent flooding [2]. The colony operates with a single founding queen who is eventually replaced by multiple gamergates (mated reproductive workers) through a ritualized dominance tournament after the queen ages out [3].",
  "summary": {
    "difficulty": "Medium",
    "origin_habitat": "India and Sri Lanka, primarily in the Western Ghats region. Found in evergreen forests and forest edges, typically nesting in areas with thick leaf litter [2][11].",
    "colony_type": "Colonies start with a single founding queen. After the queen dies or becomes senescent (typically 3-4 years), workers compete in dominance tournaments to establish 3-6 gamergates (mated reproductive workers) that take over reproduction [3][4]. Both queens and workers can reproduce - this is one of the few ant species where workers regularly contribute to sexual reproduction through inbreeding with colony males.",
    "size_growth": {
      "queen_size": "17mm [11]",
      "worker_size": "14-17mm [11]",
      "colony_size": "Typically 65±40 workers, can reach 300-500 workers in mature colonies [3]",
      "growth_rate": "Moderate",
      "development_timeline": {
        "egg_to_worker": "3-4 months at 25°C [3]",
        "notes": "Eggs hatch after ~30 days, larvae develop and spin cocoons after ~33 more days, adults emerge after another 33 days. First workers appear 3-4 months after colony founding."
      }
    },
    "antkeeping_requirements": {
      "temperature": "Keep at 24-26°C. Colonies are maintained at 25°C in laboratory conditions [3]. A gentle gradient is recommended.",
      "humidity": "Moderate to high. These ants prefer humid conditions but their complex nest architecture with an atrium helps regulate moisture. Keep nest substrate consistently moist but not waterlogged.",
      "diapause": "No - this is a tropical species from India/Sri Lanka. No hibernation period is required, but colony activity may slow slightly during cooler periods.",
      "nesting": "Best kept in a naturalistic setup with soil or a plaster/acrylic nest with chambers. They build elaborate multi-chambered nests in the wild and will construct complex structures given the opportunity. Test tubes can work for founding colonies but they benefit from more space as they grow."
    },
    "behavior": "Harpegnathos saltator is an aggressive, solitary forager that hunts primarily by sight. Workers have excellent vision and use their large eyes to spot prey from several centimeters away before launching a rapid attack. They can jump significant distances using their middle and hind legs to capture fleeing or flying prey. Workers are highly defensive and will readily sting threats. Escape prevention is important - while not tiny, they are active and can climb smooth surfaces. They are diurnal and primarily forage during daylight hours [1][5].",
    "common_issues": [
      "Predatory feeding requirements mean they need live prey - cannot survive on sugar water alone",
      "Complex social dynamics may confuse keepers expecting simple queen-worker hierarchy",
      "Nests can be elaborate - may need more space than typical test tube setups for established colonies",
      "Workers have potent venom and will sting readily when threatened",
      "Colony growth is relatively slow compared to many common ant species"
    ]
  },
  "colony_information": {
    "colony_type": "Facultatively queen-right with gamergate replacement system",
    "founding_type": "Semi-claustral - queens leave the nest to forage during founding [6]",
    "colony_size_estimate": "Up to 300-500 workers in mature colonies",
    "colony_growth_rate": "Moderate",
    "colony_traits": {
      "monogyne": true,
      "polygyne": false,
      "oligogyne": false,
      "semi_claustral": true,
      "claustral": false,
      "parasitic": false,
      "socially_parasitic": false,
      "temporary_parasitic": false,
      "dulotic": false,
      "slave_making": false,
      "gamergate": true,
      "pleometrosis": false,
      "supercolonial": false,
      "facultatively_polygyne": false,
      "monogynous_colonies_possible": true,
      "polygynous_colonies_possible": false
    }
  },
  "detailed_sections": [
    {
      "title": "Housing and Nest Setup",
      "content": "Harpegnathos saltator benefits from a naturalistic setup that allows for their complex nest-building behavior. In the wild, they construct elaborate multi-chambered nests with stacked levels connected by passages, all enclosed within a spherical earthen shell with an atrium separating the nest from surrounding soil [2]. For captivity, a plaster or acrylic formicarium with multiple connected chambers works well, or you can use a naturalistic setup with soil in an outworld. The nest should have chambers sized appropriately for their body size (workers are 14-17mm). Provide a water tube and keep the nest substrate moderately moist. They are active foragers, so include a spacious outworld for hunting. Use Fluon or similar barrier on the edges of the outworld to prevent escapes - these ants can climb smooth surfaces effectively.",
      "citations_used": [
        2,
        3872
      ]
    },
    ...
  ],
  "frequently_asked_questions": [
    {
      "question": "Can I keep Harpegnathos saltator in a test tube setup?",
      "answer": "Test tubes work for founding colonies but are not ideal for established colonies. This species naturally builds complex multi-chambered nests and benefits from more space. Consider a plaster or acrylic formicarium with multiple connected chambers, or a naturalistic setup with soil.",
      "citations_used": [
        2
      ]
    },
    ...
  ],
  "references": [
    {
      "number": 1,
      "citation": "Xim Cerdá & Alain Dejean (2011)",
      "title": "Predation by ants on arthropods and other animals",
      "url": "http://p2.storage.canalblog.com/26/72/598270/79242859.pdf"
    },
    ...
  ]
}
The colony_traits booleans use null for genuinely unknown traits. Forcing a true/false when the answer is unknown is worse than admitting uncertainty.
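On the consuming side, a nullable boolean means a filter has three cases to handle instead of two. Here is a minimal sketch of a trait filter that treats null as "unknown" and not as false. The `matches_filter` function and its `include_unknown` flag are hypothetical, and the sample traits (including the null) are invented for illustration.

```python
import json
from typing import Optional

def matches_filter(traits: dict[str, Optional[bool]],
                   wanted: dict[str, bool],
                   include_unknown: bool = False) -> bool:
    """Check a species' traits against required true/false values."""
    for trait, required in wanted.items():
        value = traits.get(trait)  # missing keys are treated like null
        if value is None:
            # Unknown: neither a match nor a mismatch; the caller decides.
            if not include_unknown:
                return False
            continue
        if value != required:
            return False
    return True

# JSON null becomes Python None when parsed.
traits = json.loads('{"monogyne": true, "claustral": false, "pleometrosis": null}')

print(matches_filter(traits, {"monogyne": True}))
print(matches_filter(traits, {"pleometrosis": False}))
print(matches_filter(traits, {"pleometrosis": False}, include_unknown=True))
```

The explicit `is None` test matters here: a plain truthiness check such as `if not value` would silently lump unknown traits together with false ones.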
Step 2: Markdown Conversion
Raw JSON isn’t useful on its own. This script converts each JSON care sheet into markdown with interactive citation handling:
- Reference linking: Each [1] citation becomes an HTML footnote link that points to the original paper’s URL (when available)
- Hover tooltips: Citations include data-* attributes for paper title, authors, year, and journal
- Species cross-linking: Other species names mentioned in the text are automatically linked to their care sheets
- Colony traits integration: Colony traits from care sheets are merged back into the species data JSON, powering the species database’s search and filter features
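The reference-linking and tooltip steps can be sketched as below. This is one possible approach and not the actual conversion script: the `link_citations` name, the `citation` class, and the fallback to a `<span>` when no URL exists are all assumptions. Only title and authors are emitted as data-* attributes here, because those are the fields present in the references example shown earlier.

```python
import html
import re

def link_citations(text, references):
    """Replace [n] markers with HTML links carrying data-* tooltip attributes."""
    by_number = {ref["number"]: ref for ref in references}

    def replace(match):
        number = int(match.group(1))
        ref = by_number.get(number)
        if ref is None:
            return match.group(0)  # leave unknown markers untouched
        # Escape values so quotes or ampersands cannot break the attributes.
        title = html.escape(ref.get("title", ""), quote=True)
        authors = html.escape(ref.get("citation", ""), quote=True)
        attrs = f'class="citation" data-title="{title}" data-authors="{authors}"'
        url = ref.get("url")
        if url:
            return f'<a href="{html.escape(url, quote=True)}" {attrs}>[{number}]</a>'
        return f'<span {attrs}>[{number}]</span>'

    return re.sub(r"\[(\d+)\]", replace, text)

references = [{
    "number": 1,
    "citation": "Xim Cerdá & Alain Dejean (2011)",
    "title": "Predation by ants on arthropods and other animals",
    "url": "http://p2.storage.canalblog.com/26/72/598270/79242859.pdf",
}]

print(link_citations("They hunt by sight [1] and sting [7].", references))
```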
Data Format of a Published Care Sheet
Here’s what a finished care sheet looks like on AntScout, using Acanthognathus brevicornis as an example:
Quick Summary: difficulty, origin, colony type, size/growth data, antkeeping requirements (temperature, humidity, diapause, nesting), behavior, and common issues, all with inline citations.
Detailed Sections: 4-8 topic-specific sections covering nest preferences, feeding, temperature care, behavior, and any species-specific traits (e.g., “Colony Founding”, “Visual Navigation and Tandem Running” for Temnothorax). Each section cites its sources.
FAQs: 8-12 questions using terms antkeepers actually search for: “Can I keep Harpegnathos venator in a test tube?”, “Do they sting?”, “How long until first workers?”
References: Every citation is numbered sequentially by first appearance, with author names, paper title, and a direct link to the original PDF when available.
Why This Approach Works
Grounding over generation: The model isn’t asked to generate care advice from scratch. It synthesizes an answer from provided evidence, and the system prompt forces it to cite every claim. If something isn’t in the source material, the model must say “Unconfirmed” or “Unknown” rather than making something up.
Two-source verification: Having both research papers and AntWiki means gaps in one can be filled by the other, but contradictions are handled by prioritizing the primary source.
Old taxonomy resolution: A paper from 1890 about “Myrmica laevinodis” is still useful when generating the care sheet for its current name, Myrmica rubra. The taxa renaming map handles this automatically.
Language simplification: The system prompt includes an extensive list of scientific terms and their plain-English replacements. “Mesophilous habitat” becomes “damp, shaded spots.” “Tandem-running recruitment” becomes “lead each other to food.” Every sentence is written for a 14-year-old beginner.
Honesty about uncertainty: The prompt explicitly forbids several common AI failure patterns: stating collection dates as nuptial flight timing, assuming founding type from genus alone without evidence, padding sections with vague filler, and using phrases like “not documented in the scientific literature” that imply a comprehensive review was done.
Legal Notice
The care sheet content is generated by AI synthesis of publicly available scientific literature and AntWiki data. The following applies:
- Source material: All cited facts are extracted from papers in the FORMIS2024/AntCat bibliography and AntWiki. Original papers retain their respective copyrights.
- Attribution: Every claim in each care sheet includes its source reference. This ensures full traceability to the original research.
- Disclaimer: While every effort has been made to accurately synthesize information from source papers, AI-generated care sheets may contain errors. Users should verify critical information against the original sources. The care sheets are provided “as is” without warranty of any kind.
- Intended use: These care sheets are intended as starting points for antkeepers, not definitive guides. Always cross-reference with original research and experienced keepers when making decisions about colony care.
- AntWiki content: AntWiki is licensed under CC BY-SA 4.0. AntWiki content used in care sheets is attributed via [AntWiki] citations.
- Care sheet license: The generated care sheets themselves are also released under CC BY-SA 4.0. You are free to share and adapt them, even commercially, as long as you give appropriate credit and distribute derivative works under the same license.