As cyber threats grow in volume and complexity, traditional security measures are struggling to keep up. Rapid collection, analysis, and application of Cyber Threat Intelligence (CTI) is now essential for effective defense. Integrating Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) offers a new approach to CTI analysis. The combination pairs LLMs’ advanced text processing with real-time information retrieval, grounding answers in domain-specific data and enabling automated cyber threat intelligence.

RAGRecon Proof-of-Concept

The RAGRecon system was developed as a novel proof-of-concept that integrates LLMs, RAG, and Knowledge Graphs (KGs) to deliver explainable CTI. Its core function is to provide clear, context-aware answers to complex cybersecurity questions. The system ingests domain-specific documents, such as technical threat reports. When a user poses a query, it retrieves the most relevant information to form a context.
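
The retrieval step described above can be sketched as a top-k similarity search. This is a minimal illustration, not RAGRecon's actual code: the choice of cosine similarity, the function names, and the toy two-dimensional vectors are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_vec, chunk_vecs, chunks, k=3):
    """Return the k chunks whose embeddings are most similar to the query embedding."""
    scored = sorted(
        zip(chunks, chunk_vecs),
        key=lambda pair: cosine_similarity(query_vec, pair[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in scored[:k]]


# Toy data (hypothetical): real embeddings would have hundreds of dimensions.
chunks = ["report section A", "report section B", "report section C"]
chunk_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
context = retrieve_top_k([1.0, 0.0], chunk_vecs, chunks, k=2)
```

The selected chunks are then joined into the context that is passed to the LLM along with the user's question.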

From the retrieved context, RAGRecon generates not only a conversational textual answer but also a visual Knowledge Graph that represents the key entities and relationships the model used in its reasoning. This dual-output approach provides a transparent, interpretable layer, enabling analysts to trust and verify the system’s conclusions in real time.

Figure: RAGRecon, a system designed to improve Cyber Threat Intelligence through the integration of Large Language Models and Retrieval-Augmented Generation.

Foundational Work, Architecture and Performance of the RAGRecon System

According to the research, RAGRecon uses an end-to-end data pipeline for efficient processing and retrieval of unstructured CTI data. Source documents, such as PDF reports, are segmented into 1000-character chunks with a 100-character overlap to maintain context. Each chunk is converted into an embedding using the sentence-transformers/all-MiniLM-L6-v2 model.
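
The chunking step can be sketched as follows. The chunk size, overlap, and model name come from the description above; the plain character-based splitter and the function names are assumptions, since the article does not say which splitting utility RAGRecon uses. The embedding helper needs the third-party sentence-transformers package and is shown only for orientation.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # each new chunk starts 900 characters after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current chunk already reaches the end of the text
    return chunks


def embed_chunks(chunks: list[str]):
    """Embed chunks with all-MiniLM-L6-v2 (requires `pip install sentence-transformers`)."""
    from sentence_transformers import SentenceTransformer  # imported lazily: optional dependency

    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    return model.encode(chunks)  # one vector per chunk


# Hypothetical 2,500-character document used only to exercise the splitter.
sample = "".join(str(i % 10) for i in range(2500))
pieces = chunk_text(sample)
```

Because consecutive chunks share 100 characters, a sentence that straddles a chunk boundary still appears intact in at least one chunk's neighborhood, which is the point of the overlap.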


RAGRecon was evaluated across LLMs using two custom datasets, one on conventional CTI and another on blockchain-specific threats, each with 50 questions. The RAG system showed factuality and reliability scores consistently above 0.8 out of 1.0, indicating minimal hallucination.

The retrieval mechanism was efficient, as only 8% of the retrieved context achieved a complete answer. Manual analysis of 2,050 automated decisions confirmed a correct decision rate of 90% to 97%. Minor performance variations were linked to occasional errors from both the generation and self-evaluation models.
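
A correct decision rate of the kind quoted above is simply the share of automated decisions that agree with a human reviewer. The sketch below shows that arithmetic; the function name and the four sample decisions are hypothetical and are not data from the study.

```python
def correct_decision_rate(automated: list[bool], manual: list[bool]) -> float:
    """Fraction of automated decisions that match the manually assigned labels."""
    if len(automated) != len(manual):
        raise ValueError("both lists must contain one decision per evaluated answer")
    if not automated:
        raise ValueError("at least one decision is required")
    matches = sum(a == m for a, m in zip(automated, manual))
    return matches / len(automated)


# Hypothetical example: the automated judge and the reviewer disagree on one of four answers.
automated_decisions = [True, True, False, True]
manual_labels = [True, False, False, True]
rate = correct_decision_rate(automated_decisions, manual_labels)
```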

RAGRecon Dual-Output Generation For Textual and Visual Insights

A key innovation of RAGRecon is its dual-output capability, which delivers both a direct textual answer and a visual explanation of data relationships. For the textual response, the system integrates the retrieved context and user query into a model-agnostic prompt that instructs the LLM to generate a coherent answer based solely on the provided information. Simultaneously, a specialized prompt directs the LLM to extract the primary entities and relationships and output them in a structured JSON format, which is then parsed and visualized as an interactive graph.
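
The two prompt paths can be sketched roughly as below. The prompt wording, the JSON schema (`entities`, `relationships`, `source`, `relation`, `target`), and the function names are all assumptions for illustration; the article does not publish RAGRecon's actual prompts or schema, and the sample LLM output is invented.

```python
import json

# Hypothetical prompt template; works with any chat or completion model.
ANSWER_PROMPT = (
    "Answer the question using only the context below. "
    "If the context is insufficient, say so.\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)


def build_answer_prompt(context_chunks: list[str], question: str) -> str:
    """Combine retrieved chunks and the user query into a single prompt string."""
    return ANSWER_PROMPT.format(context="\n---\n".join(context_chunks), question=question)


def parse_graph(llm_output: str):
    """Turn the LLM's JSON output into (nodes, edges) ready for a graph library."""
    data = json.loads(llm_output)  # raises json.JSONDecodeError on malformed output
    nodes = set(data.get("entities", []))
    edges = []
    for rel in data.get("relationships", []):
        source, label, target = rel["source"], rel["relation"], rel["target"]
        nodes.update((source, target))  # make sure edge endpoints are also nodes
        edges.append((source, label, target))
    return sorted(nodes), edges


# Invented example of what the extraction prompt might return.
sample_output = (
    '{"entities": ["GroupX", "spear-phishing"], '
    '"relationships": [{"source": "GroupX", "relation": "uses", "target": "spear-phishing"}]}'
)
nodes, edges = parse_graph(sample_output)
prompt = build_answer_prompt(["chunk one", "chunk two"], "Which technique does GroupX use?")
```

In practice the JSON returned by an LLM is not always well formed, so a production parser would need to catch decoding errors and retry or repair the output before drawing the graph.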


RAGRecon Enhanced Efficacy for Security Operations

The research offers cybersecurity professionals a valuable option for automated cyber threat intelligence. Notably, a robust RAGRecon system could reduce the cognitive load and manual effort needed to analyze unstructured cyber threat reports. It could also enable analysts to quickly visualize and understand complex threats, identify security vulnerabilities, and develop threat responses, accelerating the intelligence lifecycle from data to actionable insight.

Conclusion

The field of Cyber Threat Intelligence requires AI systems that are both powerful and explainable. This work builds on the RAGRecon system, which already generates accurate, fact-based answers from complex security documents. The research addresses its main limitation by developing robust methods for reliable Knowledge Graph generation, the key to its explainability. Resolving this bottleneck will unlock the full potential of LLM-driven CTI analysis while delivering a trustworthy and effective tool for cybersecurity professionals.

Sources

Large Language Models for Explainable Threat Intelligence (RAGRecon). arXiv, 7 Nov 2025. https://arxiv.org/abs/2511.05406 (PDF: https://arxiv.org/pdf/2511.05406)
