Thoughts on data validation while using LLMs.
Misinformation, hallucinations, inaccuracies, and contextual irrelevance have become pervasive issues in today’s digital landscape. We all like to blame social media, and don’t get me wrong: I do place most of the blame there. But overreliance on LLMs like ChatGPT, or more importantly, under-reliance on a process I call “humanization of results,” is a growing risk. (I keep running into cybersecurity blog posts where the information is simply not accurate.)
A 2024 report by Statista highlights that a substantial portion of the global population encounters false information online, contributing to widespread public concern. “In the United States, 44% of news consumers express strong concern about fake news, underscoring the critical need for effective strategies to combat misinformation.”
Data validation serves as a crucial process in ensuring the accuracy, reliability, and integrity of information. By systematically verifying data against predefined standards, organizations can prevent errors and inconsistencies that may lead to misguided decisions. Implementing robust data validation techniques not only enhances data quality but also streamlines operations, ultimately supporting informed decision-making and maintaining trust in data-driven processes.
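To make this concrete, here is a minimal sketch of rule-based validation in Python. The field names and rules are hypothetical illustrations, not a prescribed schema; the point is simply that every record is checked against predefined standards before it is trusted.

```python
# A minimal sketch of rule-based data validation. The fields and standards
# below are hypothetical examples; tie real rules to your own schema.
from datetime import datetime

def is_past_date(value) -> bool:
    """True if value parses as YYYY-MM-DD and is not in the future."""
    try:
        return datetime.strptime(value, "%Y-%m-%d") <= datetime.now()
    except (TypeError, ValueError):
        return False

# Predefined standards: each field maps to a predicate it must satisfy.
RULES = {
    "source_url": lambda v: isinstance(v, str) and v.startswith("https://"),
    "published": is_past_date,
    "confidence": lambda v: isinstance(v, (int, float)) and 0.0 <= v <= 1.0,
}

def validate(record: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the record passes."""
    errors = []
    for field, rule in RULES.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not rule(record[field]):
            errors.append(f"invalid value for {field}: {record[field]!r}")
    return errors

claim = {"source_url": "https://www.ffiec.gov/", "published": "2024-03-01", "confidence": 0.9}
print(validate(claim) or "record passed validation")
```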
Misinformation Controls in ChatGPT
Given that I write a lot, I’m monitoring the risk of presenting information that has not been properly vetted. We have plenty of controls in our second-set-of-eyes review process, but they are not going to catch hallucinations, inaccuracies, irrelevance, or misinformation.
To address this risk, I have implemented a series of practices designed to mitigate misinformation and reduce the likelihood of LLM hallucinations. These strategies focus on ensuring source quality, promoting transparency, encouraging cross-verification, and leveraging reliable tools for accuracy.
Misinformation Controls
I asked ChatGPT to add the following instructions to its memory (a programmatic sketch of the same controls follows the list):
- Include links to sources that prove what you are saying in your responses, whenever possible.
- When reformatting responses in plain text, please include the links at the end of the response, like a bibliography.
- Prioritize FFIEC.gov, ISACA.org, .edu, or .gov domains, then mainstream websites dedicated to the topic in question (see the link-checking sketch after this list).
- Prioritize cross-verification by relying on multiple corroborating sources for complex or disputed topics when possible. Include links to all corroborating sources.
- When I ask you to format in plain text, meta information about your response should be in all caps.
- In the meta information about your response, indicate when no reliable source exists, and specify whether information is inferred or a best guess based on related facts.
- Reference established databases such as JSTOR or PubMed for academic or medical information.
- Encourage follow-up validation using tools like Snopes, FactCheck.org, or PolitiFact for contentious claims.
- Use up-to-date sources to avoid reliance on outdated or obsolete information.
- Indicate when your response is based on opinions that are controversial.
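For those who work with the API rather than the chat interface, the same controls can be baked into a system prompt so every request carries them. This is a sketch under stated assumptions: it uses the official openai Python package, reads an OPENAI_API_KEY from the environment, and the model name is a placeholder for whichever model you actually use.

```python
# Sketch: carrying the misinformation controls into API-based workflows by
# embedding them in the system prompt. Assumes the official openai package
# and an OPENAI_API_KEY environment variable.
from openai import OpenAI

CONTROLS = """\
Include links to sources that support your claims whenever possible.
Prioritize FFIEC.gov, ISACA.org, .edu, or .gov domains.
Rely on multiple corroborating sources for complex or disputed topics.
State clearly when no reliable source exists, and flag inferred or best-guess answers.
Indicate when a response rests on controversial opinion rather than established fact.
"""

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # assumption: substitute whichever model you use
    messages=[
        {"role": "system", "content": CONTROLS},
        {"role": "user", "content": "Summarize FFIEC guidance on vendor risk management."},
    ],
)

print(response.choices[0].message.content)
```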
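As a lightweight follow-up to the source-prioritization control, a script can pull the links out of a response and flag anything outside the preferred domains. This is a hypothetical helper, not a feature of any LLM product; the domain list mirrors the instructions above and is easy to extend.

```python
# Sketch: flag response links that fall outside the preferred source domains.
import re
from urllib.parse import urlparse

# Preferred domains and suffixes, mirroring the instructions above.
PREFERRED = ("ffiec.gov", "isaca.org", ".edu", ".gov")

def check_sources(text: str) -> dict[str, bool]:
    """Map each URL found in the text to True if its host is preferred."""
    results = {}
    for url in re.findall(r"https?://\S+", text):
        url = url.rstrip(".,);")  # trim trailing punctuation
        host = urlparse(url).netloc.lower()
        results[url] = host.endswith(PREFERRED)
    return results

sample = "See https://www.ffiec.gov/guidance and https://example.com/post."
for url, trusted in check_sources(sample).items():
    print(("PREFERRED" if trusted else "REVIEW   "), url)
```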
Summary
These controls work together to promote accuracy and reduce errors. By ensuring transparency when sources are unavailable, encouraging cross-verification, and leveraging established databases and fact-checking tools, this approach enhances reliability. It focuses on both the quality of the sources and the integrity of the information itself, building trust in the responses provided.
We’d love to hear what other tactics you may have come across. Please feel free to comment.

Original article by Dan Hadaway, CRISC, CISA, CISM. Founder and Information Architect, infotex.
“Dan’s New Leaf” – a fun blog to inspire thought in IT Governance.
