How to train your dRAGon
The problem with GPT models is not hallucination, it's dogma.
AI is Google, Wikipedia and the whole mainstream media on steroids. It's phenomenal at assimilating vast amounts of information and regurgitating it in convenient, memetic soundbites that satisfy the low cognitive masses.
But it's a classic case of GIGO[1]. Those "vast amounts of information" do not contain all the information, only that which is permitted according to the bias that has been programmed into it. In fact, it might not contain any information at all, if you consider information to be evidence-based and free of conflicts of interest.
Even if that bias is not explicit on the part of the AI companies, it is inevitable, given how heavily censored the information made available to it is. If you can't find a "controversial" viewpoint before page 20 on Google, then you aren't going to receive it from the GPT either. You will see why "controversial" is in quotes in a moment.
By way of example, I put together a simple RAG[2] model using Kaiban and LangChain, with OpenAI gpt- under the hood. Left to its own devices (i.e. just using its training data), when asked "Does the COVID vaccine cause cancer?", this is the sort of stock response it gives (my emphases):
COVID-19 Vaccines and Cancer: What the Evidence Shows
The question of whether COVID-19 vaccines cause cancer has been a significant source of public concern, often fueled by misinformation and anecdotal reports circulating in public discourse. However, extensive scientific research and continuous safety monitoring have consistently demonstrated that there is no causal link between COVID-19 vaccination and cancer.
Regulatory Approval and Safety Standards
COVID-19 vaccines, including those developed by Pfizer-BioNTech, Moderna, and Johnson & Johnson, have undergone rigorous clinical trials and met stringent safety, efficacy, and quality requirements before authorization. Health authorities such as the World Health Organization (WHO), Centers for Disease Control and Prevention (CDC), Health Canada, and the Therapeutic Goods Administration (TGA) have thoroughly investigated and monitored these vaccines' safety profiles.
Scientific Evidence and Expert Consensus
The overwhelming consensus among scientists and medical professionals is that COVID-19 vaccines do not cause cancer. Both mRNA vaccines (Pfizer and Moderna) and viral vector vaccines (Johnson & Johnson) work by triggering the body's immune response to protect against disease, and there is no evidence linking these mechanisms to cancer development. The vaccines' immune responses are designed to be protective, not carcinogenic.
Addressing Specific Concerns
Some concerns have emerged regarding the presence of DNA fragments, such as the SV40 sequence used in the vaccine manufacturing process. However, health authorities maintain that neither the full SV40 nor its fragments increase cancer risk or influence cancer development in individuals. These components are part of standard manufacturing processes and do not pose health risks.
Anecdotal Reports vs. Scientific Evidence
While some individuals have reported perceived increases in cancer diagnoses following vaccine rollout, these personal observations lack scientific backing and are not substantiated by credible health data. It's crucial to distinguish between anecdotal reports and evidence-based research when evaluating vaccine safety.
Ongoing Monitoring and Benefits
Continuous safety monitoring demonstrates that the benefits of vaccination—protecting individuals from severe illness, hospitalization, and death due to COVID-19—far outweigh any negligible risks. Long-term data monitoring continues to enhance our understanding, but vaccination remains a critical public health tool without a causal relationship to cancer.
Individuals are encouraged to make informed decisions about vaccination based on credible scientific research and guidance from healthcare professionals, rather than being misled by unwarranted fears regarding cancer risks.
So, there you are then: an overwhelming consensus of Safe & Effective™, pretty much what you would expect if your main source of information is the BBC or Google? It talks of "evidence" but doesn't actually present any. That's dogma, pure and simple. What little reference is made to a view that opposes the sponsored narrative is nicely relegated to the "conspiracy theory" myth bucket without explicitly saying so. It talks of “informed decisions”, but only if your information comes from the narrative, not the “unwarranted fears”…
On the other hand, there is an abundance of information that challenges the prevailing narrative and should at least make you cautious, if not fearful. Furthermore, it is less dogmatic and more evidence-based (contrary to ChatGPT’s scurrilous claims).
Take, for example, the informed analysis of my dear friend Dr Jessica Rose, an eminent computational biologist with more integrity and human compassion than anyone else I know. She has written almost 800 articles on her Substack, Unacceptable Jessica. I confess to not understanding a lot of what she writes, such is the depth of her intellectual prowess. But I figured I could trust an LLM to interpret her publications (having loaded and vectorised every one into LanceDB) and provide me with a condensed summary.
Alas not. No matter how strict I made the prompt, I could not consistently get the GPT model to simply report on the information that was provided to it without at least adding its own contradictory conclusions or "controversial" labels, such was the strength of its dogmatic training.
Ultimately, I did succeed, but only by downgrading the model (from gpt-4o to 3.5-turbo), turning the temperature down to 0[3] and prompting it to paraphrase verbatim rather than use any reasoning skills to regenerate a report.
This was the final prompt:
Create a comprehensive document about {question} by combining ALL the provided sources.
Instructions:
- Go through each source document one by one
- Paraphrase EVERYTHING from each source verbatim - include ALL details, data, quotes, studies, numbers, names, dates
- Do not summarize or condense - include the full content of each source
- String all the source content together into one flowing document
- Do not add any analysis, interpretation, or external knowledge
- Include every single detail mentioned in the sources
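For readers curious how a template like this gets filled in before it reaches the model, here is a minimal sketch in plain Python. It assumes the retrieved sources arrive as a list of strings; the `build_prompt` helper and the `Source N:` labelling are hypothetical illustrations, not the author's actual Kaiban/LangChain code.

```python
# Hypothetical sketch: fill the "{question}" placeholder and append the
# retrieved source texts, as a RAG pipeline does before calling the model.
TEMPLATE = """Create a comprehensive document about {question} by combining ALL the provided sources.
Instructions:
- Go through each source document one by one
- Paraphrase EVERYTHING from each source verbatim - include ALL details, data, quotes, studies, numbers, names, dates
- Do not summarize or condense - include the full content of each source
- String all the source content together into one flowing document
- Do not add any analysis, interpretation, or external knowledge
- Include every single detail mentioned in the sources"""


def build_prompt(question: str, sources: list[str]) -> str:
    """Format the template, then append each source under a numbered label."""
    header = TEMPLATE.format(question=question)
    # Label each retrieved chunk so the model can go through them "one by one".
    body = "\n\n".join(
        f"Source {i}:\n{text}" for i, text in enumerate(sources, start=1)
    )
    return f"{header}\n\nSources:\n\n{body}"


if __name__ == "__main__":
    print(build_prompt("topic X", ["First retrieved chunk.", "Second retrieved chunk."]))
```

The design choice is simply to keep the instructions first and the evidence after, so the model sees the constraints before the material it is meant to restate.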
And this was the final result:
The COVID-19 vaccines are different from conventional vaccines as they inject genetic instructions into human beings to produce the most toxic component of the illness. Lipid nanoparticle vehicles can potentially bypass the blood-brain barrier and deliver genetic instructions to every cell in the body, hijacking the body's cells to produce foreign proteins that overwhelm the immune system. The media has been silent about this new medical technology.
A recent preprint in Japan describes excess mortality and a cancer wave following the administration of mRNA vaccines to 123 million people, with 13% receiving 7 doses. The temporal association between vaccination and cancer raises questions about causation, which can be evaluated using the Bradford Hill criteria. Figures showing excess mortality in different dose cohorts suggest a dose-response relationship, supporting the criteria.
Biological mechanisms such as DNA contamination and the impact of spike proteins on P53 tumor suppression are potential factors in the association between vaccination and cancer. Studies have shown that SV40 enhancers in Pfizer doses can bind to P53 and integrate into cells, potentially leading to oncogenic risks. Plasmid-based gene therapies have been linked to hematologic cancers in patients, highlighting the need for further investigation into the safety of mRNA vaccines.
Regulatory agencies like Health Canada have received reports from Pfizer assuring the safety of SV40 in vaccines, but these claims are contradicted by scientific literature. Studies have shown that SV40 plasmids can infect and spread in humans, raising concerns about the oncogenic potential of DNA in vaccines. Regulatory documents cite outdated and irrelevant studies to support their claims, ignoring evidence of DNA integration and oncogenic risks associated with plasmid-based gene therapies.
Overall, the evidence suggests a potential link between COVID-19 vaccines and cancer, highlighting the need for further research and scrutiny of regulatory practices. The manipulation of data and misrepresentation of scientific literature by regulatory agencies raise concerns about the safety and efficacy of mRNA vaccines. Further investigation into the integration of DNA from vaccines and its potential impact on health is warranted.
Well, that looks a lot like evidence to me. What about you?
Which version you prefer depends largely, I imagine, on your worldview, but objectively, I hope it is apparent that one is decidedly more "evidence-based" than the other? Yes, the "controversial" one.
When it comes to matters of public health and other public policies, we at least deserve access to both sides of the debate. If you rely only on the off-the-shelf GPT models, you are only getting one side of any story, the one sponsored by the very same big corporations that ChatGPT kindly identified for us.
I’m thinking about releasing my model in the form of a Substack news curator. Let me know in the comments if you think this would be a good product (especially one you might pay for, as AI tokens can soon rack up!).
[1] Garbage in, garbage out.
[2] Retrieval-Augmented Generation (RAG) is the process of optimizing the output of a large language model (LLM) so that it references an authoritative knowledge base outside its training data before generating a response.
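To make the retrieval step concrete, here is a minimal sketch of similarity search. It assumes a toy bag-of-words "embedding" in place of a real embedding model and a plain Python list in place of a vector store such as LanceDB; `embed`, `cosine` and `retrieve` are hypothetical names chosen for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (the 'R' in RAG)."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    corpus = [
        "Temperature controls randomness in language model sampling.",
        "LanceDB stores vectors for similarity search.",
        "Garbage in, garbage out applies to training data.",
    ]
    print(retrieve("how does temperature affect sampling", corpus, k=1))
```

The retrieved chunks are what gets pasted into the prompt; a real system would swap `embed` for a learned embedding model and the sorted list for an indexed vector search.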
[3] The temperature parameter in AI models, particularly large language models (LLMs), is a hyperparameter that controls the randomness and creativity of the model's output. It influences the probability distribution of the model's next-token predictions, affecting how deterministic or varied the generated text will be.
Temperature is often set between 0 and 1, although some APIs accept higher values (OpenAI's API, for example, accepts 0 to 2). A lower temperature (closer to 0) results in more predictable and conservative outputs, while a higher temperature allows for more creativity and diversity in the responses.
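The effect of temperature can be seen in a few lines: the model's raw next-token scores (logits) are divided by the temperature before a softmax turns them into probabilities. This is a minimal sketch with hypothetical scores for three candidate tokens; treating exactly 0 as "always pick the top token" reflects the limit as temperature approaches 0, not any particular vendor's implementation.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw next-token scores into probabilities, scaled by temperature.

    temperature == 0 is treated as greedy decoding: all probability goes to
    the highest-scoring token (the limit of the softmax as temperature -> 0).
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
    for t in (0, 0.5, 1.0, 2.0):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs])
```

Running it shows the top token's share shrinking as the temperature rises, which is why a setting of 0 gives the most repeatable, least "creative" output.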


