Personal Science Week - 250227 Deep Research
Challenging each of the new "Deep Research" LLMs with a personal science question
It’s becoming harder and harder to stay ahead of the latest developments in Large Language Models and the ways AI is changing everything.
This week I tried each of the new “Deep Research” AI models on a personal science question. What did I learn?
AI is Improving Rapidly
Back in the old days, producing a typical, informative report for a blog or Substack post meant (1) coming up with an idea or two, (2) researching for several hours, poring over existing publications and perhaps contacting an expert or others for feedback, and finally (3) writing it up in succinct, readable prose.
Now for virtually any topic, the only hard part is (1) thinking of the idea in the first place.
In PSWeek 250213 I described a recent family health scare and a new urgency for me to understand the true risks of my high cholesterol numbers. Since then I’ve been throwing the question at all the LLMs: ChatGPT, Claude, Perplexity and now Grok3.
When we first started to write about LLMs back in PSWeek 230316, the big problem was hallucinations, the tendency for chatbots to make stuff up while sounding confident and authoritative. That’s still a concern, but for practical purposes it matters far less now: first, I can give the results of one query to a different LLM and ask it to “double-check”, and second, these LLMs are becoming quite good at providing references and sources that let you verify their conclusions. Although you still have to be wary, the natural skepticism of any personal scientist is almost always enough to safeguard against the most egregious mistakes.
A big innovation in the past few months has been the so-called “Deep Research” features now available in the top paid LLMs. Your question gets a much more thorough investigation, as the AI model considers a large space of options, carefully evaluating and then winnowing down the results to a remarkably thorough report, often 10+ pages of heavily-footnoted conclusions and counter-arguments. The quality is easily as good as what you’d get from, say, a bright college student or even a reasonably experienced analyst. I have a hard time telling the difference between the Deep Research reports I get from Perplexity and something I’d see in The Harvard Health Letter or even a literature review published in a top medical journal. Sure, a professional with deep understanding of the field will spot missing or misleading items, but for what I need as a personal scientist, the results are much better than what I could do alone (at least, without months of work).
Remember, I want a very personal (customized) answer. Don’t give me population statistics. I’m not a statistic: I’m me. I want the LLM to look over all of my personal circumstances (blood test results, family history, genetic info, etc.) and reason, based on whatever it can pull from similar studies out there, to give a precise report made just for me. The big difference between LLMs and Google is that ability to home in on exactly what matters to me.
Each of the top deep research engines has different methods of interaction. OpenAI’s Deep Research wanted some clarifications before starting the reasoning process. Claude lets me build upon a previous project, where I’d already provided it some background information. Perplexity seems most obsessed with providing exact sources for its responses. All of them let me ask follow-up questions, which is where I usually got the most useful takeaways.
Asking about CVD Risk
The results are so long and detailed that it’s hard to provide a useful summary, but here’s a taste starting with my initial prompt:
I'm a cardiology researcher with advanced degrees in biochemistry and medicine. I need to understand precisely the relationship between LDLC, Apob, and heart disease, especially in men over 60 with no history of CVD. I'd like to start with accurate information about the incidence of CVD in that age group, and the average levels of LDLC and ApoB among people over 60 who have their first heart attack. I'm looking for absolute numbers as much as possible, not relative numbers
I say “cardiology researcher” to keep the model from “dumbing down” its output into layman’s terms. I want the nitty-gritty details.
After a few minutes’ thought, each model gave back pages of heavily-footnoted and accurately-sourced results. Here are some highlights (click the links for the full results).
OpenAI Deep Research (ChatGPT):
What did I learn?
Ultimately I have a simple question: how worried should I be about a heart attack and what can I do about it? I can Google this or read reams of information from trusted websites or peer reviewed journals, but none of it matters if it doesn’t take into account my specific situation.
Although the various LLMs have slightly different perspectives and sometimes different specific numbers, they generally agree.
Bottom line: whereas people like me with average or low lipid levels have a roughly 1.5% chance of a heart attack per year, my risk is probably closer to 5%. Compounded over ten years, that’s roughly a 40% chance! (vs. around 14% for “normal” people).
Ouch!
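A note on the arithmetic: annual risks compound rather than simply add, so a 5%-per-year risk doesn’t become 50% over a decade. Here’s a minimal sketch of the conversion, assuming a constant, independent annual risk (a simplification: real CVD risk rises with age):

```python
def cumulative_risk(annual_risk: float, years: int) -> float:
    """Probability of at least one event over `years` years,
    assuming a constant annual risk independent from year to year."""
    return 1 - (1 - annual_risk) ** years

# The figures from the text: ~1.5%/year baseline vs. ~5%/year elevated.
baseline = cumulative_risk(0.015, 10)
elevated = cumulative_risk(0.05, 10)
print(f"Baseline 10-year risk: {baseline:.0%}")   # ~14%
print(f"Elevated 10-year risk: {elevated:.0%}")   # ~40%
```

Either way you run the numbers, the gap between the two risk levels is what stings.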
Next I guess I’ll be asking these Deep Research models for suggestions about what to do.
AI and the increased advantage of ‘what’ over ‘how’
Speaking of AI and its ability to generate such detailed reports, at some point I need to justify why I bother doing these PSWeek posts. Why not just have the AI do it all?
That’s a good question, but I think it misses the point of what AI is really for, which I’ve been describing as the difference between “what” and “how”.
Until the arrival of generative AI, you had an advantage over other people if you had better “how” skills—you knew “how” to write a compelling PowerPoint or a spreadsheet or a C program, or maybe you knew “how” to recommend stocks based on their P/E ratios. These are skills that you can learn and apply on behalf of other people—the people who know “what” to do.
Your boss knows “what” to do, and he needs people who know “how” to do it. He’ll hire the best people he can find at doing the “how”. Right now, you happen to be one that he picked.
But now AI can do the “how”, often better than anyone. Who cares if you’re the best C programmer or stock analyst? Now what matters is somebody who knows what to program or what stock to buy.
A doctor whose only skill is knowing “how” to cure a patient’s medical problem is rapidly becoming unnecessary. There are important soft skills, of course (e.g. empathy), but those can be supplied by a nurse or assistant using the AI. In fact, it’s already possible for the patient himself to do the “how” (diagnosis).
If you had an army of AI bots that know “how” to report the news, then “what” you could do is maybe start your own newspaper.
If you knew “how” to write any app or game you want, then the problem is “what” game to write.
So how do you become better at the “what”?
I’m doing my own deep dive into personal science in an attempt to answer that question. ChatGPT knows “how” to do just about any self-tracking analysis I want. So “what” do I want?
That’s my answer to why I continue to write these weekly posts. Sure, the AI could do it for me, but the value is in knowing what to write and what to do about it.
Personal Science Weekly Readings
Sign up here for a new Stanford study on wearables:
Come Test a New iPhone App to Motivate Physical Activity!
We are researchers at Stanford that are studying emerging technologies to support physical activity. We are testing a new and unique mobile app to help motivate you to be physically active.
We’ve mentioned plastics a number of times, and pointed you to Trevor Klee, who has been studying the issue of plastic contamination for a while. Now he proposes a solution: a “food-grade oat fiber designed to remove harmful plasticizers from your digestive tract”. He’s looking for volunteers to test it: Sign up here.
Eric Gilliam writes A Report on Scientific Branch-Creation: How the Rockefeller Foundation helped bootstrap the field of molecular biology, the story of how one man, Warren Weaver, was responsible for funding 15 out of 18 Nobel Prizes in molecular biology.
About Personal Science
Professional scientists get paid to do science. But you don’t need fancy credentials or a job title to do science. Personal science is about using the tools of science—experimentation, logical reasoning, quantified data collection—for everyday questions and problems. Because we don’t have to fill out funding requests or beg donors for money, we can work on whatever interests us most. And because the questions we study are directly relevant to our own lives, our answers are often more useful.
We publish each Thursday. Let us know if you have other topics you’d like to discuss.