Bohan
The Hebrew verb bāḥan, used in the Old Testament for testing or assaying, the slow examination by which God or a watchful human discovers what is actually inside a heart, a city, or a piece of metal before it is trusted with weight.
Origin and Language
Bohan (Hebrew בָּחַן, bāḥan) is the Old Testament verb for testing, trying, proving, examining. The cognate noun bohan (בֹּחַן) shows up in Isaiah 28:16 to describe a “tried stone,” a piece of foundation work that has been assayed and found load-bearing. The semantic field is metallurgical. An assayer heats a piece of ore until impurities separate from the pure metal; he reads the result; he stamps the assay. That is what bohan asks for. The verb is distinct from nasah (נָסָה), the related Hebrew word usually translated “tempt” or “put to the test,” which leans toward provocation and the risk of failure. Bohan leans toward inspection. Nasah asks “will you fall?” Bohan asks “what are you actually made of?” Both appear in scripture but they are not interchangeable, and the difference matters when an operator borrows the language to describe what auditing a model or an agent looks like.
Scriptural Witness
David uses the verb in two famous places. In Psalm 26:2 he prays “Examine me, O LORD, and prove me; try [bohan] my reins and my heart” (KJV). In Psalm 139:23-24 the same prayer goes deeper: “Search me, O God, and know my heart: try [bohan] me, and know my thoughts: and see if there be any wicked way in me, and lead me in the way everlasting” (KJV). In both cases the speaker is volunteering for the audit. He is not asking God to be lenient; he is asking God to inspect, to surface what he himself cannot see, and to publish the result. Malachi 3:10 turns the same verb outward and invites Israel to bohan God: “prove me now herewith, saith the LORD of hosts.” The willingness to be tested cuts both ways. The God who can search a heart is also a God whose faithfulness the people are invited to assay.
The Pattern Across Traditions
The Stoic prokoptōn (“one making progress”) had a near-identical practice. Marcus Aurelius opens Meditations II by replaying the day before it happens, naming the difficult people he will meet and the failures he expects in himself. Seneca closed each evening by asking what he had healed, what habit he had checked, where he had grown softer. Epictetus called it prosoche, the discipline of attention. The mechanism is the same as bohan: do not wait for the failure to find you; run the audit at the threshold, in the quiet, before the day demands the response. Christianity later folded this practice into the desert tradition (the watchfulness TWO covers under nepsis) and into the Ignatian examen. The vocabulary changed; the discipline did not. An honest person inspects what is in them before they ask others to trust it.
How It Lands in the Age of AI
The model you ship is the heart you have not assayed. Frontier labs are now publicly admitting this. OpenAI’s deployment-simulation reruns 1.3 million real user prompts through a candidate model and reads the regenerated answers before any human sees the new model in production. That is bohan, in code. The interesting move is that the audit is no longer optional once the lab has the tooling, and there is no good answer to the question “why did you ship without running it?” Operators inherit the same standard. If you put a Claude or GPT agent in front of customers, you are responsible for what it does at the small-volume edges, the long tail, the third question in a hard conversation. Pretending the model is well-behaved because the public benchmarks say so is the technical version of refusing the inspection.
How TWO Uses It
Scott’s TWO discipline keeps coming back to one habit: replay your own work. Save the email you sent last week, the proposal you actually wrote, the prospect note you actually used, then drop them into the next model the vendor pushes and read what comes back. If you cannot bring yourself to do that for your own work, you are not going to do it for the agent you stand up for your team. Bohan in the age of AI is less about being holy and more about being inspectable, and TWO uses the term as shorthand for the moment an operator stops asking “do I trust this model?” and starts asking “what does this model do when I run my last twenty real cases through it?” The answer is almost never the answer the benchmark gave. That is the point. The benchmark is not your week. Your week is.
A Closing Discipline
This week, take the ten real cases you handled last week (the email reply, the proposal paragraph, the meeting recap, the prospect summary, the calendar reschedule). Drop each one into your current default model. Read the answers. Do not score them; just notice the diffs against what you actually shipped. The exercise takes one hour. It is the smallest version of bohan a non-technical operator can run, and once you have done it on yourself you will be able to ask the next vendor “did you replay 1.3 million conversations through your next model, like OpenAI did?” with the moral standing of someone who has assayed their own first.