This is the long version of how we rebuilt dsti.school — and why a small, selective engineering school on the French Riviera ended up writing its own static-site build chain, its own schema.org validator, and its own edge-geolocation engine. The short version is one sentence: most of what looks like marketing is actually engineering, and most of the engineering only makes sense once you know what it is for. So this post does both — the systems, in full, and the reasons underneath them.
Two efforts run in parallel through the story. One is eighteen months building a global partner network, on the road, one institution at a time. The other is three weeks rebuilding the website from scratch. They sound unrelated. They are the same project: a sustained attempt to earn a family's trust in a school they have often never heard of, in a country that is rarely their first instinct. The website is the part of that effort that runs while everyone sleeps, so it had better be good.
01 The reason it exists: a network, not a home market
Before any code: the website was rebuilt because of where our students actually come from. Over the last eighteen months DSTI's partner-counsellor network grew from a couple of dozen relationships to 92 approved partners across 78 countries and territories. That growth happened in person — fairs, campus visits, long conversations — and it changed what the website has to do. It is no longer a brochure for people who already know us. It is, increasingly, the first serious encounter a prospective student in Lagos, La Paz, Hanoi or Bogotá has with the school, on a phone, on an uneven connection, in a language that may not be their first.
That single fact sets nearly every technical constraint that follows: it has to be fast on mobile, legible to machines as well as people, available in more than one language as a first-class citizen, and impossible to ship broken, because there is rarely anyone awake to catch it. None of those are aesthetic choices. They are consequences of who is on the other end.
02 The bet: craft as the only equaliser
We are small. We do not have a famous name or an enormous budget, and we are not going to acquire either this quarter. What a small team can control is the quality of the thing it actually ships. So the bet behind the rebuild was deliberately old-fashioned: out-engineer the problem. Make the site faster, cleaner, more correct and more machine-legible than schools a hundred times our size, because that is a competition decided by care and discipline rather than by spend.
That is also why the rebuild created a flywheel rather than a one-off redesign. Every quality property we wanted — speed, validity, machine-readability — got turned into something the build can check on every release, so the standard holds without anyone remembering to hold it.
03 Speed, because milliseconds recruit
A prospective student on a patchy connection at altitude does not wait for a slow page; they leave. So we measured ourselves on mobile, where it actually matters, against the most famous engineering schools in the world — not to claim we teach better, but to establish a technical order of magnitude.
A single fast snapshot is luck; a fast site is engineering. The three numbers Google's Core Web Vitals actually use — how fast the main content appears, how quickly the page responds to a tap, and how much it visually shifts while loading — all sit inside the “good” band, and the build holds that line on every release. Pre-rendered HTML on S3 behind a CDN means there is no per-request rendering at all; on top of that sit AVIF imagery, targeted preload hints, a deliberately light critical path, and one optimisation worth singling out because it sounds trivial and is not.
The V308 follow-through: remove avoidable work before changing the architecture
The first release after the rebuild still exposed one intermittent Lighthouse warning in our own runtime: a scroll handler could write the reading-progress bar and then ask the browser for fresh section geometry in the same turn. V308 reorganised that path around one requestAnimationFrame: sticky-header height and section positions are cached, geometry is read once, and progress or active-state classes are written only afterwards. The same release moved linked stylesheets ahead of substantial JSON-LD blocks across the maintained pages, generated pages and collateral build paths, while preserving stylesheet order. It was deliberately the conservative option — no critical-CSS split, no asynchronous stylesheet trick and no new cascade to prove.
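The read-once, write-afterwards idea can be sketched outside the browser. What follows is a minimal Python sketch, not the site's actual script (which runs in the browser around requestAnimationFrame). It assumes the sticky-header height and section positions have already been measured and cached, and it derives both reading progress and the active section from a single scroll position with no further layout reads. The names `ScrollGeometry` and `scroll_state` are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class ScrollGeometry:
    """Layout values measured once and cached (all names are illustrative)."""
    header_height: float    # sticky-header height, px
    section_tops: tuple     # ascending document offsets of each section, px
    document_height: float  # full scrollable height, px
    viewport_height: float  # visible height, px


def scroll_state(geometry: ScrollGeometry, scroll_y: float):
    """Return (progress in 0..1, active section index or None)."""
    scrollable = max(geometry.document_height - geometry.viewport_height, 1.0)
    progress = min(max(scroll_y / scrollable, 0.0), 1.0)
    # A section counts as active once its top has passed under the sticky header.
    probe = scroll_y + geometry.header_height
    index = bisect_right(geometry.section_tops, probe) - 1
    return progress, (index if index >= 0 else None)
```

Because the function only consumes cached numbers, the caller can apply its result (bar width, active class) in one write phase without asking the layout engine anything in between.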
The first post-deployment mobile lab run reflected the change without pretending that one run is field data: First Contentful Paint held at 0.9 s; Largest Contentful Paint moved from 2.9 s to 2.7 s; Total Blocking Time fell from 190 ms to 30 ms; Speed Index moved from 1.1 s to 0.9 s; and Cumulative Layout Shift remained 0. The maximum measured critical path shortened from 159 ms to 98 ms. Lighthouse still correctly reports one render-blocking request — the page's pruned 19.5 KiB stylesheet — because we chose a predictable first render over removing a warning at any cost.
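To put those lab figures on one scale, the relative change of each metric can be computed directly. This is plain arithmetic on the numbers quoted above, from a single lab run; the helper name `relative_change` is ours.

```python
def relative_change(before: float, after: float) -> float:
    """Percentage change from before to after, rounded to one decimal place."""
    return round((after - before) / before * 100, 1)


# Lab figures quoted in the paragraph above: (before, after).
lab_runs = {
    "Largest Contentful Paint (s)": (2.9, 2.7),
    "Total Blocking Time (ms)": (190, 30),
    "Speed Index (s)": (1.1, 0.9),
    "Max critical path (ms)": (159, 98),
}

changes = {name: relative_change(b, a) for name, (b, a) in lab_runs.items()}
for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}%")
```

The largest relative movement is Total Blocking Time, which is consistent with the change having removed main-thread work rather than network weight.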
One stylesheet per page — and why that is hard
The usual way to build a site is one big shared stylesheet, which means every page downloads, parses and applies thousands of rules it will never use. We do the opposite: each page ships a stylesheet pruned to the rules that page actually needs. Less to send, less for the browser to work through before it can paint. The catch is that you cannot prune by looking only at what is in the page's saved HTML.
The naïve version — “remove any rule whose selector never appears in this page” — quietly breaks the site, because the classes that matter most are not in the saved HTML at all. JavaScript adds them at runtime: a menu becomes open when tapped, a header turns sticky and scrolled as you move, a dialog goes active, the cookie banner toggles consent, an item becomes selected or loading. Strip those “unused” rules and every interaction loses its styling. So the optimiser keeps a deliberately conservative allow-list of these dynamic-state markers and never removes them. Three more things make it fiddly: selectors do not read left-to-right (:is(), :where(), :has() nest other selectors, so matching is recursive); at-rules split two ways (@media/@container wrap rules and must be walked into, while @font-face/@keyframes are referenced by name and kept opaque); and some modern CSS we genuinely use — container queries, with container-type and cqi units — is new enough that the HTML/CSS validator itself reports it as a property that “doesn't exist,” so the pipeline carries those as reviewed, known false-positives. Because deleting CSS is dangerous, every pruned page is then re-checked against the original by a separate verifier before it can ship.
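The conservative stance described above can be illustrated with a deliberately small sketch. It is not the optimiser itself: it assumes rules arrive as `(selector_list, body)` pairs, it does not walk at-rules, and instead of matching nested selectors recursively it simply keeps anything containing parentheses or attribute brackets. The class names in `DYNAMIC_STATE` are illustrative guesses at the kind of runtime markers the allow-list holds.

```python
import re

# Classes added by JavaScript at runtime; never pruned. Names are illustrative.
DYNAMIC_STATE = {"open", "sticky", "scrolled", "active", "selected", "loading", "consent"}

CLASS_TOKEN = re.compile(r"\.(-?[_a-zA-Z][\w-]*)")
OPAQUE = re.compile(r"[(\[]")  # functional pseudo-classes, attribute selectors


def may_match(selector: str, html_classes: set) -> bool:
    """Conservatively decide whether a selector could still apply to the page."""
    if OPAQUE.search(selector):
        return True  # too complex for this sketch to reason about: keep it
    classes = CLASS_TOKEN.findall(selector)
    return all(c in html_classes or c in DYNAMIC_STATE for c in classes)


def prune(rules, html_classes: set) -> list:
    """Keep a rule if any comma-separated branch of its selector may match."""
    kept = []
    for selector_list, body in rules:
        branches = [s.strip() for s in selector_list.split(",")]
        if any(may_match(b, html_classes) for b in branches):
            kept.append((selector_list, body))
    return kept
```

Every doubtful case resolves to "keep", so the sketch can only under-prune. That mirrors the trade-off in the text: a few surplus bytes are cheaper than an interaction that silently loses its styling.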
04 Reach: three first-class languages
The site grew up bilingual — English and French. Spanish (es-MX) is the newest addition, and the reasoning is worth stating because it is a business decision, not a translation exercise. Spanish is the one language that unlocks an entire continent: one edition opens Latin America (minus Brazil) all at once, where Asia's many languages would each open only a sliver. So Spanish earns its place not because it is the “second business language” in the abstract, but because of its unusual reach-per-edition.
Crucially, the locales are not a primary site with bolt-on translations. Each is a first-class citizen produced from one source, with proper hreflang reciprocity and an x-default, and a strict do-not-translate list for programme names, trademarks, certifier and campus names, and study-mode names: the things that must stay identical across every language to remain findable and correct.
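Reciprocity is easiest to guarantee when every edition of a page emits the same set of alternates from one source. A minimal sketch of that idea follows. The locale-to-prefix mapping, the choice of English as x-default, and the `example.org` host are all assumptions for illustration, not the site's real URL scheme.

```python
from html import escape

# Hypothetical locale-to-path-prefix mapping; the real URL scheme may differ.
LOCALES = {"en": "", "fr": "/fr", "es-MX": "/es-mx"}
X_DEFAULT = "en"  # assumption: the English edition is the fallback


def hreflang_links(base_url: str, page_path: str) -> list:
    """Build the identical alternate set that every locale edition emits."""
    links = []
    for lang, prefix in LOCALES.items():
        href = escape(f"{base_url}{prefix}{page_path}")
        links.append(f'<link rel="alternate" hreflang="{lang}" href="{href}">')
    default_href = escape(f"{base_url}{LOCALES[X_DEFAULT]}{page_path}")
    links.append(f'<link rel="alternate" hreflang="x-default" href="{default_href}">')
    return links


def is_reciprocal(alternates_by_locale: dict) -> bool:
    """Reciprocity holds when every edition declares the same alternates."""
    return len({frozenset(v) for v in alternates_by_locale.values()}) == 1
```

A build-time check such as `is_reciprocal` turns "proper hreflang reciprocity" from a convention into something a gate can fail on.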
Translation governance became part of the build's memory
The French translation of the longer Team DSTI story made an important distinction visible: a good translation rulebook needs both normative rules and the evidence that produced them. By V308 the handover therefore carried one maintained translation authority, plus separate French and Mexican-Spanish provenance sidecars. The authority says what must be preserved — identifiers, URLs, official programme and study-mode names, first-person agency, markup and machine-readable meaning. The sidecars record what natural public language actually looks like in each locale: neutral international French with vous; professional Mexican Spanish with tú; different numeral conventions; recurring terminology; false-friend and Spain-variant checks.
Those files are not loose notes. They are mandatory members of the curated next-model package, included in its reading order, checksums, manifests and structure report. That means a future translator — human or model — receives not only the rule, but its provenance and the QA vocabulary needed to test the result. Localisation became another reproducible engineering surface rather than a memory held by whoever translated the last page.
05 Legible to humans and to machines
For the first time in the web's history, two kinds of reader matter: people, and the AI systems that increasingly answer their questions. A school that wants to be discovered now has to be legible to both, so every page carries more than what you see.
The visible HTML is for people and crawlers; schema.org JSON-LD tells machines exactly what a programme, fee or date is; a faithful Markdown twin of each page and an llms.txt index hand AI systems a clean, unambiguous copy. When a future student asks an assistant “where can I study data engineering in France, in English?”, the systems that answer are reading structured data and clean text, not squinting at a pretty page. Most institutions have not published machine-readable versions of themselves. We have.
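For readers who have not seen JSON-LD, here is roughly what "telling machines exactly what a programme is" looks like. This is a minimal sketch with placeholder values, not markup copied from the site. `Course`, `CollegeOrUniversity`, `provider` and `inLanguage` are real schema.org terms; the function name and every argument value are illustrative.

```python
import json


def course_jsonld(name: str, provider: str, language: str, url: str) -> str:
    """Serialise a minimal schema.org Course as a JSON-LD script element."""
    node = {
        "@context": "https://schema.org",
        "@type": "Course",
        "name": name,
        "inLanguage": language,
        "url": url,
        "provider": {"@type": "CollegeOrUniversity", "name": provider},
    }
    payload = json.dumps(node, ensure_ascii=False, indent=2)
    # Escape "</" so the payload can never close the script element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


print(course_jsonld("Example Programme", "Example School", "en",
                    "https://example.org/programme"))
```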
The schema.org gate is ours, and it does not cheat
Two details are worth being explicit about. First, that schema.org validator is in-house. The vocabulary is published and there is an interactive checker on the web, but nothing you can run unattended in a build the way you can with the HTML validator — so we wrote a purpose-built checker that fetches the official schema.org vocabulary and reproduces, page by page, what the online Schema Markup Validator would show, with no human in the loop. Second, and more important: we never chased a vanity “zero errors, zero warnings” by deleting markup. schema.org is a deliberately open vocabulary, so a property it does not formally declare is a warning, not an error — and the lazy way to silence those is to strip the offending fields, which throws away meaning. Instead, errors block the release outright while warnings are reviewed and kept wherever the markup is genuinely correct and useful. We validate the structure without amputating it.
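The error-versus-warning split can be made concrete with a toy checker. This sketch is far smaller than a real validator: `VOCAB` is a hand-written stand-in for the fetched vocabulary, it ignores type inheritance and value ranges, and treating an unknown or missing `@type` as the blocking error is our assumption about where the line sits.

```python
# Tiny stand-in for the fetched schema.org vocabulary (illustrative subset only).
VOCAB = {
    "Course": {"name", "description", "provider", "inLanguage", "url"},
    "CollegeOrUniversity": {"name", "url", "address"},
}


def check_node(node: dict, path: str = "$"):
    """Return (errors, warnings) for one JSON-LD node and its nested nodes."""
    errors, warnings = [], []
    node_type = node.get("@type")
    if not isinstance(node_type, str) or node_type not in VOCAB:
        errors.append(f"{path}: unknown or missing @type {node_type!r}")
        return errors, warnings
    for key, value in node.items():
        if key.startswith("@"):
            continue
        if key not in VOCAB[node_type]:
            # Open vocabulary: an undeclared property is reported, not rejected.
            warnings.append(f"{path}.{key}: not declared for {node_type}")
        if isinstance(value, dict):
            sub_errors, sub_warnings = check_node(value, f"{path}.{key}")
            errors += sub_errors
            warnings += sub_warnings
    return errors, warnings


def release_allowed(nodes) -> bool:
    """Errors block the release; warnings are left for human review."""
    return all(not check_node(node)[0] for node in nodes)
```

Note that nothing in `release_allowed` looks at warnings. Silencing them would require deleting fields, which is exactly the shortcut the text rules out.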
llms.txt. The DSTI homepage passes all three: a clean 3/3. Google still says it does not use llms.txt for Search, and the category is openly “under development,” so this is no ranking trophy, just the clearest sign yet that building for machine readers before we had to is the direction the platforms are taking. We did not predict that audit. We were simply already passing it.
06 Built not to break
Static does not mean fragile; done properly it means the opposite. The whole site is pre-rendered HTML in object storage, locked so it can only be served through the CDN, with a web application firewall and managed DDoS protection at the edge and DNS in front. There is no application server to fall over at 2 a.m., because there is no application server.
The one genuinely tricky migration was URL behaviour. The previous setup relied on a web server's try_files logic to map clean, extensionless URLs to files; the new edge needed to replicate that with a small CDN function, and to handle the quirk that a locked origin returns a different status for a missing object than people expect. Getting that rewrite logic exactly right — so that every old link resolves and nothing 404s by accident — was most of the three weeks' genuinely fiddly work, alongside a redirect map from the old site's sitemaps to the new structure.
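The shape of that rewrite logic is simple to state, even if getting every edge case right was not. Below is a minimal Python sketch of the decision, written for readability rather than as the CDN function itself. The mapping (`/page` to `/page.html`, `/dir/` to `/dir/index.html`) is an assumed convention, and so is the premise that the locked origin answers 403 for a key that does not exist.

```python
def resolve_object_key(uri: str) -> str:
    """Map a clean request URI to a stored object key, try_files-style."""
    if uri.endswith("/"):
        return uri + "index.html"   # directory-style URL
    last_segment = uri.rsplit("/", 1)[-1]
    if "." in last_segment:
        return uri                  # already names a file, e.g. /img/a.avif
    return uri + ".html"            # extensionless page URL


def public_status(origin_status: int) -> int:
    """Normalise the origin's 'missing object' answer to a conventional 404."""
    # Assumption: the locked origin reports 403 for keys that do not exist.
    return 404 if origin_status == 403 else origin_status
```

Keeping the mapping a pure function of the URI makes it testable against the whole redirect map before anything reaches the edge.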
07 Provably correct: nine gates and a refusal to ship
Here is the part that surprises people most. The system refuses to publish a release that fails its own checks. Not “be careful when you deploy” — a machine that will not let a broken or unvalidated page go live. At its heart, publishing is a finite-state machine.
Nothing partial ever ships: a failed gate returns the system to a retained candidate, and a release can be rolled back to a tagged prior version. The blocking gates, in sequence, are: build (esbuild parses and minifies every external .js and .css, while the HTML minifier runs with inline-JavaScript minification forced off so scripts are never rewritten) → VNU HTML and CSS (output must be empty) → inline-JavaScript validity (every executable <script>, on* handler and javascript: URL must parse) → inline-JavaScript byte-exact preservation across minification → conservative per-page CSS optimisation (only allow-listed rewrites, each page proven equivalent) → JSON-LD / schema.org validity → byte-identical rebuild proof → cross-source version authority (five documents must agree on the active version) → publish, with a do-not-deploy marker and version tagging for rollback.
    # publish(): simplified; every step is a hard gate
    def publish(candidate):
        stage(candidate)  # into a verified temp tree
        for gate in [
            build,               # esbuild parses/minifies every .js and .css; inline JS untouched
            vnu_html, vnu_css,   # W3C/VNU must print nothing, HTML and CSS alike
            inline_js_valid,     # every inline script, on-handler, javascript: URL must parse
            inline_js_preserve,  # inline JS byte-identical, before vs after minification
            css_allowlist,       # per-page CSS pruned; only allow-listed rewrites, proven equivalent
            schemaorg,           # JSON-LD validates against the live vocabulary
            byte_identical,      # published == validated source, to the byte
            version_authority,   # five documents agree on the active version
        ]:
            if not gate(candidate).ok:
                mark("DO_NOT_DEPLOY")
                keep(candidate)  # retained for retry; nothing ships
                return Blocked
        aws_publish(candidate)   # transactional; rollback enabled
        tag_version(candidate)
        return Published
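One of the gates listed above, cross-source version authority, is easy to picture in isolation. The sketch below assumes versions are written as tokens like "V308" and that each authoritative document is available as text. Which five documents are consulted, and how they record the version, is not stated here, so the document names in any call are hypothetical.

```python
import re

VERSION_PATTERN = re.compile(r"\bV(\d+)\b")  # assumes versions look like "V308"


def declared_version(text: str):
    """Return the first version token found in a document, or None."""
    match = VERSION_PATTERN.search(text)
    return match.group(0) if match else None


def version_authority(documents: dict):
    """Pass only if every document declares a version and all of them agree."""
    found = {name: declared_version(text) for name, text in documents.items()}
    distinct = set(found.values())
    ok = None not in distinct and len(distinct) == 1
    return ok, found
```

Returning the per-document findings alongside the verdict means a failed gate can say which document drifted, not merely that one did.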
A gate you run on every release has to be quick, so we accelerated the HTML and CSS validation dramatically — and, crucially, proved the fast path returns the exact same verdicts as the trusted one, even on deliberately broken fixtures and awkward Unicode paths. Speed without that proof would be cheating; speed with it is just good engineering.
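Proving that a fast path agrees with a trusted one is a form of differential testing: run both over the same fixtures and demand identical verdicts. The harness below is a generic sketch of that technique. The two toy validators exist only to make it runnable and have nothing to do with the real HTML checker.

```python
def verdicts_agree(trusted, fast, fixtures: dict) -> list:
    """Run both validators over every fixture and collect any disagreements."""
    mismatches = []
    for name, content in fixtures.items():
        expected, actual = trusted(content), fast(content)
        if expected != actual:
            mismatches.append((name, expected, actual))
    return mismatches


# Toy stand-ins for the real validators, only to make the sketch runnable.
def trusted_validator(html: str) -> list:
    return ["unclosed <p>"] if html.count("<p>") != html.count("</p>") else []


def fast_validator(html: str) -> list:
    opened = closed = 0
    for i in range(len(html)):
        opened += html.startswith("<p>", i)
        closed += html.startswith("</p>", i)
    return ["unclosed <p>"] if opened != closed else []


FIXTURES = {
    "valid.html": "<p>ok</p>",
    "broken.html": "<p>never closed",          # deliberately broken fixture
    "unicode-é/页.html": "<p>accents and CJK</p>",  # awkward path name
}

assert verdicts_agree(trusted_validator, fast_validator, FIXTURES) == []
```

The broken fixture matters as much as the valid ones: a fast path that always answers "no errors" would agree on every clean page and only be caught by input that is supposed to fail.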
08 The whole machine, end to end
Before the “how,” here is the whole thing on one page — every language and tool in the chain that turns a spreadsheet row into a page on the edge. It is deliberately polyglot and vendored: the Python core has zero third-party dependencies, and the heavier tools travel inside the project, pinned, for both Windows and macOS, so any machine can reproduce a build with no global installs.
One Python conductor with no third-party dependencies drives the toolchain — a .NET 8 content-inventory tool, an in-house Python schema.org checker, the Java-based Nu Html Checker (which validates both HTML and CSS), a Node minifier flanked in the same build stage by an inline-JavaScript validator and a conservative per-page CSS optimiser, and an XML/XSD geolocation engine. The output is a fingerprinted static site — every byte accounted for by checksums and a per-file manifest — served from object storage through the CDN. The two legacy pieces still in migration are a WordPress install that contributes images at build time (not at runtime) and a single stateful application form.
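"Every byte accounted for" comes down to a manifest that maps each file to a digest, plus a verifier that recomputes it. Here is a standard-library sketch of that pattern. SHA-256 and the flat path-to-digest dictionary are our assumptions; the actual checksum algorithm and manifest format are not specified above.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict:
    """Map each file's POSIX-style relative path to its SHA-256 digest."""
    manifest = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def verify_manifest(root: Path, manifest: dict) -> list:
    """Return paths that are missing, altered, or absent from the manifest."""
    current = build_manifest(root)
    changed = {p for p in manifest.keys() & current.keys() if manifest[p] != current[p]}
    return sorted(changed | (manifest.keys() ^ current.keys()))
```

An empty list from `verify_manifest` is the whole verdict: nothing added, nothing lost, nothing altered.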
A content engine for everyone, not just engineers
The point of all this discipline is that non-engineers can contribute safely. The editing surface is a spreadsheet; a content-inventory tool reads it; the Python orchestrator drives every step; and the gates above mean a colleague editing a fee or a date cannot accidentally ship something invalid. The loop is closed and self-checking.
09 Dynamic where it counts
A static site does not have to be inert. Where it genuinely helps a visitor, the page adapts — but through a small, validated rules engine rather than ad-hoc scripting. Region-aware routing (for example, sending a South-Asian visitor to the right regional representative) is expressed as an XML ruleset validated against its own XSD before it is allowed to run. The same reflex appears everywhere in the build: the deploy parameters themselves are validated against a schema before any command can act on them, so a malformed deploy target is caught at rest rather than mid-deploy. Verify, then proceed — at the edge as much as in the pipeline.
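The "verify, then proceed" reflex can be shown with a small rules loader. Python's standard library has no XSD validator, so this sketch substitutes hand-written structural checks for schema validation. The ruleset shape, element names, attribute names and contact identifiers are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical ruleset shape; element and attribute names are illustrative.
RULES_XML = """
<rules default="global-admissions">
  <rule region="south-asia" contact="south-asia-representative">
    <country code="IN"/><country code="BD"/><country code="LK"/>
  </rule>
</rules>
"""


def load_rules(xml_text: str) -> tuple:
    """Parse the ruleset and reject it before use if its structure is wrong."""
    root = ET.fromstring(xml_text)
    if root.tag != "rules" or not root.get("default"):
        raise ValueError("root must be <rules default=...>")
    table = {}
    for rule in root:
        if rule.tag != "rule" or not rule.get("contact"):
            raise ValueError("every child must be <rule contact=...>")
        for country in rule:
            code = country.get("code", "")
            if country.tag != "country" or len(code) != 2 or not code.isupper():
                raise ValueError(f"invalid country entry: {code!r}")
            table[code] = rule.get("contact")
    return root.get("default"), table


def route(country_code: str, default: str, table: dict) -> str:
    """Pick the regional contact for a visitor, falling back to the default."""
    return table.get(country_code.upper(), default)
```

The routing function never sees an unvalidated ruleset, because `load_rules` either returns a clean lookup table or raises before anything is served.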
10 Built with AI — and deliberately portable
A lot of this was built with AI assistance, but not in the throwaway, paste-from-a-chatbot way. The principle is simple: the project's memory lives in the project, not in any one conversation or any one model. Every important decision is written into versioned continuity files that travel with the repository — one for content (reference version, translation rules, UI/UX invariants, URLs, media, non-regression decisions), one for tooling (validations, AWS procedures, known incidents). A new collaborator, human or model, starts from those files and the self-checking package, not from someone's chat history.
That is not theoretical: the handover has been exercised across different models, with a second model independently reconstructing the active version and verifying every checksum before proposing a change. By V308 the compact handover was a strict curated allowlist rather than a convenient folder dump: maintained source and output, active ContentInventory, translation authority and locale evidence, tool source, generated package structure, per-file manifest and an independently verified checksum baseline. Tying a school's website to one AI vendor's memory would just be a new kind of lock-in and a new single point of failure — so we designed it out. And picking it up should not cost a colleague a week of setup: a single command checks for the tools the build needs, installs the vendored pieces, and refuses to call itself “ready” until the project's parameters validate against their schema.
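A readiness check of the kind described in the last sentence might look like the sketch below. It is a simplified stand-in: it looks tools up on PATH with `shutil.which`, whereas the real project vendors its heavier tools, and it reduces "validate against the schema" to checking that required keys are present. Tool and parameter names in any call are hypothetical.

```python
import shutil


def missing_tools(required, which=shutil.which) -> list:
    """Return the required executables that cannot be located."""
    return [tool for tool in required if which(tool) is None]


def readiness(required_tools, params: dict, required_params, which=shutil.which):
    """Report 'ready' only when tools are present and parameters are complete."""
    problems = [f"missing tool: {t}" for t in missing_tools(required_tools, which)]
    problems += [f"missing parameter: {k}" for k in required_params if not params.get(k)]
    return (not problems), problems
```

The `which` argument is injectable so the check itself can be tested without depending on what happens to be installed on the machine running the tests.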
11 Why the world, and not the home market
It is fair to ask why a French school pours so much into the world and comparatively little into the market on its doorstep. The honest answer is in the nature of what we built. DSTI is an institution designed, from day one, for the international market: a deliberately mixed faculty of 50+ academics and practitioners; a Direction of Studies that is, in truth, an operations-and-quality-control function of a kind France barely recognises as a discipline, let alone values; hyper-flexible study modes; merit-based scholarships; demanding programmes and assessment; professional certifications; and instruction entirely in English. None of that is shaped for a local catchment.
And for structural reasons — not a question of quality — the French undergraduate and postgraduate markets are largely insensitive to exactly those things. So we do the most French thing imaginable: we export. France is one of the world's great export economies; so are we. We take the best of what France and DSTI have to offer out into the world, and bring quality students in to make the most of it.
12 An honest word: investing through a hard year
Candour, because it is owed. The higher-education market is tough right now, for everyone. We are weathering it better than most, but it would be dishonest to dress it up: student numbers have held steady rather than grown since 2024. In that climate, holding the line is more than many institutions can say — but it is not growth, and everything in this rebuild is a bet that growth returns.
The easy move in a hard year is to cut and wait. We did the opposite — investing in programmes, in people, in faculty, and in international exposure: the partner network, the website, the collaterals. That was possible thanks to the patient backing of our President and shareholders (all private individuals, all engineers and scientists) — support we do not take for granted.
13 What all of it is actually for
The network across 78 countries, the fast pages, the clean markup, the provable releases, the three languages — none of it is the point on its own. The point is what it is for: that a young person, somewhere we have not been yet, with talent and not much else, has a real shot at a serious education, a good profession and a durable career. We are small and young, without a famous name or a vast budget. What we have is the decision to do the best possible job at the highest quality we can reach — and, increasingly, the engineering to prove we did.
The honest summary
A small school out-engineered a problem it could not out-spend: a fully validated, AI-portable static-site build with a dependency-free Python core, serving three languages, fast on a phone, legible to machines, and impossible to ship broken, all in service of earning, one family at a time, the trust to teach. That is the whole post, and the whole point.
Figures are produced from the real build; some technical detail is simplified for a general engineering audience. Performance figures are laboratory measurements from the stated runs, not a substitute for field data.