CV
For a downloadable version, get in touch.
T. Michael Cornelia
Athens, GA · LinkedIn · Contact
Data center operations Director responsible for ~24% of Meta’s GPU fleet and power capacity. 13 years scaling Meta’s fleet from megawatts to gigawatts; deep operational ownership of GPU training and inference platforms including Zion, Grand Teton, SMC, MTIA, and NVIDIA GB200/GB300 GenAI clusters. Founded and led Meta’s data center AI Operations program; drove a 49% reduction in unplanned downtime and interruptions on training infrastructure and 70% on inference infrastructure in year one. Engineer → Director: engineer’s instincts paired with director-level strategic range. Known for building durable, high-performing teams — people follow my leadership across roles, locations, and reorgs. Promoted multiple FTEs into Site Manager and Director-track roles.
In Practice
- Daily user of Claude Code; built and maintain two primary custom agents: EDI (an always-on, Linux-based Chief-of-Staff agent integrated with Meta’s data and tooling) and Glyph (an OpenClaw messenger agent, accessed via mobile, that manages and directs a team of other specialized agents)
- Built a daily operational telemetry layer on the Anthropic API: scheduled pulls from production data sources, agents that flag SLA/KPI misses (red/yellow/green), and real-time SEV alerts piped to my workflow
- Use Claude Code daily for executive workflows: morning briefs, weekly metric reviews, project knowledge graphs, work-product synthesis, and team-level rollups
Selected Impact
- Regional growth: Scaled South region 754 MW → 1,070 MW (+42%) in 18 months
- Reliability: Held South region unavailability below targets on training and inference platforms through a 6x scope increase (2025)
- AI Operations program: Led a 49% reduction in training downtime/interruptions, 70% reduction in inference downtime, 180% improvement in diagnostic accuracy (2022)
- People: Top-quartile manager engagement scores (2025); promoted 3 FTEs to Site Manager (2024-25)
Experience
Director, Global Operations, South Region
Direct all site operations for Meta’s South region: ~24% of the global GPU fleet and of total power capacity. Scaled the region 754 MW → 1,070 MW in 18 months (+42%), including a 500+ MW generative AI training cluster; on path to ~2,500 MW by 2027. Currently supporting the turn-up of Hyperion in Louisiana, Meta’s next-generation campus, planned at 5 GW and a footprint the size of Manhattan at full build-out.
AI & GPU Infrastructure Operations
- Operate GPU training and inference platforms across the South region; established the AI operations knowledge base, dashboards, and weekly leadership cadence adopted as the source of truth across the data center org
- Sponsored an enterprise-wide quality metrics overhaul: built a multi-tier composite scoring framework, retired legacy metrics, and introduced LLM-based quality assessment of repair tickets, adopted globally
- Launched cloud operations pilot across multiple public cloud providers (OCI, GCP), defining the support model for heterogeneous cloud infrastructure
New Region Turn-Up & Commissioning
- Delivered Meta’s first all-Turin server region on schedule, proving an accelerated capacity program targeting an 80% reduction in fulfillment time
- Positioned 2,063 racks at rates up to 750 racks/week, with a single-day record of 200 racks
- Manage continuous turn-up pipeline across liquid-cooled facilities, rapid deployment structures, and leased data centers; 10+ concurrent new builds delivering 150–330 MW per quarter
Organizational Strategy & Workforce Planning
- Authored and globally deployed the Leadership Deployment Model: a 9-month effort with HR and Legal that enables site operations to support the planned 12+ GW fleet without adding regional leadership headcount
- Led convergence of facilities and site operations risk frameworks into a unified Data Center Operations metrics dashboard, the first shared operational data layer between the two orgs
- Co-authored a ring-aligned restructuring proposal that shifted operations from a per-site model to an infrastructure-ring-based model, with a 34% management reduction while scaling to gigawatt-class campuses
Team & Talent
- Built and retained a team of senior operations leaders with multi-year tenure under my leadership; multiple managers and FTEs followed me across roles and locations as the South region grew
- Promoted 3 FTEs into Site Manager roles (2024–25); developed pipeline of next-generation operations leadership
- Sustained top-quartile manager engagement scores (2025) through a period of significant org change and rapid growth
- Recognized internally for talent magnetism — recruiters and adjacent orgs routinely ask to “borrow” my model
Capacity Planning & Risk Management
- Created region-level capacity delivery risk framework combining construction risk with operational signals into a unified executive dashboard
- Standardized capacity engineering processes, launched root cause corrective action (RCCA) framework, and established quarterly quality assessments
- Manage region against rack turn-up SLO (P90 < 5 days), redeployment SLO (P95 < 4 days), and decommission targets with per-site tracking
Government Relations & Community
- Represent Meta in state-level legislative advocacy for data center tax incentive preservation
- Graduate of Leadership North Carolina; maintain statewide network of leaders across public and private sectors
- Manage relationships with government officials, economic development authorities, and community organizations across eight states, with more being added
Director, Site Operations, Stanton Springs, GA
Directed site operations for Meta’s Stanton Springs (Newton County, GA) data center campus, one of the fastest-growing campuses in the fleet.
- Founded Meta’s data center AI Operations program (2022); year one: −49% training interruptions, −70% inference downtime, +180% diagnostic accuracy; established the dashboards, knowledge base, and weekly leadership cadence now used across the data center org
- Delivered the largest single-region capacity increase globally in 2022: 7,005 racks landed and provisioned (32% more than the next closest region), including the fleet’s largest GPU footprint, while maintaining 99.56% server availability
- Co-created the new-hardware introduction process (PVT to MP) originally developed for Grand Teton (OCP H100); still in use today across deployments of AMD Instinct, MTIA, GB200, and GB300
- Co-authored unplanned-downtime alerting strategy; led authoring of the SEV0 incident response plan for Site Managers, institutionalizing operations continuity across data center sites
- Built operational excellence framework (1:1 templates, meeting cadences, analytics SOPs) adopted across all regions globally
- Led insourcing business case analyzing contingent vs. FTE economics across 500+ positions, including staffing models for gigawatt-scale sites through 2030
- Built government and community relationships across Newton County and the state of Georgia
Data Center Operations Manager, Stanton Springs, GA
Managed infrastructure operations teams at Meta’s Stanton Springs campus during a period of rapid expansion and new building commissioning.
- Owned all capacity-related projects across the campus as it went from dirt to provisioning, including new building commissioning and expansion phases
- Directed the turn-up of the largest A100 GPU cluster known to NVIDIA at the time, followed by the largest H100 cluster
- Managed cross-functional coordination across construction, production operations, and headquarters teams for turn-up, turn-down, and retrofit execution
- Developed operational processes and standards subsequently adopted at other sites
- Mentored individual contributors across multiple technical teams
Data Center Operations Manager, Forest City, NC
Led infrastructure teams at Meta’s North Carolina campus during rapid fleet expansion.
- Owned all capacity-related projects across the campus, including commissioning of the site’s third data center building
- Managed cross-functional coordination across construction, production operations, and headquarters teams for turn-up, turn-down, and retrofit execution
- Developed capacity processes adopted fleet-wide; mentored individual contributors across multiple teams
- Built local government and community engagement relationships at city and county levels
Earlier Career
Systems Engineer & Architect, SUM/IT Systems · 2004 – 2013
Designed and deployed systems solutions (Unix, Linux, VMware, Solaris, Windows Server) for SMB customers. Co-created the company’s cloud offering and managed full customer lifecycle from scoping through long-term support.
VP, Operations & Information Systems, The School Box, Inc. · 2000 – 2013
Directed technology operations for a multi-state retailer (~400 employees). Managed all infrastructure, partnered with CEO/CFO on strategy and budgets, and built the organization’s primary technical strategy and knowledge management platform.
Core Competencies
Data Center Operations · Hyperscale Infrastructure · GPU Fleet Operations · AI/ML Training Clusters · New Site Commissioning · Liquid Cooling · Capacity Planning & Delivery · SLO Management · Agentic AI for Operations · Multi-Cloud (OCI, AWS, GCP) · Colocation Management · Workforce Strategy · Organizational Design · Quality Systems · Operational Excellence · Legislative Advocacy · Government Affairs · Community Development
Education & Certifications
Leadership North Carolina, statewide leadership program (graduate)
University of Georgia, Business Management coursework
Red Hat Certified Engineer (RHCE)
Red Hat Certified System Administrator (RHCSA)