MIRACLE

South Korea's post-war transformation is one of the defining growth miracles of the twentieth century. Yet rigorous subnational research on how it unfolded has been held back by the data. Published statistics are too coarse (province-level) or too infrequent (a census every 5–10 years), while the richer municipal records are written in mixed Hangul and Hanja (Chinese characters), fragmented across dozens of provincial archives, and scrambled by repeated administrative boundary changes. MIRACLE is assembling, digitising, and harmonising ~2 million pages of municipal records into the first consistent township-year panel for this era.

~3,500 townships
30 annual panels
100+ variables
~2M pages of archives
3+ archival source types
About MIRACLE

MIRACLE is a multi-year effort to build a comprehensive database that traces South Korea's development path at the township level. The goal is to bring all available subnational resources — administrative records, spatial archives, institutional data — into one place, linked by time-consistent identifiers that track townships through two major boundary reorganisations (1963, 1973). A natural starting point is the municipal statistical yearbooks (시군 통계연보), published annually by county governments, which provide the richest and most consistent administrative data for this period.

Beyond the yearbooks, the ambition extends in two directions. First, within Korea, we plan to incorporate additional administrative sources (expressway construction logs, agricultural extension records, Korea Forest Service archives, colonial-era household registries, and local personnel files) to deepen the panel and to enable research designs that link infrastructure, agricultural modernisation, and environmental policy to local institutional conditions.

Second, across countries, the infrastructure we build is designed to accommodate other growth miracle economies with comparable subnational statistical traditions. If similar municipal records exist for Taiwan, or district-level yearbooks for post-war Japan, they belong in the same framework. The goal is a comparative subnational data platform for studying rapid development wherever it has occurred.

Data sources

MIRACLE draws on multiple administrative source types. Municipal statistical yearbooks form the backbone; additional archival layers extend the platform to forestry, industrial, and institutional data.

  • Municipal Statistical Yearbooks 통계연보 · Core, digitising now
    Township-level demographics, agriculture, industry, infrastructure, public finance, and education. Published annually by county governments, dispersed across provincial archives, and never systematically compiled.
  • Forest Type Maps 산림유형도 · Collected
    Korea Forest Service spatial archives. Shapefiles are available for the 1910 조선임야분포도 (forest distribution map of Korea) and for forest type maps from 1974 onward. Enables study of one of history's largest reforestation programmes.
  • Loan Relation Files 차관관계철 · Planned
    Foreign loan records linking firms to locations and financing sources, for geocoding firms and mapping industrial networks.
  • Household Registries 호적부 · Planned
    Colonial-era and early-Republic household records, a source of pre-treatment institutional measures including clan concentration and land ownership.
  • Agricultural Extension Records · Planned
    Farm-level adoption of high-yield rice varieties and extension programme participation.
  • Expressway Construction Logs · Planned
    Construction timelines and route data for the Gyeongbu Expressway and the subsequent motorway network.
  • Personnel Files · Planned
    Local government personnel records, a source of measures of bureaucratic capacity and institutional quality.


Pipeline

From archive to analysis-ready panel in six steps:

01 Archive discovery & source identification (Done)

Systematic survey of provincial archives, university libraries, and government collections to locate surviving yearbook volumes, mapping what exists, what is missing, and where physical copies are held.
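Mapping what exists and what is missing amounts to an inventory of located volumes by municipality and year, compared against the target window. A minimal sketch, assuming a hypothetical record layout of (municipality, year, holding archive) tuples and the 1960–1989 window from the project citation; the archive names are placeholders:

```python
TARGET_YEARS = range(1960, 1990)  # 1960-1989, the window in the project citation

def coverage_gaps(inventory, municipalities=()):
    """Return {municipality: sorted list of years with no located volume}.

    `inventory` holds (municipality, year, archive) tuples, one per located
    yearbook volume (a hypothetical layout). Municipalities listed in
    `municipalities` are reported even if no volume was found for them.
    """
    held = {muni: set() for muni in municipalities}
    for muni, year, _archive in inventory:
        held.setdefault(muni, set()).add(year)
    return {muni: [y for y in TARGET_YEARS if y not in years] for muni, years in held.items()}

# Hypothetical example: two volumes located for 남해군, none yet for 영주시.
inventory = [("남해군", 1969, "archive A"), ("남해군", 1970, "archive B")]
gaps = coverage_gaps(inventory, municipalities=["남해군", "영주시"])
print(len(gaps["남해군"]), len(gaps["영주시"]))  # 28 30
```

Passing the full municipality list matters: a municipality with no surviving volumes would otherwise be silently absent from the gap report rather than flagged as entirely missing.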

02 Outreach & scanning (Done)

Building partnerships with municipalities, counties, and provincial archives. Physical scanning of bound volumes into high-resolution page images — the raw input for digitisation.

03 AI-OCR for mixed scripts (Current focus)

Custom pipeline fine-tuned for mixed Hangul/Hanja archival tables, reaching 87% accuracy in the pilot against a target of 92–95%. This is what makes the project feasible: these documents were previously unusable at scale.
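The page does not say how the 87% figure is measured. A common convention for OCR is character accuracy, 1 − CER, where the character error rate (CER) is the edit distance between OCR output and a hand-keyed transcription divided by the transcription's length. A minimal sketch under that assumption (the metric choice is ours, not necessarily the project's):

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def char_accuracy(ocr: str, truth: str) -> float:
    """Character accuracy = 1 - CER, floored at 0 because CER can exceed 1."""
    if not truth:
        raise ValueError("ground truth must be non-empty")
    cer = edit_distance(ocr, truth) / len(truth)
    return max(0.0, 1.0 - cer)

# One misread syllable ('설' for '삼') in a seven-character cell string.
print(round(char_accuracy("설동12785", "삼동12785"), 3))  # 0.857
```

For a page-level or corpus-level figure, summing edit distances and transcription lengths before dividing avoids giving short cells the same weight as long ones.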

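Structured output: 경지면적현황 (cultivated land area), 남해군 (Namhae-gun), 1969. Unit: 단보 (danbo).
읍면 (township) | 합계 (total) | 논 (답) paddy: 소계 (subtotal) | 논: 1모작 (single-crop) | 논: 2모작 (double-crop) | 밭 (전) dry field | Check
남해 | 8,054 | 5,849 | 1,230 | 4,619 | 2,205 | ✓ balanced
이동 | 10,993 | 7,544 | 1,175 | 6,369 | 3,449 | ✓ balanced
삼동 | 12,785 | 7,349 | 1,239 | 6,110 | 5,436 | ✓ balanced
남면 | 11,470 | 6,012 | 857 | 5,155 | 5,458 | ✓ balanced
고현 | 8,310 | 5,680 | 743 | 4,937 | 2,630 | ✓ balanced
창선 | 13,173 | 7,901 | 2,311 | 5,590 | 5,272 | ✓ balanced
All township names correct. Nested headers preserved. Row-level cross-validation passed.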
Source: 경지면적현황, 남해군 통계연보 (1969), a mixed Hangul/Hanja table with vertical headers. OCR output without layout parsing, with errors annotated:
  農 業 ~22 ← Page number confusion
  2 경 지 면 적 현 황 ← Vertical text → individual chars
  (단위 :단보)
  구분 합 게 등 게 답 1포작 2포작 전 미 합게 ← Nested headers flattened
  면별 8,054 5,849 1,230 4,619 2,205 444 ← Row-column mapping unclear
  남 해 10,993 7,544 1,175 6,369 3,449 890 ← Numbers may be misaligned
  설 동 11,470 6,012 857 5,155 5,458 841 ← '삼동' → '설동' misrecognised
  남 9,553 5,891 1,101 4,790 3,662 522 ← Township name truncated
  저 현 9,564 9,705 550 6,145 2,859 955 ← '고현' split across lines
  창 13,173 7,901 2,311 5,590 5,272 ← '창선' → '창' only
⚠ Vertical headers completely failed. Table structure unrecoverable.
Failure modes:
  • Layout parsing failure: nested headers flattened, so the mapping from columns to data is lost.
  • Vertical text failure: vertical Korean split into individual characters.
  • Cell mapping errors: numbers detached from their columns.
Same source, processed with context-aware layout parsing and structured output:
  • Step 1, Layout: detect table regions, header hierarchy, and vertical text.
  • Step 2, Context OCR: the '경지면적' (cultivated area) context corrects '설동' → '삼동'.
  • Step 3, Structure: nested headers → hierarchical CSV.
  • Step 4, Validate: row totals = column totals; cross-referencing.
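The page does not spell out how table context fixes '설동' → '삼동'. One plausible ingredient, sketched here as an assumption rather than the project's method, is to constrain OCR'd names to a lexicon of known township names and match at the level of Hangul jamo (initial consonant, vowel, final consonant) instead of whole syllables. At syllable level '설동' is one edit away from both '이동' and '삼동'; at jamo level '삼동' is strictly closer. The lexicon below is just the six townships in the 1969 Namhae-gun table, not a full gazetteer.

```python
# Precomposed Hangul block U+AC00..U+D7A3 = 19 leads x 21 vowels x 28 tails.
HANGUL_BASE, N_LEADS, N_VOWELS, N_TAILS = 0xAC00, 19, 21, 28

def to_jamo(text):
    """Decompose Hangul syllables into (kind, index) jamo tokens; skip whitespace."""
    tokens = []
    for ch in text:
        code = ord(ch) - HANGUL_BASE
        if 0 <= code < N_LEADS * N_VOWELS * N_TAILS:
            lead, rest = divmod(code, N_VOWELS * N_TAILS)
            vowel, tail = divmod(rest, N_TAILS)
            tokens += [("L", lead), ("V", vowel)]
            if tail:  # tail index 0 means "no final consonant"
                tokens.append(("T", tail))
        elif not ch.isspace():
            tokens.append(("X", ch))  # keep non-Hangul characters as-is
    return tokens

def levenshtein(a, b):
    """Edit distance over any two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def correct_name(ocr_name, lexicon):
    """Return the lexicon entry closest to the OCR output in jamo edit distance."""
    target = to_jamo(ocr_name)
    return min(lexicon, key=lambda name: levenshtein(target, to_jamo(name)))

# Hypothetical lexicon: the six 남해군 townships in the structured output table.
NAMHAE = ["남해", "이동", "삼동", "남면", "고현", "창선"]
print(levenshtein("설동", "이동"), levenshtein("설동", "삼동"))  # 1 1 (syllable-level tie)
print(correct_name("설 동", NAMHAE))  # 삼동
print(correct_name("저 현", NAMHAE))  # 고현
```

Edit distance alone cannot reliably resolve truncations such as '남', which could be either 남해 or 남면; that is where row position and the numeric checks have to help.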

See structured output table above.
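In the 1969 Namhae-gun table, row-level validation reduces to two accounting identities: single-crop plus double-crop paddy equals the paddy subtotal, and paddy plus dry field equals the total. A minimal sketch using the six rows of that table (the field names are hypothetical):

```python
from typing import NamedTuple

class AreaRow(NamedTuple):
    township: str
    total: int         # 합계
    paddy: int         # 논 (답) 소계
    paddy_single: int  # 1모작
    paddy_double: int  # 2모작
    dry: int           # 밭 (전)

ROWS = [
    AreaRow("남해", 8054, 5849, 1230, 4619, 2205),
    AreaRow("이동", 10993, 7544, 1175, 6369, 3449),
    AreaRow("삼동", 12785, 7349, 1239, 6110, 5436),
    AreaRow("남면", 11470, 6012, 857, 5155, 5458),
    AreaRow("고현", 8310, 5680, 743, 4937, 2630),
    AreaRow("창선", 13173, 7901, 2311, 5590, 5272),
]

def row_errors(row: AreaRow) -> list[str]:
    """Return the accounting identities this row violates (empty list = balanced)."""
    errors = []
    if row.paddy_single + row.paddy_double != row.paddy:
        errors.append("paddy subtotal != single-crop + double-crop")
    if row.paddy + row.dry != row.total:
        errors.append("total != paddy + dry field")
    return errors

for row in ROWS:
    print(row.township, "; ".join(row_errors(row)) or "balanced")

# A single misread digit (2,205 read as 2,250) breaks an identity and flags the row.
bad = ROWS[0]._replace(dry=2250)
print(row_errors(bad))  # ['total != paddy + dry field']
```

Where a yearbook prints a county total row, the same check can be run down each column by comparing the sum of township rows with the printed total.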

04 Variable harmonisation (Current focus)

Definitions, units, and table structures changed across editions and municipalities. We build crosswalks reconciling these into consistent time series.
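A crosswalk of this kind maps each edition's column label to a harmonised variable name plus a unit conversion. A minimal sketch, assuming a hypothetical crosswalk keyed on (year range, source label); the labels, year ranges, and the hectare-denominated edition are illustrative. The one real fact it relies on is the traditional area unit used in the 1969 Namhae table: 1 단보 (danbo) = 300 pyeong and 1 pyeong = 400/121 m², so 1 danbo ≈ 991.7 m² ≈ 0.0992 ha.

```python
from fractions import Fraction

# Traditional area units, kept exact: 1 pyeong = 400/121 m^2, 1 danbo = 300 pyeong.
PYEONG_M2 = Fraction(400, 121)
DANBO_M2 = 300 * PYEONG_M2  # ~991.74 m^2
HA_PER_UNIT = {"danbo": DANBO_M2 / 10_000, "ha": Fraction(1)}

# Hypothetical crosswalk: (first_year, last_year, source label) -> (variable, unit).
# Labels and year ranges are illustrative, not the actual yearbook editions.
CROSSWALK = {
    (1960, 1975, "논 (답) 소계"): ("paddy_ha", "danbo"),
    (1976, 1989, "논"): ("paddy_ha", "ha"),
}

def harmonise(year, label, value):
    """Map one yearbook cell to (harmonised variable name, value in hectares)."""
    for (first, last, source_label), (variable, unit) in CROSSWALK.items():
        if first <= year <= last and source_label == label:
            return variable, float(value * HA_PER_UNIT[unit])
    raise KeyError(f"no crosswalk entry for {label!r} in {year}")

# 남해 paddy subtotal from the 1969 table: 5,849 danbo, roughly 580.07 ha.
print(harmonise(1969, "논 (답) 소계", 5849))
```

Keeping the conversion factors as exact fractions and converting to float only at the end avoids accumulating rounding error when the same factor is applied across thousands of cells.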

05 Boundary concordances (Pilot complete)

Two major reorganisations (1963, 1973) plus dozens of smaller changes. We construct time-consistent miracle_id identifiers.
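A common way to build time-consistent units is to map each historical township onto a stable identifier, splitting its counts by weights where it was later divided. A minimal sketch, assuming a hypothetical concordance of (year range, source township code, stable id, weight) rows whose weights sum to 1 for each source township; the codes are made up, and the project's actual weighting rules (population versus area shares, for example) are not described here.

```python
from collections import defaultdict

# Hypothetical concordance rows: (first_year, last_year, source_code, stable_id, weight).
CONCORDANCE = [
    (1960, 1972, "T-001", "M-A", 1.0),
    (1960, 1972, "T-002", "M-A", 0.4),  # T-002 is split between
    (1960, 1972, "T-002", "M-B", 0.6),  # two later units
    (1973, 1989, "T-101", "M-A", 1.0),
    (1973, 1989, "T-102", "M-B", 1.0),
]

def check_weights():
    """Return {(first, last, source): weight sum} for entries not summing to 1."""
    totals = defaultdict(float)
    for first, last, src, _stable, w in CONCORDANCE:
        totals[(first, last, src)] += w
    return {k: v for k, v in totals.items() if abs(v - 1.0) > 1e-9}

def to_consistent_units(year, counts):
    """Re-aggregate {source_code: count} for one year onto stable units."""
    out = defaultdict(float)
    for first, last, src, stable, w in CONCORDANCE:
        if first <= year <= last and src in counts:
            out[stable] += w * counts[src]
    return dict(out)

# M-A receives all of T-001 plus 40% of T-002; M-B receives the other 60%.
print(to_consistent_units(1970, {"T-001": 1000, "T-002": 500}))
print(to_consistent_units(1975, {"T-101": 1300, "T-102": 400}))
```

Weighting like this only makes sense for additive counts such as population or area; rates and averages need to be rebuilt from their re-aggregated numerators and denominators.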

06 Geocoding & GIS (Pilot complete)

Every township linked to satellite, elevation, slope, soil, and transport network data. 196 Namhae-gun villages fully geocoded.
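Once townships have coordinates, linking them to a transport network often starts with a distance measure. A minimal sketch, assuming township centroids and a route stored as a hypothetical list of (lat, lon) vertices; it uses the haversine great-circle distance with a mean Earth radius of 6,371 km and measures to the nearest vertex only, which gives an upper bound on the distance to the line itself. The coordinates are placeholders, not real locations.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def distance_to_route_km(point, route):
    """Distance from a centroid to the nearest route vertex (upper bound to the line)."""
    lat, lon = point
    return min(haversine_km(lat, lon, rlat, rlon) for rlat, rlon in route)

# Placeholder coordinates for illustration only.
route = [(36.0, 128.0), (36.0, 128.5), (36.0, 129.0)]
print(round(distance_to_route_km((36.5, 128.5), route), 1))  # 55.6 (0.5 degrees of latitude)
```

For sparsely digitised routes, densifying the vertices or projecting to a planar coordinate system and measuring point-to-segment distance tightens the bound.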


Output

The dataset is organised into modules by domain, each a flat township-year panel. Modules can be merged using the Core Keys. Files are provided in CSV and Stata formats, with a full codebook and variable documentation.
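Merging township-year modules means joining rows that share the same township identifier and year. A minimal sketch with Python's standard csv module, assuming each module file has miracle_id and year columns with at most one row per pair; the file contents are hypothetical, with column names borrowed from the illustrative example.

```python
import csv
import io

def load_module(text):
    """Read a module CSV into {(miracle_id, year): row}, rejecting duplicate keys."""
    rows = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = (row["miracle_id"], int(row["year"]))
        if key in rows:
            raise ValueError(f"duplicate key {key}")
        rows[key] = row
    return rows

def merge_modules(left, right):
    """Inner-join two modules on (miracle_id, year); right-hand values win on shared columns."""
    return {key: {**left[key], **right[key]} for key in left.keys() & right.keys()}

# Hypothetical module files.
demographics = "miracle_id,year,pop,hh\nKR-48-840-010,1970,28412,5680\nKR-48-840-010,1975,25891,5320\n"
agriculture = "miracle_id,year,paddy_ha\nKR-48-840-010,1970,1245\n"
merged = merge_modules(load_module(demographics), load_module(agriculture))
print(sorted(merged))  # only the 1970 key appears in both modules
```

With pandas, `DataFrame.merge(other, on=["miracle_id", "year"], validate="one_to_one")` performs the same inner join and raises if either side has duplicate keys; in Stata, `merge 1:1 miracle_id year using <file>` is the equivalent.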

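miracle_id | year | prov | muni | twp | pop | hh | paddy_ha | schools | road_km
KR-48-840-010 | 1970 | 경남 | 남해군 | 남해읍 | 28,412 | 5,680 | 1,245 | 7 | 23.4
KR-48-840-010 | 1975 | 경남 | 남해군 | 남해읍 | 25,891 | 5,320 | 1,198 | 8 | 31.7
KR-48-840-010 | 1980 | 경남 | 남해군 | 남해읍 | 22,105 | 5,010 | 1,152 | 8 | 38.2
KR-47-720-030 | 1970 | 경북 | 영주시 | 풍기읍 | 31,550 | 6,140 | 1,870 | 9 | 18.6
Illustrative example only; the pilot data release is planned for late 2026.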
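The example identifiers appear to follow a KR-<province>-<municipality>-<township> pattern with 2-, 3-, and 3-digit numeric parts. The full scheme is not documented here, so this minimal sketch treats that pattern as an assumption and simply validates and splits IDs:

```python
import re
from typing import NamedTuple

# Assumed pattern, inferred only from the illustrative IDs above.
MIRACLE_ID = re.compile(r"KR-(\d{2})-(\d{3})-(\d{3})")

class MiracleId(NamedTuple):
    province: str      # kept as strings to preserve leading zeros
    municipality: str
    township: str

def parse_miracle_id(text):
    """Split an ID into its parts, raising ValueError if it does not match."""
    m = MIRACLE_ID.fullmatch(text)
    if m is None:
        raise ValueError(f"not a MIRACLE-style id: {text!r}")
    return MiracleId(*m.groups())

print(parse_miracle_id("KR-48-840-010"))
# MiracleId(province='48', municipality='840', township='010')
```

Using `fullmatch` rather than `match` rejects strings with trailing characters, such as an ID fused with the year column in a badly exported file.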
Module | Key variables | Description | ETA
Core Keys | miracle_id · province · municipality · township · concordances | Geographic identifiers and boundary concordances across the 1963/1973 reorganisations. | 2026
Demographics | population · households · age structure | Population counts, household numbers, demographic composition. | 2026
Agriculture | paddy area · crop output · livestock | Cultivated area, output (harmonised to metric units), livestock. | 2026
Industry | establishments · employment · output | Industrial establishments, manufacturing employment, sectoral output. | 2027
Infrastructure | roads · electricity · water · telecom | Road length, electrification, public utilities. | 2027
Public Finance | revenues · expenditures · transfers | Municipal revenue/expenditure, central transfers, fiscal capacity. | 2027
Education | schools · enrolment · teachers | School counts, enrolment, teachers, educational infrastructure. | 2027
Geospatial | shapefiles · centroids · boundaries | GIS boundary files with consistent township geometries. | 2027
Institutions | clan concentration · bureaucratic capacity | Pre-treatment institutional measures from 1930 registries and personnel files. | 2028
Public data explorer: an interactive dashboard for browsing county-level data, in development.
Pilot release (late 2026): townships in the Gyeongbu Expressway corridor (~400 townships), covering the Core Keys, Demographics, and Agriculture modules, in CSV and Stata formats. Early access is available on request.
Research using MIRACLE

Work in progress and working papers using MIRACLE data:

If you are using or interested in using MIRACLE data, we would like to hear from you. Get in touch.

Current status

Digitisation proceeds province by province, constrained by the uneven survival of physical yearbooks across Korea's provincial archives.

Coverage map by province (경기, 강원, 충북, 충남, 전북, 전남, 경북, 경남, 제주, 서울, 부산), based on administrative boundaries. Status categories: pilot complete, digitising, sources located, planned. 남해군 (Namhae-gun) pilot: 196 villages geocoded.

Last updated March 2026

Interactive coverage map: township-level digitisation progress across 191 municipalities.


Timeline
2023–24
Done
Source identification. AI-OCR pipeline development. 196 villages geocoded in Namhae-gun. Partnerships with KDI and Sogang.
2025
Done
Systematic digitisation. OCR fine-tuning. Variable harmonisation. GIS boundary reconciliation.
2026
Active
Pilot release: Gyeongbu Expressway corridor townships. Core Keys and initial domain modules.
2027–28
Planned
Full national coverage. Additional archival sources. Expansion to other growth miracle economies.
Team
Principal Investigator

BooKang Seol

설북강
Postdoctoral Researcher, LSE
bookangseol.com
Co-Investigator

Changkeun Lee

이창근
Korea Development Institute (KDI)
Co-Investigator

Hyunjoo Yang

양현주
Dept. of Economics, Sogang University

Hiring research assistants for 2026–27. Get in touch.

Research Assistant (to be recruited): OCR pipeline & quality validation
Research Assistant (to be recruited): GIS & geocoding
Research Assistant (to be recruited): Variable harmonisation

Partners

KDI: Korea Development Institute
LSE: London School of Economics
Sogang: Sogang University
STEG: Structural Transformation & Economic Growth

For early access, collaboration, or questions, get in touch.

Seol, BooKang, Changkeun Lee, and Hyunjoo Yang. "MIRACLE: Subnational Economic Data for South Korea's Developmental Period, 1960–1989." London School of Economics, 2026.

@techreport{seol2026miracle,
  author      = {Seol, BooKang and Lee, Changkeun and Yang, Hyunjoo},
  title       = {{MIRACLE}: Subnational Economic Data for South Korea's Developmental Period, 1960--1989},
  institution = {London School of Economics},
  year        = {2026}
}