Name: Vocabcord CEFR Phrase Corpus
Creator: Vocabcord
License: https://creativecommons.org/licenses/by-sa/4.0/

Vocabcord ships a single curated corpus of 33,233 phrases: six learner languages paired with English, plus a monolingual English track, graded across the six CEFR levels and grouped by everyday theme. Every phrase carries a translation, an in-context example, and native-speaker audio. Here is the full breakdown.

33,233 phrases shipping today, every one recorded by a native speaker

What's in the corpus

The corpus covers Spanish, French, German, Italian, Portuguese, and Polish, each paired with English in both directions, plus a monolingual English track for advanced learners. Phrases are graded across the six CEFR levels (A1 through C2) and sorted into everyday themes such as greetings, food, travel, work, health, and numbers.

The counts below tally every phrase entry that ships in the app. Each language pair is delivered in both translation directions (for example Spanish→English and English→Spanish), so a phrase taught both ways is counted once for each direction in which it appears.
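To make the counting rule concrete, here is a minimal sketch in Python. The entry layout, the helper name learner_language, and the sample rows are all assumptions made for illustration; they are not the app's data model and the rows are not taken from the corpus.

```python
from collections import Counter

# Hypothetical entries: (source language, target language, phrase).
# Illustrative rows only, not real corpus data.
entries = [
    ("Spanish", "English", "buenos días"),
    ("English", "Spanish", "good morning"),
    ("French", "English", "bonjour"),
    ("English", "English", "nevertheless"),  # assumed shape of a monolingual entry
]


def learner_language(source: str, target: str) -> str:
    """Return the non-English side of a pair, or the monolingual English label."""
    if source == "English" and target == "English":
        return "English (monolingual track)"
    return target if source == "English" else source


# Every entry counts once, so a phrase taught in both directions adds two
# to its language's tally.
counts = Counter(learner_language(src, tgt) for src, tgt, _ in entries)

for language, n in counts.items():
    print(f"{language}\t{n}")
```

In this toy sample Spanish is tallied twice (once per direction), while French and the monolingual English entry are each tallied once.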

By CEFR level

CEFR level	Phrases	Share
A1 — Beginner	11,238	33.8%
A2 — Elementary	7,965	24.0%
B1 — Intermediate	6,435	19.4%
B2 — Upper-intermediate	1,300	3.9%
C1 — Advanced	1,300	3.9%
C2 — Proficient	4,995	15.0%
Total	33,233	100%

The corpus is front-loaded by design: 57.8% of it (19,203 phrases) sits at A1–A2, where a beginner’s first plug-ins do the most work. C2 is the deepest of the upper levels (4,995 phrases, against 1,300 each at B2 and C1) because advanced vocabulary spans the widest range of topics.
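The shares in the table can be recomputed from the phrase counts. The sketch below assumes only the counts listed above; the variable names are illustrative.

```python
# Phrase counts per CEFR level, copied from the table above.
cefr_counts = {
    "A1": 11238,
    "A2": 7965,
    "B1": 6435,
    "B2": 1300,
    "C1": 1300,
    "C2": 4995,
}

total = sum(cefr_counts.values())

# Share of each level as a percentage of the corpus, rounded to one decimal.
shares = {
    level: round(100 * count / total, 1)
    for level, count in cefr_counts.items()
}

# Combined share of the two beginner levels.
a1_a2_share = round(100 * (cefr_counts["A1"] + cefr_counts["A2"]) / total, 1)

for level, count in cefr_counts.items():
    print(f"{level}\t{count:,}\t{shares[level]}%")
print(f"Total\t{total:,}")
print(f"A1+A2\t{a1_a2_share}%")
```

The combined A1 and A2 figure comes out at 57.8%, which is the basis for the "more than half" claim.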

By language

Language (paired with English)	Phrases
Portuguese	5,960
German	5,956
Italian	5,956
Polish	5,173
Spanish	4,710
French	4,708
English (monolingual track)	770
Total	33,233
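As a consistency check, the per-language counts and the per-level counts should add up to the same total. A minimal sketch, assuming only the figures from the two tables above (variable names are illustrative):

```python
# Phrase counts per language, copied from the "By language" table.
language_counts = {
    "Portuguese": 5960,
    "German": 5956,
    "Italian": 5956,
    "Polish": 5173,
    "Spanish": 4710,
    "French": 4708,
    "English (monolingual track)": 770,
}

# Phrase counts per CEFR level, copied from the "By CEFR level" table.
cefr_counts = {
    "A1": 11238,
    "A2": 7965,
    "B1": 6435,
    "B2": 1300,
    "C1": 1300,
    "C2": 4995,
}

language_total = sum(language_counts.values())
cefr_total = sum(cefr_counts.values())

# Entries in the six paired languages, excluding the monolingual track.
paired_total = language_total - language_counts["English (monolingual track)"]

print(f"By language:   {language_total:,}")
print(f"By CEFR level: {cefr_total:,}")
print(f"Paired only:   {paired_total:,}")
```

Both tables sum to 33,233, of which 32,463 entries belong to the six paired languages.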

Sources, grading, and licensing

The proficiency levels follow the Common European Framework of Reference for Languages (CEFR), the A1 to C2 standard developed and maintained by the Council of Europe and used across European language education.

The English frequency backbone is the New General Service List (NGSL 1.2) by Charles Browne, Brent Culligan, and Joseph Phillips, derived from the Cambridge English Corpus and published under a Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license. Vocabcord’s derived vocabulary lists are shared under the same license. The audio recordings, the app, and its software are separate, original works under their own license.

For the full picture of how phrases are chosen, graded, and recorded, read the methodology page.

Corrections

If a count looks off, or you spot a phrase, translation, or level that needs a fix, tell us on the support page. Real corrections from learners keep the corpus honest.
