Vocabcord ships a single curated corpus of 33,233 phrases: six learner languages paired with English, plus a monolingual English track, graded across the six CEFR levels and grouped by everyday theme. Every phrase carries a translation, an in-context example, and native-speaker audio. Here is the full breakdown.
What's in the corpus
The corpus covers Spanish, French, German, Italian, Portuguese, and Polish, each paired with English in both directions, plus a monolingual English track for advanced learners. Phrases are graded across the six CEFR levels (A1 through C2) and sorted into everyday themes such as greetings, food, travel, work, health, and numbers.
The counts below tally every phrase entry that ships in the app. Each language pair is delivered in both translation directions (for example Spanish→English and English→Spanish), so a phrase taught both ways is counted twice, once per direction.
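As a rough illustration of that counting rule, here is a minimal Python sketch. The entry layout (source language, target language, phrase), the language codes, and the `language_of` helper are all hypothetical, chosen only for this example; they are not the app's actual data format.

```python
from collections import Counter

# Hypothetical entries: (source language, target language, phrase).
# The same greeting appears once per direction, so it contributes two entries.
entries = [
    ("es", "en", "buenos días"),           # Spanish→English
    ("en", "es", "good morning"),          # English→Spanish
    ("es", "en", "la cuenta, por favor"),  # taught in one direction only
]


def language_of(entry):
    """Return the learner language an entry belongs to.

    For a paired entry this is whichever side is not English; a
    monolingual English entry ("en", "en", ...) maps to "en".
    """
    source, target, _phrase = entry
    return target if source == "en" else source


# Every entry counts once, so a phrase taught both ways adds two to its language.
per_language = Counter(language_of(e) for e in entries)
total = len(entries)

print(per_language["es"], total)
```

In this sketch all three entries are tallied under Spanish, which is how a by-language total can exceed the number of distinct phrases.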
By CEFR level
| CEFR level | Phrases | Share |
|---|---|---|
| A1 — Beginner | 11,238 | 33.8% |
| A2 — Elementary | 7,965 | 24.0% |
| B1 — Intermediate | 6,435 | 19.4% |
| B2 — Upper-intermediate | 1,300 | 3.9% |
| C1 — Advanced | 1,300 | 3.9% |
| C2 — Proficient | 4,995 | 15.0% |
| Total | 33,233 | 100% |
The corpus is front-loaded by design: 57.8% of it sits at A1–A2, where a beginner’s first plug-ins do the most work. B2 and C1 are the thinnest levels, at 1,300 phrases each. C2 is deep because advanced vocabulary spans the widest range of topics.
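The shares in the table and the combined A1–A2 figure can be rechecked from the raw counts. The short Python sketch below copies the counts from the table; the variable names are ours, and rounding to one decimal place is an assumption about how the Share column was produced.

```python
# Phrase counts per CEFR level, copied from the table above.
counts = {
    "A1": 11_238,
    "A2": 7_965,
    "B1": 6_435,
    "B2": 1_300,
    "C1": 1_300,
    "C2": 4_995,
}

total = sum(counts.values())

# Share of the corpus at each level, as a percentage rounded to one decimal
# (assumed to match the rounding used in the Share column).
shares = {level: round(100 * n / total, 1) for level, n in counts.items()}

# Combined share of the two beginner levels.
beginner_share = round(100 * (counts["A1"] + counts["A2"]) / total, 1)

print(total)           # 33233
print(shares)
print(beginner_share)  # 57.8
```

The six level counts sum to the stated 33,233, and A1 plus A2 together come to 57.8% of the corpus.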
By language
| Language (paired with English) | Phrases |
|---|---|
| Portuguese | 5,960 |
| German | 5,956 |
| Italian | 5,956 |
| Polish | 5,173 |
| Spanish | 4,710 |
| French | 4,708 |
| English (monolingual track) | 770 |
| Total | 33,233 |
Sources, grading, and licensing
The proficiency levels follow the Common European Framework of Reference for Languages (CEFR), the A1 to C2 standard developed and maintained by the Council of Europe and used across European language education.
The English frequency backbone is the New General Service List (NGSL 1.2) by Charles Browne, Brent Culligan, and Joseph Phillips, derived from the Cambridge English Corpus and published under a Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license. Vocabcord’s derived vocabulary lists are shared under the same license. The audio recordings, the app, and its software are separate, original works under their own license.
For the full picture of how phrases are chosen, graded, and recorded, read the methodology page.
Corrections
If a count looks off, or you spot a phrase, translation, or level that needs a fix, tell us on the support page. Real corrections from learners keep the corpus honest.