Vocabcord vocabulary, by the numbers

Vocabcord ships a single curated corpus of 33,233 phrases: six learner languages paired with English, plus a monolingual English track, graded across the six CEFR levels and grouped by everyday theme. Every phrase carries a translation, an in-context example, and native-speaker audio. Here is the full breakdown.

33,233 phrases shipping today, every one recorded by a native speaker

What's in the corpus

The corpus covers Spanish, French, German, Italian, Portuguese, and Polish, each paired with English in both directions, plus a monolingual English track for advanced learners. Phrases are graded across the six CEFR levels (A1 through C2) and sorted into everyday themes such as greetings, food, travel, work, health, and numbers.

The counts below tally every phrase entry that ships in the app. Each language pair is delivered in both translation directions (for example, Spanish→English and English→Spanish), so a phrase taught both ways is counted once for each direction in which it appears.
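To make the counting rule concrete, here is a minimal Python sketch. The entry format (source language, target language, phrase), the language codes, and the function names are assumptions made for illustration only; they do not describe Vocabcord's actual data model.

```python
from collections import Counter

ENGLISH = "en"

# Hypothetical entries: (source language, target language, phrase).
# "good morning" is taught both ways, so it appears once per direction.
entries = [
    ("es", "en", "buenos días"),
    ("en", "es", "good morning"),
    ("es", "en", "gracias"),
    ("en", "en", "nevertheless"),  # monolingual English track
]

def learner_language(source, target):
    """Return the non-English side of a pair, or English for the monolingual track."""
    if source != ENGLISH:
        return source
    return target

def count_by_language(entries):
    """Tally every entry under its learner language, one count per direction."""
    counts = Counter()
    for source, target, _phrase in entries:
        counts[learner_language(source, target)] += 1
    return counts

counts = count_by_language(entries)
print(counts["es"])  # 3: two directions of one phrase, plus one single-direction phrase
print(counts["en"])  # 1: the monolingual entry
```

Under this rule a per-language figure is a count of shipped entries, not of distinct phrases.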

By CEFR level

| CEFR level | Phrases | Share |
|---|---|---|
| A1 (Beginner) | 11,238 | 33.8% |
| A2 (Elementary) | 7,965 | 24.0% |
| B1 (Intermediate) | 6,435 | 19.4% |
| B2 (Upper-intermediate) | 1,300 | 3.9% |
| C1 (Advanced) | 1,300 | 3.9% |
| C2 (Proficient) | 4,995 | 15.0% |
| Total | 33,233 | 100% |

The corpus is front-loaded by design: about 58% of it (19,203 of 33,233 phrases) sits at A1–A2, where a beginner’s first plug-ins do the most work. C2 is deep because advanced vocabulary spans the widest range of topics.
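The shares in the table, and the claim that more than half the corpus sits at A1–A2, can be rechecked from the raw counts. A minimal Python sketch follows, with the counts copied from the table above; the variable and function names are illustrative and not part of any Vocabcord tooling.

```python
# Phrase counts per CEFR level, copied from the table above.
CEFR_COUNTS = {
    "A1": 11_238,
    "A2": 7_965,
    "B1": 6_435,
    "B2": 1_300,
    "C1": 1_300,
    "C2": 4_995,
}

def shares(counts):
    """Return each level's share of the total, as a percentage rounded to one decimal."""
    total = sum(counts.values())
    return {level: round(100 * n / total, 1) for level, n in counts.items()}

total = sum(CEFR_COUNTS.values())
level_shares = shares(CEFR_COUNTS)

# Combined A1 + A2 share, the basis for the "front-loaded" description.
beginner_share = 100 * (CEFR_COUNTS["A1"] + CEFR_COUNTS["A2"]) / total

print(total)                     # 33233
print(level_shares["A1"])        # 33.8
print(round(beginner_share, 1))  # 57.8
```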

By language

| Language (paired with English) | Phrases |
|---|---|
| Portuguese | 5,960 |
| German | 5,956 |
| Italian | 5,956 |
| Polish | 5,173 |
| Spanish | 4,710 |
| French | 4,708 |
| English (monolingual track) | 770 |
| Total | 33,233 |
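The per-language counts should add up to the same total as the CEFR breakdown. A short Python sketch of that reconciliation, using the figures from the table above (the names are illustrative):

```python
# Phrase counts per language, copied from the table above.
LANGUAGE_COUNTS = {
    "Portuguese": 5_960,
    "German": 5_956,
    "Italian": 5_956,
    "Polish": 5_173,
    "Spanish": 4_710,
    "French": 4_708,
    "English (monolingual track)": 770,
}
CORPUS_TOTAL = 33_233  # headline figure for the whole corpus

language_total = sum(LANGUAGE_COUNTS.values())

# Entries in the six paired languages, excluding the monolingual track.
paired_total = language_total - LANGUAGE_COUNTS["English (monolingual track)"]

print(language_total == CORPUS_TOTAL)  # True
print(paired_total)                    # 32463
```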

Sources, grading, and licensing

The proficiency levels follow the Common European Framework of Reference for Languages (CEFR), the A1 to C2 standard developed and maintained by the Council of Europe and used across European language education.

The English frequency backbone is the New General Service List (NGSL 1.2) by Charles Browne, Brent Culligan, and Joseph Phillips, derived from the Cambridge English Corpus and published under a Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license. Vocabcord’s derived vocabulary lists are shared under the same license. The audio recordings, the app, and its software are separate, original works under their own license.

For the full picture of how phrases are chosen, graded, and recorded, read the methodology page.

Corrections

If a count looks off, or you spot a phrase, translation, or level that needs a fix, tell us on the support page. Real corrections from learners keep the corpus honest.