Note: If you’re looking for the Hebrew version or our crowdsourced transcription efforts, go here.

ivrit.ai is a non-profit effort aiming to make Hebrew a first-class citizen for AI technologies, by providing favorably-licensed datasets.

As of December 2023, we offer the world’s largest Hebrew audio dataset, with over 10,000 hours of audio, for commercial model-training use cases. It is provided free of charge.

All ivrit.ai datasets are provided under a specially crafted license that explicitly allows using the content for AI model training – including for commercial models – while still maintaining key rights of content owners.

Please read our Credits page for a list of content creators and volunteers.

You can read more about our first dataset here, and it is of course available on Hugging Face.

For more info, please contact us here.