Alpaca

Alpaca dataset from Stanford

🦙🛁 Cleaned Alpaca Dataset

This repository hosts a cleaned and curated version of the dataset used to train the Alpaca LLM. On April 8, 2023, ~50,000 uncurated instructions were replaced with GPT-4-LLM data. Curation is ongoing.

7B and 13B LoRA models (trained in April 2023) are available on Hugging Face.

High-quality data improves model performance, often more effectively than increasing model size.

🧹 Data Cleaning and Curation

The original GPT-3-generated dataset suffered from noise and bias, and models trained on it showed poor loss curves. The cleaned version addresses these issues, improving performance and reducing hallucinations.

Key Issues Fixed:

  • Noisy and inconsistent data.
  • US-centric bias.
  • Artifacts inherited from GPT-3's limitations.
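
As a rough illustration of what fixing noisy and inconsistent data can involve, the sketch below filters a list of instruction records. It assumes each record is a dictionary with `instruction`, `input`, and `output` fields; the function name `clean_records` and the specific rules (dropping empty fields and case-insensitive duplicates) are hypothetical examples, not the project's actual curation pipeline.

```python
def clean_records(records):
    """Return records with whitespace trimmed, empties and duplicates removed.

    Assumes each record is a dict with "instruction", "input", and "output"
    keys (an assumption for this sketch, not a documented schema).
    """
    seen = set()
    cleaned = []
    for rec in records:
        # Normalize whitespace; treat missing or None fields as empty strings.
        instruction = (rec.get("instruction") or "").strip()
        inp = (rec.get("input") or "").strip()
        output = (rec.get("output") or "").strip()

        # A record without an instruction or an answer is unusable for training.
        if not instruction or not output:
            continue

        # Skip repeats of the same instruction/input pair, ignoring case.
        key = (instruction.lower(), inp.lower())
        if key in seen:
            continue
        seen.add(key)

        cleaned.append({"instruction": instruction, "input": inp, "output": output})
    return cleaned


if __name__ == "__main__":
    sample = [
        {"instruction": "Add 2 and 3.", "input": "", "output": "5"},
        {"instruction": "add 2 and 3. ", "input": "", "output": "5"},
        {"instruction": "Summarize the text.", "input": "", "output": ""},
    ]
    for rec in clean_records(sample):
        print(rec)
```

Real curation would likely go further (for example, checking answers for correctness or flagging region-specific assumptions), which simple rules like these cannot capture.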

🚀 Applications

Used in:

  • Multilingual chatbots.
  • Educational and healthcare tools.
  • Creative writing and research assistance.

🔮 Future Plans

  • Expand cultural diversity.
  • Incorporate real-time updates.
  • Integrate user feedback.

🤝 Contribute

Help by:

  • Submitting data.
  • Reporting bugs.
  • Improving documentation.

🌟 Success Stories

  • Startups improved chatbot accuracy by 30%.
  • Universities reduced faculty workload by 20%.
  • Non-profits built multilingual support tools.

Released under the MIT License.
