Someone Made a Dataset of One Million Bluesky Posts for ‘Machine Learning Research’

A machine learning librarian at Hugging Face just released a dataset composed of one million Bluesky posts, complete with when they were posted and who posted them, intended for machine learning research.
Daniel van Strien posted about the dataset on Bluesky on Tuesday:
First dataset for the new @huggingface.bsky.social @bsky.app community organisation: one-million-bluesky-posts 🦋
📊 1M public posts from Bluesky’s firehose API
🔍 Includes text, metadata, and language predictions
🔬 Perfect to experiment with using ML for Bluesky 🤗huggingface.co/datasets/blu…
— Daniel van Strien (@danielvanstrien.bsky.social) 2024-11-26T13:50:34.824Z
“This dataset contains 1 million public posts collected from Bluesky Social’s firehose API, intended for machine learning research and experimentation with social media data,” the dataset description says. “Each post contains text content, metadata, and information about media attachments and reply relationships.”