CommunityScrapers repository

This repository contains scrapers created by the Stash community.

Make sure to read ALL of the instructions here before requesting help from the community.

Tip

For a more user-friendly, step-by-step guide, check out the Guide to Scraping.

Installing scrapers via manager

Tip

Guide: How to install a scraper?

Scrapers can be installed and managed from the Settings > Metadata Providers page.

Scrapers are installed using the Available Scrapers section. The Community (stable) source is configured by default.

Some scrapers may require manual configuration before they will work, so make sure to check the scraper file for any instructions after installing it.

Installing scrapers manually

Tip

Guide: How to install a scraper?

To download all scrapers at once, clone this git repository. If you only need specific scrapers, download those .yml files individually.

When downloading individual files:

  1. Open the .yml file you want.
  2. Click the Download raw file button.
  3. Save the page as a .yml file to preserve the correct format.
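The two download routes above can be sketched from a terminal. `<ScraperName>` is a placeholder for the scraper you want, and the raw-file URL layout is an assumption based on GitHub's usual `raw.githubusercontent.com` scheme:

```shell
# All scrapers at once: clone the whole repository
git clone https://github.com/stashapp/CommunityScrapers.git

# A single scraper: fetch just its raw .yml file
# (<ScraperName> is a placeholder; adjust the path to the file you want)
curl -o <ScraperName>.yml \
  "https://raw.githubusercontent.com/stashapp/CommunityScrapers/master/scrapers/<ScraperName>.yml"
```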

Move scraper files to your configured Scrapers Path under Settings > System > Application Paths (default: ~/.stash/scrapers). You may recognize ~/.stash as the folder where the config and database files are located.

After manually updating the scrapers folder or editing a scraper file, reload the scrapers (Scrape with... -> Reload scrapers) and refresh the edit scene/performer page.
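If you prefer to trigger the reload programmatically, the same action is exposed as a `reloadScrapers` GraphQL mutation. A minimal sketch, assuming the default endpoint http://localhost:9999/graphql with no authentication (adjust both to your setup):

```python
import json
from urllib import request

# Hedged sketch: trigger Stash's "Reload scrapers" over GraphQL.
# The endpoint URL and lack of auth headers are assumptions.
STASH_GRAPHQL = "http://localhost:9999/graphql"

def reload_scrapers_payload() -> bytes:
    """Build the GraphQL request body for the reloadScrapers mutation."""
    return json.dumps({"query": "mutation { reloadScrapers }"}).encode()

def reload_scrapers(url: str = STASH_GRAPHQL) -> None:
    req = request.Request(
        url,
        data=reload_scrapers_payload(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        print(resp.status)  # 200 on success
```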

Some sites block content if the user agent is not valid. If you get a blocked or denied message, configure the Scraping -> Scraper User Agent setting in Stash. Valid Firefox user agent strings can be found here. Scrapers for those sites should include a comment with a tested and working user agent string.

Scrapers with useCDP set to true require that you have properly configured the Chrome CDP path setting in Stash. If you decide to use a remote instance, the headless Chromium Docker image from chromedp/headless-shell is highly recommended. browserless/chrome is not CDP-compatible and is not supported.
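If you go the remote-instance route, a minimal sketch of running the recommended image follows; the port and the `/json/version` endpoint are common defaults, so verify them against the Stash documentation for your version:

```shell
# Start a headless Chromium that exposes the DevTools protocol on port 9222
docker run -d --name headless-shell -p 9222:9222 chromedp/headless-shell

# Then point Stash's Chrome CDP path setting at the remote instance, e.g.:
#   http://localhost:9222/json/version
```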

Python scrapers

Some scrapers require external programs to function, usually Python. All scrapers are tested with the newest stable release of Python, currently 3.14.x.

Depending on your operating system you may need to install both Python and the scrapers' dependencies before they will work. For Windows users we strongly recommend installing Python with the installers from python.org rather than through the Windows Store, and installing it outside of the Users folder so it is accessible to the entire system: a commonly used location is C:\Python314.

After installing Python you should install the most commonly used dependencies by running the following command in a terminal window:

python -m pip install stashapp-tools requests cloudscraper beautifulsoup4 lxml

You may need to replace python with py in the command if you are running on Windows.

If Stash does not detect your Python installation you can set the Python executable path in Settings > System > Application Paths. Note that this needs to point to the executable itself and not just the folder it is in.
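A quick way to check that the interpreter Stash will use can actually see those dependencies is to probe for them with `importlib`. A small sketch; the module list mirrors the pip command above (note that the pip package stashapp-tools is imported as `stashapi`, and beautifulsoup4 as `bs4`):

```python
import importlib.util
import sys

# Import names corresponding to the commonly used dependencies.
DEPS = ["stashapi", "requests", "cloudscraper", "bs4", "lxml"]

def missing_modules(names):
    """Return the subset of names that this interpreter cannot import."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    print(sys.executable)          # the interpreter being checked
    print(missing_modules(DEPS))   # [] means everything is installed
```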

Manually configured scrapers

Some scrapers need extra configuration before they will work. Unfortunately, if you install them through the web interface, any updates will overwrite your changes.

  • Python scrapers that need to communicate with your Stash (for example, to create markers or to search your file system) might need to be configured to talk to your local instance. By default they make their queries against http://localhost:9999/graphql with no authentication; if your setup requires otherwise, find py_common/config.ini and set your own values.
  • Python scrapers that can be configured will (usually) create a default configuration file called config.ini in their respective directories the first time you run them.
  • Some scrapers require an API key or a cookie to work. If that is the case there will be instructions in the scraper file itself mentioning that and telling you how to add those fields.
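As an illustration only, a py_common/config.ini for a non-default setup might look like the following. The exact key names vary and the generated file is authoritative, so treat these as placeholders:

```ini
; Illustrative values only -- check the config.ini your scraper generates
url = http://stash.example.local:9999/graphql
api_key = <your API key, if authentication is enabled>
```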

How to use scrapers?

You can find a list of sites that currently have a scraper at https://stashapp.github.io/CommunityScrapers/

💥 For most scrapers you have to provide the object URL.

Stable build (>=v0.11.0)
Once you populate the URL field with an appropriate URL, the scrape URL button will become active.

Clicking on that button brings up a popup that lets you select which fields to update.

Some scrapers support the Scrape with... function so you can use that instead of adding a URL. Scrape with... usually works with either the Title field or the filename, so make sure that they provide enough data for the scraper to work with.

A Query button is also available for scrapers that support it. Clicking the button lets you edit the text that the scraper will use for your queries.

In case of errors or no results during scraping, check Stash's log section (Settings > Logs > set Log Level to Debug) for more info.

For more info, please check the scraping help section or ask the community for help.

Host your own scrapers

We have a GitHub template available, with step-by-step instructions to get started, for those who prefer to host their own scrapers.

Repository: https://github.com/stashapp/scrapers-repo-template

Community support

Contributing

Contributions are always welcome! Use the Scraping Configuration help section to get started and stop by the Discord #scrapers channel with any questions.

Validation

The scrapers in this repository can be validated against a schema and checked for common errors.

Deno is used as a drop-in, sandboxed NodeJS alternative.

# check all scrapers
deno run -R=scrapers -R="validator/scraper.schema.json" validate.js
# check specific scrapers
deno run -R=scrapers -R="validator/scraper.schema.json" validate.js scrapers/foo.yml scrapers/bar.yml

Note: Deno will ask for env and sys permissions; these requests come from the chalk dependency.

Docker option

Instead of Deno, Docker can be used to run the validator:

docker run --rm -v .:/app denoland/deno:distroless run -R=/app/ -E /app/validate.js --ci