# Search for packages

purl: `pkg:pypi/scrapy@1.3.2`
## VCID-385b-344t-23es

Aliases: CVE-2024-3572, GHSA-7j7m-v7m3-jqm7, GMS-2024-327

**Scrapy decompression bomb vulnerability**

### Impact

Scrapy limits allowed response sizes by default through the [`DOWNLOAD_MAXSIZE`](https://docs.scrapy.org/en/latest/topics/settings.html#download-maxsize) and [`DOWNLOAD_WARNSIZE`](https://docs.scrapy.org/en/latest/topics/settings.html#download-warnsize) settings. However, those limits were only being enforced during the download of the raw, usually-compressed response bodies, and not during decompression, making Scrapy vulnerable to [decompression bombs](https://cwe.mitre.org/data/definitions/409.html). A malicious website being scraped could send a small response that, on decompression, could exhaust the memory available to the Scrapy process, potentially affecting any other process sharing that memory, and affecting disk usage in case of uncompressed response caching.

### Patches

Upgrade to Scrapy 2.11.1. If you are using Scrapy 1.8 or a lower version, and upgrading to Scrapy 2.11.1 is not an option, you may upgrade to Scrapy 1.8.4 instead.

### Workarounds

There is no easy workaround: disabling HTTP decompression altogether is impractical, as HTTP compression is a rather common practice. However, it is technically possible to manually backport the 2.11.1 or 1.8.4 fix, replacing the corresponding components of an unpatched version of Scrapy with patched versions copied into your own code.

### Acknowledgements

This security issue was reported by @dmandefy [through huntr.com](https://huntr.com/bounties/c4a0fac9-0c5a-4718-9ee4-2d06d58adabb/).

*Affected by 8 other vulnerabilities. Affected by 6 other vulnerabilities.*
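The fix in 2.11.1/1.8.4 enforces the size limits during decompression as well. As an illustration of the idea only (not Scrapy's actual code), the standard library's `zlib` can cap decompressed output via the `max_length` argument; the cap below is a hypothetical stand-in for `DOWNLOAD_MAXSIZE`:

```python
import zlib

def decompress_capped(data: bytes, max_size: int) -> bytes:
    """Decompress `data`, refusing to produce more than `max_size` bytes."""
    decomp = zlib.decompressobj()
    # Ask for at most max_size + 1 bytes: getting that extra byte, or having
    # compressed input left over, means the payload exceeds the cap.
    out = decomp.decompress(data, max_size + 1)
    if len(out) > max_size or decomp.unconsumed_tail:
        raise ValueError("decompression exceeds the configured size limit")
    return out

# A 1 MiB zero-filled body compresses to roughly a kilobyte, so a tiny
# response can expand a thousandfold on decompression.
bomb = zlib.compress(b"\x00" * (1024 * 1024))
print(f"compressed size: {len(bomb)} bytes")
```

Applying the cap while decompressing, rather than only to the downloaded bytes, is what closes the bomb vector.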
## VCID-4vw6-u8m8-dbe2

Aliases: CVE-2021-41125, GHSA-jwqp-28gf-p498, PYSEC-2021-363

Scrapy is a high-level web crawling and scraping framework for Python. If you use `HttpAuthMiddleware` (i.e. the `http_user` and `http_pass` spider attributes) for HTTP authentication, all requests will expose your credentials to the request target. This includes requests generated by Scrapy components, such as `robots.txt` requests sent by Scrapy when the `ROBOTSTXT_OBEY` setting is set to `True`, as well as requests reached through redirects.

Upgrade to Scrapy 2.5.1 and use the new `http_auth_domain` spider attribute to control which domains are allowed to receive the configured HTTP authentication credentials. If you are using Scrapy 1.8 or a lower version, and upgrading to Scrapy 2.5.1 is not an option, you may upgrade to Scrapy 1.8.1 instead.

If you cannot upgrade, set your HTTP authentication credentials on a per-request basis, using for example the `w3lib.http.basic_auth_header` function to convert your credentials into a value that you can assign to the `Authorization` header of your request, instead of defining your credentials globally using `HttpAuthMiddleware`.

*Affected by 13 other vulnerabilities.*
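The per-request approach boils down to constructing a Basic credentials value yourself. A stdlib-only sketch of what `w3lib.http.basic_auth_header` produces (the credentials below are placeholders):

```python
import base64

def basic_auth_header(username: str, password: str) -> bytes:
    """Base64-encode `username:password` into a Basic auth value,
    mirroring what w3lib.http.basic_auth_header returns."""
    token = base64.b64encode(f"{username}:{password}".encode("latin-1"))
    return b"Basic " + token

# Assigned per request instead of configuring HttpAuthMiddleware globally,
# e.g. request.headers[b"Authorization"] = basic_auth_header("user", "pass")
print(basic_auth_header("user", "pass"))
```

Setting the header only on requests that need it keeps the credentials away from redirects and `robots.txt` fetches to other hosts.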
## VCID-64nx-aruy-q7gy

Aliases: CVE-2024-1892, GHSA-cc65-xxvf-f7r9, GMS-2024-287, PYSEC-2024-162

A Regular Expression Denial of Service (ReDoS) vulnerability exists in the `XMLFeedSpider` class of the scrapy/scrapy project, specifically in the parsing of XML content. By crafting malicious XML content that exploits the inefficient regular expression used in the parsing process, an attacker can cause a denial-of-service (DoS) condition. This vulnerability can hang the system and consume significant resources, potentially rendering services that use Scrapy for XML processing unresponsive.

*Affected by 8 other vulnerabilities. Affected by 6 other vulnerabilities.*
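The advisory does not reproduce the vulnerable expression, so the snippet below uses a classic catastrophically backtracking pattern purely to illustrate the failure mode; it is not the pattern used by `XMLFeedSpider`:

```python
import re
import time

# `(a+)+$` is the textbook ReDoS pattern: on a near-miss input the engine
# explores exponentially many ways to split the run of "a"s.
pattern = re.compile(r"(a+)+$")

start = time.perf_counter()
match = pattern.match("a" * 20 + "b")  # the trailing "b" forces full backtracking
elapsed = time.perf_counter() - start

print(match)              # None: the input never matches
print(f"{elapsed:.4f}s")  # roughly doubles with each extra "a"
```

Because the work grows exponentially with input length, a short crafted payload is enough to pin a CPU for a long time.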
## VCID-dc1m-rt7j-w3af

Aliases: CVE-2025-6176, GHSA-2qfp-q593-8484

**Scrapy is vulnerable to a denial of service (DoS) attack due to a flaw in its brotli decompression implementation**

Scrapy versions up to 2.13.3 are vulnerable to a denial-of-service (DoS) attack due to a flaw in their brotli decompression implementation. The protection mechanism against decompression bombs fails to cover the brotli variant, allowing remote servers to crash clients with less than 80 GB of available memory. This occurs because brotli can achieve extremely high compression ratios for zero-filled data, leading to excessive memory consumption during decompression. Mitigating this vulnerability requires the security enhancement added in brotli v1.2.0.

*Affected by 1 other vulnerability.*
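`brotli` is a third-party package, so this sketch uses the standard library's `zlib` just to show why zero-filled data makes bombs cheap to deliver; brotli reaches far higher ratios on the same input, which is what pushes memory use toward the figures above:

```python
import zlib

payload = b"\x00" * (10 * 1024 * 1024)   # 10 MiB of zeros
compressed = zlib.compress(payload, level=9)
ratio = len(payload) / len(compressed)

# DEFLATE tops out around 1000:1 on constant data; brotli can go far
# beyond that, so a few megabytes can decompress into many gigabytes.
print(f"{len(compressed)} compressed bytes, ratio ~{ratio:,.0f}:1")
```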
## VCID-jvzg-u5ks-tkhd

Aliases: GHSA-mfjm-vh54-3f96, GMS-2022-230

**Cookie-setting is not restricted based on the public suffix list**

Responses from domain names whose public domain name suffix contains one or more periods (e.g. responses from `example.co.uk`, given that its public domain name suffix is `co.uk`) are able to set cookies that are included in requests to any other domain sharing the same domain name suffix.

*Affected by 12 other vulnerabilities.*
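The underlying fix is to validate a cookie's `Domain` attribute against the public suffix list. A minimal sketch of that check, using a hypothetical two-entry stand-in for the real list (production code should load the full list from publicsuffix.org, e.g. via a package such as `publicsuffix2`):

```python
# Hypothetical, tiny subset of the public suffix list, for illustration only.
PUBLIC_SUFFIXES = {"com", "co.uk"}

def cookie_domain_allowed(domain: str) -> bool:
    """Reject cookie domains that are themselves public suffixes, so a
    response from example.co.uk cannot set a cookie for all of co.uk."""
    return domain.lstrip(".").lower() not in PUBLIC_SUFFIXES

print(cookie_domain_allowed(".co.uk"))         # False: a shared public suffix
print(cookie_domain_allowed("example.co.uk"))  # True: a registrable domain
```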
## VCID-kgf5-wu3r-pqc6

Aliases: CVE-2024-3574, GHSA-cw9j-q3vf-hrrv, GMS-2024-288

**Scrapy authorization header leakage on cross-domain redirect**

### Impact

When you send a request with the `Authorization` header to one domain, and the response asks to redirect to a different domain, Scrapy's built-in redirect middleware creates a follow-up redirect request that keeps the original `Authorization` header, leaking its content to that second domain. The [right behavior](https://fetch.spec.whatwg.org/#ref-for-cors-non-wildcard-request-header-name) in this scenario is to drop the `Authorization` header instead.

### Patches

Upgrade to Scrapy 2.11.1. If you are using Scrapy 1.8 or a lower version, and upgrading to Scrapy 2.11.1 is not an option, you may upgrade to Scrapy 1.8.4 instead.

### Workarounds

If you cannot upgrade, make sure that you are not using the `Authorization` header, either directly or through some third-party plugin. If you need to use that header in some requests, add `"dont_redirect": True` to the `request.meta` dictionary of those requests to disable following redirects for them. If you need to keep (same-domain) redirect support on those requests, make sure you trust the target website not to redirect your requests to a different domain.

### Acknowledgements

This security issue was reported by @ranjit-git [through huntr.com](https://huntr.com/bounties/49974321-2718-43e3-a152-62b16eed72a9/).

*Affected by 8 other vulnerabilities. Affected by 6 other vulnerabilities.*
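A stdlib-only sketch of the patched behavior (this mirrors the rule described above, not Scrapy's actual `RedirectMiddleware` code; the hostnames are placeholders):

```python
from urllib.parse import urlsplit

def redirect_headers(headers: dict, from_url: str, to_url: str) -> dict:
    """Copy headers for a redirect request, dropping Authorization
    whenever the target host differs from the original one."""
    kept = dict(headers)
    if urlsplit(from_url).hostname != urlsplit(to_url).hostname:
        kept.pop("Authorization", None)
    return kept

headers = {"Authorization": "Bearer secret", "Accept": "text/html"}
print(redirect_headers(headers, "https://a.example/p", "https://evil.example/q"))
print(redirect_headers(headers, "https://a.example/p", "https://a.example/r"))
```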
## VCID-m9gg-8qum-9bh2

Aliases: CVE-2017-14158, GHSA-h7wm-ph43-c39p, PYSEC-2017-83

Scrapy 1.4 allows remote attackers to cause a denial of service (memory consumption) via large files, because arbitrarily many files are read into memory. This is especially problematic if the files are then individually written in a separate thread to a slow storage resource, as demonstrated by the interaction between `dataReceived` (in `core/downloader/handlers/http11.py`) and `S3FilesStore`.

There are no reported fixed-by versions.
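The general defense against this class of problem is bounded, chunked I/O rather than whole-body reads. A generic sketch of that idea (not the Scrapy code path itself):

```python
import io

CHUNK_SIZE = 64 * 1024  # bound memory use regardless of file size

def stream_to_file(body: io.BufferedIOBase, out) -> int:
    """Copy a response body to a writable file object in fixed-size
    chunks, returning the number of bytes written."""
    written = 0
    while chunk := body.read(CHUNK_SIZE):
        out.write(chunk)
        written += len(chunk)
    return written

data = b"x" * (256 * 1024)
sink = io.BytesIO()
print(stream_to_file(io.BytesIO(data), sink))  # 262144
```

With a fixed chunk size, memory stays constant no matter how large the remote file is.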
## VCID-nekz-z7zw-mfgz

Aliases: GHSA-23j4-mw76-5v7h

**Scrapy allows redirect following in protocols other than HTTP**

### Impact

Scrapy was following redirects regardless of the URL protocol, so redirects were working for `data://`, `file://`, `ftp://`, `s3://`, and any other scheme defined in the `DOWNLOAD_HANDLERS` setting. However, HTTP redirects should only work between URLs that use the `http://` or `https://` schemes. A malicious actor, given write access to the start requests of a spider (e.g. the ability to define `start_urls`) and read access to the spider output, could exploit this vulnerability to:

- Redirect to any local file using the `file://` scheme to read its contents.
- Redirect to an `ftp://` URL of a malicious FTP server to obtain the FTP username and password configured in the spider or project.
- Redirect to any `s3://` URL to read its content using the S3 credentials configured in the spider or project.

For `file://` and `s3://`, how the spider parses input data into an output item determines what data would be vulnerable. A spider that always outputs the entire contents of a response would be completely vulnerable, while a spider that extracts only fragments from the response could significantly limit the vulnerable data.

### Patches

Upgrade to Scrapy 2.11.2.

### Workarounds

Replace the built-in redirect middlewares (`RedirectMiddleware` and `MetaRefreshMiddleware`) with custom ones that implement the fix from Scrapy 2.11.2, and verify that they work as intended.

### References

This security issue was reported by @mvsantos at https://github.com/scrapy/scrapy/issues/457.

*Affected by 2 other vulnerabilities.*
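The 2.11.2 rule can be sketched as a simple scheme check (an illustration of the behavior described above, not Scrapy's implementation):

```python
from urllib.parse import urlsplit

SAFE_REDIRECT_SCHEMES = {"http", "https"}

def redirect_allowed(from_url: str, to_url: str) -> bool:
    """Only follow redirects when both endpoints are plain HTTP(S)."""
    return (urlsplit(from_url).scheme in SAFE_REDIRECT_SCHEMES
            and urlsplit(to_url).scheme in SAFE_REDIRECT_SCHEMES)

print(redirect_allowed("https://example.com/a", "file:///etc/passwd"))    # False
print(redirect_allowed("http://example.com/a", "https://example.com/b"))  # True
```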
## VCID-t5cn-a543-nyag

Aliases: GHSA-cg34-w3fm-82h3

**Duplicate advisory: Scrapy leaks the authorization header on same-domain but cross-origin redirects**

### Duplicate Advisory

This advisory has been withdrawn because it is a duplicate of GHSA-4qqq-9vqf-3h3f. This link is maintained to preserve external references.

### Original Description

In scrapy/scrapy, an issue was identified where the `Authorization` header is not removed during redirects that only change the scheme (e.g. HTTPS to HTTP) but remain within the same domain. This behavior contravenes the Fetch standard, which mandates the removal of `Authorization` headers in cross-origin requests when the scheme, host, or port changes. Consequently, when a redirect downgrades from HTTPS to HTTP, the `Authorization` header may be inadvertently exposed in plaintext, leading to potential disclosure of sensitive information to unauthorized actors. The flaw is located in the `_build_redirect_request` function of the redirect middleware.

*Affected by 2 other vulnerabilities.*
## VCID-ugxf-pfaw-rqbm

Aliases: GHSA-9x8m-2xpf-crp3, GMS-2022-3357

**Scrapy before 2.6.2 and 1.8.3 vulnerable to one proxy sending credentials to another**

### Impact

When the [built-in HTTP proxy downloader middleware](https://docs.scrapy.org/en/2.6/topics/downloader-middleware.html#module-scrapy.downloadermiddlewares.httpproxy) processes a request with `proxy` metadata, and that `proxy` metadata includes proxy credentials, the middleware sets the `Proxy-Authorization` header, but only if that header is not already set.

There are third-party proxy-rotation downloader middlewares that set different `proxy` metadata every time they process a request. Because of request retries and redirects, the same request can be processed by downloader middlewares more than once, including both the built-in HTTP proxy downloader middleware and any third-party proxy-rotation downloader middleware. These third-party middlewares could change the `proxy` metadata of a request to a new value but fail to remove the `Proxy-Authorization` header computed from the previous value of the `proxy` metadata, causing the credentials of one proxy to be leaked to a different proxy.

If you rotate proxies from different proxy providers, and any of those proxies requires credentials, you are affected, unless you are handling proxy rotation as described under **Workarounds** below. If you use a third-party downloader middleware for proxy rotation, the same applies to that middleware, and installing a patched version of Scrapy may not be enough; patching that downloader middleware may be necessary as well.

### Patches

Upgrade to Scrapy 2.6.2. If you are using Scrapy 1.8 or a lower version, and upgrading to Scrapy 2.6.2 is not an option, you may upgrade to Scrapy 1.8.3 instead.

### Workarounds

If you cannot upgrade, make sure that any code that changes the value of the `proxy` request meta also removes the `Proxy-Authorization` header from the request if present.

### For more information

If you have any questions or comments about this advisory:

- [Open an issue](https://github.com/scrapy/scrapy/issues)
- [Email us](mailto:opensource@zyte.com)

*Affected by 10 other vulnerabilities. Affected by 9 other vulnerabilities.*
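The workaround amounts to one rule: whenever the `proxy` value changes, drop any stale `Proxy-Authorization` header first. A sketch using plain dictionaries in place of Scrapy's request meta and headers (the proxy URLs and credentials are placeholders):

```python
def rotate_proxy(meta: dict, headers: dict, new_proxy: str) -> None:
    """Switch a request to `new_proxy`, removing credentials that were
    set for the previous proxy so they cannot leak to the new one."""
    if meta.get("proxy") != new_proxy:
        headers.pop("Proxy-Authorization", None)
    meta["proxy"] = new_proxy

meta = {"proxy": "http://user:pass@proxy-a.example:8080"}
headers = {"Proxy-Authorization": "Basic dXNlcjpwYXNz"}
rotate_proxy(meta, headers, "http://proxy-b.example:8080")
print(headers)  # {} -- the old proxy's credentials are gone
```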
## VCID-urb1-hv1z-duga

Aliases: CVE-2024-1968, GHSA-4qqq-9vqf-3h3f, PYSEC-2024-258

In scrapy/scrapy, an issue was identified where the `Authorization` header is not removed during redirects that only change the scheme (e.g. HTTPS to HTTP) but remain within the same domain. This behavior contravenes the Fetch standard, which mandates the removal of `Authorization` headers in cross-origin requests when the scheme, host, or port changes. Consequently, when a redirect downgrades from HTTPS to HTTP, the `Authorization` header may be inadvertently exposed in plaintext, leading to potential disclosure of sensitive information to unauthorized actors. The flaw is located in the `_build_redirect_request` function of the redirect middleware.

*Affected by 14 other vulnerabilities. Affected by 2 other vulnerabilities.*
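The Fetch rule quoted above reduces to comparing the scheme, host, and port. A stdlib sketch of the patched decision (not the actual `_build_redirect_request` code):

```python
from urllib.parse import urlsplit

def keep_authorization(old_url: str, new_url: str) -> bool:
    """Keep the Authorization header only if scheme, host and port are
    all unchanged, as the Fetch standard requires."""
    a, b = urlsplit(old_url), urlsplit(new_url)
    return (a.scheme, a.hostname, a.port) == (b.scheme, b.hostname, b.port)

print(keep_authorization("https://example.com/a", "http://example.com/b"))   # False: HTTPS-to-HTTP downgrade
print(keep_authorization("https://example.com/a", "https://example.com/c"))  # True: same origin
```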
## VCID-veaw-n6vt-zfgu

Aliases: GHSA-jm3v-qxmh-hxwv

**Scrapy's redirects ignore scheme-specific proxy settings**

### Impact

When using system proxy settings, which are scheme-specific (i.e. specific to `http://` or `https://` URLs), Scrapy was not accounting for scheme changes during redirects. For example, an HTTP request would use the proxy configured for HTTP and, when redirected to an HTTPS URL, the new HTTPS request would still use the proxy configured for HTTP instead of switching to the proxy configured for HTTPS, and vice versa. If you have different proxy configurations for HTTP and HTTPS in your system for security reasons (e.g. you may not want one of your proxy providers to be aware of the URLs that you visit through the other one), this is a security issue.

### Patches

Upgrade to Scrapy 2.11.2.

### Workarounds

Replace the built-in redirect middlewares (`RedirectMiddleware` and `MetaRefreshMiddleware`) and the `HttpProxyMiddleware` middleware with custom ones that implement the fix from Scrapy 2.11.2, and verify that they work as intended.

### References

This security issue was reported by @redapple at https://github.com/scrapy/scrapy/issues/767.

*Affected by 2 other vulnerabilities.*
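The fix amounts to re-selecting the proxy for every redirect target instead of reusing the original request's proxy. A sketch with a hypothetical scheme-to-proxy mapping (mirroring `http_proxy`/`https_proxy`-style settings; the proxy URLs are placeholders):

```python
from urllib.parse import urlsplit

PROXIES = {
    "http": "http://proxy-http.example:3128",
    "https": "http://proxy-https.example:3128",
}

def proxy_for(url: str):
    """Choose a proxy based on the URL's scheme; redirects must re-run
    this choice because the scheme may have changed."""
    return PROXIES.get(urlsplit(url).scheme)

print(proxy_for("http://example.com/"))   # the HTTP proxy
print(proxy_for("https://example.com/"))  # the HTTPS proxy, after a redirect
```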
## VCID-x9ee-za9y-3fcb

Aliases: CVE-2022-0577, GHSA-cjvr-mfj7-j4j8, PYSEC-2022-159

Exposure of Sensitive Information to an Unauthorized Actor in the GitHub repository scrapy/scrapy prior to 2.6.1.

*Affected by 12 other vulnerabilities. Affected by 11 other vulnerabilities.*
## Vulnerabilities fixed by this package

This package is not known to fix vulnerabilities.