Compare commits


2 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| pukkandan | 9963cda115 | Release 2021.03.03 | 2021-03-03 16:25:04 +05:30 |
| pukkandan | c1be5231b9 | [build] fix bug from da7f321e93 | 2021-03-03 16:25:04 +05:30 |
91 changed files with 2089 additions and 4781 deletions


@@ -21,7 +21,7 @@ assignees: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.03.24.1. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.03.01. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in https://github.com/yt-dlp/yt-dlp.
- Search the bugtracker for similar issues: https://github.com/yt-dlp/yt-dlp. DO NOT post duplicates.
@@ -29,7 +29,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a broken site support
- [ ] I've verified that I'm running yt-dlp version **2021.03.24.1**
- [ ] I've verified that I'm running yt-dlp version **2021.03.01**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar issues including closed ones
@@ -44,7 +44,7 @@ Add the `-v` flag to your command line you run yt-dlp with (`yt-dlp -v <your com
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] yt-dlp version 2021.03.24.1
[debug] yt-dlp version 2021.03.01
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}


@@ -21,7 +21,7 @@ assignees: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.03.24.1. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.03.01. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that site you are requesting is not dedicated to copyright infringement, see https://github.com/yt-dlp/yt-dlp. yt-dlp does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
- Search the bugtracker for similar site support requests: https://github.com/yt-dlp/yt-dlp. DO NOT post duplicates.
@@ -29,7 +29,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a new site support request
- [ ] I've verified that I'm running yt-dlp version **2021.03.24.1**
- [ ] I've verified that I'm running yt-dlp version **2021.03.01**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that none of provided URLs violate any copyrights
- [ ] I've searched the bugtracker for similar site support requests including closed ones


@@ -21,13 +21,13 @@ assignees: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.03.24.1. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.03.01. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar site feature requests: https://github.com/yt-dlp/yt-dlp. DO NOT post duplicates.
- Finally, put x into all relevant boxes like this [x] (Dont forget to delete the empty space)
-->
- [ ] I'm reporting a site feature request
- [ ] I've verified that I'm running yt-dlp version **2021.03.24.1**
- [ ] I've verified that I'm running yt-dlp version **2021.03.01**
- [ ] I've searched the bugtracker for similar site feature requests including closed ones


@@ -21,7 +21,7 @@ assignees: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.03.24.1. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.03.01. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in https://github.com/yt-dlp/yt-dlp.
- Search the bugtracker for similar issues: https://github.com/yt-dlp/yt-dlp. DO NOT post duplicates.
@@ -30,7 +30,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a broken site support issue
- [ ] I've verified that I'm running yt-dlp version **2021.03.24.1**
- [ ] I've verified that I'm running yt-dlp version **2021.03.01**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar bug reports including closed ones
@@ -46,7 +46,7 @@ Add the `-v` flag to your command line you run yt-dlp with (`yt-dlp -v <your com
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] yt-dlp version 2021.03.24.1
[debug] yt-dlp version 2021.03.01
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}


@@ -21,13 +21,13 @@ assignees: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.03.24.1. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.03.01. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar feature requests: https://github.com/yt-dlp/yt-dlp. DO NOT post duplicates.
- Finally, put x into all relevant boxes like this [x] (Dont forget to delete the empty space)
-->
- [ ] I'm reporting a feature request
- [ ] I've verified that I'm running yt-dlp version **2021.03.24.1**
- [ ] I've verified that I'm running yt-dlp version **2021.03.01**
- [ ] I've searched the bugtracker for similar feature requests including closed ones


@@ -29,7 +29,7 @@ jobs:
- name: Print version
run: echo "${{ steps.bump_version.outputs.ytdlp_version }}"
- name: Run Make
run: make
run: make yt-dlp
- name: Create Release
id: create_release
uses: actions/create-release@v1

.gitignore

@@ -60,7 +60,6 @@ yt-dlp.zip
*.mkv
*.swf
*.part
*.part-*
*.ytdl
*.dump
*.frag


@@ -1,6 +1,5 @@
pukkandan (owner)
shirt-dev (collaborator)
colethedj (collaborator)
h-h-h-h
pauldubois98
nixxo
@@ -22,16 +21,10 @@ nao20010128nao
kurumigi
tsukumi
bbepis
animelover1984
Pccode66
Ashish0804
Ashish
RobinD42
hseg
colethedj
DennyDai
codeasashu
teesid
kevinoconnor7
damianoamatruda
2ShedsJackson
CXwudi
xtkoba


@@ -7,8 +7,7 @@
* Update Changelog.md and CONTRIBUTORS
* Change "Merged with ytdl" version in Readme.md if needed
* Commit to master as `Release <version>`
* Push to origin/release using `git push origin master:release`
build task will now run
* Push to origin/release - build task will now run
* Update version.py using devscripts\update-version.py
* Run `make issuetemplates`
* Commit to master as `[version] update :ci skip all`
@@ -18,115 +17,16 @@
-->
### 2021.04.03
* Merge youtube-dl: Upto 2021.04.01 ([commit/654b4f4](https://github.com/ytdl-org/youtube-dl/commit/654b4f4ff2718f38b3182c1188c5d569c14cc70a))
* Ability to set a specific field in the file's metadata using `--parse-metadata`
* Ability to select n'th best format like `-f bv*.2`
* [DiscoveryPlus] Add discoveryplus.in
* [la7] Add podcasts and podcast playlists by [nixxo](https://github.com/nixxo)
* [mildom] Update extractor with current proxy by [nao20010128nao](https://github.com/nao20010128nao)
* [ard:mediathek] Fix video id extraction
* [generic] Detect Invidious' link element
* [youtube] Show premium state in `availability` by [colethedj](https://github.com/colethedj)
* [viewsource] Add extractor to handle `view-source:`
* [sponskrub] Run before embedding thumbnail
* [documentation] Improve `--parse-metadata` documentation
### 2021.03.24.1
* Revert [commit/8562218](https://github.com/ytdl-org/youtube-dl/commit/8562218350a79d4709da8593bb0c538aa0824acf)
### 2021.03.24
* Merge youtube-dl: Upto 2021.03.25 ([commit/8562218](https://github.com/ytdl-org/youtube-dl/commit/8562218350a79d4709da8593bb0c538aa0824acf))
* Parse metadata from multiple fields using `--parse-metadata`
* Ability to load playlist infojson using `--load-info-json`
* Write current epoch to infojson when using `--no-clean-infojson`
* [youtube_live_chat] fix bug when trying to set cookies
* [niconico] Fix for when logged in by [CXwudi](https://github.com/CXwudi) and [xtkoba](https://github.com/xtkoba)
* [linuxacadamy] Fix login
### 2021.03.21
* Merge youtube-dl: Upto [commit/7e79ba7](https://github.com/ytdl-org/youtube-dl/commit/7e79ba7dd6e6649dd2ce3a74004b2044f2182881)
* Option `--no-clean-infojson` to keep private keys in the infojson
* [aria2c] Support retry/abort unavailable fragments by [damianoamatruda](https://github.com/damianoamatruda)
* [aria2c] Better default arguments
* [movefiles] Fix bugs and make more robust
* [formatSort] Fix `quality` being ignored
* [splitchapters] Fix for older ffmpeg
* [sponskrub] Pass proxy to sponskrub
* Make sure `post_hook` gets the final filename
* Recursively remove any private keys from infojson
* Embed video URL metadata inside `mp4` by [damianoamatruda](https://github.com/damianoamatruda) and [pukkandan](https://github.com/pukkandan)
* Merge `webm` formats into `mkv` if thumbnails are to be embedded by [damianoamatruda](https://github.com/damianoamatruda)
* Use headers and cookies when downloading subtitles by [damianoamatruda](https://github.com/damianoamatruda)
* Parse resolution in info dictionary by [damianoamatruda](https://github.com/damianoamatruda)
* More consistent warning messages by [damianoamatruda](https://github.com/damianoamatruda) and [pukkandan](https://github.com/pukkandan)
* [documentation] Add deprecated options and aliases in readme
* [documentation] Fix some minor mistakes
* [niconico] Partial fix adapted from [animelover1984/youtube-dl@b5eff52](https://github.com/animelover1984/youtube-dl/commit/b5eff52dd9ed5565672ea1694b38c9296db3fade) (login and smile formats still don't work)
* [niconico] Add user extractor by [animelover1984](https://github.com/animelover1984)
* [bilibili] Add anthology support by [animelover1984](https://github.com/animelover1984)
* [amcnetworks] Fix extractor by [2ShedsJackson](https://github.com/2ShedsJackson)
* [stitcher] Merge from youtube-dl by [nixxo](https://github.com/nixxo)
* [rcs] Improved extraction by [nixxo](https://github.com/nixxo)
* [linuxacadamy] Improve regex
* [youtube] Show if video is `private`, `unlisted` etc in info (`availability`) by [colethedj](https://github.com/colethedj) and [pukkandan](https://github.com/pukkandan)
* [youtube] bugfix for channel playlist extraction
* [nbc] Improve metadata extraction by [2ShedsJackson](https://github.com/2ShedsJackson)
### 2021.03.15
* **Split video by chapters**: using option `--split-chapters`
* The output file of the split files can be set with `-o`/`-P` using the prefix `chapter:`
* Additional keys `section_title`, `section_number`, `section_start`, `section_end` are available in the output template
* **Parallel fragment downloads** by [shirt](https://github.com/shirt-dev)
* Use option `--concurrent-fragments` (`-N`) to set the number of threads (default 1)
* Merge youtube-dl: Upto [commit/3be0980](https://github.com/ytdl-org/youtube-dl/commit/3be098010f667b14075e3dfad1e74e5e2becc8ea)
* [zee5] Add Show Extractor by [Ashish0804](https://github.com/Ashish0804) and [pukkandan](https://github.com/pukkandan)
* [rai] fix drm check [nixxo](https://github.com/nixxo)
* [wimtv] Add extractor by [nixxo](https://github.com/nixxo)
* [mtv] Add mtv.it and extract series metadata by [nixxo](https://github.com/nixxo)
* [pluto.tv] Add extractor by [kevinoconnor7](https://github.com/kevinoconnor7)
* [youtube] Rewrite comment extraction by [colethedj](https://github.com/colethedj)
* [embedthumbnail] Set mtime correctly
* Refactor some postprocessor/downloader code by [pukkandan](https://github.com/pukkandan) and [shirt](https://github.com/shirt-dev)
### 2021.03.07
* [youtube] Fix history, mixes, community pages and trending by [pukkandan](https://github.com/pukkandan) and [colethedj](https://github.com/colethedj)
* [youtube] Fix private feeds/playlists on multi-channel accounts by [colethedj](https://github.com/colethedj)
* [youtube] Extract alerts from continuation by [colethedj](https://github.com/colethedj)
* [cbs] Add support for ParamountPlus by [shirt](https://github.com/shirt-dev)
* [mxplayer] Rewrite extractor with show support by [pukkandan](https://github.com/pukkandan) and [Ashish0804](https://github.com/Ashish0804)
* [gedi] Improvements from youtube-dl by [nixxo](https://github.com/nixxo)
* [vimeo] Fix videos with password by [teesid](https://github.com/teesid)
* [lbry] Support `lbry://` url by [nixxo](https://github.com/nixxo)
* [bilibili] Change `Accept` header by [pukkandan](https://github.com/pukkandan) and [animelover1984](https://github.com/animelover1984)
* [trovo] Pass origin header
* [rai] Check for DRM by [nixxo](https://github.com/nixxo)
* [downloader] Fix bug for `ffmpeg`/`httpie`
* [update] Fix updater removing the executable bit on some UNIX distros
* [update] Fix current build hash for UNIX
* [documentation] Include wget/curl/aria2c install instructions for Unix by [Ashish0804](https://github.com/Ashish0804)
* Fix some videos downloading with `m3u8` extension
* Remove "fixup is ignored" warning when fixup wasn't passed by user
### 2021.03.03.2
* [build] Fix bug
### 2021.03.03
* [youtube] Use new browse API for continuation page extraction by [colethedj](https://github.com/colethedj) and [pukkandan](https://github.com/pukkandan)
* Fix HLS playlist downloading by [shirt](https://github.com/shirt-dev)
* Merge youtube-dl: Upto [2021.03.03](https://github.com/ytdl-org/youtube-dl/releases/tag/2021.03.03)
* [youtube] Use new browse API for continuation page extraction by @colethedj and @pukkandan
* Fix HLS playlist downloading by @shirt
* **Merge youtube-dl:** Upto [2021.03.03](https://github.com/ytdl-org/youtube-dl/releases/tag/2021.03.03)
* [mtv] Fix extractor
* [nick] Fix extractor by [DennyDai](https://github.com/DennyDai)
* [mxplayer] Add new extractor by [codeasashu](https://github.com/codeasashu)
* [nick] Fix extractor by @DennyDai
* [mxplayer] Add new extractor by@codeasashu
* [youtube] Throw error when `--extractor-retries` are exhausted
* Reduce default of `--extractor-retries` to 3
* Fix packaging bugs by [hseg](https://github.com/hseg)
* Fix packaging bugs by @hseg
### 2021.03.01
@@ -155,10 +55,10 @@
* Moved project to an organization [yt-dlp](https://github.com/yt-dlp)
* **Completely changed project name to yt-dlp** by [Pccode66](https://github.com/Pccode66) and [pukkandan](https://github.com/pukkandan)
* Also, `youtube-dlc` config files are no longer loaded
* Merge youtube-dl: Upto [commit/4460329](https://github.com/ytdl-org/youtube-dl/commit/44603290e5002153f3ebad6230cc73aef42cc2cd) (except tmz, gedi)
* **Merge youtube-dl:** Upto [commit/4460329](https://github.com/ytdl-org/youtube-dl/commit/44603290e5002153f3ebad6230cc73aef42cc2cd) (except tmz, gedi)
* [Readthedocs](https://yt-dlp.readthedocs.io) support by [shirt](https://github.com/shirt-dev)
* [youtube] Show if video was a live stream in info (`was_live`)
* [Zee5] Add new extractor by [Ashish0804](https://github.com/Ashish0804) and [pukkandan](https://github.com/pukkandan)
* [Zee5] Add new extractor by [Ashish](https://github.com/Ashish) and [pukkandan](https://github.com/pukkandan)
* [jwplatform] Add support for `hyland.com`
* [tennistv] Fix extractor
* [hls] Support media initialization by [shirt](https://github.com/shirt-dev)
@@ -173,7 +73,7 @@
### 2021.02.19
* Merge youtube-dl: Upto [commit/cf2dbec](https://github.com/ytdl-org/youtube-dl/commit/cf2dbec6301177a1fddf72862de05fa912d9869d) (except kakao)
* **Merge youtube-dl:** Upto [commit/cf2dbec](https://github.com/ytdl-org/youtube-dl/commit/cf2dbec6301177a1fddf72862de05fa912d9869d) (except kakao)
* [viki] Fix extractor
* [niconico] Extract `channel` and `channel_id` by [kurumigi](https://github.com/kurumigi)
* [youtube] Multiple page support for hashtag URLs
@@ -198,7 +98,7 @@
### 2021.02.15
* Merge youtube-dl: Upto [2021.02.10](https://github.com/ytdl-org/youtube-dl/releases/tag/2021.02.10) (except archive.org)
* **Merge youtube-dl:** Upto [2021.02.10](https://github.com/ytdl-org/youtube-dl/releases/tag/2021.02.10) (except archive.org)
* [niconico] Improved extraction and support encrypted/SMILE movies by [kurumigi](https://github.com/kurumigi), [tsukumi](https://github.com/tsukumi), [bbepis](https://github.com/bbepis), [pukkandan](https://github.com/pukkandan)
* Fix HLS AES-128 with multiple keys in external downloaders by [shirt](https://github.com/shirt-dev)
* [youtube_live_chat] Fix by using POST API by [siikamiika](https://github.com/siikamiika)
@@ -241,7 +141,7 @@
### 2021.02.04
* Merge youtube-dl: Upto [2021.02.04.1](https://github.com/ytdl-org/youtube-dl/releases/tag/2021.02.04.1)
* **Merge youtube-dl:** Upto [2021.02.04.1](https://github.com/ytdl-org/youtube-dl/releases/tag/2021.02.04.1)
* **Date/time formatting in output template:**
* You can use [`strftime`](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes) to format date/time fields. Example: `%(upload_date>%Y-%m-%d)s`
* **Multiple output templates:**
@@ -295,7 +195,7 @@
### 2021.01.24
* Merge youtube-dl: Upto [2021.01.24](https://github.com/ytdl-org/youtube-dl/releases/tag/2021.01.16)
* **Merge youtube-dl:** Upto [2021.01.24](https://github.com/ytdl-org/youtube-dl/releases/tag/2021.01.16)
* Plugin support ([documentation](https://github.com/yt-dlp/yt-dlp#plugins))
* **Multiple paths**: New option `-P`/`--paths` to give different paths for different types of files
* The syntax is `-P "type:path" -P "type:path"` ([documentation](https://github.com/yt-dlp/yt-dlp#:~:text=-P,%20--paths%20TYPE:PATH))
@@ -324,7 +224,7 @@
### 2021.01.16
* Merge youtube-dl: Upto [2021.01.16](https://github.com/ytdl-org/youtube-dl/releases/tag/2021.01.16)
* **Merge youtube-dl:** Upto [2021.01.16](https://github.com/ytdl-org/youtube-dl/releases/tag/2021.01.16)
* **Configuration files:**
* Portable configuration file: `./yt-dlp.conf`
* Allow the configuration files to be named `yt-dlp` instead of `youtube-dlc`. See [this](https://github.com/yt-dlp/yt-dlp#configuration) for details
@@ -354,7 +254,7 @@
* [archive.org] Fix extractor and add support for audio and playlists by [wporr](https://github.com/wporr)
* [Animelab] Added by [mariuszskon](https://github.com/mariuszskon)
* [youtube:search] Fix view_count by [ohnonot](https://github.com/ohnonot)
* [youtube] Show if video is embeddable in info (`playable_in_embed`)
* [youtube] Show if video is embeddable in info
* Update version badge automatically in README
* Enable `test_youtube_search_matching`
* Create `to_screen` and similar functions in postprocessor/common
@@ -370,8 +270,9 @@
### 2021.01.08
* Merge youtube-dl: Upto [2021.01.08](https://github.com/ytdl-org/youtube-dl/releases/tag/2021.01.08) except stitcher ([1](https://github.com/ytdl-org/youtube-dl/commit/bb38a1215718cdf36d73ff0a7830a64cd9fa37cc), [2](https://github.com/ytdl-org/youtube-dl/commit/a563c97c5cddf55f8989ed7ea8314ef78e30107f))
* Moved changelog to separate file
* **Merge youtube-dl:** Upto [2021.01.08](https://github.com/ytdl-org/youtube-dl/releases/tag/2021.01.08)
* Extractor stitcher ([1](https://github.com/ytdl-org/youtube-dl/commit/bb38a1215718cdf36d73ff0a7830a64cd9fa37cc), [2](https://github.com/ytdl-org/youtube-dl/commit/a563c97c5cddf55f8989ed7ea8314ef78e30107f)) have not been merged
* Moved changelog to seperate file
### 2021.01.07-1
@@ -409,7 +310,7 @@
* Changed video format sorting to show video only files and video+audio files together.
* Added `--video-multistreams`, `--no-video-multistreams`, `--audio-multistreams`, `--no-audio-multistreams`
* Added `b`,`w`,`v`,`a` as alias for `best`, `worst`, `video` and `audio` respectively
* Shortcut Options: Added `--write-link`, `--write-url-link`, `--write-webloc-link`, `--write-desktop-link` by [h-h-h-h](https://github.com/h-h-h-h) - See [Internet Shortcut Options](README.md#internet-shortcut-options) for details
* **Shortcut Options:** Added `--write-link`, `--write-url-link`, `--write-webloc-link`, `--write-desktop-link` by [h-h-h-h](https://github.com/h-h-h-h) - See [Internet Shortcut Options](README.md#internet-shortcut-options) for details
* **Sponskrub integration:** Added `--sponskrub`, `--sponskrub-cut`, `--sponskrub-force`, `--sponskrub-location`, `--sponskrub-args` - See [SponSkrub Options](README.md#sponskrub-sponsorblock-options) for details
* Added `--force-download-archive` (`--force-write-archive`) by [h-h-h-h](https://github.com/h-h-h-h)
* Added `--list-formats-as-table`, `--list-formats-old`
@@ -419,38 +320,36 @@
* Relaxed validation for format filters so that any arbitrary field can be used
* Fix for embedding thumbnail in mp3 by [pauldubois98](https://github.com/pauldubois98) ([ytdl-org/youtube-dl#21569](https://github.com/ytdl-org/youtube-dl/pull/21569))
* Make Twitch Video ID output from Playlist and VOD extractor same. This is only a temporary fix
* Merge youtube-dl: Upto [2021.01.03](https://github.com/ytdl-org/youtube-dl/commit/8e953dcbb10a1a42f4e12e4e132657cb0100a1f8) - See [blackjack4494/yt-dlc#280](https://github.com/blackjack4494/yt-dlc/pull/280) for details
* **Merge youtube-dl:** Upto [2021.01.03](https://github.com/ytdl-org/youtube-dl/commit/8e953dcbb10a1a42f4e12e4e132657cb0100a1f8) - See [blackjack4494/yt-dlc#280](https://github.com/blackjack4494/yt-dlc/pull/280) for details
* Extractors [tiktok](https://github.com/ytdl-org/youtube-dl/commit/fb626c05867deab04425bad0c0b16b55473841a2) and [hotstar](https://github.com/ytdl-org/youtube-dl/commit/bb38a1215718cdf36d73ff0a7830a64cd9fa37cc) have not been merged
* Cleaned up the fork for public use
**PS**: All uncredited changes above this point are authored by [pukkandan](https://github.com/pukkandan)
### Unreleased changes in [blackjack4494/yt-dlc](https://github.com/blackjack4494/yt-dlc)
* Updated to youtube-dl release 2020.11.26 by [pukkandan](https://github.com/pukkandan)
* Youtube improvements by [pukkandan](https://github.com/pukkandan)
* Updated to youtube-dl release 2020.11.26
* [youtube]
* Implemented all Youtube Feeds (ytfav, ytwatchlater, ytsubs, ythistory, ytrec) and SearchURL
* Fix ytsearch not returning results sometimes due to promoted content
* Temporary fix for automatic captions - disable json3
* Fix some improper Youtube URLs
* Redirect channel home to /video
* Print youtube's warning message
* Handle Multiple pages for feeds better
* [youtube] Fix ytsearch not returning results sometimes due to promoted content by [colethedj](https://github.com/colethedj)
* [youtube] Temporary fix for automatic captions - disable json3 by [blackjack4494](https://github.com/blackjack4494)
* Multiple pages are handled better for feeds
* Add --break-on-existing by [gergesh](https://github.com/gergesh)
* Pre-check video IDs in the archive before downloading by [pukkandan](https://github.com/pukkandan)
* [bitwave.tv] New extractor by [lorpus](https://github.com/lorpus)
* [Gedi] Add extractor by [nixxo](https://github.com/nixxo)
* [Rcs] Add new extractor by [nixxo](https://github.com/nixxo)
* [skyit] New skyitalia extractor by [nixxo](https://github.com/nixxo)
* [france.tv] Fix thumbnail URL by [renalid](https://github.com/renalid)
* [ina] support mobile links by [B0pol](https://github.com/B0pol)
* [instagram] Fix thumbnail extractor by [nao20010128nao](https://github.com/nao20010128nao)
* [SouthparkDe] Support for English URLs by [xypwn](https://github.com/xypwn)
* [spreaker] fix SpreakerShowIE test URL by [pukkandan](https://github.com/pukkandan)
* [Vlive] Fix playlist handling when downloading a channel by [kyuyeunk](https://github.com/kyuyeunk)
* [tmz] Fix extractor by [diegorodriguezv](https://github.com/diegorodriguezv)
* [generic] Detect embedded bitchute videos by [pukkandan](https://github.com/pukkandan)
* [generic] Extract embedded youtube and twitter videos by [diegorodriguezv](https://github.com/diegorodriguezv)
* [ffmpeg] Ensure all streams are copied by [pukkandan](https://github.com/pukkandan)
* [embedthumbnail] Fix for os.rename error by [pukkandan](https://github.com/pukkandan)
* make_win.bat: don't use UPX to pack vcruntime140.dll by [jbruchon](https://github.com/jbruchon)
* Pre-check video IDs in the archive before downloading
* [bitwave.tv] New extractor
* [Gedi] Add extractor
* [Rcs] Add new extractor
* [skyit] Add support for multiple Sky Italia website and removed old skyitalia extractor
* [france.tv] Fix thumbnail URL
* [ina] support mobile links
* [instagram] Fix extractor
* [itv] BTCC new pages' URL update (articles instead of races)
* [SouthparkDe] Support for English URLs
* [spreaker] fix SpreakerShowIE test URL
* [Vlive] Fix playlist handling when downloading a channel
* [generic] Detect embedded bitchute videos
* [generic] Extract embedded youtube and twitter videos
* [ffmpeg] Ensure all streams are copied
* Fix for os.rename error when embedding thumbnail to video in a different drive
* make_win.bat: don't use UPX to pack vcruntime140.dll


@@ -1,4 +1,4 @@
all: yt-dlp doc pypi-files
all: yt-dlp doc
clean: clean-test clean-dist clean-cache
completions: completion-bash completion-fish completion-zsh
doc: README.md CONTRIBUTING.md issuetemplates supportedsites

README.md

@@ -3,7 +3,7 @@
[![Release version](https://img.shields.io/github/v/release/yt-dlp/yt-dlp?color=brightgreen&label=Release)](https://github.com/yt-dlp/yt-dlp/releases/latest)
[![License: Unlicense](https://img.shields.io/badge/License-Unlicense-blue.svg)](LICENSE)
[![CI Status](https://github.com/yt-dlp/yt-dlp/workflows/Core%20Tests/badge.svg?branch=master)](https://github.com/yt-dlp/yt-dlp/actions)
[![Discord](https://img.shields.io/discord/807245652072857610?color=blue&label=discord&logo=discord)](https://discord.gg/H5MNcFW63r)
[![Discord](https://img.shields.io/discord/807245652072857610?color=blue&label=discord&logo=discord)](https://discord.gg/S75JaBna)
[![Commits](https://img.shields.io/github/commit-activity/m/yt-dlp/yt-dlp?label=commits)](https://github.com/yt-dlp/yt-dlp/commits)
[![Last Commit](https://img.shields.io/github/last-commit/yt-dlp/yt-dlp/master)](https://github.com/yt-dlp/yt-dlp/commits)
@@ -13,7 +13,7 @@
A command-line program to download videos from youtube.com and many other [video platforms](supportedsites.md)
This is a [youtube-dl](https://github.com/ytdl-org/youtube-dl) fork based on the now inactive [youtube-dlc](https://github.com/blackjack4494/yt-dlc). The main focus of this project is adding new features and patches while also keeping up to date with the original project
This is a fork of [youtube-dlc](https://github.com/blackjack4494/yt-dlc) which is inturn a fork of [youtube-dl](https://github.com/ytdl-org/youtube-dl)
* [NEW FEATURES](#new-features)
* [INSTALLATION](#installation)
@@ -46,10 +46,7 @@ This is a [youtube-dl](https://github.com/ytdl-org/youtube-dl) fork based on the
* [Filtering Formats](#filtering-formats)
* [Sorting Formats](#sorting-formats)
* [Format Selection examples](#format-selection-examples)
* [MODIFYING METADATA](#modifying-metadata)
* [Modifying metadata examples](#modifying-metadata-examples)
* [PLUGINS](#plugins)
* [DEPRECATED OPTIONS](#deprecated-options)
* [MORE](#more)
@@ -60,7 +57,7 @@ The major new features from the latest release of [blackjack4494/yt-dlc](https:/
* **[Format Sorting](#sorting-formats)**: The default format sorting options have been changed so that higher resolution and better codecs will be now preferred instead of simply using larger bitrate. Furthermore, you can now specify the sort order using `-S`. This allows for much easier format selection that what is possible by simply using `--format` ([examples](#format-selection-examples))
* **Merged with youtube-dl v2021.04.01**: You get all the latest features and patches of [youtube-dl](https://github.com/ytdl-org/youtube-dl) in addition to all the features of [youtube-dlc](https://github.com/blackjack4494/yt-dlc)
* **Merged with youtube-dl v2021.03.03**: You get all the latest features and patches of [youtube-dl](https://github.com/ytdl-org/youtube-dl) in addition to all the features of [youtube-dlc](https://github.com/blackjack4494/yt-dlc)
* **Merged with animelover1984/youtube-dl**: You get most of the features and improvements from [animelover1984/youtube-dl](https://github.com/animelover1984/youtube-dl) including `--get-comments`, `BiliBiliSearch`, `BilibiliChannel`, Embedding thumbnail in mp4/ogg/opus, Playlist infojson etc. Note that the NicoNico improvements are not available. See [#31](https://github.com/yt-dlp/yt-dlp/pull/31) for details.
@@ -69,19 +66,17 @@ The major new features from the latest release of [blackjack4494/yt-dlc](https:/
* Youtube search (`ytsearch:`, `ytsearchdate:`) along with Search URLs works correctly
* Redirect channel's home URL automatically to `/video` to preserve the old behaviour
* **Split video by chapters**: Videos can be split into multiple files based on chapters using `--split-chapters`
* **Multithreaded fragment downloads**: Fragment downloads can be natively multi-threaded. Use `--concurrent-fragments` (`-N`) option to set the number of threads used
* **Aria2c with HLS/DASH**: You can use aria2c as the external downloader for DASH(mpd) and HLS(m3u8) formats. No more slow ffmpeg/native downloads
* **New extractors**: AnimeLab, Philo MSO, Rcs, Gedi, bitwave.tv, mildom, audius, zee5, mtv.it, wimtv, pluto.tv
* **New extractors**: AnimeLab, Philo MSO, Rcs, Gedi, bitwave.tv, mildom, audius, zee5
* **Fixed extractors**: archive.org, roosterteeth.com, skyit, instagram, itv, SouthparkDe, spreaker, Vlive, tiktok, akamai, ina, rumble, tennistv, amcnetworks
* **Fixed extractors**: archive.org, roosterteeth.com, skyit, instagram, itv, SouthparkDe, spreaker, Vlive, tiktok, akamai, ina, rumble, tennistv
* **Plugin extractors**: Extractors can be loaded from an external file. See [plugins](#plugins) for details
* **Plugin support**: Extractors can be loaded from an external file. See [plugins](#plugins) for details
* **Multiple paths and output templates**: You can give different [output templates](#output-template) and download paths for different types of files. You can also set a temporary path where intermediary files are downloaded to using `--paths` (`-P`)
* **Multiple paths and output templates**: You can give different [output templates](#output-template) and download paths for different types of files. You can also set a temporary path where intermediary files are downloaded to. See [`--paths`](https://github.com/yt-dlp/yt-dlp/#:~:text=-P,%20--paths%20TYPE:PATH) for details
<!-- Relative link doesn't work for "#:~:text=" -->
* **Portable Configuration**: Configuration files are automatically loaded from the home and root directories. See [configuration](#configuration) for details
@@ -108,23 +103,6 @@ You can install yt-dlp using one of the following methods:
* Use pip+git: `python -m pip install --upgrade git+https://github.com/yt-dlp/yt-dlp.git@release`
* Install master branch: `python -m pip install --upgrade git+https://github.com/yt-dlp/yt-dlp`
UNIX users (Linux, macOS, BSD) can also install the [latest release](https://github.com/yt-dlp/yt-dlp/releases/latest) one of the following ways:
```
sudo curl -L https://github.com/yt-dlp/yt-dlp/releases/latest/download/yt-dlp -o /usr/local/bin/yt-dlp
sudo chmod a+rx /usr/local/bin/yt-dlp
```
```
sudo wget https://github.com/yt-dlp/yt-dlp/releases/latest/download/yt-dlp -O /usr/local/bin/yt-dlp
sudo chmod a+rx /usr/local/bin/yt-dlp
```
```
sudo aria2c https://github.com/yt-dlp/yt-dlp/releases/latest/download/yt-dlp -o /usr/local/bin/yt-dlp
sudo chmod a+rx /usr/local/bin/yt-dlp
```
### UPDATE
Starting from version `2021.02.09`, you can use `yt-dlp -U` to update if you are using the provided release.
If you are using `pip`, simply re-run the same command that was used to install the program.
@@ -199,7 +177,7 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
only list them
--no-flat-playlist Extract the videos of a playlist
--mark-watched Mark videos watched (YouTube only)
--no-mark-watched Do not mark videos watched (default)
--no-mark-watched Do not mark videos watched
--no-colors Do not emit color codes in output
## Network Options:
@@ -302,8 +280,6 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
--no-include-ads Do not download advertisements (default)
## Download Options:
-N, --concurrent-fragments N Number of fragments to download
concurrently (default is 1)
-r, --limit-rate RATE Maximum download rate in bytes per second
(e.g. 50K or 4.2M)
-R, --retries RETRIES Number of retries (default is 10), or
@@ -426,11 +402,6 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
--write-description etc. (default)
--no-write-playlist-metafiles Do not write playlist metadata when using
--write-info-json, --write-description etc.
--clean-infojson Remove some private fields such as
filenames from the infojson. Note that it
could still contain some personal
information (default)
--no-clean-infojson Write all fields to the infojson
--get-comments Retrieve video comments to be placed in the
.info.json file. The comments are fetched
even without this option if the extraction
@@ -474,8 +445,7 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
--no-warnings Ignore warnings
-s, --simulate Do not download the video and do not write
anything to disk
--skip-download Do not download the video but write all
related files (Alias: --no-download)
--skip-download Do not download the video
-g, --get-url Simulate, quiet but print URL
-e, --get-title Simulate, quiet but print title
--get-id Simulate, quiet but print id
@@ -512,7 +482,7 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
--encoding ENCODING Force the specified encoding (experimental)
--no-check-certificate Suppress HTTPS certificate validation
--prefer-insecure Use an unencrypted connection to retrieve
information about the video (Currently
information about the video. (Currently
supported only for YouTube)
--user-agent UA Specify a custom user agent
--referer URL Specify a custom referer, use if the video
@@ -526,11 +496,15 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
--sleep-requests SECONDS Number of seconds to sleep between requests
during data extraction
--sleep-interval SECONDS Number of seconds to sleep before each
download. This is the minimum time to sleep
when used along with --max-sleep-interval
(Alias: --min-sleep-interval)
--max-sleep-interval SECONDS Maximum number of seconds to sleep. Can
only be used along with --min-sleep-interval
download when used alone or a lower bound
of a range for randomized sleep before each
download (minimum possible number of
seconds to sleep) when used along with
--max-sleep-interval
--max-sleep-interval SECONDS Upper bound of a range for randomized sleep
before each download (maximum possible
number of seconds to sleep). Must only be
used along with --min-sleep-interval
--sleep-subtitles SECONDS Number of seconds to sleep before each
subtitle download
@@ -580,16 +554,16 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
--write-subs Write subtitle file
--no-write-subs Do not write subtitle file (default)
--write-auto-subs Write automatically generated subtitle file
(Alias: --write-automatic-subs)
--no-write-auto-subs Do not write auto-generated subtitles
(default) (Alias: --no-write-automatic-subs)
(YouTube only)
--no-write-auto-subs Do not write automatically generated
subtitle file (default)
--all-subs Download all the available subtitles of the
video
--list-subs List all available subtitles for the video
--sub-format FORMAT Subtitle format, accepts formats
preference, for example: "srt" or
"ass/srt/best"
--sub-langs LANGS Languages of the subtitles to download
--sub-lang LANGS Languages of the subtitles to download
(optional) separated by commas, use --list-
subs for available language tags
@@ -643,19 +617,18 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
ExtractAudio, VideoRemuxer, VideoConvertor,
EmbedSubtitle, Metadata, Merger,
FixupStretched, FixupM4a, FixupM3u8,
SubtitlesConvertor, EmbedThumbnail and
SplitChapters. The supported executables
are: SponSkrub, FFmpeg, FFprobe, and
AtomicParsley. You can also specify
"PP+EXE:ARGS" to give the arguments to the
specified executable only when being used
by the specified postprocessor.
Additionally, for ffmpeg/ffprobe, "_i"/"_o"
can be appended to the prefix optionally
followed by a number to pass the argument
before the specified input/output file. Eg:
--ppa "Merger+ffmpeg_i1:-v quiet". You can
use this option multiple times to give
SubtitlesConvertor and EmbedThumbnail. The
supported executables are: SponSkrub,
FFmpeg, FFprobe, and AtomicParsley. You can
also specify "PP+EXE:ARGS" to give the
arguments to the specified executable only
when being used by the specified
postprocessor. Additionally, for
ffmpeg/ffprobe, a number can be appended to
the exe name seperated by "_i" to pass the
argument before the specified input file.
Eg: --ppa "Merger+ffmpeg_i1:-v quiet". You
can use this option multiple times to give
different arguments to different
postprocessors. (Alias: --ppa)
-k, --keep-video Keep the intermediate video file on disk
@@ -671,9 +644,20 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
--no-embed-thumbnail Do not embed thumbnail (default)
--add-metadata Write metadata to the video file
--no-add-metadata Do not write metadata (default)
--parse-metadata FROM:TO Parse additional metadata like title/artist
from other fields; see "MODIFYING METADATA"
for details
--parse-metadata FIELD:FORMAT Parse additional metadata like title/artist
from other fields. Give field name to
extract data from, and format of the field
seperated by a ":". Either regular
expression with named capture groups or a
similar syntax to the output template can
also be used. The parsed parameters replace
any existing values and can be use in
output template. This option can be used
multiple times. Example: --parse-metadata
"title:%(artist)s - %(title)s" matches a
title like "Coldplay - Paradise". Example
(regex): --parse-metadata
"description:Artist - (?P<artist>.+?)"
--xattrs Write metadata to the video file's xattrs
(using dublin core and xdg standards)
--fixup POLICY Automatically correct known faults of the
@@ -688,16 +672,8 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
downloading and post-processing, similar to
find's -exec syntax. Example: --exec 'adb
push {} /sdcard/Music/ && rm {}'
--convert-subs FORMAT Convert the subtitles to another format
--convert-subs FORMAT Convert the subtitles to other format
(currently supported: srt|ass|vtt|lrc)
(Alias: --convert-subtitles)
--split-chapters Split video into multiple files based on
internal chapters. The "chapter:" prefix
can be used with "--paths" and "--output"
to set the output filename for the split
files. See "OUTPUT TEMPLATE" for details
--no-split-chapters Do not split video based on chapters
(default)
## SponSkrub (SponsorBlock) Options:
[SponSkrub](https://github.com/yt-dlp/SponSkrub) is a utility to
@@ -813,9 +789,9 @@ The `-o` option is used to indicate a template for the output file names while `
**tl;dr:** [navigate me to examples](#output-template-examples).
The basic usage of `-o` is not to set any template arguments when downloading a single file, like in `yt-dlp -o funny_video.flv "https://some/video"` (hard-coding file extension like this is not recommended). However, it may contain special sequences that will be replaced when downloading each video. The special sequences may be formatted according to [python string formatting operations](https://docs.python.org/2/library/stdtypes.html#string-formatting). For example, `%(NAME)s` or `%(NAME)05d`. To clarify, that is a percent symbol followed by a name in parentheses, followed by formatting operations. Date/time fields can also be formatted according to [strftime formatting](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes) by specifying it inside the parantheses separated from the field name using a `>`. For example, `%(duration>%H-%M-%S)s`.
The basic usage of `-o` is not to set any template arguments when downloading a single file, like in `yt-dlp -o funny_video.flv "https://some/video"`. However, it may contain special sequences that will be replaced when downloading each video. The special sequences may be formatted according to [python string formatting operations](https://docs.python.org/2/library/stdtypes.html#string-formatting). For example, `%(NAME)s` or `%(NAME)05d`. To clarify, that is a percent symbol followed by a name in parentheses, followed by formatting operations. Date/time fields can also be formatted according to [strftime formatting](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes) by specifying it inside the parantheses seperated from the field name using a `>`. For example, `%(duration>%H-%M-%S)s`.
Additionally, you can set different output templates for the various metadata files separately from the general output template by specifying the type of file followed by the template separated by a colon ":". The different filetypes supported are `subtitle`, `thumbnail`, `description`, `annotation`, `infojson`, `pl_description`, `pl_infojson`, `chapter`. For example, `-o '%(title)s.%(ext)s' -o 'thumbnail:%(title)s\%(title)s.%(ext)s'` will put the thumbnails in a folder with the same name as the video.
Additionally, you can set different output templates for the various metadata files seperately from the general output template by specifying the type of file followed by the template seperated by a colon ":". The different filetypes supported are `subtitle|thumbnail|description|annotation|infojson|pl_description|pl_infojson`. For example, `-o '%(title)s.%(ext)s' -o 'thumbnail:%(title)s\%(title)s.%(ext)s'` will put the thumbnails in a folder with the same name as the video.
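As an aside, the `>` mechanism described above (in both the old and new wording) boils down to applying `strftime` to a date/time field. A rough sketch under the assumption that the field arrives as a `YYYYMMDD` string, as `upload_date` does; this hypothetical helper is not yt-dlp's actual template engine:
```python
# Rough sketch of the "%(field>spec)s" idea: the text after ">" is a
# strftime format applied to the field. Hypothetical helper, not yt-dlp code.
import datetime


def format_date_field(value, spec):
    """Render an upload_date-style 'YYYYMMDD' string with a strftime spec."""
    return datetime.datetime.strptime(value, '%Y%m%d').strftime(spec)


print(format_date_field('20210303', '%Y-%m-%d'))  # -> 2021-03-03
```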
The available fields are:
@@ -824,7 +800,6 @@ The available fields are:
- `url` (string): Video URL
- `ext` (string): Video filename extension
- `alt_title` (string): A secondary title of the video
- `description` (string): The description of the video
- `display_id` (string): An alternative identifier for the video
- `uploader` (string): Full name of the video uploader
- `license` (string): License name the video is licensed under
@@ -848,7 +823,6 @@ The available fields are:
- `is_live` (boolean): Whether this video is a live stream or a fixed-length video
- `was_live` (boolean): Whether this video was originally a live stream
- `playable_in_embed` (string): Whether this video is allowed to play in embedded players on other sites
- `availability` (string): Whether the video is 'private', 'premium_only', 'subscriber_only', 'needs_auth', 'unlisted' or 'public'
- `start_time` (numeric): Time in seconds where the reproduction should start, as specified in the URL
- `end_time` (numeric): Time in seconds where the reproduction should end, as specified in the URL
- `format` (string): A human-readable description of the format
@@ -908,13 +882,6 @@ Available for the media that is a track or a part of a music album:
- `disc_number` (numeric): Number of the disc or other physical medium the track belongs to
- `release_year` (numeric): Year (YYYY) when the album was released
Available for `chapter:` prefix when using `--split-chapters` for videos with internal chapters:
- `section_title` (string): Title of the chapter
- `section_number` (numeric): Number of the chapter within the file
- `section_start` (numeric): Start time of the chapter in seconds
- `section_end` (numeric): End time of the chapter in seconds
Each aforementioned sequence when referenced in an output template will be replaced by the actual value corresponding to the sequence name. Note that some of the sequences are not guaranteed to be present since they depend on the metadata obtained by a particular extractor. Such sequences will be replaced with placeholder value provided with `--output-na-placeholder` (`NA` by default).
For example for `-o %(title)s-%(id)s.%(ext)s` and an mp4 video with title `yt-dlp test video` and id `BaW_jenozKcj`, this will result in a `yt-dlp test video-BaW_jenozKcj.mp4` file created in the current directory.
@@ -947,7 +914,7 @@ youtube-dl_test_video_.mp4 # A simple file name
# Download YouTube playlist videos in separate directory indexed by video order in a playlist
$ yt-dlp -o '%(playlist)s/%(playlist_index)s - %(title)s.%(ext)s' https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re
# Download YouTube playlist videos in separate directories according to their uploaded year
# Download YouTube playlist videos in seperate directories according to their uploaded year
$ yt-dlp -o '%(upload_date>%Y)s/%(title)s.%(ext)s' https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re
# Download all playlists of YouTube channel/user keeping each playlist in separate directory:
@@ -968,7 +935,7 @@ $ yt-dlp -o - BaW_jenozKc
By default, yt-dlp tries to download the best available quality if you **don't** pass any options.
This is generally equivalent to using `-f bestvideo*+bestaudio/best`. However, if multiple audiostreams is enabled (`--audio-multistreams`), the default format changes to `-f bestvideo+bestaudio/best`. Similarly, if ffmpeg is unavailable, or if you use yt-dlp to stream to `stdout` (`-o -`), the default becomes `-f best/bestvideo+bestaudio`.
The general syntax for format selection is `-f FORMAT` (or `--format FORMAT`) where `FORMAT` is a *selector expression*, i.e. an expression that describes format or formats you would like to download.
The general syntax for format selection is `--f FORMAT` (or `--format FORMAT`) where `FORMAT` is a *selector expression*, i.e. an expression that describes format or formats you would like to download.
**tl;dr:** [navigate me to examples](#format-selection-examples).
@@ -992,9 +959,7 @@ You can also use special names to select particular edge case formats:
- `ba*`, `bestaudio*`: Select the best quality format that contains audio. It may also contain video. Equivalent to `best*[acodec!=none]`
- `wa*`, `worstaudio*`: Select the worst quality format that contains audio. It may also contain video. Equivalent to `worst*[acodec!=none]`
For example, to download the worst quality video-only format you can use `-f worstvideo`. It is however recomended not to use `worst` and related options. When your format selector is `worst`, the format which is worst in all respects is selected. Most of the time, what you actually want is the video with the smallest filesize instead. So it is generally better to use `-f best -S +size,+br,+res,+fps` instead of `-f worst`. See [sorting formats](#sorting-formats) for more details.
You can select the n'th best format of a type by using `best<type>.<n>`. For example, `best.2` will select the 2nd best combined format. Similarly, `bv*.3` will select the 3rd best format that contains a video stream.
For example, to download the worst quality video-only format you can use `-f worstvideo`. It is however recomended to never actually use `worst` and related options. When your format selector is `worst`, the format which is worst in all respects is selected. Most of the time, what you actually want is the video with the smallest filesize instead. So it is generally better to use `-f best -S +size,+br,+res,+fps` instead of `-f worst`. See [sorting formats](#sorting-formats) for more details.
If you want to download multiple videos and they don't have the same formats available, you can specify the order of preference using slashes. Note that formats on the left hand side are preferred, for example `-f 22/17/18` will download format 22 if it's available, otherwise it will download format 17 if it's available, otherwise it will download format 18 if it's available, otherwise it will complain that no suitable formats are available for download.
@@ -1065,7 +1030,7 @@ You can change the criteria for being considered the `best` by using `-S` (`--fo
- `br`: Equivalent to using `tbr,vbr,abr`
- `asr`: Audio sample rate in Hz
Note that any other **numerical** field made available by the extractor can also be used. All fields, unless specified otherwise, are sorted in decending order. To reverse this, prefix the field with a `+`. Eg: `+res` prefers format with the smallest resolution. Additionally, you can suffix a prefered value for the fields, separated by a `:`. Eg: `res:720` prefers larger videos, but no larger than 720p and the smallest video if there are no videos less than 720p. For `codec` and `ext`, you can provide two prefered values, the first for video and the second for audio. Eg: `+codec:avc:m4a` (equivalent to `+vcodec:avc,+acodec:m4a`) sets the video codec preference to `h264` > `h265` > `vp9` > `vp9.2` > `av01` > `vp8` > `h263` > `theora` and audio codec preference to `mp4a` > `aac` > `vorbis` > `opus` > `mp3` > `ac3` > `dts`. You can also make the sorting prefer the nearest values to the provided by using `~` as the delimiter. Eg: `filesize~1G` prefers the format with filesize closest to 1 GiB.
Note that any other **numerical** field made available by the extractor can also be used. All fields, unless specified otherwise, are sorted in decending order. To reverse this, prefix the field with a `+`. Eg: `+res` prefers format with the smallest resolution. Additionally, you can suffix a prefered value for the fields, seperated by a `:`. Eg: `res:720` prefers larger videos, but no larger than 720p and the smallest video if there are no videos less than 720p. For `codec` and `ext`, you can provide two prefered values, the first for video and the second for audio. Eg: `+codec:avc:m4a` (equivalent to `+vcodec:avc,+acodec:m4a`) sets the video codec preference to `h264` > `h265` > `vp9` > `vp9.2` > `av01` > `vp8` > `h263` > `theora` and audio codec preference to `mp4a` > `aac` > `vorbis` > `opus` > `mp3` > `ac3` > `dts`. You can also make the sorting prefer the nearest values to the provided by using `~` as the delimiter. Eg: `filesize~1G` prefers the format with filesize closest to 1 GiB.
The fields `hasvid`, `ie_pref`, `lang` are always given highest priority in sorting, irrespective of the user-defined order. This behaviour can be changed by using `--force-format-sort`. Apart from these, the default order used is: `quality,res,fps,codec:vp9.2,size,br,asr,proto,ext,hasaud,source,id`. Note that the extractors may override this default order, but they cannot override the user-provided order.
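The `~` (nearest value) behaviour described above can be pictured as a one-liner; the format list here is made up, and this is only an illustration of the idea, not the real sorting code:
```python
# Illustration of "filesize~1G": prefer the format whose filesize is closest
# to 1 GiB. The format dicts are made-up examples, not real extractor output.
formats = [
    {'format_id': '18', 'filesize': 300 * 2**20},   # ~300 MiB
    {'format_id': '22', 'filesize': 900 * 2**20},   # ~900 MiB
    {'format_id': '37', 'filesize': 2 * 2**30},     # ~2 GiB
]

target = 2**30  # 1 GiB
best = min(formats, key=lambda f: abs(f['filesize'] - target))
print(best['format_id'])  # -> '22'
```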
@@ -1190,72 +1155,11 @@ $ yt-dlp -S 'res:720,fps'
$ yt-dlp -S '+res:480,codec,br'
```
# MODIFYING METADATA
The metadata obtained the the extractors can be modified by using `--parse-metadata FROM:TO`. The general syntax is to give the name of a field or a template (with similar syntax to [output template](#output-template)) to extract data from, and the format to interpret it as, separated by a colon ":". Either a [python regular expression](https://docs.python.org/3/library/re.html#regular-expression-syntax) with named capture groups or a similar syntax to the [output template](#output-template) (only `%(field)s` formatting is supported) can be used for `TO`. The option can be used multiple times to parse and modify various fields.
Note that any field created by this can be used in the [output template](#output-template) and will also affect the media file's metadata added when using `--add-metadata`.
You can also use this to change only the metadata that is embedded in the media file. To do this, set the value of the corresponding field with a `meta_` prefix. For example, any value you set to `meta_description` field will be added to the `description` field in the file. You can use this to set a different "description" and "synopsis", for example.
## Modifying metadata examples
Note that on Windows you may need to use double quotes instead of single.
```bash
# Interpret the title as "Artist - Title"
$ yt-dlp --parse-metadata 'title:%(artist)s - %(title)s'
# Regex example
$ yt-dlp --parse-metadata 'description:Artist - (?P<artist>.+)'
# Set title as "Series name S01E05"
$ yt-dlp --parse-metadata '%(series)s S%(season_number)02dE%(episode_number)02d:%(title)s'
# Set "comment" field in video metadata using description instead of webpage_url
$ yt-dlp --parse-metadata 'description:(?s)(?P<meta_comment>.+)' --add-metadata
```
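For readers unfamiliar with named capture groups, a small Python illustration of how the regex example above produces an `artist` field; the description string is invented for the example:
```python
# How a named capture group yields a new field, as in
# --parse-metadata 'description:Artist - (?P<artist>.+)'.
# The description text below is a made-up example.
import re

description = 'Artist - Coldplay'
match = re.search(r'Artist - (?P<artist>.+)', description)
if match:
    print(match.groupdict())  # -> {'artist': 'Coldplay'}
```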
# PLUGINS
Plugins are loaded from `<root-dir>/ytdlp_plugins/<type>/__init__.py`. Currently only `extractor` plugins are supported. Support for `downloader` and `postprocessor` plugins may be added in the future. See [ytdlp_plugins](ytdlp_plugins) for example.
**Note**: `<root-dir>` is the directory of the binary (`<root-dir>/yt-dlp`), or the root directory of the module if you are running directly from source-code (`<root dir>/yt_dlp/__main__.py`)
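To make the layout concrete, a minimal sketch of an extractor plugin, assuming the loading path described above; the class name and URL pattern are hypothetical placeholders, and real plugins should follow the [ytdlp_plugins](ytdlp_plugins) example in the repo:
```python
# <root-dir>/ytdlp_plugins/extractor/__init__.py
# Minimal sketch of an extractor plugin; SamplePluginIE and example.com
# are hypothetical placeholders, not part of yt-dlp itself.
from yt_dlp.extractor.common import InfoExtractor


class SamplePluginIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?example\.com/video/(?P<id>[0-9]+)'

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)
        return {
            'id': video_id,
            'title': self._og_search_title(webpage),
            'url': self._og_search_video_url(webpage),
        }
```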
# DEPRECATED OPTIONS
These are all the deprecated options and the current alternative to achieve the same effect
--cn-verification-proxy URL --geo-verification-proxy URL
--id -o "%(id)s.%(ext)s"
-A, --auto-number -o "%(autonumber)s-%(id)s.%(ext)s"
-t, --title -o "%(title)s-%(id)s.%(ext)s"
-l, --literal -o accepts literal names
--autonumber-size NUMBER Use string formatting. Eg: %(autonumber)03d
--metadata-from-title FORMAT --parse-metadata "%(title)s:FORMAT"
--prefer-avconv avconv is no longer officially supported (Alias: --no-prefer-ffmpeg)
--prefer-ffmpeg Default (Alias: --no-prefer-avconv)
--avconv-location avconv is no longer officially supported
-C, --call-home Not implemented
--no-call-home Default
--write-srt --write-subs
--no-write-srt --no-write-subs
--srt-lang LANGS --sub-langs LANGS
--prefer-unsecure --prefer-insecure
--rate-limit RATE --limit-rate RATE
--force-write-download-archive --force-write-archive
--dump-intermediate-pages --dump-pages
--dump-headers --print-traffic
--youtube-print-sig-code No longer supported
--trim-file-names LENGTH --trim-filenames LENGTH
--yes-overwrites --force-overwrites
--load-info --load-info-json
--split-tracks --split-chapters
--no-split-tracks --no-split-chapters
--sponskrub-args ARGS --ppa "sponskrub:ARGS"
--test Only used for testing extractors
# MORE
For the FAQ, developer instructions, etc., see the [original README](https://github.com/ytdl-org/youtube-dl#faq)

View File

@@ -97,8 +97,7 @@
- **bbc**: BBC
- **bbc.co.uk**: BBC iPlayer
- **bbc.co.uk:article**: BBC articles
- **bbc.co.uk:iplayer:episodes**
- **bbc.co.uk:iplayer:group**
- **bbc.co.uk:iplayer:playlist**
- **bbc.co.uk:playlist**
- **BBVTV**
- **Beatport**
@@ -249,7 +248,6 @@
- **DiscoveryGoPlaylist**
- **DiscoveryNetworksDe**
- **DiscoveryPlus**
- **DiscoveryPlusIndia**
- **DiscoveryVR**
- **Disney**
- **dlive:stream**
@@ -349,7 +347,8 @@
- **Gaskrank**
- **Gazeta**
- **GDCVault**
- **GediDigital**
- **Gedi**
- **GediEmbeds**
- **generic**: Generic downloader that works on some sites
- **Gfycat**
- **GiantBomb**
@@ -458,8 +457,6 @@
- **kuwo:singer**: 酷我音乐 - 歌手
- **kuwo:song**: 酷我音乐
- **la7.it**
- **la7.it:pod:episode**
- **la7.it:podcast**
- **laola1tv**
- **laola1tv:embed**
- **lbry**
@@ -547,7 +544,6 @@
- **mixcloud:playlist**
- **mixcloud:user**
- **MLB**
- **MLBVideo**
- **Mnet**
- **MNetTV**
- **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net
@@ -566,8 +562,6 @@
- **mtg**: MTG services
- **mtv**
- **mtv.de**
- **mtv.it**
- **mtv.it:programma**
- **mtv:video**
- **mtvjapan**
- **mtvservices:embedded**
@@ -639,7 +633,6 @@
- **nicknight**
- **niconico**: ニコニコ動画
- **NiconicoPlaylist**
- **NiconicoUser**
- **Nintendo**
- **Nitter**
- **njoy**: N-JOY
@@ -710,9 +703,6 @@
- **OutsideTV**
- **PacktPub**
- **PacktPubCourse**
- **PalcoMP3:artist**
- **PalcoMP3:song**
- **PalcoMP3:video**
- **pandora.tv**: 판도라TV
- **ParamountNetwork**
- **parliamentlive.tv**: UK parliament videos
@@ -745,7 +735,6 @@
- **Playwire**
- **pluralsight**
- **pluralsight:course**
- **PlutoTV**
- **podomatic**
- **Pokemon**
- **PokemonWatch**
@@ -926,7 +915,6 @@
- **stanfordoc**: Stanford Open ClassRoom
- **Steam**
- **Stitcher**
- **StitcherShow**
- **StoryFire**
- **StoryFireSeries**
- **StoryFireUser**
@@ -1099,7 +1087,6 @@
- **Vidbit**
- **Viddler**
- **Videa**
- **video.arnes.si**: Arnes Video
- **video.google:search**: Google Video search
- **video.sky.it**
- **video.sky.it:live**
@@ -1185,7 +1172,6 @@
- **Weibo**
- **WeiboMobile**
- **WeiqiTV**: WQTV
- **WimTV**
- **Wistia**
- **WistiaPlaylist**
- **wnl**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
@@ -1256,9 +1242,7 @@
- **ZDF**
- **ZDFChannel**
- **Zee5**
- **zee5:series**
- **Zhihu**
- **zingmp3**: mp3.zing.vn
- **zingmp3:album**
- **zoom**
- **Zype**

View File

@@ -37,6 +37,7 @@ class TestAllURLsMatching(unittest.TestCase):
assertPlaylist('PL63F0C78739B09958')
assertTab('https://www.youtube.com/AsapSCIENCE')
assertTab('https://www.youtube.com/embedded')
assertTab('https://www.youtube.com/feed') # Own channel's home page
assertTab('https://www.youtube.com/playlist?list=UUBABnxM4Ar9ten8Mdjj1j0Q')
assertTab('https://www.youtube.com/course?list=ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')
assertTab('https://www.youtube.com/playlist?list=PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC')

View File

@@ -14,10 +14,10 @@ from yt_dlp.postprocessor import MetadataFromFieldPP, MetadataFromTitlePP
class TestMetadataFromField(unittest.TestCase):
def test_format_to_regex(self):
pp = MetadataFromFieldPP(None, ['title:%(title)s - %(artist)s'])
self.assertEqual(pp._data[0]['regex'], r'(?P<title>.+)\ \-\ (?P<artist>.+)')
self.assertEqual(pp._data[0]['regex'], r'(?P<title>[^\r\n]+)\ \-\ (?P<artist>[^\r\n]+)')
class TestMetadataFromTitle(unittest.TestCase):
def test_format_to_regex(self):
pp = MetadataFromTitlePP(None, '%(title)s - %(artist)s')
self.assertEqual(pp._titleregex, r'(?P<title>.+)\ \-\ (?P<artist>.+)')
self.assertEqual(pp._titleregex, r'(?P<title>[^\r\n]+)\ \-\ (?P<artist>[^\r\n]+)')
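The change above narrows the generated capture groups from `.+` to `[^\r\n]+`, so a parsed field can no longer swallow a stray line break. A standalone check (field values invented):

```python
import re

# Regex generated from the template '%(title)s - %(artist)s' after this change
pattern = r'(?P<title>[^\r\n]+)\ \-\ (?P<artist>[^\r\n]+)'

# Ordinary single-line values match exactly as before
assert re.search(pattern, 'Artist Name - Track Title').group('title') == 'Artist Name'

# Unlike '.+' (which matches '\r'), the new class stops at any line break,
# so a carriage return can no longer leak into a captured field
assert re.search(pattern, 'Artist Name - Track\rJunk').group('artist') == 'Track'
```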

View File

@@ -60,14 +60,12 @@ from .utils import (
encode_compat_str,
encodeFilename,
error_to_compat_str,
EntryNotInPlaylist,
ExistingVideoReached,
expand_path,
ExtractorError,
float_or_none,
format_bytes,
format_field,
FORMAT_RE,
formatSeconds,
GeoRestrictedError,
int_or_none,
@@ -218,7 +216,6 @@ class YoutubeDL(object):
logtostderr: Log messages to stderr instead of stdout.
writedescription: Write the video description to a .description file
writeinfojson: Write the video description to a .info.json file
clean_infojson: Remove private fields from the infojson
writecomments: Extract video comments. This will not be written to disk
unless writeinfojson is also given
writeannotations: Write the video annotations to a .annotations.xml file
@@ -773,93 +770,95 @@ class YoutubeDL(object):
'Put from __future__ import unicode_literals at the top of your code file or consider switching to Python 3.x.')
return outtmpl_dict
def prepare_outtmpl(self, outtmpl, info_dict, sanitize=None):
""" Make the template and info_dict suitable for substitution (outtmpl % info_dict)"""
template_dict = dict(info_dict)
# duration_string
template_dict['duration_string'] = ( # %(duration>%H-%M-%S)s is wrong if duration > 24hrs
formatSeconds(info_dict['duration'], '-')
if info_dict.get('duration', None) is not None
else None)
# epoch
template_dict['epoch'] = int(time.time())
# autonumber
autonumber_size = self.params.get('autonumber_size')
if autonumber_size is None:
autonumber_size = 5
template_dict['autonumber'] = self.params.get('autonumber_start', 1) - 1 + self._num_downloads
# resolution if not defined
if template_dict.get('resolution') is None:
if template_dict.get('width') and template_dict.get('height'):
template_dict['resolution'] = '%dx%d' % (template_dict['width'], template_dict['height'])
elif template_dict.get('height'):
template_dict['resolution'] = '%sp' % template_dict['height']
elif template_dict.get('width'):
template_dict['resolution'] = '%dx?' % template_dict['width']
if sanitize is None:
sanitize = lambda k, v: v
template_dict = dict((k, v if isinstance(v, compat_numeric_types) else sanitize(k, v))
for k, v in template_dict.items()
if v is not None and not isinstance(v, (list, tuple, dict)))
na = self.params.get('outtmpl_na_placeholder', 'NA')
template_dict = collections.defaultdict(lambda: na, template_dict)
# For fields playlist_index and autonumber convert all occurrences
# of %(field)s to %(field)0Nd for backward compatibility
field_size_compat_map = {
'playlist_index': len(str(template_dict['n_entries'])),
'autonumber': autonumber_size,
}
FIELD_SIZE_COMPAT_RE = r'(?<!%)%\((?P<field>autonumber|playlist_index)\)s'
mobj = re.search(FIELD_SIZE_COMPAT_RE, outtmpl)
if mobj:
outtmpl = re.sub(
FIELD_SIZE_COMPAT_RE,
r'%%(\1)0%dd' % field_size_compat_map[mobj.group('field')],
outtmpl)
numeric_fields = list(self._NUMERIC_FIELDS)
# Format date
FORMAT_DATE_RE = FORMAT_RE.format(r'(?P<key>(?P<field>\w+)>(?P<format>.+?))')
for mobj in re.finditer(FORMAT_DATE_RE, outtmpl):
conv_type, field, frmt, key = mobj.group('type', 'field', 'format', 'key')
if key in template_dict:
continue
value = strftime_or_none(template_dict.get(field), frmt, na)
if conv_type in 'crs': # string
value = sanitize(field, value)
else: # number
numeric_fields.append(key)
value = float_or_none(value, default=None)
if value is not None:
template_dict[key] = value
# Missing numeric fields used together with integer presentation types
# in format specification will break the argument substitution since
# string NA placeholder is returned for missing fields. We will patch
# output template for missing fields to meet string presentation type.
for numeric_field in numeric_fields:
if numeric_field not in template_dict:
outtmpl = re.sub(
FORMAT_RE.format(re.escape(numeric_field)),
r'%({0})s'.format(numeric_field), outtmpl)
return outtmpl, template_dict
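A rough usage sketch of the split-out helper, assuming a plain `YoutubeDL` instance and invented info fields; `outtmpl % template_dict` is the substitution the docstring describes:

```python
from yt_dlp import YoutubeDL

ydl = YoutubeDL()
info = {'title': 'demo video', 'ext': 'mp4'}

# Missing fields fall back to the 'NA' placeholder via the defaultdict
outtmpl, tmpl_dict = ydl.prepare_outtmpl('%(title)s [%(id)s].%(ext)s', info)
print(outtmpl % tmpl_dict)  # -> 'demo video [NA].mp4'
```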
def _prepare_filename(self, info_dict, tmpl_type='default'):
try:
template_dict = dict(info_dict)
template_dict['duration_string'] = ( # %(duration>%H-%M-%S)s is wrong if duration > 24hrs
formatSeconds(info_dict['duration'], '-')
if info_dict.get('duration', None) is not None
else None)
template_dict['epoch'] = int(time.time())
autonumber_size = self.params.get('autonumber_size')
if autonumber_size is None:
autonumber_size = 5
template_dict['autonumber'] = self.params.get('autonumber_start', 1) - 1 + self._num_downloads
if template_dict.get('resolution') is None:
if template_dict.get('width') and template_dict.get('height'):
template_dict['resolution'] = '%dx%d' % (template_dict['width'], template_dict['height'])
elif template_dict.get('height'):
template_dict['resolution'] = '%sp' % template_dict['height']
elif template_dict.get('width'):
template_dict['resolution'] = '%dx?' % template_dict['width']
sanitize = lambda k, v: sanitize_filename(
compat_str(v),
restricted=self.params.get('restrictfilenames'),
is_id=(k == 'id' or k.endswith('_id')))
template_dict = dict((k, v if isinstance(v, compat_numeric_types) else sanitize(k, v))
for k, v in template_dict.items()
if v is not None and not isinstance(v, (list, tuple, dict)))
na = self.params.get('outtmpl_na_placeholder', 'NA')
template_dict = collections.defaultdict(lambda: na, template_dict)
outtmpl = self.outtmpl_dict.get(tmpl_type, self.outtmpl_dict['default'])
outtmpl, template_dict = self.prepare_outtmpl(outtmpl, info_dict, sanitize)
force_ext = OUTTMPL_TYPES.get(tmpl_type)
# For fields playlist_index and autonumber convert all occurrences
# of %(field)s to %(field)0Nd for backward compatibility
field_size_compat_map = {
'playlist_index': len(str(template_dict['n_entries'])),
'autonumber': autonumber_size,
}
FIELD_SIZE_COMPAT_RE = r'(?<!%)%\((?P<field>autonumber|playlist_index)\)s'
mobj = re.search(FIELD_SIZE_COMPAT_RE, outtmpl)
if mobj:
outtmpl = re.sub(
FIELD_SIZE_COMPAT_RE,
r'%%(\1)0%dd' % field_size_compat_map[mobj.group('field')],
outtmpl)
# As of [1] format syntax is:
# %[mapping_key][conversion_flags][minimum_width][.precision][length_modifier]type
# 1. https://docs.python.org/2/library/stdtypes.html#string-formatting
FORMAT_RE = r'''(?x)
(?<!%)
%
\({0}\) # mapping key
(?:[#0\-+ ]+)? # conversion flags (optional)
(?:\d+)? # minimum field width (optional)
(?:\.\d+)? # precision (optional)
[hlL]? # length modifier (optional)
(?P<type>[diouxXeEfFgGcrs%]) # conversion type
'''
numeric_fields = list(self._NUMERIC_FIELDS)
# Format date
FORMAT_DATE_RE = FORMAT_RE.format(r'(?P<key>(?P<field>\w+)>(?P<format>.+?))')
for mobj in re.finditer(FORMAT_DATE_RE, outtmpl):
conv_type, field, frmt, key = mobj.group('type', 'field', 'format', 'key')
if key in template_dict:
continue
value = strftime_or_none(template_dict.get(field), frmt, na)
if conv_type in 'crs': # string
value = sanitize(field, value)
else: # number
numeric_fields.append(key)
value = float_or_none(value, default=None)
if value is not None:
template_dict[key] = value
# Missing numeric fields used together with integer presentation types
# in format specification will break the argument substitution since
# string NA placeholder is returned for missing fields. We will patch
# output template for missing fields to meet string presentation type.
for numeric_field in numeric_fields:
if numeric_field not in template_dict:
outtmpl = re.sub(
FORMAT_RE.format(re.escape(numeric_field)),
r'%({0})s'.format(numeric_field), outtmpl)
# expand_path translates '%%' into '%' and '$$' into '$'
# correspondingly that is not what we want since we need to keep
@@ -874,7 +873,6 @@ class YoutubeDL(object):
# title "Hello $PATH", we don't want `$PATH` to be expanded.
filename = expand_path(outtmpl).replace(sep, '') % template_dict
force_ext = OUTTMPL_TYPES.get(tmpl_type)
if force_ext is not None:
filename = replace_extension(filename, force_ext, template_dict.get('ext'))
@@ -1173,24 +1171,57 @@ class YoutubeDL(object):
else:
raise Exception('Invalid result type: %s' % result_type)
def _ensure_dir_exists(self, path):
return make_dir(path, self.report_error)
def __process_playlist(self, ie_result, download):
# We process each entry in the playlist
playlist = ie_result.get('title') or ie_result.get('id')
self.to_screen('[download] Downloading playlist: %s' % playlist)
if 'entries' not in ie_result:
raise EntryNotInPlaylist()
incomplete_entries = bool(ie_result.get('requested_entries'))
if incomplete_entries:
def fill_missing_entries(entries, indexes):
ret = [None] * max(*indexes)
for i, entry in zip(indexes, entries):
ret[i - 1] = entry
return ret
ie_result['entries'] = fill_missing_entries(ie_result['entries'], ie_result['requested_entries'])
if self.params.get('allow_playlist_files', True):
ie_copy = {
'playlist': playlist,
'playlist_id': ie_result.get('id'),
'playlist_title': ie_result.get('title'),
'playlist_uploader': ie_result.get('uploader'),
'playlist_uploader_id': ie_result.get('uploader_id'),
'playlist_index': 0
}
ie_copy.update(dict(ie_result))
def ensure_dir_exists(path):
return make_dir(path, self.report_error)
if self.params.get('writeinfojson', False):
infofn = self.prepare_filename(ie_copy, 'pl_infojson')
if not ensure_dir_exists(encodeFilename(infofn)):
return
if not self.params.get('overwrites', True) and os.path.exists(encodeFilename(infofn)):
self.to_screen('[info] Playlist metadata is already present')
else:
playlist_info = dict(ie_result)
# playlist_info['entries'] = list(playlist_info['entries']) # Entries is a generator which should not be resolved here
del playlist_info['entries']
self.to_screen('[info] Writing playlist metadata as JSON to: ' + infofn)
try:
write_json_file(self.filter_requested_info(playlist_info), infofn)
except (OSError, IOError):
self.report_error('Cannot write playlist metadata to JSON file ' + infofn)
if self.params.get('writedescription', False):
descfn = self.prepare_filename(ie_copy, 'pl_description')
if not ensure_dir_exists(encodeFilename(descfn)):
return
if not self.params.get('overwrites', True) and os.path.exists(encodeFilename(descfn)):
self.to_screen('[info] Playlist description is already present')
elif ie_result.get('description') is None:
self.report_warning('There\'s no playlist description to write.')
else:
try:
self.to_screen('[info] Writing playlist description to: ' + descfn)
with io.open(encodeFilename(descfn), 'w', encoding='utf-8') as descfile:
descfile.write(ie_result['description'])
except (OSError, IOError):
self.report_error('Cannot write playlist description file ' + descfn)
return
playlist_results = []
@@ -1217,20 +1248,25 @@ class YoutubeDL(object):
def make_playlistitems_entries(list_ie_entries):
num_entries = len(list_ie_entries)
for i in playlistitems:
if -num_entries < i <= num_entries:
yield list_ie_entries[i - 1]
elif incomplete_entries:
raise EntryNotInPlaylist()
return [
list_ie_entries[i - 1] for i in playlistitems
if -num_entries <= i - 1 < num_entries]
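For reference, the 1-based lookup used by `--playlist-items` preserves the requested order and duplicates; a toy example (entries invented):

```python
entries = ['a', 'b', 'c', 'd', 'e']  # a hypothetical 5-entry playlist
playlistitems = [5, 2, 2]            # e.g. --playlist-items 5,2,2

# Mirrors make_playlistitems_entries: 1-based indexing, order preserved
print([entries[i - 1] for i in playlistitems])  # -> ['e', 'b', 'b']
```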
def report_download(num_entries):
self.to_screen(
'[%s] playlist %s: Downloading %d videos' %
(ie_result['extractor'], playlist, num_entries))
if isinstance(ie_entries, list):
n_all_entries = len(ie_entries)
if playlistitems:
entries = list(make_playlistitems_entries(ie_entries))
entries = make_playlistitems_entries(ie_entries)
else:
entries = ie_entries[playliststart:playlistend]
n_entries = len(entries)
msg = 'Collected %d videos; downloading %d of them' % (n_all_entries, n_entries)
self.to_screen(
'[%s] playlist %s: Collected %d video ids (downloading %d of them)' %
(ie_result['extractor'], playlist, n_all_entries, n_entries))
elif isinstance(ie_entries, PagedList):
if playlistitems:
entries = []
@@ -1242,73 +1278,25 @@ class YoutubeDL(object):
entries = ie_entries.getslice(
playliststart, playlistend)
n_entries = len(entries)
msg = 'Downloading %d videos' % n_entries
report_download(n_entries)
else: # iterable
if playlistitems:
entries = list(make_playlistitems_entries(list(itertools.islice(
ie_entries, 0, max(playlistitems)))))
entries = make_playlistitems_entries(list(itertools.islice(
ie_entries, 0, max(playlistitems))))
else:
entries = list(itertools.islice(
ie_entries, playliststart, playlistend))
n_entries = len(entries)
msg = 'Downloading %d videos' % n_entries
if any((entry is None for entry in entries)):
raise EntryNotInPlaylist()
if not playlistitems and (playliststart or playlistend):
playlistitems = list(range(1 + playliststart, 1 + playliststart + len(entries)))
ie_result['entries'] = entries
ie_result['requested_entries'] = playlistitems
if self.params.get('allow_playlist_files', True):
ie_copy = {
'playlist': playlist,
'playlist_id': ie_result.get('id'),
'playlist_title': ie_result.get('title'),
'playlist_uploader': ie_result.get('uploader'),
'playlist_uploader_id': ie_result.get('uploader_id'),
'playlist_index': 0
}
ie_copy.update(dict(ie_result))
if self.params.get('writeinfojson', False):
infofn = self.prepare_filename(ie_copy, 'pl_infojson')
if not self._ensure_dir_exists(encodeFilename(infofn)):
return
if not self.params.get('overwrites', True) and os.path.exists(encodeFilename(infofn)):
self.to_screen('[info] Playlist metadata is already present')
else:
self.to_screen('[info] Writing playlist metadata as JSON to: ' + infofn)
try:
write_json_file(self.filter_requested_info(ie_result, self.params.get('clean_infojson', True)), infofn)
except (OSError, IOError):
self.report_error('Cannot write playlist metadata to JSON file ' + infofn)
if self.params.get('writedescription', False):
descfn = self.prepare_filename(ie_copy, 'pl_description')
if not self._ensure_dir_exists(encodeFilename(descfn)):
return
if not self.params.get('overwrites', True) and os.path.exists(encodeFilename(descfn)):
self.to_screen('[info] Playlist description is already present')
elif ie_result.get('description') is None:
self.report_warning('There\'s no playlist description to write.')
else:
try:
self.to_screen('[info] Writing playlist description to: ' + descfn)
with io.open(encodeFilename(descfn), 'w', encoding='utf-8') as descfile:
descfile.write(ie_result['description'])
except (OSError, IOError):
self.report_error('Cannot write playlist description file ' + descfn)
return
report_download(n_entries)
if self.params.get('playlistreverse', False):
entries = entries[::-1]
if self.params.get('playlistrandom', False):
random.shuffle(entries)
x_forwarded_for = ie_result.get('__x_forwarded_for_ip')
self.to_screen('[%s] playlist %s: %s' % (ie_result['extractor'], playlist, msg))
for i, entry in enumerate(entries, 1):
self.to_screen('[download] Downloading video %s of %s' % (i, n_entries))
# This __x_forwarded_for_ip thing is a bit ugly but requires
@@ -1322,7 +1310,7 @@ class YoutubeDL(object):
'playlist_title': ie_result.get('title'),
'playlist_uploader': ie_result.get('uploader'),
'playlist_uploader_id': ie_result.get('uploader_id'),
'playlist_index': playlistitems[i - 1] if playlistitems else i,
'playlist_index': playlistitems[i - 1] if playlistitems else i + playliststart,
'extractor': ie_result['extractor'],
'webpage_url': ie_result['webpage_url'],
'webpage_url_basename': url_basename(ie_result['webpage_url']),
@@ -1576,25 +1564,21 @@ class YoutubeDL(object):
else:
format_fallback = False
mobj = re.match(
r'(?P<bw>best|worst|b|w)(?P<type>video|audio|v|a)?(?P<mod>\*)?(?:\.(?P<n>[1-9]\d*))?$',
format_spec)
if mobj is not None:
format_idx = int_or_none(mobj.group('n'), default=1)
format_idx = format_idx - 1 if mobj.group('bw')[0] == 'w' else -format_idx
format_type = (mobj.group('type') or [None])[0]
not_format_type = {'v': 'a', 'a': 'v'}.get(format_type)
format_modified = mobj.group('mod') is not None
format_spec_obj = re.match(r'(best|worst|b|w)(video|audio|v|a)?(\*)?$', format_spec)
if format_spec_obj is not None:
format_idx = 0 if format_spec_obj.group(1)[0] == 'w' else -1
format_type = format_spec_obj.group(2)[0] if format_spec_obj.group(2) else False
not_format_type = 'v' if format_type == 'a' else 'a'
format_modified = format_spec_obj.group(3) is not None
format_fallback = not format_type and not format_modified # for b, w
filter_f = (
(lambda f: f.get('%scodec' % format_type) != 'none')
if format_type and format_modified # bv*, ba*, wv*, wa*
else (lambda f: f.get('%scodec' % not_format_type) == 'none')
if format_type # bv, ba, wv, wa
else (lambda f: f.get('vcodec') != 'none' and f.get('acodec') != 'none')
if not format_modified # b, w
else None) # b*, w*
filter_f = ((lambda f: f.get(format_type + 'codec') != 'none')
if format_type and format_modified # bv*, ba*, wv*, wa*
else (lambda f: f.get(not_format_type + 'codec') == 'none')
if format_type # bv, ba, wv, wa
else (lambda f: f.get('vcodec') != 'none' and f.get('acodec') != 'none')
if not format_modified # b, w
else None) # b*, w*
else:
format_idx = -1
filter_f = ((lambda f: f.get('ext') == format_spec)
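The new regex also accepts an optional `.N` suffix, so `b.3` selects the third-best format. A standalone sketch of the index computation (specs invented):

```python
import re

FORMAT_SPEC_RE = r'(?P<bw>best|worst|b|w)(?P<type>video|audio|v|a)?(?P<mod>\*)?(?:\.(?P<n>[1-9]\d*))?$'

for spec in ('best', 'wa.2', 'b.3'):
    mobj = re.match(FORMAT_SPEC_RE, spec)
    n = int(mobj.group('n') or 1)
    # 'w...' counts from the worst end of the sorted list, 'b...' from the best
    format_idx = n - 1 if mobj.group('bw')[0] == 'w' else -n
    print(spec, '->', format_idx)  # best -> -1, wa.2 -> 1, b.3 -> -3
```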
@@ -1606,16 +1590,13 @@ class YoutubeDL(object):
if not formats:
return
matches = list(filter(filter_f, formats)) if filter_f is not None else formats
n = len(matches)
if -n <= format_idx < n:
if matches:
yield matches[format_idx]
elif format_fallback and ctx['incomplete_formats']:
elif format_fallback == 'force' or (format_fallback and ctx['incomplete_formats']):
# for extractors with incomplete formats (audio only (soundcloud)
# or video only (imgur)) best/worst will fallback to
# best/worst {video,audio}-only format
n = len(formats)
if -n <= format_idx < n:
yield formats[format_idx]
yield formats[format_idx]
elif selector.type == MERGE: # +
def _merge(formats_pair):
@@ -1663,7 +1644,7 @@ class YoutubeDL(object):
new_dict.update({
'width': the_only_video.get('width'),
'height': the_only_video.get('height'),
'resolution': the_only_video.get('resolution') or self.format_resolution(the_only_video),
'resolution': the_only_video.get('resolution'),
'fps': the_only_video.get('fps'),
'vcodec': the_only_video.get('vcodec'),
'vbr': the_only_video.get('vbr'),
@@ -1813,18 +1794,14 @@ class YoutubeDL(object):
if 'display_id' not in info_dict and 'id' in info_dict:
info_dict['display_id'] = info_dict['id']
for ts_key, date_key in (
('timestamp', 'upload_date'),
('release_timestamp', 'release_date'),
):
if info_dict.get(date_key) is None and info_dict.get(ts_key) is not None:
# Working around out-of-range timestamp values (e.g. negative ones on Windows,
# see http://bugs.python.org/issue1646728)
try:
upload_date = datetime.datetime.utcfromtimestamp(info_dict[ts_key])
info_dict[date_key] = upload_date.strftime('%Y%m%d')
except (ValueError, OverflowError, OSError):
pass
if info_dict.get('upload_date') is None and info_dict.get('timestamp') is not None:
# Working around out-of-range timestamp values (e.g. negative ones on Windows,
# see http://bugs.python.org/issue1646728)
try:
upload_date = datetime.datetime.utcfromtimestamp(info_dict['timestamp'])
info_dict['upload_date'] = upload_date.strftime('%Y%m%d')
except (ValueError, OverflowError, OSError):
pass
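The conversion itself is just `utcfromtimestamp` plus a date format, wrapped in the guard described in the comment; a standalone equivalent:

```python
import datetime

def timestamp_to_date(ts):
    # Out-of-range timestamps (e.g. negative values on Windows,
    # http://bugs.python.org/issue1646728) are silently skipped
    try:
        return datetime.datetime.utcfromtimestamp(ts).strftime('%Y%m%d')
    except (ValueError, OverflowError, OSError):
        return None

print(timestamp_to_date(1614777600))  # -> '20210303'
```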
# Auto generate title fields corresponding to the *_number fields when missing
# in order to always have clean titles. This is very common for TV series.
@@ -2066,7 +2043,7 @@ class YoutubeDL(object):
print_mandatory('format')
if self.params.get('forcejson', False):
self.post_extract(info_dict)
self.to_stdout(json.dumps(info_dict, default=repr))
self.to_stdout(json.dumps(info_dict))
def process_info(self, info_dict):
"""Process a single resolved IE result."""
@@ -2094,7 +2071,6 @@ class YoutubeDL(object):
info_dict = self.pre_process(info_dict)
# info_dict['_filename'] needs to be set for backward compatibility
info_dict['_filename'] = full_filename = self.prepare_filename(info_dict, warn=True)
temp_filename = self.prepare_filename(info_dict, 'temp')
files_to_move = {}
@@ -2113,14 +2089,17 @@ class YoutubeDL(object):
if full_filename is None:
return
if not self._ensure_dir_exists(encodeFilename(full_filename)):
def ensure_dir_exists(path):
return make_dir(path, self.report_error)
if not ensure_dir_exists(encodeFilename(full_filename)):
return
if not self._ensure_dir_exists(encodeFilename(temp_filename)):
if not ensure_dir_exists(encodeFilename(temp_filename)):
return
if self.params.get('writedescription', False):
descfn = self.prepare_filename(info_dict, 'description')
if not self._ensure_dir_exists(encodeFilename(descfn)):
if not ensure_dir_exists(encodeFilename(descfn)):
return
if not self.params.get('overwrites', True) and os.path.exists(encodeFilename(descfn)):
self.to_screen('[info] Video description is already present')
@@ -2137,7 +2116,7 @@ class YoutubeDL(object):
if self.params.get('writeannotations', False):
annofn = self.prepare_filename(info_dict, 'annotation')
if not self._ensure_dir_exists(encodeFilename(annofn)):
if not ensure_dir_exists(encodeFilename(annofn)):
return
if not self.params.get('overwrites', True) and os.path.exists(encodeFilename(annofn)):
self.to_screen('[info] Video annotations are already present')
@@ -2160,10 +2139,7 @@ class YoutubeDL(object):
fd.add_progress_hook(ph)
if self.params.get('verbose'):
self.to_screen('[debug] Invoking downloader on %r' % info.get('url'))
new_info = dict(info)
if new_info.get('http_headers') is None:
new_info['http_headers'] = self._calc_headers(new_info)
return fd.download(name, new_info, subtitle)
return fd.download(name, info, subtitle)
subtitles_are_requested = any([self.params.get('writesubtitles', False),
self.params.get('writeautomaticsub')])
@@ -2182,7 +2158,6 @@ class YoutubeDL(object):
sub_filename_final = subtitles_filename(sub_fn, sub_lang, sub_format, info_dict.get('ext'))
if not self.params.get('overwrites', True) and os.path.exists(encodeFilename(sub_filename)):
self.to_screen('[info] Video subtitle %s.%s is already present' % (sub_lang, sub_format))
sub_info['filepath'] = sub_filename
files_to_move[sub_filename] = sub_filename_final
else:
self.to_screen('[info] Writing video subtitles to: ' + sub_filename)
@@ -2192,15 +2167,13 @@ class YoutubeDL(object):
# See https://github.com/ytdl-org/youtube-dl/issues/10268
with io.open(encodeFilename(sub_filename), 'w', encoding='utf-8', newline='') as subfile:
subfile.write(sub_info['data'])
sub_info['filepath'] = sub_filename
files_to_move[sub_filename] = sub_filename_final
except (OSError, IOError):
self.report_error('Cannot write subtitles file ' + sub_filename)
return
else:
try:
dl(sub_filename, sub_info.copy(), subtitle=True)
sub_info['filepath'] = sub_filename
dl(sub_filename, sub_info, subtitle=True)
files_to_move[sub_filename] = sub_filename_final
except (ExtractorError, IOError, OSError, ValueError, compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
self.report_warning('Unable to download subtitle for "%s": %s' %
@@ -2231,14 +2204,14 @@ class YoutubeDL(object):
if self.params.get('writeinfojson', False):
infofn = self.prepare_filename(info_dict, 'infojson')
if not self._ensure_dir_exists(encodeFilename(infofn)):
if not ensure_dir_exists(encodeFilename(infofn)):
return
if not self.params.get('overwrites', True) and os.path.exists(encodeFilename(infofn)):
self.to_screen('[info] Video metadata is already present')
else:
self.to_screen('[info] Writing video metadata as JSON to: ' + infofn)
try:
write_json_file(self.filter_requested_info(info_dict, self.params.get('clean_infojson', True)), infofn)
write_json_file(self.filter_requested_info(info_dict), infofn)
except (OSError, IOError):
self.report_error('Cannot write video metadata to JSON file ' + infofn)
return
@@ -2249,7 +2222,7 @@ class YoutubeDL(object):
for thumb_ext in self._write_thumbnails(info_dict, thumb_fn_temp):
thumb_filename_temp = replace_extension(thumb_fn_temp, thumb_ext, info_dict.get('ext'))
thumb_filename = replace_extension(thumbfn, thumb_ext, info_dict.get('ext'))
files_to_move[thumb_filename_temp] = thumb_filename
files_to_move[thumb_filename_temp] = info_dict['__thumbnail_filename'] = thumb_filename
# Write internet shortcut files
url_link = webloc_link = desktop_link = False
@@ -2362,17 +2335,10 @@ class YoutubeDL(object):
requested_formats = info_dict['requested_formats']
old_ext = info_dict['ext']
if self.params.get('merge_output_format') is None:
if not compatible_formats(requested_formats):
info_dict['ext'] = 'mkv'
self.report_warning(
'Requested formats are incompatible for merge and will be merged into mkv.')
if (info_dict['ext'] == 'webm'
and self.params.get('writethumbnail', False)
and info_dict.get('thumbnails')):
info_dict['ext'] = 'mkv'
self.report_warning(
'webm doesn\'t support embedding a thumbnail, mkv will be used.')
if self.params.get('merge_output_format') is None and not compatible_formats(requested_formats):
info_dict['ext'] = 'mkv'
self.report_warning(
'Requested formats are incompatible for merge and will be merged into mkv.')
def correct_ext(filename):
filename_real_ext = os.path.splitext(filename)[1][1:]
@@ -2394,7 +2360,7 @@ class YoutubeDL(object):
fname = prepend_extension(
self.prepare_filename(new_info, 'temp'),
'f%s' % f['format_id'], new_info['ext'])
if not self._ensure_dir_exists(fname):
if not ensure_dir_exists(fname):
return
downloaded.append(fname)
partial_success, real_download = dl(fname, new_info)
@@ -2471,8 +2437,9 @@ class YoutubeDL(object):
else:
assert fixup_policy in ('ignore', 'never')
if ('protocol' in info_dict
and get_suitable_downloader(info_dict, self.params).__name__ == 'HlsFD'):
if (info_dict.get('protocol') == 'm3u8_native'
or info_dict.get('protocol') == 'm3u8'
and self.params.get('hls_prefer_native')):
if fixup_policy == 'warn':
self.report_warning('%s: malformed AAC bitstream detected.' % (
info_dict['id']))
@@ -2488,13 +2455,13 @@ class YoutubeDL(object):
assert fixup_policy in ('ignore', 'never')
try:
info_dict = self.post_process(dl_filename, info_dict, files_to_move)
self.post_process(dl_filename, info_dict, files_to_move)
except PostProcessingError as err:
self.report_error('Postprocessing: %s' % str(err))
return
try:
for ph in self._post_hooks:
ph(info_dict['filepath'])
ph(full_filename)
except Exception as err:
self.report_error('post hooks: %s' % str(err))
return
@@ -2534,7 +2501,7 @@ class YoutubeDL(object):
else:
if self.params.get('dump_single_json', False):
self.post_extract(res)
self.to_stdout(json.dumps(res, default=repr))
self.to_stdout(json.dumps(res))
return self._download_retcode
@@ -2543,10 +2510,10 @@ class YoutubeDL(object):
[info_filename], mode='r',
openhook=fileinput.hook_encoded('utf-8'))) as f:
# FileInput doesn't have a read method, we can't call json.load
info = self.filter_requested_info(json.loads('\n'.join(f)), self.params.get('clean_infojson', True))
info = self.filter_requested_info(json.loads('\n'.join(f)))
try:
self.process_ie_result(info, download=True)
except (DownloadError, EntryNotInPlaylist):
except DownloadError:
webpage_url = info.get('webpage_url')
if webpage_url is not None:
self.report_warning('The info failed to download, trying with "%s"' % webpage_url)
@@ -2556,32 +2523,21 @@ class YoutubeDL(object):
return self._download_retcode
@staticmethod
def filter_requested_info(info_dict, actually_filter=True):
if not actually_filter:
info_dict['epoch'] = int(time.time())
return info_dict
exceptions = {
'remove': ['requested_formats', 'requested_subtitles', 'requested_entries', 'filepath', 'entries'],
'keep': ['_type'],
}
keep_key = lambda k: k in exceptions['keep'] or not (k.startswith('_') or k in exceptions['remove'])
filter_fn = lambda obj: (
list(map(filter_fn, obj)) if isinstance(obj, (list, tuple))
else obj if not isinstance(obj, dict)
else dict((k, filter_fn(v)) for k, v in obj.items() if keep_key(k)))
return filter_fn(info_dict)
def filter_requested_info(info_dict):
fields_to_remove = ('requested_formats', 'requested_subtitles')
return dict(
(k, v) for k, v in info_dict.items()
if (k[0] != '_' or k == '_type') and k not in fields_to_remove)
def run_pp(self, pp, infodict):
def run_pp(self, pp, infodict, files_to_move={}):
files_to_delete = []
if '__files_to_move' not in infodict:
infodict['__files_to_move'] = {}
files_to_delete, infodict = pp.run(infodict)
if not files_to_delete:
return infodict
return files_to_move, infodict
if self.params.get('keepvideo', False):
for f in files_to_delete:
infodict['__files_to_move'].setdefault(f, '')
files_to_move.setdefault(f, '')
else:
for old_filename in set(files_to_delete):
self.to_screen('Deleting original file %s (pass -k to keep)' % old_filename)
@@ -2589,16 +2545,16 @@ class YoutubeDL(object):
os.remove(encodeFilename(old_filename))
except (IOError, OSError):
self.report_warning('Unable to remove downloaded original file')
if old_filename in infodict['__files_to_move']:
del infodict['__files_to_move'][old_filename]
return infodict
if old_filename in files_to_move:
del files_to_move[old_filename]
return files_to_move, infodict
@staticmethod
def post_extract(info_dict):
def actual_post_extract(info_dict):
if info_dict.get('_type') in ('playlist', 'multi_video'):
for video_dict in info_dict.get('entries', {}):
actual_post_extract(video_dict or {})
actual_post_extract(video_dict)
return
if '__post_extractor' not in info_dict:
@@ -2609,27 +2565,25 @@ class YoutubeDL(object):
del info_dict['__post_extractor']
return
actual_post_extract(info_dict or {})
actual_post_extract(info_dict)
def pre_process(self, ie_info):
info = dict(ie_info)
for pp in self._pps['beforedl']:
info = self.run_pp(pp, info)
info = self.run_pp(pp, info)[1]
return info
def post_process(self, filename, ie_info, files_to_move=None):
def post_process(self, filename, ie_info, files_to_move={}):
"""Run all the postprocessors on the given file."""
info = dict(ie_info)
info['filepath'] = filename
info['__files_to_move'] = files_to_move or {}
info['__files_to_move'] = {}
for pp in ie_info.get('__postprocessors', []) + self._pps['normal']:
info = self.run_pp(pp, info)
info = self.run_pp(MoveFilesAfterDownloadPP(self), info)
del info['__files_to_move']
files_to_move, info = self.run_pp(pp, info, files_to_move)
info = self.run_pp(MoveFilesAfterDownloadPP(self, files_to_move), info)[1]
for pp in self._pps['aftermove']:
info = self.run_pp(pp, info)
return info
info = self.run_pp(pp, info, {})[1]
def _make_archive_id(self, info_dict):
video_id = info_dict.get('id')
@@ -2678,11 +2632,12 @@ class YoutubeDL(object):
return 'audio only'
if format.get('resolution') is not None:
return format['resolution']
if format.get('width') and format.get('height'):
res = '%dx%d' % (format['width'], format['height'])
elif format.get('height'):
res = '%sp' % format['height']
elif format.get('width'):
if format.get('height') is not None:
if format.get('width') is not None:
res = '%sx%s' % (format['width'], format['height'])
else:
res = '%sp' % format['height']
elif format.get('width') is not None:
res = '%dx?' % format['width']
else:
res = default
@@ -2996,7 +2951,7 @@ class YoutubeDL(object):
thumb_ext = determine_ext(t['url'], 'jpg')
suffix = '%s.' % t['id'] if multiple else ''
thumb_display_id = '%s ' % t['id'] if multiple else ''
t['filepath'] = thumb_filename = replace_extension(filename, suffix + thumb_ext, info_dict.get('ext'))
t['filename'] = thumb_filename = replace_extension(filename, suffix + thumb_ext, info_dict.get('ext'))
if not self.params.get('overwrites', True) and os.path.exists(encodeFilename(thumb_filename)):
ret.append(suffix + thumb_ext)

View File

@@ -180,8 +180,6 @@ def _real_main(argv=None):
if opts.overwrites:
# --yes-overwrites implies --no-continue
opts.continue_dl = False
if opts.concurrent_fragment_downloads <= 0:
raise ValueError('Concurrent fragments must be positive')
def parse_retries(retries, name=''):
if retries in ('inf', 'infinite'):
@@ -279,14 +277,9 @@ def _real_main(argv=None):
def report_conflict(arg1, arg2):
write_string('WARNING: %s is ignored since %s was given\n' % (arg2, arg1), out=sys.stderr)
if opts.remuxvideo and opts.recodevideo:
report_conflict('--recode-video', '--remux-video')
opts.remuxvideo = False
if opts.sponskrub_cut and opts.split_chapters and opts.sponskrub is not False:
report_conflict('--split-chapter', '--sponskrub-cut')
opts.sponskrub_cut = False
if opts.allow_unplayable_formats:
if opts.extractaudio:
report_conflict('--allow-unplayable-formats', '--extract-audio')
@@ -368,9 +361,19 @@ def _real_main(argv=None):
# this was the old behaviour if only --all-sub was given.
if opts.allsubtitles and not opts.writeautomaticsub:
opts.writesubtitles = True
# This should be above EmbedThumbnail since sponskrub removes the thumbnail attachment
# but must be below EmbedSubtitle and FFmpegMetadata
# See https://github.com/yt-dlp/yt-dlp/issues/204 , https://github.com/faissaloo/SponSkrub/issues/29
if opts.embedthumbnail:
already_have_thumbnail = opts.writethumbnail or opts.write_all_thumbnails
postprocessors.append({
'key': 'EmbedThumbnail',
'already_have_thumbnail': already_have_thumbnail
})
if not already_have_thumbnail:
opts.writethumbnail = True
# XAttrMetadataPP should be run after post-processors that may change file
# contents
if opts.xattrs:
postprocessors.append({'key': 'XAttrMetadata'})
# This should be below all ffmpeg PP because it may cut parts out from the video
# If opts.sponskrub is None, sponskrub is used, but it silently fails if the executable can't be found
if opts.sponskrub is not False:
postprocessors.append({
@@ -381,19 +384,6 @@ def _real_main(argv=None):
'force': opts.sponskrub_force,
'ignoreerror': opts.sponskrub is None,
})
if opts.embedthumbnail:
already_have_thumbnail = opts.writethumbnail or opts.write_all_thumbnails
postprocessors.append({
'key': 'EmbedThumbnail',
'already_have_thumbnail': already_have_thumbnail
})
if not already_have_thumbnail:
opts.writethumbnail = True
if opts.split_chapters:
postprocessors.append({'key': 'FFmpegSplitChapters'})
# XAttrMetadataPP should be run after post-processors that may change file contents
if opts.xattrs:
postprocessors.append({'key': 'XAttrMetadata'})
# ExecAfterDownload must be the last PP
if opts.exec_cmd:
postprocessors.append({
@@ -473,7 +463,6 @@ def _real_main(argv=None):
'extractor_retries': opts.extractor_retries,
'skip_unavailable_fragments': opts.skip_unavailable_fragments,
'keep_fragments': opts.keep_fragments,
'concurrent_fragment_downloads': opts.concurrent_fragment_downloads,
'buffersize': opts.buffersize,
'noresizebuffer': opts.noresizebuffer,
'http_chunk_size': opts.http_chunk_size,
@@ -493,7 +482,6 @@ def _real_main(argv=None):
'writeannotations': opts.writeannotations,
'writeinfojson': opts.writeinfojson,
'allow_playlist_files': opts.allow_playlist_files,
'clean_infojson': opts.clean_infojson,
'getcomments': opts.getcomments,
'writethumbnail': opts.writethumbnail,
'write_all_thumbnails': opts.write_all_thumbnails,

View File

@@ -326,12 +326,6 @@ class FileDownloader(object):
"""Report it was impossible to resume download."""
self.to_screen('[download] Unable to resume')
@staticmethod
def supports_manifest(manifest):
""" Whether the downloader can download the fragments from the manifest.
Redefine in subclasses if needed. """
pass
def download(self, filename, info_dict, subtitle=False):
"""Download to a filename using the info from info_dict
Return True on success and False otherwise

View File

@@ -1,26 +1,18 @@
from __future__ import unicode_literals
try:
import concurrent.futures
can_threaded_download = True
except ImportError:
can_threaded_download = False
from ..downloader import _get_real_downloader
from .fragment import FragmentFD
from ..compat import compat_urllib_error
from ..utils import (
DownloadError,
sanitize_open,
urljoin,
)
class DashSegmentsFD(FragmentFD):
"""
Download segments in a DASH manifest. External downloaders can take over
the fragment downloads by supporting the 'frag_urls' protocol
Download segments in a DASH manifest
"""
FD_NAME = 'dashsegments'
@@ -45,7 +37,7 @@ class DashSegmentsFD(FragmentFD):
fragment_retries = self.params.get('fragment_retries', 0)
skip_unavailable_fragments = self.params.get('skip_unavailable_fragments', True)
fragments_to_download = []
fragment_urls = []
frag_index = 0
for i, fragment in enumerate(fragments):
frag_index += 1
@@ -56,17 +48,49 @@ class DashSegmentsFD(FragmentFD):
assert fragment_base_url
fragment_url = urljoin(fragment_base_url, fragment['path'])
fragments_to_download.append({
'frag_index': frag_index,
'index': i,
'url': fragment_url,
})
if real_downloader:
fragment_urls.append(fragment_url)
continue
# In DASH, the first segment contains necessary headers to
# generate a valid MP4 file, so always abort for the first segment
fatal = i == 0 or not skip_unavailable_fragments
count = 0
while count <= fragment_retries:
try:
success, frag_content = self._download_fragment(ctx, fragment_url, info_dict)
if not success:
return False
self._append_fragment(ctx, frag_content)
break
except compat_urllib_error.HTTPError as err:
# YouTube may often return 404 HTTP error for a fragment causing the
# whole download to fail. However if the same fragment is immediately
# retried with the same request data this usually succeeds (1-2 attempts
# is usually enough) thus allowing to download the whole file successfully.
# To be future-proof we will retry all fragments that fail with any
# HTTP error.
count += 1
if count <= fragment_retries:
self.report_retry_fragment(err, frag_index, count, fragment_retries)
except DownloadError:
# Don't retry fragment if error occurred during HTTP downloading
# itself since it has own retry settings
if not fatal:
self.report_skip_fragment(frag_index)
break
raise
if count > fragment_retries:
if not fatal:
self.report_skip_fragment(frag_index)
continue
self.report_error('giving up after %s fragment retries' % fragment_retries)
return False
if real_downloader:
self.to_screen(
'[%s] Fragment downloads will be delegated to %s' % (self.FD_NAME, real_downloader.get_basename()))
info_copy = info_dict.copy()
info_copy['fragments'] = fragments_to_download
info_copy['url_list'] = fragment_urls
fd = real_downloader(self.ydl, self.params)
# TODO: Make progress updates work without hooking twice
# for ph in self._progress_hooks:
@@ -75,104 +99,5 @@ class DashSegmentsFD(FragmentFD):
if not success:
return False
else:
def download_fragment(fragment):
i = fragment['index']
frag_index = fragment['frag_index']
fragment_url = fragment['url']
ctx['fragment_index'] = frag_index
# In DASH, the first segment contains necessary headers to
# generate a valid MP4 file, so always abort for the first segment
fatal = i == 0 or not skip_unavailable_fragments
count = 0
while count <= fragment_retries:
try:
success, frag_content = self._download_fragment(ctx, fragment_url, info_dict)
if not success:
return False, frag_index
break
except compat_urllib_error.HTTPError as err:
# YouTube may often return 404 HTTP error for a fragment causing the
# whole download to fail. However if the same fragment is immediately
# retried with the same request data this usually succeeds (1-2 attempts
# is usually enough) thus allowing to download the whole file successfully.
# To be future-proof we will retry all fragments that fail with any
# HTTP error.
count += 1
if count <= fragment_retries:
self.report_retry_fragment(err, frag_index, count, fragment_retries)
except DownloadError:
# Don't retry fragment if error occurred during HTTP downloading
# itself since it has own retry settings
if not fatal:
break
raise
if count > fragment_retries:
if not fatal:
return False, frag_index
self.report_error('Giving up after %s fragment retries' % fragment_retries)
return False, frag_index
return frag_content, frag_index
def append_fragment(frag_content, frag_index):
if frag_content:
fragment_filename = '%s-Frag%d' % (ctx['tmpfilename'], frag_index)
try:
file, frag_sanitized = sanitize_open(fragment_filename, 'rb')
ctx['fragment_filename_sanitized'] = frag_sanitized
file.close()
self._append_fragment(ctx, frag_content)
return True
except FileNotFoundError:
if skip_unavailable_fragments:
self.report_skip_fragment(frag_index)
return True
else:
self.report_error(
'fragment %s not found, unable to continue' % frag_index)
return False
else:
if skip_unavailable_fragments:
self.report_skip_fragment(frag_index)
return True
else:
self.report_error(
'fragment %s not found, unable to continue' % frag_index)
return False
max_workers = self.params.get('concurrent_fragment_downloads', 1)
if can_threaded_download and max_workers > 1:
self.report_warning('The download speed shown is only of one thread. This is a known issue')
with concurrent.futures.ThreadPoolExecutor(max_workers) as pool:
futures = [pool.submit(download_fragment, fragment) for fragment in fragments_to_download]
# timeout must be 0 to return instantly
done, not_done = concurrent.futures.wait(futures, timeout=0)
try:
while not_done:
# Check every 1 second for KeyboardInterrupt
freshly_done, not_done = concurrent.futures.wait(not_done, timeout=1)
done |= freshly_done
except KeyboardInterrupt:
for future in not_done:
future.cancel()
# timeout must be none to cancel
concurrent.futures.wait(not_done, timeout=None)
raise KeyboardInterrupt
results = [future.result() for future in futures]
for frag_content, frag_index in results:
result = append_fragment(frag_content, frag_index)
if not result:
return False
else:
for fragment in fragments_to_download:
frag_content, frag_index = download_fragment(fragment)
result = append_fragment(frag_content, frag_index)
if not result:
return False
self._finish_frag_download(ctx)
return True
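The concurrency pattern above, waiting in one-second slices so a `KeyboardInterrupt` is honoured while worker threads run, reduces to this standalone sketch (job list invented):

```python
import concurrent.futures
import time

def run_all(jobs, max_workers=4):
    with concurrent.futures.ThreadPoolExecutor(max_workers) as pool:
        futures = [pool.submit(job) for job in jobs]
        # timeout=0 returns immediately with whatever has already finished
        done, not_done = concurrent.futures.wait(futures, timeout=0)
        try:
            while not_done:
                # Poll every second so Ctrl-C is not swallowed by the wait
                freshly_done, not_done = concurrent.futures.wait(not_done, timeout=1)
                done |= freshly_done
        except KeyboardInterrupt:
            for future in not_done:
                future.cancel()
            # timeout must be None here so the cancellation can complete
            concurrent.futures.wait(not_done, timeout=None)
            raise
        return [f.result() for f in futures]  # original submission order

run_all([lambda: time.sleep(0.2) for _ in range(8)])
```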

View File

@@ -24,6 +24,7 @@ from ..utils import (
cli_bool_option,
cli_configuration_args,
encodeFilename,
error_to_compat_str,
encodeArgument,
handle_youtubedl_headers,
check_executable,
@@ -107,8 +108,7 @@ class ExternalFD(FileDownloader):
def _configuration_args(self, *args, **kwargs):
return cli_configuration_args(
self.params.get('external_downloader_args'),
[self.get_basename(), 'default'],
*args, **kwargs)
self.get_basename(), *args, **kwargs)
def _call_downloader(self, tmpfilename, info_dict):
""" Either overwrite this or implement _make_cmd """
@@ -116,43 +116,24 @@ class ExternalFD(FileDownloader):
self._debug_cmd(cmd)
if 'fragments' in info_dict:
fragment_retries = self.params.get('fragment_retries', 0)
skip_unavailable_fragments = self.params.get('skip_unavailable_fragments', True)
count = 0
while count <= fragment_retries:
p = subprocess.Popen(
cmd, stderr=subprocess.PIPE)
_, stderr = process_communicate_or_kill(p)
if p.returncode == 0:
break
# TODO: Decide whether to retry based on error code
# https://aria2.github.io/manual/en/html/aria2c.html#exit-status
self.to_stderr(stderr.decode('utf-8', 'replace'))
count += 1
if count <= fragment_retries:
self.to_screen(
'[%s] Got error. Retrying fragments (attempt %d of %s)...'
% (self.get_basename(), count, self.format_retries(fragment_retries)))
if count > fragment_retries:
if not skip_unavailable_fragments:
self.report_error('Giving up after %s fragment retries' % fragment_retries)
return -1
p = subprocess.Popen(
cmd, stderr=subprocess.PIPE)
_, stderr = process_communicate_or_kill(p)
if p.returncode != 0:
self.to_stderr(stderr.decode('utf-8', 'replace'))
if 'url_list' in info_dict:
file_list = []
for [i, url] in enumerate(info_dict['url_list']):
tmpsegmentname = '%s_%s.frag' % (tmpfilename, i)
file_list.append(tmpsegmentname)
key_list = info_dict.get('key_list')
decrypt_info = None
dest, _ = sanitize_open(tmpfilename, 'wb')
for frag_index, fragment in enumerate(info_dict['fragments']):
fragment_filename = '%s-Frag%d' % (tmpfilename, frag_index)
try:
src, _ = sanitize_open(fragment_filename, 'rb')
except IOError:
if skip_unavailable_fragments and frag_index > 1:
self.to_screen('[%s] Skipping fragment %d ...' % (self.get_basename(), frag_index))
continue
self.report_error('Unable to open fragment %d' % frag_index)
return -1
decrypt_info = fragment.get('decrypt_info')
if decrypt_info:
for i, file in enumerate(file_list):
src, _ = sanitize_open(file, 'rb')
if key_list:
decrypt_info = next((x for x in key_list if x['INDEX'] == i), decrypt_info)
if decrypt_info['METHOD'] == 'AES-128':
iv = decrypt_info.get('IV')
decrypt_info['KEY'] = decrypt_info.get('KEY') or self.ydl.urlopen(
@@ -168,16 +149,19 @@ class ExternalFD(FileDownloader):
fragment_data = src.read()
dest.write(fragment_data)
src.close()
if not self.params.get('keep_fragments', False):
os.remove(encodeFilename(fragment_filename))
dest.close()
os.remove(encodeFilename('%s.frag.urls' % tmpfilename))
else:
p = subprocess.Popen(
cmd, stderr=subprocess.PIPE)
_, stderr = process_communicate_or_kill(p)
if p.returncode != 0:
self.to_stderr(stderr.decode('utf-8', 'replace'))
if not self.params.get('keep_fragments', False):
for file_path in file_list:
try:
os.remove(file_path)
except OSError as ose:
self.report_error("Unable to delete file %s; %s" % (file_path, error_to_compat_str(ose)))
try:
file_path = '%s.frag.urls' % tmpfilename
os.remove(file_path)
except OSError as ose:
self.report_error("Unable to delete file %s; %s" % (file_path, error_to_compat_str(ose)))
return p.returncode
def _prepare_url(self, info_dict, url):
@@ -261,22 +245,15 @@ class Aria2cFD(ExternalFD):
AVAILABLE_OPT = '-v'
SUPPORTED_PROTOCOLS = ('http', 'https', 'ftp', 'ftps', 'frag_urls')
@staticmethod
def supports_manifest(manifest):
UNSUPPORTED_FEATURES = [
r'#EXT-X-BYTERANGE', # playlists composed of byte ranges of media files [1]
# 1. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.2
]
check_results = (not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES)
return all(check_results)
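The `supports_manifest` check is simple enough to restate standalone: aria2c cannot fetch byte ranges of a media file, so any manifest using `#EXT-X-BYTERANGE` is handed back to the native downloader (manifest text invented):

```python
import re

UNSUPPORTED_FEATURES = [
    r'#EXT-X-BYTERANGE',  # playlists composed of byte ranges of media files
]

def supports_manifest(manifest):
    return all(not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES)

print(supports_manifest('#EXTM3U\n#EXT-X-BYTERANGE:1000@0\n'))  # -> False
```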
def _make_cmd(self, tmpfilename, info_dict):
cmd = [self.exe, '-c',
'--console-log-level=warn', '--summary-interval=0', '--download-result=hide',
'--file-allocation=none', '-x16', '-j16', '-s16']
if 'fragments' in info_dict:
cmd += ['--allow-overwrite=true', '--allow-piece-length-change=true']
cmd = [self.exe, '-c']
dn = os.path.dirname(tmpfilename)
if 'url_list' not in info_dict:
cmd += ['--out', os.path.basename(tmpfilename)]
verbose_level_args = ['--console-log-level=warn', '--summary-interval=0']
cmd += self._configuration_args(['--file-allocation=none', '-x16', '-j16', '-s16'] + verbose_level_args)
if dn:
cmd += ['--dir', dn]
if info_dict.get('http_headers') is not None:
for key, val in info_dict['http_headers'].items():
cmd += ['--header', '%s: %s' % (key, val)]
@@ -284,25 +261,19 @@ class Aria2cFD(ExternalFD):
cmd += self._option('--all-proxy', 'proxy')
cmd += self._bool_option('--check-certificate', 'nocheckcertificate', 'false', 'true', '=')
cmd += self._bool_option('--remote-time', 'updatetime', 'true', 'false', '=')
cmd += self._configuration_args()
dn = os.path.dirname(tmpfilename)
if dn:
cmd += ['--dir', dn]
if 'fragments' not in info_dict:
cmd += ['--out', os.path.basename(tmpfilename)]
cmd += ['--auto-file-renaming=false']
if 'fragments' in info_dict:
cmd += ['--file-allocation=none', '--uri-selector=inorder']
if 'url_list' in info_dict:
cmd += verbose_level_args
cmd += ['--uri-selector', 'inorder', '--download-result=hide']
url_list_file = '%s.frag.urls' % tmpfilename
url_list = []
for frag_index, fragment in enumerate(info_dict['fragments']):
fragment_filename = '%s-Frag%d' % (os.path.basename(tmpfilename), frag_index)
url_list.append('%s\n\tout=%s' % (fragment['url'], fragment_filename))
for [i, url] in enumerate(info_dict['url_list']):
tmpsegmentname = '%s_%s.frag' % (os.path.basename(tmpfilename), i)
url_list.append('%s\n\tout=%s' % (url, tmpsegmentname))
stream, _ = sanitize_open(url_list_file, 'wb')
stream.write('\n'.join(url_list).encode('utf-8'))
stream.close()
cmd += ['-i', url_list_file]
else:
cmd += ['--', info_dict['url']]
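For reference, the `-i` input file written above pairs each URL with an indented per-download option line; a minimal sketch of the format (URLs and names invented):

```python
fragments = ['https://example.com/seg0.m4s', 'https://example.com/seg1.m4s']

# aria2c input-file format: URL, then indented options for that download
lines = ['%s\n\tout=tmp-Frag%d' % (url, i) for i, url in enumerate(fragments)]
with open('tmp.frag.urls', 'w', encoding='utf-8') as f:
    f.write('\n'.join(lines))
# then: aria2c -i tmp.frag.urls --dir <download dir>
```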
@@ -311,8 +282,8 @@ class Aria2cFD(ExternalFD):
class HttpieFD(ExternalFD):
@classmethod
def available(cls, path=None):
return check_executable(path or 'http', ['--version'])
def available(cls):
return check_executable('http', ['--version'])
def _make_cmd(self, tmpfilename, info_dict):
cmd = ['http', '--download', '--output', tmpfilename, info_dict['url']]
@@ -327,7 +298,7 @@ class FFmpegFD(ExternalFD):
SUPPORTED_PROTOCOLS = ('http', 'https', 'ftp', 'ftps', 'm3u8', 'rtsp', 'rtmp', 'mms')
@classmethod
def available(cls, path=None): # path is ignored for ffmpeg
def available(cls):
return FFmpegPostProcessor().available
def _call_downloader(self, tmpfilename, info_dict):

View File

@@ -7,11 +7,6 @@ try:
can_decrypt_frag = True
except ImportError:
can_decrypt_frag = False
try:
import concurrent.futures
can_threaded_download = True
except ImportError:
can_threaded_download = False
from ..downloader import _get_real_downloader
from .fragment import FragmentFD
@@ -24,17 +19,12 @@ from ..compat import (
)
from ..utils import (
parse_m3u8_attributes,
sanitize_open,
update_url_query,
)
class HlsFD(FragmentFD):
"""
Download segments in a m3u8 manifest. External downloaders can take over
the fragment downloads by supporting the 'frag_urls' protocol and
re-defining 'supports_manifest' function
"""
""" A limited implementation that does not require ffmpeg """
FD_NAME = 'hlsnative'
@@ -63,15 +53,12 @@ class HlsFD(FragmentFD):
UNSUPPORTED_FEATURES += [
r'#EXT-X-KEY:METHOD=(?!NONE|AES-128)', # encrypted streams [1]
]
def check_results():
yield not info_dict.get('is_live')
is_aes128_enc = '#EXT-X-KEY:METHOD=AES-128' in manifest
yield with_crypto or not is_aes128_enc
yield not (is_aes128_enc and r'#EXT-X-BYTERANGE' in manifest)
for feature in UNSUPPORTED_FEATURES:
yield not re.search(feature, manifest)
return all(check_results())
check_results = [not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES]
is_aes128_enc = '#EXT-X-KEY:METHOD=AES-128' in manifest
check_results.append(with_crypto or not is_aes128_enc)
check_results.append(not (is_aes128_enc and r'#EXT-X-BYTERANGE' in manifest))
check_results.append(not info_dict.get('is_live'))
return all(check_results)
def real_download(self, filename, info_dict):
man_url = info_dict['url']
@@ -83,24 +70,20 @@ class HlsFD(FragmentFD):
if not self.can_download(s, info_dict, self.params.get('allow_unplayable_formats')):
if info_dict.get('extra_param_to_segment_url') or info_dict.get('_decryption_key_url'):
self.report_error('pycryptodome not found. Please install')
self.report_error('pycryptodome not found. Please install it.')
return False
if self.can_download(s, info_dict, with_crypto=True):
self.report_warning('pycryptodome is needed to download this file natively')
fd = FFmpegFD(self.ydl, self.params)
self.report_warning('pycryptodome is needed to download this file with hlsnative')
self.report_warning(
'%s detected unsupported features; extraction will be delegated to %s' % (self.FD_NAME, fd.get_basename()))
'hlsnative has detected features it does not support, '
'extraction will be delegated to ffmpeg')
fd = FFmpegFD(self.ydl, self.params)
# TODO: Make progress updates work without hooking twice
# for ph in self._progress_hooks:
# fd.add_progress_hook(ph)
return fd.real_download(filename, info_dict)
real_downloader = _get_real_downloader(info_dict, 'frag_urls', self.params, None)
if real_downloader and not real_downloader.supports_manifest(s):
real_downloader = None
if real_downloader:
self.to_screen(
'[%s] Fragment downloads will be delegated to %s' % (self.FD_NAME, real_downloader.get_basename()))
def is_ad_fragment_start(s):
return (s.startswith('#ANVATO-SEGMENT-INFO') and 'type=ad' in s
@@ -110,7 +93,7 @@ class HlsFD(FragmentFD):
return (s.startswith('#ANVATO-SEGMENT-INFO') and 'type=master' in s
or s.startswith('#UPLYNK-SEGMENT') and s.endswith(',segment'))
fragments = []
fragment_urls = []
media_frags = 0
ad_frags = 0
@@ -153,12 +136,14 @@ class HlsFD(FragmentFD):
i = 0
media_sequence = 0
decrypt_info = {'METHOD': 'NONE'}
key_list = []
byte_range = {}
discontinuity_count = 0
frag_index = 0
ad_frag_next = False
for line in s.splitlines():
line = line.strip()
download_frag = False
if line:
if not line.startswith('#'):
if format_index and discontinuity_count != format_index:
@@ -175,20 +160,17 @@ class HlsFD(FragmentFD):
if extra_query:
frag_url = update_url_query(frag_url, extra_query)
fragments.append({
'frag_index': frag_index,
'url': frag_url,
'decrypt_info': decrypt_info,
'byte_range': byte_range,
'media_sequence': media_sequence,
})
if real_downloader:
fragment_urls.append(frag_url)
continue
download_frag = True
elif line.startswith('#EXT-X-MAP'):
if format_index and discontinuity_count != format_index:
continue
if frag_index > 0:
self.report_error(
'Initialization fragment found after media fragments, unable to download')
'initialization fragment found after media fragments, unable to download')
return False
frag_index += 1
map_info = parse_m3u8_attributes(line[11:])
@@ -198,14 +180,9 @@ class HlsFD(FragmentFD):
else compat_urlparse.urljoin(man_url, map_info.get('URI')))
if extra_query:
frag_url = update_url_query(frag_url, extra_query)
fragments.append({
'frag_index': frag_index,
'url': frag_url,
'decrypt_info': decrypt_info,
'byte_range': byte_range,
'media_sequence': media_sequence
})
if real_downloader:
fragment_urls.append(frag_url)
continue
if map_info.get('BYTERANGE'):
splitted_byte_range = map_info.get('BYTERANGE').split('@')
@@ -214,6 +191,7 @@ class HlsFD(FragmentFD):
'start': sub_range_start,
'end': sub_range_start + int(splitted_byte_range[0]),
}
download_frag = True
elif line.startswith('#EXT-X-KEY'):
decrypt_url = decrypt_info.get('URI')
@@ -228,6 +206,9 @@ class HlsFD(FragmentFD):
decrypt_info['URI'] = update_url_query(decrypt_info['URI'], extra_query)
if decrypt_url != decrypt_info['URI']:
decrypt_info['KEY'] = None
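# Record each key, tagged with the fragment index where it first applies, so
# an external downloader receiving 'key_list' can decrypt on its own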
key_data = decrypt_info.copy()
key_data['INDEX'] = frag_index
key_list.append(key_data)
elif line.startswith('#EXT-X-MEDIA-SEQUENCE'):
media_sequence = int(line[22:])
@@ -244,16 +225,58 @@ class HlsFD(FragmentFD):
ad_frag_next = False
elif line.startswith('#EXT-X-DISCONTINUITY'):
discontinuity_count += 1
i += 1
media_sequence += 1
# We only download the first fragment during the test
if test:
fragments = [fragments[0] if fragments else None]
if download_frag:
count = 0
headers = info_dict.get('http_headers', {})
if byte_range:
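# HTTP Range headers are inclusive at both ends, hence end - 1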
headers['Range'] = 'bytes=%d-%d' % (byte_range['start'], byte_range['end'] - 1)
while count <= fragment_retries:
try:
success, frag_content = self._download_fragment(
ctx, frag_url, info_dict, headers)
if not success:
return False
break
except compat_urllib_error.HTTPError as err:
# Unavailable (possibly temporary) fragments may be served.
# First we retry, then either skip or abort.
# See https://github.com/ytdl-org/youtube-dl/issues/10165 and
# https://github.com/ytdl-org/youtube-dl/issues/10448.
count += 1
if count <= fragment_retries:
self.report_retry_fragment(err, frag_index, count, fragment_retries)
if count > fragment_retries:
if skip_unavailable_fragments:
i += 1
media_sequence += 1
self.report_skip_fragment(frag_index)
continue
self.report_error(
'giving up after %s fragment retries' % fragment_retries)
return False
if decrypt_info['METHOD'] == 'AES-128':
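# Per the HLS spec (RFC 8216), when no IV attribute is given, the IV is the
# media sequence number as a 128-bit big-endian integer: '>8xq' packs 8 zero
# bytes followed by a signed 64-bit big-endian integer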
iv = decrypt_info.get('IV') or compat_struct_pack('>8xq', media_sequence)
decrypt_info['KEY'] = decrypt_info.get('KEY') or self.ydl.urlopen(
self._prepare_url(info_dict, info_dict.get('_decryption_key_url') or decrypt_info['URI'])).read()
# Don't decrypt the content in tests since the data is explicitly truncated and does not
# align to a valid block size (see https://github.com/ytdl-org/youtube-dl/pull/27660).
# Tests only care that the correct data was downloaded, not what it decrypts to.
if not test:
frag_content = AES.new(
decrypt_info['KEY'], AES.MODE_CBC, iv).decrypt(frag_content)
self._append_fragment(ctx, frag_content)
# We only download the first fragment during the test
if test:
break
i += 1
media_sequence += 1
if real_downloader:
info_copy = info_dict.copy()
info_copy['fragments'] = fragments
info_copy['url_list'] = fragment_urls
info_copy['key_list'] = key_list
fd = real_downloader(self.ydl, self.params)
# TODO: Make progress updates work without hooking twice
# for ph in self._progress_hooks:
@@ -262,107 +285,5 @@ class HlsFD(FragmentFD):
if not success:
return False
else:
def download_fragment(fragment):
frag_index = fragment['frag_index']
frag_url = fragment['url']
decrypt_info = fragment['decrypt_info']
byte_range = fragment['byte_range']
media_sequence = fragment['media_sequence']
ctx['fragment_index'] = frag_index
count = 0
headers = info_dict.get('http_headers', {})
if byte_range:
headers['Range'] = 'bytes=%d-%d' % (byte_range['start'], byte_range['end'] - 1)
while count <= fragment_retries:
try:
success, frag_content = self._download_fragment(
ctx, frag_url, info_dict, headers)
if not success:
return False, frag_index
break
except compat_urllib_error.HTTPError as err:
# Unavailable (possibly temporary) fragments may be served.
# First we retry, then either skip or abort.
# See https://github.com/ytdl-org/youtube-dl/issues/10165 and
# https://github.com/ytdl-org/youtube-dl/issues/10448.
count += 1
if count <= fragment_retries:
self.report_retry_fragment(err, frag_index, count, fragment_retries)
if count > fragment_retries:
self.report_error('Giving up after %s fragment retries' % fragment_retries)
return False, frag_index
if decrypt_info['METHOD'] == 'AES-128':
iv = decrypt_info.get('IV') or compat_struct_pack('>8xq', media_sequence)
decrypt_info['KEY'] = decrypt_info.get('KEY') or self.ydl.urlopen(
self._prepare_url(info_dict, info_dict.get('_decryption_key_url') or decrypt_info['URI'])).read()
# Don't decrypt the content in tests since the data is explicitly truncated and does not
# align to a valid block size (see https://github.com/ytdl-org/youtube-dl/pull/27660).
# Tests only care that the correct data was downloaded, not what it decrypts to.
if not test:
frag_content = AES.new(
decrypt_info['KEY'], AES.MODE_CBC, iv).decrypt(frag_content)
return frag_content, frag_index
def append_fragment(frag_content, frag_index):
if frag_content:
fragment_filename = '%s-Frag%d' % (ctx['tmpfilename'], frag_index)
try:
file, frag_sanitized = sanitize_open(fragment_filename, 'rb')
ctx['fragment_filename_sanitized'] = frag_sanitized
file.close()
self._append_fragment(ctx, frag_content)
return True
except FileNotFoundError:
if skip_unavailable_fragments:
self.report_skip_fragment(frag_index)
return True
else:
self.report_error(
'fragment %s not found, unable to continue' % frag_index)
return False
else:
if skip_unavailable_fragments:
self.report_skip_fragment(frag_index)
return True
else:
self.report_error(
'fragment %s not found, unable to continue' % frag_index)
return False
max_workers = self.params.get('concurrent_fragment_downloads', 1)
if can_threaded_download and max_workers > 1:
self.report_warning('The download speed shown is that of only one thread. This is a known issue')
with concurrent.futures.ThreadPoolExecutor(max_workers) as pool:
futures = [pool.submit(download_fragment, fragment) for fragment in fragments]
# timeout must be 0 to return instantly
done, not_done = concurrent.futures.wait(futures, timeout=0)
try:
while not_done:
# Check every 1 second for KeyboardInterrupt
freshly_done, not_done = concurrent.futures.wait(not_done, timeout=1)
done |= freshly_done
except KeyboardInterrupt:
for future in not_done:
future.cancel()
# timeout must be None here so we block until cancellation completes
concurrent.futures.wait(not_done, timeout=None)
raise KeyboardInterrupt
results = [future.result() for future in futures]
for frag_content, frag_index in results:
result = append_fragment(frag_content, frag_index)
if not result:
return False
else:
for fragment in fragments:
frag_content, frag_index = download_fragment(fragment)
result = append_fragment(frag_content, frag_index)
if not result:
return False
self._finish_frag_download(ctx)
return True
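The cancellation dance above (a zero-timeout wait to collect already-finished futures, then one-second polling) is what keeps Ctrl-C responsive while fragments download on worker threads. A minimal, self-contained sketch of the same pattern outside yt-dlp (all names here are illustrative):
import concurrent.futures
def run_interruptibly(tasks, max_workers=4):
    # Submit everything up front, then poll with a short timeout so a
    # KeyboardInterrupt is noticed between waits instead of being swallowed
    with concurrent.futures.ThreadPoolExecutor(max_workers) as pool:
        futures = [pool.submit(task) for task in tasks]
        done, not_done = concurrent.futures.wait(futures, timeout=0)
        try:
            while not_done:
                freshly_done, not_done = concurrent.futures.wait(not_done, timeout=1)
                done |= freshly_done
        except KeyboardInterrupt:
            for future in not_done:
                future.cancel()
            # timeout=None blocks until tasks that already started have finished
            concurrent.futures.wait(not_done, timeout=None)
            raise
    return [future.result() for future in futures]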

View File

@@ -117,7 +117,7 @@ class RtmpFD(FileDownloader):
# Check for rtmpdump first
if not check_executable('rtmpdump', ['-h']):
self.report_error('RTMP download detected but "rtmpdump" could not be run. Please install')
self.report_error('RTMP download detected but "rtmpdump" could not be run. Please install it.')
return False
# Download using rtmpdump. rtmpdump returns exit code 2 when

View File

@@ -24,7 +24,7 @@ class RtspFD(FileDownloader):
args = [
'mpv', '-really-quiet', '--vo=null', '--stream-dump=' + tmpfilename, url]
else:
self.report_error('MMS or RTSP download detected but neither "mplayer" nor "mpv" could be run. Please install one')
self.report_error('MMS or RTSP download detected but neither "mplayer" nor "mpv" could be run. Please install one of them.')
return False
self._debug_cmd(args)

View File

@@ -79,7 +79,8 @@ class YoutubeLiveChatReplayFD(FragmentFD):
self._prepare_and_start_frag_download(ctx)
success, raw_fragment = dl_fragment(info_dict['url'])
success, raw_fragment = dl_fragment(
'https://www.youtube.com/watch?v={}'.format(video_id))
if not success:
return False
try:

View File

@@ -65,35 +65,15 @@ class AMCNetworksIE(ThePlatformIE):
def _real_extract(self, url):
site, display_id = re.match(self._VALID_URL, url).groups()
requestor_id = self._REQUESTOR_ID_MAP[site]
page_data = self._download_json(
'https://content-delivery-gw.svc.ds.amcn.com/api/v2/content/amcn/%s/url/%s'
% (requestor_id.lower(), display_id), display_id)['data']
properties = page_data.get('properties') or {}
properties = self._download_json(
'https://content-delivery-gw.svc.ds.amcn.com/api/v2/content/amcn/%s/url/%s' % (requestor_id.lower(), display_id),
display_id)['data']['properties']
query = {
'mbr': 'true',
'manifest': 'm3u',
}
video_player_count = 0
try:
for v in page_data['children']:
if v.get('type') == 'video-player':
releasePid = v['properties']['currentVideo']['meta']['releasePid']
tp_path = 'M_UwQC/' + releasePid
media_url = 'https://link.theplatform.com/s/' + tp_path
video_player_count += 1
except KeyError:
pass
if video_player_count > 1:
self.report_warning(
'The JSON data has %d video players. Only one will be extracted' % video_player_count)
# Fall back to videoPid if releasePid not found.
# TODO: Fall back to videoPid if releasePid manifest uses DRM.
if not video_player_count:
tp_path = 'M_UwQC/media/' + properties['videoPid']
media_url = 'https://link.theplatform.com/s/' + tp_path
tp_path = 'M_UwQC/media/' + properties['videoPid']
media_url = 'https://link.theplatform.com/s/' + tp_path
theplatform_metadata = self._download_theplatform_metadata(tp_path, display_id)
info = self._parse_theplatform_metadata(theplatform_metadata)
video_id = theplatform_metadata['pid']
@@ -110,41 +90,30 @@ class AMCNetworksIE(ThePlatformIE):
formats, subtitles = self._extract_theplatform_smil(
media_url, video_id)
self._sort_formats(formats)
thumbnails = []
thumbnail_urls = [properties.get('imageDesktop')]
if 'thumbnail' in info:
thumbnail_urls.append(info.pop('thumbnail'))
for thumbnail_url in thumbnail_urls:
if not thumbnail_url:
continue
mobj = re.search(r'(\d+)x(\d+)', thumbnail_url)
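# e.g. a hypothetical '.../poster-640x360.jpg' yields width 640, height 360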
thumbnails.append({
'url': thumbnail_url,
'width': int(mobj.group(1)) if mobj else None,
'height': int(mobj.group(2)) if mobj else None,
})
info.update({
'age_limit': parse_age_limit(rating),
'formats': formats,
'id': video_id,
'subtitles': subtitles,
'thumbnails': thumbnails,
'formats': formats,
'age_limit': parse_age_limit(parse_age_limit(rating)),
})
ns_keys = theplatform_metadata.get('$xmlns', {}).keys()
if ns_keys:
ns = list(ns_keys)[0]
episode = theplatform_metadata.get(ns + '$episodeTitle') or None
episode_number = int_or_none(
theplatform_metadata.get(ns + '$episode'))
series = theplatform_metadata.get(ns + '$show')
season_number = int_or_none(
theplatform_metadata.get(ns + '$season'))
series = theplatform_metadata.get(ns + '$show') or None
episode = theplatform_metadata.get(ns + '$episodeTitle')
episode_number = int_or_none(
theplatform_metadata.get(ns + '$episode'))
if season_number:
title = 'Season %d - %s' % (season_number, title)
if series:
title = '%s - %s' % (series, title)
info.update({
'title': title,
'series': series,
'season_number': season_number,
'episode': episode,
'episode_number': episode_number,
'season_number': season_number,
'series': series,
})
return info

View File

@@ -42,7 +42,6 @@ class ApplePodcastsIE(InfoExtractor):
ember_data = self._parse_json(self._search_regex(
r'id="shoebox-ember-data-store"[^>]*>\s*({.+?})\s*<',
webpage, 'ember data'), episode_id)
ember_data = ember_data.get(episode_id) or ember_data
episode = ember_data['data']['attributes']
description = episode.get('description') or {}

View File

@@ -272,8 +272,7 @@ class ARDMediathekIE(ARDMediathekBaseIE):
else: # request JSON file
if not document_id:
video_id = self._search_regex(
(r'/play/(?:config|media|sola)/(\d+)', r'contentId["\']\s*:\s*(\d+)'),
webpage, 'media id', default=None)
r'/play/(?:config|media)/(\d+)', webpage, 'media id')
info = self._extract_media_info(
'http://www.ardmediathek.de/play/media/%s' % video_id,
webpage, video_id)

View File

@@ -1,101 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urllib_parse_urlparse,
)
from ..utils import (
float_or_none,
int_or_none,
parse_iso8601,
remove_start,
)
class ArnesIE(InfoExtractor):
IE_NAME = 'video.arnes.si'
IE_DESC = 'Arnes Video'
_VALID_URL = r'https?://video\.arnes\.si/(?:[a-z]{2}/)?(?:watch|embed|api/(?:asset|public/video))/(?P<id>[0-9a-zA-Z]{12})'
_TESTS = [{
'url': 'https://video.arnes.si/watch/a1qrWTOQfVoU?t=10',
'md5': '4d0f4d0a03571b33e1efac25fd4a065d',
'info_dict': {
'id': 'a1qrWTOQfVoU',
'ext': 'mp4',
'title': 'Linearna neodvisnost, definicija',
'description': 'Linearna neodvisnost, definicija',
'license': 'PRIVATE',
'creator': 'Polona Oblak',
'timestamp': 1585063725,
'upload_date': '20200324',
'channel': 'Polona Oblak',
'channel_id': 'q6pc04hw24cj',
'channel_url': 'https://video.arnes.si/?channel=q6pc04hw24cj',
'duration': 596.75,
'view_count': int,
'tags': ['linearna_algebra'],
'start_time': 10,
}
}, {
'url': 'https://video.arnes.si/api/asset/s1YjnV7hadlC/play.mp4',
'only_matching': True,
}, {
'url': 'https://video.arnes.si/embed/s1YjnV7hadlC',
'only_matching': True,
}, {
'url': 'https://video.arnes.si/en/watch/s1YjnV7hadlC',
'only_matching': True,
}, {
'url': 'https://video.arnes.si/embed/s1YjnV7hadlC?t=123&hideRelated=1',
'only_matching': True,
}, {
'url': 'https://video.arnes.si/api/public/video/s1YjnV7hadlC',
'only_matching': True,
}]
_BASE_URL = 'https://video.arnes.si'
def _real_extract(self, url):
video_id = self._match_id(url)
video = self._download_json(
self._BASE_URL + '/api/public/video/' + video_id, video_id)['data']
title = video['title']
formats = []
for media in (video.get('media') or []):
media_url = media.get('url')
if not media_url:
continue
formats.append({
'url': self._BASE_URL + media_url,
'format_id': remove_start(media.get('format'), 'FORMAT_'),
'format_note': media.get('formatTranslation'),
'width': int_or_none(media.get('width')),
'height': int_or_none(media.get('height')),
})
self._sort_formats(formats)
channel = video.get('channel') or {}
channel_id = channel.get('url')
thumbnail = video.get('thumbnailUrl')
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnail': self._BASE_URL + thumbnail,
'description': video.get('description'),
'license': video.get('license'),
'creator': video.get('author'),
'timestamp': parse_iso8601(video.get('creationTime')),
'channel': channel.get('name'),
'channel_id': channel_id,
'channel_url': self._BASE_URL + '/?channel=' + channel_id if channel_id else None,
'duration': float_or_none(video.get('duration'), 1000),
'view_count': int_or_none(video.get('views')),
'tags': video.get('hashtags'),
'start_time': int_or_none(compat_parse_qs(
compat_urllib_parse_urlparse(url).query).get('t', [None])[0]),
}

View File

@@ -49,7 +49,6 @@ class BandcampIE(InfoExtractor):
'uploader': 'Ben Prunty',
'timestamp': 1396508491,
'upload_date': '20140403',
'release_timestamp': 1396483200,
'release_date': '20140403',
'duration': 260.877,
'track': 'Lanius (Battle)',
@@ -70,7 +69,6 @@ class BandcampIE(InfoExtractor):
'uploader': 'Mastodon',
'timestamp': 1322005399,
'upload_date': '20111122',
'release_timestamp': 1076112000,
'release_date': '20040207',
'duration': 120.79,
'track': 'Hail to Fire',
@@ -199,7 +197,7 @@ class BandcampIE(InfoExtractor):
'thumbnail': thumbnail,
'uploader': artist,
'timestamp': timestamp,
'release_timestamp': unified_timestamp(tralbum.get('album_release_date')),
'release_date': unified_strdate(tralbum.get('album_release_date')),
'duration': duration,
'track': track,
'track_number': track_number,
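For context on the change above: unified_strdate and unified_timestamp differ only in output shape, one yielding a YYYYMMDD string and the other a UNIX timestamp. A small illustration, assuming Bandcamp serves dates in this RFC-2822-ish form (the input string is hypothetical):
from yt_dlp.utils import unified_strdate, unified_timestamp
unified_strdate('07 Feb 2004 00:00:00 GMT')    # -> '20040207'
unified_timestamp('07 Feb 2004 00:00:00 GMT')  # -> 1076112000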

View File

@@ -1,22 +1,17 @@
# coding: utf-8
from __future__ import unicode_literals
import functools
import itertools
import json
import re
from .common import InfoExtractor
from ..compat import (
compat_etree_Element,
compat_HTTPError,
compat_parse_qs,
compat_urllib_parse_urlparse,
compat_urlparse,
)
from ..utils import (
ExtractorError,
OnDemandPagedList,
clean_html,
dict_get,
float_or_none,
@@ -816,7 +811,7 @@ class BBCIE(BBCCoUkIE):
@classmethod
def suitable(cls, url):
EXCLUDE_IE = (BBCCoUkIE, BBCCoUkArticleIE, BBCCoUkIPlayerEpisodesIE, BBCCoUkIPlayerGroupIE, BBCCoUkPlaylistIE)
EXCLUDE_IE = (BBCCoUkIE, BBCCoUkArticleIE, BBCCoUkIPlayerPlaylistIE, BBCCoUkPlaylistIE)
return (False if any(ie.suitable(url) for ie in EXCLUDE_IE)
else super(BBCIE, cls).suitable(url))
@@ -1343,149 +1338,21 @@ class BBCCoUkPlaylistBaseIE(InfoExtractor):
playlist_id, title, description)
class BBCCoUkIPlayerPlaylistBaseIE(InfoExtractor):
_VALID_URL_TMPL = r'https?://(?:www\.)?bbc\.co\.uk/iplayer/%%s/(?P<id>%s)' % BBCCoUkIE._ID_REGEX
@staticmethod
def _get_default(episode, key, default_key='default'):
return try_get(episode, lambda x: x[key][default_key])
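# e.g. _get_default(episode, 'labels', 'category') -> episode['labels']['category'],
# or None when any key along the way is missing (try_get swallows the error)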
def _get_description(self, data):
synopsis = data.get(self._DESCRIPTION_KEY) or {}
return dict_get(synopsis, ('large', 'medium', 'small'))
def _fetch_page(self, programme_id, per_page, series_id, page):
elements = self._get_elements(self._call_api(
programme_id, per_page, page + 1, series_id))
for element in elements:
episode = self._get_episode(element)
episode_id = episode.get('id')
if not episode_id:
continue
thumbnail = None
image = self._get_episode_image(episode)
if image:
thumbnail = image.replace('{recipe}', 'raw')
category = self._get_default(episode, 'labels', 'category')
yield {
'_type': 'url',
'id': episode_id,
'title': self._get_episode_field(episode, 'subtitle'),
'url': 'https://www.bbc.co.uk/iplayer/episode/' + episode_id,
'thumbnail': thumbnail,
'description': self._get_description(episode),
'categories': [category] if category else None,
'series': self._get_episode_field(episode, 'title'),
'ie_key': BBCCoUkIE.ie_key(),
}
def _real_extract(self, url):
pid = self._match_id(url)
qs = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
series_id = qs.get('seriesId', [None])[0]
page = qs.get('page', [None])[0]
per_page = 36 if page else self._PAGE_SIZE
fetch_page = functools.partial(self._fetch_page, pid, per_page, series_id)
entries = fetch_page(int(page) - 1) if page else OnDemandPagedList(fetch_page, self._PAGE_SIZE)
playlist_data = self._get_playlist_data(self._call_api(pid, 1))
return self.playlist_result(
entries, pid, self._get_playlist_title(playlist_data),
self._get_description(playlist_data))
class BBCCoUkIPlayerEpisodesIE(BBCCoUkIPlayerPlaylistBaseIE):
IE_NAME = 'bbc.co.uk:iplayer:episodes'
_VALID_URL = BBCCoUkIPlayerPlaylistBaseIE._VALID_URL_TMPL % 'episodes'
class BBCCoUkIPlayerPlaylistIE(BBCCoUkPlaylistBaseIE):
IE_NAME = 'bbc.co.uk:iplayer:playlist'
_VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/iplayer/(?:episodes|group)/(?P<id>%s)' % BBCCoUkIE._ID_REGEX
_URL_TEMPLATE = 'http://www.bbc.co.uk/iplayer/episode/%s'
_VIDEO_ID_TEMPLATE = r'data-ip-id=["\'](%s)'
_TESTS = [{
'url': 'http://www.bbc.co.uk/iplayer/episodes/b05rcz9v',
'info_dict': {
'id': 'b05rcz9v',
'title': 'The Disappearance',
'description': 'md5:58eb101aee3116bad4da05f91179c0cb',
'description': 'French thriller serial about a missing teenager.',
},
'playlist_mincount': 8,
'playlist_mincount': 6,
'skip': 'This programme is not currently available on BBC iPlayer',
}, {
# all seasons
'url': 'https://www.bbc.co.uk/iplayer/episodes/b094m5t9/doctor-foster',
'info_dict': {
'id': 'b094m5t9',
'title': 'Doctor Foster',
'description': 'md5:5aa9195fad900e8e14b52acd765a9fd6',
},
'playlist_mincount': 10,
}, {
# explicit season
'url': 'https://www.bbc.co.uk/iplayer/episodes/b094m5t9/doctor-foster?seriesId=b094m6nv',
'info_dict': {
'id': 'b094m5t9',
'title': 'Doctor Foster',
'description': 'md5:5aa9195fad900e8e14b52acd765a9fd6',
},
'playlist_mincount': 5,
}, {
# all pages
'url': 'https://www.bbc.co.uk/iplayer/episodes/m0004c4v/beechgrove',
'info_dict': {
'id': 'm0004c4v',
'title': 'Beechgrove',
'description': 'Gardening show that celebrates Scottish horticulture and growing conditions.',
},
'playlist_mincount': 37,
}, {
# explicit page
'url': 'https://www.bbc.co.uk/iplayer/episodes/m0004c4v/beechgrove?page=2',
'info_dict': {
'id': 'm0004c4v',
'title': 'Beechgrove',
'description': 'Gardening show that celebrates Scottish horticulture and growing conditions.',
},
'playlist_mincount': 1,
}]
_PAGE_SIZE = 100
_DESCRIPTION_KEY = 'synopsis'
def _get_episode_image(self, episode):
return self._get_default(episode, 'image')
def _get_episode_field(self, episode, field):
return self._get_default(episode, field)
@staticmethod
def _get_elements(data):
return data['entities']['results']
@staticmethod
def _get_episode(element):
return element.get('episode') or {}
def _call_api(self, pid, per_page, page=1, series_id=None):
variables = {
'id': pid,
'page': page,
'perPage': per_page,
}
if series_id:
variables['sliceId'] = series_id
return self._download_json(
'https://graph.ibl.api.bbc.co.uk/', pid, headers={
'Content-Type': 'application/json'
}, data=json.dumps({
'id': '5692d93d5aac8d796a0305e895e61551',
'variables': variables,
}).encode('utf-8'))['data']['programme']
@staticmethod
def _get_playlist_data(data):
return data
def _get_playlist_title(self, data):
return self._get_default(data, 'title')
class BBCCoUkIPlayerGroupIE(BBCCoUkIPlayerPlaylistBaseIE):
IE_NAME = 'bbc.co.uk:iplayer:group'
_VALID_URL = BBCCoUkIPlayerPlaylistBaseIE._VALID_URL_TMPL % 'group'
_TESTS = [{
# Available for over a year unlike 30 days for most other programmes
'url': 'http://www.bbc.co.uk/iplayer/group/p02tcc32',
'info_dict': {
@@ -1494,56 +1361,14 @@ class BBCCoUkIPlayerGroupIE(BBCCoUkIPlayerPlaylistBaseIE):
'description': 'md5:683e901041b2fe9ba596f2ab04c4dbe7',
},
'playlist_mincount': 10,
}, {
# all pages
'url': 'https://www.bbc.co.uk/iplayer/group/p081d7j7',
'info_dict': {
'id': 'p081d7j7',
'title': 'Music in Scotland',
'description': 'Perfomances in Scotland and programmes featuring Scottish acts.',
},
'playlist_mincount': 47,
}, {
# explicit page
'url': 'https://www.bbc.co.uk/iplayer/group/p081d7j7?page=2',
'info_dict': {
'id': 'p081d7j7',
'title': 'Music in Scotland',
'description': 'Perfomances in Scotland and programmes featuring Scottish acts.',
},
'playlist_mincount': 11,
}]
_PAGE_SIZE = 200
_DESCRIPTION_KEY = 'synopses'
def _get_episode_image(self, episode):
return self._get_default(episode, 'images', 'standard')
def _get_episode_field(self, episode, field):
return episode.get(field)
@staticmethod
def _get_elements(data):
return data['elements']
@staticmethod
def _get_episode(element):
return element
def _call_api(self, pid, per_page, page=1, series_id=None):
return self._download_json(
'http://ibl.api.bbc.co.uk/ibl/v1/groups/%s/episodes' % pid,
pid, query={
'page': page,
'per_page': per_page,
})['group_episodes']
@staticmethod
def _get_playlist_data(data):
return data['group']
def _get_playlist_title(self, data):
return data.get('title')
def _extract_title_and_description(self, webpage):
title = self._search_regex(r'<h1>([^<]+)</h1>', webpage, 'title', fatal=False)
description = self._search_regex(
r'<p[^>]+class=(["\'])subtitle\1[^>]*>(?P<value>[^<]+)</p>',
webpage, 'description', fatal=False, group='value')
return title, description
class BBCCoUkPlaylistIE(BBCCoUkPlaylistBaseIE):

View File

@@ -7,7 +7,6 @@ import re
from .common import InfoExtractor, SearchInfoExtractor
from ..compat import (
compat_str,
compat_parse_qs,
compat_urlparse,
)
@@ -16,7 +15,6 @@ from ..utils import (
int_or_none,
float_or_none,
parse_iso8601,
try_get,
smuggle_url,
str_or_none,
strip_jsonp,
@@ -115,13 +113,6 @@ class BiliBiliIE(InfoExtractor):
# new BV video id format
'url': 'https://www.bilibili.com/video/BV1JE411F741',
'only_matching': True,
}, {
# Anthology
'url': 'https://www.bilibili.com/video/BV1bK411W797',
'info_dict': {
'id': 'BV1bK411W797',
},
'playlist_count': 17,
}]
_APP_KEY = 'iVGUTjsxvpLeuDCf'
@@ -148,19 +139,9 @@ class BiliBiliIE(InfoExtractor):
page_id = mobj.group('page')
webpage = self._download_webpage(url, video_id)
# Bilibili anthologies are similar to playlists but all videos share the same video ID as the anthology itself.
# If the video has no page argument, check to see if it's an anthology
if page_id is None:
if not self._downloader.params.get('noplaylist'):
r = self._extract_anthology_entries(bv_id, video_id, webpage)
if r is not None:
self.to_screen('Downloading anthology %s - add --no-playlist to just download video' % video_id)
return r
self.to_screen('Downloading just video %s because of --no-playlist' % video_id)
if 'anime/' not in url:
cid = self._search_regex(
r'\bcid(?:["\']:|=)(\d+),["\']page(?:["\']:|=)' + compat_str(page_id), webpage, 'cid',
r'\bcid(?:["\']:|=)(\d+),["\']page(?:["\']:|=)' + str(page_id), webpage, 'cid',
default=None
) or self._search_regex(
r'\bcid(?:["\']:|=)(\d+)', webpage, 'cid',
@@ -189,7 +170,6 @@ class BiliBiliIE(InfoExtractor):
cid = js['result']['cid']
headers = {
'Accept': 'application/json',
'Referer': url
}
headers.update(self.geo_verification_headers())
@@ -243,18 +223,7 @@ class BiliBiliIE(InfoExtractor):
title = self._html_search_regex(
(r'<h1[^>]+\btitle=(["\'])(?P<title>(?:(?!\1).)+)\1',
r'(?s)<h1[^>]*>(?P<title>.+?)</h1>'), webpage, 'title',
group='title')
# Get part title for anthologies
if page_id is not None:
# TODO: The json is already downloaded by _extract_anthology_entries. Don't redownload for each video
part_title = try_get(
self._download_json(
"https://api.bilibili.com/x/player/pagelist?bvid=%s&jsonp=jsonp" % bv_id,
video_id, note='Extracting videos in anthology'),
lambda x: x['data'][int(page_id) - 1]['part'])
title = part_title or title
group='title') + ('_p' + str(page_id) if page_id is not None else '')
description = self._html_search_meta('description', webpage)
timestamp = unified_timestamp(self._html_search_regex(
r'<time[^>]+datetime="([^"]+)"', webpage, 'upload time',
@@ -264,7 +233,7 @@ class BiliBiliIE(InfoExtractor):
# TODO 'view_count' requires deobfuscating Javascript
info = {
'id': compat_str(video_id) if page_id is None else '%s_p%s' % (video_id, page_id),
'id': str(video_id) if page_id is None else '%s_p%s' % (video_id, page_id),
'cid': cid,
'title': title,
'description': description,
@@ -330,7 +299,7 @@ class BiliBiliIE(InfoExtractor):
global_info = {
'_type': 'multi_video',
'id': compat_str(video_id),
'id': video_id,
'bv_id': bv_id,
'title': title,
'description': description,
@@ -342,20 +311,6 @@ class BiliBiliIE(InfoExtractor):
return global_info
def _extract_anthology_entries(self, bv_id, video_id, webpage):
title = self._html_search_regex(
(r'<h1[^>]+\btitle=(["\'])(?P<title>(?:(?!\1).)+)\1',
r'(?s)<h1[^>]*>(?P<title>.+?)</h1>'), webpage, 'title',
group='title')
json_data = self._download_json(
"https://api.bilibili.com/x/player/pagelist?bvid=%s&jsonp=jsonp" % bv_id,
video_id, note='Extracting videos in anthology')
if len(json_data['data']) > 1:
return self.playlist_from_matches(
json_data['data'], bv_id, title, ie=BiliBiliIE.ie_key(),
getter=lambda entry: 'https://www.bilibili.com/video/%s?p=%d' % (bv_id, entry['page']))
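# e.g. for the BV1bK411W797 anthology with 17 pages, the getter yields
# https://www.bilibili.com/video/BV1bK411W797?p=1 through ?p=17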
def _get_video_id_set(self, id, is_bv):
query = {'bvid': id} if is_bv else {'aid': id}
response = self._download_json(
@@ -550,7 +505,7 @@ class BiliBiliSearchIE(SearchInfoExtractor):
videos = data['result']
for video in videos:
e = self.url_result(video['arcurl'], 'BiliBili', compat_str(video['aid']))
e = self.url_result(video['arcurl'], 'BiliBili', str(video['aid']))
entries.append(e)
if (len(entries) >= n or len(videos) >= BiliBiliSearchIE.MAX_NUMBER_OF_RESULTS):

View File

@@ -27,10 +27,10 @@ class CBSBaseIE(ThePlatformFeedIE):
class CBSIE(CBSBaseIE):
_VALID_URL = r'(?:cbs:|https?://(?:www\.)?(?:(?:cbs|paramountplus)\.com/shows/[^/]+/video|colbertlateshow\.com/(?:video|podcasts))/)(?P<id>[\w-]+)'
_VALID_URL = r'(?:cbs:|https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/video|colbertlateshow\.com/(?:video|podcasts))/)(?P<id>[\w-]+)'
_TESTS = [{
'url': 'https://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
'url': 'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
'info_dict': {
'id': '_u7W953k6la293J7EPTd9oHkSPs6Xn6_',
'ext': 'mp4',
@@ -52,19 +52,16 @@ class CBSIE(CBSBaseIE):
}, {
'url': 'http://www.colbertlateshow.com/podcasts/dYSwjqPs_X1tvbV_P2FcPWRa_qT6akTC/in-the-bad-room-with-stephen/',
'only_matching': True,
}, {
'url': 'https://www.paramountplus.com/shows/all-rise/video/QmR1WhNkh1a_IrdHZrbcRklm176X_rVc/all-rise-space/',
'only_matching': True,
}]
def _extract_video_info(self, content_id, site='cbs', mpx_acc=2198311517):
items_data = self._download_xml(
'https://can.cbs.com/thunder/player/videoPlayerService.php',
'http://can.cbs.com/thunder/player/videoPlayerService.php',
content_id, query={'partner': site, 'contentId': content_id})
video_data = xpath_element(items_data, './/item')
title = xpath_text(video_data, 'videoTitle', 'title') or xpath_text(video_data, 'videotitle', 'title')
tp_path = 'dJ5BDC/media/guid/%d/%s' % (mpx_acc, content_id)
tp_release_url = 'https://link.theplatform.com/s/' + tp_path
tp_release_url = 'http://link.theplatform.com/s/' + tp_path
asset_types = []
subtitles = {}

View File

@@ -231,9 +231,8 @@ class InfoExtractor(object):
uploader: Full name of the video uploader.
license: License name the video is licensed under.
creator: The creator of the video.
release_timestamp: UNIX timestamp of the moment the video was released.
release_date: The date (YYYYMMDD) when the video was released.
timestamp: UNIX timestamp of the moment the video was uploaded
timestamp: UNIX timestamp of the moment the video became available.
upload_date: Video upload date (YYYYMMDD).
If not explicitly set, calculated from timestamp.
uploader_id: Nickname or id of the video uploader.
@@ -252,8 +251,8 @@ class InfoExtractor(object):
* "data": The subtitles file contents
* "url": A URL pointing to the subtitles file
"ext" will be calculated from URL if missing
automatic_captions: Like 'subtitles'; contains automatically generated
captions instead of normal subtitles
automatic_captions: Like 'subtitles', used by the YoutubeIE for
automatically generated captions
duration: Length of the video in seconds, as an integer or float.
view_count: How many users have watched the video on the platform.
like_count: Number of positive ratings of the video
@@ -265,7 +264,6 @@ class InfoExtractor(object):
properties (all but one of text or html optional):
* "author" - human-readable name of the comment author
* "author_id" - user ID of the comment author
* "author_thumbnail" - The thumbnail of the comment author
* "id" - Comment ID
* "html" - Comment as HTML
* "text" - Plain text of the comment
@@ -273,12 +271,6 @@ class InfoExtractor(object):
* "parent" - ID of the comment this one is replying to.
Set to "root" to indicate that this is a
comment to the original video.
* "like_count" - Number of positive ratings of the comment
* "dislike_count" - Number of negative ratings of the comment
* "is_favorited" - Whether the comment is marked as
favorite by the video uploader
* "author_is_uploader" - Whether the comment is made by
the video uploader
age_limit: Age restriction for the video, as an integer (years)
webpage_url: The URL to the video webpage, if given to yt-dlp it
should allow to get the same result again. (It will be set
@@ -301,11 +293,7 @@ class InfoExtractor(object):
playable_in_embed: Whether this video is allowed to play in embedded
players on other sites. Can be True (=always allowed),
False (=never allowed), None (=unknown), or a string
specifying the criteria for embedability (Eg: 'whitelist')
availability: Under what condition the video is available. One of
'private', 'premium_only', 'subscriber_only', 'needs_auth',
'unlisted' or 'public'. Use 'InfoExtractor._availability'
to set it
specifying the criteria for embedability (Eg: 'whitelist').
__post_extractor: A function to be called just before the metadata is
written to either disk, logger or console. The function
must return a dict which will be added to the info_dict.
@@ -1398,7 +1386,7 @@ class InfoExtractor(object):
return self._hidden_inputs(form)
class FormatSort:
regex = r' *((?P<reverse>\+)?(?P<field>[a-zA-Z0-9_]+)((?P<separator>[~:])(?P<limit>.*?))?)? *$'
regex = r' *((?P<reverse>\+)?(?P<field>[a-zA-Z0-9_]+)((?P<seperator>[~:])(?P<limit>.*?))?)? *$'
default = ('hidden', 'hasvid', 'ie_pref', 'lang', 'quality',
'res', 'fps', 'codec:vp9.2', 'size', 'br', 'asr',
@@ -1421,8 +1409,8 @@ class InfoExtractor(object):
'ie_pref': {'priority': True, 'type': 'extractor'},
'hasvid': {'priority': True, 'field': 'vcodec', 'type': 'boolean', 'not_in_list': ('none',)},
'hasaud': {'field': 'acodec', 'type': 'boolean', 'not_in_list': ('none',)},
'lang': {'priority': True, 'convert': 'ignore', 'field': 'language_preference'},
'quality': {'convert': 'float_none'},
'lang': {'priority': True, 'convert': 'ignore', 'type': 'extractor', 'field': 'language_preference'},
'quality': {'convert': 'float_none', 'type': 'extractor'},
'filesize': {'convert': 'bytes'},
'fs_approx': {'convert': 'bytes', 'field': 'filesize_approx'},
'id': {'convert': 'string', 'field': 'format_id'},
@@ -1433,7 +1421,7 @@ class InfoExtractor(object):
'vbr': {'convert': 'float_none'},
'abr': {'convert': 'float_none'},
'asr': {'convert': 'float_none'},
'source': {'convert': 'ignore', 'field': 'source_preference'},
'source': {'convert': 'ignore', 'type': 'extractor', 'field': 'source_preference'},
'codec': {'type': 'combined', 'field': ('vcodec', 'acodec')},
'br': {'type': 'combined', 'field': ('tbr', 'vbr', 'abr'), 'same_limit': True},
@@ -1558,7 +1546,7 @@ class InfoExtractor(object):
if self._get_field_setting(field, 'type') == 'alias':
field = self._get_field_setting(field, 'field')
reverse = match.group('reverse') is not None
closest = match.group('separator') == '~'
closest = match.group('seperator') == '~'
limit_text = match.group('limit')
has_limit = limit_text is not None
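To illustrate how one sort token is parsed by the regex above (the variant with the 'separator' group; the inputs are hypothetical user-supplied sort specs):
import re
regex = r' *((?P<reverse>\+)?(?P<field>[a-zA-Z0-9_]+)((?P<separator>[~:])(?P<limit>.*?))?)? *$'
m = re.match(regex, 'res:720')
print(m.group('field'), m.group('separator'), m.group('limit'))  # res : 720
m = re.match(regex, '+br~1000')
print(m.group('reverse'), m.group('field'), m.group('limit'))  # + br 1000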
@@ -1861,9 +1849,8 @@ class InfoExtractor(object):
def _extract_m3u8_formats(self, m3u8_url, video_id, ext=None,
entry_protocol='m3u8', preference=None, quality=None,
m3u8_id=None, note=None, errnote=None,
fatal=True, live=False, data=None, headers={},
query={}):
m3u8_id=None, live=False, note=None, errnote=None,
fatal=True, data=None, headers={}, query={}):
res = self._download_webpage_handle(
m3u8_url, video_id,
note=note or 'Downloading m3u8 information',
@@ -2063,11 +2050,11 @@ class InfoExtractor(object):
playlist_formats = _extract_m3u8_playlist_formats(manifest_url, video_id=video_id,
fatal=fatal, data=data, headers=headers)
for frmt in playlist_formats:
for format in playlist_formats:
format_id = []
if m3u8_id:
format_id.append(m3u8_id)
format_index = frmt.get('index')
format_index = format.get('index')
stream_name = build_stream_name()
# Bandwidth of live streams may differ over time thus making
# format_id unpredictable. So it's better to keep provided
@@ -2122,8 +2109,6 @@ class InfoExtractor(object):
# TODO: update acodec for audio only formats with
# the same GROUP-ID
f['acodec'] = 'none'
if not f.get('ext'):
f['ext'] = 'm4a' if f.get('vcodec') == 'none' else 'mp4'
formats.append(f)
# for DailyMotion
@@ -3221,10 +3206,7 @@ class InfoExtractor(object):
""" Return a compat_cookies.SimpleCookie with the cookies for the url """
req = sanitized_Request(url)
self._downloader.cookiejar.add_cookie_header(req)
cookie = req.get_header('Cookie')
if cookie and sys.version_info[0] == 2:
cookie = str(cookie)
return compat_cookies.SimpleCookie(cookie)
return compat_cookies.SimpleCookie(req.get_header('Cookie'))
def _apply_first_set_cookie_header(self, url_handle, cookie):
"""
@@ -3339,20 +3321,6 @@ class InfoExtractor(object):
def _generic_title(self, url):
return compat_urllib_parse_unquote(os.path.splitext(url_basename(url))[0])
@staticmethod
def _availability(is_private, needs_premium, needs_subscription, needs_auth, is_unlisted):
all_known = all(map(
lambda x: x is not None,
(is_private, needs_premium, needs_subscription, needs_auth, is_unlisted)))
return (
'private' if is_private
else 'premium_only' if needs_premium
else 'subscriber_only' if needs_subscription
else 'needs_auth' if needs_auth
else 'unlisted' if is_unlisted
else 'public' if all_known
else None)
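# A quick illustration with made-up flag values: the chain returns the first
# matching state, falls back to 'public' only when every flag is known, and
# to None otherwise:
#   InfoExtractor._availability(False, False, False, False, True)   # 'unlisted'
#   InfoExtractor._availability(False, False, False, False, False)  # 'public'
#   InfoExtractor._availability(False, None, False, False, False)   # None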
class SearchInfoExtractor(InfoExtractor):
"""

View File

@@ -1,7 +1,5 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_urlparse,
@@ -60,16 +58,3 @@ class MmsIE(InfoExtractor):
'title': title,
'url': url,
}
class ViewSourceIE(InfoExtractor):
IE_DESC = False
_VALID_URL = r'view-source:(?P<url>.+)'
_TEST = {
'url': 'view-source:https://www.youtube.com/watch?v=BaW_jenozKc',
'only_matching': True
}
def _real_extract(self, url):
return self.url_result(re.match(self._VALID_URL, url).group('url'))

View File

@@ -296,51 +296,6 @@ class DPlayIE(InfoExtractor):
url, display_id, host, 'dplay' + country, country)
class DiscoveryPlusIndiaIE(DPlayIE):
_VALID_URL = r'https?://(?:www\.)?discoveryplus\.in/videos?' + DPlayIE._PATH_REGEX
_TESTS = [{
'url': 'https://www.discoveryplus.in/videos/how-do-they-do-it/fugu-and-more?seasonId=8&type=EPISODE',
'info_dict': {
'id': '27104',
'ext': 'mp4',
'display_id': 'how-do-they-do-it/fugu-and-more',
'title': 'Fugu and More',
'description': 'The Japanese catch, prepare and eat the deadliest fish on the planet.',
'duration': 1319,
'timestamp': 1582309800,
'upload_date': '20200221',
'series': 'How Do They Do It?',
'season_number': 8,
'episode_number': 2,
'creator': 'Discovery Channel',
},
'params': {
'format': 'bestvideo',
'skip_download': True,
},
'skip': 'Cookies (not necessarily logged in) are needed'
}]
def _update_disco_api_headers(self, headers, disco_base, display_id, realm):
headers['x-disco-params'] = 'realm=%s' % realm
headers['x-disco-client'] = 'WEB:UNKNOWN:dplus-india:17.0.0'
def _download_video_playback_info(self, disco_base, video_id, headers):
return self._download_json(
disco_base + 'playback/v3/videoPlaybackInfo',
video_id, headers=headers, data=json.dumps({
'deviceInfo': {
'adBlocker': False,
},
'videoId': video_id,
}).encode('utf-8'))['data']['attributes']['streaming']
def _real_extract(self, url):
display_id = self._match_id(url)
return self._get_disco_api_info(
url, display_id, 'ap2-prod-direct.discoveryplus.in', 'dplusindia', 'in')
class DiscoveryPlusIE(DPlayIE):
_VALID_URL = r'https?://(?:www\.)?discoveryplus\.com/video' + DPlayIE._PATH_REGEX
_TESTS = [{

View File

@@ -80,7 +80,6 @@ from .arte import (
ArteTVEmbedIE,
ArteTVPlaylistIE,
)
from .arnes import ArnesIE
from .asiancrush import (
AsianCrushIE,
AsianCrushPlaylistIE,
@@ -109,8 +108,7 @@ from .bandcamp import BandcampIE, BandcampAlbumIE, BandcampWeeklyIE
from .bbc import (
BBCCoUkIE,
BBCCoUkArticleIE,
BBCCoUkIPlayerEpisodesIE,
BBCCoUkIPlayerGroupIE,
BBCCoUkIPlayerPlaylistIE,
BBCCoUkPlaylistIE,
BBCIE,
)
@@ -265,7 +263,6 @@ from .commonmistakes import CommonMistakesIE, UnicodeBOMIE
from .commonprotocols import (
MmsIE,
RtmpIE,
ViewSourceIE,
)
from .condenast import CondeNastIE
from .contv import CONtvIE
@@ -317,7 +314,6 @@ from .douyutv import (
from .dplay import (
DPlayIE,
DiscoveryPlusIE,
DiscoveryPlusIndiaIE,
HGTVDeIE,
)
from .dreisat import DreiSatIE
@@ -454,7 +450,10 @@ from .gamestar import GameStarIE
from .gaskrank import GaskrankIE
from .gazeta import GazetaIE
from .gdcvault import GDCVaultIE
from .gedidigital import GediDigitalIE
from .gedi import (
GediIE,
GediEmbedsIE,
)
from .generic import GenericIE
from .gfycat import GfycatIE
from .giantbomb import GiantBombIE
@@ -585,11 +584,7 @@ from .kuwo import (
KuwoCategoryIE,
KuwoMvIE,
)
from .la7 import (
LA7IE,
LA7PodcastEpisodeIE,
LA7PodcastIE,
)
from .la7 import LA7IE
from .laola1tv import (
Laola1TvEmbedIE,
Laola1TvIE,
@@ -716,10 +711,7 @@ from .mixcloud import (
MixcloudUserIE,
MixcloudPlaylistIE,
)
from .mlb import (
MLBIE,
MLBVideoIE,
)
from .mlb import MLBIE
from .mnet import MnetIE
from .moevideo import MoeVideoIE
from .mofosex import (
@@ -743,8 +735,6 @@ from .mtv import (
MTVServicesEmbeddedIE,
MTVDEIE,
MTVJapanIE,
MTVItaliaIE,
MTVItaliaProgrammaIE,
)
from .muenchentv import MuenchenTVIE
from .mwave import MwaveIE, MwaveMeetGreetIE
@@ -833,7 +823,7 @@ from .nick import (
NickNightIE,
NickRuIE,
)
from .niconico import NiconicoIE, NiconicoPlaylistIE, NiconicoUserIE
from .niconico import NiconicoIE, NiconicoPlaylistIE
from .ninecninemedia import NineCNineMediaIE
from .ninegag import NineGagIE
from .ninenow import NineNowIE
@@ -928,11 +918,6 @@ from .packtpub import (
PacktPubIE,
PacktPubCourseIE,
)
from .palcomp3 import (
PalcoMP3IE,
PalcoMP3ArtistIE,
PalcoMP3VideoIE,
)
from .pandoratv import PandoraTVIE
from .parliamentliveuk import ParliamentLiveUKIE
from .patreon import PatreonIE
@@ -969,7 +954,6 @@ from .plays import PlaysTVIE
from .playtvak import PlaytvakIE
from .playvid import PlayvidIE
from .playwire import PlaywireIE
from .plutotv import PlutoTVIE
from .pluralsight import (
PluralsightIE,
PluralsightCourseIE,
@@ -1195,10 +1179,7 @@ from .spike import (
BellatorIE,
ParamountNetworkIE,
)
from .stitcher import (
StitcherIE,
StitcherShowIE,
)
from .stitcher import StitcherIE
from .sport5 import Sport5IE
from .sportbox import SportBoxIE
from .sportdeutschland import SportDeutschlandIE
@@ -1579,7 +1560,6 @@ from .weibo import (
WeiboMobileIE
)
from .weiqitv import WeiqiTVIE
from .wimtv import WimTVIE
from .wistia import (
WistiaIE,
WistiaPlaylistIE,
@@ -1686,14 +1666,8 @@ from .zattoo import (
ZattooLiveIE,
)
from .zdf import ZDFIE, ZDFChannelIE
from .zee5 import (
Zee5IE,
Zee5SeriesIE,
)
from .zhihu import ZhihuIE
from .zingmp3 import (
ZingMp3IE,
ZingMp3AlbumIE,
)
from .zingmp3 import ZingMp3IE
from .zee5 import Zee5IE
from .zoom import ZoomIE
from .zype import ZypeIE

View File

@@ -401,7 +401,7 @@ class FranceTVInfoIE(FranceTVBaseInfoExtractor):
(r'player\.load[^;]+src:\s*["\']([^"\']+)',
r'id-video=([^@]+@[^"]+)',
r'<a[^>]+href="(?:https?:)?//videos\.francetv\.fr/video/([^@]+@[^"]+)"',
r'data-id=["\']([\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'),
r'data-id="([^"]+)"'),
webpage, 'video id')
return self._make_url_result(video_id)

View File

@@ -17,7 +17,7 @@ class FujiTVFODPlus7IE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
formats = self._extract_m3u8_formats(
self._BASE_URL + 'abr/pc_html5/%s.m3u8' % video_id, video_id, 'mp4')
self._BASE_URL + 'abr/pc_html5/%s.m3u8' % video_id, video_id)
for f in formats:
wh = self._BITRATE_MAP.get(f.get('tbr'))
if wh:

yt_dlp/extractor/gedi.py (new file, 266 lines)
View File

@@ -0,0 +1,266 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
base_url,
url_basename,
urljoin,
)
class GediBaseIE(InfoExtractor):
@staticmethod
def _clean_audio_fmts(formats):
unique_formats = []
for f in formats:
if 'acodec' in f:
unique_formats.append(f)
formats[:] = unique_formats
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
player_data = re.findall(
r'PlayerFactory\.setParam\(\'(?P<type>.+?)\',\s*\'(?P<name>.+?)\',\s*\'(?P<val>.+?)\'\);',
webpage)
formats = []
audio_fmts = []
hls_fmts = []
http_fmts = []
title = ''
thumb = ''
fmt_reg = r'(?P<t>video|audio)-(?P<p>rrtv|hls)-(?P<h>[\w\d]+)(?:-(?P<br>[\w\d]+))?$'
br_reg = r'video-rrtv-(?P<br>\d+)-'
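# e.g. a hypothetical name 'video-rrtv-720-1200' gives t='video', p='rrtv',
# h='720', br='1200'; 'audio-hls-128' gives t='audio', p='hls', h='128'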
for t, n, v in player_data:
if t == 'format':
m = re.match(fmt_reg, n)
if m:
# audio formats
if m.group('t') == 'audio':
if m.group('p') == 'hls':
audio_fmts.extend(self._extract_m3u8_formats(
v, video_id, 'm4a', m3u8_id='hls', fatal=False))
elif m.group('p') == 'rrtv':
audio_fmts.append({
'format_id': 'mp3',
'url': v,
'tbr': 128,
'ext': 'mp3',
'vcodec': 'none',
'acodec': 'mp3',
})
# video formats
elif m.group('t') == 'video':
# hls manifest video
if m.group('p') == 'hls':
hls_fmts.extend(self._extract_m3u8_formats(
v, video_id, 'mp4', m3u8_id='hls', fatal=False))
# direct mp4 video
elif m.group('p') == 'rrtv':
if not m.group('br'):
mm = re.search(br_reg, v)
http_fmts.append({
'format_id': 'https-' + m.group('h'),
'protocol': 'https',
'url': v,
'tbr': int(m.group('br')) if m.group('br') else
(int(mm.group('br')) if mm.group('br') else 0),
'height': int(m.group('h'))
})
elif t == 'param':
if n == 'videotitle':
title = v
if n == 'image_full_play':
thumb = v
title = self._og_search_title(webpage) if title == '' else title
# strip stray 'Â' bytes (\xc3\x82) left over from double-encoded UTF-8
title = compat_str(title).encode('utf8', 'replace').replace(b'\xc3\x82', b'').decode('utf8', 'replace')
if audio_fmts:
self._clean_audio_fmts(audio_fmts)
self._sort_formats(audio_fmts)
if hls_fmts:
self._sort_formats(hls_fmts)
if http_fmts:
self._sort_formats(http_fmts)
formats.extend(audio_fmts)
formats.extend(hls_fmts)
formats.extend(http_fmts)
return {
'id': video_id,
'title': title,
'description': self._html_search_meta('twitter:description', webpage),
'thumbnail': thumb,
'formats': formats,
}
class GediIE(GediBaseIE):
_VALID_URL = r'''(?x)https?://video\.
(?:
(?:espresso\.)?repubblica
|lastampa
|huffingtonpost
|ilsecoloxix
|iltirreno
|messaggeroveneto
|ilpiccolo
|gazzettadimantova
|mattinopadova
|laprovinciapavese
|tribunatreviso
|nuovavenezia
|gazzettadimodena
|lanuovaferrara
|corrierealpi
|lasentinella
)
(?:\.gelocal)?\.it/(?!embed/).+?/(?P<id>[\d/]+)(?:\?|\&|$)'''
_TESTS = [{
'url': 'https://video.lastampa.it/politica/il-paradosso-delle-regionali-la-lega-vince-ma-sembra-aver-perso/121559/121683',
'md5': '84658d7fb9e55a6e57ecc77b73137494',
'info_dict': {
'id': '121559/121683',
'ext': 'mp4',
'title': 'Il paradosso delle Regionali: ecco perché la Lega vince ma sembra aver perso',
'description': 'md5:de7f4d6eaaaf36c153b599b10f8ce7ca',
'thumbnail': r're:^https://www\.repstatic\.it/video/photo/.+?-thumb-social-play\.jpg$',
},
}, {
'url': 'https://video.repubblica.it/motori/record-della-pista-a-spa-francorchamps-la-pagani-huayra-roadster-bc-stupisce/367415/367963',
'md5': 'e763b94b7920799a0e0e23ffefa2d157',
'info_dict': {
'id': '367415/367963',
'ext': 'mp4',
'title': 'Record della pista a Spa Francorchamps, la Pagani Huayra Roadster BC stupisce',
'description': 'md5:5deb503cefe734a3eb3f07ed74303920',
'thumbnail': r're:^https://www\.repstatic\.it/video/photo/.+?-thumb-social-play\.jpg$',
},
}, {
'url': 'https://video.ilsecoloxix.it/sport/cassani-e-i-brividi-azzurri-ai-mondiali-di-imola-qui-mi-sono-innamorato-del-ciclismo-da-ragazzino-incredibile-tornarci-da-ct/66184/66267',
'md5': 'e48108e97b1af137d22a8469f2019057',
'info_dict': {
'id': '66184/66267',
'ext': 'mp4',
'title': 'Cassani e i brividi azzurri ai Mondiali di Imola: \\"Qui mi sono innamorato del ciclismo da ragazzino, incredibile tornarci da ct\\"',
'description': 'md5:fc9c50894f70a2469bb9b54d3d0a3d3b',
'thumbnail': r're:^https://www\.repstatic\.it/video/photo/.+?-thumb-social-play\.jpg$',
},
}, {
'url': 'https://video.iltirreno.gelocal.it/sport/dentro-la-notizia-ferrari-cosa-succede-a-maranello/141059/142723',
'md5': 'a6e39f3bdc1842bbd92abbbbef230817',
'info_dict': {
'id': '141059/142723',
'ext': 'mp4',
'title': 'Dentro la notizia - Ferrari, cosa succede a Maranello',
'description': 'md5:9907d65b53765681fa3a0b3122617c1f',
'thumbnail': r're:^https://www\.repstatic\.it/video/photo/.+?-thumb-social-play\.jpg$',
},
}]
class GediEmbedsIE(GediBaseIE):
_VALID_URL = r'''(?x)https?://video\.
(?:
(?:espresso\.)?repubblica
|lastampa
|huffingtonpost
|ilsecoloxix
|iltirreno
|messaggeroveneto
|ilpiccolo
|gazzettadimantova
|mattinopadova
|laprovinciapavese
|tribunatreviso
|nuovavenezia
|gazzettadimodena
|lanuovaferrara
|corrierealpi
|lasentinella
)
(?:\.gelocal)?\.it/embed/.+?/(?P<id>[\d/]+)(?:\?|\&|$)'''
_TESTS = [{
'url': 'https://video.huffingtonpost.it/embed/politica/cotticelli-non-so-cosa-mi-sia-successo-sto-cercando-di-capire-se-ho-avuto-un-malore/29312/29276?responsive=true&el=video971040871621586700',
'md5': 'f4ac23cadfea7fef89bea536583fa7ed',
'info_dict': {
'id': '29312/29276',
'ext': 'mp4',
'title': 'Cotticelli: \\"Non so cosa mi sia successo. Sto cercando di capire se ho avuto un malore\\"',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'thumbnail': r're:^https://www\.repstatic\.it/video/photo/.+?-thumb-social-play\.jpg$',
},
}, {
'url': 'https://video.espresso.repubblica.it/embed/tutti-i-video/01-ted-villa/14772/14870&width=640&height=360',
'md5': '0391c2c83c6506581003aaf0255889c0',
'info_dict': {
'id': '14772/14870',
'ext': 'mp4',
'title': 'Festival EMERGENCY, Villa: «La buona informazione aiuta la salute» (14772-14870)',
'description': 'md5:2bce954d278248f3c950be355b7c2226',
'thumbnail': r're:^https://www\.repstatic\.it/video/photo/.+?-thumb-social-play\.jpg$',
},
}]
@staticmethod
def _sanitize_urls(urls):
# add protocol if missing
for i, e in enumerate(urls):
if e.startswith('//'):
urls[i] = 'https:%s' % e
# clean iframe URLs (drop query strings and fragments)
for i, e in enumerate(urls):
urls[i] = urljoin(base_url(e), url_basename(e))
return urls
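# A hypothetical before/after for the clean-up above, assuming yt-dlp's
# base_url (everything up to the last slash) and url_basename (final path
# component, query dropped):
#   _sanitize_urls(['//video.repubblica.it/embed/a/b/123/456?el=player1'])
#   -> ['https://video.repubblica.it/embed/a/b/123/456']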
@staticmethod
def _extract_urls(webpage):
entries = [
mobj.group('url')
for mobj in re.finditer(r'''(?x)
(?:
data-frame-src=|
<iframe[^\n]+src=
)
(["'])
(?P<url>https?://video\.
(?:
(?:espresso\.)?repubblica
|lastampa
|huffingtonpost
|ilsecoloxix
|iltirreno
|messaggeroveneto
|ilpiccolo
|gazzettadimantova
|mattinopadova
|laprovinciapavese
|tribunatreviso
|nuovavenezia
|gazzettadimodena
|lanuovaferrara
|corrierealpi
|lasentinella
)
(?:\.gelocal)?\.it/embed/.+?)
\1''', webpage)]
return GediEmbedsIE._sanitize_urls(entries)
@staticmethod
def _extract_url(webpage):
urls = GediEmbedsIE._extract_urls(webpage)
return urls[0] if urls else None

View File

@@ -1,210 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
base_url,
determine_ext,
int_or_none,
url_basename,
urljoin,
)
class GediDigitalIE(InfoExtractor):
_VALID_URL = r'''(?x)(?P<url>(?:https?:)//video\.
(?:
(?:
(?:espresso\.)?repubblica
|lastampa
|ilsecoloxix
|huffingtonpost
)|
(?:
iltirreno
|messaggeroveneto
|ilpiccolo
|gazzettadimantova
|mattinopadova
|laprovinciapavese
|tribunatreviso
|nuovavenezia
|gazzettadimodena
|lanuovaferrara
|corrierealpi
|lasentinella
)\.gelocal
)\.it(?:/[^/]+){2,4}/(?P<id>\d+))(?:$|[?&].*)'''
_TESTS = [{
'url': 'https://video.lastampa.it/politica/il-paradosso-delle-regionali-la-lega-vince-ma-sembra-aver-perso/121559/121683',
'md5': '84658d7fb9e55a6e57ecc77b73137494',
'info_dict': {
'id': '121683',
'ext': 'mp4',
'title': 'Il paradosso delle Regionali: ecco perché la Lega vince ma sembra aver perso',
'description': 'md5:de7f4d6eaaaf36c153b599b10f8ce7ca',
'thumbnail': r're:^https://www\.repstatic\.it/video/photo/.+?-thumb-full-.+?\.jpg$',
'duration': 125,
},
}, {
'url': 'https://video.huffingtonpost.it/embed/politica/cotticelli-non-so-cosa-mi-sia-successo-sto-cercando-di-capire-se-ho-avuto-un-malore/29312/29276?responsive=true&el=video971040871621586700',
'only_matching': True,
}, {
'url': 'https://video.espresso.repubblica.it/embed/tutti-i-video/01-ted-villa/14772/14870&width=640&height=360',
'only_matching': True,
}, {
'url': 'https://video.repubblica.it/motori/record-della-pista-a-spa-francorchamps-la-pagani-huayra-roadster-bc-stupisce/367415/367963',
'only_matching': True,
}, {
'url': 'https://video.ilsecoloxix.it/sport/cassani-e-i-brividi-azzurri-ai-mondiali-di-imola-qui-mi-sono-innamorato-del-ciclismo-da-ragazzino-incredibile-tornarci-da-ct/66184/66267',
'only_matching': True,
}, {
'url': 'https://video.iltirreno.gelocal.it/sport/dentro-la-notizia-ferrari-cosa-succede-a-maranello/141059/142723',
'only_matching': True,
}, {
'url': 'https://video.messaggeroveneto.gelocal.it/locale/maria-giovanna-elmi-covid-vaccino/138155/139268',
'only_matching': True,
}, {
'url': 'https://video.ilpiccolo.gelocal.it/dossier/big-john/dinosauro-big-john-al-via-le-visite-guidate-a-trieste/135226/135751',
'only_matching': True,
}, {
'url': 'https://video.gazzettadimantova.gelocal.it/locale/dal-ponte-visconteo-di-valeggio-l-and-8217sos-dei-ristoratori-aprire-anche-a-cena/137310/137818',
'only_matching': True,
}, {
'url': 'https://video.mattinopadova.gelocal.it/dossier/coronavirus-in-veneto/covid-a-vo-un-anno-dopo-un-cuore-tricolore-per-non-dimenticare/138402/138964',
'only_matching': True,
}, {
'url': 'https://video.laprovinciapavese.gelocal.it/locale/mede-zona-rossa-via-alle-vaccinazioni-per-gli-over-80/137545/138120',
'only_matching': True,
}, {
'url': 'https://video.tribunatreviso.gelocal.it/dossier/coronavirus-in-veneto/ecco-le-prima-vaccinazioni-di-massa-nella-marca/134485/135024',
'only_matching': True,
}, {
'url': 'https://video.nuovavenezia.gelocal.it/locale/camion-troppo-alto-per-il-ponte-ferroviario-perde-il-carico/135734/136266',
'only_matching': True,
}, {
'url': 'https://video.gazzettadimodena.gelocal.it/locale/modena-scoperta-la-proteina-che-predice-il-livello-di-gravita-del-covid/139109/139796',
'only_matching': True,
}, {
'url': 'https://video.lanuovaferrara.gelocal.it/locale/due-bombole-di-gpl-aperte-e-abbandonate-i-vigili-bruciano-il-gas/134391/134957',
'only_matching': True,
}, {
'url': 'https://video.corrierealpi.gelocal.it/dossier/cortina-2021-i-mondiali-di-sci-alpino/mondiali-di-sci-il-timelapse-sulla-splendida-olympia/133760/134331',
'only_matching': True,
}, {
'url': 'https://video.lasentinella.gelocal.it/locale/vestigne-centra-un-auto-e-si-ribalta/138931/139466',
'only_matching': True,
}, {
'url': 'https://video.espresso.repubblica.it/tutti-i-video/01-ted-villa/14772',
'only_matching': True,
}]
@staticmethod
def _sanitize_urls(urls):
# add protocol if missing
for i, e in enumerate(urls):
if e.startswith('//'):
urls[i] = 'https:%s' % e
# clean iframe URLs (drop query strings and fragments)
for i, e in enumerate(urls):
urls[i] = urljoin(base_url(e), url_basename(e))
return urls
@staticmethod
def _extract_urls(webpage):
entries = [
mobj.group('eurl')
for mobj in re.finditer(r'''(?x)
(?:
data-frame-src=|
<iframe[^\n]+src=
)
(["'])(?P<eurl>%s)\1''' % GediDigitalIE._VALID_URL, webpage)]
return GediDigitalIE._sanitize_urls(entries)
@staticmethod
def _extract_url(webpage):
urls = GediDigitalIE._extract_urls(webpage)
return urls[0] if urls else None
@staticmethod
def _clean_formats(formats):
format_urls = set()
clean_formats = []
for f in formats:
if f['url'] not in format_urls:
if f.get('audio_ext') != 'none' and not f.get('acodec'):
continue
format_urls.add(f['url'])
clean_formats.append(f)
formats[:] = clean_formats
def _real_extract(self, url):
video_id = self._match_id(url)
url = re.match(self._VALID_URL, url).group('url')
webpage = self._download_webpage(url, video_id)
title = self._html_search_meta(
['twitter:title', 'og:title'], webpage, fatal=True)
player_data = re.findall(
r"PlayerFactory\.setParam\('(?P<type>format|param)',\s*'(?P<name>[^']+)',\s*'(?P<val>[^']+)'\);",
webpage)
formats = []
duration = thumb = None
for t, n, v in player_data:
if t == 'format':
if n in ('video-hds-vod-ec', 'video-hls-vod-ec', 'video-viralize', 'video-youtube-pfp'):
continue
elif n.endswith('-vod-ak'):
formats.extend(self._extract_akamai_formats(
v, video_id, {'http': 'media.gedidigital.it'}))
else:
ext = determine_ext(v)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
v, video_id, 'mp4', 'm3u8_native', m3u8_id=n, fatal=False))
continue
f = {
'format_id': n,
'url': v,
}
if ext == 'mp3':
abr = int_or_none(self._search_regex(
r'-mp3-audio-(\d+)', v, 'abr', default=None))
f.update({
'abr': abr,
'tbr': abr,
'acodec': ext,
'vcodec': 'none'
})
else:
mobj = re.match(r'^video-rrtv-(\d+)(?:-(\d+))?$', n)
if mobj:
f.update({
'height': int(mobj.group(1)),
'vbr': int_or_none(mobj.group(2)),
})
if not f.get('vbr'):
f['vbr'] = int_or_none(self._search_regex(
r'-video-rrtv-(\d+)', v, 'abr', default=None))
formats.append(f)
elif t == 'param':
if n in ['image_full', 'image']:
thumb = v
elif n == 'videoDuration':
duration = int_or_none(v)
self._clean_formats(formats)
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': self._html_search_meta(
['twitter:description', 'og:description', 'description'], webpage),
'thumbnail': thumb or self._og_search_thumbnail(webpage),
'formats': formats,
'duration': duration,
}

View File

@@ -127,14 +127,13 @@ from .expressen import ExpressenIE
from .zype import ZypeIE
from .odnoklassniki import OdnoklassnikiIE
from .kinja import KinjaEmbedIE
from .gedidigital import GediDigitalIE
from .gedi import GediEmbedsIE
from .rcs import RCSEmbedsIE
from .bitchute import BitChuteIE
from .rumble import RumbleEmbedIE
from .arcpublishing import ArcPublishingIE
from .medialaan import MedialaanIE
from .simplecast import SimplecastIE
from .wimtv import WimTVIE
class GenericIE(InfoExtractor):
@@ -2251,15 +2250,6 @@ class GenericIE(InfoExtractor):
},
'playlist_mincount': 52,
},
{
# WimTv embed player
'url': 'http://www.msmotor.tv/wearefmi-pt-2-2021/',
'info_dict': {
'id': 'wearefmi-pt-2-2021',
'title': '#WEAREFMI PT.2 2021 MsMotorTV',
},
'playlist_count': 1,
},
]
def report_following_redirect(self, new_url):
@@ -2659,15 +2649,6 @@ class GenericIE(InfoExtractor):
if vid_me_embed_url is not None:
return self.url_result(vid_me_embed_url, 'Vidme')
# Invidious Instances
# https://github.com/yt-dlp/yt-dlp/issues/195
# https://github.com/iv-org/invidious/pull/1730
youtube_url = self._search_regex(
r'<link rel="alternate" href="(https://www\.youtube\.com/watch\?v=[0-9A-Za-z_-]{11})"',
webpage, 'youtube link', default=None)
if youtube_url:
return self.url_result(youtube_url, YoutubeIE.ie_key())
# Look for YouTube embeds
youtube_urls = YoutubeIE._extract_urls(webpage)
if youtube_urls:
@@ -2974,7 +2955,7 @@ class GenericIE(InfoExtractor):
webpage)
if not mobj:
mobj = re.search(
r'data-video-link=["\'](?P<url>http://m\.mlb\.com/video/[^"\']+)',
r'data-video-link=["\'](?P<url>http://m.mlb.com/video/[^"\']+)',
webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'MLB')
@@ -3358,22 +3339,17 @@ class GenericIE(InfoExtractor):
return self.playlist_from_matches(
zype_urls, video_id, video_title, ie=ZypeIE.ie_key())
gedi_urls = GediDigitalIE._extract_urls(webpage)
# Look for RCS media group embeds
gedi_urls = GediEmbedsIE._extract_urls(webpage)
if gedi_urls:
return self.playlist_from_matches(
gedi_urls, video_id, video_title, ie=GediDigitalIE.ie_key())
gedi_urls, video_id, video_title, ie=GediEmbedsIE.ie_key())
# Look for RCS media group embeds
rcs_urls = RCSEmbedsIE._extract_urls(webpage)
if rcs_urls:
return self.playlist_from_matches(
rcs_urls, video_id, video_title, ie=RCSEmbedsIE.ie_key())
wimtv_urls = WimTVIE._extract_urls(webpage)
if wimtv_urls:
return self.playlist_from_matches(
wimtv_urls, video_id, video_title, ie=WimTVIE.ie_key())
bitchute_urls = BitChuteIE._extract_urls(webpage)
if bitchute_urls:
return self.playlist_from_matches(
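For context on the removed Invidious branch above: Invidious instances mirror YouTube videos and expose the canonical watch URL in a `<link rel="alternate">` tag, which is what the regex keys on. A quick check against a minimal page snippet (the snippet itself is made up):

import re

page = '<link rel="alternate" href="https://www.youtube.com/watch?v=BaW_jenozKc">'
youtube_url = re.search(
    r'<link rel="alternate" href="(https://www\.youtube\.com/watch\?v=[0-9A-Za-z_-]{11})"',
    page)
print(youtube_url.group(1))  # https://www.youtube.com/watch?v=BaW_jenozKc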

View File

@@ -12,7 +12,6 @@ from ..compat import (
)
from ..utils import (
ExtractorError,
float_or_none,
get_element_by_attribute,
int_or_none,
lowercase_escape,
@@ -33,7 +32,6 @@ class InstagramIE(InfoExtractor):
'title': 'Video by naomipq',
'description': 'md5:1f17f0ab29bd6fe2bfad705f58de3cb8',
'thumbnail': r're:^https?://.*\.jpg',
'duration': 0,
'timestamp': 1371748545,
'upload_date': '20130620',
'uploader_id': 'naomipq',
@@ -50,7 +48,6 @@ class InstagramIE(InfoExtractor):
'ext': 'mp4',
'title': 'Video by britneyspears',
'thumbnail': r're:^https?://.*\.jpg',
'duration': 0,
'timestamp': 1453760977,
'upload_date': '20160125',
'uploader_id': 'britneyspears',
@@ -89,24 +86,6 @@ class InstagramIE(InfoExtractor):
'title': 'Post by instagram',
'description': 'md5:0f9203fc6a2ce4d228da5754bcf54957',
},
}, {
# IGTV
'url': 'https://www.instagram.com/tv/BkfuX9UB-eK/',
'info_dict': {
'id': 'BkfuX9UB-eK',
'ext': 'mp4',
'title': 'Fingerboarding Tricks with @cass.fb',
'thumbnail': r're:^https?://.*\.jpg',
'duration': 53.83,
'timestamp': 1530032919,
'upload_date': '20180626',
'uploader_id': 'instagram',
'uploader': 'Instagram',
'like_count': int,
'comment_count': int,
'comments': list,
'description': 'Meet Cass Hirst (@cass.fb), a fingerboarding pro who can perform tiny ollies and kickflips while blindfolded.',
}
}, {
'url': 'https://instagram.com/p/-Cmh1cukG2/',
'only_matching': True,
@@ -180,9 +159,7 @@ class InstagramIE(InfoExtractor):
description = try_get(
media, lambda x: x['edge_media_to_caption']['edges'][0]['node']['text'],
compat_str) or media.get('caption')
title = media.get('title')
thumbnail = media.get('display_src') or media.get('display_url')
duration = float_or_none(media.get('video_duration'))
timestamp = int_or_none(media.get('taken_at_timestamp') or media.get('date'))
uploader = media.get('owner', {}).get('full_name')
uploader_id = media.get('owner', {}).get('username')
@@ -223,10 +200,9 @@ class InstagramIE(InfoExtractor):
continue
entries.append({
'id': node.get('shortcode') or node['id'],
'title': node.get('title') or 'Video %d' % edge_num,
'title': 'Video %d' % edge_num,
'url': node_video_url,
'thumbnail': node.get('display_url'),
'duration': float_or_none(node.get('video_duration')),
'width': int_or_none(try_get(node, lambda x: x['dimensions']['width'])),
'height': int_or_none(try_get(node, lambda x: x['dimensions']['height'])),
'view_count': int_or_none(node.get('video_view_count')),
@@ -263,9 +239,8 @@ class InstagramIE(InfoExtractor):
'id': video_id,
'formats': formats,
'ext': 'mp4',
'title': title or 'Video by %s' % uploader_id,
'title': 'Video by %s' % uploader_id,
'description': description,
'duration': duration,
'thumbnail': thumbnail,
'timestamp': timestamp,
'uploader_id': uploader_id,

View File

@@ -146,7 +146,7 @@ class IviIE(InfoExtractor):
expected=True)
elif not pycryptodomex_found:
raise ExtractorError(
'pycryptodomex not found. Please install',
'pycryptodomex not found. Please install it.',
expected=True)
elif message:
extractor_msg += ': ' + message

View File

@@ -1,15 +1,9 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
float_or_none,
parse_duration,
smuggle_url,
unified_strdate,
)
@@ -63,141 +57,3 @@ class LA7IE(InfoExtractor):
'thumbnail': self._og_search_thumbnail(webpage, default=None),
'ie_key': 'Kaltura',
}
class LA7PodcastEpisodeIE(InfoExtractor):
IE_NAME = 'la7.it:pod:episode'
_VALID_URL = r'''(?x)(https?://)?
(?:www\.)?la7\.it/[^/]+/podcast/([^/]+-)?(?P<id>\d+)'''
_TESTS = [{
'url': 'https://www.la7.it/voicetown/podcast/la-carezza-delle-memoria-di-carlo-verdone-23-03-2021-371497',
'md5': '7737d4d79b3c1a34b3de3e16297119ed',
'info_dict': {
'id': '371497',
'ext': 'mp3',
'title': '"La carezza delle memoria" di Carlo Verdone',
'description': 'md5:5abf07c3c551a687db80af3f9ceb7d52',
'thumbnail': 'https://www.la7.it/sites/default/files/podcast/371497.jpg',
'upload_date': '20210323',
},
}, {
# embed url
'url': 'https://www.la7.it/embed/podcast/371497',
'only_matching': True,
}, {
# date already in the title
'url': 'https://www.la7.it/propagandalive/podcast/lintervista-di-diego-bianchi-ad-annalisa-cuzzocrea-puntata-del-1932021-20-03-2021-371130',
'only_matching': True,
}, {
# title same as show_title
'url': 'https://www.la7.it/otto-e-mezzo/podcast/otto-e-mezzo-26-03-2021-372340',
'only_matching': True,
}]
def _extract_info(self, webpage, video_id=None, ppn=None):
if not video_id:
video_id = self._search_regex(
r'data-nid=([\'"])(?P<vid>\d+)\1',
webpage, 'video_id', group='vid')
media_url = self._search_regex(
(r'src:\s*([\'"])(?P<url>.+?mp3.+?)\1',
r'data-podcast=([\'"])(?P<url>.+?mp3.+?)\1'),
webpage, 'media_url', group='url')
ext = determine_ext(media_url)
formats = [{
'url': media_url,
'format_id': ext,
'ext': ext,
}]
self._sort_formats(formats)
title = self._html_search_regex(
(r'<div class="title">(?P<title>.+?)</',
r'<title>(?P<title>[^<]+)</title>',
r'title:\s*([\'"])(?P<title>.+?)\1'),
webpage, 'title', group='title')
description = (
self._html_search_regex(
(r'<div class="description">(.+?)</div>',
r'<div class="description-mobile">(.+?)</div>',
r'<div class="box-txt">([^<]+?)</div>',
r'<div class="field-content"><p>(.+?)</p></div>'),
webpage, 'description', default=None)
or self._html_search_meta('description', webpage))
thumb = self._html_search_regex(
(r'<div class="podcast-image"><img src="(.+?)"></div>',
r'<div class="container-embed"[^<]+url\((.+?)\);">',
r'<div class="field-content"><img src="(.+?)"'),
webpage, 'thumbnail', fatal=False, default=None)
duration = parse_duration(self._html_search_regex(
r'<span class="(?:durata|duration)">([\d:]+)</span>',
webpage, 'duration', fatal=False, default=None))
date = self._html_search_regex(
r'class="data">\s*(?:<span>)?([\d\.]+)\s*</',
webpage, 'date', default=None)
date_alt = self._search_regex(
r'(\d+[\./]\d+[\./]\d+)', title, 'date_alt', default=None)
ppn = ppn or self._search_regex(
r'ppN:\s*([\'"])(?P<ppn>.+?)\1',
webpage, 'ppn', group='ppn', default=None)
# If the date is not in the title
# and the title is the same as the show title,
# append the date to the title
if date and not date_alt and ppn and ppn.lower() == title.lower():
title += ' del %s' % date
return {
'id': video_id,
'title': title,
'description': description,
'duration': float_or_none(duration),
'formats': formats,
'thumbnail': thumb,
'upload_date': unified_strdate(date),
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
return self._extract_info(webpage, video_id)
class LA7PodcastIE(LA7PodcastEpisodeIE):
IE_NAME = 'la7.it:podcast'
_VALID_URL = r'(https?://)?(www\.)?la7\.it/(?P<id>[^/]+)/podcast/?(?:$|[#?])'
_TESTS = [{
'url': 'https://www.la7.it/propagandalive/podcast',
'info_dict': {
'id': 'propagandalive',
'title': "Propaganda Live",
},
'playlist_count': 10,
}]
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
title = (
self._html_search_regex(
r'<h1.*?>(.+?)</h1>', webpage, 'title', fatal=False, default=None)
or self._og_search_title(webpage))
ppn = self._search_regex(
r'window\.ppN\s*=\s*([\'"])(?P<ppn>.+?)\1',
webpage, 'ppn', group='ppn', default=None)
entries = []
for episode in re.finditer(
r'<div class="container-podcast-property">([\s\S]+?)(?:</div>\s*){3}',
webpage):
entries.append(self._extract_info(episode.group(1), ppn=ppn))
return self.playlist_result(entries, playlist_id, title)
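The date handling in `_extract_info` above is easy to misread, so here is the decision in isolation: the date is appended only when the title does not already embed one and merely repeats the show name (`ppn`). A standalone sketch with invented sample values:

import re

def decorate_title(title, date, ppn):
    date_in_title = re.search(r'\d+[./]\d+[./]\d+', title)
    if date and not date_in_title and ppn and ppn.lower() == title.lower():
        return '%s del %s' % (title, date)  # mirror the extractor's suffix
    return title

print(decorate_title('Otto e mezzo', '26.03.2021', 'Otto e mezzo'))
# -> 'Otto e mezzo del 26.03.2021'
print(decorate_title('Puntata del 19.3.2021', '20.03.2021', 'Propaganda Live'))
# -> unchanged: a date is already embedded and the title differs from the show name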

View File

@@ -6,10 +6,8 @@ import json
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_str,
compat_urllib_parse_unquote,
compat_urllib_parse_urlparse,
)
from ..utils import (
determine_ext,
@@ -23,9 +21,9 @@ from ..utils import (
class LBRYBaseIE(InfoExtractor):
_BASE_URL_REGEX = r'(?:https?://(?:www\.)?(?:lbry\.tv|odysee\.com)/|lbry://)'
_BASE_URL_REGEX = r'https?://(?:www\.)?(?:lbry\.tv|odysee\.com)/'
_CLAIM_ID_REGEX = r'[0-9a-f]{1,40}'
_OPT_CLAIM_ID = '[^:/?#&]+(?:[:#]%s)?' % _CLAIM_ID_REGEX
_OPT_CLAIM_ID = '[^:/?#&]+(?::%s)?' % _CLAIM_ID_REGEX
_SUPPORTED_STREAM_TYPES = ['video', 'audio']
def _call_api_proxy(self, method, display_id, params, resource):
@@ -43,9 +41,7 @@ class LBRYBaseIE(InfoExtractor):
'resolve', display_id, {'urls': url}, resource)[url]
def _permanent_url(self, url, claim_name, claim_id):
return urljoin(
url.replace('lbry://', 'https://lbry.tv/'),
'/%s:%s' % (claim_name, claim_id))
return urljoin(url, '/%s:%s' % (claim_name, claim_id))
def _parse_stream(self, stream, url):
stream_value = stream.get('value') or {}
@@ -64,7 +60,6 @@ class LBRYBaseIE(InfoExtractor):
'description': stream_value.get('description'),
'license': stream_value.get('license'),
'timestamp': int_or_none(stream.get('timestamp')),
'release_timestamp': int_or_none(stream_value.get('release_time')),
'tags': stream_value.get('tags'),
'duration': int_or_none(media.get('duration')),
'channel': try_get(signing_channel, lambda x: x['value']['title']),
@@ -97,8 +92,6 @@ class LBRYIE(LBRYBaseIE):
'description': 'md5:f6cb5c704b332d37f5119313c2c98f51',
'timestamp': 1595694354,
'upload_date': '20200725',
'release_timestamp': 1595340697,
'release_date': '20200721',
'width': 1280,
'height': 720,
}
@@ -113,8 +106,6 @@ class LBRYIE(LBRYBaseIE):
'description': 'md5:661ac4f1db09f31728931d7b88807a61',
'timestamp': 1591312601,
'upload_date': '20200604',
'release_timestamp': 1591312421,
'release_date': '20200604',
'tags': list,
'duration': 2570,
'channel': 'The LBRY Foundation',
@@ -146,9 +137,6 @@ class LBRYIE(LBRYBaseIE):
}, {
'url': 'https://lbry.tv/@lacajadepandora:a/TRUMP-EST%C3%81-BIEN-PUESTO-con-Pilar-Baselga,-Carlos-Senra,-Luis-Palacios-(720p_30fps_H264-192kbit_AAC):1',
'only_matching': True,
}, {
'url': 'lbry://@lbry#3f/odysee#7',
'only_matching': True,
}]
def _real_extract(self, url):
@@ -178,7 +166,7 @@ class LBRYIE(LBRYBaseIE):
class LBRYChannelIE(LBRYBaseIE):
IE_NAME = 'lbry:channel'
_VALID_URL = LBRYBaseIE._BASE_URL_REGEX + r'(?P<id>@%s)/?(?:[?&]|$)' % LBRYBaseIE._OPT_CLAIM_ID
_VALID_URL = LBRYBaseIE._BASE_URL_REGEX + r'(?P<id>@%s)/?(?:[?#&]|$)' % LBRYBaseIE._OPT_CLAIM_ID
_TESTS = [{
'url': 'https://lbry.tv/@LBRYFoundation:0',
'info_dict': {
@@ -190,24 +178,20 @@ class LBRYChannelIE(LBRYBaseIE):
}, {
'url': 'https://lbry.tv/@LBRYFoundation',
'only_matching': True,
}, {
'url': 'lbry://@lbry#3f',
'only_matching': True,
}]
_PAGE_SIZE = 50
def _fetch_page(self, claim_id, url, params, page):
def _fetch_page(self, claim_id, url, page):
page += 1
page_params = {
'channel_ids': [claim_id],
'claim_type': 'stream',
'no_totals': True,
'page': page,
'page_size': self._PAGE_SIZE,
}
page_params.update(params)
result = self._call_api_proxy(
'claim_search', claim_id, page_params, 'page %d' % page)
'claim_search', claim_id, {
'channel_ids': [claim_id],
'claim_type': 'stream',
'no_totals': True,
'page': page,
'page_size': self._PAGE_SIZE,
'stream_types': self._SUPPORTED_STREAM_TYPES,
}, 'page %d' % page)
for item in (result.get('items') or []):
stream_claim_name = item.get('name')
stream_claim_id = item.get('claim_id')
@@ -228,31 +212,8 @@ class LBRYChannelIE(LBRYBaseIE):
result = self._resolve_url(
'lbry://' + display_id, display_id, 'channel')
claim_id = result['claim_id']
qs = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
content = qs.get('content', [None])[0]
params = {
'fee_amount': qs.get('fee_amount', ['>=0'])[0],
'order_by': {
'new': ['release_time'],
'top': ['effective_amount'],
'trending': ['trending_group', 'trending_mixed'],
}[qs.get('order', ['new'])[0]],
'stream_types': [content] if content in ['audio', 'video'] else self._SUPPORTED_STREAM_TYPES,
}
duration = qs.get('duration', [None])[0]
if duration:
params['duration'] = {
'long': '>=1200',
'short': '<=240',
}[duration]
language = qs.get('language', ['all'])[0]
if language != 'all':
languages = [language]
if language == 'en':
languages.append('none')
params['any_languages'] = languages
entries = OnDemandPagedList(
functools.partial(self._fetch_page, claim_id, url, params),
functools.partial(self._fetch_page, claim_id, url),
self._PAGE_SIZE)
result_value = result.get('value') or {}
return self.playlist_result(
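For readers unfamiliar with `OnDemandPagedList` above: `functools.partial` pre-binds the claim id (and, in the removed variant, the query params) so the pager only ever supplies the page index, and pages are fetched lazily as entries are consumed. The pattern reduced to plain generators (no yt-dlp imports; the fake fetcher stands in for `_fetch_page`):

import functools

def fetch_page(page_size, page):
    start = page * page_size
    for i in range(start, start + page_size):
        if i >= 7:  # pretend the backend holds only 7 items
            return
        yield {'id': i}

def paged_entries(fetcher, page_size):
    page = 0
    while True:
        items = list(fetcher(page))
        if not items:
            return
        for item in items:
            yield item
        if len(items) < page_size:  # short page means we hit the end
            return
        page += 1

pager = functools.partial(fetch_page, 3)  # pre-bind everything but the page index
print([e['id'] for e in paged_entries(pager, 3)])  # [0, 1, 2, 3, 4, 5, 6]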

View File

@@ -38,8 +38,8 @@ class LinuxAcademyIE(InfoExtractor):
'ext': 'mp4',
'title': 'What Is Data Science',
'description': 'md5:c574a3c20607144fb36cb65bdde76c99',
'timestamp': int, # The timestamp and upload date changes
'upload_date': r're:\d+',
'timestamp': 1607387907,
'upload_date': '20201208',
'duration': 304,
},
'params': {
@@ -59,16 +59,6 @@ class LinuxAcademyIE(InfoExtractor):
},
'playlist_count': 41,
'skip': 'Requires Linux Academy account credentials',
}, {
'url': 'https://linuxacademy.com/cp/modules/view/id/39',
'info_dict': {
'id': '39',
'title': 'Red Hat Certified Systems Administrator - RHCSA (EX200) Exam Prep (legacy)',
'description': 'md5:0f1d3369e90c3fb14a79813b863c902f',
'duration': 89280,
},
'playlist_count': 73,
'skip': 'Requires Linux Academy account credentials',
}]
_AUTHORIZE_URL = 'https://login.linuxacademy.com/authorize'
@@ -112,7 +102,7 @@ class LinuxAcademyIE(InfoExtractor):
'client_id': self._CLIENT_ID,
'redirect_uri': self._ORIGIN_URL,
'tenant': 'lacausers',
'connection': 'Username-Password-ACG-Proxy',
'connection': 'Username-Password-Authentication',
'username': username,
'password': password,
'sso': 'true',
@@ -172,7 +162,7 @@ class LinuxAcademyIE(InfoExtractor):
if course_id:
module = self._parse_json(
self._search_regex(
r'window\.module\s*=\s*({(?:(?!};)[^"]|"([^"]|\\")*")+})\s*;', webpage, 'module'),
r'window\.module\s*=\s*({.+?})\s*;', webpage, 'module'),
item_id)
entries = []
chapter_number = None
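On the `window.module` regex pair above: the longer, removed pattern is string-aware, tempering each step with `(?!};)` and skipping quoted runs so a `};` inside a JSON string value cannot end the match early, while the simpler `({.+?})` stops at the first `};` it sees. A small self-contained demonstration:

import re

page = 'window.module = {"code": "if (x) {};"};'

lazy = re.search(r'window\.module\s*=\s*({.+?})\s*;', page)
print(lazy.group(1))  # {"code": "if (x) {}   <- truncated inside the string

robust = re.search(
    r'window\.module\s*=\s*({(?:(?!};)[^"]|"(?:[^"]|\\")*")+})\s*;', page)
print(robust.group(1))  # {"code": "if (x) {};"}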

View File

@@ -5,7 +5,6 @@ from datetime import datetime
import itertools
import json
import base64
import re
from .common import InfoExtractor
from ..utils import (
@@ -69,7 +68,7 @@ class MildomBaseIE(InfoExtractor):
self._DISPATCHER_CONFIG = self._parse_json(base64.b64decode(tmp['data']), 'initialization')
except ExtractorError:
self._DISPATCHER_CONFIG = self._download_json(
'https://bookish-octo-barnacle.vercel.app/api/mildom/dispatcher_config', 'initialization',
'https://bookish-octo-barnacle.vercel.app/api/dispatcher_config', 'initialization',
note='Downloading dispatcher_config fallback')
return self._DISPATCHER_CONFIG
@@ -111,7 +110,6 @@ class MildomIE(MildomBaseIE):
enterstudio = self._call_api(
'https://cloudac.mildom.com/nonolive/gappserv/live/enterstudio', video_id,
note='Downloading live metadata', query={'user_id': video_id})
result_video_id = enterstudio.get('log_id', video_id)
title = try_get(
enterstudio, (
@@ -130,7 +128,7 @@ class MildomIE(MildomBaseIE):
), compat_str)
servers = self._call_api(
'https://cloudac.mildom.com/nonolive/gappserv/live/liveserver', result_video_id,
'https://cloudac.mildom.com/nonolive/gappserv/live/liveserver', video_id,
note='Downloading live server list', query={
'user_id': video_id,
'live_server_type': 'hls',
@@ -141,7 +139,7 @@ class MildomIE(MildomBaseIE):
'is_lhls': '0',
})
m3u8_url = update_url_query(servers['stream_server'] + '/%s_master.m3u8' % video_id, stream_query)
formats = self._extract_m3u8_formats(m3u8_url, result_video_id, 'mp4', headers={
formats = self._extract_m3u8_formats(m3u8_url, video_id, 'mp4', headers={
'Referer': 'https://www.mildom.com/',
'Origin': 'https://www.mildom.com',
}, note='Downloading m3u8 information')
@@ -152,13 +150,13 @@ class MildomIE(MildomBaseIE):
parsed = parsed._replace(
netloc='bookish-octo-barnacle.vercel.app',
query=compat_urllib_parse_urlencode(stream_query, True),
path='/api/mildom' + parsed.path)
path='/api' + parsed.path)
fmt['url'] = compat_urlparse.urlunparse(parsed)
self._sort_formats(formats)
return {
'id': result_video_id,
'id': video_id,
'title': title,
'description': description,
'uploader': uploader,
@@ -174,8 +172,9 @@ class MildomVodIE(MildomBaseIE):
_VALID_URL = r'https?://(?:(?:www|m)\.)mildom\.com/playback/(?P<user_id>\d+)/(?P<id>(?P=user_id)-[a-zA-Z0-9]+)'
def _real_extract(self, url):
m = re.match(self._VALID_URL, url)
user_id, video_id = m.group('user_id'), m.group('id')
video_id = self._match_id(url)
m = self._VALID_URL_RE.match(url)
user_id = m.group('user_id')
url = 'https://www.mildom.com/playback/%s/%s' % (user_id, video_id)
webpage = self._download_webpage(url, video_id)
@@ -231,7 +230,7 @@ class MildomVodIE(MildomBaseIE):
parsed = parsed._replace(
netloc='bookish-octo-barnacle.vercel.app',
query=compat_urllib_parse_urlencode(stream_query, True),
path='/api/mildom/vod2/proxy')
path='/api/vod2/proxy')
fmt['url'] = compat_urlparse.urlunparse(parsed)
self._sort_formats(formats)
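Both rewrites above (live and VOD) follow the same recipe: parse the CDN URL, swap in the proxy host, prefix the path, and re-serialize. The same transformation in isolation with the standard library (the proxy host and `/api` prefix come from the code above; the input URL is invented):

import urllib.parse

src = 'https://cdn.example.com/vod2/proxy/playlist.m3u8?token=abc'
parsed = urllib.parse.urlparse(src)
parsed = parsed._replace(
    netloc='bookish-octo-barnacle.vercel.app',  # reroute through the proxy
    path='/api' + parsed.path)                  # query string is preserved
print(urllib.parse.urlunparse(parsed))
# https://bookish-octo-barnacle.vercel.app/api/vod2/proxy/playlist.m3u8?token=abc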

View File

@@ -1,91 +1,15 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
int_or_none,
parse_duration,
parse_iso8601,
try_get,
)
from .nhl import NHLBaseIE
class MLBBaseIE(InfoExtractor):
def _real_extract(self, url):
display_id = self._match_id(url)
video = self._download_video_data(display_id)
video_id = video['id']
title = video['title']
feed = self._get_feed(video)
formats = []
for playback in (feed.get('playbacks') or []):
playback_url = playback.get('url')
if not playback_url:
continue
name = playback.get('name')
ext = determine_ext(playback_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
playback_url, video_id, 'mp4',
'm3u8_native', m3u8_id=name, fatal=False))
else:
f = {
'format_id': name,
'url': playback_url,
}
mobj = re.search(r'_(\d+)K_(\d+)X(\d+)', name)
if mobj:
f.update({
'height': int(mobj.group(3)),
'tbr': int(mobj.group(1)),
'width': int(mobj.group(2)),
})
mobj = re.search(r'_(\d+)x(\d+)_(\d+)_(\d+)K\.mp4', playback_url)
if mobj:
f.update({
'fps': int(mobj.group(3)),
'height': int(mobj.group(2)),
'tbr': int(mobj.group(4)),
'width': int(mobj.group(1)),
})
formats.append(f)
self._sort_formats(formats)
thumbnails = []
for cut in (try_get(feed, lambda x: x['image']['cuts'], list) or []):
src = cut.get('src')
if not src:
continue
thumbnails.append({
'height': int_or_none(cut.get('height')),
'url': src,
'width': int_or_none(cut.get('width')),
})
language = (video.get('language') or 'EN').lower()
return {
'id': video_id,
'title': title,
'formats': formats,
'description': video.get('description'),
'duration': parse_duration(feed.get('duration')),
'thumbnails': thumbnails,
'timestamp': parse_iso8601(video.get(self._TIMESTAMP_KEY)),
'subtitles': self._extract_mlb_subtitles(feed, language),
}
class MLBIE(MLBBaseIE):
class MLBIE(NHLBaseIE):
_VALID_URL = r'''(?x)
https?://
(?:[\da-z_-]+\.)*mlb\.com/
(?:[\da-z_-]+\.)*(?P<site>mlb)\.com/
(?:
(?:
(?:[^/]+/)*video/[^/]+/c-|
(?:[^/]+/)*c-|
(?:
shared/video/embed/(?:embed|m-internal-embed)\.html|
(?:[^/]+/)+(?:play|index)\.jsp|
@@ -94,6 +18,7 @@ class MLBIE(MLBBaseIE):
(?P<id>\d+)
)
'''
_CONTENT_DOMAIN = 'content.mlb.com'
_TESTS = [
{
'url': 'https://www.mlb.com/mariners/video/ackleys-spectacular-catch/c-34698933',
@@ -151,6 +76,18 @@ class MLBIE(MLBBaseIE):
'thumbnail': r're:^https?://.*\.jpg$',
},
},
{
'url': 'https://www.mlb.com/news/blue-jays-kevin-pillar-goes-spidey-up-the-wall-to-rob-tim-beckham-of-a-homer/c-118550098',
'md5': 'e09e37b552351fddbf4d9e699c924d68',
'info_dict': {
'id': '75609783',
'ext': 'mp4',
'title': 'Must C: Pillar climbs for catch',
'description': '4/15/15: Blue Jays outfielder Kevin Pillar continues his defensive dominance by climbing the wall in left to rob Tim Beckham of a home run',
'timestamp': 1429139220,
'upload_date': '20150415',
}
},
{
'url': 'https://www.mlb.com/video/hargrove-homers-off-caldwell/c-1352023483?tid=67793694',
'only_matching': True,
@@ -176,92 +113,8 @@ class MLBIE(MLBBaseIE):
'url': 'http://mlb.mlb.com/shared/video/embed/m-internal-embed.html?content_id=75609783&property=mlb&autoplay=true&hashmode=false&siteSection=mlb/multimedia/article_118550098/article_embed&club=mlb',
'only_matching': True,
},
]
_TIMESTAMP_KEY = 'date'
@staticmethod
def _get_feed(video):
return video
@staticmethod
def _extract_mlb_subtitles(feed, language):
subtitles = {}
for keyword in (feed.get('keywordsAll') or []):
keyword_type = keyword.get('type')
if keyword_type and keyword_type.startswith('closed_captions_location_'):
cc_location = keyword.get('value')
if cc_location:
subtitles.setdefault(language, []).append({
'url': cc_location,
})
return subtitles
def _download_video_data(self, display_id):
return self._download_json(
'http://content.mlb.com/mlb/item/id/v1/%s/details/web-v1.json' % display_id,
display_id)
class MLBVideoIE(MLBBaseIE):
_VALID_URL = r'https?://(?:www\.)?mlb\.com/(?:[^/]+/)*video/(?P<id>[^/?&#]+)'
_TEST = {
'url': 'https://www.mlb.com/mariners/video/ackley-s-spectacular-catch-c34698933',
'md5': '632358dacfceec06bad823b83d21df2d',
'info_dict': {
'id': 'c04a8863-f569-42e6-9f87-992393657614',
'ext': 'mp4',
'title': "Ackley's spectacular catch",
'description': 'md5:7f5a981eb4f3cbc8daf2aeffa2215bf0',
'duration': 66,
'timestamp': 1405995000,
'upload_date': '20140722',
'thumbnail': r're:^https?://.+',
},
}
_TIMESTAMP_KEY = 'timestamp'
@classmethod
def suitable(cls, url):
return False if MLBIE.suitable(url) else super(MLBVideoIE, cls).suitable(url)
@staticmethod
def _get_feed(video):
return video['feeds'][0]
@staticmethod
def _extract_mlb_subtitles(feed, language):
subtitles = {}
for cc_location in (feed.get('closedCaptions') or []):
subtitles.setdefault(language, []).append({
'url': cc_location,
})
return subtitles
def _download_video_data(self, display_id):
# https://www.mlb.com/data-service/en/videos/[SLUG]
return self._download_json(
'https://fastball-gateway.mlb.com/graphql',
display_id, query={
'query': '''{
mediaPlayback(ids: "%s") {
description
feeds(types: CMS) {
closedCaptions
duration
image {
cuts {
width
height
src
}
}
playbacks {
name
url
}
}
id
timestamp
title
}
}''' % display_id,
})['data']['mediaPlayback'][0]
{
'url': 'https://www.mlb.com/cut4/carlos-gomez-borrowed-sunglasses-from-an-as-fan/c-278912842',
'only_matching': True,
},
]
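As a reference for the two playback regexes in `MLBBaseIE` above: the first recovers bitrate and geometry from the playback name, the second (plus fps) from the file URL. A quick illustration with a made-up name and URL in the same shapes:

import re

name = 'FLASH_2500K_1280X720'
mobj = re.search(r'_(\d+)K_(\d+)X(\d+)', name)
if mobj:
    tbr, width, height = map(int, mobj.groups())
    print(tbr, width, height)  # 2500 1280 720

url = 'http://example.com/clip_1280x720_59_4000K.mp4'
mobj = re.search(r'_(\d+)x(\d+)_(\d+)_(\d+)K\.mp4', url)
if mobj:
    width, height, fps, tbr = map(int, mobj.groups())
    print(width, height, fps, tbr)  # 1280 720 59 4000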

View File

@@ -14,7 +14,6 @@ from ..utils import (
fix_xml_ampersands,
float_or_none,
HEADRequest,
int_or_none,
RegexNotFoundError,
sanitized_Request,
strip_or_none,
@@ -177,22 +176,6 @@ class MTVServicesInfoExtractor(InfoExtractor):
raise ExtractorError('Could not find video title')
title = title.strip()
series = find_xpath_attr(
itemdoc, './/{http://search.yahoo.com/mrss/}category',
'scheme', 'urn:mtvn:franchise')
season = find_xpath_attr(
itemdoc, './/{http://search.yahoo.com/mrss/}category',
'scheme', 'urn:mtvn:seasonN')
episode = find_xpath_attr(
itemdoc, './/{http://search.yahoo.com/mrss/}category',
'scheme', 'urn:mtvn:episodeN')
series = series.text if series is not None else None
season = season.text if season is not None else None
episode = episode.text if episode is not None else None
if season and episode:
# episode number includes season, so remove it
episode = re.sub(r'^%s' % season, '', episode)
# This is a short id that's used in the webpage URLs
mtvn_id = None
mtvn_id_node = find_xpath_attr(itemdoc, './/{http://search.yahoo.com/mrss/}category',
@@ -218,9 +201,6 @@ class MTVServicesInfoExtractor(InfoExtractor):
'description': description,
'duration': float_or_none(content_el.attrib.get('duration')),
'timestamp': timestamp,
'series': series,
'season_number': int_or_none(season),
'episode_number': int_or_none(episode),
}
def _get_feed_query(self, uri):
@@ -340,7 +320,7 @@ class MTVServicesEmbeddedIE(MTVServicesInfoExtractor):
@staticmethod
def _extract_url(webpage):
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//media\.mtvnservices\.com/embed/.+?)\1', webpage)
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//media.mtvnservices.com/embed/.+?)\1', webpage)
if mobj:
return mobj.group('url')
@@ -503,152 +483,3 @@ class MTVDEIE(MTVServicesInfoExtractor):
'arcEp': 'mtv.de',
'mgid': uri,
}
class MTVItaliaIE(MTVServicesInfoExtractor):
IE_NAME = 'mtv.it'
_VALID_URL = r'https?://(?:www\.)?mtv\.it/(?:episodi|video|musica)/(?P<id>[0-9a-z]+)'
_TESTS = [{
'url': 'http://www.mtv.it/episodi/24bqab/mario-una-serie-di-maccio-capatonda-cavoli-amario-episodio-completo-S1-E1',
'info_dict': {
'id': '0f0fc78e-45fc-4cce-8f24-971c25477530',
'ext': 'mp4',
'title': 'Cavoli amario (episodio completo)',
'description': 'md5:4962bccea8fed5b7c03b295ae1340660',
'series': 'Mario - Una Serie Di Maccio Capatonda',
'season_number': 1,
'episode_number': 1,
},
'params': {
'skip_download': True,
},
}]
_GEO_COUNTRIES = ['IT']
_FEED_URL = 'http://feeds.mtvnservices.com/od/feed/intl-mrss-player-feed'
def _get_feed_query(self, uri):
return {
'arcEp': 'mtv.it',
'mgid': uri,
}
class MTVItaliaProgrammaIE(MTVItaliaIE):
IE_NAME = 'mtv.it:programma'
_VALID_URL = r'https?://(?:www\.)?mtv\.it/(?:programmi|playlist)/(?P<id>[0-9a-z]+)'
_TESTS = [{
# program page: general
'url': 'http://www.mtv.it/programmi/s2rppv/mario-una-serie-di-maccio-capatonda',
'info_dict': {
'id': 'a6f155bc-8220-4640-aa43-9b95f64ffa3d',
'title': 'Mario - Una Serie Di Maccio Capatonda',
'description': 'md5:72fbffe1f77ccf4e90757dd4e3216153',
},
'playlist_count': 2,
'params': {
'skip_download': True,
},
}, {
# program page: specific season
'url': 'http://www.mtv.it/programmi/d9ncjf/mario-una-serie-di-maccio-capatonda-S2',
'info_dict': {
'id': '4deeb5d8-f272-490c-bde2-ff8d261c6dd1',
'title': 'Mario - Una Serie Di Maccio Capatonda - Stagione 2',
},
'playlist_count': 34,
'params': {
'skip_download': True,
},
}, {
# playlist page + redirect
'url': 'http://www.mtv.it/playlist/sexy-videos/ilctal',
'info_dict': {
'id': 'dee8f9ee-756d-493b-bf37-16d1d2783359',
'title': 'Sexy Videos',
},
'playlist_mincount': 145,
'params': {
'skip_download': True,
},
}]
_GEO_COUNTRIES = ['IT']
_FEED_URL = 'http://www.mtv.it/feeds/triforce/manifest/v8'
def _get_entries(self, title, url):
while True:
pg = self._search_regex(r'/(\d+)$', url, 'entries', '1')
entries = self._download_json(url, title, 'page %s' % pg)
url = try_get(
entries, lambda x: x['result']['nextPageURL'], compat_str)
entries = try_get(
entries, (
lambda x: x['result']['data']['items'],
lambda x: x['result']['data']['seasons']),
list)
for entry in entries or []:
if entry.get('canonicalURL'):
yield self.url_result(entry['canonicalURL'])
if not url:
break
def _real_extract(self, url):
query = {'url': url}
info_url = update_url_query(self._FEED_URL, query)
video_id = self._match_id(url)
info = self._download_json(info_url, video_id).get('manifest')
redirect = try_get(
info, lambda x: x['newLocation']['url'], compat_str)
if redirect:
return self.url_result(redirect)
title = info.get('title')
video_id = try_get(
info, lambda x: x['reporting']['itemId'], compat_str)
parent_id = try_get(
info, lambda x: x['reporting']['parentId'], compat_str)
playlist_url = current_url = None
for z in (info.get('zones') or {}).values():
if z.get('moduleName') in ('INTL_M304', 'INTL_M209'):
info_url = z.get('feed')
if z.get('moduleName') in ('INTL_M308', 'INTL_M317'):
playlist_url = playlist_url or z.get('feed')
if z.get('moduleName') in ('INTL_M300',):
current_url = current_url or z.get('feed')
if not info_url:
raise ExtractorError('No info found')
if video_id == parent_id:
video_id = self._search_regex(
r'([^\/]+)/[^\/]+$', info_url, 'video_id')
info = self._download_json(info_url, video_id, 'Show infos')
info = try_get(info, lambda x: x['result']['data'], dict)
title = title or try_get(
info, (
lambda x: x['title'],
lambda x: x['headline']),
compat_str)
description = try_get(info, lambda x: x['content'], compat_str)
if current_url:
season = try_get(
self._download_json(playlist_url, video_id, 'Seasons info'),
lambda x: x['result']['data'], dict)
current = try_get(
season, lambda x: x['currentSeason'], compat_str)
seasons = try_get(
season, lambda x: x['seasons'], list) or []
if current in [s.get('eTitle') for s in seasons]:
playlist_url = current_url
title = re.sub(
r'[-|]\s*(?:mtv\s*italia|programma|playlist)',
'', title, flags=re.IGNORECASE).strip()
return self.playlist_result(
self._get_entries(title, playlist_url),
video_id, title, description)
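One detail worth spelling out from the removed MRSS block above: `urn:mtvn:episodeN` encodes the episode with the season prefixed (the comment's "episode number includes season"), hence the `re.sub` that strips it. In isolation:

import re

season, episode = '2', '204'
# '204' means season 2, episode 04; drop the leading season digits
episode = re.sub(r'^%s' % season, '', episode)
print(int(episode))  # 4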

View File

@@ -6,122 +6,98 @@ from .common import InfoExtractor
from ..utils import (
ExtractorError,
js_to_json,
qualities,
try_get,
url_or_none,
urljoin,
)
VALID_STREAMS = ('dash', )
class MxplayerIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?mxplayer\.in/(?:show|movie)/(?:(?P<display_id>[-/a-z0-9]+)-)?(?P<id>[a-z0-9]+)'
_TESTS = [{
_VALID_URL = r'https?://(?:www\.)?mxplayer\.in/movie/(?P<slug>[a-z0-9]+(?:-[a-z0-9]+)*)'
_TEST = {
'url': 'https://www.mxplayer.in/movie/watch-knock-knock-hindi-dubbed-movie-online-b9fa28df3bfb8758874735bbd7d2655a?watch=true',
'info_dict': {
'id': 'b9fa28df3bfb8758874735bbd7d2655a',
'ext': 'mp4',
'title': 'Knock Knock (Hindi Dubbed)',
'title': 'Knock Knock Movie | Watch 2015 Knock Knock Full Movie Online- MX Player',
'description': 'md5:b195ba93ff1987309cfa58e2839d2a5b'
},
'params': {
'skip_download': True,
'format': 'bestvideo'
}
}, {
'url': 'https://www.mxplayer.in/show/watch-shaitaan/season-1/the-infamous-taxi-gang-of-meerut-online-45055d5bcff169ad48f2ad7552a83d6c',
'info_dict': {
'id': '45055d5bcff169ad48f2ad7552a83d6c',
'ext': 'm3u8',
'title': 'The infamous taxi gang of Meerut',
'description': 'md5:033a0a7e3fd147be4fb7e07a01a3dc28',
'season': 'Season 1',
'series': 'Shaitaan'
},
'params': {
'skip_download': True,
}
}, {
'url': 'https://www.mxplayer.in/show/watch-aashram/chapter-1/duh-swapna-online-d445579792b0135598ba1bc9088a84cb',
'info_dict': {
'id': 'd445579792b0135598ba1bc9088a84cb',
'ext': 'mp4',
'title': 'Duh Swapna',
'description': 'md5:35ff39c4bdac403c53be1e16a04192d8',
'season': 'Chapter 1',
'series': 'Aashram'
},
'expected_warnings': ['Unknown MIME type application/mp4 in DASH manifest'],
'params': {
'skip_download': True,
'format': 'bestvideo'
}
}]
}
def _get_best_stream_url(self, stream):
best_stream = list(filter(None, [v for k, v in stream.items()]))
return best_stream.pop(0) if len(best_stream) else None
def _get_stream_urls(self, video_dict):
stream_provider_dict = try_get(
video_dict,
lambda x: x['stream'][x['stream']['provider']])
if not stream_provider_dict:
raise ExtractorError('No stream provider found', expected=True)
stream_dict = video_dict.get('stream', {'provider': {}})
stream_provider = stream_dict.get('provider')
for stream_name, stream in stream_provider_dict.items():
if stream_name in ('hls', 'dash', 'hlsUrl', 'dashUrl'):
stream_type = stream_name.replace('Url', '')
if isinstance(stream, dict):
for quality, stream_url in stream.items():
if stream_url:
yield stream_type, quality, stream_url
else:
yield stream_type, 'base', stream
if not stream_dict[stream_provider]:
message = 'No stream provider found'
raise ExtractorError('%s said: %s' % (self.IE_NAME, message), expected=True)
streams = []
for stream_name, v in stream_dict[stream_provider].items():
if stream_name in VALID_STREAMS:
stream_url = self._get_best_stream_url(v)
if stream_url is None:
continue
streams.append((stream_name, stream_url))
return streams
def _real_extract(self, url):
display_id, video_id = re.match(self._VALID_URL, url).groups()
mobj = re.match(self._VALID_URL, url)
video_slug = mobj.group('slug')
video_id = video_slug.split('-')[-1]
webpage = self._download_webpage(url, video_id)
source = self._parse_json(
js_to_json(self._html_search_regex(
r'(?s)<script>window\.state\s*[:=]\s(\{.+\})\n(\w+).*(</script>).*',
webpage, 'WindowState')),
video_id)
window_state_json = self._html_search_regex(
r'(?s)<script>window\.state\s*[:=]\s(\{.+\})\n(\w+).*(</script>).*',
webpage, 'WindowState')
source = self._parse_json(js_to_json(window_state_json), video_id)
if not source:
raise ExtractorError('Cannot find source', expected=True)
config_dict = source['config']
video_dict = source['entities'][video_id]
stream_urls = self._get_stream_urls(video_dict)
thumbnails = []
for i in video_dict.get('imageInfo') or []:
thumbnails.append({
'url': urljoin(config_dict['imageBaseUrl'], i['url']),
'width': i['width'],
'height': i['height'],
})
title = self._og_search_title(webpage, fatal=True, default=video_dict['title'])
formats = []
get_quality = qualities(['main', 'base', 'high'])
for stream_type, quality, stream_url in self._get_stream_urls(video_dict):
format_url = url_or_none(urljoin(config_dict['videoCdnBaseUrl'], stream_url))
if not format_url:
continue
if stream_type == 'dash':
dash_formats = self._extract_mpd_formats(
format_url, video_id, mpd_id='dash-%s' % quality, headers={'Referer': url})
for frmt in dash_formats:
frmt['quality'] = get_quality(quality)
formats.extend(dash_formats)
elif stream_type == 'hls':
formats.extend(self._extract_m3u8_formats(
format_url, video_id, fatal=False,
m3u8_id='hls-%s' % quality, quality=get_quality(quality)))
headers = {'Referer': url}
for stream_name, stream_url in stream_urls:
if stream_name == 'dash':
format_url = url_or_none(urljoin(config_dict['videoCdnBaseUrl'], stream_url))
if not format_url:
continue
formats.extend(self._extract_mpd_formats(
format_url, video_id, mpd_id='dash', headers=headers))
self._sort_formats(formats)
return {
info = {
'id': video_id,
'display_id': display_id.replace('/', '-'),
'title': video_dict['title'] or self._og_search_title(webpage),
'formats': formats,
'title': title,
'description': video_dict.get('description'),
'season': try_get(video_dict, lambda x: x['container']['title']),
'series': try_get(video_dict, lambda x: x['container']['container']['title']),
'thumbnails': thumbnails,
'formats': formats
}
if video_dict.get('imageInfo'):
info['thumbnails'] = list(map(lambda i: dict(i, **{
'url': urljoin(config_dict['imageBaseUrl'], i['url'])
}), video_dict['imageInfo']))
if video_dict.get('webUrl'):
last_part = video_dict['webUrl'].split("/")[-1]
info['display_id'] = last_part.replace(video_id, "").rstrip("-")
return info
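The removed `_get_stream_urls` generator above flattens the provider dict into `(type, quality, url)` triples, handling both per-quality sub-dicts and bare URL strings. Its traversal on a toy provider dict (field names mimic the real payload, values are invented; output order assumes Python 3.7+ dict ordering):

def get_stream_urls(video_dict):
    provider = video_dict['stream'][video_dict['stream']['provider']]
    for stream_name, stream in provider.items():
        if stream_name in ('hls', 'dash', 'hlsUrl', 'dashUrl'):
            stream_type = stream_name.replace('Url', '')
            if isinstance(stream, dict):
                for quality, stream_url in stream.items():
                    if stream_url:  # skip empty quality slots
                        yield stream_type, quality, stream_url
            else:
                yield stream_type, 'base', stream  # bare string: single URL

sample = {'stream': {
    'provider': 'cdn',
    'cdn': {'hls': {'high': 'h/high.m3u8', 'main': ''}, 'dashUrl': 'd/base.mpd'},
}}
print(list(get_stream_urls(sample)))
# [('hls', 'high', 'h/high.m3u8'), ('dash', 'base', 'd/base.mpd')]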

View File

@@ -10,7 +10,6 @@ from .adobepass import AdobePassIE
from ..compat import compat_urllib_parse_unquote
from ..utils import (
int_or_none,
parse_age_limit,
parse_duration,
smuggle_url,
try_get,
@@ -19,7 +18,7 @@ from ..utils import (
)
class NBCIE(ThePlatformIE):
class NBCIE(AdobePassIE):
_VALID_URL = r'https?(?P<permalink>://(?:www\.)?nbc\.com/(?:classic-tv/)?[^/]+/video/[^/]+/(?P<id>n?\d+))'
_TESTS = [
@@ -133,9 +132,7 @@ class NBCIE(ThePlatformIE):
'manifest': 'm3u',
}
video_id = video_data['mpxGuid']
tp_path = 'NnzsPC/media/guid/%s/%s' % (video_data.get('mpxAccountId') or '2410887629', video_id)
tpm = self._download_theplatform_metadata(tp_path, video_id)
title = tpm.get('title') or video_data.get('secondaryTitle')
title = video_data['secondaryTitle']
if video_data.get('locked'):
resource = self._get_mvpd_resource(
video_data.get('resourceId') or 'nbcentertainment',
@@ -145,40 +142,18 @@ class NBCIE(ThePlatformIE):
theplatform_url = smuggle_url(update_url_query(
'http://link.theplatform.com/s/NnzsPC/media/guid/%s/%s' % (video_data.get('mpxAccountId') or '2410887629', video_id),
query), {'force_smil_url': True})
# Empty string or 0 can be valid values for these. So the check must be `is None`
description = video_data.get('description')
if description is None:
description = tpm.get('description')
episode_number = int_or_none(video_data.get('episodeNumber'))
if episode_number is None:
episode_number = int_or_none(tpm.get('nbcu$airOrder'))
rating = video_data.get('rating')
if rating is None:
try_get(tpm, lambda x: x['ratings'][0]['rating'])
season_number = int_or_none(video_data.get('seasonNumber'))
if season_number is None:
season_number = int_or_none(tpm.get('nbcu$seasonNumber'))
series = video_data.get('seriesShortTitle')
if series is None:
series = tpm.get('nbcu$seriesShortTitle')
tags = video_data.get('keywords')
if tags is None or len(tags) == 0:
tags = tpm.get('keywords')
return {
'_type': 'url_transparent',
'age_limit': parse_age_limit(rating),
'description': description,
'episode': title,
'episode_number': episode_number,
'id': video_id,
'ie_key': 'ThePlatform',
'season_number': season_number,
'series': series,
'tags': tags,
'title': title,
'url': theplatform_url,
'description': video_data.get('description'),
'tags': video_data.get('keywords'),
'season_number': int_or_none(video_data.get('seasonNumber')),
'episode_number': int_or_none(video_data.get('episodeNumber')),
'episode': title,
'series': video_data.get('seriesShortTitle'),
'ie_key': 'ThePlatform',
}
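The `is None` checks in the removed branch above are deliberate, per its comment: empty string and 0 are valid values here, and a plain `or` fallback would clobber them. The difference in two lines, with invented sample data:

video_data = {'description': '', 'episodeNumber': 0}
tpm = {'description': 'from theplatform', 'nbcu$airOrder': 5}

print(video_data.get('description') or tpm.get('description'))  # 'from theplatform'

description = video_data.get('description')
if description is None:  # falls back only when the key is truly missing
    description = tpm.get('description')
print(repr(description))  # ''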

View File

@@ -8,7 +8,6 @@ import datetime
from .common import InfoExtractor
from ..postprocessor.ffmpeg import FFmpegPostProcessor
from ..compat import (
compat_str,
compat_parse_qs,
compat_urllib_parse_urlparse,
)
@@ -21,7 +20,6 @@ from ..utils import (
parse_duration,
parse_iso8601,
PostProcessingError,
str_or_none,
remove_start,
try_get,
unified_timestamp,
@@ -36,7 +34,7 @@ class NiconicoIE(InfoExtractor):
_TESTS = [{
'url': 'http://www.nicovideo.jp/watch/sm22312215',
'md5': 'a5bad06f1347452102953f323c69da34s',
'md5': 'd1a75c0823e2f629128c43e1212760f9',
'info_dict': {
'id': 'sm22312215',
'ext': 'mp4',
@@ -205,7 +203,7 @@ class NiconicoIE(InfoExtractor):
'data-api-data="([^"]+)"', webpage,
'API data', default='{}'), video_id)
session_api_data = try_get(api_data, lambda x: x['media']['delivery']['movie']['session'])
session_api_data = try_get(api_data, lambda x: x['video']['dmcInfo']['session_api'])
session_api_endpoint = try_get(session_api_data, lambda x: x['urls'][0])
# ping
@@ -222,7 +220,7 @@ class NiconicoIE(InfoExtractor):
yesno = lambda x: 'yes' if x else 'no'
# m3u8 (encryption)
if 'encryption' in (try_get(api_data, lambda x: x['media']['delivery']['movie']) or {}):
if 'encryption' in try_get(api_data, lambda x: x['video']['dmcInfo']) or {}:
protocol = 'm3u8'
session_api_http_parameters = {
'parameters': {
@@ -246,8 +244,8 @@ class NiconicoIE(InfoExtractor):
session_api_http_parameters = {
'parameters': {
'http_output_download_parameters': {
'use_ssl': yesno(session_api_endpoint['isSsl']),
'use_well_known_port': yesno(session_api_endpoint['isWellKnownPort']),
'use_ssl': yesno(session_api_endpoint['is_ssl']),
'use_well_known_port': yesno(session_api_endpoint['is_well_known_port']),
}
}
}
@@ -260,15 +258,15 @@ class NiconicoIE(InfoExtractor):
data=json.dumps({
'session': {
'client_info': {
'player_id': session_api_data.get('playerId'),
'player_id': session_api_data.get('player_id'),
},
'content_auth': {
'auth_type': try_get(session_api_data, lambda x: x['authTypes'][session_api_data['protocols'][0]]),
'content_key_timeout': session_api_data.get('contentKeyTimeout'),
'auth_type': try_get(session_api_data, lambda x: x['auth_types'][session_api_data['protocols'][0]]),
'content_key_timeout': session_api_data.get('content_key_timeout'),
'service_id': 'nicovideo',
'service_user_id': session_api_data.get('serviceUserId')
'service_user_id': session_api_data.get('service_user_id')
},
'content_id': session_api_data.get('contentId'),
'content_id': session_api_data.get('content_id'),
'content_src_id_sets': [{
'content_src_ids': [{
'src_id_to_mux': {
@@ -281,7 +279,7 @@ class NiconicoIE(InfoExtractor):
'content_uri': '',
'keep_method': {
'heartbeat': {
'lifetime': session_api_data.get('heartbeatLifetime')
'lifetime': session_api_data.get('heartbeat_lifetime')
}
},
'priority': session_api_data.get('priority'),
@@ -291,7 +289,7 @@ class NiconicoIE(InfoExtractor):
'http_parameters': session_api_http_parameters
}
},
'recipe_id': session_api_data.get('recipeId'),
'recipe_id': session_api_data.get('recipe_id'),
'session_operation_auth': {
'session_operation_auth_by_signature': {
'signature': session_api_data.get('signature'),
@@ -310,7 +308,7 @@ class NiconicoIE(InfoExtractor):
'url': session_api_endpoint['url'] + '/' + session_response['data']['session']['id'] + '?_format=json&_method=PUT',
'data': json.dumps(session_response['data']),
# interval: convert milliseconds to seconds, then halve it to leave a buffer.
'interval': float_or_none(session_api_data.get('heartbeatLifetime'), scale=2000),
'interval': float_or_none(session_api_data.get('heartbeat_lifetime'), scale=2000),
}
return info_dict, heartbeat_info_dict
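For context on the heartbeat dict above: the DMC session expires unless it is periodically re-sent, and `scale=2000` converts the millisecond `heartbeatLifetime` into half its value in seconds so each ping lands well before expiry (the URL already carries `_method=PUT`, so a plain POST suffices). A rough sketch of a consumer loop, assuming only the `url`, `data` and `interval` keys built above:

import time
import urllib.request

def run_heartbeat(heartbeat, pings):
    for _ in range(pings):
        req = urllib.request.Request(
            heartbeat['url'], data=heartbeat['data'].encode())
        urllib.request.urlopen(req)  # keep-alive; response body is ignored
        time.sleep(heartbeat['interval'])

print(120000 / 2000)  # 60.0 -- a 120000 ms lifetime becomes a 60 s ping interval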
@@ -329,17 +327,15 @@ class NiconicoIE(InfoExtractor):
format_id = '-'.join(map(lambda s: remove_start(s['id'], 'archive_'), [video_quality, audio_quality]))
vdict = parse_format_id(video_quality['id'])
adict = parse_format_id(audio_quality['id'])
resolution = try_get(video_quality, lambda x: x['metadata']['resolution'], dict) or {'height': vdict.get('res')}
vbr = try_get(video_quality, lambda x: x['metadata']['bitrate'], float)
resolution = video_quality.get('resolution', {'height': vdict.get('res')})
return {
'url': '%s:%s/%s/%s' % (protocol, video_id, video_quality['id'], audio_quality['id']),
'format_id': format_id,
'format_note': 'DMC %s' % try_get(video_quality, lambda x: x['metadata']['label'], compat_str),
'ext': 'mp4', # Session API are used in HTML5, which always serves mp4
'vcodec': vdict.get('codec'),
'acodec': adict.get('codec'),
'vbr': float_or_none(vbr, 1000) or float_or_none(vdict.get('br')),
'vbr': float_or_none(video_quality.get('bitrate'), 1000) or float_or_none(vdict.get('br')),
'abr': float_or_none(audio_quality.get('bitrate'), 1000) or float_or_none(adict.get('br')),
'height': int_or_none(resolution.get('height', vdict.get('res'))),
'width': int_or_none(resolution.get('width')),
@@ -398,93 +394,92 @@ class NiconicoIE(InfoExtractor):
formats = []
# Get HTML5 videos info
quality_info = try_get(api_data, lambda x: x['media']['delivery']['movie'])
if not quality_info:
raise ExtractorError('The video can\'t be downloaded.', expected=True)
try:
dmc_info = api_data['video']['dmcInfo']
except KeyError:
raise ExtractorError('The video can\'t be downloaded.',
expected=True)
quality_info = dmc_info.get('quality')
for audio_quality in quality_info.get('audios') or {}:
for video_quality in quality_info.get('videos') or {}:
if not audio_quality.get('isAvailable') or not video_quality.get('isAvailable'):
if not audio_quality.get('available') or not video_quality.get('available'):
continue
formats.append(self._extract_format_for_quality(
api_data, video_id, audio_quality, video_quality))
# Get flv/swf info
timestamp = None
video_real_url = try_get(api_data, lambda x: x['video']['smileInfo']['url'])
if not video_real_url:
self.report_warning('Unable to obtain smile video information')
else:
is_economy = video_real_url.endswith('low')
is_economy = video_real_url.endswith('low')
if is_economy:
self.report_warning('Site is currently in economy mode! You will only have access to lower quality streams')
if is_economy:
self.report_warning('Site is currently in economy mode! You will only have access to lower quality streams')
# Invoking ffprobe to determine resolution
pp = FFmpegPostProcessor(self._downloader)
cookies = self._get_cookies('https://nicovideo.jp').output(header='', sep='; path=/; domain=nicovideo.jp;\n')
# Invoking ffprobe to determine resolution
pp = FFmpegPostProcessor(self._downloader)
cookies = self._get_cookies('https://nicovideo.jp').output(header='', sep='; path=/; domain=nicovideo.jp;\n')
self.to_screen('%s: %s' % (video_id, 'Checking smile format with ffprobe'))
self.to_screen('%s: %s' % (video_id, 'Checking smile format with ffprobe'))
try:
metadata = pp.get_metadata_object(video_real_url, ['-cookies', cookies])
except PostProcessingError as err:
raise ExtractorError(err.msg, expected=True)
try:
metadata = pp.get_metadata_object(video_real_url, ['-cookies', cookies])
except PostProcessingError as err:
raise ExtractorError(err.msg, expected=True)
v_stream = a_stream = {}
v_stream = a_stream = {}
# Some complex swf files don't have a video stream (e.g. nm4809023)
for stream in metadata['streams']:
if stream['codec_type'] == 'video':
v_stream = stream
elif stream['codec_type'] == 'audio':
a_stream = stream
# Some complex swf files don't have a video stream (e.g. nm4809023)
for stream in metadata['streams']:
if stream['codec_type'] == 'video':
v_stream = stream
elif stream['codec_type'] == 'audio':
a_stream = stream
# Community restricted videos seem to have issues with the thumb API not returning anything at all
filesize = int(
(get_video_info_xml('size_high') if not is_economy else get_video_info_xml('size_low'))
or metadata['format']['size']
)
extension = (
get_video_info_xml('movie_type')
or 'mp4' if 'mp4' in metadata['format']['format_name'] else metadata['format']['format_name']
)
# Community restricted videos seem to have issues with the thumb API not returning anything at all
filesize = int(
(get_video_info_xml('size_high') if not is_economy else get_video_info_xml('size_low'))
or metadata['format']['size']
)
extension = (
get_video_info_xml('movie_type')
or 'mp4' if 'mp4' in metadata['format']['format_name'] else metadata['format']['format_name']
)
# 'creation_time' tag on video stream of re-encoded SMILEVIDEO mp4 files are '1970-01-01T00:00:00.000000Z'.
timestamp = (
parse_iso8601(get_video_info_web('first_retrieve'))
or unified_timestamp(get_video_info_web('postedDateTime'))
)
metadata_timestamp = (
parse_iso8601(try_get(v_stream, lambda x: x['tags']['creation_time']))
or timestamp if extension != 'mp4' else 0
)
# 'creation_time' tag on video stream of re-encoded SMILEVIDEO mp4 files are '1970-01-01T00:00:00.000000Z'.
timestamp = (
parse_iso8601(get_video_info_web('first_retrieve'))
or unified_timestamp(get_video_info_web('postedDateTime'))
)
metadata_timestamp = (
parse_iso8601(try_get(v_stream, lambda x: x['tags']['creation_time']))
or timestamp if extension != 'mp4' else 0
)
# According to compconf, smile videos from pre-2017 are always better quality than their DMC counterparts
smile_threshold_timestamp = parse_iso8601('2016-12-08T00:00:00+09:00')
# According to compconf, smile videos from pre-2017 are always better quality than their DMC counterparts
smile_threshold_timestamp = parse_iso8601('2016-12-08T00:00:00+09:00')
is_source = timestamp < smile_threshold_timestamp or metadata_timestamp > 0
is_source = timestamp < smile_threshold_timestamp or metadata_timestamp > 0
# If movie file size is unstable, old server movie is not source movie.
if filesize > 1:
formats.append({
'url': video_real_url,
'format_id': 'smile' if not is_economy else 'smile_low',
'format_note': 'SMILEVIDEO source' if not is_economy else 'SMILEVIDEO low quality',
'ext': extension,
'container': extension,
'vcodec': v_stream.get('codec_name'),
'acodec': a_stream.get('codec_name'),
# Some complex swf files don't have total bit rate metadata (e.g. nm6049209)
'tbr': int_or_none(metadata['format'].get('bit_rate'), scale=1000),
'vbr': int_or_none(v_stream.get('bit_rate'), scale=1000),
'abr': int_or_none(a_stream.get('bit_rate'), scale=1000),
'height': int_or_none(v_stream.get('height')),
'width': int_or_none(v_stream.get('width')),
'source_preference': 5 if not is_economy else -2,
'quality': 5 if is_source and not is_economy else None,
'filesize': filesize
})
# If movie file size is unstable, old server movie is not source movie.
if filesize > 1:
formats.append({
'url': video_real_url,
'format_id': 'smile' if not is_economy else 'smile_low',
'format_note': 'SMILEVIDEO source' if not is_economy else 'SMILEVIDEO low quality',
'ext': extension,
'container': extension,
'vcodec': v_stream.get('codec_name'),
'acodec': a_stream.get('codec_name'),
# Some complex swf files don't have total bit rate metadata (e.g. nm6049209)
'tbr': int_or_none(metadata['format'].get('bit_rate'), scale=1000),
'vbr': int_or_none(v_stream.get('bit_rate'), scale=1000),
'abr': int_or_none(a_stream.get('bit_rate'), scale=1000),
'height': int_or_none(v_stream.get('height')),
'width': int_or_none(v_stream.get('width')),
'source_preference': 5 if not is_economy else -2,
'quality': 5 if is_source and not is_economy else None,
'filesize': filesize
})
if len(formats) == 0:
raise ExtractorError('Unable to find video info.')
@@ -492,12 +487,13 @@ class NiconicoIE(InfoExtractor):
self._sort_formats(formats)
# Start extracting information
title = (
get_video_info_web(['originalTitle', 'title'])
or self._og_search_title(webpage, default=None)
or self._html_search_regex(
title = get_video_info_web('originalTitle')
if not title:
title = self._og_search_title(webpage, default=None)
if not title:
title = self._html_search_regex(
r'<span[^>]+class="videoHeaderTitle"[^>]*>([^<]+)</span>',
webpage, 'video title'))
webpage, 'video title')
watch_api_data_string = self._html_search_regex(
r'<div[^>]+id="watchAPIDataContainer"[^>]+>([^<]+)</div>',
@@ -521,7 +517,6 @@ class NiconicoIE(InfoExtractor):
timestamp = parse_iso8601(
video_detail['postedAt'].replace('/', '-'),
delimiter=' ', timezone=datetime.timedelta(hours=9))
timestamp = timestamp or try_get(api_data, lambda x: parse_iso8601(x['video']['registeredAt']))
view_count = int_or_none(get_video_info_web(['view_counter', 'viewCount']))
if not view_count:
@@ -530,16 +525,11 @@ class NiconicoIE(InfoExtractor):
webpage, 'view count', default=None)
if match:
view_count = int_or_none(match.replace(',', ''))
view_count = (
view_count
or video_detail.get('viewCount')
or try_get(api_data, lambda x: x['video']['count']['view']))
comment_count = (
int_or_none(get_video_info_web('comment_num'))
or video_detail.get('commentCount')
or try_get(api_data, lambda x: x['video']['count']['comment']))
view_count = view_count or video_detail.get('viewCount')
comment_count = (int_or_none(get_video_info_web('comment_num'))
or video_detail.get('commentCount')
or try_get(api_data, lambda x: x['thread']['commentCount']))
if not comment_count:
match = self._html_search_regex(
r'>Comments: <strong[^>]*>([^<]+)</strong>',
@@ -569,7 +559,7 @@ class NiconicoIE(InfoExtractor):
# Note: cannot use api_data.get('owner', {}) because owner may be set to "null"
# in the JSON, which will cause None to be returned instead of {}.
owner = try_get(api_data, lambda x: x.get('owner'), dict) or {}
uploader_id = str_or_none(
uploader_id = (
get_video_info_web(['ch_id', 'user_id'])
or owner.get('id')
or channel_id
@@ -599,7 +589,7 @@ class NiconicoIE(InfoExtractor):
class NiconicoPlaylistIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?nicovideo\.jp/(?:user/\d+/|my/)?mylist/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?nicovideo\.jp/(?:user/\d+/)?mylist/(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.nicovideo.jp/mylist/27411728',
@@ -657,40 +647,3 @@ class NiconicoPlaylistIE(InfoExtractor):
'uploader_id': uploader_id,
'entries': OnDemandPagedList(pagefunc, 25),
}
class NiconicoUserIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?nicovideo\.jp/user/(?P<id>\d+)/?(?:$|[#?])'
_TEST = {
'url': 'https://www.nicovideo.jp/user/419948',
'info_dict': {
'id': '419948',
},
'playlist_mincount': 101,
}
_API_URL = "https://nvapi.nicovideo.jp/v1/users/%s/videos?sortKey=registeredAt&sortOrder=desc&pageSize=%s&page=%s"
_api_headers = {
'X-Frontend-ID': '6',
'X-Frontend-Version': '0',
'X-Niconico-Language': 'en-us'
}
_PAGE_SIZE = 100
def _entries(self, list_id):
total_count = 1
count = page_num = 0
while count < total_count:
json_parsed = self._download_json(
self._API_URL % (list_id, self._PAGE_SIZE, page_num + 1), list_id,
headers=self._api_headers,
note='Downloading JSON metadata%s' % (' page %d' % page_num if page_num else ''))
if not page_num:
total_count = int_or_none(json_parsed['data'].get('totalCount'))
for entry in json_parsed["data"]["items"]:
count += 1
yield self.url_result('https://www.nicovideo.jp/watch/%s' % entry['id'])
page_num += 1
def _real_extract(self, url):
list_id = self._match_id(url)
return self.playlist_result(self._entries(list_id), list_id, ie=NiconicoIE.ie_key())
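Unlike the empty-page probing elsewhere, the removed `_entries` above trusts the `totalCount` reported by the first page and iterates until the running count reaches it. The control flow in isolation, with a fake pager standing in for the nvapi call:

def fake_api(page_size, page_num):
    total = 7
    start = (page_num - 1) * page_size
    items = ['sm%d' % i for i in range(start, min(start + page_size, total))]
    return {'data': {'totalCount': total, 'items': items}}

def entries(page_size):
    total_count = 1  # placeholder until the first page reports the real total
    count = page_num = 0
    while count < total_count:
        parsed = fake_api(page_size, page_num + 1)
        if not page_num:
            total_count = parsed['data']['totalCount']
        for entry in parsed['data']['items']:
            count += 1
            yield entry
        page_num += 1

print(list(entries(3)))  # ['sm0', 'sm1', 'sm2', 'sm3', 'sm4', 'sm5', 'sm6']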

View File

@@ -1,148 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
str_or_none,
try_get,
)
class PalcoMP3BaseIE(InfoExtractor):
_GQL_QUERY_TMPL = '''{
artist(slug: "%s") {
%s
}
}'''
_ARTIST_FIELDS_TMPL = '''music(slug: "%%s") {
%s
}'''
_MUSIC_FIELDS = '''duration
hls
mp3File
musicID
plays
title'''
def _call_api(self, artist_slug, artist_fields):
return self._download_json(
'https://www.palcomp3.com.br/graphql/', artist_slug, query={
'query': self._GQL_QUERY_TMPL % (artist_slug, artist_fields),
})['data']
def _parse_music(self, music):
music_id = compat_str(music['musicID'])
title = music['title']
formats = []
hls_url = music.get('hls')
if hls_url:
formats.append({
'url': hls_url,
'protocol': 'm3u8_native',
'ext': 'mp4',
})
mp3_file = music.get('mp3File')
if mp3_file:
formats.append({
'url': mp3_file,
})
return {
'id': music_id,
'title': title,
'formats': formats,
'duration': int_or_none(music.get('duration')),
'view_count': int_or_none(music.get('plays')),
}
def _real_initialize(self):
self._ARTIST_FIELDS_TMPL = self._ARTIST_FIELDS_TMPL % self._MUSIC_FIELDS
def _real_extract(self, url):
artist_slug, music_slug = re.match(self._VALID_URL, url).groups()
artist_fields = self._ARTIST_FIELDS_TMPL % music_slug
music = self._call_api(artist_slug, artist_fields)['artist']['music']
return self._parse_music(music)
class PalcoMP3IE(PalcoMP3BaseIE):
IE_NAME = 'PalcoMP3:song'
_VALID_URL = r'https?://(?:www\.)?palcomp3\.com(?:\.br)?/(?P<artist>[^/]+)/(?P<id>[^/?&#]+)'
_TESTS = [{
'url': 'https://www.palcomp3.com/maiaraemaraisaoficial/nossas-composicoes-cuida-bem-dela/',
'md5': '99fd6405b2d8fd589670f6db1ba3b358',
'info_dict': {
'id': '3162927',
'ext': 'mp3',
'title': 'Nossas Composições - CUIDA BEM DELA',
'duration': 210,
'view_count': int,
}
}]
@classmethod
def suitable(cls, url):
return False if PalcoMP3VideoIE.suitable(url) else super(PalcoMP3IE, cls).suitable(url)
class PalcoMP3ArtistIE(PalcoMP3BaseIE):
IE_NAME = 'PalcoMP3:artist'
_VALID_URL = r'https?://(?:www\.)?palcomp3\.com(?:\.br)?/(?P<id>[^/?&#]+)'
_TESTS = [{
'url': 'https://www.palcomp3.com.br/condedoforro/',
'info_dict': {
'id': '358396',
'title': 'Conde do Forró',
},
'playlist_mincount': 188,
}]
_ARTIST_FIELDS_TMPL = '''artistID
musics {
nodes {
%s
}
}
name'''
@classmethod
def suitable(cls, url):
return False if re.match(PalcoMP3IE._VALID_URL, url) else super(PalcoMP3ArtistIE, cls).suitable(url)
def _real_extract(self, url):
artist_slug = self._match_id(url)
artist = self._call_api(artist_slug, self._ARTIST_FIELDS_TMPL)['artist']
def entries():
for music in (try_get(artist, lambda x: x['musics']['nodes'], list) or []):
yield self._parse_music(music)
return self.playlist_result(
entries(), str_or_none(artist.get('artistID')), artist.get('name'))
class PalcoMP3VideoIE(PalcoMP3BaseIE):
IE_NAME = 'PalcoMP3:video'
_VALID_URL = r'https?://(?:www\.)?palcomp3\.com(?:\.br)?/(?P<artist>[^/]+)/(?P<id>[^/?&#]+)/?#clipe'
_TESTS = [{
'url': 'https://www.palcomp3.com/maiaraemaraisaoficial/maiara-e-maraisa-voce-faz-falta-aqui-ao-vivo-em-vicosa-mg/#clipe',
'add_ie': ['Youtube'],
'info_dict': {
'id': '_pD1nR2qqPg',
'ext': 'mp4',
'title': 'Maiara e Maraisa - Você Faz Falta Aqui - DVD Ao Vivo Em Campo Grande',
'description': 'md5:7043342c09a224598e93546e98e49282',
'upload_date': '20161107',
'uploader_id': 'maiaramaraisaoficial',
'uploader': 'Maiara e Maraisa',
}
}]
_MUSIC_FIELDS = 'youtubeID'
def _parse_music(self, music):
youtube_id = music['youtubeID']
return self.url_result(youtube_id, 'Youtube', youtube_id)
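A note on the query assembly in `PalcoMP3BaseIE` above: the document is built in two `%` substitution passes, and the `%%s` in `_ARTIST_FIELDS_TMPL` is what survives the first pass as a plain `%s` for the music slug. Step by step, with shortened field lists (slugs taken from the test URLs above):

GQL_QUERY_TMPL = '''{
  artist(slug: "%s") {
    %s
  }
}'''
ARTIST_FIELDS_TMPL = '''music(slug: "%%s") {
  %s
}'''
MUSIC_FIELDS = 'title'

artist_fields_tmpl = ARTIST_FIELDS_TMPL % MUSIC_FIELDS  # %%s -> %s, fields baked in
artist_fields = artist_fields_tmpl % 'nossas-composicoes-cuida-bem-dela'
print(GQL_QUERY_TMPL % ('maiaraemaraisaoficial', artist_fields))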

View File

@@ -599,13 +599,11 @@ class PeerTubeIE(InfoExtractor):
else:
age_limit = None
webpage_url = 'https://%s/videos/watch/%s' % (host, video_id)
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': urljoin(webpage_url, video.get('thumbnailPath')),
'thumbnail': urljoin(url, video.get('thumbnailPath')),
'timestamp': unified_timestamp(video.get('publishedAt')),
'uploader': account_data('displayName', compat_str),
'uploader_id': str_or_none(account_data('id', int)),
@@ -623,6 +621,5 @@ class PeerTubeIE(InfoExtractor):
'tags': try_get(video, lambda x: x['tags'], list),
'categories': categories,
'formats': formats,
'subtitles': subtitles,
'webpage_url': webpage_url,
'subtitles': subtitles
}

View File

@@ -1,15 +1,22 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import time
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
ExtractorError,
js_to_json,
try_get,
update_url_query,
urlencode_postdata,
)
class PicartoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www.)?picarto\.tv/(?P<id>[a-zA-Z0-9]+)'
_VALID_URL = r'https?://(?:www.)?picarto\.tv/(?P<id>[a-zA-Z0-9]+)(?:/(?P<token>[a-zA-Z0-9]+))?'
_TEST = {
'url': 'https://picarto.tv/Setz',
'info_dict': {
@@ -27,46 +34,65 @@ class PicartoIE(InfoExtractor):
return False if PicartoVodIE.suitable(url) else super(PicartoIE, cls).suitable(url)
def _real_extract(self, url):
channel_id = self._match_id(url)
mobj = re.match(self._VALID_URL, url)
channel_id = mobj.group('id')
data = self._download_json(
'https://ptvintern.picarto.tv/ptvapi', channel_id, query={
'query': '''{
channel(name: "%s") {
adult
id
online
stream_name
title
}
getLoadBalancerUrl(channel_name: "%s") {
url
}
}''' % (channel_id, channel_id),
})['data']
metadata = data['channel']
metadata = self._download_json(
'https://api.picarto.tv/v1/channel/name/' + channel_id,
channel_id)
if metadata.get('online') == 0:
if metadata.get('online') is False:
raise ExtractorError('Stream is offline', expected=True)
title = metadata['title']
cdn_data = self._download_json(
data['getLoadBalancerUrl']['url'] + '/stream/json_' + metadata['stream_name'] + '.js',
channel_id, 'Downloading load balancing info')
'https://picarto.tv/process/channel', channel_id,
data=urlencode_postdata({'loadbalancinginfo': channel_id}),
note='Downloading load balancing info')
token = mobj.group('token') or 'public'
params = {
'con': int(time.time() * 1000),
'token': token,
}
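# 'con' looks like a millisecond client timestamp and 'token' falls back to
# 'public' for unauthenticated viewers (a reading of the code above, not a
# documented API); both get appended to every stream URL built below.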
prefered_edge = cdn_data.get('preferedEdge')
formats = []
for source in (cdn_data.get('source') or []):
source_url = source.get('url')
if not source_url:
for edge in cdn_data['edges']:
edge_ep = edge.get('ep')
if not edge_ep or not isinstance(edge_ep, compat_str):
continue
source_type = source.get('type')
if source_type == 'html5/application/vnd.apple.mpegurl':
formats.extend(self._extract_m3u8_formats(
source_url, channel_id, 'mp4', m3u8_id='hls', fatal=False))
elif source_type == 'html5/video/mp4':
formats.append({
'url': source_url,
})
edge_id = edge.get('id')
for tech in cdn_data['techs']:
tech_label = tech.get('label')
tech_type = tech.get('type')
preference = 0
if edge_id == prefered_edge:
preference += 1
format_id = []
if edge_id:
format_id.append(edge_id)
if tech_type == 'application/x-mpegurl' or tech_label == 'HLS':
format_id.append('hls')
formats.extend(self._extract_m3u8_formats(
update_url_query(
'https://%s/hls/%s/index.m3u8'
% (edge_ep, channel_id), params),
channel_id, 'mp4', quality=preference,
m3u8_id='-'.join(format_id), fatal=False))
continue
elif tech_type == 'video/mp4' or tech_label == 'MP4':
format_id.append('mp4')
formats.append({
'url': update_url_query(
'https://%s/mp4/%s.mp4' % (edge_ep, channel_id),
params),
'format_id': '-'.join(format_id),
'quality': preference,
})
else:
# rtmp format does not seem to work
continue
self._sort_formats(formats)
mature = metadata.get('adult')
@@ -77,10 +103,10 @@ class PicartoIE(InfoExtractor):
return {
'id': channel_id,
'title': self._live_title(title.strip()),
'title': self._live_title(metadata.get('title') or channel_id),
'is_live': True,
'thumbnail': try_get(metadata, lambda x: x['thumbnails']['web']),
'channel': channel_id,
'channel_id': metadata.get('id'),
'channel_url': 'https://picarto.tv/%s' % channel_id,
'age_limit': age_limit,
'formats': formats,

View File

@@ -31,7 +31,6 @@ class PinterestBaseIE(InfoExtractor):
title = (data.get('title') or data.get('grid_title') or video_id).strip()
urls = []
formats = []
duration = None
if extract_formats:
@@ -39,9 +38,8 @@ class PinterestBaseIE(InfoExtractor):
if not isinstance(format_dict, dict):
continue
format_url = url_or_none(format_dict.get('url'))
if not format_url or format_url in urls:
if not format_url:
continue
urls.append(format_url)
duration = float_or_none(format_dict.get('duration'), scale=1000)
ext = determine_ext(format_url)
if 'hls' in format_id.lower() or ext == 'm3u8':

View File

@@ -1,164 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import uuid
from .common import InfoExtractor
from ..compat import (
compat_str,
compat_urlparse,
)
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
try_get,
url_or_none,
)
class PlutoTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?pluto\.tv/on-demand/(?P<video_type>movies|series)/(?P<slug>.*)/?$'
_INFO_URL = 'https://service-vod.clusters.pluto.tv/v3/vod/slugs/'
_INFO_QUERY_PARAMS = {
'appName': 'web',
'appVersion': 'na',
'clientID': compat_str(uuid.uuid1()),
'clientModelNumber': 'na',
'serverSideAds': 'false',
'deviceMake': 'unknown',
'deviceModel': 'web',
'deviceType': 'web',
'deviceVersion': 'unknown',
'sid': compat_str(uuid.uuid1()),
}
_TESTS = [
{
'url': 'https://pluto.tv/on-demand/series/i-love-money/season/2/episode/its-in-the-cards-2009-2-3',
'md5': 'ebcdd8ed89aaace9df37924f722fd9bd',
'info_dict': {
'id': '5de6c598e9379ae4912df0a8',
'ext': 'mp4',
'title': 'It\'s In The Cards',
'episode': 'It\'s In The Cards',
'description': 'The teams face off against each other in a 3-on-2 soccer showdown. Strategy comes into play, though, as each team gets to select their opposing teams two defenders.',
'series': 'I Love Money',
'season_number': 2,
'episode_number': 3,
'duration': 3600,
}
},
{
'url': 'https://pluto.tv/on-demand/series/i-love-money/season/1/',
'playlist_count': 11,
'info_dict': {
'id': '5de6c582e9379ae4912dedbd',
'title': 'I Love Money - Season 1',
}
},
{
'url': 'https://pluto.tv/on-demand/series/i-love-money/',
'playlist_count': 26,
'info_dict': {
'id': '5de6c582e9379ae4912dedbd',
'title': 'I Love Money',
}
},
{
'url': 'https://pluto.tv/on-demand/movies/arrival-2015-1-1',
'md5': '3cead001d317a018bf856a896dee1762',
'info_dict': {
'id': '5e83ac701fa6a9001bb9df24',
'ext': 'mp4',
'title': 'Arrival',
'description': 'When mysterious spacecraft touch down across the globe, an elite team - led by expert translator Louise Banks (Academy Award® nominee Amy Adams) races against time to decipher their intent.',
'duration': 9000,
}
},
]
def _to_ad_free_formats(self, video_id, formats):
ad_free_formats = []
m3u8_urls = set()
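# Inferred from the URL handling below: the stitched playlists reference
# ad-spliced segment ranges like <base>/0-<n>/...ts, and requesting
# <base>/0-end/master.m3u8 instead appears to yield the un-spliced,
# ad-free rendition set.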
for format in formats:
res = self._download_webpage(
format.get('url'), video_id, note='Downloading m3u8 playlist',
fatal=False)
if not res:
continue
first_segment_url = re.search(
r'^(https?://.*/)0\-(end|[0-9]+)/[^/]+\.ts$', res,
re.MULTILINE)
if not first_segment_url:
continue
m3u8_urls.add(
compat_urlparse.urljoin(first_segment_url.group(1), '0-end/master.m3u8'))
for m3u8_url in m3u8_urls:
ad_free_formats.extend(
self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
self._sort_formats(ad_free_formats)
return ad_free_formats
def _get_video_info(self, video_json, slug, series_name=None):
video_id = video_json.get('_id', slug)
formats = []
for video_url in try_get(video_json, lambda x: x['stitched']['urls'], list) or []:
if video_url.get('type') != 'hls':
continue
url = url_or_none(video_url.get('url'))
formats.extend(
self._extract_m3u8_formats(
url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
info = {
'id': video_id,
'formats': self._to_ad_free_formats(video_id, formats),
'title': video_json.get('name'),
'description': video_json.get('description'),
'duration': float_or_none(video_json.get('duration'), scale=1000),
}
if series_name:
info.update({
'series': series_name,
'episode': video_json.get('name'),
'season_number': int_or_none(video_json.get('season')),
'episode_number': int_or_none(video_json.get('number')),
})
return info
def _real_extract(self, url):
path = compat_urlparse.urlparse(url).path
path_components = path.split('/')
video_type = path_components[2]
info_slug = path_components[3]
video_json = self._download_json(self._INFO_URL + info_slug, info_slug,
query=self._INFO_QUERY_PARAMS)
if video_type == 'series':
series_name = video_json.get('name', info_slug)
season_number = int_or_none(try_get(path_components, lambda x: x[5]))
episode_slug = try_get(path_components, lambda x: x[7])
videos = []
for season in video_json['seasons']:
if season_number is not None and season_number != int_or_none(season.get('number')):
continue
for episode in season['episodes']:
if episode_slug is not None and episode_slug != episode.get('slug'):
continue
videos.append(self._get_video_info(episode, episode_slug, series_name))
if not videos:
raise ExtractorError('Failed to find any videos to extract')
if episode_slug is not None and len(videos) == 1:
return videos[0]
playlist_title = series_name
if season_number is not None:
playlist_title += ' - Season %d' % season_number
return self.playlist_result(videos,
playlist_id=video_json.get('_id', info_slug),
playlist_title=playlist_title)
return self._get_video_info(video_json, info_slug)

View File

@@ -167,7 +167,6 @@ class PornHubIE(PornHubBaseIE):
'params': {
'skip_download': True,
},
'skip': 'Video has been flagged for verification in accordance with our trust and safety policy',
}, {
# subtitles
'url': 'https://www.pornhub.com/view_video.php?viewkey=ph5af5fef7c2aa7',
@@ -266,8 +265,7 @@ class PornHubIE(PornHubBaseIE):
webpage = dl_webpage('pc')
error_msg = self._html_search_regex(
(r'(?s)<div[^>]+class=(["\'])(?:(?!\1).)*\b(?:removed|userMessageSection)\b(?:(?!\1).)*\1[^>]*>(?P<error>.+?)</div>',
r'(?s)<section[^>]+class=["\']noVideo["\'][^>]*>(?P<error>.+?)</section>'),
r'(?s)<div[^>]+class=(["\'])(?:(?!\1).)*\b(?:removed|userMessageSection)\b(?:(?!\1).)*\1[^>]*>(?P<error>.+?)</div>',
webpage, 'error message', default=None, group='error')
if error_msg:
error_msg = re.sub(r'\s+', ' ', error_msg)
@@ -396,21 +394,6 @@ class PornHubIE(PornHubBaseIE):
upload_date = None
formats = []
def add_format(format_url, height=None):
tbr = None
mobj = re.search(r'(?P<height>\d+)[pP]?_(?P<tbr>\d+)[kK]', format_url)
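# illustrative example: a URL containing '720P_4000K' yields
# height=720 and tbr=4000 here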
if mobj:
if not height:
height = int(mobj.group('height'))
tbr = int(mobj.group('tbr'))
formats.append({
'url': format_url,
'format_id': '%dp' % height if height else None,
'height': height,
'tbr': tbr,
})
for video_url, height in video_urls:
if not upload_date:
upload_date = self._search_regex(
@@ -427,19 +410,18 @@ class PornHubIE(PornHubBaseIE):
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
continue
if '/video/get_media' in video_url:
medias = self._download_json(video_url, video_id, fatal=False)
if isinstance(medias, list):
for media in medias:
if not isinstance(media, dict):
continue
video_url = url_or_none(media.get('videoUrl'))
if not video_url:
continue
height = int_or_none(media.get('quality'))
add_format(video_url, height)
continue
add_format(video_url)
tbr = None
mobj = re.search(r'(?P<height>\d+)[pP]?_(?P<tbr>\d+)[kK]', video_url)
if mobj:
if not height:
height = int(mobj.group('height'))
tbr = int(mobj.group('tbr'))
formats.append({
'url': video_url,
'format_id': '%dp' % height if height else None,
'height': height,
'tbr': tbr,
})
self._sort_formats(formats)
video_uploader = self._html_search_regex(

View File

@@ -158,10 +158,6 @@ class RaiPlayIE(RaiBaseIE):
# subtitles at 'subtitlesArray' key (see #27698)
'url': 'https://www.raiplay.it/video/2020/12/Report---04-01-2021-2e90f1de-8eee-4de4-ac0e-78d21db5b600.html',
'only_matching': True,
}, {
# DRM protected
'url': 'https://www.raiplay.it/video/2020/09/Lo-straordinario-mondo-di-Zoey-S1E1-Lo-straordinario-potere-di-Zoey-ed493918-1d32-44b7-8454-862e473d00ff.html',
'only_matching': True,
}]
def _real_extract(self, url):
@@ -170,14 +166,6 @@ class RaiPlayIE(RaiBaseIE):
media = self._download_json(
base + '.json', video_id, 'Downloading video JSON')
if not self._downloader.params.get('allow_unplayable_formats'):
if try_get(
media,
(lambda x: x['rights_management']['rights']['drm'],
lambda x: x['program_info']['rights_management']['rights']['drm']),
dict):
raise ExtractorError('This video is DRM protected.', expected=True)
title = media['name']
video = media['video']

View File

@@ -15,9 +15,6 @@ from ..utils import (
class RCSBaseIE(InfoExtractor):
# based on VideoPlayerLoader.prototype.getVideoSrc
# and VideoPlayerLoader.prototype.transformSrc from
# https://js2.corriereobjects.it/includes2013/LIBS/js/corriere_video.sjs
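# reading of the mapping below: legacy Akamai edgesuite hostnames are
# rewritten to their akamaized.net equivalents, mirroring what the player's
# transformSrc does client-side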
_ALL_REPLACE = {
'media2vam.corriere.it.edgesuite.net':
'media2vam-corriere-it.akamaized.net',
@@ -194,10 +191,10 @@ class RCSBaseIE(InfoExtractor):
urls.get('m3u8'), video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False)
if urls.get('mp4'):
if not formats:
formats.append({
'format_id': 'http-mp4',
'url': urls['mp4']
'url': urls.get('mp4')
})
self._sort_formats(formats)
return formats
@@ -219,12 +216,10 @@ class RCSBaseIE(InfoExtractor):
video_data = None
# look for json video data url
json = self._search_regex(
r'''(?x)url\s*=\s*(["'])
(?P<url>
(?:https?:)?//video\.rcs\.it
/fragment-includes/video-includes/.+?\.json
)\1;''',
page, video_id, group='url', default=None)
r'''(?x)var url\s*=\s*["']((?:https?:)?
//video\.rcs\.it
/fragment-includes/video-includes/.+?\.json)["'];''',
page, video_id, default=None)
if json:
if json.startswith('//'):
json = 'https:%s' % json
@@ -232,16 +227,13 @@ class RCSBaseIE(InfoExtractor):
# if json url not found, look for json video data directly in the page
else:
# RCS normal pages and most of the embeds
json = self._search_regex(
r'[\s;]video\s*=\s*({[\s\S]+?})(?:;|,playlist=)',
page, video_id, default=None)
if not json and 'video-embed' in url:
page = self._download_webpage(url.replace('video-embed', 'video-json'), video_id)
json = self._search_regex(
r'##start-video##({[\s\S]+?})##end-video##',
page, video_id, default=None)
if not json:
if json:
video_data = self._parse_json(
json, video_id, transform_source=js_to_json)
else:
# if no video data is found, try searching for iframes
emb = RCSEmbedsIE._extract_url(page)
if emb:
@@ -250,9 +242,6 @@ class RCSBaseIE(InfoExtractor):
'url': emb,
'ie_key': RCSEmbedsIE.ie_key()
}
if json:
video_data = self._parse_json(
json, video_id, transform_source=js_to_json)
if not video_data:
raise ExtractorError('Video data not found in the page')
@@ -261,8 +250,7 @@ class RCSBaseIE(InfoExtractor):
self._get_video_src(video_data), video_id)
description = (video_data.get('description')
or clean_html(video_data.get('htmlDescription'))
or self._html_search_meta('description', page))
or clean_html(video_data.get('htmlDescription')))
uploader = video_data.get('provider') or mobj.group('cdn')
return {
@@ -295,7 +283,6 @@ class RCSEmbedsIE(RCSBaseIE):
'uploader': 'rcs.it',
}
}, {
# redownload the page, changing 'video-embed' to 'video-json'
'url': 'https://video.gazzanet.gazzetta.it/video-embed/gazzanet-mo05-0000260789',
'md5': 'a043e3fecbe4d9ed7fc5d888652a5440',
'info_dict': {
@@ -372,7 +359,6 @@ class RCSIE(RCSBaseIE):
'uploader': 'Corriere Tv',
}
}, {
# video data inside iframe
'url': 'https://viaggi.corriere.it/video/norvegia-il-nuovo-ponte-spettacolare-sopra-la-cascata-di-voringsfossen/',
'md5': 'da378e4918d2afbf7d61c35abb948d4c',
'info_dict': {
@@ -403,15 +389,15 @@ class RCSVariousIE(RCSBaseIE):
(?P<cdn>
leitv\.it|
youreporter\.it
)/(?:[^/]+/)?(?P<id>[^/]+?)(?:$|\?|/)'''
)/(?:video/)?(?P<id>[^/]+?)(?:$|\?|/)'''
_TESTS = [{
'url': 'https://www.leitv.it/benessere/mal-di-testa-come-combatterlo-ed-evitarne-la-comparsa/',
'md5': '92b4e63667b8f95acb0a04da25ae28a1',
'url': 'https://www.leitv.it/video/marmellata-di-ciliegie-fatta-in-casa/',
'md5': '618aaabac32152199c1af86784d4d554',
'info_dict': {
'id': 'mal-di-testa-come-combatterlo-ed-evitarne-la-comparsa',
'id': 'marmellata-di-ciliegie-fatta-in-casa',
'ext': 'mp4',
'title': 'Cervicalgia e mal di testa, il video con i suggerimenti dell\'esperto',
'description': 'md5:ae21418f34cee0b8d02a487f55bcabb5',
'title': 'Marmellata di ciliegie fatta in casa',
'description': 'md5:89133864d6aad456dbcf6e7a29f86263',
'uploader': 'leitv.it',
}
}, {

View File

@@ -2,9 +2,8 @@
from __future__ import unicode_literals
import base64
import io
import re
import sys
import time
from .common import InfoExtractor
from ..compat import (
@@ -15,13 +14,56 @@ from ..utils import (
determine_ext,
ExtractorError,
float_or_none,
qualities,
remove_end,
remove_start,
sanitized_Request,
std_headers,
)
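# On Python 2 iterating a str already yields 1-character strings, while on
# Python 3 iterating bytes yields ints, hence the map(chr, ...) shim below.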
_bytes_to_chr = (lambda x: x) if sys.version_info[0] == 2 else (lambda x: map(chr, x))
def _decrypt_url(png):
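# How the obfuscation works (a reading of the loops below): the PNG's tEXt
# chunk holds '<alphabet>#<payload>'. The alphabet part keeps one character,
# then skips 1, 2, 3, 0, ... characters in rotation; the payload encodes each
# URL character as a two-digit index into that alphabet (the tens digit, a
# rotating number of junk digits, then the units digit).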
encrypted_data = compat_b64decode(png)
text_index = encrypted_data.find(b'tEXt')
text_chunk = encrypted_data[text_index - 4:]
length = compat_struct_unpack('!I', text_chunk[:4])[0]
# Use bytearray to get integers when iterating in both python 2.x and 3.x
data = bytearray(text_chunk[8:8 + length])
data = [chr(b) for b in data if b != 0]
hash_index = data.index('#')
alphabet_data = data[:hash_index]
url_data = data[hash_index + 1:]
if url_data[0] == 'H' and url_data[3] == '%':
# remove useless HQ%% at the start
url_data = url_data[4:]
alphabet = []
e = 0
d = 0
for l in alphabet_data:
if d == 0:
alphabet.append(l)
d = e = (e + 1) % 4
else:
d -= 1
url = ''
f = 0
e = 3
b = 1
for letter in url_data:
if f == 0:
l = int(letter) * 10
f = 1
else:
if e == 0:
l += int(letter)
url += alphabet[l]
e = (b + 3) % 4
f = 0
b += 1
else:
e -= 1
return url
class RTVEALaCartaIE(InfoExtractor):
@@ -37,31 +79,28 @@ class RTVEALaCartaIE(InfoExtractor):
'ext': 'mp4',
'title': 'Balonmano - Swiss Cup masculina. Final: España-Suecia',
'duration': 5024.566,
'series': 'Balonmano',
},
'expected_warnings': ['Failed to download MPD manifest', 'Failed to download m3u8 information'],
}, {
'note': 'Live stream',
'url': 'http://www.rtve.es/alacarta/videos/television/24h-live/1694255/',
'info_dict': {
'id': '1694255',
'ext': 'mp4',
'title': 're:^24H LIVE [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'is_live': True,
},
'params': {
'skip_download': 'live stream',
'ext': 'flv',
'title': 'TODO',
},
'skip': 'The f4m manifest can\'t be used yet',
}, {
'url': 'http://www.rtve.es/alacarta/videos/servir-y-proteger/servir-proteger-capitulo-104/4236788/',
'md5': 'd850f3c8731ea53952ebab489cf81cbf',
'md5': 'e55e162379ad587e9640eda4f7353c0f',
'info_dict': {
'id': '4236788',
'ext': 'mp4',
'title': 'Servir y proteger - Capítulo 104',
'title': 'Servir y proteger - Capítulo 104 ',
'duration': 3222.0,
},
'expected_warnings': ['Failed to download MPD manifest', 'Failed to download m3u8 information'],
'params': {
'skip_download': True, # requires ffmpeg
},
}, {
'url': 'http://www.rtve.es/m/alacarta/videos/cuentame-como-paso/cuentame-como-paso-t16-ultimo-minuto-nuestra-vida-capitulo-276/2969138/?media=tve',
'only_matching': True,
@@ -72,102 +111,58 @@ class RTVEALaCartaIE(InfoExtractor):
def _real_initialize(self):
user_agent_b64 = base64.b64encode(std_headers['User-Agent'].encode('utf-8')).decode('utf-8')
self._manager = self._download_json(
manager_info = self._download_json(
'http://www.rtve.es/odin/loki/' + user_agent_b64,
None, 'Fetching manager info')['manager']
@staticmethod
def _decrypt_url(png):
encrypted_data = io.BytesIO(compat_b64decode(png)[8:])
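# A PNG is an 8-byte signature (already skipped above) followed by chunks of
# the form: length (4 bytes) | type (4 bytes) | data | CRC (4 bytes); the
# encrypted URLs live in the tEXt chunk.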
while True:
length = compat_struct_unpack('!I', encrypted_data.read(4))[0]
chunk_type = encrypted_data.read(4)
if chunk_type == b'IEND':
break
data = encrypted_data.read(length)
if chunk_type == b'tEXt':
alphabet_data, text = data.split(b'\0')
quality, url_data = text.split(b'%%')
alphabet = []
e = 0
d = 0
for l in _bytes_to_chr(alphabet_data):
if d == 0:
alphabet.append(l)
d = e = (e + 1) % 4
else:
d -= 1
url = ''
f = 0
e = 3
b = 1
for letter in _bytes_to_chr(url_data):
if f == 0:
l = int(letter) * 10
f = 1
else:
if e == 0:
l += int(letter)
url += alphabet[l]
e = (b + 3) % 4
f = 0
b += 1
else:
e -= 1
yield quality.decode(), url
encrypted_data.read(4) # CRC
def _extract_png_formats(self, video_id):
png = self._download_webpage(
'http://www.rtve.es/ztnr/movil/thumbnail/%s/videos/%s.png' % (self._manager, video_id),
video_id, 'Downloading url information', query={'q': 'v2'})
q = qualities(['Media', 'Alta', 'HQ', 'HD_READY', 'HD_FULL'])
formats = []
for quality, video_url in self._decrypt_url(png):
ext = determine_ext(video_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
video_url, video_id, 'dash', fatal=False))
else:
formats.append({
'format_id': quality,
'quality': q(quality),
'url': video_url,
})
self._sort_formats(formats)
return formats
None, 'Fetching manager info')
self._manager = manager_info['manager']
def _real_extract(self, url):
video_id = self._match_id(url)
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
info = self._download_json(
'http://www.rtve.es/api/videos/%s/config/alacarta_videos.json' % video_id,
video_id)['page']['items'][0]
if info['state'] == 'DESPU':
raise ExtractorError('The video is no longer available', expected=True)
title = info['title'].strip()
formats = self._extract_png_formats(video_id)
title = info['title']
png_url = 'http://www.rtve.es/ztnr/movil/thumbnail/%s/videos/%s.png' % (self._manager, video_id)
png_request = sanitized_Request(png_url)
png_request.add_header('Referer', url)
png = self._download_webpage(png_request, video_id, 'Downloading url information')
video_url = _decrypt_url(png)
ext = determine_ext(video_url)
formats = []
if not video_url.endswith('.f4m') and ext != 'm3u8':
if '?' not in video_url:
video_url = video_url.replace('resources/', 'auth/resources/')
video_url = video_url.replace('.net.rtve', '.multimedia.cdn.rtve')
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
video_url, video_id, f4m_id='hds', fatal=False))
else:
formats.append({
'url': video_url,
})
self._sort_formats(formats)
subtitles = None
sbt_file = info.get('sbtFile')
if sbt_file:
subtitles = self.extract_subtitles(video_id, sbt_file)
is_live = info.get('live') is True
if info.get('sbtFile') is not None:
subtitles = self.extract_subtitles(video_id, info['sbtFile'])
return {
'id': video_id,
'title': self._live_title(title) if is_live else title,
'title': title,
'formats': formats,
'thumbnail': info.get('image'),
'page_url': url,
'subtitles': subtitles,
'duration': float_or_none(info.get('duration'), 1000),
'is_live': is_live,
'series': info.get('programTitle'),
'duration': float_or_none(info.get('duration'), scale=1000),
}
def _get_subtitles(self, video_id, sub_file):
@@ -179,26 +174,48 @@ class RTVEALaCartaIE(InfoExtractor):
for s in subs)
class RTVEInfantilIE(RTVEALaCartaIE):
class RTVEInfantilIE(InfoExtractor):
IE_NAME = 'rtve.es:infantil'
IE_DESC = 'RTVE infantil'
_VALID_URL = r'https?://(?:www\.)?rtve\.es/infantil/serie/[^/]+/video/[^/]+/(?P<id>[0-9]+)/'
_VALID_URL = r'https?://(?:www\.)?rtve\.es/infantil/serie/(?P<show>[^/]*)/video/(?P<short_title>[^/]*)/(?P<id>[0-9]+)/'
_TESTS = [{
'url': 'http://www.rtve.es/infantil/serie/cleo/video/maneras-vivir/3040283/',
'md5': '5747454717aedf9f9fdf212d1bcfc48d',
'md5': '915319587b33720b8e0357caaa6617e6',
'info_dict': {
'id': '3040283',
'ext': 'mp4',
'title': 'Maneras de vivir',
'thumbnail': r're:https?://.+/1426182947956\.JPG',
'thumbnail': 'http://www.rtve.es/resources/jpg/6/5/1426182947956.JPG',
'duration': 357.958,
},
'expected_warnings': ['Failed to download MPD manifest', 'Failed to download m3u8 information'],
}]
def _real_extract(self, url):
video_id = self._match_id(url)
info = self._download_json(
'http://www.rtve.es/api/videos/%s/config/alacarta_videos.json' % video_id,
video_id)['page']['items'][0]
class RTVELiveIE(RTVEALaCartaIE):
webpage = self._download_webpage(url, video_id)
vidplayer_id = self._search_regex(
r' id="vidplayer([0-9]+)"', webpage, 'internal video ID')
png_url = 'http://www.rtve.es/ztnr/movil/thumbnail/default/videos/%s.png' % vidplayer_id
png = self._download_webpage(png_url, video_id, 'Downloading url information')
video_url = _decrypt_url(png)
return {
'id': video_id,
'ext': 'mp4',
'title': info['title'],
'url': video_url,
'thumbnail': info.get('image'),
'duration': float_or_none(info.get('duration'), scale=1000),
}
class RTVELiveIE(InfoExtractor):
IE_NAME = 'rtve.es:live'
IE_DESC = 'RTVE.es live streams'
_VALID_URL = r'https?://(?:www\.)?rtve\.es/directo/(?P<id>[a-zA-Z0-9-]+)'
@@ -208,7 +225,7 @@ class RTVELiveIE(RTVEALaCartaIE):
'info_dict': {
'id': 'la-1',
'ext': 'mp4',
'title': 're:^La 1 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'title': 're:^La 1 [0-9]{4}-[0-9]{2}-[0-9]{2}Z[0-9]{6}$',
},
'params': {
'skip_download': 'live stream',
@@ -217,22 +234,29 @@ class RTVELiveIE(RTVEALaCartaIE):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
start_time = time.gmtime()
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
title = remove_end(self._og_search_title(webpage), ' en directo en RTVE.es')
title = remove_start(title, 'Estoy viendo ')
title += ' ' + time.strftime('%Y-%m-%dZ%H%M%S', start_time)
vidplayer_id = self._search_regex(
(r'playerId=player([0-9]+)',
r'class=["\'].*?\blive_mod\b.*?["\'][^>]+data-assetid=["\'](\d+)',
r'data-id=["\'](\d+)'),
webpage, 'internal video ID')
png_url = 'http://www.rtve.es/ztnr/movil/thumbnail/amonet/videos/%s.png' % vidplayer_id
png = self._download_webpage(png_url, video_id, 'Downloading url information')
m3u8_url = _decrypt_url(png)
formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4')
self._sort_formats(formats)
return {
'id': video_id,
'title': self._live_title(title),
'formats': self._extract_png_formats(vidplayer_id),
'title': title,
'formats': formats,
'is_live': True,
}

View File

@@ -10,7 +10,7 @@ from ..utils import (
class SBSIE(InfoExtractor):
IE_DESC = 'sbs.com.au'
_VALID_URL = r'https?://(?:www\.)?sbs\.com\.au/(?:ondemand(?:/video/(?:single/)?|.*?\bplay=|/watch/)|news/(?:embeds/)?video/)(?P<id>[0-9]+)'
_VALID_URL = r'https?://(?:www\.)?sbs\.com\.au/(?:ondemand(?:/video/(?:single/)?|.*?\bplay=)|news/(?:embeds/)?video/)(?P<id>[0-9]+)'
_TESTS = [{
# Original URL is handled by the generic IE which finds the iframe:
@@ -43,9 +43,6 @@ class SBSIE(InfoExtractor):
}, {
'url': 'https://www.sbs.com.au/news/embeds/video/1840778819866',
'only_matching': True,
}, {
'url': 'https://www.sbs.com.au/ondemand/watch/1698704451971',
'only_matching': True,
}]
def _real_extract(self, url):

View File

@@ -2,18 +2,12 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
get_element_by_class,
int_or_none,
remove_start,
strip_or_none,
unified_strdate,
)
from ..utils import js_to_json
class ScreencastOMaticIE(InfoExtractor):
_VALID_URL = r'https?://screencast-o-matic\.com/(?:(?:watch|player)/|embed\?.*?\bsc=)(?P<id>[0-9a-zA-Z]+)'
_TESTS = [{
_VALID_URL = r'https?://screencast-o-matic\.com/watch/(?P<id>[0-9a-zA-Z]+)'
_TEST = {
'url': 'http://screencast-o-matic.com/watch/c2lD3BeOPl',
'md5': '483583cb80d92588f15ccbedd90f0c18',
'info_dict': {
@@ -22,30 +16,22 @@ class ScreencastOMaticIE(InfoExtractor):
'title': 'Welcome to 3-4 Philosophy @ DECV!',
'thumbnail': r're:^https?://.*\.jpg$',
'description': 'as the title says! also: some general info re 1) VCE philosophy and 2) distance learning.',
'duration': 369,
'upload_date': '20141216',
'duration': 369.163,
}
}, {
'url': 'http://screencast-o-matic.com/player/c2lD3BeOPl',
'only_matching': True,
}, {
'url': 'http://screencast-o-matic.com/embed?ff=true&sc=cbV2r4Q5TL&fromPH=true&a=1',
'only_matching': True,
}]
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
'https://screencast-o-matic.com/player/' + video_id, video_id)
info = self._parse_html5_media_entries(url, webpage, video_id)[0]
info.update({
'id': video_id,
'title': get_element_by_class('overlayTitle', webpage),
'description': strip_or_none(get_element_by_class('overlayDescription', webpage)) or None,
'duration': int_or_none(self._search_regex(
r'player\.duration\s*=\s*function\(\)\s*{\s*return\s+(\d+);\s*};',
webpage, 'duration', default=None)),
'upload_date': unified_strdate(remove_start(
get_element_by_class('overlayPublished', webpage), 'Published: ')),
webpage = self._download_webpage(url, video_id)
jwplayer_data = self._parse_json(
self._search_regex(
r"(?s)jwplayer\('mp4Player'\).setup\((\{.*?\})\);", webpage, 'setup code'),
video_id, transform_source=js_to_json)
info_dict = self._parse_jwplayer_data(jwplayer_data, video_id, require_title=False)
info_dict.update({
'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage),
})
return info
return info_dict

View File

@@ -51,16 +51,13 @@ class ShahidIE(ShahidBaseIE):
_NETRC_MACHINE = 'shahid'
_VALID_URL = r'https?://shahid\.mbc\.net/ar/(?:serie|show|movie)s/[^/]+/(?P<type>episode|clip|movie)-(?P<id>\d+)'
_TESTS = [{
'url': 'https://shahid.mbc.net/ar/shows/%D9%85%D8%AA%D8%AD%D9%81-%D8%A7%D9%84%D8%AF%D8%AD%D9%8A%D8%AD-%D8%A7%D9%84%D9%85%D9%88%D8%B3%D9%85-1-%D9%83%D9%84%D9%8A%D8%A8-1/clip-816924',
'url': 'https://shahid.mbc.net/ar/shows/%D9%85%D8%AC%D9%84%D8%B3-%D8%A7%D9%84%D8%B4%D8%A8%D8%A7%D8%A8-%D8%A7%D9%84%D9%85%D9%88%D8%B3%D9%85-1-%D9%83%D9%84%D9%8A%D8%A8-1/clip-275286',
'info_dict': {
'id': '816924',
'id': '275286',
'ext': 'mp4',
'title': 'متحف الدحيح الموسم 1 كليب 1',
'timestamp': 1602806400,
'upload_date': '20201016',
'description': 'برومو',
'duration': 22,
'categories': ['كوميديا'],
'title': 'مجلس الشباب الموسم 1 كليب 1',
'timestamp': 1506988800,
'upload_date': '20171003',
},
'params': {
# m3u8 download
@@ -112,15 +109,12 @@ class ShahidIE(ShahidBaseIE):
page_type = 'episode'
playout = self._call_api(
'playout/new/url/' + video_id, video_id)['playout']
'playout/url/' + video_id, video_id)['playout']
if not self._downloader.params.get('allow_unplayable_formats') and playout.get('drm'):
raise ExtractorError('This video is DRM protected.', expected=True)
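# Stripping the aws.manifestfilter query param (see the AWS doc linked in the
# comment below) presumably disables server-side manifest filtering, so the
# m3u8 returns the full set of renditions.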
formats = self._extract_m3u8_formats(re.sub(
# https://docs.aws.amazon.com/mediapackage/latest/ug/manifest-filtering.html
r'aws\.manifestfilter=[\w:;,-]+&?',
'', playout['url']), video_id, 'mp4')
formats = self._extract_m3u8_formats(playout['url'], video_id, 'mp4')
self._sort_formats(formats)
# video = self._call_api(

View File

@@ -6,9 +6,9 @@ from .mtv import MTVServicesInfoExtractor
class SouthParkIE(MTVServicesInfoExtractor):
IE_NAME = 'southpark.cc.com'
_VALID_URL = r'https?://(?:www\.)?(?P<url>southpark(?:\.cc|studios)\.com/(?:clips|(?:full-)?episodes|collections)/(?P<id>.+?)(\?|#|$))'
_VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.cc\.com/(?:clips|(?:full-)?episodes|collections)/(?P<id>.+?)(\?|#|$))'
_FEED_URL = 'http://feeds.mtvnservices.com/od/feed/intl-mrss-player-feed'
_FEED_URL = 'http://www.southparkstudios.com/feeds/video-player/mrss'
_TESTS = [{
'url': 'http://southpark.cc.com/clips/104437/bat-daded#tab=featured',
@@ -23,20 +23,8 @@ class SouthParkIE(MTVServicesInfoExtractor):
}, {
'url': 'http://southpark.cc.com/collections/7758/fan-favorites/1',
'only_matching': True,
}, {
'url': 'https://www.southparkstudios.com/episodes/h4o269/south-park-stunning-and-brave-season-19-ep-1',
'only_matching': True,
}]
def _get_feed_query(self, uri):
return {
'accountOverride': 'intl.mtvi.com',
'arcEp': 'shared.southpark.global',
'ep': '90877963',
'imageEp': 'shared.southpark.global',
'mgid': uri,
}
class SouthParkEsIE(SouthParkIE):
IE_NAME = 'southpark.cc.com:español'

View File

@@ -1,105 +1,82 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urllib_parse_urlparse,
)
from ..utils import (
clean_html,
float_or_none,
int_or_none,
parse_iso8601,
strip_or_none,
try_get,
sanitized_Request,
)
class SportDeutschlandIE(InfoExtractor):
_VALID_URL = r'https?://sportdeutschland\.tv/(?P<id>(?:[^/]+/)?[^?#/&]+)'
_VALID_URL = r'https?://sportdeutschland\.tv/(?P<sport>[^/?#]+)/(?P<id>[^?#/]+)(?:$|[?#])'
_TESTS = [{
'url': 'https://sportdeutschland.tv/badminton/re-live-deutsche-meisterschaften-2020-halbfinals?playlistId=0',
'info_dict': {
'id': '5318cac0275701382770543d7edaf0a0',
'id': 're-live-deutsche-meisterschaften-2020-halbfinals',
'ext': 'mp4',
'title': 'Re-live: Deutsche Meisterschaften 2020 - Halbfinals - Teil 1',
'duration': 16106.36,
'title': 're:Re-live: Deutsche Meisterschaften 2020.*Halbfinals',
'categories': ['Badminton-Deutschland'],
'view_count': int,
'thumbnail': r're:^https?://.*\.(?:jpg|png)$',
'timestamp': int,
'upload_date': '20200201',
'description': 're:.*', # meaningless description for THIS video
},
'params': {
'noplaylist': True,
# m3u8 download
'skip_download': True,
},
}, {
'url': 'https://sportdeutschland.tv/badminton/re-live-deutsche-meisterschaften-2020-halbfinals?playlistId=0',
'info_dict': {
'id': 'c6e2fdd01f63013854c47054d2ab776f',
'title': 'Re-live: Deutsche Meisterschaften 2020 - Halbfinals',
'description': 'md5:5263ff4c31c04bb780c9f91130b48530',
'duration': 31397,
},
'playlist_count': 2,
}, {
'url': 'https://sportdeutschland.tv/freeride-world-tour-2021-fieberbrunn-oesterreich',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
data = self._download_json(
'https://backend.sportdeutschland.tv/api/permalinks/' + display_id,
display_id, query={'access_token': 'true'})
asset = data['asset']
title = (asset.get('title') or asset['label']).strip()
asset_id = asset.get('id') or asset.get('uuid')
info = {
'id': asset_id,
'title': title,
'description': clean_html(asset.get('body') or asset.get('description')) or asset.get('teaser'),
'duration': int_or_none(asset.get('seconds')),
}
videos = asset.get('videos') or []
if len(videos) > 1:
playlist_id = compat_parse_qs(compat_urllib_parse_urlparse(url).query).get('playlistId', [None])[0]
if playlist_id:
if self._downloader.params.get('noplaylist'):
videos = [videos[int(playlist_id)]]
self.to_screen('Downloading just a single video because of --no-playlist')
else:
self.to_screen('Downloading playlist %s - add --no-playlist to just download video' % asset_id)
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
sport_id = mobj.group('sport')
def entries():
for i, video in enumerate(videos, 1):
video_id = video.get('uuid')
video_url = video.get('url')
if not (video_id and video_url):
continue
formats = self._extract_m3u8_formats(
video_url.replace('.smil', '.m3u8'), video_id, 'mp4', fatal=False)
if not formats:
continue
yield {
'id': video_id,
'formats': formats,
'title': title + ' - ' + (video.get('label') or 'Teil %d' % i),
'duration': float_or_none(video.get('duration')),
}
info.update({
'_type': 'multi_video',
'entries': entries(),
})
api_url = 'https://proxy.vidibusdynamic.net/ssl/backend.sportdeutschland.tv/api/permalinks/%s/%s?access_token=true' % (
sport_id, video_id)
req = sanitized_Request(api_url, headers={
'Accept': 'application/vnd.vidibus.v2.html+json',
'Referer': url,
})
data = self._download_json(req, video_id)
asset = data['asset']
categories = [data['section']['title']]
formats = []
smil_url = asset['video']
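# assumption encoded below: swapping the .smil extension for .m3u8 yields the
# HLS variant of the same stream, while the SMIL document itself lists the
# (apparently broken) RTMP sources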
if '.smil' in smil_url:
m3u8_url = smil_url.replace('.smil', '.m3u8')
formats.extend(
self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4'))
smil_doc = self._download_xml(
smil_url, video_id, note='Downloading SMIL metadata')
base_url_el = smil_doc.find('./head/meta')
if base_url_el is not None:
base_url = base_url_el.attrib['base']
formats.extend([{
'format_id': 'rtmp',
'url': base_url if base_url_el is not None else n.attrib['src'],
'play_path': n.attrib['src'],
'ext': 'flv',
'preference': -100,
'format_note': 'Seems to fail at example stream',
} for n in smil_doc.findall('./body/video')])
else:
formats = self._extract_m3u8_formats(
videos[0]['url'].replace('.smil', '.m3u8'), asset_id, 'mp4')
section_title = strip_or_none(try_get(data, lambda x: x['section']['title']))
info.update({
'formats': formats,
'display_id': asset.get('permalink'),
'thumbnail': try_get(asset, lambda x: x['images'][0]),
'categories': [section_title] if section_title else None,
'view_count': int_or_none(asset.get('views')),
'is_live': asset.get('is_live') is True,
'timestamp': parse_iso8601(asset.get('date') or asset.get('published_at')),
})
return info
formats.append({'url': smil_url})
self._sort_formats(formats)
return {
'id': video_id,
'formats': formats,
'title': asset['title'],
'thumbnail': asset.get('image'),
'description': asset.get('teaser'),
'duration': asset.get('duration'),
'categories': categories,
'view_count': asset.get('views'),
'rtmp_live': asset.get('live'),
'timestamp': parse_iso8601(asset.get('date')),
}

View File

@@ -1,61 +1,19 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
clean_html,
clean_podcast_url,
ExtractorError,
int_or_none,
str_or_none,
try_get,
url_or_none,
)
class StitcherBaseIE(InfoExtractor):
_VALID_URL_BASE = r'https?://(?:www\.)?stitcher\.com/(?:podcast|show)/'
def _call_api(self, path, video_id, query):
resp = self._download_json(
'https://api.prod.stitcher.com/' + path,
video_id, query=query)
error_message = try_get(resp, lambda x: x['errors'][0]['message'])
if error_message:
raise ExtractorError(error_message, expected=True)
return resp['data']
def _extract_description(self, data):
return clean_html(data.get('html_description') or data.get('description'))
def _extract_audio_url(self, episode):
return url_or_none(episode.get('audio_url') or episode.get('guid'))
def _extract_show_info(self, show):
return {
'thumbnail': show.get('image_base_url'),
'series': show.get('title'),
}
def _extract_episode(self, episode, audio_url, show_info):
info = {
'id': compat_str(episode['id']),
'display_id': episode.get('slug'),
'title': episode['title'].strip(),
'description': self._extract_description(episode),
'duration': int_or_none(episode.get('duration')),
'url': clean_podcast_url(audio_url),
'vcodec': 'none',
'timestamp': int_or_none(episode.get('date_published')),
'season_number': int_or_none(episode.get('season')),
'season_id': str_or_none(episode.get('season_id')),
}
info.update(show_info)
return info
class StitcherIE(StitcherBaseIE):
_VALID_URL = StitcherBaseIE._VALID_URL_BASE + r'(?:[^/]+/)+e(?:pisode)?/(?:[^/#?&]+-)?(?P<id>\d+)'
class StitcherIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?stitcher\.com/(?:podcast|show)/(?:[^/]+/)+e(?:pisode)?/(?:(?P<display_id>[^/#?&]+?)-)?(?P<id>\d+)(?:[/#?&]|$)'
_TESTS = [{
'url': 'http://www.stitcher.com/podcast/the-talking-machines/e/40789481?autoplay=true',
'md5': 'e9635098e0da10b21a0e2b85585530f6',
@@ -66,9 +24,8 @@ class StitcherIE(StitcherBaseIE):
'description': 'md5:547adb4081864be114ae3831b4c2b42f',
'duration': 1604,
'thumbnail': r're:^https?://.*\.jpg',
'upload_date': '20151008',
'timestamp': 1444285800,
'series': 'Talking Machines',
'upload_date': '20180126',
'timestamp': 1516989316,
},
}, {
'url': 'http://www.stitcher.com/podcast/panoply/vulture-tv/e/the-rare-hourlong-comedy-plus-40846275?autoplay=true',
@@ -98,47 +55,33 @@ class StitcherIE(StitcherBaseIE):
}]
def _real_extract(self, url):
audio_id = self._match_id(url)
data = self._call_api(
'shows/episodes', audio_id, {'episode_ids': audio_id})
episode = data['episodes'][0]
audio_url = self._extract_audio_url(episode)
if not audio_url:
self.raise_login_required()
show = try_get(data, lambda x: x['shows'][0], dict) or {}
return self._extract_episode(
episode, audio_url, self._extract_show_info(show))
display_id, audio_id = re.match(self._VALID_URL, url).groups()
resp = self._download_json(
'https://api.prod.stitcher.com/episode/' + audio_id,
display_id or audio_id)
episode = try_get(resp, lambda x: x['data']['episodes'][0], dict)
if not episode:
raise ExtractorError(resp['errors'][0]['message'], expected=True)
class StitcherShowIE(StitcherBaseIE):
_VALID_URL = StitcherBaseIE._VALID_URL_BASE + r'(?P<id>[^/#?&]+)/?(?:[?#&]|$)'
_TESTS = [{
'url': 'http://www.stitcher.com/podcast/the-talking-machines',
'info_dict': {
'id': 'the-talking-machines',
'title': 'Talking Machines',
'description': 'md5:831f0995e40f26c10231af39cf1ebf0b',
},
'playlist_mincount': 106,
}, {
'url': 'https://www.stitcher.com/show/the-talking-machines',
'only_matching': True,
}]
title = episode['title'].strip()
audio_url = episode['audio_url']
def _real_extract(self, url):
show_slug = self._match_id(url)
data = self._call_api(
'search/show/%s/allEpisodes' % show_slug, show_slug, {'count': 10000})
show = try_get(data, lambda x: x['shows'][0], dict) or {}
show_info = self._extract_show_info(show)
thumbnail = None
show_id = episode.get('show_id')
if show_id and episode.get('classic_id') != -1:
thumbnail = 'https://stitcher-classic.imgix.net/feedimages/%s.jpg' % show_id
entries = []
for episode in (data.get('episodes') or []):
audio_url = self._extract_audio_url(episode)
if not audio_url:
continue
entries.append(self._extract_episode(episode, audio_url, show_info))
return self.playlist_result(
entries, show_slug, show.get('title'),
self._extract_description(show))
return {
'id': audio_id,
'display_id': display_id,
'title': title,
'description': clean_html(episode.get('html_description') or episode.get('description')),
'duration': int_or_none(episode.get('duration')),
'thumbnail': thumbnail,
'url': audio_url,
'vcodec': 'none',
'timestamp': int_or_none(episode.get('date_created')),
'season_number': int_or_none(episode.get('season')),
'season_id': str_or_none(episode.get('season_id')),
}

View File

@@ -146,19 +146,18 @@ class SVTPlayIE(SVTPlayBaseIE):
)
(?P<svt_id>[^/?#&]+)|
https?://(?:www\.)?(?:svtplay|oppetarkiv)\.se/(?:video|klipp|kanaler)/(?P<id>[^/?#&]+)
(?:.*?modalId=(?P<modal_id>[\da-zA-Z-]+))?
)
'''
_TESTS = [{
'url': 'https://www.svtplay.se/video/30479064',
'url': 'https://www.svtplay.se/video/26194546/det-har-ar-himlen',
'md5': '2382036fd6f8c994856c323fe51c426e',
'info_dict': {
'id': '8zVbDPA',
'id': 'jNwpV9P',
'ext': 'mp4',
'title': 'Designdrömmar i Stenungsund',
'timestamp': 1615770000,
'upload_date': '20210315',
'duration': 3519,
'title': 'Det här är himlen',
'timestamp': 1586044800,
'upload_date': '20200405',
'duration': 3515,
'thumbnail': r're:^https?://(?:.*[\.-]jpg|www.svtstatic.se/image/.*)$',
'age_limit': 0,
'subtitles': {
@@ -174,9 +173,6 @@ class SVTPlayIE(SVTPlayBaseIE):
# AssertionError: Expected test_SVTPlay_jNwpV9P.mp4 to be at least 9.77KiB, but it's only 864.00B
'skip_download': True,
},
}, {
'url': 'https://www.svtplay.se/video/30479064/husdrommar/husdrommar-sasong-8-designdrommar-i-stenungsund?modalId=8zVbDPA',
'only_matching': True,
}, {
# geo restricted to Sweden
'url': 'http://www.oppetarkiv.se/video/5219710/trollflojten',
@@ -223,8 +219,7 @@ class SVTPlayIE(SVTPlayBaseIE):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
svt_id = mobj.group('svt_id') or mobj.group('modal_id')
video_id, svt_id = mobj.group('id', 'svt_id')
if svt_id:
return self._extract_by_video_id(svt_id)
@@ -259,7 +254,6 @@ class SVTPlayIE(SVTPlayBaseIE):
if not svt_id:
svt_id = self._search_regex(
(r'<video[^>]+data-video-id=["\']([\da-zA-Z-]+)',
r'<[^>]+\bdata-rt=["\']top-area-play-button["\'][^>]+\bhref=["\'][^"\']*video/%s/[^"\']*\bmodalId=([\da-zA-Z-]+)' % re.escape(video_id),
r'["\']videoSvtId["\']\s*:\s*["\']([\da-zA-Z-]+)',
r'["\']videoSvtId\\?["\']\s*:\s*\\?["\']([\da-zA-Z-]+)',
r'"content"\s*:\s*{.*?"id"\s*:\s*"([\da-zA-Z-]+)"',

View File

@@ -143,10 +143,7 @@ class TikTokIE(TikTokBaseIE):
props_data = try_get(json_data, lambda x: x['props'], expected_type=dict)
# Check statusCode for success
status = props_data.get('pageProps').get('statusCode')
if status == 0:
if props_data.get('pageProps').get('statusCode') == 0:
return self._extract_aweme(props_data, webpage, url)
elif status == 10216:
raise ExtractorError('This video is private', expected=True)
raise ExtractorError('Video not available', video_id=video_id)

View File

@@ -14,7 +14,6 @@ from ..utils import (
class TrovoBaseIE(InfoExtractor):
_VALID_URL_BASE = r'https?://(?:www\.)?trovo\.live/'
_HEADERS = {'Origin': 'https://trovo.live'}
def _extract_streamer_info(self, data):
streamer_info = data.get('streamerInfo') or {}
@@ -69,7 +68,6 @@ class TrovoIE(TrovoBaseIE):
'format_id': format_id,
'height': int_or_none(format_id[:-1]) if format_id else None,
'url': play_url,
'http_headers': self._HEADERS,
})
self._sort_formats(formats)
@@ -155,7 +153,6 @@ class TrovoVodIE(TrovoBaseIE):
'protocol': 'm3u8_native',
'tbr': int_or_none(play_info.get('bitrate')),
'url': play_url,
'http_headers': self._HEADERS,
})
self._sort_formats(formats)

View File

@@ -9,7 +9,6 @@ from ..utils import (
int_or_none,
remove_start,
smuggle_url,
strip_or_none,
try_get,
)
@@ -26,10 +25,6 @@ class TVerIE(InfoExtractor):
}, {
'url': 'https://tver.jp/episode/79622438',
'only_matching': True,
}, {
# subtitle = ' '
'url': 'https://tver.jp/corner/f0068870',
'only_matching': True,
}]
_TOKEN = None
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/default_default/index.html?videoId=%s'
@@ -52,12 +47,8 @@ class TVerIE(InfoExtractor):
}
if service == 'cx':
title = main['title']
subtitle = strip_or_none(main.get('subtitle'))
if subtitle:
title += ' - ' + subtitle
info.update({
'title': title,
'title': main.get('subtitle') or main['title'],
'url': 'https://i.fod.fujitv.co.jp/plus7/web/%s/%s.html' % (p_id[:4], p_id),
'ie_key': 'FujiTVFODPlus7',
})

View File

@@ -23,8 +23,6 @@ class VGTVIE(XstreamIE):
'fvn.no/fvntv': 'fvntv',
'aftenposten.no/webtv': 'aptv',
'ap.vgtv.no/webtv': 'aptv',
'tv.aftonbladet.se': 'abtv',
# obsolete URL schemes, kept in order to save one HTTP redirect
'tv.aftonbladet.se/abtv': 'abtv',
'www.aftonbladet.se/tv': 'abtv',
}
@@ -142,10 +140,6 @@ class VGTVIE(XstreamIE):
'url': 'http://www.vgtv.no/#!/video/127205/inside-the-mind-of-favela-funk',
'only_matching': True,
},
{
'url': 'https://tv.aftonbladet.se/video/36015/vulkanutbrott-i-rymden-nu-slapper-nasa-bilderna',
'only_matching': True,
},
{
'url': 'http://tv.aftonbladet.se/abtv/articles/36015',
'only_matching': True,

View File

@@ -24,7 +24,6 @@ from ..utils import (
merge_dicts,
OnDemandPagedList,
parse_filesize,
parse_iso8601,
RegexNotFoundError,
sanitized_Request,
smuggle_url,
@@ -75,28 +74,25 @@ class VimeoBaseInfoExtractor(InfoExtractor):
expected=True)
raise ExtractorError('Unable to log in')
def _get_video_password(self):
def _verify_video_password(self, url, video_id, webpage):
password = self._downloader.params.get('videopassword')
if password is None:
raise ExtractorError(
'This video is protected by a password, use the --video-password option',
expected=True)
return password
def _verify_video_password(self, url, video_id, password, token, vuid):
raise ExtractorError('This video is protected by a password, use the --video-password option', expected=True)
token, vuid = self._extract_xsrft_and_vuid(webpage)
data = urlencode_postdata({
'password': password,
'token': token,
})
if url.startswith('http://'):
# vimeo only supports https now, but the user can give an http url
url = url.replace('http://', 'https://')
password_request = sanitized_Request(url + '/password', data)
password_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
password_request.add_header('Referer', url)
self._set_vimeo_cookie('vuid', vuid)
return self._download_webpage(
url + '/password', video_id, 'Verifying the password',
'Wrong password', data=urlencode_postdata({
'password': password,
'token': token,
}), headers={
'Content-Type': 'application/x-www-form-urlencoded',
'Referer': url,
})
password_request, video_id,
'Verifying the password', 'Wrong password')
def _extract_xsrft_and_vuid(self, webpage):
xsrft = self._search_regex(
@@ -277,7 +273,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
)?
(?:videos?/)?
(?P<id>[0-9]+)
(?:/(?P<unlisted_hash>[\da-f]{10}))?
(?:/[\da-f]+)?
/?(?:[?&].*)?(?:[#].*)?$
'''
IE_NAME = 'vimeo'
@@ -330,9 +326,9 @@ class VimeoIE(VimeoBaseInfoExtractor):
'id': '54469442',
'ext': 'mp4',
'title': 'Kathy Sierra: Building the minimum Badass User, Business of Software 2012',
'uploader': 'Business of Software',
'uploader_url': r're:https?://(?:www\.)?vimeo\.com/businessofsoftware',
'uploader_id': 'businessofsoftware',
'uploader': 'The BLN & Business of Software',
'uploader_url': r're:https?://(?:www\.)?vimeo\.com/theblnbusinessofsoftware',
'uploader_id': 'theblnbusinessofsoftware',
'duration': 3610,
'description': None,
},
@@ -467,7 +463,6 @@ class VimeoIE(VimeoBaseInfoExtractor):
'skip_download': True,
},
'expected_warnings': ['Unable to download JSON metadata'],
'skip': 'this page is no longer available.',
},
{
'url': 'http://player.vimeo.com/video/68375962',
@@ -503,24 +498,6 @@ class VimeoIE(VimeoBaseInfoExtractor):
'url': 'https://vimeo.com/album/2632481/video/79010983',
'only_matching': True,
},
{
'url': 'https://vimeo.com/showcase/3253534/video/119195465',
'note': 'A video in a password protected album (showcase)',
'info_dict': {
'id': '119195465',
'ext': 'mp4',
'title': 'youtube-dl test video \'ä"BaW_jenozKc',
'uploader': 'Philipp Hagemeister',
'uploader_id': 'user20132939',
'description': 'md5:fa7b6c6d8db0bdc353893df2f111855b',
'upload_date': '20150209',
'timestamp': 1423518307,
},
'params': {
'format': 'best[protocol=https]',
'videopassword': 'youtube-dl',
},
},
{
# source file returns 403: Forbidden
'url': 'https://vimeo.com/7809605',
@@ -568,7 +545,9 @@ class VimeoIE(VimeoBaseInfoExtractor):
return urls[0] if urls else None
def _verify_player_video_password(self, url, video_id, headers):
password = self._get_video_password()
password = self._downloader.params.get('videopassword')
if password is None:
raise ExtractorError('This video is protected by a password, use the --video-password option', expected=True)
data = urlencode_postdata({
'password': base64.b64encode(password.encode()),
})
@@ -585,44 +564,6 @@ class VimeoIE(VimeoBaseInfoExtractor):
def _real_initialize(self):
self._login()
def _try_album_password(self, url):
album_id = self._search_regex(
r'vimeo\.com/(?:album|showcase)/([^/]+)', url, 'album id', default=None)
if not album_id:
return
viewer = self._download_json(
'https://vimeo.com/_rv/viewer', album_id, fatal=False)
if not viewer:
webpage = self._download_webpage(url, album_id)
viewer = self._parse_json(self._search_regex(
r'bootstrap_data\s*=\s*({.+?})</script>',
webpage, 'bootstrap data'), album_id)['viewer']
jwt = viewer['jwt']
album = self._download_json(
'https://api.vimeo.com/albums/' + album_id,
album_id, headers={'Authorization': 'jwt ' + jwt},
query={'fields': 'description,name,privacy'})
if try_get(album, lambda x: x['privacy']['view']) == 'password':
password = self._downloader.params.get('videopassword')
if not password:
raise ExtractorError(
'This album is protected by a password, use the --video-password option',
expected=True)
self._set_vimeo_cookie('vuid', viewer['vuid'])
try:
self._download_json(
'https://vimeo.com/showcase/%s/auth' % album_id,
album_id, 'Verifying the password', data=urlencode_postdata({
'password': password,
'token': viewer['xsrft'],
}), headers={
'X-Requested-With': 'XMLHttpRequest',
})
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
raise ExtractorError('Wrong password', expected=True)
raise
def _real_extract(self, url):
url, data = unsmuggle_url(url, {})
headers = std_headers.copy()
@@ -631,37 +572,11 @@ class VimeoIE(VimeoBaseInfoExtractor):
if 'Referer' not in headers:
headers['Referer'] = url
# Extract ID from URL
video_id, unlisted_hash = re.match(self._VALID_URL, url).groups()
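# For unlisted videos (a numeric id followed by a hash) a short-lived viewer
# JWT from /_rv/jwt authorizes the api.vimeo.com lookup of '<id>:<hash>'
# performed below.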
if unlisted_hash:
token = self._download_json(
'https://vimeo.com/_rv/jwt', video_id, headers={
'X-Requested-With': 'XMLHttpRequest'
})['token']
video = self._download_json(
'https://api.vimeo.com/videos/%s:%s' % (video_id, unlisted_hash),
video_id, headers={
'Authorization': 'jwt ' + token,
}, query={
'fields': 'config_url,created_time,description,license,metadata.connections.comments.total,metadata.connections.likes.total,release_time,stats.plays',
})
info = self._parse_config(self._download_json(
video['config_url'], video_id), video_id)
self._vimeo_sort_formats(info['formats'])
get_timestamp = lambda x: parse_iso8601(video.get(x + '_time'))
info.update({
'description': video.get('description'),
'license': video.get('license'),
'release_timestamp': get_timestamp('release'),
'timestamp': get_timestamp('created'),
'view_count': int_or_none(try_get(video, lambda x: x['stats']['plays'])),
})
connections = try_get(
video, lambda x: x['metadata']['connections'], dict) or {}
for k in ('comment', 'like'):
info[k + '_count'] = int_or_none(try_get(connections, lambda x: x[k + 's']['total']))
return info
channel_id = self._search_regex(
r'vimeo\.com/channels/([^/]+)', url, 'channel id', default=None)
# Extract ID from URL
video_id = self._match_id(url)
orig_url = url
is_pro = 'vimeopro.com/' in url
is_player = '://player.vimeo.com/video/' in url
@@ -676,7 +591,6 @@ class VimeoIE(VimeoBaseInfoExtractor):
elif any(p in url for p in ('play_redirect_hls', 'moogaloop.swf')):
url = 'https://vimeo.com/' + video_id
self._try_album_password(url)
try:
# Retrieve video webpage to extract further information
webpage, urlh = self._download_webpage_handle(
@@ -751,10 +665,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
if re.search(r'<form[^>]+?id="pw_form"', webpage) is not None:
if '_video_password_verified' in data:
raise ExtractorError('video password verification failed!')
video_password = self._get_video_password()
token, vuid = self._extract_xsrft_and_vuid(webpage)
self._verify_video_password(
redirect_url, video_id, video_password, token, vuid)
self._verify_video_password(redirect_url, video_id, webpage)
return self._real_extract(
smuggle_url(redirect_url, {'_video_password_verified': 'verified'}))
else:
@@ -840,8 +751,6 @@ class VimeoIE(VimeoBaseInfoExtractor):
r'<link[^>]+rel=["\']license["\'][^>]+href=(["\'])(?P<license>(?:(?!\1).)+)\1',
webpage, 'license', default=None, group='license')
channel_id = self._search_regex(
r'vimeo\.com/channels/([^/]+)', url, 'channel id', default=None)
channel_url = 'https://vimeo.com/channels/%s' % channel_id if channel_id else None
info_dict = {
@@ -1025,15 +934,11 @@ class VimeoAlbumIE(VimeoBaseInfoExtractor):
}
if hashed_pass:
query['_hashed_pass'] = hashed_pass
try:
videos = self._download_json(
'https://api.vimeo.com/albums/%s/videos' % album_id,
album_id, 'Downloading page %d' % api_page, query=query, headers={
'Authorization': 'jwt ' + authorization,
})['data']
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
return
videos = self._download_json(
'https://api.vimeo.com/albums/%s/videos' % album_id,
album_id, 'Downloading page %d' % api_page, query=query, headers={
'Authorization': 'jwt ' + authorization,
})['data']
for video in videos:
link = video.get('link')
if not link:
@@ -1148,23 +1053,10 @@ class VimeoReviewIE(VimeoBaseInfoExtractor):
def _real_extract(self, url):
page_url, video_id = re.match(self._VALID_URL, url).groups()
data = self._download_json(
page_url.replace('/review/', '/review/data/'), video_id)
if data.get('isLocked') is True:
video_password = self._get_video_password()
viewer = self._download_json(
'https://vimeo.com/_rv/viewer', video_id)
webpage = self._verify_video_password(
'https://vimeo.com/' + video_id, video_id,
video_password, viewer['xsrft'], viewer['vuid'])
clip_page_config = self._parse_json(self._search_regex(
r'window\.vimeo\.clip_page_config\s*=\s*({.+?});',
webpage, 'clip page config'), video_id)
config_url = clip_page_config['player']['config_url']
clip_data = clip_page_config.get('clip') or {}
else:
clip_data = data['clipData']
config_url = clip_data['configUrl']
clip_data = self._download_json(
page_url.replace('/review/', '/review/data/'),
video_id)['clipData']
config_url = clip_data['configUrl']
config = self._download_json(config_url, video_id)
info_dict = self._parse_config(config, video_id)
source_format = self._extract_original_format(

View File

@@ -113,7 +113,7 @@ class VLiveIE(VLiveBaseIE):
raise ExtractorError('Unable to log in', expected=True)
def _call_api(self, path_template, video_id, fields=None, limit=None):
query = {'appId': self._APP_ID, 'gcc': 'KR', 'platformType': 'PC'}
query = {'appId': self._APP_ID, 'gcc': 'KR'}
if fields:
query['fields'] = fields
if limit:

View File

@@ -7,8 +7,6 @@ from ..compat import compat_urllib_parse_unquote
from ..utils import (
ExtractorError,
int_or_none,
try_get,
unified_timestamp,
)
@@ -21,17 +19,14 @@ class VoxMediaVolumeIE(OnceIE):
setup = self._parse_json(self._search_regex(
r'setup\s*=\s*({.+});', webpage, 'setup'), video_id)
player_setup = setup.get('player_setup') or setup
video_data = player_setup.get('video') or {}
formatted_metadata = video_data.get('formatted_metadata') or {}
video_data = setup.get('video') or {}
info = {
'id': video_id,
'title': player_setup.get('title') or video_data.get('title_short'),
'title': video_data.get('title_short'),
'description': video_data.get('description_long') or video_data.get('description_short'),
'thumbnail': formatted_metadata.get('thumbnail') or video_data.get('brightcove_thumbnail'),
'timestamp': unified_timestamp(formatted_metadata.get('video_publish_date')),
'thumbnail': video_data.get('brightcove_thumbnail')
}
asset = try_get(setup, lambda x: x['embed_assets']['chorus'], dict) or {}
asset = setup.get('asset') or setup.get('params') or {}
formats = []
hls_url = asset.get('hls_url')
@@ -52,7 +47,6 @@ class VoxMediaVolumeIE(OnceIE):
if formats:
self._sort_formats(formats)
info['formats'] = formats
info['duration'] = int_or_none(asset.get('duration'))
return info
for provider_video_type in ('ooyala', 'youtube', 'brightcove'):
@@ -90,7 +84,7 @@ class VoxMediaIE(InfoExtractor):
}, {
# Volume embed, Youtube
'url': 'http://www.theverge.com/2014/10/21/7025853/google-nexus-6-hands-on-photos-video-android-phablet',
'md5': 'fd19aa0cf3a0eea515d4fd5c8c0e9d68',
'md5': '4c8f4a0937752b437c3ebc0ed24802b5',
'info_dict': {
'id': 'Gy8Md3Eky38',
'ext': 'mp4',
@@ -99,7 +93,6 @@ class VoxMediaIE(InfoExtractor):
'uploader_id': 'TheVerge',
'upload_date': '20141021',
'uploader': 'The Verge',
'timestamp': 1413907200,
},
'add_ie': ['Youtube'],
'skip': 'similar to the previous test',
@@ -107,13 +100,13 @@ class VoxMediaIE(InfoExtractor):
# Volume embed, Youtube
'url': 'http://www.vox.com/2016/3/31/11336640/mississippi-lgbt-religious-freedom-bill',
'info_dict': {
'id': '22986359b',
'id': 'YCjDnX-Xzhg',
'ext': 'mp4',
'title': "Mississippi's laws are so bad that its anti-LGBTQ law isn't needed to allow discrimination",
'description': 'md5:fc1317922057de31cd74bce91eb1c66c',
'uploader_id': 'voxdotcom',
'upload_date': '20150915',
'timestamp': 1442332800,
'duration': 285,
'uploader': 'Vox',
},
'add_ie': ['Youtube'],
'skip': 'similar to the previous test',
@@ -167,9 +160,6 @@ class VoxMediaIE(InfoExtractor):
'ext': 'mp4',
'title': 'Post-Post-PC CEO: The Full Code Conference Video of Microsoft\'s Satya Nadella',
'description': 'The longtime veteran was chosen earlier this year as the software giant\'s third leader in its history.',
'timestamp': 1402938000,
'upload_date': '20140616',
'duration': 4114,
},
'add_ie': ['VoxMediaVolume'],
}]

View File

@@ -182,20 +182,17 @@ class VVVVIDIE(InfoExtractor):
if not embed_code:
continue
embed_code = ds(embed_code)
if video_type == 'video/kenc':
embed_code = re.sub(r'https?(://[^/]+)/z/', r'https\1/i/', embed_code).replace('/manifest.f4m', '/master.m3u8')
kenc = self._download_json(
'https://www.vvvvid.it/kenc', video_id, query={
'action': 'kt',
'conn_id': self._conn_id,
'url': embed_code,
}, fatal=False) or {}
kenc_message = kenc.get('message')
if kenc_message:
embed_code += '?' + ds(kenc_message)
formats.extend(self._extract_m3u8_formats(
embed_code, video_id, 'mp4', m3u8_id='hls', fatal=False))
elif video_type == 'video/rcs':
if video_type in ('video/rcs', 'video/kenc'):
if video_type == 'video/kenc':
kenc = self._download_json(
'https://www.vvvvid.it/kenc', video_id, query={
'action': 'kt',
'conn_id': self._conn_id,
'url': embed_code,
}, fatal=False) or {}
kenc_message = kenc.get('message')
if kenc_message:
embed_code += '?' + ds(kenc_message)
formats.extend(self._extract_akamai_formats(embed_code, video_id))
elif video_type == 'video/youtube':
info.update({
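One side of this hunk rewrites `video/kenc` embeds from Akamai HDS to HLS before appending the decrypted kenc token as a query string. A sketch of just that URL transformation (the function name and the example URL are made up for illustration):

import re

def kenc_media_url(embed_code, kenc_message_decoded=None):
    # Switch the Akamai path from HDS (/z/ + manifest.f4m) to HLS
    # (/i/ + master.m3u8), then append the decrypted kenc token if any.
    url = re.sub(r'https?(://[^/]+)/z/', r'https\1/i/', embed_code)
    url = url.replace('/manifest.f4m', '/master.m3u8')
    if kenc_message_decoded:
        url += '?' + kenc_message_decoded
    return url

kenc_media_url('http://example-vh.akamaihd.net/z/foo/manifest.f4m', 'token=abc')
returns 'https://example-vh.akamaihd.net/i/foo/master.m3u8?token=abc'.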

View File

@@ -1,163 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
parse_duration,
urlencode_postdata,
ExtractorError,
)
class WimTVIE(InfoExtractor):
_player = None
_UUID_RE = r'[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12}'
_VALID_URL = r'''(?x)
https?://platform.wim.tv/
(?:
(?:embed/)?\?
|\#/webtv/.+?/
)
(?P<type>vod|live|cast)[=/]
(?P<id>%s).*?''' % _UUID_RE
_TESTS = [{
# vod stream
'url': 'https://platform.wim.tv/embed/?vod=db29fb32-bade-47b6-a3a6-cb69fe80267a',
'md5': 'db29fb32-bade-47b6-a3a6-cb69fe80267a',
'info_dict': {
'id': 'db29fb32-bade-47b6-a3a6-cb69fe80267a',
'ext': 'mp4',
'title': 'AMA SUPERCROSS 2020 - R2 ST. LOUIS',
'duration': 6481,
'thumbnail': r're:https?://.+?/thumbnail/.+?/720$'
},
'params': {
'skip_download': True,
},
}, {
# live stream
'url': 'https://platform.wim.tv/embed/?live=28e22c22-49db-40f3-8c37-8cbb0ff44556&autostart=true',
'info_dict': {
'id': '28e22c22-49db-40f3-8c37-8cbb0ff44556',
'ext': 'mp4',
'title': 'Streaming MSmotorTV',
'is_live': True,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://platform.wim.tv/#/webtv/automotornews/vod/422492b6-539e-474d-9c6b-68c9d5893365',
'only_matching': True,
}, {
'url': 'https://platform.wim.tv/#/webtv/renzoarborechannel/cast/f47e0d15-5b45-455e-bf0d-dba8ffa96365',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return [
mobj.group('url')
for mobj in re.finditer(
r'<iframe[^>]+src=["\'](?P<url>%s)' % WimTVIE._VALID_URL,
webpage)]
def _real_initialize(self):
if not self._player:
self._get_player_data()
def _get_player_data(self):
msg_id = 'Player data'
self._player = {}
datas = [{
'url': 'https://platform.wim.tv/common/libs/player/wimtv/wim-rest.js',
'vars': [{
'regex': r'appAuth = "(.+?)"',
'variable': 'app_auth',
}]
}, {
'url': 'https://platform.wim.tv/common/config/endpointconfig.js',
'vars': [{
'regex': r'PRODUCTION_HOSTNAME_THUMB = "(.+?)"',
'variable': 'thumb_server',
}, {
'regex': r'PRODUCTION_HOSTNAME_THUMB\s*\+\s*"(.+?)"',
'variable': 'thumb_server_path',
}]
}]
for data in datas:
temp = self._download_webpage(data['url'], msg_id)
for var in data['vars']:
val = self._search_regex(var['regex'], temp, msg_id)
if not val:
raise ExtractorError('%s not found' % var['variable'])
self._player[var['variable']] = val
def _generate_token(self):
json = self._download_json(
'https://platform.wim.tv/wimtv-server/oauth/token', 'Token generation',
headers={'Authorization': 'Basic %s' % self._player['app_auth']},
data=urlencode_postdata({'grant_type': 'client_credentials'}))
token = json.get('access_token')
if not token:
raise ExtractorError('access token not generated')
return token
def _generate_thumbnail(self, thumb_id, width='720'):
if not thumb_id or not self._player.get('thumb_server'):
return None
if not self._player.get('thumb_server_path'):
self._player['thumb_server_path'] = ''
return '%s%s/asset/thumbnail/%s/%s' % (
self._player['thumb_server'],
self._player['thumb_server_path'],
thumb_id, width)
def _real_extract(self, url):
urlc = re.match(self._VALID_URL, url).groupdict()
video_id = urlc['id']
stream_type = is_live = None
if urlc['type'] in {'live', 'cast'}:
stream_type = urlc['type'] + '/channel'
is_live = True
else:
stream_type = 'vod'
is_live = False
token = self._generate_token()
json = self._download_json(
'https://platform.wim.tv/wimtv-server/api/public/%s/%s/play' % (
stream_type, video_id), video_id,
headers={'Authorization': 'Bearer %s' % token,
'Content-Type': 'application/json'},
data=bytes('{}', 'utf-8'))
formats = []
for src in json.get('srcs') or []:
if src.get('mimeType') == 'application/x-mpegurl':
formats.extend(
self._extract_m3u8_formats(
src.get('uniqueStreamer'), video_id, 'mp4'))
if src.get('mimeType') == 'video/flash':
formats.append({
'format_id': 'rtmp',
'url': src.get('uniqueStreamer'),
'ext': determine_ext(src.get('uniqueStreamer'), 'flv'),
'rtmp_live': is_live,
})
json = json.get('resource')
thumb = self._generate_thumbnail(json.get('thumbnailId'))
self._sort_formats(formats)
return {
'id': video_id,
'title': json.get('title') or json.get('name'),
'duration': parse_duration(json.get('duration')),
'formats': formats,
'thumbnail': thumb,
'is_live': is_live,
}
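The removed WimTV extractor authenticates with an OAuth client-credentials grant, using a Basic credential scraped out of the site's wim-rest.js. A standalone sketch of that token step, standard library only; `app_auth` is assumed to be the already-scraped value:

import json
import urllib.parse
import urllib.request

def generate_wimtv_token(app_auth):
    # POST a client_credentials grant to the WimTV token endpoint.
    data = urllib.parse.urlencode({'grant_type': 'client_credentials'}).encode()
    req = urllib.request.Request(
        'https://platform.wim.tv/wimtv-server/oauth/token', data=data,
        headers={'Authorization': 'Basic %s' % app_auth})
    with urllib.request.urlopen(req) as resp:
        token = json.load(resp).get('access_token')
    if not token:
        raise RuntimeError('access token not generated')
    return token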

File diff suppressed because it is too large

View File

@@ -4,7 +4,6 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
parse_age_limit,
@@ -17,34 +16,24 @@ from ..utils import (
class Zee5IE(InfoExtractor):
_VALID_URL = r'''(?x)
(?:
zee5:|
(?:https?://)(?:www\.)?zee5\.com/(?:[^#?]+/)?
(?:
(?:tvshows|kids|zee5originals)(?:/[^#/?]+){3}
|movies/[^#/?]+
)/(?P<display_id>[^#/?]+)/
)
(?P<id>[^#/?]+)/?(?:$|[?#])
'''
_VALID_URL = r'https?://(?:www\.)?zee5\.com/[^#?]*/(?P<display_id>[-\w]+)/(?P<id>[-\d]+)'
_TESTS = [{
'url': 'https://www.zee5.com/movies/details/krishna-the-birth/0-0-63098',
'info_dict': {
'id': '0-0-63098',
'ext': 'mp4',
'display_id': 'krishna-the-birth',
'title': 'Krishna - The Birth',
'duration': 4368,
'average_rating': 4,
'description': str,
'alt_title': 'Krishna - The Birth',
'uploader': 'Zee Entertainment Enterprises Ltd',
'release_date': '20060101',
'upload_date': '20060101',
'timestamp': 1136073600,
'thumbnail': 'https://akamaividz.zee5.com/resources/0-0-63098/list/270x152/0063098_list_80888170.jpg',
'tags': list
"id": "0-0-63098",
"ext": "m3u8",
"display_id": "krishna-the-birth",
"title": "Krishna - The Birth",
"duration": 4368,
"average_rating": 4,
"description": str,
"alt_title": "Krishna - The Birth",
"uploader": "Zee Entertainment Enterprises Ltd",
"release_date": "20060101",
"upload_date": "20060101",
"timestamp": 1136073600,
"thumbnail": "https://akamaividz.zee5.com/resources/0-0-63098/list/270x152/0063098_list_80888170.jpg",
"tags": list
},
'params': {
'format': 'bv',
@@ -52,43 +41,37 @@ class Zee5IE(InfoExtractor):
}, {
'url': 'https://zee5.com/tvshows/details/krishna-balram/0-6-1871/episode-1-the-test-of-bramha/0-1-233402',
'info_dict': {
'id': '0-1-233402',
'ext': 'mp4',
'display_id': 'episode-1-the-test-of-bramha',
'title': 'Episode 1 - The Test Of Bramha',
'duration': 1336,
'average_rating': 4,
'description': str,
'alt_title': 'Episode 1 - The Test Of Bramha',
'uploader': 'Green Gold',
'release_date': '20090101',
'upload_date': '20090101',
'timestamp': 1230768000,
'thumbnail': 'https://akamaividz.zee5.com/resources/0-1-233402/list/270x152/01233402_list.jpg',
'series': 'Krishna Balram',
'season_number': 1,
'episode_number': 1,
'tags': list,
"id": "0-1-233402",
'ext': 'm3u8',
"display_id": "episode-1-the-test-of-bramha",
"title": "Episode 1 - The Test Of Bramha",
"duration": 1336,
"average_rating": 4,
"description": str,
"alt_title": "Episode 1 - The Test Of Bramha",
"uploader": "Green Gold",
"release_date": "20090101",
"upload_date": "20090101",
"timestamp": 1230768000,
"thumbnail": "https://akamaividz.zee5.com/resources/0-1-233402/list/270x152/01233402_list.jpg",
"series": "Krishna Balram",
"season_number": 1,
"episode_number": 1,
"tags": list,
},
'params': {
'format': 'bv',
},
}, {
'url': 'https://www.zee5.com/hi/tvshows/details/kundali-bhagya/0-6-366/kundali-bhagya-march-08-2021/0-1-manual_7g9jv1os7730?country=IN',
'only_matching': True
}, {
'url': 'https://www.zee5.com/global/hi/tvshows/details/kundali-bhagya/0-6-366/kundali-bhagya-march-08-2021/0-1-manual_7g9jv1os7730',
'only_matching': True
}]
def _real_extract(self, url):
video_id, display_id = re.match(self._VALID_URL, url).group('id', 'display_id')
access_token_request = self._download_json(
'https://useraction.zee5.com/token/platform_tokens.php?platform_name=web_app',
video_id, note='Downloading access token')
video_id, note="Downloading access token")
token_request = self._download_json(
'https://useraction.zee5.com/tokennd',
video_id, note='Downloading video token')
video_id, note="Downloading video token")
json_data = self._download_json(
'https://gwapi.zee5.com/content/details/{}?translation=en&country=IN'.format(video_id),
video_id, headers={'X-Access-Token': access_token_request['token']})
@@ -128,78 +111,3 @@ class Zee5IE(InfoExtractor):
'episode_number': int_or_none(try_get(json_data, lambda x: x['index'])),
'tags': try_get(json_data, lambda x: x['tags'], list)
}
class Zee5SeriesIE(InfoExtractor):
IE_NAME = 'zee5:series'
_VALID_URL = r'''(?x)
(?:
zee5:series:|
(?:https?://)(?:www\.)?zee5\.com/(?:[^#?]+/)?
(?:tvshows|kids|zee5originals)(?:/[^#/?]+){2}/
)
(?P<id>[^#/?]+)/?(?:$|[?#])
'''
_TESTS = [{
'url': 'https://www.zee5.com/kids/kids-shows/krishna-balram/0-6-1871',
'playlist_mincount': 43,
'info_dict': {
'id': '0-6-1871',
},
}, {
'url': 'https://www.zee5.com/tvshows/details/bhabi-ji-ghar-par-hai/0-6-199',
'playlist_mincount': 1500,
'info_dict': {
'id': '0-6-199',
},
}, {
'url': 'https://www.zee5.com/tvshows/details/agent-raghav-crime-branch/0-6-965',
'playlist_mincount': 25,
'info_dict': {
'id': '0-6-965',
},
}, {
'url': 'https://www.zee5.com/ta/tvshows/details/nagabhairavi/0-6-3201',
'playlist_mincount': 3,
'info_dict': {
'id': '0-6-3201',
},
}, {
'url': 'https://www.zee5.com/global/hi/tvshows/details/khwaabon-ki-zamin-par/0-6-270',
'playlist_mincount': 150,
'info_dict': {
'id': '0-6-270',
},
}
]
def _entries(self, show_id):
access_token_request = self._download_json(
'https://useraction.zee5.com/token/platform_tokens.php?platform_name=web_app',
show_id, note='Downloading access token')
headers = {
'X-Access-Token': access_token_request['token'],
'Referer': 'https://www.zee5.com/',
}
show_url = 'https://gwapi.zee5.com/content/tvshow/{}?translation=en&country=IN'.format(show_id)
page_num = 0
show_json = self._download_json(show_url, video_id=show_id, headers=headers)
for season in show_json.get('seasons') or []:
season_id = try_get(season, lambda x: x['id'], compat_str)
next_url = 'https://gwapi.zee5.com/content/tvshow/?season_id={}&type=episode&translation=en&country=IN&on_air=false&asset_subtype=tvshow&page=1&limit=100'.format(season_id)
while next_url:
page_num += 1
episodes_json = self._download_json(
next_url, video_id=show_id, headers=headers,
note='Downloading JSON metadata page %d' % page_num)
for episode in try_get(episodes_json, lambda x: x['episode'], list) or []:
video_id = episode.get('id')
yield self.url_result(
'zee5:%s' % video_id,
ie=Zee5IE.ie_key(), video_id=video_id)
next_url = url_or_none(episodes_json.get('next_episode_api'))
def _real_extract(self, url):
show_id = self._match_id(url)
return self.playlist_result(self._entries(show_id), playlist_id=show_id)
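`Zee5SeriesIE._entries` pages through each season 100 episodes at a time, following the `next_episode_api` URL returned by each page until it runs out. The pagination pattern, reduced to a generator sketch where `fetch_json` stands in for `_download_json` with the auth headers:

def paginate_episode_ids(fetch_json, first_page_url):
    # Follow the API's own next-page links; an absent or empty
    # 'next_episode_api' ends the season.
    url = first_page_url
    while url:
        page = fetch_json(url)
        for episode in page.get('episode') or []:
            yield episode.get('id')
        url = page.get('next_episode_api') or None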

View File

@@ -1,94 +1,93 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
update_url_query,
)
class ZingMp3BaseIE(InfoExtractor):
_VALID_URL_TMPL = r'https?://(?:mp3\.zing|zingmp3)\.vn/(?:%s)/[^/]+/(?P<id>\w+)\.html'
_GEO_COUNTRIES = ['VN']
class ZingMp3BaseInfoExtractor(InfoExtractor):
def _extract_item(self, item, fatal):
item_id = item['id']
title = item.get('name') or item['title']
formats = []
for k, v in (item.get('source') or {}).items():
if not v:
continue
if k in ('mp4', 'hls'):
for res, video_url in v.items():
if not video_url:
continue
if k == 'hls':
formats.extend(self._extract_m3u8_formats(
video_url, item_id, 'mp4',
'm3u8_native', m3u8_id=k, fatal=False))
elif k == 'mp4':
formats.append({
'format_id': 'mp4-' + res,
'url': video_url,
'height': int_or_none(self._search_regex(
r'^(\d+)p', res, 'resolution', default=None)),
})
else:
formats.append({
'ext': 'mp3',
'format_id': k,
'tbr': int_or_none(k),
'url': self._proto_relative_url(v),
'vcodec': 'none',
})
if not formats:
def _extract_item(self, item, page_type, fatal=True):
error_message = item.get('msg')
if error_message:
if not fatal:
return
msg = item['msg']
if msg == 'Sorry, this content is not available in your country.':
self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
raise ExtractorError(msg, expected=True)
self._sort_formats(formats)
raise ExtractorError(
'%s returned error: %s' % (self.IE_NAME, error_message),
expected=True)
subtitles = None
lyric = item.get('lyric')
if lyric:
subtitles = {
'origin': [{
'url': lyric,
}],
formats = []
for quality, source_url in zip(item.get('qualities') or item.get('quality', []), item.get('source_list') or item.get('source', [])):
if not source_url or source_url == 'require vip':
continue
if not re.match(r'https?://', source_url):
source_url = '//' + source_url
source_url = self._proto_relative_url(source_url, 'http:')
quality_num = int_or_none(quality)
f = {
'format_id': quality,
'url': source_url,
}
if page_type == 'video':
f.update({
'height': quality_num,
'ext': 'mp4',
})
else:
f.update({
'abr': quality_num,
'ext': 'mp3',
})
formats.append(f)
album = item.get('album') or {}
cover = item.get('cover')
return {
'id': item_id,
'title': title,
'title': (item.get('name') or item.get('title')).strip(),
'formats': formats,
'thumbnail': item.get('thumbnail'),
'subtitles': subtitles,
'duration': int_or_none(item.get('duration')),
'track': title,
'artist': item.get('artists_names'),
'album': album.get('name') or album.get('title'),
'album_artist': album.get('artists_names'),
'thumbnail': 'http:/' + cover if cover else None,
'artist': item.get('artist'),
}
def _real_extract(self, url):
page_id = self._match_id(url)
webpage = self._download_webpage(
url.replace('://zingmp3.vn/', '://mp3.zing.vn/'),
page_id, query={'play_song': 1})
data_path = self._search_regex(
r'data-xml="([^"]+)', webpage, 'data path')
return self._process_data(self._download_json(
'https://mp3.zing.vn/xhr' + data_path, page_id)['data'])
def _extract_player_json(self, player_json_url, id, page_type, playlist_title=None):
player_json = self._download_json(player_json_url, id, 'Downloading Player JSON')
items = player_json['data']
if 'item' in items:
items = items['item']
if len(items) == 1:
# one single song
data = self._extract_item(items[0], page_type)
data['id'] = id
return data
else:
# playlist of songs
entries = []
for i, item in enumerate(items, 1):
entry = self._extract_item(item, page_type, fatal=False)
if not entry:
continue
entry['id'] = '%s-%d' % (id, i)
entries.append(entry)
return {
'_type': 'playlist',
'id': id,
'title': playlist_title,
'entries': entries,
}
class ZingMp3IE(ZingMp3BaseIE):
_VALID_URL = ZingMp3BaseIE._VALID_URL_TMPL % 'bai-hat|video-clip'
class ZingMp3IE(ZingMp3BaseInfoExtractor):
_VALID_URL = r'https?://mp3\.zing\.vn/(?:bai-hat|album|playlist|video-clip)/[^/]+/(?P<id>\w+)\.html'
_TESTS = [{
'url': 'http://mp3.zing.vn/bai-hat/Xa-Mai-Xa-Bao-Thy/ZWZB9WAB.html',
'md5': 'ead7ae13693b3205cbc89536a077daed',
@@ -96,66 +95,49 @@ class ZingMp3IE(ZingMp3BaseIE):
'id': 'ZWZB9WAB',
'title': 'Xa Mãi Xa',
'ext': 'mp3',
'thumbnail': r're:^https?://.+\.jpg',
'subtitles': {
'origin': [{
'ext': 'lrc',
}]
},
'duration': 255,
'track': 'Xa Mãi Xa',
'artist': 'Bảo Thy',
'album': 'Special Album',
'album_artist': 'Bảo Thy',
'thumbnail': r're:^https?://.*\.jpg$',
},
}, {
'url': 'https://mp3.zing.vn/video-clip/Suong-Hoa-Dua-Loi-K-ICM-RYO/ZO8ZF7C7.html',
'md5': 'e9c972b693aa88301ef981c8151c4343',
'url': 'http://mp3.zing.vn/video-clip/Let-It-Go-Frozen-OST-Sungha-Jung/ZW6BAEA0.html',
'md5': '870295a9cd8045c0e15663565902618d',
'info_dict': {
'id': 'ZO8ZF7C7',
'title': 'Sương Hoa Đưa Lối',
'id': 'ZW6BAEA0',
'title': 'Let It Go (Frozen OST)',
'ext': 'mp4',
'thumbnail': r're:^https?://.+\.jpg',
'duration': 207,
'track': 'Sương Hoa Đưa Lối',
'artist': 'K-ICM, RYO',
},
}, {
'url': 'https://zingmp3.vn/bai-hat/Xa-Mai-Xa-Bao-Thy/ZWZB9WAB.html',
'url': 'http://mp3.zing.vn/album/Lau-Dai-Tinh-Ai-Bang-Kieu-Minh-Tuyet/ZWZBWDAF.html',
'info_dict': {
'_type': 'playlist',
'id': 'ZWZBWDAF',
'title': 'Lâu Đài Tình Ái - Bằng Kiều,Minh Tuyết | Album 320 lossless',
},
'playlist_count': 10,
'skip': 'removed at the request of the owner',
}, {
'url': 'http://mp3.zing.vn/playlist/Duong-Hong-Loan-apollobee/IWCAACCB.html',
'only_matching': True,
}]
IE_NAME = 'zingmp3'
IE_DESC = 'mp3.zing.vn'
def _process_data(self, data):
return self._extract_item(data, True)
def _real_extract(self, url):
page_id = self._match_id(url)
webpage = self._download_webpage(url, page_id)
class ZingMp3AlbumIE(ZingMp3BaseIE):
_VALID_URL = ZingMp3BaseIE._VALID_URL_TMPL % 'album|playlist'
_TESTS = [{
'url': 'http://mp3.zing.vn/album/Lau-Dai-Tinh-Ai-Bang-Kieu-Minh-Tuyet/ZWZBWDAF.html',
'info_dict': {
'_type': 'playlist',
'id': 'ZWZBWDAF',
'title': 'Lâu Đài Tình Ái',
},
'playlist_count': 10,
}, {
'url': 'http://mp3.zing.vn/playlist/Duong-Hong-Loan-apollobee/IWCAACCB.html',
'only_matching': True,
}, {
'url': 'https://zingmp3.vn/album/Lau-Dai-Tinh-Ai-Bang-Kieu-Minh-Tuyet/ZWZBWDAF.html',
'only_matching': True,
}]
IE_NAME = 'zingmp3:album'
player_json_url = self._search_regex([
r'data-xml="([^"]+)',
r'&amp;xmlURL=([^&]+)&'
], webpage, 'player xml url')
def _process_data(self, data):
def entries():
for item in (data.get('items') or []):
entry = self._extract_item(item, False)
if entry:
yield entry
info = data.get('info') or {}
return self.playlist_result(
entries(), info.get('id'), info.get('name') or info.get('title'))
playlist_title = None
page_type = self._search_regex(r'/(?:html5)?xml/([^/-]+)', player_json_url, 'page type')
if page_type == 'video':
player_json_url = update_url_query(player_json_url, {'format': 'json'})
else:
player_json_url = player_json_url.replace('/xml/', '/html5xml/')
if page_type == 'album':
playlist_title = self._og_search_title(webpage)
return self._extract_player_json(player_json_url, page_id, page_type, playlist_title)
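In the `_extract_item` variant that walks the item's `source` mapping, that mapping drives format extraction: under `mp4` each key is a resolution label mapping to a direct URL, `hls` values feed `_extract_m3u8_formats`, and any other key is an mp3 bitrate such as `128` or `320`. A sketch of that dispatch with the extractor helpers stubbed out (raw URLs are kept where the real code applies `_proto_relative_url`):

import re

def formats_from_source(source):
    formats = []
    for k, v in (source or {}).items():
        if not v:
            continue
        if k == 'mp4':
            for res, video_url in v.items():  # e.g. '720p' -> URL
                if not video_url:
                    continue
                m = re.match(r'(\d+)p', res)
                formats.append({
                    'format_id': 'mp4-' + res,
                    'url': video_url,
                    'height': int(m.group(1)) if m else None,
                })
        elif k != 'hls':  # hls is handed to the m3u8 helper instead
            formats.append({
                'ext': 'mp3',
                'format_id': k,
                'tbr': int(k) if k.isdigit() else None,
                'url': v,
                'vcodec': 'none',
            })
    return formats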

View File

@@ -1,68 +1,82 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
js_to_json,
url_or_none,
parse_filesize,
urlencode_postdata,
urlencode_postdata
)
class ZoomIE(InfoExtractor):
IE_NAME = 'zoom'
_VALID_URL = r'(?P<base_url>https?://(?:[^.]+\.)?zoom.us/)rec(?:ording)?/(?:play|share)/(?P<id>[A-Za-z0-9_.-]+)'
_VALID_URL = r'https://(?:.*).?zoom.us/rec(?:ording)?/(play|share)/(?P<id>[A-Za-z0-9\-_.]+)'
_TEST = {
'url': 'https://economist.zoom.us/rec/play/dUk_CNBETmZ5VA2BwEl-jjakPpJ3M1pcfVYAPRsoIbEByGsLjUZtaa4yCATQuOL3der8BlTwxQePl_j0.EImBkXzTIaPvdZO5',
'md5': 'ab445e8c911fddc4f9adc842c2c5d434',
'url': 'https://zoom.us/recording/play/SILVuCL4bFtRwWTtOCFQQxAsBQsJljFtm9e4Z_bvo-A8B-nzUSYZRNuPl3qW5IGK',
'info_dict': {
'id': 'dUk_CNBETmZ5VA2BwEl-jjakPpJ3M1pcfVYAPRsoIbEByGsLjUZtaa4yCATQuOL3der8BlTwxQePl_j0.EImBkXzTIaPvdZO5',
'ext': 'mp4',
'title': 'China\'s "two sessions" and the new five-year plan',
'md5': '031a5b379f1547a8b29c5c4c837dccf2',
'title': "GAZ Transformational Tuesdays W/ Landon & Stapes",
'id': "SILVuCL4bFtRwWTtOCFQQxAsBQsJljFtm9e4Z_bvo-A8B-nzUSYZRNuPl3qW5IGK",
'ext': "mp4"
}
}
def _real_extract(self, url):
base_url, play_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, play_id)
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
try:
form = self._form_hidden_inputs('password_form', webpage)
except ExtractorError:
form = None
if form:
password = self._downloader.params.get('videopassword')
if not password:
raise ExtractorError(
'This video is protected by a passcode, use the --video-password option', expected=True)
is_meeting = form.get('useWhichPasswd') == 'meeting'
validation = self._download_json(
base_url + 'rec/validate%s_passwd' % ('_meet' if is_meeting else ''),
play_id, 'Validating passcode', 'Wrong passcode', data=urlencode_postdata({
'id': form[('meet' if is_meeting else 'file') + 'Id'],
'passwd': password,
'action': form.get('action'),
}))
if not validation.get('status'):
raise ExtractorError(validation['errorMessage'], expected=True)
webpage = self._download_webpage(url, play_id)
password_protected = self._search_regex(r'<form[^>]+?id="(password_form)"', webpage, 'password field', fatal=False, default=None)
if password_protected is not None:
self._verify_video_password(url, display_id, webpage)
webpage = self._download_webpage(url, display_id)
data = self._parse_json(self._search_regex(
r'(?s)window\.__data__\s*=\s*({.+?});',
webpage, 'data'), play_id, js_to_json)
video_url = self._search_regex(r"viewMp4Url: \'(.*)\'", webpage, 'video url')
title = self._html_search_regex([r"topic: \"(.*)\",", r"<title>(.*) - Zoom</title>"], webpage, 'title')
viewResolvtionsWidth = self._search_regex(r"viewResolvtionsWidth: (\d*)", webpage, 'res width', fatal=False)
viewResolvtionsHeight = self._search_regex(r"viewResolvtionsHeight: (\d*)", webpage, 'res height', fatal=False)
fileSize = parse_filesize(self._search_regex(r"fileSize: \'(.+)\'", webpage, 'fileSize', fatal=False))
urlprefix = url.split("zoom.us")[0] + "zoom.us/"
formats = []
formats.append({
'url': url_or_none(video_url),
'width': int_or_none(viewResolvtionsWidth),
'height': int_or_none(viewResolvtionsHeight),
'http_headers': {'Accept': 'video/webm,video/ogg,video/*;q=0.9,application/ogg;q=0.7,audio/*;q=0.6,*/*;q=0.5',
'Referer': urlprefix},
'ext': "mp4",
'filesize_approx': int_or_none(fileSize)
})
self._sort_formats(formats)
return {
'id': play_id,
'title': data['topic'],
'url': data['viewMp4Url'],
'width': int_or_none(data.get('viewResolvtionsWidth')),
'height': int_or_none(data.get('viewResolvtionsHeight')),
'http_headers': {
'Referer': base_url,
},
'filesize_approx': parse_filesize(data.get('fileSize')),
'id': display_id,
'title': title,
'formats': formats
}
def _verify_video_password(self, url, video_id, webpage):
password = self._downloader.params.get('videopassword')
if password is None:
raise ExtractorError('This video is protected by a password, use the --video-password option', expected=True)
meetId = self._search_regex(r'<input[^>]+?id="meetId" value="([^\"]+)"', webpage, 'meetId')
data = urlencode_postdata({
'id': meetId,
'passwd': password,
'action': "viewdetailedpage",
'recaptcha': ""
})
validation_url = url.split("zoom.us")[0] + "zoom.us/rec/validate_meet_passwd"
validation_response = self._download_json(
validation_url, video_id,
note='Validating Password...',
errnote='Wrong password?',
data=data)
if validation_response['errorCode'] != 0:
raise ExtractorError('Login failed, %s said: %r' % (self.IE_NAME, validation_response['errorMessage']))
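Both password paths above post the recording id and the passcode back to a zoom.us validation endpoint; the variant that reads hidden form fields uses them to decide between the meeting- and file-level endpoints. A condensed sketch of that decision, standard library only; `form` is the dict of hidden inputs and error handling is reduced to a single exception:

import json
import urllib.parse
import urllib.request

def validate_zoom_passcode(base_url, form, password):
    is_meeting = form.get('useWhichPasswd') == 'meeting'
    endpoint = base_url + 'rec/validate%s_passwd' % ('_meet' if is_meeting else '')
    data = urllib.parse.urlencode({
        'id': form[('meet' if is_meeting else 'file') + 'Id'],
        'passwd': password,
        'action': form.get('action', ''),
    }).encode()
    with urllib.request.urlopen(urllib.request.Request(endpoint, data=data)) as resp:
        validation = json.load(resp)
    if not validation.get('status'):
        raise RuntimeError(validation.get('errorMessage') or 'wrong passcode')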

View File

@@ -214,11 +214,12 @@ def parseOpts(overrideArguments=None):
help='Mark videos watched (YouTube only)')
general.add_option(
'--no-mark-watched',
action='store_false', dest='mark_watched',
help='Do not mark videos watched (default)')
action='store_false', dest='mark_watched', default=False,
help='Do not mark videos watched')
general.add_option(
'--no-colors',
action='store_true', dest='no_color', default=False,
action='store_true', dest='no_color',
default=False,
help='Do not emit color codes in output')
network = optparse.OptionGroup(parser, 'Network Options')
@@ -533,11 +534,11 @@ def parseOpts(overrideArguments=None):
subtitles.add_option(
'--write-auto-subs', '--write-automatic-subs',
action='store_true', dest='writeautomaticsub', default=False,
help='Write automatically generated subtitle file (Alias: --write-automatic-subs)')
help='Write automatically generated subtitle file (YouTube only)')
subtitles.add_option(
'--no-write-auto-subs', '--no-write-automatic-subs',
action='store_false', dest='writeautomaticsub', default=False,
help='Do not write auto-generated subtitles (default) (Alias: --no-write-automatic-subs)')
help='Do not write automatically generated subtitle file (default)')
subtitles.add_option(
'--all-subs',
action='store_true', dest='allsubtitles', default=False,
@@ -551,16 +552,12 @@ def parseOpts(overrideArguments=None):
action='store', dest='subtitlesformat', metavar='FORMAT', default='best',
help='Subtitle format, accepts formats preference, for example: "srt" or "ass/srt/best"')
subtitles.add_option(
'--sub-langs', '--srt-langs',
'--sub-lang', '--sub-langs', '--srt-lang',
action='callback', dest='subtitleslangs', metavar='LANGS', type='str',
default=[], callback=_comma_separated_values_options_callback,
help='Languages of the subtitles to download (optional) separated by commas, use --list-subs for available language tags')
downloader = optparse.OptionGroup(parser, 'Download Options')
downloader.add_option(
'-N', '--concurrent-fragments',
dest='concurrent_fragment_downloads', metavar='N', default=1, type=int,
help='Number of fragments to download concurrently (default is %default)')
downloader.add_option(
'-r', '--limit-rate', '--rate-limit',
dest='ratelimit', metavar='RATE',
@@ -679,7 +676,7 @@ def parseOpts(overrideArguments=None):
workarounds.add_option(
'--prefer-insecure', '--prefer-unsecure',
action='store_true', dest='prefer_insecure',
help='Use an unencrypted connection to retrieve information about the video (Currently supported only for YouTube)')
help='Use an unencrypted connection to retrieve information about the video. (Currently supported only for YouTube)')
workarounds.add_option(
'--user-agent',
metavar='UA', dest='user_agent',
@@ -707,13 +704,17 @@ def parseOpts(overrideArguments=None):
'--sleep-interval', '--min-sleep-interval', metavar='SECONDS',
dest='sleep_interval', type=float,
help=(
'Number of seconds to sleep before each download. '
'This is the minimum time to sleep when used along with --max-sleep-interval '
'(Alias: --min-sleep-interval)'))
'Number of seconds to sleep before each download when used alone '
'or a lower bound of a range for randomized sleep before each download '
'(minimum possible number of seconds to sleep) when used along with '
'--max-sleep-interval'))
workarounds.add_option(
'--max-sleep-interval', metavar='SECONDS',
dest='max_sleep_interval', type=float,
help='Maximum number of seconds to sleep. Can only be used along with --min-sleep-interval')
help=(
'Upper bound of a range for randomized sleep before each download '
'(maximum possible number of seconds to sleep). Must only be used '
'along with --min-sleep-interval'))
workarounds.add_option(
'--sleep-subtitles', metavar='SECONDS',
dest='sleep_interval_subtitles', default=0, type=int,
@@ -735,7 +736,7 @@ def parseOpts(overrideArguments=None):
verbosity.add_option(
'--skip-download', '--no-download',
action='store_true', dest='skip_download', default=False,
help='Do not download the video but write all related files (Alias: --no-download)')
help='Do not download the video')
verbosity.add_option(
'-g', '--get-url',
action='store_true', dest='geturl', default=False,
@@ -862,7 +863,7 @@ def parseOpts(overrideArguments=None):
callback_kwargs={
'allowed_keys': '|'.join(OUTTMPL_TYPES.keys()),
'default_key': 'default', 'process': lambda x: x.strip()},
help='Output filename template; see "OUTPUT TEMPLATE" for details')
help='Output filename template, see "OUTPUT TEMPLATE" for details')
filesystem.add_option(
'--output-na-placeholder',
dest='outtmpl_na_placeholder', metavar='TEXT', default='NA',
@@ -978,17 +979,9 @@ def parseOpts(overrideArguments=None):
filesystem.add_option(
'--no-write-playlist-metafiles',
action='store_false', dest='allow_playlist_files',
help='Do not write playlist metadata when using --write-info-json, --write-description etc.')
filesystem.add_option(
'--clean-infojson',
action='store_true', dest='clean_infojson', default=True,
help=(
'Remove some private fields such as filenames from the infojson. '
'Note that it could still contain some personal information (default)'))
filesystem.add_option(
'--no-clean-infojson',
action='store_false', dest='clean_infojson',
help='Write all fields to the infojson')
'Do not write playlist metadata when using '
'--write-info-json, --write-description etc.'))
filesystem.add_option(
'--get-comments',
action='store_true', dest='getcomments', default=False,
@@ -1090,12 +1083,12 @@ def parseOpts(overrideArguments=None):
'Specify the postprocessor/executable name and the arguments separated by a colon ":" '
'to give the argument to the specified postprocessor/executable. Supported postprocessors are: '
'SponSkrub, ExtractAudio, VideoRemuxer, VideoConvertor, EmbedSubtitle, Metadata, Merger, '
'FixupStretched, FixupM4a, FixupM3u8, SubtitlesConvertor, EmbedThumbnail and SplitChapters. '
'FixupStretched, FixupM4a, FixupM3u8, SubtitlesConvertor and EmbedThumbnail. '
'The supported executables are: SponSkrub, FFmpeg, FFprobe, and AtomicParsley. '
'You can also specify "PP+EXE:ARGS" to give the arguments to the specified executable '
'only when being used by the specified postprocessor. Additionally, for ffmpeg/ffprobe, '
'"_i"/"_o" can be appended to the prefix optionally followed by a number to pass the argument '
'before the specified input/output file. Eg: --ppa "Merger+ffmpeg_i1:-v quiet". '
'a number can be appended to the exe name separated by "_i" to pass the argument '
'before the specified input file. Eg: --ppa "Merger+ffmpeg_i1:-v quiet". '
'You can use this option multiple times to give different arguments to different '
'postprocessors. (Alias: --ppa)'))
postproc.add_option(
@@ -1144,17 +1137,24 @@ def parseOpts(overrideArguments=None):
help=optparse.SUPPRESS_HELP)
postproc.add_option(
'--parse-metadata',
metavar='FROM:TO', dest='metafromfield', action='append',
metavar='FIELD:FORMAT', dest='metafromfield', action='append',
help=(
'Parse additional metadata like title/artist from other fields; '
'see "MODIFYING METADATA" for details'))
'Parse additional metadata like title/artist from other fields. '
'Give field name to extract data from, and format of the field separated by a ":". '
'Either regular expression with named capture groups or a '
'similar syntax to the output template can also be used. '
'The parsed parameters replace any existing values and can be used in the output template. '
'This option can be used multiple times. '
'Example: --parse-metadata "title:%(artist)s - %(title)s" matches a title like '
'"Coldplay - Paradise". '
'Example (regex): --parse-metadata "description:Artist - (?P<artist>.+?)"'))
postproc.add_option(
'--xattrs',
action='store_true', dest='xattrs', default=False,
help='Write metadata to the video file\'s xattrs (using dublin core and xdg standards)')
postproc.add_option(
'--fixup',
metavar='POLICY', dest='fixup', default=None,
metavar='POLICY', dest='fixup', default='detect_or_warn',
help=(
'Automatically correct known faults of the file. '
'One of never (do nothing), warn (only emit a warning), '
@@ -1176,20 +1176,9 @@ def parseOpts(overrideArguments=None):
metavar='CMD', dest='exec_cmd',
help='Execute a command on the file after downloading and post-processing, similar to find\'s -exec syntax. Example: --exec \'adb push {} /sdcard/Music/ && rm {}\'')
postproc.add_option(
'--convert-subs', '--convert-sub', '--convert-subtitles',
'--convert-subs', '--convert-subtitles',
metavar='FORMAT', dest='convertsubtitles', default=None,
help='Convert the subtitles to another format (currently supported: srt|ass|vtt|lrc) (Alias: --convert-subtitles)')
postproc.add_option(
'--split-chapters', '--split-tracks',
dest='split_chapters', action='store_true', default=False,
help=(
'Split video into multiple files based on internal chapters. '
'The "chapter:" prefix can be used with "--paths" and "--output" to '
'set the output filename for the split files. See "OUTPUT TEMPLATE" for details'))
postproc.add_option(
'--no-split-chapters', '--no-split-tracks',
dest='split_chapters', action='store_false',
help='Do not split video based on chapters (default)')
help='Convert the subtitles to another format (currently supported: srt|ass|vtt|lrc)')
sponskrub = optparse.OptionGroup(parser, 'SponSkrub (SponsorBlock) Options', description=(
'SponSkrub (https://github.com/yt-dlp/SponSkrub) is a utility to mark/remove sponsor segments '

View File

@@ -13,7 +13,6 @@ from .ffmpeg import (
FFmpegVideoConvertorPP,
FFmpegVideoRemuxerPP,
FFmpegSubtitlesConvertorPP,
FFmpegSplitChaptersPP,
)
from .xattrpp import XAttrMetadataPP
from .execafterdownload import ExecAfterDownloadPP
@@ -32,7 +31,6 @@ __all__ = [
'ExecAfterDownloadPP',
'FFmpegEmbedSubtitlePP',
'FFmpegExtractAudioPP',
'FFmpegSplitChaptersPP',
'FFmpegFixupM3u8PP',
'FFmpegFixupM4aPP',
'FFmpegFixupStretchedPP',

View File

@@ -91,18 +91,10 @@ class PostProcessor(object):
except Exception:
self.report_warning(errnote)
def _configuration_args(self, exe, keys=None, default=[], use_compat=True):
pp_key = self.pp_key().lower()
exe = exe.lower()
root_key = exe if pp_key == exe else '%s+%s' % (pp_key, exe)
keys = ['%s%s' % (root_key, k) for k in (keys or [''])]
if root_key in keys:
keys += [root_key] + ([] if pp_key == exe else [(self.pp_key(), exe)]) + ['default']
else:
use_compat = False
def _configuration_args(self, *args, **kwargs):
return cli_configuration_args(
self._downloader.params.get('postprocessor_args'),
keys, default, use_compat)
self.pp_key().lower(), *args, **kwargs)
class AudioConversionError(PostProcessingError):

View File

@@ -47,7 +47,7 @@ class EmbedThumbnailPP(FFmpegPostProcessor):
self.to_screen('There aren\'t any thumbnails to embed')
return [], info
initial_thumbnail = original_thumbnail = thumbnail_filename = info['thumbnails'][-1]['filepath']
original_thumbnail = thumbnail_filename = info['thumbnails'][-1]['filename']
if not os.path.exists(encodeFilename(thumbnail_filename)):
self.report_warning('Skipping embedding the thumbnail because the file is missing.')
@@ -65,8 +65,6 @@ class EmbedThumbnailPP(FFmpegPostProcessor):
if thumbnail_ext != 'webp' and is_webp(thumbnail_filename):
self.to_screen('Correcting extension to webp and escaping path for thumbnail "%s"' % thumbnail_filename)
thumbnail_webp_filename = replace_extension(thumbnail_filename, 'webp')
if os.path.exists(thumbnail_webp_filename):
os.remove(thumbnail_webp_filename)
os.rename(encodeFilename(thumbnail_filename), encodeFilename(thumbnail_webp_filename))
original_thumbnail = thumbnail_filename = thumbnail_webp_filename
thumbnail_ext = 'webp'
@@ -87,8 +85,6 @@ class EmbedThumbnailPP(FFmpegPostProcessor):
thumbnail_filename = thumbnail_jpg_filename
thumbnail_ext = 'jpg'
mtime = os.stat(encodeFilename(filename)).st_mtime
success = True
if info['ext'] == 'mp3':
options = [
@@ -135,7 +131,7 @@ class EmbedThumbnailPP(FFmpegPostProcessor):
x for x in ['AtomicParsley', 'atomicparsley']
if check_executable(x, ['-v'])), None)
if atomicparsley is None:
raise EmbedThumbnailPPError('AtomicParsley was not found. Please install')
raise EmbedThumbnailPPError('AtomicParsley was not found. Please install.')
cmd = [encodeFilename(atomicparsley, True),
encodeFilename(filename, True),
@@ -143,7 +139,7 @@ class EmbedThumbnailPP(FFmpegPostProcessor):
encodeFilename(thumbnail_filename, True),
encodeArgument('-o'),
encodeFilename(temp_filename, True)]
cmd += [encodeArgument(o) for o in self._configuration_args('AtomicParsley')]
cmd += [encodeArgument(o) for o in self._configuration_args(exe='AtomicParsley')]
self.to_screen('Adding thumbnail to "%s"' % filename)
self.write_debug('AtomicParsley command line: %s' % shell_quote(cmd))
@@ -191,13 +187,10 @@ class EmbedThumbnailPP(FFmpegPostProcessor):
os.remove(encodeFilename(filename))
os.rename(encodeFilename(temp_filename), encodeFilename(filename))
self.try_utime(filename, mtime, mtime)
files_to_delete = [thumbnail_filename]
if self._already_have_thumbnail:
info['__files_to_move'][original_thumbnail] = replace_extension(
info['__files_to_move'][initial_thumbnail],
os.path.splitext(original_thumbnail)[1][1:])
info['__thumbnail_filename'], os.path.splitext(original_thumbnail)[1][1:])
if original_thumbnail == thumbnail_filename:
files_to_delete = []
elif original_thumbnail != thumbnail_filename:

View File

@@ -10,7 +10,6 @@ import json
from .common import AudioConversionError, PostProcessor
from ..compat import compat_str, compat_numeric_types
from ..utils import (
encodeArgument,
encodeFilename,
@@ -19,6 +18,7 @@ from ..utils import (
PostProcessingError,
prepend_extension,
shell_quote,
subtitles_filename,
dfxp2srt,
ISO639Utils,
process_communicate_or_kill,
@@ -61,7 +61,7 @@ class FFmpegPostProcessor(PostProcessor):
def check_version(self):
if not self.available:
raise FFmpegPostProcessorError('ffmpeg not found. Please install or provide the path using --ffmpeg-location')
raise FFmpegPostProcessorError('ffmpeg not found. Please install')
required_version = '10-0' if self.basename == 'avconv' else '1.0'
if is_outdated_version(
@@ -165,7 +165,7 @@ class FFmpegPostProcessor(PostProcessor):
def get_audio_codec(self, path):
if not self.probe_available and not self.available:
raise PostProcessingError('ffprobe and ffmpeg not found. Please install or provide the path using --ffmpeg-location')
raise PostProcessingError('ffprobe and ffmpeg not found. Please install')
try:
if self.probe_available:
cmd = [
@@ -207,7 +207,7 @@ class FFmpegPostProcessor(PostProcessor):
if self.probe_basename != 'ffprobe':
if self.probe_available:
self.report_warning('Only ffprobe is supported for metadata extraction')
raise PostProcessingError('ffprobe not found. Please install or provide the path using --ffmpeg-location')
raise PostProcessingError('ffprobe not found. Please install.')
self.check_version()
cmd = [
@@ -234,35 +234,25 @@ class FFmpegPostProcessor(PostProcessor):
return num, len(streams)
def run_ffmpeg_multiple_files(self, input_paths, out_path, opts):
return self.real_run_ffmpeg(
[(path, []) for path in input_paths],
[(out_path, opts)])
def real_run_ffmpeg(self, input_path_opts, output_path_opts):
self.check_version()
oldest_mtime = min(
os.stat(encodeFilename(path)).st_mtime for path, _ in input_path_opts)
os.stat(encodeFilename(path)).st_mtime for path in input_paths)
cmd = [encodeFilename(self.executable, True), encodeArgument('-y')]
# avconv does not have repeat option
if self.basename == 'ffmpeg':
cmd += [encodeArgument('-loglevel'), encodeArgument('repeat+info')]
def make_args(file, args, name, number):
keys = ['_%s%d' % (name, number), '_%s' % name]
if name == 'o' and number == 1:
keys.append('')
args += self._configuration_args(self.basename, keys)
if name == 'i':
args.append('-i')
def make_args(file, pre=[], post=[], *args, **kwargs):
args = pre + self._configuration_args(*args, **kwargs) + post
return (
[encodeArgument(arg) for arg in args]
[encodeArgument(o) for o in args]
+ [encodeFilename(self._ffmpeg_filename_argument(file), True)])
for arg_type, path_opts in (('i', input_path_opts), ('o', output_path_opts)):
cmd += [arg for i, o in enumerate(path_opts)
for arg in make_args(o[0], o[1], arg_type, i + 1)]
for i, path in enumerate(input_paths):
cmd += make_args(path, post=['-i'], exe='%s_i%d' % (self.basename, i + 1), use_default_arg=False)
cmd += make_args(out_path, pre=opts, exe=self.basename)
self.write_debug('ffmpeg command line: %s' % shell_quote(cmd))
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, stdin=subprocess.PIPE)
@@ -272,8 +262,7 @@ class FFmpegPostProcessor(PostProcessor):
if self.get_param('verbose', False):
self.report_error(stderr)
raise FFmpegPostProcessorError(stderr.split('\n')[-1])
for out_path, _ in output_path_opts:
self.try_utime(out_path, oldest_mtime, oldest_mtime)
self.try_utime(out_path, oldest_mtime, oldest_mtime)
return stderr.decode('utf-8', 'replace')
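The `real_run_ffmpeg` variant that takes (path, opts) pairs threads per-input and per-output options through `make_args`, so each option list lands immediately before its own path. The resulting command shape, sketched without the encoding and configuration-args plumbing:

def build_ffmpeg_cmd(executable, input_path_opts, output_path_opts):
    # Input opts precede their '-i PATH'; output opts precede each
    # output path, mirroring ffmpeg's positional option semantics.
    cmd = [executable, '-y']
    for path, opts in input_path_opts:
        cmd += list(opts) + ['-i', path]
    for path, opts in output_path_opts:
        cmd += list(opts) + [path]
    return cmd

build_ffmpeg_cmd('ffmpeg', [('in.mp4', ['-ss', '10'])], [('out.mp4', ['-c', 'copy'])])
returns ['ffmpeg', '-y', '-ss', '10', '-i', 'in.mp4', '-c', 'copy', 'out.mp4'].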
def run_ffmpeg(self, path, out_path, opts):
@@ -485,7 +474,7 @@ class FFmpegEmbedSubtitlePP(FFmpegPostProcessor):
self.report_warning('JSON subtitles cannot be embedded')
elif ext != 'webm' or ext == 'webm' and sub_ext == 'vtt':
sub_langs.append(lang)
sub_filenames.append(sub_info['filepath'])
sub_filenames.append(subtitles_filename(filename, lang, sub_ext, ext))
else:
if not webm_vtt_warn and ext == 'webm' and sub_ext != 'vtt':
webm_vtt_warn = True
@@ -530,8 +519,6 @@ class FFmpegMetadataPP(FFmpegPostProcessor):
metadata = {}
def add(meta_list, info_list=None):
if not meta_list:
return
if not info_list:
info_list = meta_list
if not isinstance(meta_list, (list, tuple)):
@@ -539,7 +526,7 @@ class FFmpegMetadataPP(FFmpegPostProcessor):
if not isinstance(info_list, (list, tuple)):
info_list = (info_list,)
for info_f in info_list:
if isinstance(info.get(info_f), (compat_str, compat_numeric_types)):
if info.get(info_f) is not None:
for meta_f in meta_list:
metadata[meta_f] = info[info_f]
break
@@ -552,8 +539,8 @@ class FFmpegMetadataPP(FFmpegPostProcessor):
add('title', ('track', 'title'))
add('date', 'upload_date')
add(('description', 'synopsis'), 'description')
add(('purl', 'comment'), 'webpage_url')
add(('description', 'comment'), 'description')
add('purl', 'webpage_url')
add('track', 'track_number')
add('artist', ('artist', 'creator', 'uploader', 'uploader_id'))
add('genre')
@@ -565,10 +552,6 @@ class FFmpegMetadataPP(FFmpegPostProcessor):
add('episode_id', ('episode', 'episode_id'))
add('episode_sort', 'episode_number')
prefix = 'meta_'
for key in filter(lambda k: k.startswith(prefix), info.keys()):
add(key[len(prefix):], key)
if not metadata:
self.to_screen('There isn\'t any metadata to add')
return [], info
@@ -583,7 +566,7 @@ class FFmpegMetadataPP(FFmpegPostProcessor):
else:
options.extend(['-c', 'copy'])
for name, value in metadata.items():
for (name, value) in metadata.items():
options.extend(['-metadata', '%s=%s' % (name, value)])
chapters = info.get('chapters', [])
@@ -717,6 +700,7 @@ class FFmpegSubtitlesConvertorPP(FFmpegPostProcessor):
def run(self, info):
subs = info.get('requested_subtitles')
filename = info['filepath']
new_ext = self.format
new_format = new_ext
if new_format == 'vtt':
@@ -736,9 +720,9 @@ class FFmpegSubtitlesConvertorPP(FFmpegPostProcessor):
'You have requested to convert json subtitles into another format, '
'which is currently not possible')
continue
old_file = sub['filepath']
old_file = subtitles_filename(filename, lang, ext, info.get('ext'))
sub_filenames.append(old_file)
new_file = replace_extension(old_file, new_ext)
new_file = subtitles_filename(filename, lang, new_ext, info.get('ext'))
if ext in ('dfxp', 'ttml', 'tt'):
self.report_warning(
@@ -746,7 +730,7 @@ class FFmpegSubtitlesConvertorPP(FFmpegPostProcessor):
'which results in style information loss')
dfxp_file = old_file
srt_file = replace_extension(old_file, 'srt')
srt_file = subtitles_filename(filename, lang, 'srt', info.get('ext'))
with open(dfxp_file, 'rb') as f:
srt_data = dfxp2srt(f.read())
@@ -757,8 +741,7 @@ class FFmpegSubtitlesConvertorPP(FFmpegPostProcessor):
subs[lang] = {
'ext': 'srt',
'data': srt_data,
'filepath': srt_file,
'data': srt_data
}
if new_ext == 'srt':
@@ -772,47 +755,6 @@ class FFmpegSubtitlesConvertorPP(FFmpegPostProcessor):
subs[lang] = {
'ext': new_ext,
'data': f.read(),
'filepath': new_file,
}
info['__files_to_move'][new_file] = replace_extension(
info['__files_to_move'][old_file], new_ext)
return sub_filenames, info
class FFmpegSplitChaptersPP(FFmpegPostProcessor):
def _prepare_filename(self, number, chapter, info):
info = info.copy()
info.update({
'section_number': number,
'section_title': chapter.get('title'),
'section_start': chapter.get('start_time'),
'section_end': chapter.get('end_time'),
})
return self._downloader.prepare_filename(info, 'chapter')
def _ffmpeg_args_for_chapter(self, number, chapter, info):
destination = self._prepare_filename(number, chapter, info)
if not self._downloader._ensure_dir_exists(encodeFilename(destination)):
return
chapter['filepath'] = destination
self.to_screen('Chapter %03d; Destination: %s' % (number, destination))
return (
destination,
['-ss', compat_str(chapter['start_time']),
'-t', compat_str(chapter['end_time'] - chapter['start_time'])])
def run(self, info):
chapters = info.get('chapters') or []
if not chapters:
self.report_warning('Chapter information is unavailable')
return [], info
self.to_screen('Splitting video by chapters; %d chapters found' % len(chapters))
for idx, chapter in enumerate(chapters):
destination, opts = self._ffmpeg_args_for_chapter(idx + 1, chapter, info)
self.real_run_ffmpeg([(info['filepath'], opts)], [(destination, ['-c', 'copy'])])
return [], info
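`FFmpegSplitChaptersPP` cuts each chapter by seeking to its start on the input side and copying exactly its duration, so nothing is re-encoded. The per-chapter options, isolated into a helper (field names follow the chapter dicts in the info_dict):

def chapter_input_opts(chapter):
    # '-ss' and '-t' are input options here: seek to the chapter start,
    # then copy exactly its duration into the destination file.
    return ['-ss', str(chapter['start_time']),
            '-t', str(chapter['end_time'] - chapter['start_time'])]

For {'start_time': 60, 'end_time': 90} this yields ['-ss', '60', '-t', '30'],
run roughly as: ffmpeg -ss 60 -t 30 -i video.mp4 -c copy 'OUT - 001 TITLE.mp4'.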

View File

@@ -4,10 +4,11 @@ import re
from .common import PostProcessor
from ..compat import compat_str
from ..utils import str_or_none
class MetadataFromFieldPP(PostProcessor):
regex = r'(?P<in>.+):(?P<out>.+)$'
regex = r'(?P<field>\w+):(?P<format>.+)$'
def __init__(self, downloader, formats):
PostProcessor.__init__(self, downloader)
@@ -18,20 +19,11 @@ class MetadataFromFieldPP(PostProcessor):
match = re.match(self.regex, f)
assert match is not None
self._data.append({
'in': match.group('in'),
'out': match.group('out'),
'tmpl': self.field_to_template(match.group('in')),
'regex': self.format_to_regex(match.group('out')),
})
'field': match.group('field'),
'format': match.group('format'),
'regex': self.format_to_regex(match.group('format'))})
@staticmethod
def field_to_template(tmpl):
if re.match(r'\w+$', tmpl):
return '%%(%s)s' % tmpl
return tmpl
@staticmethod
def format_to_regex(fmt):
def format_to_regex(self, fmt):
r"""
Converts a string like
'%(title)s - %(artist)s'
@@ -45,7 +37,7 @@ class MetadataFromFieldPP(PostProcessor):
# replace %(..)s with regex group and escape other string parts
for match in re.finditer(r'%\((\w+)\)s', fmt):
regex += re.escape(fmt[lastpos:match.start()])
regex += r'(?P<%s>.+)' % match.group(1)
regex += r'(?P<' + match.group(1) + r'>[^\r\n]+)'
lastpos = match.end()
if lastpos < len(fmt):
regex += re.escape(fmt[lastpos:])
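The documented conversion is self-contained enough to lift out: every `%(name)s` placeholder becomes a named capture group, and the literal text between placeholders is escaped. A standalone version (one side of the hunk uses `.+` for the group body, the other `[^\r\n]+`; this sketch uses the former):

import re

def format_to_regex(fmt):
    lastpos, regex = 0, ''
    for match in re.finditer(r'%\((\w+)\)s', fmt):
        regex += re.escape(fmt[lastpos:match.start()])  # literal text
        regex += r'(?P<%s>.+)' % match.group(1)         # named group
        lastpos = match.end()
    if lastpos < len(fmt):
        regex += re.escape(fmt[lastpos:])
    return regex

format_to_regex('%(title)s - %(artist)s') produces a pattern with named
groups 'title' and 'artist' separated by the escaped literal ' - '.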
@@ -53,16 +45,22 @@ class MetadataFromFieldPP(PostProcessor):
def run(self, info):
for dictn in self._data:
tmpl, info_copy = self._downloader.prepare_outtmpl(dictn['tmpl'], info)
data_to_parse = tmpl % info_copy
self.write_debug('Searching for r"%s" in %s' % (dictn['regex'], tmpl))
match = re.search(dictn['regex'], data_to_parse)
field, regex = dictn['field'], dictn['regex']
if field not in info:
self.report_warning('Video doesnot have a %s' % field)
continue
data_to_parse = str_or_none(info[field])
if data_to_parse is None:
self.report_warning('Field %s cannot be parsed' % field)
continue
self.write_debug('Searching for r"%s" in %s' % (regex, field))
match = re.search(regex, data_to_parse)
if match is None:
self.report_warning('Could not interpret video %s as "%s"' % (dictn['in'], dictn['out']))
self.report_warning('Could not interpret video %s as "%s"' % (field, dictn['format']))
continue
for attribute, value in match.groupdict().items():
info[attribute] = value
self.to_screen('parsed %s from "%s": %s' % (attribute, dictn['in'], value if value is not None else 'NA'))
self.to_screen('parsed %s from %s: %s' % (attribute, field, value if value is not None else 'NA'))
return [], info

View File

@@ -13,6 +13,10 @@ from ..utils import (
class MoveFilesAfterDownloadPP(PostProcessor):
def __init__(self, downloader, files_to_move):
PostProcessor.__init__(self, downloader)
self.files_to_move = files_to_move
@classmethod
def pp_key(cls):
return 'MoveFiles'
@@ -21,10 +25,11 @@ class MoveFilesAfterDownloadPP(PostProcessor):
dl_path, dl_name = os.path.split(encodeFilename(info['filepath']))
finaldir = info.get('__finaldir', dl_path)
finalpath = os.path.join(finaldir, dl_name)
info['__files_to_move'][info['filepath']] = decodeFilename(finalpath)
self.files_to_move.update(info['__files_to_move'])
self.files_to_move[info['filepath']] = decodeFilename(finalpath)
make_newfilename = lambda old: decodeFilename(os.path.join(finaldir, os.path.basename(encodeFilename(old))))
for oldfile, newfile in info['__files_to_move'].items():
for oldfile, newfile in self.files_to_move.items():
if not newfile:
newfile = make_newfilename(oldfile)
if os.path.abspath(encodeFilename(oldfile)) == os.path.abspath(encodeFilename(newfile)):

View File

@@ -6,7 +6,6 @@ from .common import PostProcessor
from ..compat import compat_shlex_split
from ..utils import (
check_executable,
cli_option,
encodeArgument,
encodeFilename,
shell_quote,
@@ -32,7 +31,7 @@ class SponSkrubPP(PostProcessor):
if path:
raise PostProcessingError('sponskrub not found in "%s"' % path)
else:
raise PostProcessingError('sponskrub not found. Please install or provide the path using --sponskrub-path')
raise PostProcessingError('sponskrub not found. Please install or provide the path using --sponskrub-path.')
def get_exe(self, path=''):
if not path or not check_executable(path, ['-h']):
@@ -71,9 +70,8 @@ class SponSkrubPP(PostProcessor):
cmd = [self.path]
if not self.cutout:
cmd += ['-chapter']
cmd += cli_option(self._downloader.params, '-proxy', 'proxy')
cmd += compat_shlex_split(self.args) # For backward compatibility
cmd += self._configuration_args(self._exe_name, use_compat=False)
cmd += self._configuration_args(exe=self._exe_name, use_default_arg='no_compat')
cmd += ['--', information['id'], filename, temp_filename]
cmd = [encodeArgument(i) for i in cmd]

View File

@@ -49,16 +49,12 @@ def update_self(to_screen, verbose, opener):
h.update(mv[:n])
return h.hexdigest()
to_screen('Current Build Hash %s' % calc_sha256sum(sys.executable))
if not isinstance(globals().get('__loader__'), zipimporter) and not hasattr(sys, 'frozen'):
to_screen('It looks like you installed yt-dlp with a package manager, pip, setup.py or a tarball. Please use that to update.')
return
# sys.executable is set to the full pathname of the exe-file for py2exe
# though symlinks are not followed so that we need to do this manually
# with help of realpath
filename = compat_realpath(sys.executable if hasattr(sys, 'frozen') else sys.argv[0])
to_screen('Current Build Hash %s' % calc_sha256sum(filename))
# Download and check versions info
try:
version_info = opener.open(JSON_URL).read().decode('utf-8')
@@ -107,6 +103,11 @@ def update_self(to_screen, verbose, opener):
(i[1] for i in hashes if i[0] == 'yt-dlp%s' % label),
None)
# sys.executable is set to the full pathname of the exe-file for py2exe
# though symlinks are not followed so that we need to do this manually
# with help of realpath
filename = compat_realpath(sys.executable if hasattr(sys, 'frozen') else sys.argv[0])
if not os.access(filename, os.W_OK):
to_screen('ERROR: no write permissions on %s' % filename)
return
@@ -197,18 +198,28 @@ def update_self(to_screen, verbose, opener):
to_screen('Visit https://github.com/yt-dlp/yt-dlp/releases/latest')
return
expected_sum = get_sha256sum('zip', py_ver)
if expected_sum and hashlib.sha256(newcontent).hexdigest() != expected_sum:
to_screen('ERROR: unable to verify the new zip')
to_screen('Visit https://github.com/yt-dlp/yt-dlp/releases/latest')
return
try:
with open(filename, 'wb') as outf:
with open(filename + '.new', 'wb') as outf:
outf.write(newcontent)
except (IOError, OSError):
if verbose:
to_screen(encode_compat_str(traceback.format_exc()))
to_screen('ERROR: unable to write the new version')
return
expected_sum = get_sha256sum('zip', py_ver)
if expected_sum and calc_sha256sum(filename + '.new') != expected_sum:
to_screen('ERROR: unable to verify the new zip')
to_screen('Visit https://github.com/yt-dlp/yt-dlp/releases/latest')
try:
os.remove(filename + '.new')
except OSError:
to_screen('ERROR: unable to remove corrupt zip')
return
try:
os.rename(filename + '.new', filename)
except OSError:
to_screen('ERROR: unable to overwrite current version')
return
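The staged-write update path above saves the download as `<filename>.new`, verifies its SHA-256 before touching the running binary, and only then renames it into place. The same idea as a small helper; hashing the in-memory content is equivalent to re-reading the staged file as the diff does, and note that `os.rename` over an existing file fails on Windows, which the real code reports rather than works around:

import hashlib
import os

def replace_binary(filename, newcontent, expected_sum=None):
    staged = filename + '.new'
    with open(staged, 'wb') as outf:
        outf.write(newcontent)
    if expected_sum and hashlib.sha256(newcontent).hexdigest() != expected_sum:
        os.remove(staged)  # don't leave a corrupt download behind
        raise RuntimeError('checksum mismatch on staged update')
    os.rename(staged, filename)  # atomic on POSIX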

View File

@@ -1836,7 +1836,7 @@ def write_json_file(obj, fn):
try:
with tf:
json.dump(obj, tf, default=repr)
json.dump(obj, tf)
if sys.platform == 'win32':
# Need to remove existing file on Windows, else os.rename raises
# WindowsError or FileExistsError.
@@ -2423,15 +2423,6 @@ class DownloadError(YoutubeDLError):
self.exc_info = exc_info
class EntryNotInPlaylist(YoutubeDLError):
"""Entry not in playlist exception.
This exception will be thrown by YoutubeDL when a requested entry
is not found in the playlist info_dict
"""
pass
class SameFileError(YoutubeDLError):
"""Same File exception.
@@ -4115,7 +4106,6 @@ def parse_age_limit(s):
m = re.match(r'^(?P<age>\d{1,2})\+?$', s)
if m:
return int(m.group('age'))
s = s.upper()
if s in US_RATINGS:
return US_RATINGS[s]
m = re.match(r'^TV[_-]?(%s)$' % '|'.join(k[3:] for k in TV_PARENTAL_GUIDELINES), s)
@@ -4192,10 +4182,8 @@ def qualities(quality_ids):
DEFAULT_OUTTMPL = {
'default': '%(title)s [%(id)s].%(ext)s',
'chapter': '%(title)s - %(section_number)03d %(section_title)s [%(id)s].%(ext)s',
}
OUTTMPL_TYPES = {
'chapter': None,
'subtitle': None,
'thumbnail': None,
'description': 'description',
@@ -4205,20 +4193,6 @@ OUTTMPL_TYPES = {
'pl_infojson': 'info.json',
}
# As of [1] format syntax is:
# %[mapping_key][conversion_flags][minimum_width][.precision][length_modifier]type
# 1. https://docs.python.org/2/library/stdtypes.html#string-formatting
FORMAT_RE = r'''(?x)
(?<!%)
%
\({0}\) # mapping key
(?:[#0\-+ ]+)? # conversion flags (optional)
(?:\d+)? # minimum field width (optional)
(?:\.\d+)? # precision (optional)
[hlL]? # length modifier (optional)
(?P<type>[diouxXeEfFgGcrs%]) # conversion type
'''
def limit_length(s, length):
""" Add ellipses to overly long strings """
@@ -4718,26 +4692,36 @@ def cli_valueless_option(params, command_option, param, expected_value=True):
return [command_option] if param == expected_value else []
def cli_configuration_args(argdict, keys, default=[], use_compat=True):
def cli_configuration_args(argdict, key, default=[], exe=None, use_default_arg=True):
# use_default_arg can be True, False, or 'no_compat'
if isinstance(argdict, (list, tuple)): # for backward compatibility
if use_compat:
if use_default_arg is True:
return argdict
else:
argdict = None
if argdict is None:
return default
assert isinstance(argdict, dict)
assert isinstance(keys, (list, tuple))
for key_list in keys:
if isinstance(key_list, compat_str):
key_list = (key_list,)
arg_list = list(filter(
lambda x: x is not None,
[argdict.get(key.lower()) for key in key_list]))
if arg_list:
return [arg for args in arg_list for arg in args]
return default
key = key.lower()
args = exe_args = None
if exe is not None:
assert isinstance(exe, compat_str)
exe = exe.lower()
args = argdict.get('%s+%s' % (key, exe))
if args is None:
exe_args = argdict.get(exe)
if args is None:
args = argdict.get(key) if key != exe else None
if args is None and exe_args is None:
args = argdict.get('default', default) if use_default_arg else default
args, exe_args = args or [], exe_args or []
assert isinstance(args, (list, tuple))
assert isinstance(exe_args, (list, tuple))
return args + exe_args
class ISO639Utils(object):
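For the `key`/`exe` variant of `cli_configuration_args` shown above, resolution order is: the combined `key+exe` entry wins outright; otherwise any bare `key` and bare `exe` entries are concatenated; if neither exists, the `default` entry is used (subject to `use_default_arg`). A worked example of the lookups, assuming a dict parsed from repeated --postprocessor-args options:

argdict = {
    'sponskrub': ['-v'],               # bare exe entry
    'merger+ffmpeg': ['-v', 'quiet'],  # pp+exe entry
    'default': [],
}
# cli_configuration_args(argdict, 'merger', exe='ffmpeg')
#   -> ['-v', 'quiet']   (the combined 'merger+ffmpeg' key wins)
# cli_configuration_args(argdict, 'embedthumbnail', exe='atomicparsley')
#   -> []                (no matching key: falls back to 'default')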

View File

@@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2021.03.24.1'
__version__ = '2021.03.01'