Compare commits


97 Commits

Author SHA1 Message Date
pukkandan
418964fa91 Release 2021.08.10 2021-08-10 20:10:39 +05:30
jhwgh1968
c196640ff1 [eroprofile] Add album downloader (#658)
Authored by: jhwgh1968
2021-08-10 19:21:12 +05:30
SsSsS
60c8fc73c6 [instagram] Fix comments extraction (#660)
Authored-by: u-spec-png <miloradkalabasdt@gmail.com>
2021-08-10 18:45:32 +05:30
Ashish
bc8745480e [BandCamp] Add BandcampMusicIE (#668)
Authored by Ashish0804
2021-08-10 18:42:11 +05:30
The Hatsune Daishi
ff5e16f2f6 [mirrativ] Add extractors (#657)
Authored by: nao20010128nao
2021-08-10 08:54:58 +05:30
pukkandan
be2fc5b212 [extractor] Detect sttp as subtitles in MPD
Closes #656
Solution by: fstirlitz
2021-08-10 04:46:48 +05:30
pukkandan
7be9ccff0b [utils] Fix InAdvancePagedList.__getitem__
Since it didn't have any cache, the page was re-fetched for each video.
* Also generalized the cache code
2021-08-10 04:45:25 +05:30
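The fix described above can be sketched as a paged list that memoizes each fetched page, so repeated indexing never re-fetches. The class and names below are illustrative only, not yt-dlp's actual `InAdvancePagedList`:

```python
class CachedPagedList:
    """Sketch of a paged list with a per-page cache (hypothetical names)."""

    def __init__(self, pagefunc, pagecount, pagesize):
        self._pagefunc = pagefunc
        self._pagecount = pagecount
        self._pagesize = pagesize
        self._cache = {}  # page number -> list of entries

    def _get_page(self, n):
        if n not in self._cache:  # fetch each page at most once
            self._cache[n] = list(self._pagefunc(n))
        return self._cache[n]

    def __getitem__(self, idx):
        page, offset = divmod(idx, self._pagesize)
        if page >= self._pagecount:
            raise IndexError(idx)
        return self._get_page(page)[offset]
```

Without the cache, every `__getitem__` call would invoke `pagefunc` again, which is exactly the re-fetch-per-video behaviour the commit removes.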
funniray
245d43cacf [crunchyroll] Fix thumbnail (#650)
Authored by: funniray
2021-08-10 03:09:20 +05:30
mzbaulhaque
246fb276e0 [blackboardcollaborate] Add new extractor (#646)
Authored by: Ashish0804
2021-08-10 02:03:12 +05:30
shirt
6e6e0d95b3 [paramountplus] Separate extractor and fix some titles (#652)
Co-authored-by: shirt, pukkandan
2021-08-10 01:54:50 +05:30
Felix S
25a3f4f5d6 [webvtt] Merge daisy-chained duplicate cues (#638)
Fixes: https://github.com/yt-dlp/yt-dlp/issues/631#issuecomment-893338552

Previous deduplication algorithm only removed duplicate cues with
identical text, styles and timestamps.  This change also merges
cues that come in ‘daisy chains’, where sequences of cues with
identical text and styles appear in which the ending timestamp of
one equals the starting timestamp of the next.

This deduplication algorithm has the somewhat unfortunate side effect
that NOTE blocks between cues, if found, will be emitted in a different
order relative to their original cues.  This may be unwanted if perfect
fidelity is desired, but then so is daisy-chain deduplication itself.
NOTE blocks ought to be ignored by WebVTT players in any case.

Authored by: fstirlitz
2021-08-10 01:52:30 +05:30
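The merging rule described above can be modelled over simple `(start, end, text)` tuples (an illustrative sketch, not the actual WebVTT parser): consecutive cues with identical text whose timestamps abut are collapsed into one.

```python
def merge_daisy_chains(cues):
    """Merge 'daisy-chained' cues: identical text, end of one == start of next."""
    merged = []
    for start, end, text in cues:
        if merged and merged[-1][2] == text and merged[-1][1] == start:
            prev = merged[-1]
            merged[-1] = (prev[0], end, text)  # extend the previous cue
        else:
            merged.append((start, end, text))
    return merged
```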
pukkandan
ad3dc496bb Misc fixes - See desc
* Remove unnecessary uses of _list_from_options_callback
* Fix download tests - Bug from 6e84b21559
* Rename ExecAfterDownloadPP to ExecPP and refactor its tests
* Ensure _write_ytdl_file closes file handle on error - Potential fix for #517
2021-08-10 01:22:55 +05:30
pukkandan
2831b4686c Show libraries present in verbose head 2021-08-10 01:22:55 +05:30
pukkandan
8c0ae192a4 [ffmpeg] Fix --ffmpeg-location when directory is given
Bug introduced in 89efdc15dd
Closes #654
2021-08-10 01:22:55 +05:30
pukkandan
e9f4ccd19e Add option --replace-in-metadata 2021-08-10 01:22:55 +05:30
pukkandan
a38bd1defa [viki] Print error message from API request
Closes #651
2021-08-10 01:21:22 +05:30
shirt
476febeb3a [build] Use custom build of pyinstaller (#663)
Related: #25 

Authored-by: shirt
2021-08-10 01:21:02 +05:30
Ashish
b6a35ad83b [HotStar] Use API for metadata and extract subtitles (#640)
The API is not rate-limited unlike the webpage

Authored by: Ashish0804
2021-08-08 09:45:06 +05:30
SsSsS
bfd56b74b9 [peertube] Fix videos without description (#639)
Authored by: u-spec-png
2021-08-08 09:26:44 +05:30
PSlava
858a65ecc1 [youtube] Improve signature function detection (#641)
Authored by: PSlava (Slava <slash@i-slash.com>)
2021-08-08 09:24:37 +05:30
Wes
3b34e38813 [aenetworks] Update _THEPLATFORM_KEY and _THEPLATFORM_SECRET (#643)
Original PR: https://github.com/ytdl-org/youtube-dl/pull/29749
Fixes: https://github.com/ytdl-org/youtube-dl/issues/29300

Authored by: wesnm
2021-08-08 09:22:31 +05:30
pukkandan
3448870205 [docs] Fix some mistakes and improve doc 2021-08-07 21:41:48 +05:30
pukkandan
b868936cd6 [cleanup] Misc 2021-08-07 21:17:07 +05:30
pukkandan
c681cb5d93 Allow multiple --exec and --exec-before-download 2021-08-07 21:17:07 +05:30
pukkandan
379e44ed3c [youtube] Raise appropriate error when API pages can't be downloaded 2021-08-07 21:17:06 +05:30
pukkandan
243c57cfe8 [tests:download] Add batch testing for extractors
Use `test_YourExtractor_all` to invoke them
2021-08-07 21:17:06 +05:30
pukkandan
28f436bad0 [extractor] Reset non-repeating warnings per video 2021-08-07 21:17:05 +05:30
pukkandan
2b8a2973bd Allow entire infodict to be printed using %()s
Makes `--dump-json` redundant
2021-08-07 21:17:04 +05:30
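The idea behind `%()s` can be modelled as follows (a hedged sketch, not yt-dlp's actual template engine): an empty field name selects the entire info dict, serialized as JSON.

```python
import json

def render_field(info, field):
    """Illustrative model: '' selects the whole dict, anything else one field."""
    if field == '':
        return json.dumps(info)
    return str(info.get(field))
```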
pukkandan
b7b04c782e Add option --no-simulate to not simulate even when --print or --list... are used
* Deprecates `--print-json`
* Some listings like `--list-extractors` are handled by `yt_dlp` and so are not affected by this. These have been documented as such

Addresses: https://github.com/ytdl-org/youtube-dl/issues/29675, https://github.com/ytdl-org/youtube-dl/issues/29580#issuecomment-882046305
2021-08-07 21:17:03 +05:30
pukkandan
6e84b21559 Fix bugs related to sanitize_info
Related: 8012d892bd (r54555230)
2021-08-07 21:16:55 +05:30
pukkandan
575e17a1b9 [utils] Fix traverse_obj depth when is_user_input 2021-08-07 20:08:22 +05:30
pukkandan
57015a4a3f [youtube] extractor-arg to show live dash formats
If replay is enabled, these formats can be used to download the last 4 hours
2021-08-07 12:47:54 +05:30
pukkandan
9cc1a3130a Fix resuming when using --no-part
Closes #576
2021-08-06 00:55:04 +05:30
pukkandan
b51d2ae3ca Add compat-option no-keep-subs
Closes #630
2021-08-06 00:55:04 +05:30
Jesse
fee5f0c909 [adobepass] Add MSO Cablevision (#635)
Authored by: Jessecar96
2021-08-06 00:53:37 +05:30
funniray
7bb6434767 [vrv] Fix thumbnail extraction (#634)
Authored by: funniray
2021-08-05 21:49:28 +05:30
pukkandan
124bc071ee Fix wrong extension for intermediate files
Closes #632
2021-08-05 19:51:14 +05:30
pukkandan
a047eeb6d2 Add regex to --match-filter
This does not fully deprecate `--match-title`/`--reject-title`
since `--match-filter` is only checked after the extraction is complete,
while `--match-title` can often be checked from the flat playlist.

Fixes: https://github.com/ytdl-org/youtube-dl/issues/9092, https://github.com/ytdl-org/youtube-dl/issues/23035
2021-08-05 04:10:26 +05:30
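The regex operator added to `--match-filter` can be sketched like this (hypothetical helper, not the actual option parser): `FIELD ~= PATTERN` keeps an entry when the field's value matches the regex.

```python
import re

def match_filter(info, expr):
    """Sketch of the '~=' (regex) operator for a single-clause filter."""
    field, _, pattern = (part.strip() for part in expr.partition('~='))
    value = info.get(field)
    return value is not None and re.search(pattern, str(value)) is not None
```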
Max Teegen
77b87f0519 Add all format filtering operators also to --match-filter
PR: https://github.com/ytdl-org/youtube-dl/pull/27361

Authored by: max-te
2021-08-05 03:37:20 +05:30
pukkandan
678da2f21b [twitch:clips] Extract display_id
PR: https://github.com/ytdl-org/youtube-dl/pull/29684
Fixes: https://github.com/ytdl-org/youtube-dl/issues/29666

Authored by: dirkf
2021-08-05 03:37:20 +05:30
pukkandan
cc3fa8d39d Handle BrokenPipeError
PR: https://github.com/ytdl-org/youtube-dl/pull/29505
Fixes: https://github.com/ytdl-org/youtube-dl/issues/29082

Authored by: kikuyan
2021-08-05 03:37:20 +05:30
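The general technique is to catch the error around pipe writes and shut down quietly instead of crashing when the reader (e.g. `head`) has already exited. A sketch of the approach, not the actual patch:

```python
def safe_write(stream, data):
    """Write to a possibly-broken pipe; return False instead of crashing."""
    try:
        stream.write(data)
        stream.flush()
        return True
    except BrokenPipeError:
        return False
```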
pukkandan
89efdc15dd [ffpmeg] Allow --ffmpeg-location to be a file with different name 2021-08-05 03:37:18 +05:30
pukkandan
8012d892bd Ensure sanitization of infodict before printing to stdout
* `filter_requested_info` is renamed to a more appropriate name `sanitize_info`
2021-08-05 03:37:16 +05:30
Stavros Ntentos
9d65e7bd6d Fix --compat-options filename (#629)
The correct default filename is `%(title)s-%(id)s.%(ext)s`

Authored by: stdedos
2021-08-04 23:31:37 +05:30
SsSsS
36576d7c4c [Newgrounds] Improve extractor and fix playlist (#627)
Authored by: u-spec-png
2021-08-04 21:18:54 +05:30
nikhil
bb36a55c41 [nbcolympics:stream] Fix extractor
PR: https://github.com/ytdl-org/youtube-dl/pull/29688
Closes: #617, https://github.com/ytdl-org/youtube-dl/issues/29665

* Livestreams are untested
* If using ffmpeg as downloader, v4.3+ is needed since `-http_seekable` option is necessary
* Instead of making a separate key for each arg that needs to be passed to ffmpeg, I made `_ffmpeg_args`
* This deprecates `_seekable`, but the option is kept for compatibility

Authored by: nchilada, pukkandan
2021-08-04 20:41:59 +05:30
MinePlayersPE
3dbb2a9dcb [RCTIPlus] Support events and TV (#625)
Authored by: MinePlayersPE
2021-08-04 18:42:15 +05:30
The Hatsune Daishi
9997eee4af [openrec] Add extractors (#624)
Authored by: nao20010128nao
2021-08-04 14:44:37 +05:30
Wes
3e376d183e [nbcolympics] Update extractor for 2020 olympics (#621)
Fixes: https://github.com/yt-dlp/yt-dlp/issues/617#issuecomment-891834323

Authored by: wesnm
2021-08-04 09:49:44 +05:30
Sam
888299e6ca [VrtNU] Fix XSRF token (#588)
PR: https://github.com/ytdl-org/youtube-dl/pull/29614
Authored-by: pgaig
2021-08-04 00:11:26 +05:30
pukkandan
c31be5b009 [docs] Document which fields --add-metadata adds to the file
:ci skip all
2021-08-03 01:34:28 +05:30
pukkandan
e5611e8eda [ffmpeg] Fix streaming mp4 to stdout 2021-08-03 00:05:16 +05:30
SsSsS
8e6cc12c80 [Vine] Remove invalid formats (#614)
Authored by: u-spec-png
2021-08-02 23:37:59 +05:30
pukkandan
e980017ac8 [doc] Fix banner URL 2021-08-02 10:45:02 +05:30
pukkandan
e9d9efc0f2 [version] update
:ci skip all
2021-08-02 10:41:58 +05:30
pukkandan
6ccf351a87 Release 2021.08.02 2021-08-02 10:37:10 +05:30
pukkandan
28dff70b51 Add donate links 2021-08-02 08:51:23 +05:30
pukkandan
1aebc0f79e Add logo and banner 2021-08-02 08:51:22 +05:30
pukkandan
cf87314d4e [youtube] Extract SAPISID only once 2021-08-02 08:00:08 +05:30
pukkandan
1bd3639f69 [tenplay] Add MA15+ age limit (#606)
Authored by: pento
2021-08-02 07:52:11 +05:30
LE
68f5867cf0 [CBS] Add fallback (#579)
Related: https://github.com/ytdl-org/youtube-dl/issues/29564
Authored-by: llacb47, pukkandan
2021-08-02 07:46:12 +05:30
Ashish
605cad0be7 [Vimeo] Better extraction of original file (#599)
Authored by: Ashish0804
2021-08-02 07:23:12 +05:30
pukkandan
0855702f3f [test:download] Support testing with ignore_no_formats_error 2021-08-02 03:47:31 +05:30
Ashish
e8384376c0 [CBS] Add ParamountPlusSeriesIE (#603)
Authored by: Ashish0804
2021-08-02 02:58:47 +05:30
David
e7e94f2a5c [youtube] Add age-gate bypass for unverified accounts (#600)
Adds `_creator` variants for each client

Authored by: zerodytrash, colethedj, pukkandan
2021-08-02 02:43:46 +05:30
pukkandan
a46a815b05 [cleanup] Fix linter in 96fccc101f 2021-08-01 12:52:09 +05:30
pukkandan
96fccc101f [downloader] Allow streaming unmerged formats to stdout using ffmpeg
For this to work:
1. The downloader must be ffmpeg
2. The selected formats must have the same protocol
3. The formats must be downloadable by ffmpeg to stdout

Partial solution for: https://github.com/ytdl-org/youtube-dl/issues/28146, https://github.com/ytdl-org/youtube-dl/issues/27265
2021-08-01 12:38:06 +05:30
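The first two conditions can be modelled as a simple predicate (illustrative only; the third condition, that ffmpeg can actually write the formats to stdout, is format-specific and omitted here):

```python
def can_stream_unmerged(downloader, formats):
    """Sketch: ffmpeg must be the downloader and all formats share one protocol."""
    protocols = {f.get('protocol') for f in formats}
    return downloader == 'ffmpeg' and len(protocols) == 1
```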
pukkandan
dbf5416a20 [cleanup] Refactor some code 2021-08-01 12:38:05 +05:30
pukkandan
d74a58a186 Set home: as the default key for -P 2021-08-01 12:13:40 +05:30
pukkandan
f5510afef0 [FormatSort] Fix bug for audio with unknown codec 2021-08-01 12:13:40 +05:30
pukkandan
e4f0275711 Add compat-option no-clean-infojson 2021-08-01 12:13:40 +05:30
pukkandan
e0f2b4b47d [utils] Fix slicing of reversed LazyList
Closes #589
2021-08-01 12:13:40 +05:30
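The bug class fixed here can be illustrated with a minimal lazy list that supports slicing after being reversed by translating indices into the underlying forward order (a sketch, not yt-dlp's actual `LazyList`):

```python
class LazyList:
    """Minimal lazily-evaluated list supporting reverse() plus slicing."""

    def __init__(self, iterable):
        self._iter = iter(iterable)
        self._cache = []
        self._reversed = False

    def _exhaust(self):
        self._cache.extend(self._iter)
        return self._cache

    def reverse(self):
        self._reversed = True
        return self

    def __getitem__(self, idx):
        items = self._exhaust()
        if self._reversed:
            items = items[::-1]  # apply reversal before indexing/slicing
        return items[idx]
```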
coletdjnz
eca330cb88 [youtube] Fix default global API key
bug introduced in 000c15a4ca
2021-08-01 06:12:26 +00:00
Wes
d24734daea [adobepass] Add MSO Sling TV (#596)
Original PR: ytdl-org/youtube-dl#29686
Closes: #300, ytdl-org/youtube-dl#18132

Authored by: wesnm
2021-07-31 03:35:56 +05:30
MinePlayersPE
d9e6e9481e [RCTIPlus] Remove PhantomJS dependency (#595)
Authored by: MinePlayersPE
2021-07-31 03:22:52 +05:30
pukkandan
3619f78d2c [youtube] Misc cleanup (#577)
Authored by: pukkandan, colethedj
2021-07-31 03:01:49 +05:30
pukkandan
65c2fde23f [youtube] Add thirdParty to agegate clients (#577)
* This allows more videos like `tf2U5Vyj0oU` to become embeddable
    See https://github.com/yt-dlp/yt-dlp/pull/575#issuecomment-888837000
* Also added tests for all types of age-gate

Closes #581
2021-07-31 02:20:21 +05:30
pukkandan
000c15a4ca [youtube] simplify and de-duplicate client definitions (#577) 2021-07-31 02:14:15 +05:30
colethedj
9275f62cf8 [youtube] Improve age-gate detection (#577)
Authored by: colethedj
2021-07-31 02:13:55 +05:30
coletdjnz
6552469433 [youtube] Force hl=en for comments (#594)
Closes #532
2021-07-31 01:06:00 +05:30
MinePlayersPE
11cc45718c [vidio] Fix login error detection (#582)
Authored by: MinePlayersPE
2021-07-29 10:11:05 +05:30
Ashish
fe07e2c69f [Hotstar] Support cookies (#584)
Closes #583 
Authored by: Ashish0804
2021-07-29 10:06:38 +05:30
Ashish
89ce723edd [Mxplayer] Add h265 formats (#572)
Authored by: Ashish0804
2021-07-29 09:57:09 +05:30
Sipherdrakon
45d1f15725 [dplay] Add ScienceChannelIE (#567)
Authored by: Sipherdrakon
2021-07-29 09:55:00 +05:30
rigstot
a318f59d14 [generic] Support KVS player (#549)
* Replaces the extractor for thisvid

Fixes: https://github.com/ytdl-org/youtube-dl/issues/2077
Authored-by: rigstot
2021-07-29 09:33:01 +05:30
pukkandan
7d1eb38af1 Add format types j, l, q for outtmpl
Closes #345
2021-07-29 08:47:25 +05:30
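A hedged model of what the three new conversion types do (names from the changelog; this is not yt-dlp's formatter): `j` serializes to JSON, `l` joins a list with commas, `q` quotes for the terminal.

```python
import json
import shlex

def convert(value, conversion):
    """Illustrative conversions: j -> JSON, l -> comma list, q -> shell-quoted."""
    if conversion == 'j':
        return json.dumps(value)
    if conversion == 'l':
        return ', '.join(map(str, value))
    if conversion == 'q':
        return shlex.quote(str(value))
    return str(value)
```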
pukkandan
901130bbcf Expand and escape environment variables correctly in outtmpl
Fixes: https://www.reddit.com/r/youtubedl/comments/otfmq3/ytdlp_same_parameters_different_results
2021-07-29 08:38:18 +05:30
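A sketch of the idea (assumed behaviour, not the actual patch): expand `$VARS` and `~` in the template, but escape any `%` coming from the environment so the expanded value cannot later be misread as an output-template field specifier.

```python
import os
import re

def expand_outtmpl_path(template):
    """Expand env vars and ~, escaping '%' so later %-formatting stays safe."""
    def repl(match):
        value = os.environ.get(match.group(1), match.group(0))
        return value.replace('%', '%%')  # escape for later % formatting
    return re.sub(r'\$(\w+)', repl, os.path.expanduser(template))
```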
MinePlayersPE
c0bc527bca [YouTube] Age-gate bypass implementation (#575)
* Calling the API with `clientScreen=EMBED` allows access to most age-gated videos - discovered by @ccdffddfddfdsfedeee (https://github.com/yt-dlp/yt-dlp/issues/574#issuecomment-887171136)
* Adds clients: (web/android/ios)_(embedded/agegate), mweb_embedded
* Renamed mobile_web to mweb

Closes #574

Authored by pukkandan, MinePlayersPE
2021-07-27 15:10:44 +05:30
pukkandan
2a9c6dcd22 [youtube] Fix format sorting when using alternate clients 2021-07-26 03:50:13 +05:30
coletdjnz
5a1fc62b41 [youtube] Add mobile_web client (#557)
Authored by: colethedj
2021-07-26 03:48:36 +05:30
pukkandan
b4c055bac2 [youtube] Add player_client=all 2021-07-26 03:38:18 +05:30
pukkandan
ea05b3020d Remove asr appearing twice in -F 2021-07-26 03:38:15 +05:30
pukkandan
9536bc072d [bilibili] Improve _VALID_URL 2021-07-26 03:38:10 +05:30
Ashish
8242bf220d [HotStarSeriesIE] Fix regex (#569)
Authored by: Ashish0804
2021-07-25 22:43:43 +05:30
Ashish
4bfa401d40 [UtreonIE] Add extractor (#562)
Authored by: Ashish0804
2021-07-25 22:41:45 +05:30
nixxo
0222620725 [mediaset] Fix extraction (#564)
Closes #365
Authored by: nixxo
2021-07-24 20:06:55 +05:30
pukkandan
1fe3c4c27e [version] update
:ci skip all
2021-07-24 20:02:12 +05:30
80 changed files with 3530 additions and 1687 deletions

.github/FUNDING.yml vendored Normal file

@@ -0,0 +1,13 @@
# These are supported funding model platforms
github: # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2]
patreon: # Replace with a single Patreon username
open_collective: # Replace with a single Open Collective username
ko_fi: # Replace with a single Ko-fi username
tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
liberapay: # Replace with a single Liberapay username
issuehunt: # Replace with a single IssueHunt username
otechie: # Replace with a single Otechie username
custom: ['https://github.com/yt-dlp/yt-dlp/blob/master/Collaborators.md#collaborators']


@@ -21,7 +21,7 @@ assignees: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
-- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.07.21. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.08.02. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in https://github.com/yt-dlp/yt-dlp.
- Search the bugtracker for similar issues: https://github.com/yt-dlp/yt-dlp. DO NOT post duplicates.
@@ -29,7 +29,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a broken site support
-- [ ] I've verified that I'm running yt-dlp version **2021.07.21**
+- [ ] I've verified that I'm running yt-dlp version **2021.08.02**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar issues including closed ones
@@ -44,7 +44,7 @@ Add the `-v` flag to your command line you run yt-dlp with (`yt-dlp -v <your com
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKc']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] yt-dlp version 2021.07.21
+[debug] yt-dlp version 2021.08.02
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}


@@ -21,7 +21,7 @@ assignees: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
-- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.07.21. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.08.02. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that site you are requesting is not dedicated to copyright infringement, see https://github.com/yt-dlp/yt-dlp. yt-dlp does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
- Search the bugtracker for similar site support requests: https://github.com/yt-dlp/yt-dlp. DO NOT post duplicates.
@@ -29,7 +29,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a new site support request
-- [ ] I've verified that I'm running yt-dlp version **2021.07.21**
+- [ ] I've verified that I'm running yt-dlp version **2021.08.02**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that none of provided URLs violate any copyrights
- [ ] I've searched the bugtracker for similar site support requests including closed ones


@@ -21,13 +21,13 @@ assignees: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
-- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.07.21. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.08.02. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar site feature requests: https://github.com/yt-dlp/yt-dlp. DO NOT post duplicates.
- Finally, put x into all relevant boxes like this [x] (Dont forget to delete the empty space)
-->
- [ ] I'm reporting a site feature request
-- [ ] I've verified that I'm running yt-dlp version **2021.07.21**
+- [ ] I've verified that I'm running yt-dlp version **2021.08.02**
- [ ] I've searched the bugtracker for similar site feature requests including closed ones


@@ -21,7 +21,7 @@ assignees: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
-- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.07.21. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.08.02. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in https://github.com/yt-dlp/yt-dlp.
- Search the bugtracker for similar issues: https://github.com/yt-dlp/yt-dlp. DO NOT post duplicates.
@@ -30,7 +30,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a broken site support issue
-- [ ] I've verified that I'm running yt-dlp version **2021.07.21**
+- [ ] I've verified that I'm running yt-dlp version **2021.08.02**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar bug reports including closed ones
@@ -46,7 +46,7 @@ Add the `-v` flag to your command line you run yt-dlp with (`yt-dlp -v <your com
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKc']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] yt-dlp version 2021.07.21
+[debug] yt-dlp version 2021.08.02
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}


@@ -21,13 +21,13 @@ assignees: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
-- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.07.21. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of yt-dlp. Run `yt-dlp --version` and ensure your version is 2021.08.02. If it's not, see https://github.com/yt-dlp/yt-dlp on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar feature requests: https://github.com/yt-dlp/yt-dlp. DO NOT post duplicates.
- Finally, put x into all relevant boxes like this [x] (Dont forget to delete the empty space)
-->
- [ ] I'm reporting a feature request
-- [ ] I've verified that I'm running yt-dlp version **2021.07.21**
+- [ ] I've verified that I'm running yt-dlp version **2021.08.02**
- [ ] I've searched the bugtracker for similar feature requests including closed ones

.github/banner.svg vendored Normal file

File diff suppressed because one or more lines are too long



@@ -103,7 +103,8 @@ jobs:
- name: Upgrade pip and enable wheel support
run: python -m pip install --upgrade pip setuptools wheel
- name: Install Requirements
-run: pip install pyinstaller mutagen pycryptodome websockets
+# Custom pyinstaller built with https://github.com/yt-dlp/pyinstaller-builds
+run: pip install "https://yt-dlp.github.io/pyinstaller-builds/x86_64/pyinstaller-4.5.1-py3-none-any.whl" mutagen pycryptodome websockets
- name: Bump version
id: bump_version
run: python devscripts/update-version.py
@@ -147,7 +148,7 @@ jobs:
- name: Upgrade pip and enable wheel support
run: python -m pip install --upgrade pip setuptools wheel
- name: Install Requirements
-run: pip install pyinstaller mutagen pycryptodome websockets
+run: pip install "https://yt-dlp.github.io/pyinstaller-builds/i686/pyinstaller-4.5.1-py3-none-any.whl" mutagen pycryptodome websockets
- name: Bump version
id: bump_version
run: python devscripts/update-version.py


@@ -1,6 +1,6 @@
pukkandan (owner)
shirt-dev (collaborator)
-colethedj (collaborator)
+coletdjnz/colethedj (collaborator)
Ashish0804 (collaborator)
h-h-h-h
pauldubois98
@@ -63,3 +63,18 @@ TpmKranz
mzbaulhaque
zackmark29
mbway
zerodytrash
wesnm
pento
rigstot
dirkf
funniray
Jessecar96
jhwgh1968
kikuyan
max-te
nchilada
pgaig
PSlava
stdedos
u-spec-png


@@ -19,11 +19,108 @@
-->
### 2021.08.10
* Add option `--replace-in-metadata`
* Add option `--no-simulate` to not simulate even when `--print` or `--list...` are used - Deprecates `--print-json`
* Allow entire infodict to be printed using `%()s` - makes `--dump-json` redundant
* Allow multiple `--exec` and `--exec-before-download`
* Add regex to `--match-filter`
* Add all format filtering operators also to `--match-filter` by [max-te](https://github.com/max-te)
* Add compat-option `no-keep-subs`
* [adobepass] Add MSO Cablevision by [Jessecar96](https://github.com/Jessecar96)
* [BandCamp] Add BandcampMusicIE by [Ashish0804](https://github.com/Ashish0804)
* [blackboardcollaborate] Add new extractor by [Ashish0804](https://github.com/Ashish0804)
* [eroprofile] Add album downloader by [jhwgh1968](https://github.com/jhwgh1968)
* [mirrativ] Add extractors by [nao20010128nao](https://github.com/nao20010128nao)
* [openrec] Add extractors by [nao20010128nao](https://github.com/nao20010128nao)
* [nbcolympics:stream] Fix extractor by [nchilada](https://github.com/nchilada), [pukkandan](https://github.com/pukkandan)
* [nbcolympics] Update extractor for 2020 olympics by [wesnm](https://github.com/wesnm)
* [paramountplus] Separate extractor and fix some titles by [shirt](https://github.com/shirt-dev), [pukkandan](https://github.com/pukkandan)
* [RCTIPlus] Support events and TV by [MinePlayersPE](https://github.com/MinePlayersPE)
* [Newgrounds] Improve extractor and fix playlist by [u-spec-png](https://github.com/u-spec-png)
* [aenetworks] Update `_THEPLATFORM_KEY` and `_THEPLATFORM_SECRET` by [wesnm](https://github.com/wesnm)
* [crunchyroll] Fix thumbnail by [funniray](https://github.com/funniray)
* [HotStar] Use API for metadata and extract subtitles by [Ashish0804](https://github.com/Ashish0804)
* [instagram] Fix comments extraction by [u-spec-png](https://github.com/u-spec-png)
* [peertube] Fix videos without description by [u-spec-png](https://github.com/u-spec-png)
* [twitch:clips] Extract `display_id` by [dirkf](https://github.com/dirkf)
* [viki] Print error message from API request
* [Vine] Remove invalid formats by [u-spec-png](https://github.com/u-spec-png)
* [VrtNU] Fix XSRF token by [pgaig](https://github.com/pgaig)
* [vrv] Fix thumbnail extraction by [funniray](https://github.com/funniray)
* [youtube] Add extractor-arg `include-live-dash` to show live dash formats
* [youtube] Improve signature function detection by [PSlava](https://github.com/PSlava)
* [youtube] Raise appropriate error when API pages can't be downloaded
* Ensure `_write_ytdl_file` closes file handle on error
* Fix `--compat-options filename` by [stdedos](https://github.com/stdedos)
* Fix issues with infodict sanitization
* Fix resuming when using `--no-part`
* Fix wrong extension for intermediate files
* Handle `BrokenPipeError` by [kikuyan](https://github.com/kikuyan)
* Show libraries present in verbose head
* [extractor] Detect `sttp` as subtitles in MPD by [fstirlitz](https://github.com/fstirlitz)
* [extractor] Reset non-repeating warnings per video
* [ffmpeg] Fix streaming `mp4` to `stdout`
* [ffpmeg] Allow `--ffmpeg-location` to be a file with different name
* [utils] Fix `InAdvancePagedList.__getitem__`
* [utils] Fix `traverse_obj` depth when `is_user_input`
* [webvtt] Merge daisy-chained duplicate cues by [fstirlitz](https://github.com/fstirlitz)
* [build] Use custom build of `pyinstaller` by [shirt](https://github.com/shirt-dev)
* [tests:download] Add batch testing for extractors (`test_YourExtractor_all`)
* [docs] Document which fields `--add-metadata` adds to the file
* [docs] Fix some mistakes and improve doc
* [cleanup] Misc code cleanup
### 2021.08.02
* Add logo, banner and donate links
* Expand and escape environment variables correctly in output template
* Add format types `j` (json), `l` (comma delimited list), `q` (quoted for terminal) in output template
* [downloader] Allow streaming some unmerged formats to stdout using ffmpeg
* [youtube] **Age-gate bypass**
* Add `agegate` clients by [pukkandan](https://github.com/pukkandan), [MinePlayersPE](https://github.com/MinePlayersPE)
* Add `thirdParty` to agegate clients to bypass more videos
* Simplify client definitions, expose `embedded` clients
* Improve age-gate detection by [coletdjnz](https://github.com/coletdjnz)
* Fix default global API key by [coletdjnz](https://github.com/coletdjnz)
* Add `creator` clients for age-gate bypass using unverified accounts by [zerodytrash](https://github.com/zerodytrash), [coletdjnz](https://github.com/coletdjnz), [pukkandan](https://github.com/pukkandan)
* [adobepass] Add MSO Sling TV by [wesnm](https://github.com/wesnm)
* [CBS] Add ParamountPlusSeriesIE by [Ashish0804](https://github.com/Ashish0804)
* [dplay] Add `ScienceChannelIE` by [Sipherdrakon](https://github.com/Sipherdrakon)
* [UtreonIE] Add extractor by [Ashish0804](https://github.com/Ashish0804)
* [youtube] Add `mweb` client by [coletdjnz](https://github.com/coletdjnz)
* [youtube] Add `player_client=all`
* [youtube] Force `hl=en` for comments by [coletdjnz](https://github.com/coletdjnz)
* [youtube] Fix format sorting when using alternate clients
* [youtube] Misc cleanup by [pukkandan](https://github.com/pukkandan), [coletdjnz](https://github.com/coletdjnz)
* [youtube] Extract SAPISID only once
* [CBS] Add fallback by [llacb47](https://github.com/llacb47), [pukkandan](https://github.com/pukkandan)
* [Hotstar] Support cookies by [Ashish0804](https://github.com/Ashish0804)
* [HotStarSeriesIE] Fix regex by [Ashish0804](https://github.com/Ashish0804)
* [bilibili] Improve `_VALID_URL`
* [mediaset] Fix extraction by [nixxo](https://github.com/nixxo)
* [Mxplayer] Add h265 formats by [Ashish0804](https://github.com/Ashish0804)
* [RCTIPlus] Remove PhantomJS dependency by [MinePlayersPE](https://github.com/MinePlayersPE)
* [tenplay] Add MA15+ age limit by [pento](https://github.com/pento)
* [vidio] Fix login error detection by [MinePlayersPE](https://github.com/MinePlayersPE)
* [vimeo] Better extraction of original file by [Ashish0804](https://github.com/Ashish0804)
* [generic] Support KVS player (replaces ThisVidIE) by [rigstot](https://github.com/rigstot)
* Add compat-option `no-clean-infojson`
* Remove `asr` appearing twice in `-F`
* Set `home:` as the default key for `-P`
* [utils] Fix slicing of reversed `LazyList`
* [FormatSort] Fix bug for audio with unknown codec
* [test:download] Support testing with `ignore_no_formats_error`
* [cleanup] Refactor some code
### 2021.07.24
* [youtube:tab] Extract video duration early
* [downloader] Pass `info_dict` to `progress_hook`s
-* [youtube] Fix age-gated videos for API clients when cookies are supplied by [colethedj](https://github.com/colethedj)
+* [youtube] Fix age-gated videos for API clients when cookies are supplied by [coletdjnz](https://github.com/coletdjnz)
* [youtube] Disable `get_video_info` age-gate workaround - This endpoint seems to be completely dead
* [youtube] Try all clients even if age-gated
* [youtube] Fix subtitles only being extracted from the first client
@@ -43,7 +140,7 @@
* [FFmpegMetadata] Add language of each stream and some refactoring
* [douyin] Add extractor by [pukkandan](https://github.com/pukkandan), [pyx](https://github.com/pyx)
* [pornflip] Add extractor by [mzbaulhaque](https://github.com/mzbaulhaque)
-* **[youtube] Extract data from multiple clients** by [pukkandan](https://github.com/pukkandan), [colethedj](https://github.com/colethedj)
+* **[youtube] Extract data from multiple clients** by [pukkandan](https://github.com/pukkandan), [coletdjnz](https://github.com/coletdjnz)
* `player_client` now accepts multiple clients
* Default `player_client` = `android,web`
* This uses twice as many requests, but avoids throttling for most videos while also not losing any formats
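As a usage sketch (the URL is a placeholder), the clients to query can be selected explicitly:

```shell
# Query only the android client, e.g. to work around throttling
yt-dlp --extractor-args "youtube:player_client=android" "URL"

# Query both clients, matching the new default
yt-dlp --extractor-args "youtube:player_client=android,web" "URL"
```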
* [youtube] Misc fixes
* Improve extraction of livestream metadata by [pukkandan](https://github.com/pukkandan), [krichbanana](https://github.com/krichbanana)
* Hide live dash formats since they can't be downloaded anyway
* Fix authentication when using multiple accounts by [coletdjnz](https://github.com/coletdjnz)
* Fix controversial videos when requested via API by [coletdjnz](https://github.com/coletdjnz)
* Fix session index extraction and headers for non-web player clients by [coletdjnz](https://github.com/coletdjnz)
* Make `--extractor-retries` work for more errors
* Fix sorting of 3gp format
* Sanity check `chapters` (and refactor related code)
* Make `parse_time_text` and `_extract_chapters` non-fatal
* Misc cleanup and bug fixes by [coletdjnz](https://github.com/coletdjnz)
* [youtube:tab] Fix channels tab
* [youtube:tab] Extract playlist availability by [coletdjnz](https://github.com/coletdjnz)
* **[youtube:comments] Move comment extraction to new API** by [coletdjnz](https://github.com/coletdjnz)
* Adds extractor-args `comment_sort` (`top`/`new`), `max_comments`, `max_comment_depth`
* [youtube:comments] Fix `is_favorited`, improve `like_count` parsing by [coletdjnz](https://github.com/coletdjnz)
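A hedged sketch of the new comment options (URL is a placeholder; the exact argument-separator syntax is documented in the README of this release):

```shell
# Fetch up to 120 comments, newest first, alongside the video metadata
yt-dlp --write-comments --extractor-args "youtube:comment_sort=new;max_comments=120" "URL"
```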
* [BravoTV] Improve metadata extraction by [kevinoconnor7](https://github.com/kevinoconnor7)
* [crunchyroll:playlist] Force http
* [yahoo:gyao:player] Relax `_VALID_URL` by [nao20010128nao](https://github.com/nao20010128nao)
* [utils] Improve `js_to_json` comment regex by [fstirlitz](https://github.com/fstirlitz)
* [webvtt] Fix timestamps
* [compat] Remove unnecessary code
* [docs] fix default of multistreams
### 2021.07.07
* Add extractor option `skip` for `youtube`. Eg: `--extractor-args youtube:skip=hls,dash`
* Deprecates `--youtube-skip-dash-manifest`, `--youtube-skip-hls-manifest`, `--youtube-include-dash-manifest`, `--youtube-include-hls-manifest`
* Allow `--list...` options to work with `--print`, `--quiet` and other `--list...` options
* [youtube] Use `player` API for additional video extraction requests by [coletdjnz](https://github.com/coletdjnz)
* **Fixes youtube premium music** (format 141) extraction
* Adds extractor option `player_client` = `web`/`android`
* **`--extractor-args youtube:player_client=android` works around the throttling** for the time-being
* Adds age-gate fallback using embedded client
* [youtube] Choose correct Live chat API for upcoming streams by [krichbanana](https://github.com/krichbanana)
* [youtube] Fix subtitle names for age-gated videos
* [youtube:comments] Fix error handling and add `itct` to params by [coletdjnz](https://github.com/coletdjnz)
* [youtube_live_chat] Fix download with cookies by [siikamiika](https://github.com/siikamiika)
* [youtube_live_chat] use `clickTrackingParams` by [siikamiika](https://github.com/siikamiika)
* [Funimation] Rewrite extractor
* [downloader/mhtml] Add new downloader for slideshows/storyboards by [fstirlitz](https://github.com/fstirlitz)
* [youtube] Temporary **fix for age-gate**
* [youtube] Support ongoing live chat by [siikamiika](https://github.com/siikamiika)
* [youtube] Improve SAPISID cookie handling by [coletdjnz](https://github.com/coletdjnz)
* [youtube] Login is not needed for `:ytrec`
* [youtube] Non-fatal alert reporting for unavailable videos page by [coletdjnz](https://github.com/coletdjnz)
* [twitcasting] Websocket support by [nao20010128nao](https://github.com/nao20010128nao)
* [mediasite] Extract slides by [fstirlitz](https://github.com/fstirlitz)
* [funimation] Extract subtitles
* Merge youtube-dl: Upto [commit/d495292](https://github.com/ytdl-org/youtube-dl/commit/d495292852b6c2f1bd58bc2141ff2b0265c952cf)
* Pre-check archive and filters during playlist extraction
* Handle Basic Auth `user:pass` in URLs by [hhirtz](https://github.com/hhirtz) and [pukkandan](https://github.com/pukkandan)
* [archiveorg] Add YoutubeWebArchiveIE by [coletdjnz](https://github.com/coletdjnz) and [alex-gedeon](https://github.com/alex-gedeon)
* [fancode] Add extractor by [rhsmachine](https://github.com/rhsmachine)
* [patreon] Support vimeo embeds by [rhsmachine](https://github.com/rhsmachine)
* [Saitosan] Add new extractor by [llacb47](https://github.com/llacb47)
* **Youtube improvements**:
* Support youtube music `MP`, `VL` and `browse` pages
* Extract more formats for youtube music by [craftingmod](https://github.com/craftingmod), [coletdjnz](https://github.com/coletdjnz) and [pukkandan](https://github.com/pukkandan)
* Extract multiple subtitles in same language by [pukkandan](https://github.com/pukkandan) and [tpikonen](https://github.com/tpikonen)
* Redirect channels that don't have a `videos` tab to their `UU` playlists
* Support in-channel search
* Extract audio language
* Add subtitle language names by [nixxo](https://github.com/nixxo) and [tpikonen](https://github.com/tpikonen)
* Show alerts only from the final webpage
* Add `html5=1` param to `get_video_info` page requests by [coletdjnz](https://github.com/coletdjnz)
* Better message when login required
* **Add option `--print`**: to print any field/template
* Makes redundant: `--get-description`, `--get-duration`, `--get-filename`, `--get-format`, `--get-id`, `--get-thumbnail`, `--get-title`, `--get-url`
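For instance, the single `--print` option can stand in for several of the old `--get-*` flags (a sketch; URL is a placeholder):

```shell
# Roughly equivalent to combining --get-title and --get-duration
yt-dlp --print "%(title)s [%(duration)s]" "URL"

# Any info-dict field or output template works
yt-dlp --print "%(id)s: %(view_count)s views" "URL"
```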
* Field `additional_urls` to download additional videos from metadata using [`--parse-metadata`](https://github.com/yt-dlp/yt-dlp#modifying-metadata)
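A minimal sketch of how `additional_urls` might be populated (the regex here is illustrative, not from the changelog):

```shell
# Pull any URL found in the description into additional_urls,
# queuing it for download alongside the main video
yt-dlp --parse-metadata "description:(?P<additional_urls>https?://\S+)" "URL"
```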
* Merge youtube-dl: Upto [commit/dfbbe29](https://github.com/ytdl-org/youtube-dl/commit/dfbbe2902fc67f0f93ee47a8077c148055c67a9b)
* Write thumbnail of playlist and add `pl_thumbnail` outtmpl key
* [TubiTv] Add TubiTvShowIE by [Ashish0804](https://github.com/Ashish0804)
* [twitcasting] Fix extractor
* [viu:ott] Fix extractor and support series by [lkho](https://github.com/lkho) and [pukkandan](https://github.com/pukkandan)
* [youtube:tab] Show unavailable videos in playlists by [coletdjnz](https://github.com/coletdjnz)
* [youtube:tab] Reload with unavailable videos for all playlists
* [youtube] Ignore invalid stretch ratio
* [youtube] Improve channel syncid extraction to support ytcfg by [coletdjnz](https://github.com/coletdjnz)
* [youtube] Standardize API calls for tabs, mixes and search by [coletdjnz](https://github.com/coletdjnz)
* [youtube] Bugfix in `_extract_ytcfg`
* [mildom:user:vod] Download only necessary amount of pages
* [mildom] Remove proxy completely by [fstirlitz](https://github.com/fstirlitz)
* Improve the yt-dlp.sh script by [fstirlitz](https://github.com/fstirlitz)
* [lazy_extractor] Do not load plugins
* [ci] Disable fail-fast
* [docs] Clarify which deprecated options still work
* [docs] Fix typos
### 2021.04.11
* [nitter] Fix extraction of reply tweets and update instance list by [B0pol](https://github.com/B0pol)
* [nitter] Fix thumbnails by [B0pol](https://github.com/B0pol)
* [youtube] Fix thumbnail URL
* [youtube] Parse API parameters from initial webpage by [coletdjnz](https://github.com/coletdjnz)
* [youtube] Extract comments' approximate timestamp by [coletdjnz](https://github.com/coletdjnz)
* [youtube] Fix alert extraction
* [bilibili] Fix uploader
* [utils] Add `datetime_from_str` and `datetime_add_months` by [coletdjnz](https://github.com/coletdjnz)
* Run some `postprocessors` before actual download
* Improve argument parsing for `-P`, `-o`, `-S`
* Fix some `m3u8` not obeying `--allow-unplayable-formats`
* Fix default of `dynamic_mpd`
* Deprecate `--all-formats`, `--include-ads`, `--hls-prefer-native`, `--hls-prefer-ffmpeg`
* [docs] Improvements
### 2021.04.03
* Merge youtube-dl: Upto [commit/654b4f4](https://github.com/ytdl-org/youtube-dl/commit/654b4f4ff2718f38b3182c1188c5d569c14cc70a)
* [mildom] Update extractor with current proxy by [nao20010128nao](https://github.com/nao20010128nao)
* [ard:mediathek] Fix video id extraction
* [generic] Detect Invidious' link element
* [youtube] Show premium state in `availability` by [coletdjnz](https://github.com/coletdjnz)
* [viewsource] Add extractor to handle `view-source:`
* [sponskrub] Run before embedding thumbnail
* [docs] Improve `--parse-metadata` documentation
### 2021.03.24.1
* Use headers and cookies when downloading subtitles by [damianoamatruda](https://github.com/damianoamatruda)
* Parse resolution in info dictionary by [damianoamatruda](https://github.com/damianoamatruda)
* More consistent warning messages by [damianoamatruda](https://github.com/damianoamatruda) and [pukkandan](https://github.com/pukkandan)
* [docs] Add deprecated options and aliases in readme
* [docs] Fix some minor mistakes
* [niconico] Partial fix adapted from [animelover1984/youtube-dl@b5eff52](https://github.com/animelover1984/youtube-dl/commit/b5eff52dd9ed5565672ea1694b38c9296db3fade) (login and smile formats still don't work)
* [niconico] Add user extractor by [animelover1984](https://github.com/animelover1984)
* [stitcher] Merge from youtube-dl by [nixxo](https://github.com/nixxo)
* [rcs] Improved extraction by [nixxo](https://github.com/nixxo)
* [linuxacadamy] Improve regex
* [youtube] Show if video is `private`, `unlisted` etc in info (`availability`) by [coletdjnz](https://github.com/coletdjnz) and [pukkandan](https://github.com/pukkandan)
* [youtube] bugfix for channel playlist extraction
* [nbc] Improve metadata extraction by [2ShedsJackson](https://github.com/2ShedsJackson)
* [wimtv] Add extractor by [nixxo](https://github.com/nixxo)
* [mtv] Add mtv.it and extract series metadata by [nixxo](https://github.com/nixxo)
* [pluto.tv] Add extractor by [kevinoconnor7](https://github.com/kevinoconnor7)
* [youtube] Rewrite comment extraction by [coletdjnz](https://github.com/coletdjnz)
* [embedthumbnail] Set mtime correctly
* Refactor some postprocessor/downloader code by [pukkandan](https://github.com/pukkandan) and [shirt](https://github.com/shirt-dev)
### 2021.03.07
* [youtube] Fix history, mixes, community pages and trending by [pukkandan](https://github.com/pukkandan) and [coletdjnz](https://github.com/coletdjnz)
* [youtube] Fix private feeds/playlists on multi-channel accounts by [coletdjnz](https://github.com/coletdjnz)
* [youtube] Extract alerts from continuation by [coletdjnz](https://github.com/coletdjnz)
* [cbs] Add support for ParamountPlus by [shirt](https://github.com/shirt-dev)
* [mxplayer] Rewrite extractor with show support by [pukkandan](https://github.com/pukkandan) and [Ashish0804](https://github.com/Ashish0804)
* [gedi] Improvements from youtube-dl by [nixxo](https://github.com/nixxo)
* [downloader] Fix bug for `ffmpeg`/`httpie`
* [update] Fix updater removing the executable bit on some UNIX distros
* [update] Fix current build hash for UNIX
* [docs] Include wget/curl/aria2c install instructions for Unix by [Ashish0804](https://github.com/Ashish0804)
* Fix some videos downloading with `m3u8` extension
* Remove "fixup is ignored" warning when fixup wasn't passed by user
* [build] Fix bug
### 2021.03.03
* [youtube] Use new browse API for continuation page extraction by [coletdjnz](https://github.com/coletdjnz) and [pukkandan](https://github.com/pukkandan)
* Fix HLS playlist downloading by [shirt](https://github.com/shirt-dev)
* Merge youtube-dl: Upto [2021.03.03](https://github.com/ytdl-org/youtube-dl/releases/tag/2021.03.03)
* [mtv] Fix extractor
* [ffmpeg] Allow passing custom arguments before -i using `--ppa "ffmpeg_i1:ARGS"` syntax
* Fix `--windows-filenames` removing `/` from UNIX paths
* [hls] Show warning if pycryptodome is not found
* [docs] Improvements
* Fix documentation of `Extractor Options`
* Document `all` in format selection
* Document `playable_in_embed` in output templates
* Exclude `vcruntime140.dll` from UPX by [jbruchon](https://github.com/jbruchon)
* Set version number based on UTC time, not local time
* Publish on PyPi only if token is set
* [docs] Better document `--prefer-free-formats` and add `--no-prefer-free-format`
### 2021.02.15
* [movefiles] Fix compatibility with python2
* [remuxvideo] Fix validation of conditional remux
* [sponskrub] Don't raise error when the video does not exist
* [docs] Crypto is an optional dependency
### 2021.02.04
* Merge youtube-dl: Upto [2021.01.24](https://github.com/ytdl-org/youtube-dl/releases/tag/2021.01.16)
* Plugin support ([documentation](https://github.com/yt-dlp/yt-dlp#plugins))
* **Multiple paths**: New option `-P`/`--paths` to give different paths for different types of files
* The syntax is `-P "type:path" -P "type:path"`
* Valid types are: home, temp, description, annotation, subtitle, infojson, thumbnail
* Additionally, configuration file is taken from home directory or current directory
* Allow passing different arguments to different external downloaders
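As a usage sketch of the two options above (paths and URL are placeholders):

```shell
# Finished files go to ~/Videos, intermediate files to /tmp
yt-dlp -P "home:~/Videos" -P "temp:/tmp/yt-dlp" "URL"

# Pass arguments only to a specific external downloader
yt-dlp --downloader-args "aria2c:-x 8 -k 1M" "URL"
```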
* [mildom] Add extractor by [nao20010128nao](https://github.com/nao20010128nao)
* Warn when using old style `--external-downloader-args` and `--post-processor-args`
* Fix `--no-overwrite` when using `--write-link`
* [roosterteeth.com] Fix for bonus episodes by [Zocker1999NET](https://github.com/Zocker1999NET)
* [tiktok] Fix for when share_info is empty
* [EmbedThumbnail] Fix bug due to incorrect function name
* [docs] Changed sponskrub links to point to [yt-dlp/SponSkrub](https://github.com/yt-dlp/SponSkrub) since I am now providing both linux and windows releases
* [docs] Change all links to correctly point to new fork URL
* [docs] Fixes typos
### 2021.01.12
* Redirect channel home to /video
* Print youtube's warning message
* Handle Multiple pages for feeds better
* [youtube] Fix ytsearch not returning results sometimes due to promoted content by [coletdjnz](https://github.com/coletdjnz)
* [youtube] Temporary fix for automatic captions - disable json3 by [blackjack4494](https://github.com/blackjack4494)
* Add --break-on-existing by [gergesh](https://github.com/gergesh)
* Pre-check video IDs in the archive before downloading by [pukkandan](https://github.com/pukkandan)

Collaborators.md (new file)
# Collaborators
This is a list of the collaborators of the project and their major contributions. See the [Changelog](Changelog.md) for more details.
You can also find lists of all [contributors of yt-dlp](CONTRIBUTORS) and [authors of youtube-dl](https://github.com/ytdl-org/youtube-dl/blob/master/AUTHORS)
## [pukkandan](https://github.com/pukkandan)
[![ko-fi](https://img.shields.io/badge/_-Ko--fi-red.svg?logo=kofi&labelColor=555555&style=for-the-badge)](https://ko-fi.com/pukkandan)
* Owner of the fork
## [shirt](https://github.com/shirt-dev)
[![ko-fi](https://img.shields.io/badge/_-Ko--fi-red.svg?logo=kofi&labelColor=555555&style=for-the-badge)](https://ko-fi.com/shirt)
* Multithreading (`-N`) and aria2c support for fragment downloads
* Support for media initialization and discontinuity in HLS
* The self-updater (`-U`)
## [coletdjnz](https://github.com/coletdjnz)
[![gh-sponsor](https://img.shields.io/badge/_-Sponsor-red.svg?logo=githubsponsors&labelColor=555555&style=for-the-badge)](https://github.com/sponsors/coletdjnz)
* YouTube improvements including: age-gate bypass, private playlists, multiple-clients (to avoid throttling) and a lot of under-the-hood improvements
## [Ashish0804](https://github.com/Ashish0804)
[![ko-fi](https://img.shields.io/badge/_-Ko--fi-red.svg?logo=kofi&labelColor=555555&style=for-the-badge)](https://ko-fi.com/ashish0804)
* Added support for new websites such as Zee5, MXPlayer, DiscoveryPlusIndia, ShemarooMe, Utreon, etc.
* Added playlist/series downloads for TubiTv, SonyLIV, Voot, HotStar, etc.

.PHONY: all clean install test tar pypi-files completions ot offlinetest codetest supportedsites
clean-test:
rm -rf *.dump *.part* *.ytdl *.info.json *.mp4 *.m4a *.flv *.mp3 *.avi *.mkv *.webm *.3gp *.wav *.ape *.swf *.jpg *.png *.frag *.frag.urls *.frag.aria2 test/testdata/player-*.js *.opus *.webp *.ttml *.vtt *.jpeg
clean-dist:
rm -rf yt-dlp.1.temp.md yt-dlp.1 README.txt MANIFEST build/ dist/ .coverage cover/ yt-dlp.tar.gz completions/ yt_dlp/extractor/lazy_extractors.py *.spec CONTRIBUTING.md.tmp yt-dlp yt-dlp.exe yt_dlp.egg-info/ AUTHORS .mailmap
clean-cache:

README.md

<div align="center">
# YT-DLP
A command-line program to download videos from YouTube and many other [video platforms](supportedsites.md)
[![YT-DLP](https://raw.githubusercontent.com/yt-dlp/yt-dlp/master/.github/banner.svg)](#readme)
<!-- GHA doesn't have for-the-badge style
[![CI Status](https://github.com/yt-dlp/yt-dlp/workflows/Core%20Tests/badge.svg?branch=master)](https://github.com/yt-dlp/yt-dlp/actions)
-->
[![Release version](https://img.shields.io/github/v/release/yt-dlp/yt-dlp?color=blue&label=&style=for-the-badge)](https://github.com/yt-dlp/yt-dlp/releases/latest)
[![CI Status](https://img.shields.io/github/workflow/status/yt-dlp/yt-dlp/Core%20Tests/master?label=&style=for-the-badge)](https://github.com/yt-dlp/yt-dlp/actions)
[![License: Unlicense](https://img.shields.io/badge/-Unlicense-blue.svg?style=for-the-badge)](LICENSE)
[![Donate](https://img.shields.io/badge/_-Donate-red.svg?logo=githubsponsors&labelColor=555555&style=for-the-badge)](Collaborators.md#collaborators)
[![Supported Sites](https://img.shields.io/badge/-Supported_Sites-brightgreen.svg?style=for-the-badge)](supportedsites.md)
[![Doc Status](https://readthedocs.org/projects/yt-dlp/badge/?version=latest&style=for-the-badge)](https://yt-dlp.readthedocs.io)
[![Discord](https://img.shields.io/discord/807245652072857610?color=blue&label=discord&logo=discord&style=for-the-badge)](https://discord.gg/H5MNcFW63r)
[![Commits](https://img.shields.io/github/commit-activity/m/yt-dlp/yt-dlp?label=commits&style=for-the-badge)](https://github.com/yt-dlp/yt-dlp/commits)
[![Last Commit](https://img.shields.io/github/last-commit/yt-dlp/yt-dlp/master?label=&style=for-the-badge)](https://github.com/yt-dlp/yt-dlp/commits)
[![Downloads](https://img.shields.io/github/downloads/yt-dlp/yt-dlp/total?style=for-the-badge&color=blue)](https://github.com/yt-dlp/yt-dlp/releases/latest)
[![PyPi Downloads](https://img.shields.io/pypi/dm/yt-dlp?label=PyPi&style=for-the-badge)](https://pypi.org/project/yt-dlp)
* **Merged with animelover1984/youtube-dl**: You get most of the features and improvements from [animelover1984/youtube-dl](https://github.com/animelover1984/youtube-dl) including `--write-comments`, `BiliBiliSearch`, `BilibiliChannel`, Embedding thumbnail in mp4/ogg/opus, playlist infojson etc. Note that the NicoNico improvements are not available. See [#31](https://github.com/yt-dlp/yt-dlp/pull/31) for details.
* **Youtube improvements**:
* All Feeds (`:ytfav`, `:ytwatchlater`, `:ytsubs`, `:ythistory`, `:ytrec`) and private playlists support downloading multiple pages of content
* Search (`ytsearch:`, `ytsearchdate:`), search URLs and in-channel search works
* Mixes support downloading multiple pages of content
* Most (but not all) age-gated content can be downloaded without cookies
* Partial workaround for throttling issue
* Redirect channel's home URL automatically to `/video` to preserve the old behaviour
* `255kbps` audio is extracted from youtube music if premium cookies are given
* **Aria2c with HLS/DASH**: You can use `aria2c` as the external downloader for DASH(mpd) and HLS(m3u8) formats
* **New extractors**: AnimeLab, Philo MSO, Spectrum MSO, SlingTV MSO, Cablevision MSO, Rcs, Gedi, bitwave.tv, mildom, audius, zee5, mtv.it, wimtv, pluto.tv, niconico users, discoveryplus.in, mediathek, NFHSNetwork, nebula, ukcolumn, whowatch, MxplayerShow, parlview (au), YoutubeWebArchive, fancode, Saitosan, ShemarooMe, telemundo, VootSeries, SonyLIVSeries, HotstarSeries, VidioPremier, VidioLive, RCTIPlus, TBS Live, douyin, pornflip, ParamountPlusSeries, ScienceChannel, Utreon, OpenRec, BandcampMusic, blackboardcollaborate, eroprofile albums, mirrativ
* **Fixed/improved extractors**: archive.org, roosterteeth.com, skyit, instagram, itv, SouthparkDe, spreaker, Vlive, akamai, ina, rumble, tennistv, amcnetworks, la7 podcasts, linuxacadamy, nitter, twitcasting, viu, crackle, curiositystream, mediasite, rmcdecouverte, sonyliv, tubi, tenplay, patreon, videa, yahoo, BravoTV, crunchyroll playlist, RTP, viki, Hotstar, vidio, vimeo, mediaset, Mxplayer, nbcolympics, ParamountPlus, Newgrounds
* **Subtitle extraction from manifests**: Subtitles can be extracted from streaming media manifests. See [commit/be6202f](https://github.com/yt-dlp/yt-dlp/commit/be6202f12b97858b9d716e608394b51065d0419f) for details
* **Portable Configuration**: Configuration files are automatically loaded from the home and root directories. See [configuration](#configuration) for details
* **Output template improvements**: Output templates can now have date-time formatting, numeric offsets, object traversal etc. See [output template](#output-template) for details. Even more advanced operations can also be done with the help of `--parse-metadata` and `--replace-in-metadata`
* **Other new options**: `--print`, `--sleep-requests`, `--convert-thumbnails`, `--write-link`, `--force-download-archive`, `--force-overwrites`, `--break-on-reject` etc
* **Improvements**: Regex and other operators in `--match-filter`, multiple `--postprocessor-args` and `--downloader-args`, faster archive checking, more [format selection options](#format-selection) etc
* **Plugin extractors**: Extractors can be loaded from an external file. See [plugins](#plugins) for details
* The options `--id`, `--auto-number` (`-A`), `--title` (`-t`) and `--literal` (`-l`), no longer work. See [removed options](#Removed) for details
* `avconv` is not supported as an alternative to `ffmpeg`
* The default [output template](#output-template) is `%(title)s [%(id)s].%(ext)s`. There is no real reason for this change. This was changed before yt-dlp was ever made public and now there are no plans to change it back to `%(title)s.%(id)s.%(ext)s`. Instead, you may use `--compat-options filename`
* The default [format sorting](#sorting-formats) is different from youtube-dl and prefers higher resolution and better codecs rather than higher bitrates. You can use the `--format-sort` option to change this to any order you prefer, or use `--compat-options format-sort` to use youtube-dl's sorting order
* The default format selector is `bv*+ba/b`. This means that if a combined video + audio format that is better than the best video-only format is found, the former will be preferred. Use `-f bv+ba/b` or `--compat-options format-spec` to revert this
* Unlike youtube-dlc, yt-dlp does not allow merging multiple audio/video streams into one file by default (since this conflicts with the use of `-f bv*+ba`). If needed, this feature must be enabled using `--audio-multistreams` and `--video-multistreams`. You can also use `--compat-options multistreams` to enable both
* `--ignore-errors` is enabled by default. Use `--abort-on-error` or `--compat-options abort-on-error` to abort on errors instead
* Unavailable videos are also listed for youtube playlists. Use `--compat-options no-youtube-unavailable-videos` to remove this
* If `ffmpeg` is used as the downloader, the downloading and merging of formats happen in a single step when possible. Use `--compat-options no-direct-merge` to revert this
* Thumbnail embedding in `mp4` is done with mutagen if possible. Use `--compat-options embed-thumbnail-atomicparsley` to force the use of AtomicParsley instead
* Some private fields such as filenames are removed by default from the infojson. Use `--no-clean-infojson` or `--compat-options no-clean-infojson` to revert this
* When `--embed-subs` and `--write-subs` are used together, the subtitles are written to disk and also embedded in the media file. You can use just `--embed-subs` to embed the subs and automatically delete the separate file. See [#630 (comment)](https://github.com/yt-dlp/yt-dlp/issues/630#issuecomment-893659460) for more info. `--compat-options no-keep-subs` can be used to revert this.
For ease of use, a few more compat options are available:
* `--compat-options all`: Use all compat options
@@ -238,10 +240,10 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
(default) (Alias: --no-abort-on-error)
--abort-on-error Abort downloading of further videos if an
error occurs (Alias: --no-ignore-errors)
--dump-user-agent Display the current browser identification
--list-extractors List all supported extractors
--dump-user-agent Display the current user-agent and exit
--list-extractors List all supported extractors and exit
--extractor-descriptions Output descriptions of all supported
extractors
extractors and exit
--force-generic-extractor Force extraction to use the generic
extractor
--default-search PREFIX Use this prefix for unqualified URLs. For
@@ -337,25 +339,24 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
COUNT views
--max-views COUNT Do not download any videos with more than
COUNT views
--match-filter FILTER Generic video filter. Specify any key (see
"OUTPUT TEMPLATE" for a list of available
keys) to match if the key is present, !key
to check if the key is not present,
key>NUMBER (like "view_count > 12", also
works with >=, <, <=, !=, =) to compare
against a number, key = 'LITERAL' (like
"uploader = 'Mike Smith'", also works with
!=) to match against a string literal and &
to require multiple matches. Values which
are not known are excluded unless you put a
question mark (?) after the operator. For
example, to only match videos that have
been liked more than 100 times and disliked
less than 50 times (or the dislike
functionality is not available at the given
service), but who also have a description,
use --match-filter "like_count > 100 &
dislike_count <? 50 & description"
--match-filter FILTER Generic video filter. Any field (see
"OUTPUT TEMPLATE") can be compared with a
number or a string using the operators
defined in "Filtering formats". You can
also simply specify a field to match if the
field is present and "!field" to check if
the field is not present. In addition,
Python style regular expression matching
can be done using "~=", and multiple
filters can be checked with "&". Use a "\"
to escape "&" or quotes if needed. Eg:
--match-filter "!is_live & like_count>?100
& description~=\'(?i)\bcats \& dogs\b\'"
matches only videos that are not live, have
a like count of more than 100 (or the like
field is not available), and also have a
description that contains the phrase "cats
& dogs" (ignoring case)
--no-match-filter Do not use generic video filter (default)
--no-playlist Download only the video, if the URL refers
to a video and a playlist
@@ -448,17 +449,17 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
stdin), one URL per line. Lines starting
with '#', ';' or ']' are considered as
comments and ignored
-P, --paths TYPES:PATH The paths where the files should be
-P, --paths [TYPES:]PATH The paths where the files should be
downloaded. Specify the type of file and
the path separated by a colon ":". All the
same types as --output are supported.
Additionally, you can also provide "home"
and "temp" paths. All intermediary files
are first downloaded to the temp path and
then the final files are moved over to the
home path after download is finished. This
option is ignored if --output is an
absolute path
(default) and "temp" paths. All
intermediary files are first downloaded to
the temp path and then the final files are
moved over to the home path after download
is finished. This option is ignored if
--output is an absolute path
-o, --output [TYPES:]TEMPLATE Output filename template; see "OUTPUT
TEMPLATE" for details
--output-na-placeholder TEXT Placeholder value for unavailable meta
@@ -550,8 +551,8 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
--no-write-thumbnail Do not write thumbnail image to disk
(default)
--write-all-thumbnails Write all thumbnail image formats to disk
--list-thumbnails Simulate and list all available thumbnail
formats
--list-thumbnails List available thumbnails of each video.
Simulate unless --no-simulate is used
## Internet Shortcut Options:
--write-link Write an internet shortcut file, depending
@@ -563,30 +564,34 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
--write-desktop-link Write a .desktop Linux internet shortcut
## Verbosity and Simulation Options:
-q, --quiet Activate quiet mode
-q, --quiet Activate quiet mode. If used with
--verbose, print the log to stderr
--no-warnings Ignore warnings
-s, --simulate Do not download the video and do not write
anything to disk
--no-simulate Download the video even if printing/listing
options are used
--ignore-no-formats-error Ignore "No video formats" error. Useful
for extracting metadata even if the video
is not actually available for download
for extracting metadata even if the videos
are not actually available for download
(experimental)
--no-ignore-no-formats-error Throw error when no downloadable video
formats are found (default)
--skip-download Do not download the video but write all
related files (Alias: --no-download)
-O, --print TEMPLATE Simulate, quiet but print the given fields.
Either a field name or similar formatting
as the output template can be used
-j, --dump-json Simulate, quiet but print JSON information.
See "OUTPUT TEMPLATE" for a description of
available keys
-J, --dump-single-json Simulate, quiet but print JSON information
for each command-line argument. If the URL
refers to a playlist, dump the whole
playlist information in a single line
--print-json Be quiet and print the video information as
JSON (video is still being downloaded)
-O, --print TEMPLATE Quiet, but print the given fields for each
video. Simulate unless --no-simulate is
used. Either a field name or the same syntax
as the output template can be used
-j, --dump-json Quiet, but print JSON information for each
video. Simulate unless --no-simulate is
used. See "OUTPUT TEMPLATE" for a
description of available keys
-J, --dump-single-json Quiet, but print JSON information for each
url or infojson passed. Simulate unless
--no-simulate is used. If the URL refers to
a playlist, the whole playlist information
is dumped in a single line
--force-write-archive Force download archive entries to be
written as far as no errors occur, even if
-s or another simulation option is used
@@ -657,8 +662,8 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
actually downloadable
--no-check-formats Do not check that the formats selected are
actually downloadable
-F, --list-formats List all available formats of requested
videos
-F, --list-formats List available formats of each video.
Simulate unless --no-simulate is used
--merge-output-format FORMAT If a merge is required (e.g.
bestvideo+bestaudio), output to given
container format. One of mkv, mp4, ogg,
@@ -676,7 +681,8 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
(Alias: --write-automatic-subs)
--no-write-auto-subs Do not write auto-generated subtitles
(default) (Alias: --no-write-automatic-subs)
--list-subs List all available subtitles for the video
--list-subs List available subtitles of each video.
Simulate unless --no-simulate is used
--sub-format FORMAT Subtitle format, accepts formats
preference, for example: "srt" or
"ass/srt/best"
@@ -711,7 +717,7 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
--audio-format FORMAT Specify audio format to convert the audio
to when -x is used. Currently supported
formats are: best (default) or one of
aac|flac|mp3|m4a|opus|vorbis|wav
best|aac|flac|mp3|m4a|opus|vorbis|wav
--audio-quality QUALITY Specify ffmpeg audio quality, insert a
value between 0 (better) and 9 (worse) for
VBR or a specific bitrate like 128K
@@ -771,6 +777,10 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
--parse-metadata FROM:TO Parse additional metadata like title/artist
from other fields; see "MODIFYING METADATA"
for details
--replace-in-metadata FIELDS REGEX REPLACE
Replace text in a metadata field using the
given regex. This option can be used
multiple times
--xattrs Write metadata to the video file's xattrs
(using dublin core and xdg standards)
--fixup POLICY Automatically correct known faults of the
@@ -783,16 +793,22 @@ Then simply run `make`. You can also run `make yt-dlp` instead to compile only t
path to the binary or its containing
directory
--exec CMD Execute a command on the file after
downloading and post-processing. Similar
syntax to the output template can be used
downloading and post-processing. Same
syntax as the output template can be used
to pass any field as arguments to the
command. An additional field "filepath"
that contains the final path of the
downloaded file is also available. If no
fields are passed, "%(filepath)s" is
appended to the end of the command
fields are passed, %(filepath)q is appended
to the end of the command. This option can
be used multiple times
--no-exec Remove any previously defined --exec
--exec-before-download CMD Execute a command before the actual
download. The syntax is the same as --exec
but "filepath" is not available. This
option can be used multiple times
--no-exec-before-download Remove any previously defined
--exec-before-download
--convert-subs FORMAT Convert the subtitles to another format
(currently supported: srt|vtt|ass|lrc)
(Alias: --convert-subtitles)
@@ -917,10 +933,11 @@ The simplest usage of `-o` is not to set any template arguments when downloading
It may however also contain special sequences that will be replaced when downloading each video. The special sequences may be formatted according to [python string formatting operations](https://docs.python.org/2/library/stdtypes.html#string-formatting). For example, `%(NAME)s` or `%(NAME)05d`. To clarify, that is a percent symbol followed by a name in parentheses, followed by formatting operations.
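Since these sequences are plain python %-formatting under the hood, their behaviour can be checked directly in an interpreter. A minimal illustration (the field values below are invented):

```python
# %-formatting with a mapping mirrors how %(NAME)s sequences are filled
info = {'title': 'some video', 'playlist_index': 7}
print('%(title)s - %(playlist_index)05d' % info)  # some video - 00007
```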
The field names themselves (the part inside the parentheses) can also have some special formatting:
1. **Object traversal**: The dictionaries and lists available in metadata can be traversed by using a `.` (dot) separator. You can also do python slicing using `:`. Eg: `%(tags.0)s`, `%(subtitles.en.-1.ext)`, `%(id.3:7:-1)s`. Note that the fields that become available using this method are not listed below. Use `-j` to see such fields
1. **Object traversal**: The dictionaries and lists available in metadata can be traversed by using a `.` (dot) separator. You can also do python slicing using `:`. Eg: `%(tags.0)s`, `%(subtitles.en.-1.ext)s`, `%(id.3:7:-1)s`, `%(formats.:.format_id)s`. `%()s` refers to the entire infodict. Note that the fields that become available using this method are not listed below. Use `-j` to see such fields
1. **Addition**: Addition and subtraction of numeric fields can be done using `+` and `-` respectively. Eg: `%(playlist_index+10)03d`, `%(n_entries+1-playlist_index)d`
1. **Date/time Formatting**: Date/time fields can be formatted according to [strftime formatting](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes) by specifying it separated from the field name using a `>`. Eg: `%(duration>%H-%M-%S)s`, `%(upload_date>%Y-%m-%d)s`, `%(epoch-3600>%H-%M-%S)s`
1. **Default**: A default value can be specified for when the field is empty using a `|` separator. This overrides `--output-na-placeholder`. Eg: `%(uploader|Unknown)s`
1. **More Conversions**: In addition to the normal format types `diouxXeEfFgGcrs`, `j`, `l`, `q` can be used for converting to **j**son, a comma-separated **l**ist and a string **q**uoted for the terminal respectively
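As a rough illustration of the traversal rules above, here is a minimal python sketch of how a dot-separated path with slices can be resolved against an info dict. This is only an illustration of the semantics, not yt-dlp's actual resolver, and the metadata values are invented:

```python
# Resolve a dot-separated path like "tags.0" or "id.3:7" against nested data
def traverse(obj, path):
    for key in path.split('.'):
        if ':' in key:  # python-style slice, e.g. "3:7"
            parts = [int(p) if p else None for p in key.split(':')]
            obj = obj[slice(*parts)]
        elif isinstance(obj, (list, tuple)):
            obj = obj[int(key)]  # numeric index, may be negative
        else:
            obj = obj.get(key)   # dict lookup
    return obj

# Invented metadata for illustration
info = {'id': 'abcdefgh', 'tags': ['music', 'live'],
        'subtitles': {'en': [{'ext': 'vtt'}, {'ext': 'srt'}]}}
print(traverse(info, 'tags.0'))               # music
print(traverse(info, 'subtitles.en.-1.ext'))  # srt
print(traverse(info, 'id.3:7'))               # defg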
To summarize, the general syntax for a field is:
```
@@ -957,7 +974,7 @@ The available fields are:
- `average_rating` (numeric): Average rating given by users, the scale used depends on the webpage
- `comment_count` (numeric): Number of comments on the video (For some extractors, comments are only downloaded at the end, and so this field cannot be used)
- `age_limit` (numeric): Age restriction for the video (years)
- `live_status` (string): One of 'is_live', 'was_live', 'upcoming', 'not_live'
- `live_status` (string): One of 'is_live', 'was_live', 'is_upcoming', 'not_live'
- `is_live` (boolean): Whether this video is a live stream or a fixed-length video
- `was_live` (boolean): Whether this video was originally a live stream
- `playable_in_embed` (string): Whether this video is allowed to play in embedded players on other sites
@@ -1320,13 +1337,39 @@ $ yt-dlp -S '+res:480,codec,br'
# MODIFYING METADATA
The metadata obtained from the extractors can be modified by using `--parse-metadata FROM:TO`. The general syntax is to give the name of a field or a template (with similar syntax to [output template](#output-template)) to extract data from, and the format to interpret it as, separated by a colon `:`. Either a [python regular expression](https://docs.python.org/3/library/re.html#regular-expression-syntax) with named capture groups or a similar syntax to the [output template](#output-template) (only `%(field)s` formatting is supported) can be used for `TO`. The option can be used multiple times to parse and modify various fields.
The metadata obtained from the extractors can be modified by using `--parse-metadata` and `--replace-in-metadata`
`--replace-in-metadata FIELDS REGEX REPLACE` is used to replace text in any metadata field using a [python regular expression](https://docs.python.org/3/library/re.html#regular-expression-syntax). [Backreferences](https://docs.python.org/3/library/re.html?highlight=backreferences#re.sub) can be used in the replace string for advanced use.
The general syntax of `--parse-metadata FROM:TO` is to give the name of a field or a template (with the same syntax as [output template](#output-template)) to extract data from, and the format to interpret it as, separated by a colon `:`. Either a [python regular expression](https://docs.python.org/3/library/re.html#regular-expression-syntax) with named capture groups or a similar syntax to the [output template](#output-template) (only `%(field)s` formatting is supported) can be used for `TO`. The option can be used multiple times to parse and modify various fields.
Note that any field created by this can be used in the [output template](#output-template) and will also affect the media file's metadata added when using `--add-metadata`.
This option also has a few special uses:
* You can use this to change the metadata that is embedded in the media file. To do this, set the value of the corresponding field with a `meta_` prefix. For example, any value you set to `meta_description` field will be added to the `description` field in the file. You can use this to set a different "description" and "synopsis", for example
* You can download an additional URL based on the metadata of the currently downloaded video. To do this, set the field `additional_urls` to the URL that you want to download. Eg: `--parse-metadata "description:(?P<additional_urls>https?://www\.vimeo\.com/\d+)` will download the first vimeo video found in the description
* You can use this to change the metadata that is embedded in the media file. To do this, set the value of the corresponding field with a `meta_` prefix. For example, any value you set to `meta_description` field will be added to the `description` field in the file. For example, you can use this to set a different "description" and "synopsis"
For reference, these are the fields yt-dlp adds by default to the file metadata:
Metadata fields|From
:---|:---
`title`|`track` or `title`
`date`|`upload_date`
`description`, `synopsis`|`description`
`purl`, `comment`|`webpage_url`
`track`|`track_number`
`artist`|`artist`, `creator`, `uploader` or `uploader_id`
`genre`|`genre`
`album`|`album`
`album_artist`|`album_artist`
`disc`|`disc_number`
`show`|`series`
`season_number`|`season_number`
`episode_id`|`episode` or `episode_id`
`episode_sort`|`episode_number`
`language` of each stream|From the format's `language`
**Note**: The file format may not support some of these fields
## Modifying metadata examples
@@ -1345,20 +1388,24 @@ $ yt-dlp --parse-metadata '%(series)s S%(season_number)02dE%(episode_number)02d:
# Set "comment" field in video metadata using description instead of webpage_url
$ yt-dlp --parse-metadata 'description:(?s)(?P<meta_comment>.+)' --add-metadata
# Replace all spaces and "_" in title and uploader with a `-`
$ yt-dlp --replace-in-metadata 'title,uploader' '[ _]' '-'
```
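Both of the regex-driven metadata options follow python's `re` semantics, so their effect can be reasoned about in plain python. A sketch with invented field values:

```python
import re

# --replace-in-metadata 'title,uploader' '[ _]' '-' is essentially re.sub
# applied to each listed field; backreferences work as in re.sub
info = {'title': 'Artist - Song_Demo', 'uploader': 'Some Channel'}
for field in ('title', 'uploader'):
    info[field] = re.sub(r'[ _]', '-', info[field])
print(info['title'])  # Artist---Song-Demo

# --parse-metadata runs a regex with named capture groups against the
# source field; each named group becomes a new metadata field
description = 'Watch more at https://www.vimeo.com/123456'
m = re.search(r'(?P<additional_urls>https?://www\.vimeo\.com/\d+)', description)
print(m.groupdict())  # {'additional_urls': 'https://www.vimeo.com/123456'}
```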
# EXTRACTOR ARGUMENTS
Some extractors accept additional arguments which can be passed using `--extractor-args KEY:ARGS`. `ARGS` is a `;` (semicolon) separated string of `ARG=VAL1,VAL2`. Eg: `--extractor-args "youtube:skip=dash,hls;player_client=android" --extractor-args "funimation:version=uncut"`
Some extractors accept additional arguments which can be passed using `--extractor-args KEY:ARGS`. `ARGS` is a `;` (semicolon) separated string of `ARG=VAL1,VAL2`. Eg: `--extractor-args "youtube:player_client=android_agegate,web;include_live_dash" --extractor-args "funimation:version=uncut"`
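The `KEY:ARGS` syntax can be visualised with a small python sketch. This is only an illustration of how the pieces nest, not yt-dlp's actual parser:

```python
# Split "KEY:ARG=VAL1,VAL2;ARG2" into the extractor key and a dict of
# argument names mapped to value lists (valueless args become empty lists)
def parse_extractor_args(spec):
    key, _, args = spec.partition(':')
    parsed = {}
    for arg in args.split(';'):
        name, _, vals = arg.partition('=')
        parsed[name] = vals.split(',') if vals else []
    return key, parsed

key, args = parse_extractor_args(
    'youtube:player_client=android_agegate,web;include_live_dash')
print(key)   # youtube
print(args)  # {'player_client': ['android_agegate', 'web'], 'include_live_dash': []}
```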
The following extractors use this feature:
* **youtube**
* `skip`: `hls` or `dash` (or both) to skip download of the respective manifests
* `player_client`: Clients to extract video data from - one or more of `web`, `android`, `ios`, `web_music`, `android_music`, `ios_music`. By default, `android,web` is used. If the URL is from `music.youtube.com`, `android,web,android_music,web_music` is used
* `player_client`: Clients to extract video data from. The main clients are `web`, `android`, `ios`, `mweb`. These also have `_music`, `_embedded`, `_agegate`, and `_creator` variants (Eg: `web_embedded`) (`mweb` has only `_agegate`). By default, `android,web` is used, but the agegate and creator variants are added as required for age-gated videos. Similarly the music variants are added for `music.youtube.com` urls. You can also use `all` to use all the clients
* `player_skip`: `configs` - skip any requests for client configs and use defaults
* `include_live_dash`: Include live dash formats (These formats don't download properly)
* `comment_sort`: `top` or `new` (default) - choose comment sorting mode (on YouTube's side).
* `max_comments`: maximum amount of comments to download (default all).
* `max_comment_depth`: maximum depth for nested comments. YouTube supports depths 1 or 2 (default).
* `max_comments`: Maximum amount of comments to download (default all).
* `max_comment_depth`: Maximum depth for nested comments. YouTube supports depths 1 or 2 (default).
* **funimation**
* `language`: Languages to extract. Eg: `funimation:language=english,japanese`
@@ -1380,8 +1427,8 @@ Plugins are loaded from `<root-dir>/ytdlp_plugins/<type>/__init__.py`. Currently
These are all the deprecated options and the current alternatives to achieve the same effect
#### Not recommended
While these options still work, their use is not recommended since there are other alternatives to achieve the same effect
#### Redundant options
While these options are redundant, they are still expected to be used due to their ease of use
--get-description --print description
--get-duration --print duration_string
@@ -1391,8 +1438,15 @@ While these options still work, their use is not recommended since there are oth
--get-thumbnail --print thumbnail
-e, --get-title --print title
-g, --get-url --print urls
-j, --dump-json --print "%()j"
#### Not recommended
While these options still work, their use is not recommended since there are other alternatives to achieve the same effect
--all-formats -f all
--all-subs --sub-langs all --write-subs
--print-json -j --no-simulate
--autonumber-size NUMBER Use string formatting. Eg: %(autonumber)03d
--autonumber-start NUMBER Use internal field formatting like %(autonumber+NUMBER)s
--metadata-from-title FORMAT --parse-metadata "%(title)s:FORMAT"
@@ -1405,8 +1459,13 @@ While these options still work, their use is not recommended since there are oth
--youtube-skip-hls-manifest --extractor-args "youtube:skip=hls" (Alias: --no-youtube-include-hls-manifest)
--youtube-include-dash-manifest Default (Alias: --no-youtube-skip-dash-manifest)
--youtube-include-hls-manifest Default (Alias: --no-youtube-skip-hls-manifest)
--test Used by developers for testing extractors. Not intended for the end user
--youtube-print-sig-code Used for testing youtube signatures
#### Developer options
These options are not intended to be used by the end-user
--test Download only part of video for testing extractors
--youtube-print-sig-code For testing youtube signatures
#### Old aliases

(Binary icon files changed, not rendered here: the previous 4.2 KiB icon was removed and devscripts/logo.ico was added as a new 40 KiB file.)

@@ -11,5 +11,4 @@ else
exit 1
fi
echo python3 -m pytest -k $test_set
python3 -m pytest -k "$test_set"

docs/Collaborators.md Normal file

@@ -0,0 +1,5 @@
---
orphan: true
---
```{include} ../Collaborators.md
```


@@ -73,7 +73,7 @@ excluded_modules = ['test', 'ytdlp_plugins', 'youtube-dl', 'youtube-dlc']
PyInstaller.__main__.run([
'--name=yt-dlp%s' % _x86,
'--onefile',
'--icon=devscripts/cloud.ico',
'--icon=devscripts/logo.ico',
*[f'--exclude-module={module}' for module in excluded_modules],
*[f'--hidden-import={module}' for module in dependancies],
'--upx-exclude=vcruntime140.dll',


@@ -95,6 +95,7 @@
- **Bandcamp**
- **Bandcamp:album**
- **Bandcamp:weekly**
- **BandcampMusic**
- **bangumi.bilibili.com**: BiliBili番剧
- **bbc**: BBC
- **bbc.co.uk**: BBC iPlayer
@@ -129,6 +130,7 @@
- **BitChuteChannel**
- **bitwave:replay**
- **bitwave:stream**
- **BlackboardCollaborate**
- **BleacherReport**
- **BleacherReportCMS**
- **Bloomberg**
@@ -295,6 +297,7 @@
- **Engadget**
- **Eporner**
- **EroProfile**
- **EroProfile:album**
- **Escapist**
- **ESPN**
- **ESPNArticle**
@@ -552,6 +555,8 @@
- **MinistryGrid**
- **Minoto**
- **miomio.tv**
- **mirrativ**
- **mirrativ:user**
- **MiTele**: mitele.es
- **mixcloud**
- **mixcloud:playlist**
@@ -703,6 +708,8 @@
- **OnionStudios**
- **Ooyala**
- **OoyalaExternal**
- **openrec**
- **openrec:capture**
- **OraTV**
- **orf:burgenland**: Radio Burgenland
- **orf:fm4**: radio FM4
@@ -728,6 +735,8 @@
- **PalcoMP3:video**
- **pandora.tv**: 판도라TV
- **ParamountNetwork**
- **ParamountPlus**
- **ParamountPlusSeries**
- **parliamentlive.tv**: UK parliament videos
- **Parlview**
- **Patreon**
@@ -815,6 +824,7 @@
- **RCSVarious**
- **RCTIPlus**
- **RCTIPlusSeries**
- **RCTIPlusTV**
- **RDS**: RDS.ca
- **RedBull**
- **RedBullEmbed**
@@ -873,6 +883,7 @@
- **savefrom.net**
- **SBS**: sbs.com.au
- **schooltv**
- **ScienceChannel**
- **screen.yahoo:search**: Yahoo screen search
- **Screencast**
- **ScreencastOMatic**
@@ -1011,7 +1022,6 @@
- **ThisAmericanLife**
- **ThisAV**
- **ThisOldHouse**
- **ThisVid**
- **TikTok**
- **tinypic**: tinypic.com videos
- **TMZ**
@@ -1108,6 +1118,7 @@
- **ustream:channel**
- **ustudio**
- **ustudio:embed**
- **Utreon**
- **Varzesh3**
- **Vbox7**
- **VeeHD**


@@ -198,7 +198,10 @@ def expect_info_dict(self, got_dict, expected_dict):
expect_dict(self, got_dict, expected_dict)
# Check for the presence of mandatory fields
if got_dict.get('_type') not in ('playlist', 'multi_video'):
for key in ('id', 'url', 'title', 'ext'):
mandatory_fields = ['id', 'title']
if expected_dict.get('ext'):
mandatory_fields.extend(('url', 'ext'))
for key in mandatory_fields:
self.assertTrue(got_dict.get(key), 'Missing mandatory field %s' % key)
# Check for mandatory fields that are automatically set by YoutubeDL
for key in ['webpage_url', 'extractor', 'extractor_key']:


@@ -10,14 +10,15 @@ import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import copy
import json
from test.helper import FakeYDL, assertRegexpMatches
from yt_dlp import YoutubeDL
from yt_dlp.compat import compat_str, compat_urllib_error
from yt_dlp.compat import compat_os_name, compat_setenv, compat_str, compat_urllib_error
from yt_dlp.extractor import YoutubeIE
from yt_dlp.extractor.common import InfoExtractor
from yt_dlp.postprocessor.common import PostProcessor
from yt_dlp.utils import ExtractorError, int_or_none, match_filter_func
from yt_dlp.utils import ExtractorError, int_or_none, match_filter_func, LazyList
TEST_URL = 'http://localhost/sample.mp4'
@@ -647,6 +648,7 @@ class TestYoutubeDL(unittest.TestCase):
'title1': '$PATH',
'title2': '%PATH%',
'title3': 'foo/bar\\test',
'title4': 'foo "bar" test',
'timestamp': 1618488000,
'duration': 100000,
'playlist_index': 1,
@@ -663,21 +665,28 @@ class TestYoutubeDL(unittest.TestCase):
self.assertEqual(ydl.validate_outtmpl(tmpl), None)
outtmpl, tmpl_dict = ydl.prepare_outtmpl(tmpl, info or self.outtmpl_info)
out = outtmpl % tmpl_dict
out = ydl.escape_outtmpl(outtmpl) % tmpl_dict
fname = ydl.prepare_filename(info or self.outtmpl_info)
if callable(expected):
self.assertTrue(expected(out))
self.assertTrue(expected(fname))
elif isinstance(expected, compat_str):
self.assertEqual((out, fname), (expected, expected))
else:
self.assertEqual((out, fname), expected)
if not isinstance(expected, (list, tuple)):
expected = (expected, expected)
for (name, got), expect in zip((('outtmpl', out), ('filename', fname)), expected):
if callable(expect):
self.assertTrue(expect(got), f'Wrong {name} from {tmpl}')
else:
self.assertEqual(got, expect, f'Wrong {name} from {tmpl}')
# Side-effects
original_infodict = dict(self.outtmpl_info)
test('foo.bar', 'foo.bar')
original_infodict['epoch'] = self.outtmpl_info.get('epoch')
self.assertTrue(isinstance(original_infodict['epoch'], int))
test('%(epoch)d', int_or_none)
self.assertEqual(original_infodict, self.outtmpl_info)
# Auto-generated fields
test('%(id)s.%(ext)s', '1234.mp4')
test('%(duration_string)s', ('27:46:40', '27-46-40'))
test('%(epoch)d', int_or_none)
test('%(resolution)s', '1080p')
test('%(playlist_index)s', '001')
test('%(autonumber)s', '00001')
@@ -685,9 +694,15 @@ class TestYoutubeDL(unittest.TestCase):
test('%(autonumber)s', '001', autonumber_size=3)
# Escaping %
test('%', '%')
test('%%', '%')
test('%%%%', '%%')
test('%s', '%s')
test('%%%s', '%%s')
test('%d', '%d')
test('%abc%', '%abc%')
test('%%(width)06d.%(ext)s', '%(width)06d.mp4')
test('%%%(height)s', '%1080')
test('%(width)06d.%(ext)s', 'NA.mp4')
test('%(width)06d.%%(ext)s', 'NA.%(ext)s')
test('%%(width)06d.%(ext)s', '%(width)06d.mp4')
@@ -702,12 +717,18 @@ class TestYoutubeDL(unittest.TestCase):
test('%(id)s', ('ab:cd', 'ab -cd'), info={'id': 'ab:cd'})
# Invalid templates
self.assertTrue(isinstance(YoutubeDL.validate_outtmpl('%'), ValueError))
self.assertTrue(isinstance(YoutubeDL.validate_outtmpl('%(title)'), ValueError))
test('%(invalid@tmpl|def)s', 'none', outtmpl_na_placeholder='none')
test('%()s', 'NA')
test('%s', '%s')
test('%d', '%d')
test('%(..)s', 'NA')
# Entire info_dict
def expect_same_infodict(out):
got_dict = json.loads(out)
for info_field, expected in self.outtmpl_info.items():
self.assertEqual(got_dict.get(info_field), expected, info_field)
return True
test('%()j', (expect_same_infodict, str))
# NA placeholder
NA_TEST_OUTTMPL = '%(uploader_date)s-%(width)d-%(x|def)s-%(id)s.%(ext)s'
@@ -738,13 +759,26 @@ class TestYoutubeDL(unittest.TestCase):
test('%(width|0)04d', '0000')
test('a%(width|)d', 'a', outtmpl_na_placeholder='none')
# Internal formatting
FORMATS = self.outtmpl_info['formats']
sanitize = lambda x: x.replace(':', ' -').replace('"', "'")
# Custom type casting
test('%(formats.:.id)l', 'id1, id2, id3')
test('%(ext)l', 'mp4')
test('%(formats.:.id) 15l', ' id1, id2, id3')
test('%(formats)j', (json.dumps(FORMATS), sanitize(json.dumps(FORMATS))))
if compat_os_name == 'nt':
test('%(title4)q', ('"foo \\"bar\\" test"', "'foo _'bar_' test'"))
else:
test('%(title4)q', ('\'foo "bar" test\'', "'foo 'bar' test'"))
# Internal formatting
test('%(timestamp-1000>%H-%M-%S)s', '11-43-20')
test('%(title|%)s %(title|%%)s', '% %%')
test('%(id+1-height+3)05d', '00158')
test('%(width+100)05d', 'NA')
test('%(formats.0) 15s', ('% 15s' % FORMATS[0], '% 15s' % str(FORMATS[0]).replace(':', ' -')))
test('%(formats.0)r', (repr(FORMATS[0]), repr(FORMATS[0]).replace(':', ' -')))
test('%(formats.0) 15s', ('% 15s' % FORMATS[0], '% 15s' % sanitize(str(FORMATS[0]))))
test('%(formats.0)r', (repr(FORMATS[0]), sanitize(repr(FORMATS[0]))))
test('%(height.0)03d', '001')
test('%(-height.0)04d', '-001')
test('%(formats.-1.id)s', FORMATS[-1]['id'])
@@ -754,11 +788,22 @@ class TestYoutubeDL(unittest.TestCase):
test('%(formats.0.id.-1+id)f', '1235.000000')
test('%(formats.0.id.-1+formats.1.id.-1)d', '3')
# Laziness
def gen():
yield from range(5)
raise self.assertTrue(False, 'LazyList should not be evaluated till here')
test('%(key.4)s', '4', info={'key': LazyList(gen())})
# Empty filename
test('%(foo|)s-%(bar|)s.%(ext)s', '-.mp4')
# test('%(foo|)s.%(ext)s', ('.mp4', '_.mp4')) # fixme
# test('%(foo|)s', ('', '_')) # fixme
# Environment variable expansion for prepare_filename
compat_setenv('__yt_dlp_var', 'expanded')
envvar = '%__yt_dlp_var%' if compat_os_name == 'nt' else '$__yt_dlp_var'
test(envvar, (envvar, 'expanded'))
# Path expansion and escaping
test('Hello %(title1)s', 'Hello $PATH')
test('Hello %(title2)s', 'Hello %PATH%')


@@ -73,6 +73,8 @@ class TestDownload(unittest.TestCase):
maxDiff = None
COMPLETED_TESTS = {}
def __str__(self):
"""Identify each test with the `add_ie` attribute, if available."""
@@ -94,6 +96,9 @@ class TestDownload(unittest.TestCase):
def generator(test_case, tname):
def test_template(self):
if self.COMPLETED_TESTS.get(tname):
return
self.COMPLETED_TESTS[tname] = True
ie = yt_dlp.extractor.get_info_extractor(test_case['name'])()
other_ies = [get_info_extractor(ie_key)() for ie_key in test_case.get('add_ie', [])]
is_playlist = any(k.startswith('playlist') for k in test_case)
@@ -108,8 +113,13 @@ def generator(test_case, tname):
for tc in test_cases:
info_dict = tc.get('info_dict', {})
if not (info_dict.get('id') and info_dict.get('ext')):
raise Exception('Test definition incorrect. The output file cannot be known. Are both \'id\' and \'ext\' keys present?')
params = tc.get('params', {})
if not info_dict.get('id'):
raise Exception('Test definition incorrect. \'id\' key is not present')
elif not info_dict.get('ext'):
if params.get('skip_download') and params.get('ignore_no_formats_error'):
continue
raise Exception('Test definition incorrect. The output file cannot be known. \'ext\' key is not present')
if 'skip' in test_case:
print_skipping(test_case['skip'])
@@ -137,7 +147,7 @@ def generator(test_case, tname):
expect_warnings(ydl, test_case.get('expected_warnings', []))
def get_tc_filename(tc):
return ydl.prepare_filename(tc.get('info_dict', {}))
return ydl.prepare_filename(dict(tc.get('info_dict', {})))
res_dict = None
@@ -250,12 +260,12 @@ def generator(test_case, tname):
# And add them to TestDownload
for n, test_case in enumerate(defs):
tname = 'test_' + str(test_case['name'])
i = 1
while hasattr(TestDownload, tname):
tname = 'test_%s_%d' % (test_case['name'], i)
i += 1
tests_counter = {}
for test_case in defs:
name = test_case['name']
i = tests_counter.get(name, 0)
tests_counter[name] = i + 1
tname = f'test_{name}_{i}' if i else f'test_{name}'
test_method = generator(test_case, tname)
test_method.__name__ = str(tname)
ie_list = test_case.get('add_ie')
@@ -264,5 +274,22 @@ for n, test_case in enumerate(defs):
del test_method
def batch_generator(name, num_tests):
def test_template(self):
for i in range(num_tests):
getattr(self, f'test_{name}_{i}' if i else f'test_{name}')()
return test_template
for name, num_tests in tests_counter.items():
test_method = batch_generator(name, num_tests)
test_method.__name__ = f'test_{name}_all'
test_method.add_ie = ''
setattr(TestDownload, test_method.__name__, test_method)
del test_method
if __name__ == '__main__':
unittest.main()


@@ -8,13 +8,14 @@ import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import try_rm
from test.helper import is_download_test, try_rm
root_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
download_file = join(root_dir, 'test.webm')
@is_download_test
class TestOverwrites(unittest.TestCase):
def setUp(self):
# create an empty file


@@ -11,32 +11,31 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from yt_dlp import YoutubeDL
from yt_dlp.compat import compat_shlex_quote
from yt_dlp.postprocessor import (
ExecAfterDownloadPP,
ExecPP,
FFmpegThumbnailsConvertorPP,
MetadataFromFieldPP,
MetadataFromTitlePP,
MetadataParserPP,
)
class TestMetadataFromField(unittest.TestCase):
def test_format_to_regex(self):
pp = MetadataFromFieldPP(None, ['title:%(title)s - %(artist)s'])
self.assertEqual(pp._data[0]['regex'], r'(?P<title>.+)\ \-\ (?P<artist>.+)')
self.assertEqual(
MetadataParserPP.format_to_regex('%(title)s - %(artist)s'),
r'(?P<title>.+)\ \-\ (?P<artist>.+)')
self.assertEqual(MetadataParserPP.format_to_regex(r'(?P<x>.+)'), r'(?P<x>.+)')
def test_field_to_outtmpl(self):
pp = MetadataFromFieldPP(None, ['title:%(title)s : %(artist)s'])
self.assertEqual(pp._data[0]['tmpl'], '%(title)s')
def test_field_to_template(self):
self.assertEqual(MetadataParserPP.field_to_template('title'), '%(title)s')
self.assertEqual(MetadataParserPP.field_to_template('1'), '1')
self.assertEqual(MetadataParserPP.field_to_template('foo bar'), 'foo bar')
self.assertEqual(MetadataParserPP.field_to_template(' literal'), ' literal')
def test_in_out_seperation(self):
pp = MetadataFromFieldPP(None, ['%(title)s \\: %(artist)s:%(title)s : %(artist)s'])
self.assertEqual(pp._data[0]['in'], '%(title)s : %(artist)s')
self.assertEqual(pp._data[0]['out'], '%(title)s : %(artist)s')
class TestMetadataFromTitle(unittest.TestCase):
def test_format_to_regex(self):
pp = MetadataFromTitlePP(None, '%(title)s - %(artist)s')
self.assertEqual(pp._titleregex, r'(?P<title>.+)\ \-\ (?P<artist>.+)')
def test_metadatafromfield(self):
self.assertEqual(
MetadataFromFieldPP.to_action('%(title)s \\: %(artist)s:%(title)s : %(artist)s'),
(MetadataParserPP.Actions.INTERPRET, '%(title)s : %(artist)s', '%(title)s : %(artist)s'))
class TestConvertThumbnail(unittest.TestCase):
@@ -60,12 +59,12 @@ class TestConvertThumbnail(unittest.TestCase):
os.remove(file.format(out))
class TestExecAfterDownload(unittest.TestCase):
class TestExec(unittest.TestCase):
def test_parse_cmd(self):
pp = ExecAfterDownloadPP(YoutubeDL(), '')
pp = ExecPP(YoutubeDL(), '')
info = {'filepath': 'file name'}
quoted_filepath = compat_shlex_quote(info['filepath'])
cmd = 'echo %s' % compat_shlex_quote(info['filepath'])
self.assertEqual(pp.parse_cmd('echo', info), 'echo %s' % quoted_filepath)
self.assertEqual(pp.parse_cmd('echo.{}', info), 'echo.%s' % quoted_filepath)
self.assertEqual(pp.parse_cmd('echo "%(filepath)s"', info), 'echo "%s"' % info['filepath'])
self.assertEqual(pp.parse_cmd('echo', info), cmd)
self.assertEqual(pp.parse_cmd('echo {}', info), cmd)
self.assertEqual(pp.parse_cmd('echo %(filepath)q', info), cmd)
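The assertions above all expect the same shell-quoted command. A minimal model of the `{}` substitution behavior (a hypothetical re-implementation using stdlib `shlex.quote`; it does not model the `%(filepath)q` output-template path):

```python
import shlex

def parse_cmd_sketch(cmd, info):
    """Sketch of ExecPP.parse_cmd semantics: if the command contains no
    '{}' placeholder, append the quoted filepath; otherwise substitute
    every '{}' with the quoted filepath."""
    filepath = info.get('filepath')
    if filepath is None:
        return cmd
    quoted = shlex.quote(filepath)
    if '{}' in cmd:
        return cmd.replace('{}', quoted)
    return f'{cmd} {quoted}'
```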


@@ -1207,35 +1207,12 @@ ffmpeg version 2.4.4 Copyright (c) 2000-2014 the FFmpeg ...'''), '2.4.4')
'9999 51')
def test_match_str(self):
self.assertRaises(ValueError, match_str, 'xy>foobar', {})
# Unary
self.assertFalse(match_str('xy', {'x': 1200}))
self.assertTrue(match_str('!xy', {'x': 1200}))
self.assertTrue(match_str('x', {'x': 1200}))
self.assertFalse(match_str('!x', {'x': 1200}))
self.assertTrue(match_str('x', {'x': 0}))
self.assertFalse(match_str('x>0', {'x': 0}))
self.assertFalse(match_str('x>0', {}))
self.assertTrue(match_str('x>?0', {}))
self.assertTrue(match_str('x>1K', {'x': 1200}))
self.assertFalse(match_str('x>2K', {'x': 1200}))
self.assertTrue(match_str('x>=1200 & x < 1300', {'x': 1200}))
self.assertFalse(match_str('x>=1100 & x < 1200', {'x': 1200}))
self.assertFalse(match_str('y=a212', {'y': 'foobar42'}))
self.assertTrue(match_str('y=foobar42', {'y': 'foobar42'}))
self.assertFalse(match_str('y!=foobar42', {'y': 'foobar42'}))
self.assertTrue(match_str('y!=foobar2', {'y': 'foobar42'}))
self.assertFalse(match_str(
'like_count > 100 & dislike_count <? 50 & description',
{'like_count': 90, 'description': 'foo'}))
self.assertTrue(match_str(
'like_count > 100 & dislike_count <? 50 & description',
{'like_count': 190, 'description': 'foo'}))
self.assertFalse(match_str(
'like_count > 100 & dislike_count <? 50 & description',
{'like_count': 190, 'dislike_count': 60, 'description': 'foo'}))
self.assertFalse(match_str(
'like_count > 100 & dislike_count <? 50 & description',
{'like_count': 190, 'dislike_count': 10}))
self.assertTrue(match_str('is_live', {'is_live': True}))
self.assertFalse(match_str('is_live', {'is_live': False}))
self.assertFalse(match_str('is_live', {'is_live': None}))
@@ -1249,6 +1226,69 @@ ffmpeg version 2.4.4 Copyright (c) 2000-2014 the FFmpeg ...'''), '2.4.4')
self.assertFalse(match_str('!title', {'title': 'abc'}))
self.assertFalse(match_str('!title', {'title': ''}))
# Numeric
self.assertFalse(match_str('x>0', {'x': 0}))
self.assertFalse(match_str('x>0', {}))
self.assertTrue(match_str('x>?0', {}))
self.assertTrue(match_str('x>1K', {'x': 1200}))
self.assertFalse(match_str('x>2K', {'x': 1200}))
self.assertTrue(match_str('x>=1200 & x < 1300', {'x': 1200}))
self.assertFalse(match_str('x>=1100 & x < 1200', {'x': 1200}))
# String
self.assertFalse(match_str('y=a212', {'y': 'foobar42'}))
self.assertTrue(match_str('y=foobar42', {'y': 'foobar42'}))
self.assertFalse(match_str('y!=foobar42', {'y': 'foobar42'}))
self.assertTrue(match_str('y!=foobar2', {'y': 'foobar42'}))
self.assertTrue(match_str('y^=foo', {'y': 'foobar42'}))
self.assertFalse(match_str('y!^=foo', {'y': 'foobar42'}))
self.assertFalse(match_str('y^=bar', {'y': 'foobar42'}))
self.assertTrue(match_str('y!^=bar', {'y': 'foobar42'}))
self.assertRaises(ValueError, match_str, 'x^=42', {'x': 42})
self.assertTrue(match_str('y*=bar', {'y': 'foobar42'}))
self.assertFalse(match_str('y!*=bar', {'y': 'foobar42'}))
self.assertFalse(match_str('y*=baz', {'y': 'foobar42'}))
self.assertTrue(match_str('y!*=baz', {'y': 'foobar42'}))
self.assertTrue(match_str('y$=42', {'y': 'foobar42'}))
self.assertFalse(match_str('y$=43', {'y': 'foobar42'}))
# And
self.assertFalse(match_str(
'like_count > 100 & dislike_count <? 50 & description',
{'like_count': 90, 'description': 'foo'}))
self.assertTrue(match_str(
'like_count > 100 & dislike_count <? 50 & description',
{'like_count': 190, 'description': 'foo'}))
self.assertFalse(match_str(
'like_count > 100 & dislike_count <? 50 & description',
{'like_count': 190, 'dislike_count': 60, 'description': 'foo'}))
self.assertFalse(match_str(
'like_count > 100 & dislike_count <? 50 & description',
{'like_count': 190, 'dislike_count': 10}))
# Regex
self.assertTrue(match_str(r'x~=\bbar', {'x': 'foo bar'}))
self.assertFalse(match_str(r'x~=\bbar.+', {'x': 'foo bar'}))
self.assertFalse(match_str(r'x~=^FOO', {'x': 'foo bar'}))
self.assertTrue(match_str(r'x~=(?i)^FOO', {'x': 'foo bar'}))
# Quotes
self.assertTrue(match_str(r'x^="foo"', {'x': 'foo "bar"'}))
self.assertFalse(match_str(r'x^="foo "', {'x': 'foo "bar"'}))
self.assertFalse(match_str(r'x$="bar"', {'x': 'foo "bar"'}))
self.assertTrue(match_str(r'x$=" \"bar\""', {'x': 'foo "bar"'}))
# Escaping &
self.assertFalse(match_str(r'x=foo & bar', {'x': 'foo & bar'}))
self.assertTrue(match_str(r'x=foo \& bar', {'x': 'foo & bar'}))
self.assertTrue(match_str(r'x=foo \& bar & x^=foo', {'x': 'foo & bar'}))
self.assertTrue(match_str(r'x="foo \& bar" & x^=foo', {'x': 'foo & bar'}))
# Example from docs
self.assertTrue(match_str(
r'!is_live & like_count>?100 & description~=\'(?i)\bcats \& dogs\b\'',
{'description': 'Raining Cats & Dogs'}))
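The semantics the numeric and `&`-joined cases above rely on can be illustrated with a toy evaluator (a simplified sketch covering only a subset of the syntax; it is not yt-dlp's `match_str` implementation):

```python
import operator
import re

def mini_match_str(filter_str, info):
    """Toy evaluator for a subset of the filter syntax exercised above:
    unary presence tests ('field', '!field'), numeric comparisons with an
    optional K (x1000) suffix, a '?' that makes a missing field match,
    and '&'-joined clauses."""
    ops = {'<': operator.lt, '<=': operator.le, '>': operator.gt,
           '>=': operator.ge, '=': operator.eq, '!=': operator.ne}

    def clause(term):
        m = re.match(
            r'(?P<key>\w+)\s*(?P<op><=|>=|!=|<|>|=)(?P<opt>\?)?\s*'
            r'(?P<val>\d+)(?P<kilo>[Kk])?$', term)
        if m:
            actual = info.get(m.group('key'))
            if actual is None:
                return bool(m.group('opt'))  # 'x>?0' matches when x is missing
            wanted = int(m.group('val')) * (1000 if m.group('kilo') else 1)
            return ops[m.group('op')](actual, wanted)
        # Unary: booleans match by value, other fields by presence
        negate = term.startswith('!')
        value = info.get(term.lstrip('!'))
        present = value if isinstance(value, bool) else value is not None
        return present != negate

    return all(clause(term.strip()) for term in filter_str.split('&'))
```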
def test_parse_dfxp_time_expr(self):
self.assertEqual(parse_dfxp_time_expr(None), None)
self.assertEqual(parse_dfxp_time_expr(''), None)
@@ -1537,8 +1577,11 @@ Line 1
self.assertEqual(LazyList(it).exhaust(), it)
self.assertEqual(LazyList(it)[5], it[5])
self.assertEqual(LazyList(it)[5:], it[5:])
self.assertEqual(LazyList(it)[:5], it[:5])
self.assertEqual(LazyList(it)[::2], it[::2])
self.assertEqual(LazyList(it)[1::2], it[1::2])
self.assertEqual(LazyList(it)[5::-1], it[5::-1])
self.assertEqual(LazyList(it)[6:2:-2], it[6:2:-2])
self.assertEqual(LazyList(it)[::-1], it[::-1])
@@ -1550,6 +1593,7 @@ Line 1
self.assertEqual(list(LazyList(it).reverse()), it[::-1])
self.assertEqual(list(LazyList(it).reverse()[1:3:7]), it[::-1][1:3:7])
self.assertEqual(list(LazyList(it).reverse()[::-1]), it)
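The slicing cases added above can be modeled with a minimal cache-as-you-go wrapper (an illustrative re-implementation, not yt-dlp's `LazyList`):

```python
class MiniLazyList:
    """Minimal LazyList sketch: wraps an iterator, caches items as they
    are consumed, and supports int and slice indexing; open-ended or
    negative slices force full exhaustion since they need the length."""

    def __init__(self, iterable):
        self._iter = iter(iterable)
        self._cache = []

    def _exhaust(self):
        self._cache.extend(self._iter)

    def _fill(self, n):
        # Consume from the iterator until the cache holds n items (or it ends)
        while len(self._cache) < n:
            try:
                self._cache.append(next(self._iter))
            except StopIteration:
                break

    def __getitem__(self, idx):
        if isinstance(idx, slice):
            bounds = (idx.start, idx.stop, idx.step)
            if idx.stop is None or any(x is not None and x < 0 for x in bounds):
                self._exhaust()
            else:
                self._fill(idx.stop)
            return self._cache[idx]
        if idx < 0:
            self._exhaust()
        else:
            self._fill(idx + 1)
        if idx >= len(self._cache) or idx < -len(self._cache):
            raise IndexError(idx)
        return self._cache[idx]
```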
def test_LazyList_laziness(self):


@@ -35,6 +35,7 @@ from .compat import (
compat_kwargs,
compat_numeric_types,
compat_os_name,
compat_shlex_quote,
compat_str,
compat_tokenize_tokenize,
compat_urllib_error,
@@ -65,7 +66,8 @@ from .utils import (
float_or_none,
format_bytes,
format_field,
STR_FORMAT_RE,
STR_FORMAT_RE_TMPL,
STR_FORMAT_TYPES,
formatSeconds,
GeoRestrictedError,
HEADRequest,
@@ -107,6 +109,7 @@ from .utils import (
try_get,
UnavailableVideoError,
url_basename,
variadic,
version_tuple,
write_json_file,
write_string,
@@ -123,6 +126,7 @@ from .extractor import (
)
from .extractor.openload import PhantomJSwrapper
from .downloader import (
FFmpegFD,
get_suitable_downloader,
shorten_protocol_name
)
@@ -194,7 +198,8 @@ class YoutubeDL(object):
(or video) as a single JSON line.
force_write_download_archive: Force writing download archive regardless
of 'skip_download' or 'simulate'.
simulate: Do not download the video files.
simulate: Do not download the video files. If unset (or None),
simulate only if listsubtitles, listformats or list_thumbnails is used
format: Video format code. see "FORMAT SELECTION" for more details.
allow_unplayable_formats: Allow unplayable formats to be extracted and downloaded.
ignore_no_formats_error: Ignore "No video formats" error. Useful for
@@ -215,7 +220,7 @@ class YoutubeDL(object):
'temp' and the keys of OUTTMPL_TYPES (in utils.py)
outtmpl: Dictionary of templates for output names. Allowed keys
are 'default' and the keys of OUTTMPL_TYPES (in utils.py).
A string is also accepted for backward compatibility
For compatibility with youtube-dl, a single string can also be used
outtmpl_na_placeholder: Placeholder for unavailable meta fields.
restrictfilenames: Do not allow "&" and spaces in file names
trim_file_name: Limit length of filename (extension excluded)
@@ -229,6 +234,8 @@ class YoutubeDL(object):
overwrites: Overwrite all video and metadata files if True,
overwrite only non-video files if None
and don't overwrite any file if False
For compatibility with youtube-dl,
"nooverwrites" may also be used instead
playliststart: Playlist item to start at.
playlistend: Playlist item to end at.
playlist_items: Specific indices of playlist to download.
@@ -241,7 +248,7 @@ class YoutubeDL(object):
writedescription: Write the video description to a .description file
writeinfojson: Write the video metadata to a .info.json file
clean_infojson: Remove private fields from the infojson
writecomments: Extract video comments. This will not be written to disk
getcomments: Extract video comments. This will not be written to disk
unless writeinfojson is also given
writeannotations: Write the video annotations to a .annotations.xml file
writethumbnail: Write the thumbnail image to a file
@@ -400,7 +407,8 @@ class YoutubeDL(object):
compat_opts: Compatibility options. See "Differences in default behavior".
The following options do not work when used through the API:
filename, abort-on-error, multistreams, no-live-chat,
no-playlist-metafiles. Refer __init__.py for their implementation
no-clean-infojson, no-playlist-metafiles, no-keep-subs.
Refer __init__.py for their implementation
The following parameters are not used by YoutubeDL itself, they are used by
the downloader (see yt_dlp/downloader/common.py):
@@ -414,10 +422,12 @@ class YoutubeDL(object):
ffmpeg_location: Location of the ffmpeg/avconv binary; either the path
to the binary or its containing directory.
postprocessor_args: A dictionary of postprocessor/executable keys (in lower case)
and a list of additional command-line arguments for the
postprocessor/executable. The dict can also have "PP+EXE" keys
which are used when the given exe is used by the given PP.
Use 'default' as the name for arguments to be passed to all PP
and a list of additional command-line arguments for the
postprocessor/executable. The dict can also have "PP+EXE" keys
which are used when the given exe is used by the given PP.
Use 'default' as the name for arguments to be passed to all PP
For compatibility with youtube-dl, a single list of args
can also be used
The following options are used by the extractors:
extractor_retries: Number of times to retry for known errors
@@ -509,8 +519,15 @@ class YoutubeDL(object):
self.report_warning('--merge-output-format will be ignored since --remux-video or --recode-video is given')
self.params['merge_output_format'] = self.params['final_ext']
if 'overwrites' in self.params and self.params['overwrites'] is None:
del self.params['overwrites']
if self.params.get('overwrites') is None:
self.params.pop('overwrites', None)
elif self.params.get('nooverwrites') is not None:
# nooverwrites was unnecessarily changed to overwrites
# in 0c3d0f51778b153f65c21906031c2e091fcfb641
# This ensures compatibility with both keys
self.params['overwrites'] = not self.params['nooverwrites']
else:
self.params['nooverwrites'] = not self.params['overwrites']
if params.get('bidi_workaround', False):
try:
@@ -701,7 +718,7 @@ class YoutubeDL(object):
def save_console_title(self):
if not self.params.get('consoletitle', False):
return
if self.params.get('simulate', False):
if self.params.get('simulate'):
return
if compat_os_name != 'nt' and 'TERM' in os.environ:
# Save the title on stack
@@ -710,7 +727,7 @@ class YoutubeDL(object):
def restore_console_title(self):
if not self.params.get('consoletitle', False):
return
if self.params.get('simulate', False):
if self.params.get('simulate'):
return
if compat_os_name != 'nt' and 'TERM' in os.environ:
# Restore the title from stack
@@ -845,28 +862,52 @@ class YoutubeDL(object):
return sanitize_path(path, force=self.params.get('windowsfilenames'))
@staticmethod
def validate_outtmpl(tmpl):
def _outtmpl_expandpath(outtmpl):
# expand_path translates '%%' into '%' and '$$' into '$'
# correspondingly that is not what we want since we need to keep
# '%%' intact for template dict substitution step. Working around
# with boundary-alike separator hack.
sep = ''.join([random.choice(ascii_letters) for _ in range(32)])
outtmpl = outtmpl.replace('%%', '%{0}%'.format(sep)).replace('$$', '${0}$'.format(sep))
# outtmpl should be expand_path'ed before template dict substitution
# because meta fields may contain env variables we don't want to
# be expanded. For example, for outtmpl "%(title)s.%(ext)s" and
# title "Hello $PATH", we don't want `$PATH` to be expanded.
return expand_path(outtmpl).replace(sep, '')
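The boundary-separator hack above can be demonstrated standalone (a sketch using stdlib `os.path.expandvars`/`expanduser` in place of yt-dlp's `expand_path`; the env-var name in the test is hypothetical):

```python
import os
import random
from string import ascii_letters

def outtmpl_expandpath(outtmpl):
    """Protect '%%' and '$$' from being collapsed during environment
    variable expansion: splice a random boundary string into them,
    expand the path, then strip the boundary back out."""
    sep = ''.join(random.choices(ascii_letters, k=32))
    outtmpl = outtmpl.replace('%%', f'%{sep}%').replace('$$', f'${sep}$')
    expanded = os.path.expanduser(os.path.expandvars(outtmpl))
    return expanded.replace(sep, '')
```

Without the hack, expansion of `%%`/`$$` would lose the escaping needed for the later `outtmpl % info_dict` substitution step.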
@staticmethod
def escape_outtmpl(outtmpl):
''' Escape any remaining strings like %s, %abc% etc. '''
return re.sub(
STR_FORMAT_RE_TMPL.format('', '(?![%(\0])'),
lambda mobj: ('' if mobj.group('has_key') else '%') + mobj.group(0),
outtmpl)
@classmethod
def validate_outtmpl(cls, outtmpl):
''' @return None or Exception object '''
outtmpl = re.sub(
STR_FORMAT_RE_TMPL.format('[^)]*', '[ljq]'),
lambda mobj: f'{mobj.group(0)[:-1]}s',
cls._outtmpl_expandpath(outtmpl))
try:
re.sub(
STR_FORMAT_RE.format(''),
lambda mobj: ('%' if not mobj.group('has_key') else '') + mobj.group(0),
tmpl
) % collections.defaultdict(int)
cls.escape_outtmpl(outtmpl) % collections.defaultdict(int)
return None
except ValueError as err:
return err
def prepare_outtmpl(self, outtmpl, info_dict, sanitize=None):
""" Make the template and info_dict suitable for substitution (outtmpl % info_dict)"""
info_dict = dict(info_dict)
na = self.params.get('outtmpl_na_placeholder', 'NA')
""" Make the template and info_dict suitable for substitution : ydl.outtmpl_escape(outtmpl) % info_dict """
info_dict.setdefault('epoch', int(time.time())) # keep epoch consistent once set
info_dict = dict(info_dict) # Do not sanitize so as not to consume LazyList
for key in ('__original_infodict', '__postprocessors'):
info_dict.pop(key, None)
info_dict['duration_string'] = ( # %(duration>%H-%M-%S)s is wrong if duration > 24hrs
formatSeconds(info_dict['duration'], '-' if sanitize else ':')
if info_dict.get('duration', None) is not None
else None)
info_dict['epoch'] = int(time.time())
info_dict['autonumber'] = self.params.get('autonumber_start', 1) - 1 + self._num_downloads
if info_dict.get('resolution') is None:
info_dict['resolution'] = self.format_resolution(info_dict, default=None)
@@ -879,14 +920,14 @@ class YoutubeDL(object):
}
TMPL_DICT = {}
EXTERNAL_FORMAT_RE = re.compile(STR_FORMAT_RE.format('[^)]*'))
EXTERNAL_FORMAT_RE = re.compile(STR_FORMAT_RE_TMPL.format('[^)]*', f'[{STR_FORMAT_TYPES}ljq]'))
MATH_FUNCTIONS = {
'+': float.__add__,
'-': float.__sub__,
}
# Field is of the form key1.key2...
# where keys (except first) can be string, int or slice
FIELD_RE = r'\w+(?:\.(?:\w+|{num}|{num}?(?::{num}?){{1,2}}))*'.format(num=r'(?:-?\d+)')
FIELD_RE = r'\w*(?:\.(?:\w+|{num}|{num}?(?::{num}?){{1,2}}))*'.format(num=r'(?:-?\d+)')
MATH_FIELD_RE = r'''{field}|{num}'''.format(field=FIELD_RE, num=r'-?\d+(?:\.\d+)?')
MATH_OPERATORS_RE = r'(?:%s)' % '|'.join(map(re.escape, MATH_FUNCTIONS.keys()))
INTERNAL_FORMAT_RE = re.compile(r'''(?x)
@@ -897,12 +938,15 @@ class YoutubeDL(object):
(?:\|(?P<default>.*?))?
$'''.format(field=FIELD_RE, math_op=MATH_OPERATORS_RE, math_field=MATH_FIELD_RE))
get_key = lambda k: traverse_obj(
info_dict, k.split('.'), is_user_input=True, traverse_string=True)
def _traverse_infodict(k):
k = k.split('.')
if k[0] == '':
k.pop(0)
return traverse_obj(info_dict, k, is_user_input=True, traverse_string=True)
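The dotted-key lookup that `_traverse_infodict` delegates to `traverse_obj` can be sketched as follows (a hypothetical stand-in covering only dict keys and list indices, not the full `traverse_obj` feature set):

```python
def traverse_dotted(obj, path):
    """Walk a nested structure by a dotted path: 'formats.1.url' indexes
    dicts by key and lists by integer position, returning None whenever
    a step is missing. A leading '.' (empty first field) is ignored."""
    keys = path.split('.')
    if keys and keys[0] == '':
        keys.pop(0)
    for key in keys:
        if isinstance(obj, dict):
            obj = obj.get(key)
        elif isinstance(obj, (list, tuple)):
            try:
                obj = obj[int(key)]
            except (ValueError, IndexError):
                return None
        else:
            return None
        if obj is None:
            return None
    return obj
```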
def get_value(mdict):
# Object traversal
value = get_key(mdict['fields'])
value = _traverse_infodict(mdict['fields'])
# Negative
if mdict['negate']:
value = float_or_none(value)
@@ -924,7 +968,7 @@ class YoutubeDL(object):
item, multiplier = (item[1:], -1) if item[0] == '-' else (item, 1)
offset = float_or_none(item)
if offset is None:
offset = float_or_none(get_key(item))
offset = float_or_none(_traverse_infodict(item))
try:
value = operator(value, multiplier * offset)
except (TypeError, ZeroDivisionError):
@@ -936,12 +980,17 @@ class YoutubeDL(object):
return value
na = self.params.get('outtmpl_na_placeholder', 'NA')
def _dumpjson_default(obj):
if isinstance(obj, (set, LazyList)):
return list(obj)
raise TypeError(f'Object of type {type(obj).__name__} is not JSON serializable')
def create_key(outer_mobj):
if not outer_mobj.group('has_key'):
return '%{}'.format(outer_mobj.group(0))
return f'%{outer_mobj.group(0)}'
key = outer_mobj.group('key')
fmt = outer_mobj.group('format')
mobj = re.match(INTERNAL_FORMAT_RE, key)
if mobj is None:
value, default, mobj = None, na, {'fields': ''}
@@ -950,13 +999,21 @@ class YoutubeDL(object):
default = mobj['default'] if mobj['default'] is not None else na
value = get_value(mobj)
fmt = outer_mobj.group('format')
if fmt == 's' and value is not None and key in field_size_compat_map.keys():
fmt = '0{:d}d'.format(field_size_compat_map[key])
value = default if value is None else value
if fmt == 'c':
value = compat_str(value)
str_fmt = f'{fmt[:-1]}s'
if fmt[-1] == 'l':
value, fmt = ', '.join(variadic(value)), str_fmt
elif fmt[-1] == 'j':
value, fmt = json.dumps(value, default=_dumpjson_default), str_fmt
elif fmt[-1] == 'q':
value, fmt = compat_shlex_quote(str(value)), str_fmt
elif fmt[-1] == 'c':
value = str(value)
if value is None:
value, fmt = default, 's'
else:
@@ -965,16 +1022,18 @@ class YoutubeDL(object):
value = float_or_none(value)
if value is None:
value, fmt = default, 's'
if sanitize:
if fmt[-1] == 'r':
# If value is an object, sanitize might convert it to a string
# So we convert it to repr first
value, fmt = repr(value), '%ss' % fmt[:-1]
value, fmt = repr(value), str_fmt
if fmt[-1] in 'csr':
value = sanitize(mobj['fields'].split('.')[-1], value)
key += '\0%s' % fmt
key = '%s\0%s' % (key.replace('%', '%\0'), outer_mobj.group('format'))
TMPL_DICT[key] = value
return '%({key}){fmt}'.format(key=key, fmt=fmt)
return '{prefix}%({key}){fmt}'.format(key=key, fmt=fmt, prefix=outer_mobj.group('prefix'))
return EXTERNAL_FORMAT_RE.sub(create_key, outtmpl), TMPL_DICT
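The new `l`/`j`/`q`/`c` conversions handled in `create_key` above reduce each value to a string and fall back to a plain `s` conversion. A simplified model of just that step (stdlib `json`/`shlex` standing in for `_dumpjson_default`/`compat_shlex_quote`; not the full implementation):

```python
import json
import shlex

def apply_conversion(value, fmt):
    """Model of the outtmpl conversions: %(x)l joins a list with ', ',
    %(x)j JSON-encodes, %(x)q shell-quotes, %(x)c takes str(); each of
    l/j/q then rewrites the conversion to a plain 's'."""
    str_fmt = f'{fmt[:-1]}s'
    if fmt[-1] == 'l':
        if not isinstance(value, (list, tuple)):
            value = [value]  # like variadic(): treat a scalar as a 1-item list
        return ', '.join(map(str, value)), str_fmt
    if fmt[-1] == 'j':
        return json.dumps(value), str_fmt
    if fmt[-1] == 'q':
        return shlex.quote(str(value)), str_fmt
    if fmt[-1] == 'c':
        return str(value), fmt
    return value, fmt
```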
@@ -986,19 +1045,8 @@ class YoutubeDL(object):
is_id=(k == 'id' or k.endswith('_id')))
outtmpl = self.outtmpl_dict.get(tmpl_type, self.outtmpl_dict['default'])
outtmpl, template_dict = self.prepare_outtmpl(outtmpl, info_dict, sanitize)
# expand_path translates '%%' into '%' and '$$' into '$'
# correspondingly that is not what we want since we need to keep
# '%%' intact for template dict substitution step. Working around
# with boundary-alike separator hack.
sep = ''.join([random.choice(ascii_letters) for _ in range(32)])
outtmpl = outtmpl.replace('%%', '%{0}%'.format(sep)).replace('$$', '${0}$'.format(sep))
# outtmpl should be expand_path'ed before template dict substitution
# because meta fields may contain env variables we don't want to
# be expanded. For example, for outtmpl "%(title)s.%(ext)s" and
# title "Hello $PATH", we don't want `$PATH` to be expanded.
filename = expand_path(outtmpl).replace(sep, '') % template_dict
outtmpl = self.escape_outtmpl(self._outtmpl_expandpath(outtmpl))
filename = outtmpl % template_dict
force_ext = OUTTMPL_TYPES.get(tmpl_type)
if force_ext is not None:
@@ -1031,7 +1079,6 @@ class YoutubeDL(object):
self.report_warning('--paths is ignored when outputting to stdout', only_once=True)
elif os.path.isabs(filename):
self.report_warning('--paths is ignored since an absolute path is given in output template', only_once=True)
self.__prepare_filename_warned = True
if filename == '-' or not filename:
return filename
@@ -1234,7 +1281,7 @@ class YoutubeDL(object):
ie_result = self.process_video_result(ie_result, download=download)
additional_urls = (ie_result or {}).get('additional_urls')
if additional_urls:
# TODO: Improve MetadataFromFieldPP to allow setting a list
# TODO: Improve MetadataParserPP to allow setting a list
if isinstance(additional_urls, compat_str):
additional_urls = [additional_urls]
self.to_screen(
@@ -1310,15 +1357,12 @@ class YoutubeDL(object):
'It needs to be updated.' % ie_result.get('extractor'))
def _fixup(r):
self.add_extra_info(
r,
{
'extractor': ie_result['extractor'],
'webpage_url': ie_result['webpage_url'],
'webpage_url_basename': url_basename(ie_result['webpage_url']),
'extractor_key': ie_result['extractor_key'],
}
)
self.add_extra_info(r, {
'extractor': ie_result['extractor'],
'webpage_url': ie_result['webpage_url'],
'webpage_url_basename': url_basename(ie_result['webpage_url']),
'extractor_key': ie_result['extractor_key'],
})
return r
ie_result['entries'] = [
self.process_ie_result(_fixup(r), download, extra_info)
@@ -1434,7 +1478,7 @@ class YoutubeDL(object):
else:
self.to_screen('[info] Writing playlist metadata as JSON to: ' + infofn)
try:
write_json_file(self.filter_requested_info(ie_result, self.params.get('clean_infojson', True)), infofn)
write_json_file(self.sanitize_info(ie_result, self.params.get('clean_infojson', True)), infofn)
except (OSError, IOError):
self.report_error('Cannot write playlist metadata to JSON file ' + infofn)
@@ -1582,7 +1626,7 @@ class YoutubeDL(object):
return merger.available and merger.can_merge()
prefer_best = (
not self.params.get('simulate', False)
not self.params.get('simulate')
and download
and (
not can_merge()
@@ -1730,6 +1774,7 @@ class YoutubeDL(object):
if not allow_multiple_streams[aud_vid] and fmt_info.get(aud_vid[0] + 'codec') != 'none':
if get_no_more[aud_vid]:
formats_info.pop(i)
break
get_no_more[aud_vid] = True
if len(formats_info) == 1:
@@ -2154,7 +2199,7 @@ class YoutubeDL(object):
format['format'] = '{id} - {res}{note}'.format(
id=format['format_id'],
res=self.format_resolution(format),
note=' ({0})'.format(format['format_note']) if format.get('format_note') is not None else '',
note=format_field(format, 'format_note', ' (%s)'),
)
# Automatically determine file extension if missing
if format.get('ext') is None:
@@ -2183,20 +2228,22 @@ class YoutubeDL(object):
info_dict, _ = self.pre_process(info_dict)
list_only = self.params.get('list_thumbnails') or self.params.get('listformats') or self.params.get('listsubtitles')
if self.params.get('list_thumbnails'):
self.list_thumbnails(info_dict)
if self.params.get('listformats'):
if not info_dict.get('formats'):
raise ExtractorError('No video formats found', expected=True)
self.list_formats(info_dict)
if self.params.get('listsubtitles'):
if 'automatic_captions' in info_dict:
self.list_subtitles(
info_dict['id'], automatic_captions, 'automatic captions')
self.list_subtitles(info_dict['id'], subtitles, 'subtitles')
list_only = self.params.get('simulate') is None and (
self.params.get('list_thumbnails') or self.params.get('listformats') or self.params.get('listsubtitles'))
if list_only:
# Without this printing, -F --print-json will not work
self.__forced_printings(info_dict, self.prepare_filename(info_dict), incomplete=True)
if self.params.get('list_thumbnails'):
self.list_thumbnails(info_dict)
if self.params.get('listformats'):
if not info_dict.get('formats'):
raise ExtractorError('No video formats found', expected=True)
self.list_formats(info_dict)
if self.params.get('listsubtitles'):
if 'automatic_captions' in info_dict:
self.list_subtitles(
info_dict['id'], automatic_captions, 'automatic captions')
self.list_subtitles(info_dict['id'], subtitles, 'subtitles')
return
format_selector = self.format_selector
@@ -2292,7 +2339,8 @@ class YoutubeDL(object):
requested_langs = ['en']
else:
requested_langs = [list(all_sub_langs)[0]]
self.write_debug('Downloading subtitles: %s' % ', '.join(requested_langs))
if requested_langs:
self.write_debug('Downloading subtitles: %s' % ', '.join(requested_langs))
formats_query = self.params.get('subtitlesformat', 'best')
formats_preference = formats_query.split('/') if formats_query else []
@@ -2340,11 +2388,13 @@ class YoutubeDL(object):
elif 'url' in info_dict:
info_dict['urls'] = info_dict['url'] + info_dict.get('play_path', '')
if self.params.get('forceprint') or self.params.get('forcejson'):
self.post_extract(info_dict)
for tmpl in self.params.get('forceprint', []):
if re.match(r'\w+$', tmpl):
tmpl = '%({})s'.format(tmpl)
tmpl, info_copy = self.prepare_outtmpl(tmpl, info_dict)
self.to_stdout(tmpl % info_copy)
self.to_stdout(self.escape_outtmpl(tmpl) % info_copy)
print_mandatory('title')
print_mandatory('id')
@@ -2352,13 +2402,12 @@ class YoutubeDL(object):
print_optional('thumbnail')
print_optional('description')
print_optional('filename')
if self.params.get('forceduration', False) and info_dict.get('duration') is not None:
if self.params.get('forceduration') and info_dict.get('duration') is not None:
self.to_stdout(formatSeconds(info_dict['duration']))
print_mandatory('format')
if self.params.get('forcejson', False):
self.post_extract(info_dict)
self.to_stdout(json.dumps(info_dict, default=repr))
if self.params.get('forcejson'):
self.to_stdout(json.dumps(self.sanitize_info(info_dict)))
def dl(self, name, info, subtitle=False, test=False):
@@ -2377,7 +2426,7 @@ class YoutubeDL(object):
}
else:
params = self.params
fd = get_suitable_downloader(info, params)(self, params)
fd = get_suitable_downloader(info, params, to_stdout=(name == '-'))(self, params)
if not test:
for ph in self._progress_hooks:
fd.add_progress_hook(ph)
@@ -2393,8 +2442,6 @@ class YoutubeDL(object):
assert info_dict.get('_type', 'video') == 'video'
info_dict.setdefault('__postprocessors', [])
max_downloads = self.params.get('max_downloads')
if max_downloads is not None:
if self._num_downloads >= int(max_downloads):
@@ -2420,7 +2467,7 @@ class YoutubeDL(object):
# Forced printings
self.__forced_printings(info_dict, full_filename, incomplete=('format' not in info_dict))
if self.params.get('simulate', False):
if self.params.get('simulate'):
if self.params.get('force_write_download_archive', False):
self.record_download_archive(info_dict)
@@ -2520,7 +2567,7 @@ class YoutubeDL(object):
else:
self.to_screen('[info] Writing video metadata as JSON to: ' + infofn)
try:
write_json_file(self.filter_requested_info(info_dict, self.params.get('clean_infojson', True)), infofn)
write_json_file(self.sanitize_info(info_dict, self.params.get('clean_infojson', True)), infofn)
except (OSError, IOError):
self.report_error('Cannot write video metadata to JSON file ' + infofn)
return
@@ -2595,6 +2642,7 @@ class YoutubeDL(object):
info_dict = self.run_pp(MoveFilesAfterDownloadPP(self, False), info_dict)
else:
# Download
info_dict.setdefault('__postprocessors', [])
try:
def existing_file(*filepaths):
@@ -2647,14 +2695,17 @@ class YoutubeDL(object):
info_dict['ext'] = 'mkv'
self.report_warning(
'Requested formats are incompatible for merge and will be merged into mkv.')
new_ext = info_dict['ext']
def correct_ext(filename):
def correct_ext(filename, ext=new_ext):
if filename == '-':
return filename
filename_real_ext = os.path.splitext(filename)[1][1:]
filename_wo_ext = (
os.path.splitext(filename)[0]
if filename_real_ext == old_ext
if filename_real_ext in (old_ext, new_ext)
else filename)
return '%s.%s' % (filename_wo_ext, info_dict['ext'])
return '%s.%s' % (filename_wo_ext, ext)
# Ensure filename always has a correct extension for successful merge
full_filename = correct_ext(full_filename)
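The generalized `correct_ext` above now strips either the pre-merge or post-merge extension before appending the target one. A standalone version of that logic (sketch with the closure variables made explicit parameters):

```python
import os

def correct_ext(filename, old_ext, new_ext):
    """Swap a filename's extension to the merge container's extension,
    stripping the existing one whether it is the old or the new extension;
    '-' (stdout) passes through untouched."""
    if filename == '-':
        return filename
    real_ext = os.path.splitext(filename)[1][1:]
    if real_ext in (old_ext, new_ext):
        filename = os.path.splitext(filename)[0]
    return f'{filename}.{new_ext}'
```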
@@ -2663,20 +2714,16 @@ class YoutubeDL(object):
info_dict['__real_download'] = False
_protocols = set(determine_protocol(f) for f in requested_formats)
if len(_protocols) == 1:
if len(_protocols) == 1: # All requested formats have same protocol
info_dict['protocol'] = _protocols.pop()
directly_mergable = (
'no-direct-merge' not in self.params.get('compat_opts', [])
and info_dict.get('protocol') is not None # All requested formats have same protocol
and not self.params.get('allow_unplayable_formats')
and get_suitable_downloader(info_dict, self.params).__name__ == 'FFmpegFD')
if directly_mergable:
info_dict['url'] = requested_formats[0]['url']
# Treat it as a single download
dl_filename = existing_file(full_filename, temp_filename)
if dl_filename is None:
success, real_download = self.dl(temp_filename, info_dict)
info_dict['__real_download'] = real_download
directly_mergable = FFmpegFD.can_merge_formats(info_dict)
if dl_filename is not None:
pass
elif (directly_mergable and get_suitable_downloader(
info_dict, self.params, to_stdout=(temp_filename == '-')) == FFmpegFD):
info_dict['url'] = '\n'.join(f['url'] for f in requested_formats)
success, real_download = self.dl(temp_filename, info_dict)
info_dict['__real_download'] = real_download
else:
downloaded = []
merger = FFmpegMergerPP(self)
@@ -2690,28 +2737,36 @@ class YoutubeDL(object):
'You have requested merging of multiple formats but ffmpeg is not installed. '
'The formats won\'t be merged.')
if dl_filename is None:
for f in requested_formats:
new_info = dict(info_dict)
del new_info['requested_formats']
new_info.update(f)
if temp_filename == '-':
reason = ('using a downloader other than ffmpeg' if directly_mergable
else 'but the formats are incompatible for simultaneous download' if merger.available
else 'but ffmpeg is not installed')
self.report_warning(
f'You have requested downloading multiple formats to stdout {reason}. '
'The formats will be streamed one after the other')
fname = temp_filename
for f in requested_formats:
new_info = dict(info_dict)
del new_info['requested_formats']
new_info.update(f)
if temp_filename != '-':
fname = prepend_extension(
self.prepare_filename(new_info, 'temp'),
correct_ext(temp_filename, new_info['ext']),
'f%s' % f['format_id'], new_info['ext'])
if not self._ensure_dir_exists(fname):
return
downloaded.append(fname)
partial_success, real_download = self.dl(fname, new_info)
info_dict['__real_download'] = info_dict['__real_download'] or real_download
success = success and partial_success
if merger.available and not self.params.get('allow_unplayable_formats'):
info_dict['__postprocessors'].append(merger)
info_dict['__files_to_merge'] = downloaded
# Even if there were no downloads, it is being merged only now
info_dict['__real_download'] = True
else:
for file in downloaded:
files_to_move[file] = None
else:
# Just a single file
dl_filename = existing_file(full_filename, temp_filename)
@@ -2826,7 +2881,7 @@ class YoutubeDL(object):
else:
if self.params.get('dump_single_json', False):
self.post_extract(res)
self.to_stdout(json.dumps(res, default=repr))
self.to_stdout(json.dumps(self.sanitize_info(res)))
return self._download_retcode
@@ -2835,7 +2890,7 @@ class YoutubeDL(object):
[info_filename], mode='r',
openhook=fileinput.hook_encoded('utf-8'))) as f:
# FileInput doesn't have a read method, we can't call json.load
info = self.filter_requested_info(json.loads('\n'.join(f)), self.params.get('clean_infojson', True))
info = self.sanitize_info(json.loads('\n'.join(f)), self.params.get('clean_infojson', True))
try:
self.process_ie_result(info, download=True)
except (DownloadError, EntryNotInPlaylist, ThrottledDownload):
@@ -2848,16 +2903,20 @@ class YoutubeDL(object):
return self._download_retcode
@staticmethod
def filter_requested_info(info_dict, actually_filter=True):
remove_keys = ['__original_infodict'] # Always remove this since this may contain a copy of the entire dict
def sanitize_info(info_dict, remove_private_keys=False):
''' Sanitize the infodict for converting to json '''
info_dict.setdefault('epoch', int(time.time()))
remove_keys = {'__original_infodict'} # Always remove this since this may contain a copy of the entire dict
keep_keys = ['_type']  # Always keep this to facilitate load-info-json
if actually_filter:
remove_keys += ('requested_formats', 'requested_subtitles', 'requested_entries', 'filepath', 'entries', 'original_url')
if remove_private_keys:
remove_keys |= {
'requested_formats', 'requested_subtitles', 'requested_entries',
'filepath', 'entries', 'original_url', 'playlist_autonumber',
}
empty_values = (None, {}, [], set(), tuple())
reject = lambda k, v: k not in keep_keys and (
k.startswith('_') or k in remove_keys or v in empty_values)
else:
info_dict['epoch'] = int(time.time())
reject = lambda k, v: k in remove_keys
filter_fn = lambda obj: (
list(map(filter_fn, obj)) if isinstance(obj, (LazyList, list, tuple, set))
@@ -2865,6 +2924,11 @@ class YoutubeDL(object):
else dict((k, filter_fn(v)) for k, v in obj.items() if not reject(k, v)))
return filter_fn(info_dict)
@staticmethod
def filter_requested_info(info_dict, actually_filter=True):
''' Alias of sanitize_info for backward compatibility '''
return YoutubeDL.sanitize_info(info_dict, actually_filter)
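The key-filtering idea behind `sanitize_info` above can be sketched standalone (a simplified hypothetical helper, not the real `YoutubeDL` method — the real one also walks `LazyList` and sets `epoch`):

```python
# Simplified sketch of sanitize_info's recursive key filtering: drop
# internal/private keys so the infodict can be serialized to JSON.
def sanitize(info, remove_private_keys=False):
    remove_keys = {'__original_infodict'}
    keep_keys = {'_type'}  # always kept to facilitate load-info-json
    if remove_private_keys:
        remove_keys |= {'requested_formats', 'filepath', 'entries'}
        empty = (None, {}, [], set(), tuple())
        reject = lambda k, v: k not in keep_keys and (
            k.startswith('_') or k in remove_keys or v in empty)
    else:
        reject = lambda k, v: k in remove_keys

    def walk(obj):
        # recurse into containers, filtering dict keys as we go
        if isinstance(obj, (list, tuple, set)):
            return [walk(v) for v in obj]
        if isinstance(obj, dict):
            return {k: walk(v) for k, v in obj.items() if not reject(k, v)}
        return obj
    return walk(info)
```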
def run_pp(self, pp, infodict):
files_to_delete = []
if '__files_to_move' not in infodict:
@@ -3069,7 +3133,7 @@ class YoutubeDL(object):
format_field(f, 'language', '[%s]'),
format_field(f, 'format_note'),
format_field(f, 'container', ignore=(None, f.get('ext'))),
format_field(f, 'asr', '%5dHz')))),
))),
] for f in formats if f.get('preference') is None or f['preference'] >= -1000]
header_line = ['ID', 'EXT', 'RESOLUTION', 'FPS', '|', ' FILESIZE', ' TBR', 'PROTO',
'|', 'VCODEC', ' VBR', 'ACODEC', ' ABR', ' ASR', 'MORE INFO']
@@ -3129,11 +3193,6 @@ class YoutubeDL(object):
if not self.params.get('verbose'):
return
if type('') is not compat_str:
# Python 2.6 on SLES11 SP1 (https://github.com/ytdl-org/youtube-dl/issues/3326)
self.report_warning(
'Your Python is broken! Update to a newer and supported version')
stdout_encoding = getattr(
sys.stdout, 'encoding', 'missing (%s)' % type(sys.stdout).__name__)
encoding_str = (
@@ -3189,14 +3248,24 @@ class YoutubeDL(object):
exe_versions['rtmpdump'] = rtmpdump_version()
exe_versions['phantomjs'] = PhantomJSwrapper._version()
exe_str = ', '.join(
'%s %s' % (exe, v)
for exe, v in sorted(exe_versions.items())
if v
)
if not exe_str:
exe_str = 'none'
f'{exe} {v}' for exe, v in sorted(exe_versions.items()) if v
) or 'none'
self._write_string('[debug] exe versions: %s\n' % exe_str)
from .downloader.fragment import can_decrypt_frag
from .downloader.websocket import has_websockets
from .postprocessor.embedthumbnail import has_mutagen
from .cookies import SQLITE_AVAILABLE, KEYRING_AVAILABLE
lib_str = ', '.join(sorted(filter(None, (
can_decrypt_frag and 'pycryptodome',
has_websockets and 'websockets',
has_mutagen and 'mutagen',
SQLITE_AVAILABLE and 'sqlite',
KEYRING_AVAILABLE and 'keyring',
)))) or 'none'
self._write_string('[debug] Optional libraries: %s\n' % lib_str)
proxy_map = {}
for handler in self._opener.handlers:
if hasattr(handler, 'proxies'):

View File

@@ -7,6 +7,7 @@ __license__ = 'Public Domain'
import codecs
import io
import itertools
import os
import random
import re
@@ -18,6 +19,7 @@ from .options import (
)
from .compat import (
compat_getpass,
compat_shlex_quote,
workaround_optparse_bug9161,
)
from .cookies import SUPPORTED_BROWSERS
@@ -46,14 +48,15 @@ from .downloader import (
from .extractor import gen_extractors, list_extractors
from .extractor.common import InfoExtractor
from .extractor.adobepass import MSO_INFO
from .postprocessor.ffmpeg import (
from .postprocessor import (
FFmpegExtractAudioPP,
FFmpegSubtitlesConvertorPP,
FFmpegThumbnailsConvertorPP,
FFmpegVideoConvertorPP,
FFmpegVideoRemuxerPP,
MetadataFromFieldPP,
MetadataParserPP,
)
from .postprocessor.metadatafromfield import MetadataFromFieldPP
from .YoutubeDL import YoutubeDL
@@ -280,7 +283,7 @@ def _real_main(argv=None):
'filename', 'format-sort', 'abort-on-error', 'format-spec', 'no-playlist-metafiles',
'multistreams', 'no-live-chat', 'playlist-index', 'list-formats', 'no-direct-merge',
'no-youtube-channel-redirect', 'no-youtube-unavailable-videos', 'no-attach-info-json',
'embed-thumbnail-atomicparsley', 'seperate-video-versions',
'embed-thumbnail-atomicparsley', 'seperate-video-versions', 'no-clean-infojson', 'no-keep-subs',
]
compat_opts = parse_compat_opts()
@@ -291,7 +294,7 @@ def _real_main(argv=None):
compat_opts.update(['*%s' % name])
return True
def set_default_compat(compat_name, opt_name, default=True, remove_compat=False):
def set_default_compat(compat_name, opt_name, default=True, remove_compat=True):
attr = getattr(opts, opt_name)
if compat_name in compat_opts:
if attr is None:
@@ -307,6 +310,7 @@ def _real_main(argv=None):
set_default_compat('abort-on-error', 'ignoreerrors')
set_default_compat('no-playlist-metafiles', 'allow_playlist_files')
set_default_compat('no-clean-infojson', 'clean_infojson')
if 'format-sort' in compat_opts:
opts.format_sort.extend(InfoExtractor.FormatSort.ytdl_default)
_video_multistreams_set = set_default_compat('multistreams', 'allow_multiple_video_streams', False, remove_compat=False)
@@ -316,7 +320,7 @@ def _real_main(argv=None):
outtmpl_default = opts.outtmpl.get('default')
if 'filename' in compat_opts:
if outtmpl_default is None:
outtmpl_default = '%(title)s.%(id)s.%(ext)s'
outtmpl_default = '%(title)s-%(id)s.%(ext)s'
opts.outtmpl.update({'default': outtmpl_default})
else:
_unused_compat_opt('filename')
@@ -328,7 +332,8 @@ def _real_main(argv=None):
for k, tmpl in opts.outtmpl.items():
validate_outtmpl(tmpl, '%s output template' % k)
for tmpl in opts.forceprint:
opts.forceprint = opts.forceprint or []
for tmpl in opts.forceprint or []:
validate_outtmpl(tmpl, 'print template')
if opts.extractaudio and not opts.keepvideo and opts.format is None:
@@ -343,13 +348,29 @@ def _real_main(argv=None):
if re.match(InfoExtractor.FormatSort.regex, f) is None:
parser.error('invalid format sort string "%s" specified' % f)
if opts.metafromfield is None:
opts.metafromfield = []
def metadataparser_actions(f):
if isinstance(f, str):
cmd = '--parse-metadata %s' % compat_shlex_quote(f)
try:
actions = [MetadataFromFieldPP.to_action(f)]
except Exception as err:
parser.error(f'{cmd} is invalid; {err}')
else:
cmd = '--replace-in-metadata %s' % ' '.join(map(compat_shlex_quote, f))
actions = ((MetadataParserPP.Actions.REPLACE, x, *f[1:]) for x in f[0].split(','))
for action in actions:
try:
MetadataParserPP.validate_action(*action)
except Exception as err:
parser.error(f'{cmd} is invalid; {err}')
yield action
if opts.parse_metadata is None:
opts.parse_metadata = []
if opts.metafromtitle is not None:
opts.metafromfield.append('title:%s' % opts.metafromtitle)
for f in opts.metafromfield:
if re.match(MetadataFromFieldPP.regex, f) is None:
parser.error('invalid format string "%s" specified for --parse-metadata' % f)
opts.parse_metadata.append('title:%s' % opts.metafromtitle)
opts.parse_metadata = list(itertools.chain(*map(metadataparser_actions, opts.parse_metadata)))
any_getting = opts.forceprint or opts.geturl or opts.gettitle or opts.getid or opts.getthumbnail or opts.getdescription or opts.getfilename or opts.getformat or opts.getduration or opts.dumpjson or opts.dump_single_json
any_printing = opts.print_json
@@ -401,10 +422,10 @@ def _real_main(argv=None):
# PostProcessors
postprocessors = []
if opts.metafromfield:
if opts.parse_metadata:
postprocessors.append({
'key': 'MetadataFromField',
'formats': opts.metafromfield,
'key': 'MetadataParser',
'actions': opts.parse_metadata,
# Run this immediately after extraction is complete
'when': 'pre_process'
})
@@ -425,7 +446,7 @@ def _real_main(argv=None):
# Must be after all other before_dl
if opts.exec_before_dl_cmd:
postprocessors.append({
'key': 'ExecAfterDownload',
'key': 'Exec',
'exec_cmd': opts.exec_before_dl_cmd,
'when': 'before_dl'
})
@@ -457,13 +478,13 @@ def _real_main(argv=None):
if opts.addmetadata:
postprocessors.append({'key': 'FFmpegMetadata'})
if opts.embedsubtitles:
already_have_subtitle = opts.writesubtitles
already_have_subtitle = opts.writesubtitles and 'no-keep-subs' not in compat_opts
postprocessors.append({
'key': 'FFmpegEmbedSubtitle',
# already_have_subtitle = True prevents the file from being deleted after embedding
'already_have_subtitle': already_have_subtitle
})
if not already_have_subtitle:
if not opts.writeautomaticsub and 'no-keep-subs' not in compat_opts:
opts.writesubtitles = True
# --all-sub automatically sets --write-sub if --write-auto-sub is not given
# this was the old behaviour if only --all-sub was given.
@@ -496,10 +517,10 @@ def _real_main(argv=None):
# XAttrMetadataPP should be run after post-processors that may change file contents
if opts.xattrs:
postprocessors.append({'key': 'XAttrMetadata'})
# ExecAfterDownload must be the last PP
# Exec must be the last PP
if opts.exec_cmd:
postprocessors.append({
'key': 'ExecAfterDownload',
'key': 'Exec',
'exec_cmd': opts.exec_cmd,
# Run this only after the files have been moved to their final locations
'when': 'after_move'
@@ -549,7 +570,7 @@ def _real_main(argv=None):
'forcejson': opts.dumpjson or opts.print_json,
'dump_single_json': opts.dump_single_json,
'force_write_download_archive': opts.force_write_download_archive,
'simulate': opts.simulate or any_getting,
'simulate': (any_getting or None) if opts.simulate is None else opts.simulate,
'skip_download': opts.skip_download,
'format': opts.format,
'allow_unplayable_formats': opts.allow_unplayable_formats,
@@ -733,6 +754,11 @@ def main(argv=None):
sys.exit('ERROR: fixed output name but more than one file to download')
except KeyboardInterrupt:
sys.exit('\nERROR: Interrupted by user')
except BrokenPipeError as err:
# https://docs.python.org/3/library/signal.html#note-on-sigpipe
devnull = os.open(os.devnull, os.O_WRONLY)
os.dup2(devnull, sys.stdout.fileno())
sys.exit(f'\nERROR: {err}')
__all__ = ['main', 'YoutubeDL', 'gen_extractors', 'list_extractors']

View File

@@ -3,17 +3,20 @@ from __future__ import unicode_literals
from ..compat import compat_str
from ..utils import (
determine_protocol,
NO_DEFAULT
)
def _get_real_downloader(info_dict, protocol=None, *args, **kwargs):
def get_suitable_downloader(info_dict, params={}, default=NO_DEFAULT, protocol=None, to_stdout=False):
info_dict['protocol'] = determine_protocol(info_dict)
info_copy = info_dict.copy()
if protocol:
info_copy['protocol'] = protocol
return get_suitable_downloader(info_copy, *args, **kwargs)
info_copy['to_stdout'] = to_stdout
return _get_suitable_downloader(info_copy, params, default)
# Some of these require _get_real_downloader
# Some of these require get_suitable_downloader
from .common import FileDownloader
from .dash import DashSegmentsFD
from .f4m import F4mFD
@@ -69,22 +72,24 @@ def shorten_protocol_name(proto, simplify=False):
return short_protocol_names.get(proto, proto)
def get_suitable_downloader(info_dict, params={}, default=HttpFD):
def _get_suitable_downloader(info_dict, params, default):
"""Get the downloader class that can handle the info dict."""
protocol = determine_protocol(info_dict)
info_dict['protocol'] = protocol
if default is NO_DEFAULT:
default = HttpFD
# if (info_dict.get('start_time') or info_dict.get('end_time')) and not info_dict.get('requested_formats') and FFmpegFD.can_download(info_dict):
# return FFmpegFD
protocol = info_dict['protocol']
downloaders = params.get('external_downloader')
external_downloader = (
downloaders if isinstance(downloaders, compat_str) or downloaders is None
else downloaders.get(shorten_protocol_name(protocol, True), downloaders.get('default')))
if external_downloader and external_downloader.lower() == 'native':
external_downloader = 'native'
if external_downloader not in (None, 'native'):
if external_downloader is None:
if info_dict['to_stdout'] and FFmpegFD.can_merge_formats(info_dict, params):
return FFmpegFD
elif external_downloader.lower() != 'native':
ed = get_external_downloader(external_downloader)
if ed.can_download(info_dict, external_downloader):
return ed
@@ -92,9 +97,10 @@ def get_suitable_downloader(info_dict, params={}, default=HttpFD):
if protocol in ('m3u8', 'm3u8_native'):
if info_dict.get('is_live'):
return FFmpegFD
elif external_downloader == 'native':
elif (external_downloader or '').lower() == 'native':
return HlsFD
elif _get_real_downloader(info_dict, 'm3u8_frag_urls', params, None):
elif get_suitable_downloader(
info_dict, params, None, protocol='m3u8_frag_urls', to_stdout=info_dict['to_stdout']):
return HlsFD
elif params.get('hls_prefer_native') is True:
return HlsFD
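The dispatch above lets `external_downloader` be either a single name or a per-protocol dict; a hedged sketch of just that lookup (hypothetical helper, not the real `_get_suitable_downloader`):

```python
# Sketch of the external-downloader option resolution: a bare string (or
# None) applies to everything, while a dict is keyed by the shortened
# protocol name with 'default' as the fallback key.
def pick_external_downloader(downloaders, protocol):
    if downloaders is None or isinstance(downloaders, str):
        return downloaders
    return downloaders.get(protocol, downloaders.get('default'))
```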

View File

@@ -47,8 +47,11 @@ class FileDownloader(object):
min_filesize: Skip files smaller than this size
max_filesize: Skip files larger than this size
xattr_set_filesize: Set ytdl.filesize user xattribute with expected size.
external_downloader_args: A list of additional command-line arguments for the
external downloader.
external_downloader_args: A dictionary of downloader keys (in lower case)
and a list of additional command-line arguments for the
executable. Use 'default' as the name for arguments to be
passed to all downloaders. For compatibility with youtube-dl,
a single list of args can also be used
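The option shape documented above — dict of downloader name to args, `'default'` applying to all, a bare list kept for youtube-dl compatibility — can be sketched as (hypothetical helper, not the real resolution code):

```python
# Sketch of resolving external_downloader_args per the docstring above.
def resolve_downloader_args(opt, downloader):
    if opt is None:
        return []
    if isinstance(opt, list):  # youtube-dl style: one list for everything
        return opt
    return opt.get(downloader, opt.get('default', []))
```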
hls_use_mpegts: Use the mpegts container for HLS videos.
http_chunk_size: Size of a chunk for chunk-based HTTP downloading. May be
useful for bypassing bandwidth throttling imposed by
@@ -320,12 +323,9 @@ class FileDownloader(object):
'[download] Got server HTTP error: %s. Retrying (attempt %d of %s) ...'
% (error_to_compat_str(err), count, self.format_retries(retries)))
def report_file_already_downloaded(self, file_name):
def report_file_already_downloaded(self, *args, **kwargs):
"""Report file has already been fully downloaded."""
try:
self.to_screen('[download] %s has already been downloaded' % file_name)
except UnicodeEncodeError:
self.to_screen('[download] The file has already been downloaded')
return self.ydl.report_file_already_downloaded(*args, **kwargs)
def report_unable_to_resume(self):
"""Report it was impossible to resume download."""
@@ -343,7 +343,7 @@ class FileDownloader(object):
"""
nooverwrites_and_exists = (
not self.params.get('overwrites', subtitle)
not self.params.get('overwrites', True)
and os.path.exists(encodeFilename(filename))
)

View File

@@ -1,6 +1,6 @@
from __future__ import unicode_literals
from ..downloader import _get_real_downloader
from ..downloader import get_suitable_downloader
from .fragment import FragmentFD
from ..utils import urljoin
@@ -15,11 +15,15 @@ class DashSegmentsFD(FragmentFD):
FD_NAME = 'dashsegments'
def real_download(self, filename, info_dict):
if info_dict.get('is_live'):
self.report_error('Live DASH videos are not supported')
fragment_base_url = info_dict.get('fragment_base_url')
fragments = info_dict['fragments'][:1] if self.params.get(
'test', False) else info_dict['fragments']
real_downloader = _get_real_downloader(info_dict, 'dash_frag_urls', self.params, None)
real_downloader = get_suitable_downloader(
info_dict, self.params, None, protocol='dash_frag_urls', to_stdout=(filename == '-'))
ctx = {
'filename': filename,
@@ -54,9 +58,6 @@ class DashSegmentsFD(FragmentFD):
info_copy = info_dict.copy()
info_copy['fragments'] = fragments_to_download
fd = real_downloader(self.ydl, self.params)
# TODO: Make progress updates work without hooking twice
# for ph in self._progress_hooks:
# fd.add_progress_hook(ph)
return fd.real_download(filename, info_copy)
return self.download_and_append_fragments(ctx, fragments_to_download, info_dict)

View File

@@ -36,6 +36,7 @@ from ..utils import (
class ExternalFD(FileDownloader):
SUPPORTED_PROTOCOLS = ('http', 'https', 'ftp', 'ftps')
can_download_to_stdout = False
def real_download(self, filename, info_dict):
self.report_destination(filename)
@@ -93,7 +94,9 @@ class ExternalFD(FileDownloader):
@classmethod
def supports(cls, info_dict):
return info_dict['protocol'] in cls.SUPPORTED_PROTOCOLS
return (
(cls.can_download_to_stdout or not info_dict.get('to_stdout'))
and info_dict['protocol'] in cls.SUPPORTED_PROTOCOLS)
@classmethod
def can_download(cls, info_dict, path=None):
@@ -341,16 +344,27 @@ class HttpieFD(ExternalFD):
class FFmpegFD(ExternalFD):
SUPPORTED_PROTOCOLS = ('http', 'https', 'ftp', 'ftps', 'm3u8', 'm3u8_native', 'rtsp', 'rtmp', 'rtmp_ffmpeg', 'mms')
can_download_to_stdout = True
@classmethod
def available(cls, path=None):
# TODO: Fix path for ffmpeg
# Fixme: This may be wrong when --ffmpeg-location is used
return FFmpegPostProcessor().available
def on_process_started(self, proc, stdin):
""" Override this in subclasses """
pass
@classmethod
def can_merge_formats(cls, info_dict, params={}):
return (
info_dict.get('requested_formats')
and info_dict.get('protocol')
and not params.get('allow_unplayable_formats')
and 'no-direct-merge' not in params.get('compat_opts', [])
and cls.can_download(info_dict))
def _call_downloader(self, tmpfilename, info_dict):
urls = [f['url'] for f in info_dict.get('requested_formats', [])] or [info_dict['url']]
ffpp = FFmpegPostProcessor(downloader=self)
@@ -368,6 +382,9 @@ class FFmpegFD(ExternalFD):
if not self.params.get('verbose'):
args += ['-hide_banner']
args += info_dict.get('_ffmpeg_args', [])
# This option exists only for compatibility. Extractors should use `_ffmpeg_args` instead
seekable = info_dict.get('_seekable')
if seekable is not None:
# setting -seekable prevents ffmpeg from guessing if the server
@@ -456,6 +473,7 @@ class FFmpegFD(ExternalFD):
if self.params.get('test', False):
args += ['-fs', compat_str(self._TEST_FILE_SIZE)]
ext = info_dict['ext']
if protocol in ('m3u8', 'm3u8_native'):
use_mpegts = (tmpfilename == '-') or self.params.get('hls_use_mpegts')
if use_mpegts is None:
@@ -468,8 +486,10 @@ class FFmpegFD(ExternalFD):
args += ['-bsf:a', 'aac_adtstoasc']
elif protocol == 'rtmp':
args += ['-f', 'flv']
elif ext == 'mp4' and tmpfilename == '-':
args += ['-f', 'mpegts']
else:
args += ['-f', EXT_TO_OUT_FORMATS.get(info_dict['ext'], info_dict['ext'])]
args += ['-f', EXT_TO_OUT_FORMATS.get(ext, ext)]
args = [encodeArgument(opt) for opt in args]
args.append(encodeFilename(ffpp._ffmpeg_filename_argument(tmpfilename), True))

View File

@@ -105,17 +105,19 @@ class FragmentFD(FileDownloader):
def _write_ytdl_file(self, ctx):
frag_index_stream, _ = sanitize_open(self.ytdl_filename(ctx['filename']), 'w')
downloader = {
'current_fragment': {
'index': ctx['fragment_index'],
},
}
if 'extra_state' in ctx:
downloader['extra_state'] = ctx['extra_state']
if ctx.get('fragment_count') is not None:
downloader['fragment_count'] = ctx['fragment_count']
frag_index_stream.write(json.dumps({'downloader': downloader}))
frag_index_stream.close()
try:
downloader = {
'current_fragment': {
'index': ctx['fragment_index'],
},
}
if 'extra_state' in ctx:
downloader['extra_state'] = ctx['extra_state']
if ctx.get('fragment_count') is not None:
downloader['fragment_count'] = ctx['fragment_count']
frag_index_stream.write(json.dumps({'downloader': downloader}))
finally:
frag_index_stream.close()
def _download_fragment(self, ctx, frag_url, info_dict, headers=None, request_data=None):
fragment_filename = '%s-Frag%d' % (ctx['tmpfilename'], ctx['fragment_index'])
@@ -327,7 +329,7 @@ class FragmentFD(FileDownloader):
'fragment_index': 0,
})
def download_and_append_fragments(self, ctx, fragments, info_dict, pack_func=None):
def download_and_append_fragments(self, ctx, fragments, info_dict, *, pack_func=None, finish_func=None):
fragment_retries = self.params.get('fragment_retries', 0)
is_fatal = (lambda idx: idx == 0) if self.params.get('skip_unavailable_fragments', True) else (lambda _: True)
if not pack_func:
@@ -422,5 +424,8 @@ class FragmentFD(FileDownloader):
if not result:
return False
if finish_func is not None:
ctx['dest_stream'].write(finish_func())
ctx['dest_stream'].flush()
self._finish_frag_download(ctx, info_dict)
return True

View File

@@ -4,7 +4,7 @@ import re
import io
import binascii
from ..downloader import _get_real_downloader
from ..downloader import get_suitable_downloader
from .fragment import FragmentFD, can_decrypt_frag
from .external import FFmpegFD
@@ -80,16 +80,14 @@ class HlsFD(FragmentFD):
fd = FFmpegFD(self.ydl, self.params)
self.report_warning(
'%s detected unsupported features; extraction will be delegated to %s' % (self.FD_NAME, fd.get_basename()))
# TODO: Make progress updates work without hooking twice
# for ph in self._progress_hooks:
# fd.add_progress_hook(ph)
return fd.real_download(filename, info_dict)
is_webvtt = info_dict['ext'] == 'vtt'
if is_webvtt:
real_downloader = None # Packing the fragments is not currently supported for external downloader
else:
real_downloader = _get_real_downloader(info_dict, 'm3u8_frag_urls', self.params, None)
real_downloader = get_suitable_downloader(
info_dict, self.params, None, protocol='m3u8_frag_urls', to_stdout=(filename == '-'))
if real_downloader and not real_downloader.supports_manifest(s):
real_downloader = None
if real_downloader:
@@ -262,29 +260,35 @@ class HlsFD(FragmentFD):
block.end += adjust
dedup_window = extra_state.setdefault('webvtt_dedup_window', [])
cue = block.as_json
# skip the cue if an identical one appears
# in the window of potential duplicates
# and prune the window of unviable candidates
ready = []
i = 0
skip = True
is_new = True
while i < len(dedup_window):
window_cue = dedup_window[i]
if window_cue == cue:
break
if window_cue['end'] >= cue['start']:
i += 1
wcue = dedup_window[i]
wblock = webvtt.CueBlock.from_json(wcue)
i += 1
if wblock.hinges(block):
wcue['end'] = block.end
is_new = False
continue
if wblock == block:
is_new = False
continue
if wblock.end > block.start:
continue
ready.append(wblock)
i -= 1
del dedup_window[i]
else:
skip = False
if skip:
continue
if is_new:
dedup_window.append(block.as_json)
for block in ready:
block.write_into(output)
# add the cue to the window
dedup_window.append(cue)
# we only emit cues once they fall out of the duplicate window
continue
elif isinstance(block, webvtt.Magic):
# take care of MPEG PES timestamp overflow
if block.mpegts is None:
@@ -319,6 +323,19 @@ class HlsFD(FragmentFD):
block.write_into(output)
return output.getvalue().encode('utf-8')
def fin_fragments():
dedup_window = extra_state.get('webvtt_dedup_window')
if not dedup_window:
return b''
output = io.StringIO()
for cue in dedup_window:
webvtt.CueBlock.from_json(cue).write_into(output)
return output.getvalue().encode('utf-8')
self.download_and_append_fragments(
ctx, fragments, info_dict, pack_func=pack_fragment, finish_func=fin_fragments)
else:
pack_fragment = None
return self.download_and_append_fragments(ctx, fragments, info_dict, pack_fragment)
return self.download_and_append_fragments(ctx, fragments, info_dict)
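The daisy-chain merge implemented above (and described in the commit message) can be sketched standalone, using plain `{'start', 'end', 'text'}` dicts instead of the real `webvtt.CueBlock` API:

```python
# Sketch of daisy-chain deduplication: cues with identical text where one
# ends exactly when the next starts ("hinged" cues) are merged into a
# single cue spanning both; exact duplicates are dropped.
def merge_daisy_chains(cues):
    merged = []
    for cue in cues:
        if merged and merged[-1]['text'] == cue['text'] \
                and merged[-1]['end'] == cue['start']:
            merged[-1]['end'] = cue['end']  # extend the previous cue
        elif merged and merged[-1] == cue:
            continue  # exact duplicate, drop it
        else:
            merged.append(dict(cue))
    return merged
```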

View File

@@ -4,7 +4,7 @@ from __future__ import unicode_literals
import threading
from .common import FileDownloader
from ..downloader import _get_real_downloader
from ..downloader import get_suitable_downloader
from ..extractor.niconico import NiconicoIE
from ..compat import compat_urllib_request
@@ -20,7 +20,7 @@ class NiconicoDmcFD(FileDownloader):
ie = NiconicoIE(self.ydl)
info_dict, heartbeat_info_dict = ie._get_heartbeat_info(info_dict)
fd = _get_real_downloader(info_dict, params=self.params)(self.ydl, self.params)
fd = get_suitable_downloader(info_dict, params=self.params)(self.ydl, self.params)
success = download_complete = False
timer = [None]

View File

@@ -76,6 +76,11 @@ MSO_INFO = {
'username_field': 'IDToken1',
'password_field': 'IDToken2',
},
'Cablevision': {
'name': 'Optimum/Cablevision',
'username_field': 'j_username',
'password_field': 'j_password',
},
'thr030': {
'name': '3 Rivers Communications'
},
@@ -1330,6 +1335,11 @@ MSO_INFO = {
'cou060': {
'name': 'Zito Media'
},
'slingtv': {
'name': 'Sling TV',
'username_field': 'username',
'password_field': 'password',
},
}
@@ -1565,6 +1575,40 @@ class AdobePassIE(InfoExtractor):
}), headers={
'Content-Type': 'application/x-www-form-urlencoded'
})
elif mso_id == 'slingtv':
# SlingTV has a meta-refresh based authentication, but also
# looks at the tab history to count the number of times the
# browser has been on a page
first_bookend_page, urlh = provider_redirect_page_res
hidden_data = self._hidden_inputs(first_bookend_page)
hidden_data['history'] = 1
provider_login_page_res = self._download_webpage_handle(
urlh.geturl(), video_id, 'Sending first bookend',
query=hidden_data)
provider_association_redirect, urlh = post_form(
provider_login_page_res, 'Logging in', {
mso_info['username_field']: username,
mso_info['password_field']: password
})
provider_refresh_redirect_url = extract_redirect_url(
provider_association_redirect, url=urlh.geturl())
last_bookend_page, urlh = self._download_webpage_handle(
provider_refresh_redirect_url, video_id,
'Downloading Auth Association Redirect Page')
hidden_data = self._hidden_inputs(last_bookend_page)
hidden_data['history'] = 3
mvpd_confirm_page_res = self._download_webpage_handle(
urlh.geturl(), video_id, 'Sending final bookend',
query=hidden_data)
post_form(mvpd_confirm_page_res, 'Confirming Login')
else:
# Some providers (e.g. DIRECTV NOW) have another meta refresh
# based redirect that should be followed.
@@ -1577,10 +1621,13 @@ class AdobePassIE(InfoExtractor):
'Downloading Provider Redirect Page (meta refresh)')
provider_login_page_res = post_form(
provider_redirect_page_res, self._DOWNLOADING_LOGIN_PAGE)
mvpd_confirm_page_res = post_form(provider_login_page_res, 'Logging in', {
form_data = {
mso_info.get('username_field', 'username'): username,
mso_info.get('password_field', 'password'): password,
})
mso_info.get('password_field', 'password'): password
}
if mso_id == 'Cablevision':
form_data['_eventId_proceed'] = ''
mvpd_confirm_page_res = post_form(provider_login_page_res, 'Logging in', form_data)
if mso_id != 'Rogers':
post_form(mvpd_confirm_page_res, 'Confirming Login')

View File

@@ -20,8 +20,8 @@ class AENetworksBaseIE(ThePlatformIE):
(?:history(?:vault)?|aetv|mylifetime|lifetimemovieclub)\.com|
fyi\.tv
)/'''
_THEPLATFORM_KEY = 'crazyjava'
_THEPLATFORM_SECRET = 's3cr3t'
_THEPLATFORM_KEY = '43jXaGRQud'
_THEPLATFORM_SECRET = 'S10BPXHMlb'
_DOMAIN_MAP = {
'history.com': ('HISTORY', 'history'),
'aetv.com': ('AETV', 'aetv'),

View File

@@ -212,7 +212,7 @@ class BandcampIE(InfoExtractor):
class BandcampAlbumIE(BandcampIE):
IE_NAME = 'Bandcamp:album'
_VALID_URL = r'https?://(?:(?P<subdomain>[^.]+)\.)?bandcamp\.com(?:/album/(?P<id>[^/?#&]+))?'
_VALID_URL = r'https?://(?:(?P<subdomain>[^.]+)\.)?bandcamp\.com(?!/music)(?:/album/(?P<id>[^/?#&]+))?'
_TESTS = [{
'url': 'http://blazo.bandcamp.com/album/jazz-format-mixtape-vol-1',
@@ -389,3 +389,43 @@ class BandcampWeeklyIE(BandcampIE):
'episode_id': show_id,
'formats': formats
}
class BandcampMusicIE(InfoExtractor):
_VALID_URL = r'https?://(?P<id>[^/]+)\.bandcamp\.com/music'
_TESTS = [{
'url': 'https://steviasphere.bandcamp.com/music',
'playlist_mincount': 47,
'info_dict': {
'id': 'steviasphere',
},
}, {
'url': 'https://coldworldofficial.bandcamp.com/music',
'playlist_mincount': 10,
'info_dict': {
'id': 'coldworldofficial',
},
}, {
'url': 'https://nuclearwarnowproductions.bandcamp.com/music',
'playlist_mincount': 399,
'info_dict': {
'id': 'nuclearwarnowproductions',
},
}
]
_TYPE_IE_DICT = {
'album': BandcampAlbumIE.ie_key(),
'track': BandcampIE.ie_key()
}
def _real_extract(self, url):
id = self._match_id(url)
webpage = self._download_webpage(url, id)
items = re.findall(r'href="/(?P<path>(?P<type>album|track)+/[^"]+)', webpage)
entries = [
self.url_result(
f'https://{id}.bandcamp.com/{item[0]}',
ie=self._TYPE_IE_DICT[item[1]])
for item in items]
return self.playlist_result(entries, id)
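The link-scraping step of `BandcampMusicIE` above amounts to the following (hypothetical sample HTML; the real `/music` page is more complex):

```python
import re

# Sketch of the album/track link extraction: find hrefs in the /music
# page and rebuild absolute URLs on the artist subdomain.
def extract_music_links(subdomain, html):
    items = re.findall(r'href="/((album|track)/[^"]+)', html)
    return [f'https://{subdomain}.bandcamp.com/{path}' for path, _ in items]
```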

View File

@@ -37,7 +37,7 @@ class BiliBiliIE(InfoExtractor):
video/[aA][vV]|
anime/(?P<anime_id>\d+)/play\#
)(?P<id>\d+)|
video/[bB][vV](?P<id_bv>[^/?#&]+)
(s/)?video/[bB][vV](?P<id_bv>[^/?#&]+)
)
(?:/?\?p=(?P<page>\d+))?
'''

View File

@@ -0,0 +1,68 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import parse_iso8601
class BlackboardCollaborateIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://
(?P<region>[a-z-]+)\.bbcollab\.com/
(?:
collab/ui/session/playback/load|
recording
)/
(?P<id>[^/]+)'''
_TESTS = [
{
'url': 'https://us-lti.bbcollab.com/collab/ui/session/playback/load/0a633b6a88824deb8c918f470b22b256',
'md5': 'bb7a055682ee4f25fdb5838cdf014541',
'info_dict': {
'id': '0a633b6a88824deb8c918f470b22b256',
'title': 'HESI A2 Information Session - Thursday, May 6, 2021 - recording_1',
'ext': 'mp4',
'duration': 1896000,
'timestamp': 1620331399,
'upload_date': '20210506',
},
},
{
'url': 'https://us.bbcollab.com/collab/ui/session/playback/load/76761522adfe4345a0dee6794bbcabda',
'only_matching': True,
},
{
'url': 'https://ca.bbcollab.com/collab/ui/session/playback/load/b6399dcb44df4f21b29ebe581e22479d',
'only_matching': True,
},
{
'url': 'https://eu.bbcollab.com/recording/51ed7b50810c4444a106e48cefb3e6b5',
'only_matching': True,
},
{
'url': 'https://au.bbcollab.com/collab/ui/session/playback/load/2bccf7165d7c419ab87afc1ec3f3bb15',
'only_matching': True,
},
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
region = mobj.group('region')
video_id = mobj.group('id')
info = self._download_json(
'https://{}.bbcollab.com/collab/api/csa/recordings/{}/data'.format(region, video_id), video_id)
duration = info.get('duration')
title = info['name']
upload_date = info.get('created')
streams = info['streams']
formats = [{'format_id': k, 'url': url} for k, url in streams.items()]
return {
'duration': duration,
'formats': formats,
'id': video_id,
'timestamp': parse_iso8601(upload_date),
'title': title,
}
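The formats construction in the extractor above maps the recordings API's stream dict directly to format entries; with hypothetical sample data:

```python
# Sketch of BlackboardCollaborateIE's formats building: the API returns a
# dict of stream name -> URL, one format per entry. (Sample data only.)
streams = {
    'hd': 'https://example.invalid/recording-hd.mp4',
    'sd': 'https://example.invalid/recording-sd.mp4',
}
formats = [{'format_id': k, 'url': u} for k, u in streams.items()]
```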

View File

@@ -1,7 +1,6 @@
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
from .gigya import GigyaBaseIE
@@ -17,6 +16,7 @@ from ..utils import (
str_or_none,
strip_or_none,
url_or_none,
urlencode_postdata
)
@@ -265,7 +265,7 @@ class VrtNUIE(GigyaBaseIE):
'expected_warnings': ['Unable to download asset JSON', 'is not a supported codec', 'Unknown MIME type'],
}]
_NETRC_MACHINE = 'vrtnu'
_APIKEY = '3_0Z2HujMtiWq_pkAjgnS2Md2E11a1AwZjYiBETtwNE-EoEHDINgtnvcAOpNgmrVGy'
_APIKEY = '3_qhEcPa5JGFROVwu5SWKqJ4mVOIkwlFNMSKwzPDAh8QZOtHqu6L4nD5Q7lk0eXOOG'
_CONTEXT_ID = 'R3595707040'
def _real_initialize(self):
@@ -276,35 +276,38 @@ class VrtNUIE(GigyaBaseIE):
if username is None:
return
auth_data = {
'APIKey': self._APIKEY,
'targetEnv': 'jssdk',
'loginID': username,
'password': password,
'authMode': 'cookie',
}
auth_info = self._gigya_login(auth_data)
auth_info = self._download_json(
'https://accounts.vrt.be/accounts.login', None,
note='Login data', errnote='Could not get Login data',
headers={}, data=urlencode_postdata({
'loginID': username,
'password': password,
'sessionExpiration': '-2',
'APIKey': self._APIKEY,
'targetEnv': 'jssdk',
}))
# Sometimes authentication fails for no good reason, retry
login_attempt = 1
while login_attempt <= 3:
try:
# When requesting a token, no actual token is returned, but the
# necessary cookies are set.
self._request_webpage('https://token.vrt.be/vrtnuinitlogin',
None, note='Requesting XSRF Token', errnote='Could not get XSRF Token',
query={'provider': 'site', 'destination': 'https://www.vrt.be/vrtnu/'})
post_data = {
'UID': auth_info['UID'],
'UIDSignature': auth_info['UIDSignature'],
'signatureTimestamp': auth_info['signatureTimestamp'],
'client_id': 'vrtnu-site',
'_csrf': self._get_cookies('https://login.vrt.be').get('OIDCXSRF').value,
}
self._request_webpage(
'https://token.vrt.be',
'https://login.vrt.be/perform_login',
None, note='Requesting a token', errnote='Could not get a token',
headers={
'Content-Type': 'application/json',
'Referer': 'https://www.vrt.be/vrtnu/',
},
data=json.dumps({
'uid': auth_info['UID'],
'uidsig': auth_info['UIDSignature'],
'ts': auth_info['signatureTimestamp'],
'email': auth_info['profile']['email'],
}).encode('utf-8'))
headers={}, data=urlencode_postdata(post_data))
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
login_attempt += 1
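The VRT NU login flow above retries because the token endpoint intermittently rejects valid credentials. The retry shape, isolated from the network code (the exception type and helper names here are illustrative; the real code retries on an HTTP 401 wrapped in `ExtractorError`):

```python
class TransientAuthError(Exception):
    """Stand-in for the HTTP 401 ExtractorError the extractor checks for."""

def login_with_retries(perform_login, max_attempts=3):
    # Sometimes authentication fails for no good reason, retry
    last_error = None
    for _ in range(max_attempts):
        try:
            return perform_login()
        except TransientAuthError as e:
            last_error = e
    raise last_error

calls = {'n': 0}

def flaky_login():
    # fails twice, succeeds on the third attempt
    calls['n'] += 1
    if calls['n'] < 3:
        raise TransientAuthError('HTTP Error 401')
    return 'session-cookies'
```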

View File

@@ -8,6 +8,7 @@ from ..utils import (
xpath_element,
xpath_text,
update_url_query,
url_or_none,
)
@@ -25,16 +26,62 @@ class CBSBaseIE(ThePlatformFeedIE):
})
return subtitles
def _extract_common_video_info(self, content_id, asset_types, mpx_acc, extra_info):
tp_path = 'dJ5BDC/media/guid/%d/%s' % (mpx_acc, content_id)
tp_release_url = f'https://link.theplatform.com/s/{tp_path}'
info = self._extract_theplatform_metadata(tp_path, content_id)
formats, subtitles = [], {}
last_e = None
for asset_type, query in asset_types.items():
try:
tp_formats, tp_subtitles = self._extract_theplatform_smil(
update_url_query(tp_release_url, query), content_id,
'Downloading %s SMIL data' % asset_type)
except ExtractorError as e:
last_e = e
if asset_type != 'fallback':
continue
query['formats'] = '' # blank query to check if expired
try:
tp_formats, tp_subtitles = self._extract_theplatform_smil(
update_url_query(tp_release_url, query), content_id,
'Downloading %s SMIL data, trying again with another format' % asset_type)
except ExtractorError as e:
last_e = e
continue
formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
if last_e and not formats:
self.raise_no_formats(last_e, True, content_id)
self._sort_formats(formats)
extra_info.update({
'id': content_id,
'formats': formats,
'subtitles': subtitles,
})
info.update({k: v for k, v in extra_info.items() if v is not None})
return info
def _extract_video_info(self, *args, **kwargs):
# Extract assets + metadata and call _extract_common_video_info
raise NotImplementedError('This method must be implemented by subclasses')
def _real_extract(self, url):
return self._extract_video_info(self._match_id(url))
class CBSIE(CBSBaseIE):
_VALID_URL = r'''(?x)
(?:
cbs:|
https?://(?:www\.)?(?:
(?:cbs|paramountplus)\.com/(?:shows/[^/]+/video|movies/[^/]+)/|
cbs\.com/(?:shows/[^/]+/video|movies/[^/]+)/|
colbertlateshow\.com/(?:video|podcasts)/)
)(?P<id>[\w-]+)'''
# All tests are blocked outside US
_TESTS = [{
'url': 'https://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
'info_dict': {
@@ -51,19 +98,28 @@ class CBSIE(CBSBaseIE):
# m3u8 download
'skip_download': True,
},
'_skip': 'Blocked outside the US',
}, {
'url': 'https://www.cbs.com/shows/the-late-show-with-stephen-colbert/video/60icOhMb9NcjbcWnF_gub9XXHdeBcNk2/the-late-show-6-23-21-christine-baranski-joy-oladokun-',
'info_dict': {
'id': '60icOhMb9NcjbcWnF_gub9XXHdeBcNk2',
'title': 'The Late Show - 6/23/21 (Christine Baranski, Joy Oladokun)',
'timestamp': 1624507140,
'description': 'md5:e01af24e95c74d55e8775aef86117b95',
'uploader': 'CBSI-NEW',
'upload_date': '20210624',
},
'params': {
'ignore_no_formats_error': True,
'skip_download': True,
},
'expected_warnings': [
'This content expired on', 'No video formats found', 'Requested format is not available'],
}, {
'url': 'http://colbertlateshow.com/video/8GmB0oY0McANFvp2aEffk9jZZZ2YyXxy/the-colbeard/',
'only_matching': True,
}, {
'url': 'http://www.colbertlateshow.com/podcasts/dYSwjqPs_X1tvbV_P2FcPWRa_qT6akTC/in-the-bad-room-with-stephen/',
'only_matching': True,
}, {
'url': 'https://www.paramountplus.com/shows/all-rise/video/QmR1WhNkh1a_IrdHZrbcRklm176X_rVc/all-rise-space/',
'only_matching': True,
}, {
'url': 'https://www.paramountplus.com/movies/million-dollar-american-princesses-meghan-and-harry/C0LpgNwXYeB8txxycdWdR9TjxpJOsdCq',
'only_matching': True,
}]
def _extract_video_info(self, content_id, site='cbs', mpx_acc=2198311517):
@@ -72,53 +128,34 @@ class CBSIE(CBSBaseIE):
content_id, query={'partner': site, 'contentId': content_id})
video_data = xpath_element(items_data, './/item')
title = xpath_text(video_data, 'videoTitle', 'title') or xpath_text(video_data, 'videotitle', 'title')
tp_path = 'dJ5BDC/media/guid/%d/%s' % (mpx_acc, content_id)
tp_release_url = 'https://link.theplatform.com/s/' + tp_path
asset_types = []
subtitles = {}
formats = []
last_e = None
asset_types = {}
for item in items_data.findall('.//item'):
asset_type = xpath_text(item, 'assetType')
if not asset_type or asset_type in asset_types or 'HLS_FPS' in asset_type or 'DASH_CENC' in asset_type:
continue
asset_types.append(asset_type)
query = {
'mbr': 'true',
'assetTypes': asset_type,
}
if asset_type.startswith('HLS') or asset_type in ('OnceURL', 'StreamPack'):
if not asset_type:
# fallback for content_ids that videoPlayerService doesn't return anything for
asset_type = 'fallback'
query['formats'] = 'M3U+none,MPEG4,M3U+appleHlsEncryption,MP3'
del query['assetTypes']
if asset_type in asset_types:
continue
elif any(excluded in asset_type for excluded in ('HLS_FPS', 'DASH_CENC', 'OnceURL')):
continue
if asset_type.startswith('HLS') or 'StreamPack' in asset_type:
query['formats'] = 'MPEG4,M3U'
elif asset_type in ('RTMP', 'WIFI', '3G'):
query['formats'] = 'MPEG4,FLV'
try:
tp_formats, tp_subtitles = self._extract_theplatform_smil(
update_url_query(tp_release_url, query), content_id,
'Downloading %s SMIL data' % asset_type)
except ExtractorError as e:
last_e = e
continue
formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
if last_e and not formats:
raise last_e
self._sort_formats(formats)
asset_types[asset_type] = query
info = self._extract_theplatform_metadata(tp_path, content_id)
info.update({
'id': content_id,
return self._extract_common_video_info(content_id, asset_types, mpx_acc, extra_info={
'title': title,
'series': xpath_text(video_data, 'seriesTitle'),
'season_number': int_or_none(xpath_text(video_data, 'seasonNumber')),
'episode_number': int_or_none(xpath_text(video_data, 'episodeNumber')),
'duration': int_or_none(xpath_text(video_data, 'videoLength'), 1000),
'thumbnail': xpath_text(video_data, 'previewImageURL'),
'formats': formats,
'subtitles': subtitles,
'thumbnail': url_or_none(xpath_text(video_data, 'previewImageURL')),
})
return info
def _real_extract(self, url):
content_id = self._match_id(url)
return self._extract_video_info(content_id)
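The paramountplus split above moves the SMIL-downloading loop into `_extract_common_video_info` and leaves `_extract_video_info` to collect one query per asset type. The selection logic, extracted as a pure function (the function name is ours; the dict shapes mirror the diff):

```python
def build_asset_queries(found_asset_types):
    """Collect one SMIL query per usable asset type: skip duplicates
    and DRM'd types, and use a blanket 'fallback' query when the
    videoPlayerService reports no asset type at all."""
    asset_types = {}
    for asset_type in found_asset_types:
        query = {'mbr': 'true', 'assetTypes': asset_type}
        if not asset_type:
            # fallback for content_ids that videoPlayerService
            # doesn't return anything for
            asset_type = 'fallback'
            query['formats'] = 'M3U+none,MPEG4,M3U+appleHlsEncryption,MP3'
            del query['assetTypes']
        if asset_type in asset_types:
            continue
        elif any(excluded in asset_type
                 for excluded in ('HLS_FPS', 'DASH_CENC', 'OnceURL')):
            continue
        if asset_type.startswith('HLS') or 'StreamPack' in asset_type:
            query['formats'] = 'MPEG4,M3U'
        elif asset_type in ('RTMP', 'WIFI', '3G'):
            query['formats'] = 'MPEG4,FLV'
        asset_types[asset_type] = query
    return asset_types
```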

View File

@@ -35,7 +35,6 @@ from ..downloader.f4m import (
remove_encrypted_media,
)
from ..utils import (
NO_DEFAULT,
age_restricted,
base_url,
bug_reports_message,
@@ -45,10 +44,11 @@ from ..utils import (
determine_protocol,
dict_get,
error_to_compat_str,
ExtractorError,
extract_attributes,
ExtractorError,
fix_xml_ampersands,
float_or_none,
format_field,
GeoRestrictedError,
GeoUtils,
int_or_none,
@@ -56,6 +56,7 @@ from ..utils import (
JSON_LD_RE,
mimetype2ext,
network_exceptions,
NO_DEFAULT,
orderedSet,
parse_bitrate,
parse_codecs,
@@ -64,8 +65,8 @@ from ..utils import (
parse_m3u8_attributes,
parse_resolution,
RegexNotFoundError,
sanitized_Request,
sanitize_filename,
sanitized_Request,
str_or_none,
str_to_int,
strip_or_none,
@@ -75,9 +76,9 @@ from ..utils import (
unified_timestamp,
update_Request,
update_url_query,
urljoin,
url_basename,
url_or_none,
urljoin,
variadic,
xpath_element,
xpath_text,
@@ -297,7 +298,7 @@ class InfoExtractor(object):
live stream that goes on instead of a fixed-length video.
was_live: True, False, or None (=unknown). Whether this video was
originally a live stream.
live_status: 'is_live', 'upcoming', 'was_live', 'not_live' or None (=unknown)
live_status: 'is_live', 'is_upcoming', 'was_live', 'not_live' or None (=unknown)
If absent, automatically set from is_live, was_live
start_time: Time in seconds where the reproduction should start, as
specified in the URL.
@@ -442,6 +443,7 @@ class InfoExtractor(object):
"""Constructor. Receives an optional downloader."""
self._ready = False
self._x_forwarded_for_ip = None
self._printed_messages = set()
self.set_downloader(downloader)
@classmethod
@@ -470,6 +472,7 @@ class InfoExtractor(object):
def initialize(self):
"""Initializes an instance (authentication, etc)."""
self._printed_messages = set()
self._initialize_geo_bypass({
'countries': self._GEO_COUNTRIES,
'ip_blocks': self._GEO_IP_BLOCKS,
@@ -999,10 +1002,14 @@ class InfoExtractor(object):
expected_status=expected_status)
return res if res is False else res[0]
def report_warning(self, msg, video_id=None, *args, **kwargs):
idstr = '' if video_id is None else '%s: ' % video_id
self._downloader.report_warning(
'[%s] %s%s' % (self.IE_NAME, idstr, msg), *args, **kwargs)
def report_warning(self, msg, video_id=None, *args, only_once=False, **kwargs):
idstr = format_field(video_id, template='%s: ')
msg = f'[{self.IE_NAME}] {idstr}{msg}'
if only_once:
if f'WARNING: {msg}' in self._printed_messages:
return
self._printed_messages.add(f'WARNING: {msg}')
self._downloader.report_warning(msg, *args, **kwargs)
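The `only_once` mechanism added to `report_warning` above is just set-based deduplication keyed on the formatted message. A minimal sketch of the same idea, decoupled from the extractor class:

```python
class WarningDeduper:
    """Per-instance set remembers emitted messages so repeated warnings
    (e.g. one per downloaded HLS manifest) print only once."""

    def __init__(self, emit):
        self._emit = emit
        self._printed_messages = set()

    def warn(self, msg, only_once=False):
        if only_once:
            if msg in self._printed_messages:
                return
            self._printed_messages.add(msg)
        self._emit(msg)

printed = []
deduper = WarningDeduper(printed.append)
deduper.warn('no subtitle support', only_once=True)
deduper.warn('no subtitle support', only_once=True)  # suppressed
deduper.warn('retrying')
deduper.warn('retrying')  # not suppressed: only_once defaults to False
```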
def to_screen(self, msg, *args, **kwargs):
"""Print msg to screen, prefixing it with '[ie_name]'"""
@@ -1052,6 +1059,8 @@ class InfoExtractor(object):
def raise_no_formats(self, msg, expected=False, video_id=None):
if expected and self.get_param('ignore_no_formats_error'):
self.report_warning(msg, video_id)
elif isinstance(msg, ExtractorError):
raise msg
else:
raise ExtractorError(msg, expected=expected, video_id=video_id)
@@ -1298,7 +1307,7 @@ class InfoExtractor(object):
# JSON-LD may be malformed and thus `fatal` should be respected.
# At the same time `default` may be passed that assumes `fatal=False`
# for _search_regex. Let's simulate the same behavior here as well.
fatal = kwargs.get('fatal', True) if default == NO_DEFAULT else False
fatal = kwargs.get('fatal', True) if default is NO_DEFAULT else False
json_ld = []
for mobj in json_ld_list:
json_ld_item = self._parse_json(
@@ -1497,7 +1506,7 @@ class InfoExtractor(object):
'order': ('m4a', 'aac', 'mp3', 'ogg', 'opus', 'webm', '', 'none'),
'order_free': ('opus', 'ogg', 'webm', 'm4a', 'mp3', 'aac', '', 'none')},
'hidden': {'visible': False, 'forced': True, 'type': 'extractor', 'max': -1000},
'aud_or_vid': {'visible': False, 'forced': True, 'type': 'multiple', 'default': 1,
'aud_or_vid': {'visible': False, 'forced': True, 'type': 'multiple',
'field': ('vcodec', 'acodec'),
'function': lambda it: int(any(v != 'none' for v in it))},
'ie_pref': {'priority': True, 'type': 'extractor'},
@@ -1521,7 +1530,8 @@ class InfoExtractor(object):
'br': {'type': 'combined', 'field': ('tbr', 'vbr', 'abr'), 'same_limit': True},
'size': {'type': 'combined', 'same_limit': True, 'field': ('filesize', 'fs_approx')},
'ext': {'type': 'combined', 'field': ('vext', 'aext')},
'res': {'type': 'multiple', 'field': ('height', 'width'), 'function': min},
'res': {'type': 'multiple', 'field': ('height', 'width'),
'function': lambda it: (lambda l: min(l) if l else 0)(tuple(filter(None, it)))},
# Most of these exist only for compatibility reasons
'dimension': {'type': 'alias', 'field': 'res'},
@@ -1565,7 +1575,7 @@ class InfoExtractor(object):
elif key == 'convert':
default = 'order' if type == 'ordered' else 'float_string' if field else 'ignore'
else:
default = {'type': 'field', 'visible': True, 'order': [], 'not_in_list': (None,), 'function': max}.get(key, None)
default = {'type': 'field', 'visible': True, 'order': [], 'not_in_list': (None,)}.get(key, None)
propObj[key] = default
return propObj[key]
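The new `'res'` function above replaces a plain `min` so that formats missing height or width no longer break the sort. Unrolled for readability (same lambda, just named):

```python
def resolution_key(height, width):
    """Smaller of the known dimensions; 0 when both are missing so
    such formats sort last instead of raising min() on an empty tuple."""
    func = lambda it: (lambda l: min(l) if l else 0)(tuple(filter(None, it)))
    return func((height, width))
```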
@@ -1705,11 +1715,7 @@ class InfoExtractor(object):
type = 'field' # Only 'field' is allowed in multiple for now
actual_fields = self._get_field_setting(field, 'field')
def wrapped_function(values):
values = tuple(filter(lambda x: x is not None, values))
return self._get_field_setting(field, 'function')(values) if values else None
value = wrapped_function((get_value(f) for f in actual_fields))
value = self._get_field_setting(field, 'function')(get_value(f) for f in actual_fields)
else:
value = get_value(field)
return self._calculate_field_preference_from_value(format, field, type, value)
@@ -1948,7 +1954,7 @@ class InfoExtractor(object):
self.report_warning(bug_reports_message(
"Ignoring subtitle tracks found in the HLS manifest; "
"if any subtitle tracks are missing,"
))
), only_once=True)
return fmts
def _extract_m3u8_formats_and_subtitles(
@@ -2231,7 +2237,7 @@ class InfoExtractor(object):
self.report_warning(bug_reports_message(
"Ignoring subtitle tracks found in the SMIL manifest; "
"if any subtitle tracks are missing,"
))
), only_once=True)
return fmts
def _extract_smil_info(self, smil_url, video_id, fatal=True, f4m_params=None):
@@ -2457,7 +2463,7 @@ class InfoExtractor(object):
self.report_warning(bug_reports_message(
"Ignoring subtitle tracks found in the DASH manifest; "
"if any subtitle tracks are missing,"
))
), only_once=True)
return fmts
def _extract_mpd_formats_and_subtitles(
@@ -2484,7 +2490,7 @@ class InfoExtractor(object):
self.report_warning(bug_reports_message(
"Ignoring subtitle tracks found in the DASH manifest; "
"if any subtitle tracks are missing,"
))
), only_once=True)
return fmts
def _parse_mpd_formats_and_subtitles(
@@ -2590,215 +2596,223 @@ class InfoExtractor(object):
mime_type = representation_attrib['mimeType']
content_type = representation_attrib.get('contentType', mime_type.split('/')[0])
if content_type in ('video', 'audio', 'text') or mime_type == 'image/jpeg':
base_url = ''
for element in (representation, adaptation_set, period, mpd_doc):
base_url_e = element.find(_add_ns('BaseURL'))
if base_url_e is not None:
base_url = base_url_e.text + base_url
if re.match(r'^https?://', base_url):
break
if mpd_base_url and not re.match(r'^https?://', base_url):
if not mpd_base_url.endswith('/') and not base_url.startswith('/'):
mpd_base_url += '/'
base_url = mpd_base_url + base_url
representation_id = representation_attrib.get('id')
lang = representation_attrib.get('lang')
url_el = representation.find(_add_ns('BaseURL'))
filesize = int_or_none(url_el.attrib.get('{http://youtube.com/yt/2012/10/10}contentLength') if url_el is not None else None)
bandwidth = int_or_none(representation_attrib.get('bandwidth'))
if representation_id is not None:
format_id = representation_id
codecs = representation_attrib.get('codecs', '')
if content_type not in ('video', 'audio', 'text'):
if mime_type == 'image/jpeg':
content_type = 'image/jpeg'
if codecs.split('.')[0] == 'stpp':
content_type = 'text'
else:
format_id = content_type
if mpd_id:
format_id = mpd_id + '-' + format_id
if content_type in ('video', 'audio'):
f = {
'format_id': format_id,
'manifest_url': mpd_url,
'ext': mimetype2ext(mime_type),
'width': int_or_none(representation_attrib.get('width')),
'height': int_or_none(representation_attrib.get('height')),
'tbr': float_or_none(bandwidth, 1000),
'asr': int_or_none(representation_attrib.get('audioSamplingRate')),
'fps': int_or_none(representation_attrib.get('frameRate')),
'language': lang if lang not in ('mul', 'und', 'zxx', 'mis') else None,
'format_note': 'DASH %s' % content_type,
'filesize': filesize,
'container': mimetype2ext(mime_type) + '_dash',
}
f.update(parse_codecs(representation_attrib.get('codecs')))
elif content_type == 'text':
f = {
'ext': mimetype2ext(mime_type),
'manifest_url': mpd_url,
'filesize': filesize,
}
elif mime_type == 'image/jpeg':
# See test case in VikiIE
# https://www.viki.com/videos/1175236v-choosing-spouse-by-lottery-episode-1
f = {
'format_id': format_id,
'ext': 'mhtml',
'manifest_url': mpd_url,
'format_note': 'DASH storyboards (jpeg)',
'acodec': 'none',
'vcodec': 'none',
}
representation_ms_info = extract_multisegment_info(representation, adaption_set_ms_info)
self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type)
continue
base_url = ''
for element in (representation, adaptation_set, period, mpd_doc):
base_url_e = element.find(_add_ns('BaseURL'))
if base_url_e is not None:
base_url = base_url_e.text + base_url
if re.match(r'^https?://', base_url):
break
if mpd_base_url and not re.match(r'^https?://', base_url):
if not mpd_base_url.endswith('/') and not base_url.startswith('/'):
mpd_base_url += '/'
base_url = mpd_base_url + base_url
representation_id = representation_attrib.get('id')
lang = representation_attrib.get('lang')
url_el = representation.find(_add_ns('BaseURL'))
filesize = int_or_none(url_el.attrib.get('{http://youtube.com/yt/2012/10/10}contentLength') if url_el is not None else None)
bandwidth = int_or_none(representation_attrib.get('bandwidth'))
if representation_id is not None:
format_id = representation_id
else:
format_id = content_type
if mpd_id:
format_id = mpd_id + '-' + format_id
if content_type in ('video', 'audio'):
f = {
'format_id': format_id,
'manifest_url': mpd_url,
'ext': mimetype2ext(mime_type),
'width': int_or_none(representation_attrib.get('width')),
'height': int_or_none(representation_attrib.get('height')),
'tbr': float_or_none(bandwidth, 1000),
'asr': int_or_none(representation_attrib.get('audioSamplingRate')),
'fps': int_or_none(representation_attrib.get('frameRate')),
'language': lang if lang not in ('mul', 'und', 'zxx', 'mis') else None,
'format_note': 'DASH %s' % content_type,
'filesize': filesize,
'container': mimetype2ext(mime_type) + '_dash',
}
f.update(parse_codecs(codecs))
elif content_type == 'text':
f = {
'ext': mimetype2ext(mime_type),
'manifest_url': mpd_url,
'filesize': filesize,
}
elif content_type == 'image/jpeg':
# See test case in VikiIE
# https://www.viki.com/videos/1175236v-choosing-spouse-by-lottery-episode-1
f = {
'format_id': format_id,
'ext': 'mhtml',
'manifest_url': mpd_url,
'format_note': 'DASH storyboards (jpeg)',
'acodec': 'none',
'vcodec': 'none',
}
representation_ms_info = extract_multisegment_info(representation, adaption_set_ms_info)
def prepare_template(template_name, identifiers):
tmpl = representation_ms_info[template_name]
# First off, % characters outside $...$ templates
# must be escaped by doubling for proper processing
# by % operator string formatting used further (see
# https://github.com/ytdl-org/youtube-dl/issues/16867).
t = ''
in_template = False
for c in tmpl:
t += c
if c == '$':
in_template = not in_template
elif c == '%' and not in_template:
t += c
if c == '$':
in_template = not in_template
elif c == '%' and not in_template:
t += c
# Next, $...$ templates are translated to their
# %(...) counterparts to be used with % operator
if representation_id is not None:
t = t.replace('$RepresentationID$', representation_id)
t = re.sub(r'\$(%s)\$' % '|'.join(identifiers), r'%(\1)d', t)
t = re.sub(r'\$(%s)%%([^$]+)\$' % '|'.join(identifiers), r'%(\1)\2', t)
t.replace('$$', '$')
return t
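A standalone, annotated version of `prepare_template` above: escape bare `%` characters, then rewrite `$Identifier$` / `$Identifier%fmt$` DASH placeholders into Python `%`-format fields. (The default identifier list is illustrative; unlike the snippet above, the final `$$` unescape here is applied to the returned value.)

```python
import re

def dash_template_to_python(tmpl, identifiers=('Number', 'Bandwidth', 'Time'),
                            representation_id=None):
    t = ''
    in_template = False
    for c in tmpl:
        t += c
        if c == '$':
            in_template = not in_template
        elif c == '%' and not in_template:
            t += c  # double bare % so the % operator leaves it alone
    if representation_id is not None:
        t = t.replace('$RepresentationID$', representation_id)
    # $Number$ -> %(Number)d ; $Number%05d$ -> %(Number)05d
    t = re.sub(r'\$(%s)\$' % '|'.join(identifiers), r'%(\1)d', t)
    t = re.sub(r'\$(%s)%%([^$]+)\$' % '|'.join(identifiers), r'%(\1)\2', t)
    return t.replace('$$', '$')
```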
# @initialization is a regular template like @media one
# so it should be handled just the same way (see
# https://github.com/ytdl-org/youtube-dl/issues/11605)
if 'initialization' in representation_ms_info:
initialization_template = prepare_template(
'initialization',
# As per [1, 5.3.9.4.2, Table 15, page 54] $Number$ and
# $Time$ shall not be included for @initialization thus
# only $Bandwidth$ remains
('Bandwidth', ))
representation_ms_info['initialization_url'] = initialization_template % {
'Bandwidth': bandwidth,
}
def location_key(location):
return 'url' if re.match(r'^https?://', location) else 'path'
if 'segment_urls' not in representation_ms_info and 'media' in representation_ms_info:
media_template = prepare_template('media', ('Number', 'Bandwidth', 'Time'))
media_location_key = location_key(media_template)
# As per [1, 5.3.9.4.4, Table 16, page 55] $Number$ and $Time$
# can't be used at the same time
if '%(Number' in media_template and 's' not in representation_ms_info:
segment_duration = None
if 'total_number' not in representation_ms_info and 'segment_duration' in representation_ms_info:
segment_duration = float_or_none(representation_ms_info['segment_duration'], representation_ms_info['timescale'])
representation_ms_info['total_number'] = int(math.ceil(float(period_duration) / segment_duration))
representation_ms_info['fragments'] = [{
media_location_key: media_template % {
'Number': segment_number,
'Bandwidth': bandwidth,
},
'duration': segment_duration,
} for segment_number in range(
representation_ms_info['start_number'],
representation_ms_info['total_number'] + representation_ms_info['start_number'])]
else:
# $Number*$ or $Time$ in media template with S list available
# Example $Number*$: http://www.svtplay.se/klipp/9023742/stopptid-om-bjorn-borg
# Example $Time$: https://play.arkena.com/embed/avp/v2/player/media/b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe/1/129411
representation_ms_info['fragments'] = []
segment_time = 0
segment_d = None
segment_number = representation_ms_info['start_number']
def add_segment_url():
segment_url = media_template % {
'Time': segment_time,
'Bandwidth': bandwidth,
'Number': segment_number,
}
representation_ms_info['fragments'].append({
media_location_key: segment_url,
'duration': float_or_none(segment_d, representation_ms_info['timescale']),
})
for num, s in enumerate(representation_ms_info['s']):
segment_time = s.get('t') or segment_time
segment_d = s['d']
add_segment_url()
segment_number += 1
for r in range(s.get('r', 0)):
segment_time += segment_d
add_segment_url()
segment_number += 1
segment_time += segment_d
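The SegmentTimeline loop above can be summarized as: each `S` element carries a duration `d`, an optional explicit start `t`, and an optional repeat count `r`. A standalone sketch that expands such a list into explicit `(number, time, duration)` fragments (names are ours):

```python
def expand_segment_timeline(s_list, start_number=1, timescale=1):
    fragments = []
    segment_time, segment_number = 0, start_number
    for s in s_list:
        segment_time = s.get('t') or segment_time
        segment_d = s['d']
        fragments.append((segment_number, segment_time, segment_d / timescale))
        segment_number += 1
        # 'r' extra repetitions of the same duration
        for _ in range(s.get('r', 0)):
            segment_time += segment_d
            fragments.append((segment_number, segment_time, segment_d / timescale))
            segment_number += 1
        segment_time += segment_d  # start time of the next S element
    return fragments
```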
elif 'segment_urls' in representation_ms_info and 's' in representation_ms_info:
# No media template
# Example: https://www.youtube.com/watch?v=iXZV5uAYMJI
# or any YouTube dashsegments video
fragments = []
segment_index = 0
timescale = representation_ms_info['timescale']
for s in representation_ms_info['s']:
duration = float_or_none(s['d'], timescale)
for r in range(s.get('r', 0) + 1):
segment_uri = representation_ms_info['segment_urls'][segment_index]
fragments.append({
location_key(segment_uri): segment_uri,
'duration': duration,
})
segment_index += 1
representation_ms_info['fragments'] = fragments
elif 'segment_urls' in representation_ms_info:
# Segment URLs with no SegmentTimeline
# Example: https://www.seznam.cz/zpravy/clanek/cesko-zasahne-vitr-o-sile-vichrice-muze-byt-i-zivotu-nebezpecny-39091
# https://github.com/ytdl-org/youtube-dl/pull/14844
fragments = []
segment_duration = float_or_none(
representation_ms_info['segment_duration'],
representation_ms_info['timescale']) if 'segment_duration' in representation_ms_info else None
for segment_url in representation_ms_info['segment_urls']:
fragment = {
location_key(segment_url): segment_url,
}
if segment_duration:
fragment['duration'] = segment_duration
fragments.append(fragment)
representation_ms_info['fragments'] = fragments
# If there is a fragments key available then we correctly recognized fragmented media.
# Otherwise we will assume unfragmented media with direct access. Technically, such
# assumption is not necessarily correct since we may simply have no support for
# some forms of fragmented media renditions yet, but for now we'll use this fallback.
if 'fragments' in representation_ms_info:
f.update({
# NB: mpd_url may be empty when MPD manifest is parsed from a string
'url': mpd_url or base_url,
'fragment_base_url': base_url,
'fragments': [],
'protocol': 'http_dash_segments' if mime_type != 'image/jpeg' else 'mhtml',
})
if 'initialization_url' in representation_ms_info:
initialization_url = representation_ms_info['initialization_url']
if not f.get('url'):
f['url'] = initialization_url
f['fragments'].append({location_key(initialization_url): initialization_url})
f['fragments'].extend(representation_ms_info['fragments'])
else:
self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type)
# Assuming direct URL to unfragmented media.
f['url'] = base_url
if content_type in ('video', 'audio') or mime_type == 'image/jpeg':
formats.append(f)
elif content_type == 'text':
subtitles.setdefault(lang or 'und', []).append(f)
return formats, subtitles
def _extract_ism_formats(self, *args, **kwargs):

View File

@@ -29,6 +29,7 @@ from ..utils import (
merge_dicts,
remove_end,
sanitized_Request,
try_get,
urlencode_postdata,
xpath_text,
)
@@ -458,6 +459,18 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
video_description = (self._parse_json(self._html_search_regex(
r'<script[^>]*>\s*.+?\[media_id=%s\].+?({.+?"description"\s*:.+?})\);' % video_id,
webpage, 'description', default='{}'), video_id) or media_metadata).get('description')
thumbnails = []
thumbnail_url = (self._parse_json(self._html_search_regex(
r'<script type="application\/ld\+json">\n\s*(.+?)<\/script>',
webpage, 'thumbnail_url', default='{}'), video_id)).get('image')
if thumbnail_url:
thumbnails.append({
'url': thumbnail_url,
'width': 1920,
'height': 1080
})
if video_description:
video_description = lowercase_escape(video_description.replace(r'\r\n', '\n'))
video_uploader = self._html_search_regex(
@@ -592,21 +605,25 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
r'(?s)<h\d[^>]+\bid=["\']showmedia_about_episode_num[^>]+>(.+?)</h\d',
webpage, 'series', fatal=False)
season = episode = episode_number = duration = thumbnail = None
season = episode = episode_number = duration = None
if isinstance(metadata, compat_etree_Element):
season = xpath_text(metadata, 'series_title')
episode = xpath_text(metadata, 'episode_title')
episode_number = int_or_none(xpath_text(metadata, 'episode_number'))
duration = float_or_none(media_metadata.get('duration'), 1000)
thumbnail = xpath_text(metadata, 'episode_image_url')
if not episode:
episode = media_metadata.get('title')
if not episode_number:
episode_number = int_or_none(media_metadata.get('episode_number'))
if not thumbnail:
thumbnail = media_metadata.get('thumbnail', {}).get('url')
thumbnail_url = try_get(media, lambda x: x['thumbnail']['url'])
if thumbnail_url:
thumbnails.append({
'url': thumbnail_url,
'width': 640,
'height': 360
})
season_number = int_or_none(self._search_regex(
r'(?s)<h\d[^>]+id=["\']showmedia_about_episode_num[^>]+>.+?</h\d>\s*<h4>\s*Season (\d+)',
@@ -619,7 +636,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
'title': video_title,
'description': video_description,
'duration': duration,
'thumbnail': thumbnail,
'thumbnails': thumbnails,
'uploader': video_uploader,
'series': series,
'season': season,


@@ -63,8 +63,7 @@ class DiscoveryPlusIndiaShowIE(InfoExtractor):
'info_dict': {
'id': 'how-do-they-do-it',
},
}
]
}]
def _entries(self, show_name):
headers = {


@@ -296,50 +296,6 @@ class DPlayIE(InfoExtractor):
url, display_id, host, 'dplay' + country, country)
class DiscoveryPlusIE(DPlayIE):
_VALID_URL = r'https?://(?:www\.)?discoveryplus\.com/video' + DPlayIE._PATH_REGEX
_TESTS = [{
'url': 'https://www.discoveryplus.com/video/property-brothers-forever-home/food-and-family',
'info_dict': {
'id': '1140794',
'display_id': 'property-brothers-forever-home/food-and-family',
'ext': 'mp4',
'title': 'Food and Family',
'description': 'The brothers help a Richmond family expand their single-level home.',
'duration': 2583.113,
'timestamp': 1609304400,
'upload_date': '20201230',
'creator': 'HGTV',
'series': 'Property Brothers: Forever Home',
'season_number': 1,
'episode_number': 1,
},
'skip': 'Available for Premium users',
}]
def _update_disco_api_headers(self, headers, disco_base, display_id, realm):
headers['x-disco-client'] = 'WEB:UNKNOWN:dplus_us:15.0.0'
def _download_video_playback_info(self, disco_base, video_id, headers):
return self._download_json(
disco_base + 'playback/v3/videoPlaybackInfo',
video_id, headers=headers, data=json.dumps({
'deviceInfo': {
'adBlocker': False,
},
'videoId': video_id,
'wisteriaProperties': {
'platform': 'desktop',
'product': 'dplus_us',
},
}).encode('utf-8'))['data']['attributes']['streaming']
def _real_extract(self, url):
display_id = self._match_id(url)
return self._get_disco_api_info(
url, display_id, 'us1-prod-direct.discoveryplus.com', 'go', 'us')
class HGTVDeIE(DPlayIE):
_VALID_URL = r'https?://de\.hgtv\.com/sendungen' + DPlayIE._PATH_REGEX
_TESTS = [{
@@ -367,3 +323,70 @@ class HGTVDeIE(DPlayIE):
display_id = self._match_id(url)
return self._get_disco_api_info(
url, display_id, 'eu1-prod.disco-api.com', 'hgtv', 'de')
class DiscoveryPlusIE(DPlayIE):
_VALID_URL = r'https?://(?:www\.)?discoveryplus\.com/video' + DPlayIE._PATH_REGEX
_TESTS = [{
'url': 'https://www.discoveryplus.com/video/property-brothers-forever-home/food-and-family',
'info_dict': {
'id': '1140794',
'display_id': 'property-brothers-forever-home/food-and-family',
'ext': 'mp4',
'title': 'Food and Family',
'description': 'The brothers help a Richmond family expand their single-level home.',
'duration': 2583.113,
'timestamp': 1609304400,
'upload_date': '20201230',
'creator': 'HGTV',
'series': 'Property Brothers: Forever Home',
'season_number': 1,
'episode_number': 1,
},
'skip': 'Available for Premium users',
}]
_PRODUCT = 'dplus_us'
_API_URL = 'us1-prod-direct.discoveryplus.com'
def _update_disco_api_headers(self, headers, disco_base, display_id, realm):
headers['x-disco-client'] = f'WEB:UNKNOWN:{self._PRODUCT}:15.0.0'
def _download_video_playback_info(self, disco_base, video_id, headers):
return self._download_json(
disco_base + 'playback/v3/videoPlaybackInfo',
video_id, headers=headers, data=json.dumps({
'deviceInfo': {
'adBlocker': False,
},
'videoId': video_id,
'wisteriaProperties': {
'platform': 'desktop',
'product': self._PRODUCT,
},
}).encode('utf-8'))['data']['attributes']['streaming']
def _real_extract(self, url):
display_id = self._match_id(url)
return self._get_disco_api_info(
url, display_id, self._API_URL, 'go', 'us')
class ScienceChannelIE(DiscoveryPlusIE):
_VALID_URL = r'https?://(?:www\.)?sciencechannel\.com/video' + DPlayIE._PATH_REGEX
_TESTS = [{
'url': 'https://www.sciencechannel.com/video/strangest-things-science-atve-us/nazi-mystery-machine',
'info_dict': {
'id': '2842849',
'display_id': 'strangest-things-science-atve-us/nazi-mystery-machine',
'ext': 'mp4',
'title': 'Nazi Mystery Machine',
'description': 'Experts investigate the secrets of a revolutionary encryption machine.',
'season_number': 1,
'episode_number': 1,
},
'skip': 'Available for Premium users',
}]
_PRODUCT = 'sci'
_API_URL = 'us1-prod-direct.sciencechannel.com'


@@ -90,3 +90,40 @@ class EroProfileIE(InfoExtractor):
'title': title,
'age_limit': 18,
})
class EroProfileAlbumIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?eroprofile\.com/m/videos/album/(?P<id>[^/]+)'
IE_NAME = 'EroProfile:album'
_TESTS = [{
'url': 'https://www.eroprofile.com/m/videos/album/BBW-2-893',
'info_dict': {
'id': 'BBW-2-893',
'title': 'BBW 2'
},
'playlist_mincount': 486,
},
]
def _extract_from_page(self, page):
for url in re.findall(r'href=".*?(/m/videos/view/[^"]+)"', page):
yield self.url_result(f'https://www.eroprofile.com{url}', EroProfileIE.ie_key())
def _entries(self, playlist_id, first_page):
yield from self._extract_from_page(first_page)
page_urls = re.findall(rf'href=".*?(/m/videos/album/{playlist_id}\?pnum=(\d+))"', first_page)
for url, n in page_urls[1:]:
yield from self._extract_from_page(self._download_webpage(
f'https://www.eroprofile.com{url}',
playlist_id, note=f'Downloading playlist page {int(n) - 1}'))
def _real_extract(self, url):
playlist_id = self._match_id(url)
first_page = self._download_webpage(url, playlist_id, note='Downloading playlist')
playlist_title = self._search_regex(
r'<title>Album: (.*) - EroProfile</title>', first_page, 'playlist_title')
return self.playlist_result(self._entries(playlist_id, first_page), playlist_id, playlist_title)
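The `pnum`-based pagination above relies on a single `re.findall` over the first page. A minimal sketch of that step, using a made-up album id and HTML snippet for illustration:

```python
import re

def album_pages(playlist_id, first_page_html):
    """Return (relative_url, page_number) pairs for every additional
    album page linked from the first page, as _entries does above."""
    return re.findall(
        rf'href=".*?(/m/videos/album/{playlist_id}\?pnum=(\d+))"',
        first_page_html)

html = (
    '<a href="https://www.eroprofile.com/m/videos/album/demo-1?pnum=1">1</a>'
    '<a href="https://www.eroprofile.com/m/videos/album/demo-1?pnum=2">2</a>'
)
pages = album_pages('demo-1', html)
```

Note the extractor then skips the first entry (`page_urls[1:]`), since the first page was already downloaded.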


@@ -109,7 +109,12 @@ from .awaan import (
from .azmedien import AZMedienIE
from .baidu import BaiduVideoIE
from .bandaichannel import BandaiChannelIE
from .bandcamp import BandcampIE, BandcampAlbumIE, BandcampWeeklyIE
from .bandcamp import (
BandcampIE,
BandcampAlbumIE,
BandcampWeeklyIE,
BandcampMusicIE,
)
from .bbc import (
BBCCoUkIE,
BBCCoUkArticleIE,
@@ -151,6 +156,7 @@ from .bitwave import (
BitwaveStreamIE,
)
from .biqle import BIQLEIE
from .blackboardcollaborate import BlackboardCollaborateIE
from .bleacherreport import (
BleacherReportIE,
BleacherReportCMSIE,
@@ -330,6 +336,7 @@ from .dplay import (
DPlayIE,
DiscoveryPlusIE,
HGTVDeIE,
ScienceChannelIE
)
from .dreisat import DreiSatIE
from .drbonanza import DRBonanzaIE
@@ -382,7 +389,10 @@ from .elpais import ElPaisIE
from .embedly import EmbedlyIE
from .engadget import EngadgetIE
from .eporner import EpornerIE
from .eroprofile import EroProfileIE
from .eroprofile import (
EroProfileIE,
EroProfileAlbumIE,
)
from .escapist import EscapistIE
from .espn import (
ESPNIE,
@@ -731,6 +741,10 @@ from .minds import (
from .ministrygrid import MinistryGridIE
from .minoto import MinotoIE
from .miomio import MioMioIE
from .mirrativ import (
MirrativIE,
MirrativUserIE,
)
from .mit import TechTVMITIE, OCWMITIE
from .mitele import MiTeleIE
from .mixcloud import (
@@ -932,6 +946,10 @@ from .ooyala import (
OoyalaIE,
OoyalaExternalIE,
)
from .openrec import (
OpenRecIE,
OpenRecCaptureIE,
)
from .ora import OraTVIE
from .orf import (
ORFTVthekIE,
@@ -961,6 +979,10 @@ from .palcomp3 import (
PalcoMP3VideoIE,
)
from .pandoratv import PandoraTVIE
from .paramountplus import (
ParamountPlusIE,
ParamountPlusSeriesIE,
)
from .parliamentliveuk import ParliamentLiveUKIE
from .parlview import ParlviewIE
from .patreon import PatreonIE
@@ -1073,6 +1095,7 @@ from .rcs import (
from .rcti import (
RCTIPlusIE,
RCTIPlusSeriesIE,
RCTIPlusTVIE,
)
from .rds import RDSIE
from .redbulltv import (
@@ -1338,7 +1361,6 @@ from .theweatherchannel import TheWeatherChannelIE
from .thisamericanlife import ThisAmericanLifeIE
from .thisav import ThisAVIE
from .thisoldhouse import ThisOldHouseIE
from .thisvid import ThisVidIE
from .threeqsdn import ThreeQSDNIE
from .tiktok import TikTokIE
from .tinypic import TinyPicIE
@@ -1492,6 +1514,7 @@ from .ustudio import (
UstudioIE,
UstudioEmbedIE,
)
from .utreon import UtreonIE
from .varzesh3 import Varzesh3IE
from .vbox7 import Vbox7IE
from .veehd import VeeHDIE


@@ -2238,6 +2238,87 @@ class GenericIE(InfoExtractor):
'title': '#WEAREFMI PT.2 2021 MsMotorTV',
},
'playlist_count': 1,
}, {
# KVS Player
'url': 'https://www.kvs-demo.com/videos/105/kelis-4th-of-july/',
'info_dict': {
'id': '105',
'display_id': 'kelis-4th-of-july',
'ext': 'mp4',
'title': 'Kelis - 4th Of July',
'thumbnail': 'https://kvs-demo.com/contents/videos_screenshots/0/105/preview.jpg',
},
'params': {
'skip_download': True,
},
}, {
# KVS Player
'url': 'https://www.kvs-demo.com/embed/105/',
'info_dict': {
'id': '105',
'display_id': 'kelis-4th-of-july',
'ext': 'mp4',
'title': 'Kelis - 4th Of July / Embed Player',
'thumbnail': 'https://kvs-demo.com/contents/videos_screenshots/0/105/preview.jpg',
},
'params': {
'skip_download': True,
},
}, {
# KVS Player
'url': 'https://thisvid.com/videos/french-boy-pantsed/',
'md5': '3397979512c682f6b85b3b04989df224',
'info_dict': {
'id': '2400174',
'display_id': 'french-boy-pantsed',
'ext': 'mp4',
'title': 'French Boy Pantsed - ThisVid.com',
'thumbnail': 'https://media.thisvid.com/contents/videos_screenshots/2400000/2400174/preview.mp4.jpg',
}
}, {
# KVS Player
'url': 'https://thisvid.com/embed/2400174/',
'md5': '3397979512c682f6b85b3b04989df224',
'info_dict': {
'id': '2400174',
'display_id': 'french-boy-pantsed',
'ext': 'mp4',
'title': 'French Boy Pantsed - ThisVid.com',
'thumbnail': 'https://media.thisvid.com/contents/videos_screenshots/2400000/2400174/preview.mp4.jpg',
}
}, {
# KVS Player
'url': 'https://youix.com/video/leningrad-zoj/',
'md5': '94f96ba95706dc3880812b27b7d8a2b8',
'info_dict': {
'id': '18485',
'display_id': 'leningrad-zoj',
'ext': 'mp4',
'title': 'Клип: Ленинград - ЗОЖ скачать, смотреть онлайн | Youix.com',
'thumbnail': 'https://youix.com/contents/videos_screenshots/18000/18485/preview_480x320_youix_com.mp4.jpg',
}
}, {
# KVS Player
'url': 'https://youix.com/embed/18485',
'md5': '94f96ba95706dc3880812b27b7d8a2b8',
'info_dict': {
'id': '18485',
'display_id': 'leningrad-zoj',
'ext': 'mp4',
'title': 'Ленинград - ЗОЖ',
'thumbnail': 'https://youix.com/contents/videos_screenshots/18000/18485/preview_480x320_youix_com.mp4.jpg',
}
}, {
# KVS Player
'url': 'https://bogmedia.org/videos/21217/40-nochey-40-nights-2016/',
'md5': '94166bdb26b4cb1fb9214319a629fc51',
'info_dict': {
'id': '21217',
'display_id': '40-nochey-40-nights-2016',
'ext': 'mp4',
'title': '40 ночей (2016) - BogMedia.org',
'thumbnail': 'https://bogmedia.org/contents/videos_screenshots/21000/21217/preview_480p.mp4.jpg',
}
},
]
@@ -2343,6 +2424,44 @@ class GenericIE(InfoExtractor):
'title': title,
}
def _kvs_getrealurl(self, video_url, license_code):
if not video_url.startswith('function/0/'):
return video_url # not obfuscated
url_path, _, url_query = video_url.partition('?')
urlparts = url_path.split('/')[2:]
license = self._kvs_getlicensetoken(license_code)
newmagic = urlparts[5][:32]
for o in range(len(newmagic) - 1, -1, -1):
new = ''
l = (o + sum([int(n) for n in license[o:]])) % 32
for i in range(0, len(newmagic)):
if i == o:
new += newmagic[l]
elif i == l:
new += newmagic[o]
else:
new += newmagic[i]
newmagic = new
urlparts[5] = newmagic + urlparts[5][32:]
return '/'.join(urlparts) + '?' + url_query
def _kvs_getlicensetoken(self, license):
modlicense = license.replace('$', '').replace('0', '1')
center = int(len(modlicense) / 2)
fronthalf = int(modlicense[:center + 1])
backhalf = int(modlicense[center:])
modlicense = str(4 * abs(fronthalf - backhalf))
retval = ''
for o in range(0, center + 1):
for i in range(1, 5):
retval += str((int(license[o + i]) + int(modlicense[o])) % 10)
return retval
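The license-token derivation above is self-contained, so it can be exercised outside the extractor. A standalone copy for illustration (the license code below is an arbitrary made-up value, not from a real KVS site):

```python
def kvs_license_token(license_code):
    """Standalone copy of _kvs_getlicensetoken: derive the digit string
    used to unscramble the 'function/0/' URL segment."""
    modlicense = license_code.replace('$', '').replace('0', '1')
    center = len(modlicense) // 2
    fronthalf = int(modlicense[:center + 1])
    backhalf = int(modlicense[center:])
    modlicense = str(4 * abs(fronthalf - backhalf))
    retval = ''
    for o in range(center + 1):
        for i in range(1, 5):
            # mixes digits of the original license with the derived value
            retval += str((int(license_code[o + i]) + int(modlicense[o])) % 10)
    return retval

token = kvs_license_token('$495462320218579')
```

The token is then used as a permutation key over the first 32 characters of the obfuscated URL segment, as `_kvs_getrealurl` shows above.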
def _real_extract(self, url):
if url.startswith('//'):
return self.url_result(self.http_scheme() + url)
@@ -3478,6 +3597,52 @@ class GenericIE(InfoExtractor):
)
.*?
['"]?file['"]?\s*:\s*["\'](.*?)["\']''', webpage))
if not found:
# Look for generic KVS player
found = re.search(r'<script [^>]*?src="https://.+?/kt_player\.js\?v=(?P<ver>(?P<maj_ver>\d+)(\.\d+)+)".*?>', webpage)
if found:
if found.group('maj_ver') not in ['4', '5']:
self.report_warning('Untested major version (%s) in player engine--Download may fail.' % found.group('ver'))
flashvars = re.search(r'(?ms)<script.*?>.*?var\s+flashvars\s*=\s*(\{.*?\});.*?</script>', webpage)
flashvars = self._parse_json(flashvars.group(1), video_id, transform_source=js_to_json)
# extract the part after the last / as the display_id from the
# canonical URL.
display_id = self._search_regex(
r'(?:<link href="https?://[^"]+/(.+?)/?" rel="canonical"\s*/?>'
r'|<link rel="canonical" href="https?://[^"]+/(.+?)/?"\s*/?>)',
webpage, 'display_id', fatal=False
)
title = self._html_search_regex(r'<(?:h1|title)>(?:Video: )?(.+?)</(?:h1|title)>', webpage, 'title')
thumbnail = flashvars['preview_url']
if thumbnail.startswith('//'):
protocol, _, _ = url.partition('/')
thumbnail = protocol + thumbnail
formats = []
for key in ('video_url', 'video_alt_url', 'video_alt_url2'):
if key in flashvars and '/get_file/' in flashvars[key]:
next_format = {
'url': self._kvs_getrealurl(flashvars[key], flashvars['license_code']),
'format_id': flashvars.get(key + '_text', key),
'ext': 'mp4',
}
height = re.search(r'%s_(\d+)p\.mp4(?:/[?].*)?$' % flashvars['video_id'], flashvars[key])
if height:
next_format['height'] = int(height.group(1))
else:
next_format['quality'] = 1
formats.append(next_format)
self._sort_formats(formats)
return {
'id': flashvars['video_id'],
'display_id': display_id,
'title': title,
'thumbnail': thumbnail,
'formats': formats,
}
if not found:
# Broaden the search a little bit
found = filter_video(re.findall(r'[^A-Za-z0-9]?(?:file|source)=(http[^\'"&]*)', webpage))


@@ -7,7 +7,6 @@ import re
import time
import uuid
import json
import random
from .common import InfoExtractor
from ..compat import (
@@ -27,34 +26,24 @@ from ..utils import (
class HotStarBaseIE(InfoExtractor):
_AKAMAI_ENCRYPTION_KEY = b'\x05\xfc\x1a\x01\xca\xc9\x4b\xc4\x12\xfc\x53\x12\x07\x75\xf9\xee'
def _call_api_impl(self, path, video_id, query, st=None):
def _call_api_impl(self, path, video_id, query, st=None, cookies=None):
st = int_or_none(st) or int(time.time())
exp = st + 6000
auth = 'st=%d~exp=%d~acl=/*' % (st, exp)
auth += '~hmac=' + hmac.new(self._AKAMAI_ENCRYPTION_KEY, auth.encode(), hashlib.sha256).hexdigest()
def _generate_device_id():
"""
Reversed from javascript library.
JS function is generateUUID
"""
t = int(round(time.time() * 1000))
e = "xxxxxxxx-xxxx-4xxx-xxxx-xxxxxxxxxxxx" # 4 seems to be interchangeable
def _replacer():
n = int((t + 16 * random.random())) % 16 | 0
return hex(n if "x" == e else 3 & n | 8)[2:]
return "".join([_.replace('x', _replacer()) for _ in e])
token = self._download_json(
'https://api.hotstar.com/um/v3/users',
video_id, note='Downloading token',
data=json.dumps({"device_ids": [{"id": compat_str(uuid.uuid4()), "type": "device_id"}]}).encode('utf-8'),
headers={
'hotstarauth': auth,
'x-hs-platform': 'PCTV', # or 'web'
'Content-Type': 'application/json',
})['user_identity']
if cookies and cookies.get('userUP'):
token = cookies.get('userUP').value
else:
token = self._download_json(
'https://api.hotstar.com/um/v3/users',
video_id, note='Downloading token',
data=json.dumps({"device_ids": [{"id": compat_str(uuid.uuid4()), "type": "device_id"}]}).encode('utf-8'),
headers={
'hotstarauth': auth,
'x-hs-platform': 'PCTV', # or 'web'
'Content-Type': 'application/json',
})['user_identity']
response = self._download_json(
'https://api.hotstar.com/' + path, video_id, headers={
@@ -70,16 +59,19 @@ class HotStarBaseIE(InfoExtractor):
return response['data']
def _call_api(self, path, video_id, query_name='contentId'):
return self._call_api_impl(path, video_id, {
return self._download_json('https://api.hotstar.com/' + path, video_id=video_id, query={
query_name: video_id,
'tas': 10000,
}, headers={
'x-country-code': 'IN',
'x-platform-code': 'PCTV',
})
def _call_api_v2(self, path, video_id, st=None):
def _call_api_v2(self, path, video_id, st=None, cookies=None):
return self._call_api_impl(
'%s/content/%s' % (path, video_id), video_id, st=st, query={
'%s/content/%s' % (path, video_id), video_id, st=st, cookies=cookies, query={
'desired-config': 'audio_channel:stereo|dynamic_range:sdr|encryption:plain|ladder:tv|package:dash|resolution:hd|subs-tag:HotstarVIP|video_codec:vp9',
'device-id': compat_str(uuid.uuid4()),
'device-id': cookies.get('device_id').value if cookies.get('device_id') else compat_str(uuid.uuid4()),
'os-name': 'Windows',
'os-version': '10',
})
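The `hotstarauth` value built in `_call_api_impl` above is an Akamai-style signed ACL: a start/expiry/acl string with an HMAC-SHA256 appended. A sketch using a dummy key (the real `_AKAMAI_ENCRYPTION_KEY` is the one embedded in the extractor):

```python
import hashlib
import hmac
import re
import time

def akamai_auth(key, st=None, validity=6000):
    """Build an 'st=...~exp=...~acl=/*~hmac=...' token as above.
    key is a placeholder here, not the real encryption key."""
    st = int(st or time.time())
    auth = 'st=%d~exp=%d~acl=/*' % (st, st + validity)
    return auth + '~hmac=' + hmac.new(key, auth.encode(), hashlib.sha256).hexdigest()

token = akamai_auth(b'\x00' * 16, st=1600000000)
```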
@@ -88,15 +80,25 @@ class HotStarBaseIE(InfoExtractor):
class HotStarIE(HotStarBaseIE):
IE_NAME = 'hotstar'
_VALID_URL = r'''(?x)
https?://(?:www\.)?hotstar\.com(?:/in)?/(?!in/)
(?:
tv/(?:[^/?#]+/){3}|
(?!tv/)[^?#]+/
)?
(?P<id>\d{10})
(?:
hotstar\:|
https?://(?:www\.)?hotstar\.com(?:/in)?/(?!in/)
)
(?:
(?P<type>movies|sports|episode|(?P<tv>tv))
(?:
\:|
/[^/?#]+/
(?(tv)
(?:[^/?#]+/){2}|
(?:[^/?#]+/)*
)
)|
[^/?#]+/
)?
(?P<id>\d{10})
'''
_TESTS = [{
# contentData
'url': 'https://www.hotstar.com/can-you-not-spread-rumours/1000076273',
'info_dict': {
'id': '1000076273',
@@ -107,56 +109,89 @@ class HotStarIE(HotStarBaseIE):
'upload_date': '20151111',
'duration': 381,
},
'params': {
# m3u8 download
'skip_download': True,
}
}, {
# contentDetail
'url': 'hotstar:1000076273',
'only_matching': True,
}, {
'url': 'https://www.hotstar.com/movies/radha-gopalam/1000057157',
'info_dict': {
'id': '1000057157',
'ext': 'mp4',
'title': 'Radha Gopalam',
'description': 'md5:be3bc342cc120bbc95b3b0960e2b0d22',
'timestamp': 1140805800,
'upload_date': '20060224',
'duration': 9182,
},
}, {
'url': 'hotstar:movies:1000057157',
'only_matching': True,
}, {
'url': 'http://www.hotstar.com/sports/cricket/rajitha-sizzles-on-debut-with-329/2001477583',
'url': 'https://www.hotstar.com/in/sports/cricket/follow-the-blues-2021/recap-eng-fight-back-on-day-2/1260066104',
'only_matching': True,
}, {
'url': 'http://www.hotstar.com/1000000515',
'url': 'https://www.hotstar.com/in/sports/football/most-costly-pl-transfers-ft-grealish/1260065956',
'only_matching': True,
}, {
# contentData
'url': 'hotstar:sports:1260065956',
'only_matching': True,
}, {
# contentData
'url': 'hotstar:sports:1260066104',
'only_matching': True,
}, {
# only available via api v2
'url': 'https://www.hotstar.com/tv/ek-bhram-sarvagun-sampanna/s-2116/janhvi-targets-suman/1000234847',
'info_dict': {
'id': '1000234847',
'ext': 'mp4',
'title': 'Janhvi Targets Suman',
'description': 'md5:78a85509348910bd1ca31be898c5796b',
'timestamp': 1556670600,
'upload_date': '20190501',
'duration': 1219,
'channel': 'StarPlus',
'channel_id': 3,
'series': 'Ek Bhram - Sarvagun Sampanna',
'season': 'Chapter 1',
'season_number': 1,
'season_id': 6771,
'episode': 'Janhvi Targets Suman',
'episode_number': 8,
},
}, {
'url': 'hotstar:episode:1000234847',
'only_matching': True,
}]
_GEO_BYPASS = False
_TYPE = {
'movies': 'movie',
'sports': 'match',
'episode': 'episode',
'tv': 'episode',
None: 'content',
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage, urlh = self._download_webpage_handle(url, video_id)
st = urlh.headers.get('x-origin-date')
app_state = self._parse_json(self._search_regex(
r'<script>window\.APP_STATE\s*=\s*({.+?})</script>',
webpage, 'app state'), video_id)
video_data = {}
getters = list(
lambda x, k=k: x['initialState']['content%s' % k]['content']
for k in ('Data', 'Detail')
)
for v in app_state.values():
content = try_get(v, getters, dict)
if content and content.get('contentId') == video_id:
video_data = content
break
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_type = mobj.group('type')
cookies = self._get_cookies(url)
video_type = self._TYPE.get(video_type, video_type)
video_data = self._call_api(f'o/v1/{video_type}/detail', video_id)['body']['results']['item']
title = video_data['title']
if not self.get_param('allow_unplayable_formats') and video_data.get('drmProtected'):
raise ExtractorError('This video is DRM protected.', expected=True)
headers = {'Referer': url}
headers = {'Referer': 'https://www.hotstar.com/in'}
formats = []
subs = {}
geo_restricted = False
_, urlh = self._download_webpage_handle('https://www.hotstar.com/in', video_id)
# Required to fix https://github.com/yt-dlp/yt-dlp/issues/396
st = urlh.headers.get('x-origin-date')
# change to v2 in the future
playback_sets = self._call_api_v2('play/v1/playback', video_id, st=st)['playBackSets']
playback_sets = self._call_api_v2('play/v1/playback', video_id, st=st, cookies=cookies)['playBackSets']
for playback_set in playback_sets:
if not isinstance(playback_set, dict):
continue
@@ -171,13 +206,17 @@ class HotStarIE(HotStarBaseIE):
ext = determine_ext(format_url)
try:
if 'package:hls' in tags or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
hls_formats, hls_subs = self._extract_m3u8_formats_and_subtitles(
format_url, video_id, 'mp4',
entry_protocol='m3u8_native',
m3u8_id='hls', headers=headers))
m3u8_id='hls', headers=headers)
formats.extend(hls_formats)
subs = self._merge_subtitles(subs, hls_subs)
elif 'package:dash' in tags or ext == 'mpd':
formats.extend(self._extract_mpd_formats(
format_url, video_id, mpd_id='dash', headers=headers))
dash_formats, dash_subs = self._extract_mpd_formats_and_subtitles(
format_url, video_id, mpd_id='dash', headers=headers)
formats.extend(dash_formats)
subs = self._merge_subtitles(subs, dash_subs)
elif ext == 'f4m':
# produce broken files
pass
@@ -205,6 +244,7 @@ class HotStarIE(HotStarBaseIE):
'duration': int_or_none(video_data.get('duration')),
'timestamp': int_or_none(video_data.get('broadcastDate') or video_data.get('startDate')),
'formats': formats,
'subtitles': subs,
'channel': video_data.get('channelName'),
'channel_id': video_data.get('channelId'),
'series': video_data.get('showName'),
@@ -233,8 +273,7 @@ class HotStarPlaylistIE(HotStarBaseIE):
def _real_extract(self, url):
playlist_id = self._match_id(url)
collection = self._call_api('o/v1/tray/find', playlist_id, 'uqId')
collection = self._call_api('o/v1/tray/find', playlist_id, 'uqId')['body']['results']
entries = [
self.url_result(
'https://www.hotstar.com/%s' % video['contentId'],
@@ -247,7 +286,7 @@ class HotStarPlaylistIE(HotStarBaseIE):
class HotStarSeriesIE(HotStarBaseIE):
IE_NAME = 'hotstar:series'
_VALID_URL = r'(?:https?://)(?:www\.)?hotstar\.com(?:/in)?/tv/[^/]+/(?P<id>\d{10})$'
_VALID_URL = r'(?:https?://)(?:www\.)?hotstar\.com(?:/in)?/tv/[^/]+/(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.hotstar.com/in/tv/radhakrishn/1260000646',
'info_dict': {
@@ -260,6 +299,12 @@ class HotStarSeriesIE(HotStarBaseIE):
'id': '1260050431',
},
'playlist_mincount': 43,
}, {
'url': 'https://www.hotstar.com/in/tv/mahabharat/435/',
'info_dict': {
'id': '435',
},
'playlist_mincount': 269,
}]
def _real_extract(self, url):
@@ -275,7 +320,7 @@ class HotStarSeriesIE(HotStarBaseIE):
video_id=series_id, headers=headers)
entries = [
self.url_result(
'https://www.hotstar.com/%d' % video['contentId'],
'hotstar:episode:%d' % video['contentId'],
ie=HotStarIE.ie_key(), video_id=video['contentId'])
for video in item_json['body']['results']['items']
if video.get('contentId')]


@@ -195,18 +195,23 @@ class InstagramIE(InfoExtractor):
lambda x: x['%ss' % kind]['count'])))
if count is not None:
return count
like_count = get_count('preview_like', 'like')
comment_count = get_count(
('preview_comment', 'to_comment', 'to_parent_comment'), 'comment')
comments = [{
'author': comment.get('user', {}).get('username'),
'author_id': comment.get('user', {}).get('id'),
'id': comment.get('id'),
'text': comment.get('text'),
'timestamp': int_or_none(comment.get('created_at')),
} for comment in media.get(
'comments', {}).get('nodes', []) if comment.get('text')]
comments = []
for comment in try_get(media, lambda x: x['edge_media_to_parent_comment']['edges']):
comment_dict = comment.get('node', {})
comment_text = comment_dict.get('text')
if comment_text:
comments.append({
'author': try_get(comment_dict, lambda x: x['owner']['username']),
'author_id': try_get(comment_dict, lambda x: x['owner']['id']),
'id': comment_dict.get('id'),
'text': comment_text,
'timestamp': int_or_none(comment_dict.get('created_at')),
})
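The new comment extraction walks Instagram's GraphQL `edge_media_to_parent_comment` edges defensively via `try_get`. A minimal re-implementation of that pattern, with a simplified `try_get` (yt-dlp's helper also accepts a list of getters) and a made-up media dict:

```python
def try_get(src, getter, expected_type=None):
    """Simplified version of yt-dlp's try_get: apply getter,
    swallowing lookup errors and returning None instead."""
    try:
        v = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    if expected_type is None or isinstance(v, expected_type):
        return v
    return None

media = {'edge_media_to_parent_comment': {'edges': [
    {'node': {'text': 'nice', 'owner': {'username': 'alice', 'id': '1'}}},
    {'node': {'text': ''}},  # empty text: skipped, as in the extractor
]}}
comments = []
for edge in try_get(media, lambda x: x['edge_media_to_parent_comment']['edges']) or []:
    node = edge.get('node', {})
    if node.get('text'):
        comments.append({
            'author': try_get(node, lambda x: x['owner']['username']),
            'text': node['text'],
        })
```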
if not video_url:
edges = try_get(
media, lambda x: x['edge_sidecar_to_children']['edges'],


@@ -30,20 +30,20 @@ class MediasetIE(ThePlatformBaseIE):
'''
_TESTS = [{
# full episode
'url': 'https://www.mediasetplay.mediaset.it/video/hellogoodbye/quarta-puntata_FAFU000000661824',
'md5': '9b75534d42c44ecef7bf1ffeacb7f85d',
'url': 'https://www.mediasetplay.mediaset.it/video/mrwronglezionidamore/episodio-1_F310575103000102',
'md5': 'a7e75c6384871f322adb781d3bd72c26',
'info_dict': {
'id': 'FAFU000000661824',
'id': 'F310575103000102',
'ext': 'mp4',
'title': 'Quarta puntata',
'title': 'Episodio 1',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 1414.26,
'upload_date': '20161107',
'series': 'Hello Goodbye',
'timestamp': 1478532900,
'uploader': 'Rete 4',
'uploader_id': 'R4',
'duration': 2682.0,
'upload_date': '20210530',
'series': 'Mr Wrong - Lezioni d\'amore',
'timestamp': 1622413946,
'uploader': 'Canale 5',
'uploader_id': 'C5',
},
}, {
'url': 'https://www.mediasetplay.mediaset.it/video/matrix/puntata-del-25-maggio_F309013801000501',
@@ -54,10 +54,10 @@ class MediasetIE(ThePlatformBaseIE):
'title': 'Puntata del 25 maggio',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 6565.007,
'upload_date': '20180526',
'duration': 6565.008,
'upload_date': '20200903',
'series': 'Matrix',
'timestamp': 1527326245,
'timestamp': 1599172492,
'uploader': 'Canale 5',
'uploader_id': 'C5',
},
@@ -135,36 +135,38 @@ class MediasetIE(ThePlatformBaseIE):
formats = []
subtitles = {}
first_e = None
for asset_type in ('SD', 'HD'):
# TODO: fixup ISM+none manifest URLs
for f in ('MPEG4', 'MPEG-DASH+none', 'M3U+none'):
try:
tp_formats, tp_subtitles = self._extract_theplatform_smil(
update_url_query('http://link.theplatform.%s/s/%s' % (self._TP_TLD, tp_path), {
'mbr': 'true',
'formats': f,
'assetTypes': asset_type,
}), guid, 'Downloading %s %s SMIL data' % (f.split('+')[0], asset_type))
except ExtractorError as e:
if not first_e:
first_e = e
break
for tp_f in tp_formats:
tp_f['quality'] = 1 if asset_type == 'HD' else 0
formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
asset_type = 'HD,browser,geoIT|SD,browser,geoIT|geoNo:HD,browser,geoIT|geoNo:SD,browser,geoIT|geoNo'
# TODO: fixup ISM+none manifest URLs
for f in ('MPEG4', 'MPEG-DASH+none', 'M3U+none'):
try:
tp_formats, tp_subtitles = self._extract_theplatform_smil(
update_url_query('http://link.theplatform.%s/s/%s' % (self._TP_TLD, tp_path), {
'mbr': 'true',
'formats': f,
'assetTypes': asset_type,
}), guid, 'Downloading %s SMIL data' % (f.split('+')[0]))
except ExtractorError as e:
if not first_e:
first_e = e
break
formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
if first_e and not formats:
raise first_e
self._sort_formats(formats)
fields = []
for templ, repls in (('tvSeason%sNumber', ('', 'Episode')), ('mediasetprogram$%s', ('brandTitle', 'numberOfViews', 'publishInfo'))):
fields.extend(templ % repl for repl in repls)
feed_data = self._download_json(
'https://feed.entertainment.tv.theplatform.eu/f/PR1GhC/mediaset-prod-all-programs/guid/-/' + guid,
guid, fatal=False, query={'fields': ','.join(fields)})
'https://feed.entertainment.tv.theplatform.eu/f/PR1GhC/mediaset-prod-all-programs-v2/guid/-/' + guid,
guid, fatal=False)
if feed_data:
publish_info = feed_data.get('mediasetprogram$publishInfo') or {}
thumbnails = feed_data.get('thumbnails') or {}
thumbnail = None
for key, value in thumbnails.items():
if key.startswith('image_keyframe_poster-'):
thumbnail = value.get('url')
break
info.update({
'episode_number': int_or_none(feed_data.get('tvSeasonEpisodeNumber')),
'season_number': int_or_none(feed_data.get('tvSeasonNumber')),
@@ -172,6 +174,7 @@ class MediasetIE(ThePlatformBaseIE):
'uploader': publish_info.get('description'),
'uploader_id': publish_info.get('channel'),
'view_count': int_or_none(feed_data.get('mediasetprogram$numberOfViews')),
'thumbnail': thumbnail,
})
info.update({


@@ -0,0 +1,134 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
ExtractorError,
dict_get,
traverse_obj,
try_get,
)
class MirrativBaseIE(InfoExtractor):
def assert_error(self, response):
error_message = traverse_obj(response, ('status', 'error'))
if error_message:
raise ExtractorError('Mirrativ says: %s' % error_message, expected=True)
class MirrativIE(MirrativBaseIE):
IE_NAME = 'mirrativ'
_VALID_URL = r'https?://(?:www\.)?mirrativ\.com/live/(?P<id>[^/?#&]+)'
LIVE_API_URL = 'https://www.mirrativ.com/api/live/live?live_id=%s'
_TESTS = [{
'url': 'https://mirrativ.com/live/POxyuG1KmW2982lqlDTuPw',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage('https://www.mirrativ.com/live/%s' % video_id, video_id)
live_response = self._download_json(self.LIVE_API_URL % video_id, video_id)
self.assert_error(live_response)
hls_url = dict_get(live_response, ('archive_url_hls', 'streaming_url_hls'))
is_live = bool(live_response.get('is_live'))
was_live = bool(live_response.get('is_archive'))
if not hls_url:
raise ExtractorError('Neither archive nor live is available.', expected=True)
formats = self._extract_m3u8_formats(
hls_url, video_id,
ext='mp4', entry_protocol='m3u8_native',
m3u8_id='hls', live=is_live)
rtmp_url = live_response.get('streaming_url_edge')
if rtmp_url:
keys_to_copy = ('width', 'height', 'vcodec', 'acodec', 'tbr')
fmt = {
'format_id': 'rtmp',
'url': rtmp_url,
'protocol': 'rtmp',
'ext': 'mp4',
}
fmt.update({k: traverse_obj(formats, (0, k)) for k in keys_to_copy})
formats.append(fmt)
self._sort_formats(formats)
title = self._og_search_title(webpage, default=None) or self._search_regex(
r'<title>\s*(.+?) - Mirrativ\s*</title>', webpage, 'live title', default=None) or live_response.get('title')
description = live_response.get('description')
thumbnail = live_response.get('image_url')
duration = try_get(live_response, lambda x: x['ended_at'] - x['started_at'])
view_count = live_response.get('total_viewer_num')
release_timestamp = live_response.get('started_at')
timestamp = live_response.get('created_at')
owner = live_response.get('owner', {})
uploader = owner.get('name')
uploader_id = owner.get('user_id')
return {
'id': video_id,
'title': title,
'is_live': is_live,
'description': description,
'formats': formats,
'thumbnail': thumbnail,
'uploader': uploader,
'uploader_id': uploader_id,
'duration': duration,
'view_count': view_count,
'release_timestamp': release_timestamp,
'timestamp': timestamp,
'was_live': was_live,
}
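The RTMP format above inherits the width/height/codec fields from the first parsed HLS format via `traverse_obj(formats, (0, k))`. A minimal standalone sketch of that copy step (the sample format dict is illustrative):

```python
def inherit_fields(formats, keys=('width', 'height', 'vcodec', 'acodec', 'tbr')):
    # Copy selected fields from the first parsed format, tolerating an
    # empty list; missing keys become None, which is what
    # traverse_obj(formats, (0, key)) also returns.
    first = formats[0] if formats else {}
    return {k: first.get(k) for k in keys}


fmt = {'format_id': 'rtmp'}
fmt.update(inherit_fields([{'width': 1280, 'height': 720, 'vcodec': 'h264'}]))
assert fmt['width'] == 1280 and fmt['acodec'] is None
```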
class MirrativUserIE(MirrativBaseIE):
IE_NAME = 'mirrativ:user'
_VALID_URL = r'https?://(?:www\.)?mirrativ\.com/user/(?P<id>\d+)'
LIVE_HISTORY_API_URL = 'https://www.mirrativ.com/api/live/live_history?user_id=%s&page=%d'
USER_INFO_API_URL = 'https://www.mirrativ.com/api/user/profile?user_id=%s'
_TESTS = [{
# Live archive is available for up to 3 days
# see: https://helpfeel.com/mirrativ/%E9%8C%B2%E7%94%BB-5e26d3ad7b59ef0017fb49ac (Japanese)
'url': 'https://www.mirrativ.com/user/110943130',
'note': 'multiple archives available',
'only_matching': True,
}]
def _entries(self, user_id):
page = 1
while page is not None:
api_response = self._download_json(
self.LIVE_HISTORY_API_URL % (user_id, page), user_id,
note='Downloading page %d' % page)
self.assert_error(api_response)
lives = api_response.get('lives')
if not lives:
break
for live in lives:
if not live.get('is_archive') and not live.get('is_live'):
# neither archive nor live is available, so skip it;
# otherwise the service may ban your IP address for a while
continue
live_id = live.get('live_id')
url = 'https://www.mirrativ.com/live/%s' % live_id
yield self.url_result(url, video_id=live_id, video_title=live.get('title'))
page = api_response.get('next_page')
def _real_extract(self, url):
user_id = self._match_id(url)
user_info = self._download_json(
self.USER_INFO_API_URL % user_id, user_id,
note='Downloading user info', fatal=False)
self.assert_error(user_info)
uploader = user_info.get('name')
description = user_info.get('description')
entries = self._entries(user_id)
return self.playlist_result(entries, user_id, uploader, description)
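The `_entries` pagination above follows a common cursor pattern: fetch a page, yield its items, then continue with the server-provided `next_page` until it is null. A self-contained sketch of the same loop, with a stubbed `fetch` standing in for the API call (the fake two-page response data is an assumption for illustration):

```python
def paged_entries(fetch, first_page=1):
    """Yield items from a paginated API until next_page runs out.

    `fetch(page)` stands in for the JSON API call and must return a
    dict shaped like {'lives': [...], 'next_page': int or None}.
    """
    page = first_page
    while page is not None:
        response = fetch(page)
        items = response.get('lives')
        if not items:
            break
        yield from items
        page = response.get('next_page')


# Usage with a fake two-page API:
pages = {
    1: {'lives': [{'live_id': 'a'}, {'live_id': 'b'}], 'next_page': 2},
    2: {'lives': [{'live_id': 'c'}], 'next_page': None},
}
print([x['live_id'] for x in paged_entries(pages.get)])  # ['a', 'b', 'c']
```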


@@ -110,10 +110,15 @@ class MxplayerIE(InfoExtractor):
for frmt in dash_formats:
frmt['quality'] = get_quality(quality)
formats.extend(dash_formats)
dash_formats_h265 = self._extract_mpd_formats(
format_url.replace('h264_high', 'h265_main'), video_id, mpd_id='dash-%s' % quality, headers={'Referer': url}, fatal=False)
for frmt in dash_formats_h265:
frmt['quality'] = get_quality(quality)
formats.extend(dash_formats_h265)
elif stream_type == 'hls':
formats.extend(self._extract_m3u8_formats(
format_url, video_id, fatal=False,
m3u8_id='hls-%s' % quality, quality=get_quality(quality)))
m3u8_id='hls-%s' % quality, quality=get_quality(quality), ext='mp4'))
self._sort_formats(formats)
return {


@@ -12,6 +12,7 @@ from ..utils import (
int_or_none,
parse_age_limit,
parse_duration,
RegexNotFoundError,
smuggle_url,
try_get,
unified_timestamp,
@@ -460,7 +461,7 @@ class NBCNewsIE(ThePlatformIE):
class NBCOlympicsIE(InfoExtractor):
IE_NAME = 'nbcolympics'
_VALID_URL = r'https?://www\.nbcolympics\.com/video/(?P<id>[a-z-]+)'
_VALID_URL = r'https?://www\.nbcolympics\.com/videos?/(?P<id>[0-9a-z-]+)'
_TEST = {
# Geo-restricted to US
@@ -483,13 +484,18 @@ class NBCOlympicsIE(InfoExtractor):
webpage = self._download_webpage(url, display_id)
drupal_settings = self._parse_json(self._search_regex(
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
webpage, 'drupal settings'), display_id)
try:
drupal_settings = self._parse_json(self._search_regex(
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
webpage, 'drupal settings'), display_id)
iframe_url = drupal_settings['vod']['iframe_url']
theplatform_url = iframe_url.replace(
'vplayer.nbcolympics.com', 'player.theplatform.com')
iframe_url = drupal_settings['vod']['iframe_url']
theplatform_url = iframe_url.replace(
'vplayer.nbcolympics.com', 'player.theplatform.com')
except RegexNotFoundError:
theplatform_url = self._search_regex(
r"([\"'])embedUrl\1: *([\"'])(?P<embedUrl>.+)\2",
webpage, 'embedding URL', group="embedUrl")
return {
'_type': 'url_transparent',
@@ -502,43 +508,79 @@ class NBCOlympicsIE(InfoExtractor):
class NBCOlympicsStreamIE(AdobePassIE):
IE_NAME = 'nbcolympics:stream'
_VALID_URL = r'https?://stream\.nbcolympics\.com/(?P<id>[0-9a-z-]+)'
_TEST = {
'url': 'http://stream.nbcolympics.com/2018-winter-olympics-nbcsn-evening-feb-8',
'info_dict': {
'id': '203493',
'ext': 'mp4',
'title': 're:Curling, Alpine, Luge [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
_TESTS = [
{
'note': 'Tokenized m3u8 source URL',
'url': 'https://stream.nbcolympics.com/womens-soccer-group-round-11',
'info_dict': {
'id': '2019740',
'ext': 'mp4',
'title': r"re:Women's Group Stage - Netherlands vs\. Brazil [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$",
},
'params': {
'skip_download': 'm3u8',
},
}, {
'note': 'Plain m3u8 source URL',
'url': 'https://stream.nbcolympics.com/gymnastics-event-finals-mens-floor-pommel-horse-womens-vault-bars',
'info_dict': {
'id': '2021729',
'ext': 'mp4',
'title': r're:Event Finals: M Floor, W Vault, M Pommel, W Uneven Bars [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
},
'params': {
'skip_download': 'm3u8',
},
},
'params': {
# m3u8 download
'skip_download': True,
},
}
_DATA_URL_TEMPLATE = 'http://stream.nbcolympics.com/data/%s_%s.json'
]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
pid = self._search_regex(r'pid\s*=\s*(\d+);', webpage, 'pid')
resource = self._search_regex(
r"resource\s*=\s*'(.+)';", webpage,
'resource').replace("' + pid + '", pid)
event_config = self._download_json(
self._DATA_URL_TEMPLATE % ('event_config', pid),
pid)['eventConfig']
title = self._live_title(event_config['eventTitle'])
f'http://stream.nbcolympics.com/data/event_config_{pid}.json',
pid, 'Downloading event config')['eventConfig']
title = event_config['eventTitle']
is_live = {'live': True, 'replay': False}.get(event_config.get('eventStatus'))
if is_live:
title = self._live_title(title)
source_url = self._download_json(
self._DATA_URL_TEMPLATE % ('live_sources', pid),
pid)['videoSources'][0]['sourceUrl']
media_token = self._extract_mvpd_auth(
url, pid, event_config.get('requestorId', 'NBCOlympics'), resource)
formats = self._extract_m3u8_formats(self._download_webpage(
'http://sp.auth.adobe.com/tvs/v1/sign', pid, query={
'cdn': 'akamai',
'mediaToken': base64.b64encode(media_token.encode()),
'resource': base64.b64encode(resource.encode()),
'url': source_url,
}), pid, 'mp4')
f'https://api-leap.nbcsports.com/feeds/assets/{pid}?application=NBCOlympics&platform=desktop&format=nbc-player&env=staging',
pid, 'Downloading leap config'
)['videoSources'][0]['cdnSources']['primary'][0]['sourceUrl']
if event_config.get('cdnToken'):
ap_resource = self._get_mvpd_resource(
event_config.get('resourceId', 'NBCOlympics'),
re.sub(r'[^\w\d ]+', '', event_config['eventTitle']), pid,
event_config.get('ratingId', 'NO VALUE'))
media_token = self._extract_mvpd_auth(url, pid, event_config.get('requestorId', 'NBCOlympics'), ap_resource)
source_url = self._download_json(
'https://tokens.playmakerservices.com/', pid, 'Retrieving tokenized URL',
data=json.dumps({
'application': 'NBCSports',
'authentication-type': 'adobe-pass',
'cdn': 'akamai',
'pid': pid,
'platform': 'desktop',
'requestorId': 'NBCOlympics',
'resourceId': base64.b64encode(ap_resource.encode()).decode(),
'token': base64.b64encode(media_token.encode()).decode(),
'url': source_url,
'version': 'v1',
}).encode(),
)['akamai'][0]['tokenizedUrl']
formats = self._extract_m3u8_formats(source_url, pid, 'mp4', live=is_live)
for f in formats:
# -http_seekable requires ffmpeg 4.3+, but it doesn't seem possible to
# download with ffmpeg without this option
f['_ffmpeg_args'] = ['-seekable', '0', '-http_seekable', '0', '-icy', '0']
self._sort_formats(formats)
return {
@@ -546,5 +588,5 @@ class NBCOlympicsStreamIE(AdobePassIE):
'display_id': display_id,
'title': title,
'formats': formats,
'is_live': True,
'is_live': is_live,
}
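The tokenization request above base64-encodes the MVPD resource and media token before POSTing them as JSON, decoding the result back to `str` so `json.dumps` can serialize it. A small sketch of just that encoding step, using a subset of the fields from the request body (the sample values are made up):

```python
import base64
import json


def build_token_request(pid, resource, media_token, source_url):
    # Mirror the shape of the JSON body sent to the tokenization
    # endpoint: binary-ish fields are base64-encoded, then decoded back
    # to str so they serialize as JSON strings.
    return json.dumps({
        'cdn': 'akamai',
        'pid': pid,
        'resourceId': base64.b64encode(resource.encode()).decode(),
        'token': base64.b64encode(media_token.encode()).decode(),
        'url': source_url,
        'version': 'v1',
    }).encode()


body = build_token_request('2019740', '<rss>...</rss>', 'tok123',
                           'https://example.com/master.m3u8')
payload = json.loads(body)
assert base64.b64decode(payload['token']).decode() == 'tok123'
```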


@@ -4,9 +4,9 @@ import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
extract_attributes,
int_or_none,
parse_count,
parse_duration,
parse_filesize,
unified_timestamp,
@@ -14,18 +14,19 @@ from ..utils import (
class NewgroundsIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?newgrounds\.com/(?:audio/listen|portal/view)/(?P<id>[0-9]+)'
_VALID_URL = r'https?://(?:www\.)?newgrounds\.com/(?:audio/listen|portal/view)/(?P<id>\d+)(?:/format/flash)?'
_TESTS = [{
'url': 'https://www.newgrounds.com/audio/listen/549479',
'md5': 'fe6033d297591288fa1c1f780386f07a',
'info_dict': {
'id': '549479',
'ext': 'mp3',
'title': 'Burn7 - B7 - BusMode',
'title': 'B7 - BusMode',
'uploader': 'Burn7',
'timestamp': 1378878540,
'upload_date': '20130911',
'duration': 143,
'description': 'md5:6d885138814015dfd656c2ddb00dacfc',
},
}, {
'url': 'https://www.newgrounds.com/portal/view/1',
@@ -33,10 +34,11 @@ class NewgroundsIE(InfoExtractor):
'info_dict': {
'id': '1',
'ext': 'mp4',
'title': 'Brian-Beaton - Scrotum 1',
'title': 'Scrotum 1',
'uploader': 'Brian-Beaton',
'timestamp': 955064100,
'upload_date': '20000406',
'description': 'Scrotum plays "catch."',
},
}, {
# source format unavailable, additional mp4 formats
@@ -44,14 +46,39 @@ class NewgroundsIE(InfoExtractor):
'info_dict': {
'id': '689400',
'ext': 'mp4',
'title': 'Bennettthesage - ZTV News Episode 8',
'uploader': 'BennettTheSage',
'title': 'ZTV News Episode 8',
'uploader': 'ZONE-SAMA',
'timestamp': 1487965140,
'upload_date': '20170224',
'description': 'ZTV News Episode 8 (February 2017)',
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.newgrounds.com/portal/view/297383',
'md5': '2c11f5fd8cb6b433a63c89ba3141436c',
'info_dict': {
'id': '297383',
'ext': 'mp4',
'title': 'Metal Gear Awesome',
'uploader': 'Egoraptor',
'timestamp': 1140663240,
'upload_date': '20060223',
'description': 'Metal Gear is awesome is so is this movie.',
}
}, {
'url': 'https://www.newgrounds.com/portal/view/297383/format/flash',
'md5': '5d05585a9a0caca059f5abfbd3865524',
'info_dict': {
'id': '297383',
'ext': 'swf',
'title': 'Metal Gear Awesome',
'description': 'Metal Gear is awesome is so is this movie.',
'uploader': 'Egoraptor',
'upload_date': '20060223',
'timestamp': 1140663240,
}
}]
def _real_extract(self, url):
@@ -73,38 +100,14 @@ class NewgroundsIE(InfoExtractor):
'format_id': 'source',
'quality': 1,
}]
max_resolution = int_or_none(self._search_regex(
r'max_resolution["\']\s*:\s*(\d+)', webpage, 'max resolution',
default=None))
if max_resolution:
url_base = media_url.rpartition('.')[0]
for resolution in (360, 720, 1080):
if resolution > max_resolution:
break
formats.append({
'url': '%s.%dp.mp4' % (url_base, resolution),
'format_id': '%dp' % resolution,
'height': resolution,
})
else:
video_id = int_or_none(self._search_regex(
r'data-movie-id=\\"([0-9]+)\\"', webpage, ''))
if not video_id:
raise ExtractorError('Could not extract media data')
url_video_data = 'https://www.newgrounds.com/portal/video/%s' % video_id
headers = {
json_video = self._download_json('https://www.newgrounds.com/portal/video/' + media_id, media_id, headers={
'Accept': 'application/json',
'Referer': url,
'X-Requested-With': 'XMLHttpRequest'
}
json_video = self._download_json(url_video_data, video_id, headers=headers, fatal=False)
if not json_video:
raise ExtractorError('Could not fetch media data')
})
uploader = json_video.get('author')
title = json_video.get('title')
media_formats = json_video.get('sources', [])
for media_format in media_formats:
media_sources = media_formats[media_format]
@@ -115,9 +118,6 @@ class NewgroundsIE(InfoExtractor):
'url': source.get('src')
})
self._check_formats(formats, media_id)
self._sort_formats(formats)
if not uploader:
uploader = self._html_search_regex(
(r'(?s)<h4[^>]*>(.+?)</h4>.*?<em>\s*(?:Author|Artist)\s*</em>',
@@ -132,6 +132,9 @@ class NewgroundsIE(InfoExtractor):
r'(?s)<dd>\s*Song\s*</dd>\s*<dd>.+?</dd>\s*<dd>([^<]+)', webpage,
'duration', default=None))
view_count = parse_count(self._html_search_regex(r'(?s)<dt>\s*Views\s*</dt>\s*<dd>([\d\.,]+)</dd>', webpage,
'view_count', fatal=False, default=None))
filesize_approx = parse_filesize(self._html_search_regex(
r'(?s)<dd>\s*Song\s*</dd>\s*<dd>(.+?)</dd>', webpage, 'filesize',
default=None))
@@ -140,9 +143,8 @@ class NewgroundsIE(InfoExtractor):
if '<dd>Song' in webpage:
formats[0]['vcodec'] = 'none'
if uploader:
title = "%s - %s" % (uploader, title)
self._check_formats(formats, media_id)
self._sort_formats(formats)
return {
'id': media_id,
@@ -151,6 +153,9 @@ class NewgroundsIE(InfoExtractor):
'timestamp': timestamp,
'duration': duration,
'formats': formats,
'thumbnail': self._og_search_thumbnail(webpage),
'description': self._og_search_description(webpage),
'view_count': view_count,
}
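The resolution-ladder block in `NewgroundsIE` derives per-resolution MP4 variants by stripping the source URL's extension and appending `.<height>p.mp4` for each rung up to `max_resolution`. A standalone sketch of that URL generation (the media URL is illustrative):

```python
def resolution_variants(media_url, max_resolution, ladder=(360, 720, 1080)):
    # Rebuild the extractor's pattern: strip the extension, then append
    # ".<height>p.mp4" for each ladder rung not above the cap.
    url_base = media_url.rpartition('.')[0]
    return [{
        'url': '%s.%dp.mp4' % (url_base, height),
        'format_id': '%dp' % height,
        'height': height,
    } for height in ladder if height <= max_resolution]


fmts = resolution_variants('https://example.com/video/clip.mp4', 720)
print([f['format_id'] for f in fmts])  # ['360p', '720p']
```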
@@ -162,14 +167,14 @@ class NewgroundsPlaylistIE(InfoExtractor):
'id': 'cats',
'title': 'Cats',
},
'playlist_mincount': 46,
'playlist_mincount': 45,
}, {
'url': 'http://www.newgrounds.com/portal/search/author/ZONE-SAMA',
'url': 'https://www.newgrounds.com/collection/dogs',
'info_dict': {
'id': 'ZONE-SAMA',
'title': 'Portal Search: ZONE-SAMA',
'id': 'dogs',
'title': 'Dogs',
},
'playlist_mincount': 47,
'playlist_mincount': 26,
}, {
'url': 'http://www.newgrounds.com/audio/search/title/cats',
'only_matching': True,
@@ -190,7 +195,7 @@ class NewgroundsPlaylistIE(InfoExtractor):
entries = []
for a, path, media_id in re.findall(
r'(<a[^>]+\bhref=["\']/?((?:portal/view|audio/listen)/(\d+))[^>]+>)',
r'(<a[^>]+\bhref=["\'][^"\']+((?:portal/view|audio/listen)/(\d+))[^>]+>)',
webpage):
a_class = extract_attributes(a).get('class')
if a_class not in ('item-portalsubmission', 'item-audiosubmission'):


@@ -13,16 +13,16 @@ from ..compat import (
compat_urllib_parse_urlparse,
)
from ..utils import (
dict_get,
ExtractorError,
int_or_none,
dict_get,
float_or_none,
int_or_none,
OnDemandPagedList,
parse_duration,
parse_iso8601,
PostProcessingError,
str_or_none,
remove_start,
str_or_none,
try_get,
unified_timestamp,
urlencode_postdata,

yt_dlp/extractor/openrec.py Normal file

@@ -0,0 +1,126 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
ExtractorError,
traverse_obj,
try_get,
unified_strdate
)
from ..compat import compat_str
class OpenRecIE(InfoExtractor):
IE_NAME = 'openrec'
_VALID_URL = r'https?://(?:www\.)?openrec\.tv/live/(?P<id>[^/]+)'
_TESTS = [{
'url': 'https://www.openrec.tv/live/2p8v31qe4zy',
'only_matching': True,
}, {
'url': 'https://www.openrec.tv/live/wez93eqvjzl',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage('https://www.openrec.tv/live/%s' % video_id, video_id)
window_stores = self._parse_json(
self._search_regex(r'(?m)window\.pageStore\s*=\s*(\{.+?\});$', webpage, 'window.pageStore'), video_id)
movie_store = traverse_obj(
window_stores,
('v8', 'state', 'movie'),
('v8', 'movie'),
expected_type=dict)
if not movie_store:
raise ExtractorError('Failed to extract live info')
title = movie_store.get('title')
description = movie_store.get('introduction')
thumbnail = movie_store.get('thumbnailUrl')
channel_user = movie_store.get('channel', {}).get('user')
uploader = try_get(channel_user, lambda x: x['name'], compat_str)
uploader_id = try_get(channel_user, lambda x: x['id'], compat_str)
timestamp = traverse_obj(movie_store, ('startedAt', 'time'), expected_type=int)
m3u8_playlists = movie_store.get('media')
formats = []
for (name, m3u8_url) in m3u8_playlists.items():
if not m3u8_url:
continue
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, ext='mp4', entry_protocol='m3u8',
m3u8_id='hls-%s' % name, live=True))
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'formats': formats,
'uploader': uploader,
'uploader_id': uploader_id,
'timestamp': timestamp,
'is_live': True,
}
class OpenRecCaptureIE(InfoExtractor):
IE_NAME = 'openrec:capture'
_VALID_URL = r'https?://(?:www\.)?openrec\.tv/capture/(?P<id>[^/]+)'
_TESTS = [{
'url': 'https://www.openrec.tv/capture/l9nk2x4gn14',
'only_matching': True,
}, {
'url': 'https://www.openrec.tv/capture/mldjr82p7qk',
'info_dict': {
'id': 'mldjr82p7qk',
'title': 'たいじの恥ずかしい英語力',
'uploader': 'たいちゃんねる',
'uploader_id': 'Yaritaiji',
'upload_date': '20210803',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage('https://www.openrec.tv/capture/%s' % video_id, video_id)
window_stores = self._parse_json(
self._search_regex(r'(?m)window\.pageStore\s*=\s*(\{.+?\});$', webpage, 'window.pageStore'), video_id)
movie_store = window_stores.get('movie')
capture_data = window_stores.get('capture')
if not capture_data:
raise ExtractorError('Cannot extract title')
title = capture_data.get('title')
thumbnail = capture_data.get('thumbnailUrl')
upload_date = unified_strdate(capture_data.get('createdAt'))
channel_info = movie_store.get('channel') or {}
uploader = channel_info.get('name')
uploader_id = channel_info.get('id')
m3u8_url = capture_data.get('source')
if not m3u8_url:
raise ExtractorError('Cannot extract m3u8 url')
formats = self._extract_m3u8_formats(
m3u8_url, video_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'formats': formats,
'uploader': uploader,
'uploader_id': uploader_id,
'upload_date': upload_date,
}
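Both OpenRec extractors pull their metadata out of a `window.pageStore = {...};` assignment embedded in the page and parse it as JSON. A standalone sketch of that step with the same regex shape (the sample HTML is fabricated for illustration):

```python
import json
import re

SAMPLE_HTML = '''<script>
window.pageStore = {"movie": {"title": "demo"}, "capture": {"source": "https://example.com/a.m3u8"}};
</script>'''


def parse_page_store(html):
    # Same regex shape as the extractor: grab the object literal up to
    # the terminating ";" at end of line, then parse it as JSON.
    match = re.search(r'(?m)window\.pageStore\s*=\s*(\{.+?\});$', html)
    if not match:
        raise ValueError('window.pageStore not found')
    return json.loads(match.group(1))


store = parse_page_store(SAMPLE_HTML)
print(store['capture']['source'])  # https://example.com/a.m3u8
```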


@@ -0,0 +1,145 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from .cbs import CBSBaseIE
from ..utils import (
int_or_none,
url_or_none,
)
class ParamountPlusIE(CBSBaseIE):
_VALID_URL = r'''(?x)
(?:
paramountplus:|
https?://(?:www\.)?(?:
paramountplus\.com/(?:shows/[^/]+/video|movies/[^/]+)/
)(?P<id>[\w-]+))'''
# All tests are blocked outside US
_TESTS = [{
'url': 'https://www.paramountplus.com/shows/catdog/video/Oe44g5_NrlgiZE3aQVONleD6vXc8kP0k/catdog-climb-every-catdog-the-canine-mutiny/',
'info_dict': {
'id': 'Oe44g5_NrlgiZE3aQVONleD6vXc8kP0k',
'ext': 'mp4',
'title': 'CatDog - Climb Every CatDog/The Canine Mutiny',
'description': 'md5:7ac835000645a69933df226940e3c859',
'duration': 1418,
'timestamp': 920264400,
'upload_date': '19990301',
'uploader': 'CBSI-NEW',
},
'params': {
'skip_download': 'm3u8',
},
}, {
'url': 'https://www.paramountplus.com/shows/tooning-out-the-news/video/6hSWYWRrR9EUTz7IEe5fJKBhYvSUfexd/7-23-21-week-in-review-rep-jahana-hayes-howard-fineman-sen-michael-bennet-sheera-frenkel-cecilia-kang-/',
'info_dict': {
'id': '6hSWYWRrR9EUTz7IEe5fJKBhYvSUfexd',
'ext': 'mp4',
'title': '7/23/21 WEEK IN REVIEW (Rep. Jahana Hayes/Howard Fineman/Sen. Michael Bennet/Sheera Frenkel & Cecilia Kang)',
'description': 'md5:f4adcea3e8b106192022e121f1565bae',
'duration': 2506,
'timestamp': 1627063200,
'upload_date': '20210723',
'uploader': 'CBSI-NEW',
},
'params': {
'skip_download': 'm3u8',
},
}, {
'url': 'https://www.paramountplus.com/movies/daddys-home/vM2vm0kE6vsS2U41VhMRKTOVHyQAr6pC',
'info_dict': {
'id': 'vM2vm0kE6vsS2U41VhMRKTOVHyQAr6pC',
'ext': 'mp4',
'title': 'Daddy\'s Home',
'upload_date': '20151225',
'description': 'md5:a0beaf24e8d3b0e81b2ee41d47c06f33',
'uploader': 'CBSI-NEW',
'timestamp': 1451030400,
},
'params': {
'skip_download': 'm3u8',
'format': 'bestvideo',
},
'expected_warnings': ['Ignoring subtitle tracks'], # TODO: Investigate this
}, {
'url': 'https://www.paramountplus.com/movies/sonic-the-hedgehog/5EKDXPOzdVf9voUqW6oRuocyAEeJGbEc',
'info_dict': {
'id': '5EKDXPOzdVf9voUqW6oRuocyAEeJGbEc',
'ext': 'mp4',
'uploader': 'CBSI-NEW',
'description': 'md5:bc7b6fea84ba631ef77a9bda9f2ff911',
'timestamp': 1577865600,
'title': 'Sonic the Hedgehog',
'upload_date': '20200101',
},
'params': {
'skip_download': 'm3u8',
'format': 'bestvideo',
},
'expected_warnings': ['Ignoring subtitle tracks'],
}, {
'url': 'https://www.paramountplus.com/shows/all-rise/video/QmR1WhNkh1a_IrdHZrbcRklm176X_rVc/all-rise-space/',
'only_matching': True,
}, {
'url': 'https://www.paramountplus.com/movies/million-dollar-american-princesses-meghan-and-harry/C0LpgNwXYeB8txxycdWdR9TjxpJOsdCq',
'only_matching': True,
}]
def _extract_video_info(self, content_id, mpx_acc=2198311517):
items_data = self._download_json(
'https://www.paramountplus.com/apps-api/v2.0/androidtv/video/cid/%s.json' % content_id,
content_id, query={'locale': 'en-us', 'at': 'ABCqWNNSwhIqINWIIAG+DFzcFUvF8/vcN6cNyXFFfNzWAIvXuoVgX+fK4naOC7V8MLI='})
asset_types = {
item.get('assetType'): {
'format': 'SMIL',
'formats': 'MPEG4,M3U',
} for item in items_data['itemList']
}
item = items_data['itemList'][-1]
return self._extract_common_video_info(content_id, asset_types, mpx_acc, extra_info={
'title': item.get('title'),
'series': item.get('seriesTitle'),
'season_number': int_or_none(item.get('seasonNum')),
'episode_number': int_or_none(item.get('episodeNum')),
'duration': int_or_none(item.get('duration')),
'thumbnail': url_or_none(item.get('thumbnail')),
})
class ParamountPlusSeriesIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?paramountplus\.com/shows/(?P<id>[a-zA-Z0-9-_]+)/?(?:[#?]|$)'
_TESTS = [{
'url': 'https://www.paramountplus.com/shows/drake-josh',
'playlist_mincount': 50,
'info_dict': {
'id': 'drake-josh',
}
}, {
'url': 'https://www.paramountplus.com/shows/hawaii_five_0/',
'playlist_mincount': 240,
'info_dict': {
'id': 'hawaii_five_0',
}
}, {
'url': 'https://www.paramountplus.com/shows/spongebob-squarepants/',
'playlist_mincount': 248,
'info_dict': {
'id': 'spongebob-squarepants',
}
}]
_API_URL = 'https://www.paramountplus.com/shows/{}/xhr/episodes/page/0/size/100000/xs/0/season/0/'
def _entries(self, show_name):
show_json = self._download_json(self._API_URL.format(show_name), video_id=show_name)
if show_json.get('success'):
for episode in show_json['result']['data']:
yield self.url_result(
'https://www.paramountplus.com%s' % episode['url'],
ie=ParamountPlusIE.ie_key(), video_id=episode['content_id'])
def _real_extract(self, url):
show_name = self._match_id(url)
return self.playlist_result(self._entries(show_name), playlist_id=show_name)


@@ -427,7 +427,7 @@ class PeerTubeIE(InfoExtractor):
''' % (_INSTANCES_RE, _UUID_RE)
_TESTS = [{
'url': 'https://framatube.org/videos/watch/9c9de5e8-0a1e-484a-b099-e80766180a6d',
'md5': '9bed8c0137913e17b86334e5885aacff',
'md5': '8563064d245a4be5705bddb22bb00a28',
'info_dict': {
'id': '9c9de5e8-0a1e-484a-b099-e80766180a6d',
'ext': 'mp4',
@@ -570,7 +570,7 @@ class PeerTubeIE(InfoExtractor):
self._sort_formats(formats)
description = video.get('description')
if len(description) >= 250:
if description and len(description) >= 250:
# description is shortened
full_description = self._call_api(
host, video_id, 'description', note='Downloading description JSON',


@@ -2,14 +2,16 @@
from __future__ import unicode_literals
import itertools
import json
import random
import re
from .openload import PhantomJSwrapper
import time
from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..utils import (
dict_get,
ExtractorError,
RegexNotFoundError,
strip_or_none,
try_get
)
@@ -30,7 +32,7 @@ class RCTIPlusBaseIE(InfoExtractor):
class RCTIPlusIE(RCTIPlusBaseIE):
_VALID_URL = r'https://www\.rctiplus\.com/programs/\d+?/.*?/(?P<type>episode|clip|extra)/(?P<id>\d+)/(?P<display_id>[^/?#&]+)'
_VALID_URL = r'https://www\.rctiplus\.com/(?:programs/\d+?/.*?/)?(?P<type>episode|clip|extra|live-event|missed-event)/(?P<id>\d+)/(?P<display_id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.rctiplus.com/programs/1259/kiko-untuk-lola/episode/22124/untuk-lola',
'md5': '56ed45affad45fa18d5592a1bc199997',
@@ -87,33 +89,93 @@ class RCTIPlusIE(RCTIPlusBaseIE):
'params': {
'format': 'bestvideo',
},
}, { # Missed event/replay
'url': 'https://www.rctiplus.com/missed-event/2507/mou-signing-ceremony-27-juli-2021-1400-wib',
'md5': '649c5f27250faed1452ca8b91e06922d',
'info_dict': {
'id': 'v_pe2507',
'title': 'MOU Signing Ceremony | 27 Juli 2021 | 14.00 WIB',
'display_id': 'mou-signing-ceremony-27-juli-2021-1400-wib',
'ext': 'mp4',
'timestamp': 1627142400,
'upload_date': '20210724',
'was_live': True,
'release_timestamp': 1627369200,
},
'params': {
'fixup': 'never',
},
}, { # Live event; Cloudfront CDN
'url': 'https://www.rctiplus.com/live-event/2530/dai-muda-charging-imun-dengan-iman-4-agustus-2021-1600-wib',
'info_dict': {
'id': 'v_le2530',
'title': 'Dai Muda : Charging Imun dengan Iman | 4 Agustus 2021 | 16.00 WIB',
'display_id': 'dai-muda-charging-imun-dengan-iman-4-agustus-2021-1600-wib',
'ext': 'mp4',
'timestamp': 1627898400,
'upload_date': '20210802',
'release_timestamp': 1628067600,
},
'params': {
'skip_download': True,
},
'skip': 'This live event has ended.',
}, { # TV; live_at is null
'url': 'https://www.rctiplus.com/live-event/1/rcti',
'info_dict': {
'id': 'v_lt1',
'title': 'RCTI',
'display_id': 'rcti',
'ext': 'mp4',
'timestamp': 1546344000,
'upload_date': '20190101',
'is_live': True,
},
'params': {
'skip_download': True,
'format': 'bestvideo',
},
}]
def _search_auth_key(self, webpage):
try:
self._AUTH_KEY = self._search_regex(
r'\'Authorization\':"(?P<auth>[^"]+)"', webpage, 'auth-key')
except RegexNotFoundError:
pass
_CONVIVA_JSON_TEMPLATE = {
't': 'CwsSessionHb',
'cid': 'ff84ae928c3b33064b76dec08f12500465e59a6f',
'clid': '0',
'sid': 0,
'seq': 0,
'caps': 0,
'sf': 7,
'sdk': True,
}
def _real_extract(self, url):
video_type, video_id, display_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, display_id)
self._search_auth_key(webpage)
match = re.match(self._VALID_URL, url).groupdict()
video_type, video_id, display_id = match['type'], match['id'], match['display_id']
url_api_version = 'v2' if video_type == 'missed-event' else 'v1'
appier_id = '23984824_' + str(random.randint(0, 10000000000)) # Based on the webpage's uuidRandom generator
video_json = self._call_api(
'https://api.rctiplus.com/api/v1/%s/%s/url?appierid=.1' % (video_type, video_id), display_id, 'Downloading video URL JSON')[0]
f'https://api.rctiplus.com/api/{url_api_version}/{video_type}/{video_id}/url?appierid={appier_id}', display_id, 'Downloading video URL JSON')[0]
video_url = video_json['url']
is_upcoming = try_get(video_json, lambda x: x['current_date'] < x['live_at'])
if is_upcoming is None:
is_upcoming = try_get(video_json, lambda x: x['current_date'] < x['start_date'])
if is_upcoming:
self.raise_no_formats(
'This event will start at %s.' % video_json['live_label'] if video_json.get('live_label') else 'This event has not started yet.', expected=True)
if 'akamaized' in video_url:
# Akamai's CDN requires a session to at least be made via Conviva's API
# TODO: Reverse-engineer Conviva's heartbeat code to avoid phantomJS
phantom = None
try:
phantom = PhantomJSwrapper(self)
phantom.get(url, webpage, display_id, note2='Initiating video session')
except ExtractorError:
self.report_warning('PhantomJS is highly recommended for this video, as it might load incredibly slowly otherwise. '
'You can also try opening the page in this device\'s browser first')
# For some videos hosted on Akamai's CDN (possibly AES-encrypted ones?), a session needs to at least be made via Conviva's API
conviva_json_data = {
**self._CONVIVA_JSON_TEMPLATE,
'url': video_url,
'sst': int(time.time())
}
conviva_json_res = self._download_json(
'https://ff84ae928c3b33064b76dec08f12500465e59a6f.cws.conviva.com/0/wsg', display_id,
'Creating Conviva session', 'Failed to create Conviva session',
fatal=False, data=json.dumps(conviva_json_data).encode('utf-8'))
if conviva_json_res and conviva_json_res.get('err') != 'ok':
self.report_warning('Conviva said: %s' % str(conviva_json_res.get('err')))
video_meta, meta_paths = self._call_api(
'https://api.rctiplus.com/api/v1/%s/%s' % (video_type, video_id), display_id, 'Downloading video metadata')
@@ -129,22 +191,27 @@ class RCTIPlusIE(RCTIPlusBaseIE):
'id': 'landscape_image',
'url': '%s%d%s' % (image_path, 2000, video_meta['landscape_image'])
})
formats = self._extract_m3u8_formats(video_url, display_id, 'mp4', headers={'Referer': 'https://www.rctiplus.com/'})
try:
formats = self._extract_m3u8_formats(video_url, display_id, 'mp4', headers={'Referer': 'https://www.rctiplus.com/'})
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
self.raise_geo_restricted(countries=['ID'], metadata_available=True)
else:
raise
for f in formats:
if 'akamaized' in f['url']:
f.setdefault('http_headers', {})['Referer'] = 'https://www.rctiplus.com/' # Referer header is required for akamai CDNs
if 'akamaized' in f['url'] or 'cloudfront' in f['url']:
f.setdefault('http_headers', {})['Referer'] = 'https://www.rctiplus.com/' # Referer header is required for akamai/cloudfront CDNs
self._sort_formats(formats)
return {
'id': video_meta.get('product_id') or video_json.get('product_id'),
'title': video_meta.get('title') or video_json.get('content_name'),
'title': dict_get(video_meta, ('title', 'name')) or dict_get(video_json, ('content_name', 'assets_name')),
'display_id': display_id,
'description': video_meta.get('summary'),
'timestamp': video_meta.get('release_date'),
'timestamp': video_meta.get('release_date') or video_json.get('start_date'),
'duration': video_meta.get('duration'),
'categories': [video_meta.get('genre')],
'categories': [video_meta['genre']] if video_meta.get('genre') else None,
'average_rating': video_meta.get('star_rating'),
'series': video_meta.get('program_title') or video_json.get('program_title'),
'season_number': video_meta.get('season'),
@@ -152,12 +219,16 @@ class RCTIPlusIE(RCTIPlusBaseIE):
'channel': video_json.get('tv_name'),
'channel_id': video_json.get('tv_id'),
'formats': formats,
'thumbnails': thumbnails
'thumbnails': thumbnails,
'is_live': video_type == 'live-event' and not is_upcoming,
'was_live': video_type == 'missed-event',
'live_status': 'is_upcoming' if is_upcoming else None,
'release_timestamp': video_json.get('live_at'),
}
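The upcoming-event detection above compares `current_date` against `live_at`, falling back to `start_date` when `live_at` is null (which makes the first `try_get` return None). A compact sketch of the same logic with plain safe access (the stub dicts are illustrative):

```python
def is_upcoming(video_json):
    # Prefer live_at; when it is missing or null, fall back to
    # start_date. Return None if neither field allows a comparison,
    # mirroring the try_get-based chain in the extractor.
    now = video_json.get('current_date')
    for key in ('live_at', 'start_date'):
        when = video_json.get(key)
        if now is not None and when is not None:
            return now < when
    return None


assert is_upcoming({'current_date': 10, 'live_at': 20}) is True
assert is_upcoming({'current_date': 30, 'live_at': None, 'start_date': 25}) is False
assert is_upcoming({'current_date': 30}) is None
```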
class RCTIPlusSeriesIE(RCTIPlusBaseIE):
_VALID_URL = r'https://www\.rctiplus\.com/programs/(?P<id>\d+)/(?P<display_id>[^/?#&]+)(?:\W)*$'
_VALID_URL = r'https://www\.rctiplus\.com/programs/(?P<id>\d+)/(?P<display_id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.rctiplus.com/programs/540/upin-ipin',
'playlist_mincount': 417,
@@ -167,7 +238,7 @@ class RCTIPlusSeriesIE(RCTIPlusBaseIE):
'description': 'md5:22cc912381f389664416844e1ec4f86b',
},
}, {
'url': 'https://www.rctiplus.com/programs/540/upin-ipin/#',
'url': 'https://www.rctiplus.com/programs/540/upin-ipin/episodes?utm_source=Rplusdweb&utm_medium=share_copy&utm_campaign=programsupin-ipin',
'only_matching': True,
}]
_AGE_RATINGS = { # Based off https://id.wikipedia.org/wiki/Sistem_rating_konten_televisi with additional ratings
@@ -180,6 +251,10 @@ class RCTIPlusSeriesIE(RCTIPlusBaseIE):
'D': 18,
}
@classmethod
def suitable(cls, url):
return False if RCTIPlusIE.suitable(url) else super(RCTIPlusSeriesIE, cls).suitable(url)
def _entries(self, url, display_id=None, note='Downloading entries JSON', metadata={}):
total_pages = 0
try:
@@ -240,3 +315,41 @@ class RCTIPlusSeriesIE(RCTIPlusBaseIE):
display_id, 'Downloading extra entries', metadata))
return self.playlist_result(itertools.chain(*entries), series_id, series_meta.get('title'), series_meta.get('summary'), **metadata)
class RCTIPlusTVIE(RCTIPlusBaseIE):
_VALID_URL = r'https://www\.rctiplus\.com/((tv/(?P<tvname>\w+))|(?P<eventname>live-event|missed-event))'
_TESTS = [{
'url': 'https://www.rctiplus.com/tv/rcti',
'info_dict': {
'id': 'v_lt1',
'title': 'RCTI',
'ext': 'mp4',
'timestamp': 1546344000,
'upload_date': '20190101',
},
'params': {
'skip_download': True,
'format': 'bestvideo',
}
}, {
# Returned video will always change
'url': 'https://www.rctiplus.com/live-event',
'only_matching': True,
}, {
# Returned video will also always change
'url': 'https://www.rctiplus.com/missed-event',
'only_matching': True,
}]
@classmethod
def suitable(cls, url):
return False if RCTIPlusIE.suitable(url) else super(RCTIPlusTVIE, cls).suitable(url)
def _real_extract(self, url):
match = re.match(self._VALID_URL, url).groupdict()
tv_id = match.get('tvname') or match.get('eventname')
webpage = self._download_webpage(url, tv_id)
video_type, video_id = self._search_regex(
r'url\s*:\s*["\']https://api\.rctiplus\.com/api/v./(?P<type>[^/]+)/(?P<id>\d+)/url', webpage, 'video link', group=('type', 'id'))
return self.url_result(f'https://www.rctiplus.com/{video_type}/{video_id}/{tv_id}', 'RCTIPlus')

View File

@@ -41,6 +41,7 @@ class TenPlayIE(InfoExtractor):
'PG': 15,
'M': 15,
'MA': 15,
'MA15+': 15,
'R': 18,
'X': 18
}
@@ -79,7 +80,7 @@ class TenPlayIE(InfoExtractor):
'id': data.get('altId') or content_id,
'title': data.get('title'),
'description': data.get('description'),
'age_limit': self._AUS_AGES[data.get('classification')],
'age_limit': self._AUS_AGES.get(data.get('classification')),
'series': data.get('showName'),
'season': data.get('showContentSeason'),
'timestamp': data.get('published'),

View File

@@ -1,97 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class ThisVidIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?thisvid\.com/(?P<type>videos|embed)/(?P<id>[A-Za-z0-9-]+)/?'
_TESTS = [{
'url': 'https://thisvid.com/videos/french-boy-pantsed/',
'md5': '3397979512c682f6b85b3b04989df224',
'info_dict': {
'id': '2400174',
'ext': 'mp4',
'title': 'French Boy Pantsed',
'thumbnail': 'https://media.thisvid.com/contents/videos_screenshots/2400000/2400174/preview.mp4.jpg',
'age_limit': 18,
}
}, {
'url': 'https://thisvid.com/embed/2400174/',
'md5': '3397979512c682f6b85b3b04989df224',
'info_dict': {
'id': '2400174',
'ext': 'mp4',
'title': 'French Boy Pantsed',
'thumbnail': 'https://media.thisvid.com/contents/videos_screenshots/2400000/2400174/preview.mp4.jpg',
'age_limit': 18,
}
}]
def _real_extract(self, url):
main_id = self._match_id(url)
webpage = self._download_webpage(url, main_id)
# The URL decryption routine was reverse-engineered from kt_player.js v4.0.4 and verified against v5.2.0; it may change in future versions.
kvs_version = self._html_search_regex(r'<script [^>]+?src="https://thisvid\.com/player/kt_player\.js\?v=(\d+(\.\d+)+)">', webpage, 'kvs_version', fatal=False)
if not kvs_version or not kvs_version.startswith('5.'):
self.report_warning('Major version change (%s) in player engine; download may fail' % kvs_version)
title = self._html_search_regex(r'<title>(?:Video: )?(.+?)(?: - (?:\w+ porn at )?ThisVid(?:.com| tube))?</title>', webpage, 'title')
# video_id, video_url and license_code from the 'flashvars' JSON object:
video_id = self._html_search_regex(r"video_id: '([0-9]+)',", webpage, 'video_id')
video_url = self._html_search_regex(r"video_url: '(function/0/.+?)',", webpage, 'video_url')
license_code = self._html_search_regex(r"license_code: '([0-9$]{16})',", webpage, 'license_code')
thumbnail = self._html_search_regex(r"preview_url: '((?:https?:)?//media\.thisvid\.com/.+?\.jpg)',", webpage, 'thumbnail', fatal=False)
if thumbnail and thumbnail.startswith('//'):
thumbnail = 'https:' + thumbnail
if re.match(self._VALID_URL, url).group('type') == 'videos':
display_id = main_id
else:
display_id = self._search_regex(r'<link rel="canonical" href="' + self._VALID_URL + r'">', webpage, 'display_id', fatal=False)
return {
'id': video_id,
'display_id': display_id,
'title': title,
'url': getrealurl(video_url, license_code),
'thumbnail': thumbnail,
'age_limit': 18,
}
def getrealurl(video_url, license_code):
urlparts = video_url.split('/')[2:]
license = getlicensetoken(license_code)
newmagic = urlparts[5][:32]
for o in range(len(newmagic) - 1, -1, -1):
new = ""
l = (o + sum([int(n) for n in license[o:]])) % 32
for i in range(0, len(newmagic)):
if i == o:
new += newmagic[l]
elif i == l:
new += newmagic[o]
else:
new += newmagic[i]
newmagic = new
urlparts[5] = newmagic + urlparts[5][32:]
return "/".join(urlparts)
def getlicensetoken(license):
modlicense = license.replace("$", "").replace("0", "1")
center = int(len(modlicense) / 2)
fronthalf = int(modlicense[:center + 1])
backhalf = int(modlicense[center:])
modlicense = str(4 * abs(fronthalf - backhalf))
retval = ""
for o in range(0, center + 1):
for i in range(1, 5):
retval += str((int(license[o + i]) + int(modlicense[o])) % 10)
return retval
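For reference, the license-token derivation above can be sketched standalone (a minimal re-implementation of `getlicensetoken`; the 16-character sample value in the usage note is illustrative, not taken from the site):

```python
def get_license_token(license_code):
    # Digits-only working copy: strip '$' and map every '0' to '1'.
    modlicense = license_code.replace('$', '').replace('0', '1')
    center = len(modlicense) // 2
    front_half = int(modlicense[:center + 1])
    back_half = int(modlicense[center:])
    modlicense = str(4 * abs(front_half - back_half))
    # Mix the derived number back into the original code, four digits per step,
    # producing the 32-digit token used to unscramble the media URL.
    token = ''
    for o in range(center + 1):
        for i in range(1, 5):
            token += str((int(license_code[o + i]) + int(modlicense[o])) % 10)
    return token
```

Calling `get_license_token('$495564701340721')` yields a 32-digit string, which `getrealurl` then uses to permute the 32-character "magic" segment of the video URL.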

View File

@@ -144,7 +144,7 @@ class TurnerBaseIE(AdobePassIE):
m3u8_id=format_id or 'hls', fatal=False)
if '/secure/' in video_url and '?hdnea=' in video_url:
for f in m3u8_formats:
f['_seekable'] = False
f['_ffmpeg_args'] = ['-seekable', '0']
formats.extend(m3u8_formats)
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(

View File

@@ -864,6 +864,7 @@ class TwitchClipsIE(TwitchBaseIE):
'md5': '761769e1eafce0ffebfb4089cb3847cd',
'info_dict': {
'id': '42850523',
'display_id': 'FaintLightGullWholeWheat',
'ext': 'mp4',
'title': 'EA Play 2016 Live from the Novo Theatre',
'thumbnail': r're:^https?://.*\.jpg',
@@ -976,6 +977,7 @@ class TwitchClipsIE(TwitchBaseIE):
return {
'id': clip.get('id') or video_id,
'display_id': video_id,
'title': clip.get('title') or video_id,
'formats': formats,
'duration': int_or_none(clip.get('durationSeconds')),

View File

@@ -0,0 +1,85 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
dict_get,
int_or_none,
str_or_none,
try_get,
unified_strdate,
url_or_none,
)
class UtreonIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?utreon\.com/v/(?P<id>[a-zA-Z0-9_-]+)'
_TESTS = [{
'url': 'https://utreon.com/v/z_I7ikQbuDw',
'info_dict': {
'id': 'z_I7ikQbuDw',
'ext': 'mp4',
'title': 'Freedom Friday meditation - Rising in the wind',
'description': 'md5:a9bf15a42434a062fe313b938343ad1b',
'uploader': 'Heather Dawn Elemental Health',
'thumbnail': 'https://data-1.utreon.com/v/MG/M2/NT/z_I7ikQbuDw/z_I7ikQbuDw_preview.jpg',
'release_date': '20210723',
}
}, {
'url': 'https://utreon.com/v/jerJw5EOOVU',
'info_dict': {
'id': 'jerJw5EOOVU',
'ext': 'mp4',
'title': 'When I\'m alone, I love to reflect in peace, to make my dreams come true... [Quotes and Poems]',
'description': 'md5:61ee6c2da98be51b04b969ca80273aaa',
'uploader': 'Frases e Poemas Quotes and Poems',
'thumbnail': 'https://data-1.utreon.com/v/Mz/Zh/ND/jerJw5EOOVU/jerJw5EOOVU_89af85470a4b16eededde7f8674c96d9_cover.jpg',
'release_date': '20210723',
}
}, {
'url': 'https://utreon.com/v/C4ZxXhYBBmE',
'info_dict': {
'id': 'C4ZxXhYBBmE',
'ext': 'mp4',
'title': 'Bidens Capital Gains Tax Rate to Test Worlds Highest',
'description': 'md5:fb5a6c2e506f013cc76f133f673bc5c8',
'uploader': 'Nomad Capitalist',
'thumbnail': 'https://data-1.utreon.com/v/ZD/k1/Mj/C4ZxXhYBBmE/C4ZxXhYBBmE_628342076198c9c06dd6b2c665978584_cover.jpg',
'release_date': '20210723',
}
}, {
'url': 'https://utreon.com/v/Y-stEH-FBm8',
'info_dict': {
'id': 'Y-stEH-FBm8',
'ext': 'mp4',
'title': 'Creeper-Chan Pranks Steve! 💚 [MINECRAFT ANIME]',
'description': 'md5:7a48450b0d761b96dec194be0c5ecb5f',
'uploader': 'Merryweather Comics',
'thumbnail': 'https://data-1.utreon.com/v/MT/E4/Zj/Y-stEH-FBm8/Y-stEH-FBm8_5290676a41a4a1096db133b09f54f77b_cover.jpg',
'release_date': '20210718',
}},
]
def _real_extract(self, url):
video_id = self._match_id(url)
json_data = self._download_json(
'https://api.utreon.com/v1/videos/' + video_id,
video_id)
videos_json = json_data['videos']
formats = [{
'url': format_url,
'format_id': format_key.split('_')[1],
'height': int(format_key.split('_')[1][:-1]),
} for format_key, format_url in videos_json.items() if url_or_none(format_url)]
self._sort_formats(formats)
thumbnail = url_or_none(dict_get(json_data, ('cover_image_url', 'preview_image_url')))
return {
'id': video_id,
'title': json_data['title'],
'formats': formats,
'description': str_or_none(json_data.get('description')),
'duration': int_or_none(json_data.get('duration')),
'uploader': str_or_none(try_get(json_data, lambda x: x['channel']['title'])),
'thumbnail': thumbnail,
'release_date': unified_strdate(json_data.get('published_datetime')),
}
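The `videos` map handling above can be sketched in isolation. This assumes the payload keys follow the `url_<height>p` pattern the comprehension relies on; the sample keys below are hypothetical:

```python
def parse_formats(videos):
    # Keys look like 'url_720p'; values are direct media URLs (possibly null).
    formats = []
    for key, url in videos.items():
        if not url:
            continue
        label = key.split('_')[1]       # e.g. '1080p'
        formats.append({
            'url': url,
            'format_id': label,
            'height': int(label[:-1]),  # strip the trailing 'p'
        })
    # Worst-to-best ordering, as self._sort_formats would produce.
    return sorted(formats, key=lambda f: f['height'])
```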

View File

@@ -5,6 +5,7 @@ import re
from .common import InfoExtractor
from ..utils import (
clean_html,
ExtractorError,
get_element_by_class,
int_or_none,
@@ -47,10 +48,19 @@ class VidioBaseIE(InfoExtractor):
self._LOGIN_URL, None, 'Logging in', data=urlencode_postdata(login_form), expected_status=[302, 401])
if login_post_urlh.status == 401:
reason = get_element_by_class('onboarding-form__general-error', login_post)
if reason:
if get_element_by_class('onboarding-content-register-popup__title', login_post):
raise ExtractorError(
'Unable to log in: %s' % reason, expected=True)
'Unable to log in: The provided email has not registered yet.', expected=True)
reason = get_element_by_class('onboarding-form__general-error', login_post) or get_element_by_class('onboarding-modal__title', login_post)
if 'Akun terhubung ke' in reason:
raise ExtractorError(
'Unable to log in: Your account is linked to a social media account. '
'Use --cookies to provide account credentials instead', expected=True)
elif reason:
subreason = get_element_by_class('onboarding-modal__description-text', login_post) or ''
raise ExtractorError(
'Unable to log in: %s. %s' % (reason, clean_html(subreason)), expected=True)
raise ExtractorError('Unable to log in')
def _real_initialize(self):

View File

@@ -73,7 +73,7 @@ class VikiBaseIE(InfoExtractor):
data=json.dumps(data).encode('utf-8') if data else None,
headers=({'x-viki-app-ver': self._APP_VERSION} if data
else self._stream_headers(timestamp, sig) if query is None
else None)) or {}
else None), expected_status=400) or {}
self._raise_error(resp.get('error'), fatal)
return resp

View File

@@ -253,6 +253,30 @@ class VimeoBaseInfoExtractor(InfoExtractor):
'quality': 1,
}
jwt_response = self._download_json(
'https://vimeo.com/_rv/viewer', video_id, note='Downloading jwt token', fatal=False) or {}
if not jwt_response.get('jwt'):
return
headers = {'Authorization': 'jwt %s' % jwt_response['jwt']}
original_response = self._download_json(
f'https://api.vimeo.com/videos/{video_id}', video_id,
headers=headers, fatal=False) or {}
for download_data in original_response.get('download') or []:
download_url = download_data.get('link')
if not download_url or download_data.get('quality') != 'source':
continue
query = compat_urlparse.parse_qs(compat_urlparse.urlparse(download_url).query)
return {
'url': download_url,
'ext': determine_ext(query.get('filename', [''])[0].lower()),
'format_id': download_data.get('public_name', 'Original'),
'width': int_or_none(download_data.get('width')),
'height': int_or_none(download_data.get('height')),
'fps': int_or_none(download_data.get('fps')),
'filesize': int_or_none(download_data.get('size')),
'quality': 1,
}
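The loop above scans the API's download renditions for the `source` entry; a standalone sketch of that selection (field names as in the response the loop reads):

```python
def pick_original_download(download_entries):
    # The 'source' quality entry, when present, is the originally
    # uploaded file rather than a transcoded rendition.
    for entry in download_entries or []:
        if entry.get('quality') == 'source' and entry.get('link'):
            return entry['link']
    return None
```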
class VimeoIE(VimeoBaseInfoExtractor):
"""Information extractor for vimeo.com."""
@@ -426,6 +450,22 @@ class VimeoIE(VimeoBaseInfoExtractor):
'description': 'md5:ae23671e82d05415868f7ad1aec21147',
},
},
{
'note': 'Contains original format not accessible in webpage',
'url': 'https://vimeo.com/393756517',
'md5': 'c464af248b592190a5ffbb5d33f382b0',
'info_dict': {
'id': '393756517',
'ext': 'mov',
'timestamp': 1582642091,
'uploader_id': 'frameworkla',
'title': 'Straight To Hell - Sabrina: Netflix',
'uploader': 'Framework Studio',
'description': 'md5:f2edc61af3ea7a5592681ddbb683db73',
'upload_date': '20200225',
},
'expected_warnings': ['Unable to download JSON metadata'],
},
{
# only available via https://vimeo.com/channels/tributes/6213729 and
# not via https://vimeo.com/6213729

View File

@@ -88,6 +88,7 @@ class VineIE(InfoExtractor):
'format_id': format_id or 'standard',
'quality': quality,
})
self._check_formats(formats, video_id)
self._sort_formats(formats)
username = data.get('username')

View File

@@ -19,6 +19,7 @@ from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
traverse_obj,
)
@@ -217,7 +218,7 @@ class VRVIE(VRVBaseIE):
})
thumbnails = []
for thumbnail in video_data.get('images', {}).get('thumbnails', []):
for thumbnail in traverse_obj(video_data, ('images', 'thumbnail', ..., ...)):
thumbnail_url = thumbnail.get('source')
if not thumbnail_url:
continue

View File

@@ -67,25 +67,198 @@ def parse_qs(url):
return compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
# any clients starting with _ cannot be explicitly requested by the user
INNERTUBE_CLIENTS = {
'web': {
'INNERTUBE_API_KEY': 'AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'WEB',
'clientVersion': '2.20210622.10.00',
}
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 1
},
'web_embedded': {
'INNERTUBE_API_KEY': 'AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'WEB_EMBEDDED_PLAYER',
'clientVersion': '1.20210620.0.1',
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 56
},
'web_music': {
'INNERTUBE_API_KEY': 'AIzaSyC9XL3ZjWddXya6X74dJoCTL-WEYFDNX30',
'INNERTUBE_HOST': 'music.youtube.com',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'WEB_REMIX',
'clientVersion': '1.20210621.00.00',
}
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 67,
},
'web_creator': {
'INNERTUBE_API_KEY': 'AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'WEB_CREATOR',
'clientVersion': '1.20210621.00.00',
}
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 62,
},
'android': {
'INNERTUBE_API_KEY': 'AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'ANDROID',
'clientVersion': '16.20',
}
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 3,
},
'android_embedded': {
'INNERTUBE_API_KEY': 'AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'ANDROID_EMBEDDED_PLAYER',
'clientVersion': '16.20',
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 55
},
'android_music': {
'INNERTUBE_API_KEY': 'AIzaSyC9XL3ZjWddXya6X74dJoCTL-WEYFDNX30',
'INNERTUBE_HOST': 'music.youtube.com',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'ANDROID_MUSIC',
'clientVersion': '4.32',
}
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 21,
},
'android_creator': {
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'ANDROID_CREATOR',
'clientVersion': '21.24.100',
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 14
},
# ios has HLS live streams
# See: https://github.com/TeamNewPipe/NewPipeExtractor/issues/680
'ios': {
'INNERTUBE_API_KEY': 'AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'IOS',
'clientVersion': '16.20',
}
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 5
},
'ios_embedded': {
'INNERTUBE_API_KEY': 'AIzaSyDCU8hByM-4DrUqRUYnGn-3llEO78bcxq8',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'IOS_MESSAGES_EXTENSION',
'clientVersion': '16.20',
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 66
},
'ios_music': {
'INNERTUBE_API_KEY': 'AIzaSyDK3iBpDP9nHVTk2qL73FLJICfOC3c51Og',
'INNERTUBE_HOST': 'music.youtube.com',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'IOS_MUSIC',
'clientVersion': '4.32',
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 26
},
'ios_creator': {
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'IOS_CREATOR',
'clientVersion': '21.24.100',
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 15
},
# mweb has 'ultralow' formats
# See: https://github.com/yt-dlp/yt-dlp/pull/557
'mweb': {
'INNERTUBE_API_KEY': 'AIzaSyDCU8hByM-4DrUqRUYnGn-3llEO78bcxq8',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'MWEB',
'clientVersion': '2.20210721.07.00',
}
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 2
},
}
def build_innertube_clients():
third_party = {
'embedUrl': 'https://google.com', # Can be any valid URL
}
base_clients = ('android', 'web', 'ios', 'mweb')
priority = qualities(base_clients[::-1])
for client, ytcfg in tuple(INNERTUBE_CLIENTS.items()):
ytcfg.setdefault('INNERTUBE_API_KEY', 'AIzaSyDCU8hByM-4DrUqRUYnGn-3llEO78bcxq8')
ytcfg.setdefault('INNERTUBE_HOST', 'www.youtube.com')
ytcfg['INNERTUBE_CONTEXT']['client'].setdefault('hl', 'en')
ytcfg['priority'] = 10 * priority(client.split('_', 1)[0])
if client in base_clients:
INNERTUBE_CLIENTS[f'{client}_agegate'] = agegate_ytcfg = copy.deepcopy(ytcfg)
agegate_ytcfg['INNERTUBE_CONTEXT']['client']['clientScreen'] = 'EMBED'
agegate_ytcfg['INNERTUBE_CONTEXT']['thirdParty'] = third_party
agegate_ytcfg['priority'] -= 1
elif client.endswith('_embedded'):
ytcfg['INNERTUBE_CONTEXT']['thirdParty'] = third_party
ytcfg['priority'] -= 2
else:
ytcfg['priority'] -= 3
build_innertube_clients()
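The priority bookkeeping in `build_innertube_clients` can be sketched with a simplified stand-in for yt-dlp's `qualities` helper (the helper here is an approximation for illustration, not the library's implementation):

```python
def qualities(order):
    # Map a name to its index in `order`; higher index = higher priority.
    def q(name):
        return order.index(name) if name in order else -1
    return q

BASE_CLIENTS = ('android', 'web', 'ios', 'mweb')
priority = qualities(BASE_CLIENTS[::-1])  # android outranks web, ios, mweb

def client_priority(client, agegate=False):
    # Base score comes from the client family (part before the first '_').
    p = 10 * priority(client.split('_', 1)[0])
    if agegate:
        return p - 1          # *_agegate variants sit just below their base
    if client.endswith('_embedded'):
        return p - 2
    if client in BASE_CLIENTS:
        return p
    return p - 3              # music/creator variants rank lowest
```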
class YoutubeBaseInfoExtractor(InfoExtractor):
"""Provide base functions for Youtube extractors"""
_RESERVED_NAMES = (
r'channel|c|user|playlist|watch|w|v|embed|e|watch_popup|'
r'shorts|movies|results|shared|hashtag|trending|feed|feeds|'
r'browse|oembed|get_video_info|iframe_api|s/player|'
r'storefront|oops|index|account|reporthistory|t/terms|about|upload|signin|logout')
_PLAYLIST_ID_RE = r'(?:(?:PL|LL|EC|UU|FL|RD|UL|TL|PU|OLAK5uy_)[0-9A-Za-z-_]{10,}|RDMM|WL|LL|LM)'
_NETRC_MACHINE = 'youtube'
# If True it will raise an error if no login info is provided
_LOGIN_REQUIRED = False
r''' # Unused since login is broken
_LOGIN_URL = 'https://accounts.google.com/ServiceLogin'
_TWOFACTOR_URL = 'https://accounts.google.com/signin/challenge'
_LOOKUP_URL = 'https://accounts.google.com/_/signin/sl/lookup'
_CHALLENGE_URL = 'https://accounts.google.com/_/signin/sl/challenge'
_TFA_URL = 'https://accounts.google.com/_/signin/challenge?hl=en&TL={0}'
_RESERVED_NAMES = (
r'channel|c|user|browse|playlist|watch|w|v|embed|e|watch_popup|shorts|'
r'movies|results|shared|hashtag|trending|feed|feeds|oembed|get_video_info|'
r'storefront|oops|index|account|reporthistory|t/terms|about|upload|signin|logout')
_NETRC_MACHINE = 'youtube'
# If True it will raise an error if no login info is provided
_LOGIN_REQUIRED = False
_PLAYLIST_ID_RE = r'(?:(?:PL|LL|EC|UU|FL|RD|UL|TL|PU|OLAK5uy_)[0-9A-Za-z-_]{10,}|RDMM|WL|LL|LM)'
'''
def _login(self):
"""
@@ -312,175 +485,21 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
_YT_INITIAL_PLAYER_RESPONSE_RE = r'ytInitialPlayerResponse\s*=\s*({.+?})\s*;'
_YT_INITIAL_BOUNDARY_RE = r'(?:var\s+meta|</script|\n)'
_YT_DEFAULT_YTCFGS = {
'WEB': {
'INNERTUBE_API_VERSION': 'v1',
'INNERTUBE_CLIENT_NAME': 'WEB',
'INNERTUBE_CLIENT_VERSION': '2.20210622.10.00',
'INNERTUBE_API_KEY': 'AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'WEB',
'clientVersion': '2.20210622.10.00',
'hl': 'en',
}
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 1
},
'WEB_REMIX': {
'INNERTUBE_API_VERSION': 'v1',
'INNERTUBE_CLIENT_NAME': 'WEB_REMIX',
'INNERTUBE_CLIENT_VERSION': '1.20210621.00.00',
'INNERTUBE_API_KEY': 'AIzaSyC9XL3ZjWddXya6X74dJoCTL-WEYFDNX30',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'WEB_REMIX',
'clientVersion': '1.20210621.00.00',
'hl': 'en',
}
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 67
},
'WEB_EMBEDDED_PLAYER': {
'INNERTUBE_API_VERSION': 'v1',
'INNERTUBE_CLIENT_NAME': 'WEB_EMBEDDED_PLAYER',
'INNERTUBE_CLIENT_VERSION': '1.20210620.0.1',
'INNERTUBE_API_KEY': 'AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'WEB_EMBEDDED_PLAYER',
'clientVersion': '1.20210620.0.1',
'hl': 'en',
}
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 56
},
'ANDROID': {
'INNERTUBE_API_VERSION': 'v1',
'INNERTUBE_CLIENT_NAME': 'ANDROID',
'INNERTUBE_CLIENT_VERSION': '16.20',
'INNERTUBE_API_KEY': 'AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'ANDROID',
'clientVersion': '16.20',
'hl': 'en',
}
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 3
},
'ANDROID_EMBEDDED_PLAYER': {
'INNERTUBE_API_VERSION': 'v1',
'INNERTUBE_CLIENT_NAME': 'ANDROID_EMBEDDED_PLAYER',
'INNERTUBE_CLIENT_VERSION': '16.20',
'INNERTUBE_API_KEY': 'AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'ANDROID_EMBEDDED_PLAYER',
'clientVersion': '16.20',
'hl': 'en',
}
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 55
},
'ANDROID_MUSIC': {
'INNERTUBE_API_VERSION': 'v1',
'INNERTUBE_CLIENT_NAME': 'ANDROID_MUSIC',
'INNERTUBE_CLIENT_VERSION': '4.32',
'INNERTUBE_API_KEY': 'AIzaSyC9XL3ZjWddXya6X74dJoCTL-WEYFDNX30',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'ANDROID_MUSIC',
'clientVersion': '4.32',
'hl': 'en',
}
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 21
},
'IOS': {
'INNERTUBE_API_VERSION': 'v1',
'INNERTUBE_CLIENT_NAME': 'IOS',
'INNERTUBE_CLIENT_VERSION': '16.20',
'INNERTUBE_API_KEY': 'AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'IOS',
'clientVersion': '16.20',
'hl': 'en',
}
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 5
def _get_default_ytcfg(self, client='web'):
return copy.deepcopy(INNERTUBE_CLIENTS[client])
},
'IOS_MUSIC': {
'INNERTUBE_API_VERSION': 'v1',
'INNERTUBE_CLIENT_NAME': 'IOS_MUSIC',
'INNERTUBE_CLIENT_VERSION': '4.32',
'INNERTUBE_API_KEY': 'AIzaSyDK3iBpDP9nHVTk2qL73FLJICfOC3c51Og',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'IOS_MUSIC',
'clientVersion': '4.32',
'hl': 'en',
}
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 26
},
'IOS_MESSAGES_EXTENSION': {
'INNERTUBE_API_VERSION': 'v1',
'INNERTUBE_CLIENT_NAME': 'IOS_MESSAGES_EXTENSION',
'INNERTUBE_CLIENT_VERSION': '16.20',
'INNERTUBE_API_KEY': 'AIzaSyDCU8hByM-4DrUqRUYnGn-3llEO78bcxq8',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'IOS_MESSAGES_EXTENSION',
'clientVersion': '16.20',
'hl': 'en',
}
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 66
}
}
def _get_innertube_host(self, client='web'):
return INNERTUBE_CLIENTS[client]['INNERTUBE_HOST']
_YT_DEFAULT_INNERTUBE_HOSTS = {
'DIRECT': 'youtubei.googleapis.com',
'WEB': 'www.youtube.com',
'WEB_REMIX': 'music.youtube.com',
'ANDROID_MUSIC': 'music.youtube.com'
}
# clients starting with _ cannot be explicitly requested by the user
_YT_CLIENTS = {
'web': 'WEB',
'web_music': 'WEB_REMIX',
'_web_embedded': 'WEB_EMBEDDED_PLAYER',
'_web_agegate': 'TVHTML5',
'android': 'ANDROID',
'android_music': 'ANDROID_MUSIC',
'_android_embedded': 'ANDROID_EMBEDDED_PLAYER',
'_android_agegate': 'ANDROID',
'ios': 'IOS',
'ios_music': 'IOS_MUSIC',
'_ios_embedded': 'IOS_MESSAGES_EXTENSION',
'_ios_agegate': 'IOS'
}
def _get_default_ytcfg(self, client='WEB'):
if client in self._YT_DEFAULT_YTCFGS:
return copy.deepcopy(self._YT_DEFAULT_YTCFGS[client])
self.write_debug(f'INNERTUBE default client {client} does not exist - falling back to WEB client.')
return copy.deepcopy(self._YT_DEFAULT_YTCFGS['WEB'])
def _get_innertube_host(self, client='WEB'):
return dict_get(self._YT_DEFAULT_INNERTUBE_HOSTS, (client, 'WEB'))
def _ytcfg_get_safe(self, ytcfg, getter, expected_type=None, default_client='WEB'):
def _ytcfg_get_safe(self, ytcfg, getter, expected_type=None, default_client='web'):
# try_get but with fallback to default ytcfg client values when present
_func = lambda y: try_get(y, getter, expected_type)
return _func(ytcfg) or _func(self._get_default_ytcfg(default_client))
def _extract_client_name(self, ytcfg, default_client='WEB'):
return self._ytcfg_get_safe(ytcfg, lambda x: x['INNERTUBE_CLIENT_NAME'], compat_str, default_client)
def _extract_client_name(self, ytcfg, default_client='web'):
return self._ytcfg_get_safe(
ytcfg, (lambda x: x['INNERTUBE_CLIENT_NAME'],
lambda x: x['INNERTUBE_CONTEXT']['client']['clientName']), compat_str, default_client)
@staticmethod
def _extract_session_index(*data):
@@ -489,13 +508,15 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
if session_index is not None:
return session_index
def _extract_client_version(self, ytcfg, default_client='WEB'):
return self._ytcfg_get_safe(ytcfg, lambda x: x['INNERTUBE_CLIENT_VERSION'], compat_str, default_client)
def _extract_client_version(self, ytcfg, default_client='web'):
return self._ytcfg_get_safe(
ytcfg, (lambda x: x['INNERTUBE_CLIENT_VERSION'],
lambda x: x['INNERTUBE_CONTEXT']['client']['clientVersion']), compat_str, default_client)
def _extract_api_key(self, ytcfg=None, default_client='WEB'):
def _extract_api_key(self, ytcfg=None, default_client='web'):
return self._ytcfg_get_safe(ytcfg, lambda x: x['INNERTUBE_API_KEY'], compat_str, default_client)
def _extract_context(self, ytcfg=None, default_client='WEB'):
def _extract_context(self, ytcfg=None, default_client='web'):
_get_context = lambda y: try_get(y, lambda x: x['INNERTUBE_CONTEXT'], dict)
context = _get_context(ytcfg)
if context:
@@ -515,29 +536,36 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
context['client']['visitorData'] = visitor_data
return context
_SAPISID = None
def _generate_sapisidhash_header(self, origin='https://www.youtube.com'):
# Sometimes SAPISID cookie isn't present but __Secure-3PAPISID is.
# See: https://github.com/yt-dlp/yt-dlp/issues/393
yt_cookies = self._get_cookies('https://www.youtube.com')
sapisid_cookie = dict_get(
yt_cookies, ('__Secure-3PAPISID', 'SAPISID'))
if sapisid_cookie is None or not sapisid_cookie.value:
return
time_now = round(time.time())
# SAPISID cookie is required if not already present
if not yt_cookies.get('SAPISID'):
self.write_debug('Copying __Secure-3PAPISID cookie to SAPISID cookie', only_once=True)
self._set_cookie(
'.youtube.com', 'SAPISID', sapisid_cookie.value, secure=True, expire_time=time_now + 3600)
self.write_debug('Extracted SAPISID cookie', only_once=True)
if self._SAPISID is None:
yt_cookies = self._get_cookies('https://www.youtube.com')
# Sometimes SAPISID cookie isn't present but __Secure-3PAPISID is.
# See: https://github.com/yt-dlp/yt-dlp/issues/393
sapisid_cookie = dict_get(
yt_cookies, ('__Secure-3PAPISID', 'SAPISID'))
if sapisid_cookie and sapisid_cookie.value:
self._SAPISID = sapisid_cookie.value
self.write_debug('Extracted SAPISID cookie')
# SAPISID cookie is required if not already present
if not yt_cookies.get('SAPISID'):
self.write_debug('Copying __Secure-3PAPISID cookie to SAPISID cookie')
self._set_cookie(
'.youtube.com', 'SAPISID', self._SAPISID, secure=True, expire_time=time_now + 3600)
else:
self._SAPISID = False
if not self._SAPISID:
return None
# SAPISIDHASH algorithm from https://stackoverflow.com/a/32065323
sapisidhash = hashlib.sha1(
f'{time_now} {sapisid_cookie.value} {origin}'.encode('utf-8')).hexdigest()
f'{time_now} {self._SAPISID} {origin}'.encode('utf-8')).hexdigest()
return f'SAPISIDHASH {time_now}_{sapisidhash}'
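The SAPISIDHASH construction at the end of the method is self-contained enough to sketch on its own (same scheme as the StackOverflow answer linked in the code; the cookie value in the usage note is a dummy):

```python
import hashlib
import time

def sapisidhash_header(sapisid, origin='https://www.youtube.com'):
    # Authorization header value: SHA-1 over "<unix-time> <SAPISID> <origin>",
    # formatted as 'SAPISIDHASH <time>_<hexdigest>'.
    now = round(time.time())
    digest = hashlib.sha1(f'{now} {sapisid} {origin}'.encode('utf-8')).hexdigest()
    return f'SAPISIDHASH {now}_{digest}'
```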
def _call_api(self, ep, query, video_id, fatal=True, headers=None,
note='Downloading API JSON', errnote='Unable to download API page',
context=None, api_key=None, api_hostname=None, default_client='WEB'):
context=None, api_key=None, api_hostname=None, default_client='web'):
data = {'context': context} if context else {'context': self._extract_context(default_client=default_client)}
data.update(query)
@@ -599,7 +627,7 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
def generate_api_headers(
self, ytcfg=None, identity_token=None, account_syncid=None,
visitor_data=None, api_hostname=None, default_client='WEB', session_index=None):
visitor_data=None, api_hostname=None, default_client='web', session_index=None):
origin = 'https://' + (api_hostname if api_hostname else self._get_innertube_host(default_client))
headers = {
'X-YouTube-Client-Name': compat_str(
@@ -744,7 +772,7 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
def _extract_response(self, item_id, query, note='Downloading API JSON', headers=None,
ytcfg=None, check_get_keys=None, ep='browse', fatal=True, api_hostname=None,
default_client='WEB'):
default_client='web'):
response = None
last_error = None
count = -1
@@ -1043,11 +1071,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
}
_SUBTITLE_FORMATS = ('json3', 'srv1', 'srv2', 'srv3', 'ttml', 'vtt')
_AGE_GATE_REASONS = (
'Sign in to confirm your age',
'This video may be inappropriate for some users.',
'Sorry, this content is age-restricted.')
_GEO_BYPASS = False
IE_NAME = 'youtube'
@@ -1152,8 +1175,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'format': '141/bestaudio[ext=m4a]',
},
},
# Normal age-gate video (embed allowed)
# Age-gate videos. See https://github.com/yt-dlp/yt-dlp/pull/575#issuecomment-888837000
{
'note': 'Embed allowed age-gate video',
'url': 'https://youtube.com/watch?v=HtVdAasjOgU',
'info_dict': {
'id': 'HtVdAasjOgU',
@@ -1168,6 +1192,52 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'age_limit': 18,
},
},
{
'note': 'Age-gate video with embed allowed in public site',
'url': 'https://youtube.com/watch?v=HsUATh_Nc2U',
'info_dict': {
'id': 'HsUATh_Nc2U',
'ext': 'mp4',
'title': 'Godzilla 2 (Official Video)',
'description': 'md5:bf77e03fcae5529475e500129b05668a',
'upload_date': '20200408',
'uploader_id': 'FlyingKitty900',
'uploader': 'FlyingKitty',
'age_limit': 18,
},
},
{
'note': 'Age-gate video embedable only with clientScreen=EMBED',
'url': 'https://youtube.com/watch?v=Tq92D6wQ1mg',
'info_dict': {
'id': 'Tq92D6wQ1mg',
'title': '[MMD] Adios - EVERGLOW [+Motion DL]',
'ext': 'mp4',
'upload_date': '20191227',
'uploader_id': 'UC1yoRdFoFJaCY-AGfD9W0wQ',
'uploader': 'Projekt Melody',
'description': 'md5:17eccca93a786d51bc67646756894066',
'age_limit': 18,
},
},
{
'note': 'Non-age-gated, non-embeddable video',
'url': 'https://youtube.com/watch?v=MeJVWBSsPAY',
'info_dict': {
'id': 'MeJVWBSsPAY',
'ext': 'mp4',
'title': 'OOMPH! - Such Mich Find Mich (Lyrics)',
'uploader': 'Herr Lurik',
'uploader_id': 'st3in234',
'description': 'Fan Video. Music & Lyrics by OOMPH!.',
'upload_date': '20130730',
},
},
{
'note': 'Non-bypassable age-gated video',
'url': 'https://youtube.com/watch?v=Cr381pDsSsA',
'only_matching': True,
},
# video_info is None (https://github.com/ytdl-org/youtube-dl/issues/4421)
# YouTube Red ad is not captured for creator
{
@@ -1336,6 +1406,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'params': {
'skip_download': True,
},
'skip': 'Not multifeed anymore',
},
{
# Multifeed video with comma in title (see https://github.com/ytdl-org/youtube-dl/issues/8536)
@@ -1874,10 +1945,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
funcname = self._search_regex(
(r'\b[cs]\s*&&\s*[adf]\.set\([^,]+\s*,\s*encodeURIComponent\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\(',
r'\b[a-zA-Z0-9]+\s*&&\s*[a-zA-Z0-9]+\.set\([^,]+\s*,\s*encodeURIComponent\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\(',
r'\bm=(?P<sig>[a-zA-Z0-9$]{2})\(decodeURIComponent\(h\.s\)\)',
r'\bc&&\(c=(?P<sig>[a-zA-Z0-9$]{2})\(decodeURIComponent\(c\)\)',
r'(?:\b|[^a-zA-Z0-9$])(?P<sig>[a-zA-Z0-9$]{2})\s*=\s*function\(\s*a\s*\)\s*{\s*a\s*=\s*a\.split\(\s*""\s*\);[a-zA-Z0-9$]{2}\.[a-zA-Z0-9$]{2}\(a,\d+\)',
r'(?:\b|[^a-zA-Z0-9$])(?P<sig>[a-zA-Z0-9$]{2})\s*=\s*function\(\s*a\s*\)\s*{\s*a\s*=\s*a\.split\(\s*""\s*\)',
r'\bm=(?P<sig>[a-zA-Z0-9$]{2,})\(decodeURIComponent\(h\.s\)\)',
r'\bc&&\(c=(?P<sig>[a-zA-Z0-9$]{2,})\(decodeURIComponent\(c\)\)',
r'(?:\b|[^a-zA-Z0-9$])(?P<sig>[a-zA-Z0-9$]{2,})\s*=\s*function\(\s*a\s*\)\s*{\s*a\s*=\s*a\.split\(\s*""\s*\);[a-zA-Z0-9$]{2}\.[a-zA-Z0-9$]{2}\(a,\d+\)',
r'(?:\b|[^a-zA-Z0-9$])(?P<sig>[a-zA-Z0-9$]{2,})\s*=\s*function\(\s*a\s*\)\s*{\s*a\s*=\s*a\.split\(\s*""\s*\)',
r'(?P<sig>[a-zA-Z0-9$]+)\s*=\s*function\(\s*a\s*\)\s*{\s*a\s*=\s*a\.split\(\s*""\s*\)',
# Obsolete patterns
r'(["\'])signature\1\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(',
@@ -2319,7 +2390,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
known_entry_comment_renderers = ('itemSectionRenderer',)
estimated_total = 0
max_comments = int_or_none(self._configuration_arg('max_comments', [''])[0]) or float('inf')
# Force English regardless of account setting to prevent parsing issues
# See: https://github.com/yt-dlp/yt-dlp/issues/532
ytcfg = copy.deepcopy(ytcfg)
traverse_obj(
ytcfg, ('INNERTUBE_CONTEXT', 'client'), expected_type=dict, default={})['hl'] = 'en'
try:
for comment in _real_comment_extract(contents):
if len(comments) >= max_comments:
@@ -2352,28 +2427,20 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
}
@staticmethod
def _get_video_info_params(video_id, client='TVHTML5'):
GVI_CLIENTS = {
'ANDROID': {
'c': 'ANDROID',
'cver': '16.20',
},
'TVHTML5': {
'c': 'TVHTML5',
'cver': '6.20180913',
},
'IOS': {
'c': 'IOS',
'cver': '16.20'
}
}
query = {
'video_id': video_id,
'eurl': 'https://youtube.googleapis.com/v/' + video_id,
'html5': '1'
}
query.update(GVI_CLIENTS.get(client))
return query
def _is_agegated(player_response):
if traverse_obj(player_response, ('playabilityStatus', 'desktopLegacyAgeGateReason')):
return True
reasons = traverse_obj(player_response, ('playabilityStatus', ('status', 'reason')), default=[])
AGE_GATE_REASONS = (
'confirm your age', 'age-restricted', 'inappropriate', # reason
'age_verification_required', 'age_check_required', # status
)
return any(expected in reason for expected in AGE_GATE_REASONS for reason in reasons)
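Without `traverse_obj`, the same age-gate check reads as plain dict access (a sketch for illustration, not the extractor's code):

```python
def is_agegated(player_response):
    status = (player_response or {}).get('playabilityStatus') or {}
    # Legacy desktop age-gate field short-circuits the check.
    if status.get('desktopLegacyAgeGateReason'):
        return True
    # Otherwise look for known markers in 'status' and 'reason'.
    reasons = [v for v in (status.get('status'), status.get('reason')) if v]
    markers = ('confirm your age', 'age-restricted', 'inappropriate',
               'age_verification_required', 'age_check_required')
    return any(m in reason for m in markers for reason in reasons)
```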
@staticmethod
def _is_unplayable(player_response):
return traverse_obj(player_response, ('playabilityStatus', 'status')) == 'UNPLAYABLE'
def _extract_player_response(self, client, video_id, master_ytcfg, player_ytcfg, identity_token, player_url, initial_pr):
@@ -2382,65 +2449,48 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
sts = self._extract_signature_timestamp(video_id, player_url, master_ytcfg, fatal=False)
headers = self.generate_api_headers(
player_ytcfg, identity_token, syncid,
default_client=self._YT_CLIENTS[client], session_index=session_index)
default_client=client, session_index=session_index)
yt_query = {'videoId': video_id}
yt_query.update(self._generate_player_context(sts))
return self._extract_response(
item_id=video_id, ep='player', query=yt_query,
ytcfg=player_ytcfg, headers=headers, fatal=False,
default_client=self._YT_CLIENTS[client],
ytcfg=player_ytcfg, headers=headers, fatal=True,
default_client=client,
note='Downloading %s player API JSON' % client.replace('_', ' ').strip()
) or None
def _extract_age_gated_player_response(self, client, video_id, ytcfg, identity_token, player_url, initial_pr):
# get_video_info endpoint seems to be completely dead
gvi_client = None # self._YT_CLIENTS.get(f'_{client}_agegate')
if gvi_client:
pr = self._parse_json(traverse_obj(
compat_parse_qs(self._download_webpage(
self.http_scheme() + '//www.youtube.com/get_video_info', video_id,
'Refetching age-gated %s info webpage' % gvi_client.lower(),
'unable to download video info webpage', fatal=False,
query=self._get_video_info_params(video_id, client=gvi_client))),
('player_response', 0), expected_type=str) or '{}', video_id)
if pr:
return pr
self.report_warning('Falling back to embedded-only age-gate workaround')
if not self._YT_CLIENTS.get(f'_{client}_embedded'):
return
embed_webpage = None
if client == 'web' and 'configs' not in self._configuration_arg('player_skip'):
embed_webpage = self._download_webpage(
'https://www.youtube.com/embed/%s?html5=1' % video_id,
video_id=video_id, note=f'Downloading age-gated {client} embed config')
ytcfg_age = self.extract_ytcfg(video_id, embed_webpage) or {}
# If we extracted the embed webpage, it'll tell us if we can view the video
embedded_pr = self._parse_json(
traverse_obj(ytcfg_age, ('PLAYER_VARS', 'embedded_player_response'), expected_type=str) or '{}',
video_id=video_id)
embedded_ps_reason = traverse_obj(embedded_pr, ('playabilityStatus', 'reason'), expected_type=str) or ''
if embedded_ps_reason in self._AGE_GATE_REASONS:
return
return self._extract_player_response(
f'_{client}_embedded', video_id,
ytcfg_age or ytcfg, ytcfg_age if client == 'web' else {},
identity_token, player_url, initial_pr)
def _get_requested_clients(self, url, smuggled_data):
requested_clients = [client for client in self._configuration_arg('player_client')
if client[:0] != '_' and client in self._YT_CLIENTS]
requested_clients = []
allowed_clients = sorted(
[client for client in INNERTUBE_CLIENTS.keys() if client[:1] != '_'],
key=lambda client: INNERTUBE_CLIENTS[client]['priority'], reverse=True)
for client in self._configuration_arg('player_client'):
if client in allowed_clients:
requested_clients.append(client)
elif client == 'all':
requested_clients.extend(allowed_clients)
else:
self.report_warning(f'Skipping unsupported client {client}')
if not requested_clients:
requested_clients = ['android', 'web']
if smuggled_data.get('is_music_url') or self.is_music_url(url):
requested_clients.extend(
f'{client}_music' for client in requested_clients if not client.endswith('_music'))
f'{client}_music' for client in requested_clients if f'{client}_music' in INNERTUBE_CLIENTS)
return orderedSet(requested_clients)
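The client-selection logic above can be sketched standalone. The client names and priorities below are illustrative stand-ins for the real INNERTUBE_CLIENTS table, not its actual contents:

```python
# Toy stand-in for INNERTUBE_CLIENTS; the real table carries full client configs.
CLIENTS = {
    'android': {'priority': 100},
    'web': {'priority': 90},
    'web_music': {'priority': 80},
    '_hidden': {'priority': 0},  # leading '_' marks internal, non-selectable clients
}

def get_requested_clients(requested, warn=lambda msg: None):
    # Selectable clients, highest priority first
    allowed = sorted(
        (c for c in CLIENTS if not c.startswith('_')),
        key=lambda c: CLIENTS[c]['priority'], reverse=True)
    clients = []
    for client in requested:
        if client in allowed:
            clients.append(client)
        elif client == 'all':
            clients.extend(allowed)
        else:
            warn(f'Skipping unsupported client {client}')
    # orderedSet: de-duplicate while preserving order; default when nothing valid
    return list(dict.fromkeys(clients)) or ['android', 'web']
```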
def _extract_player_ytcfg(self, client, video_id):
url = {
'web_music': 'https://music.youtube.com',
'web_embedded': f'https://www.youtube.com/embed/{video_id}?html5=1'
}.get(client)
if not url:
return {}
webpage = self._download_webpage(url, video_id, fatal=False, note=f'Downloading {client} config')
return self.extract_ytcfg(video_id, webpage) or {}
def _extract_player_responses(self, clients, video_id, webpage, master_ytcfg, player_url, identity_token):
initial_pr = None
if webpage:
@@ -2448,40 +2498,63 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
webpage, self._YT_INITIAL_PLAYER_RESPONSE_RE,
video_id, 'initial player response')
for client in clients:
player_ytcfg = master_ytcfg if client == 'web' else {}
if client == 'web' and initial_pr:
pr = initial_pr
else:
if client == 'web_music' and 'configs' not in self._configuration_arg('player_skip'):
ytm_webpage = self._download_webpage(
'https://music.youtube.com',
video_id, fatal=False, note='Downloading remix client config')
player_ytcfg = self.extract_ytcfg(video_id, ytm_webpage) or {}
pr = self._extract_player_response(
client, video_id, player_ytcfg or master_ytcfg, player_ytcfg, identity_token, player_url, initial_pr)
if pr:
yield pr
if traverse_obj(pr, ('playabilityStatus', 'reason')) in self._AGE_GATE_REASONS:
pr = self._extract_age_gated_player_response(
client, video_id, player_ytcfg or master_ytcfg, identity_token, player_url, initial_pr)
if pr:
yield pr
original_clients = clients
clients = clients[::-1]
def append_client(client_name):
if client_name in INNERTUBE_CLIENTS and client_name not in original_clients:
clients.append(client_name)
# Android player_response does not have microFormats which are needed for
# extraction of some data. So we return the initial_pr with formats
# stripped out even if not requested by the user
# See: https://github.com/yt-dlp/yt-dlp/issues/501
if initial_pr and 'web' not in clients:
initial_pr['streamingData'] = None
yield initial_pr
yielded_pr = False
if initial_pr:
pr = dict(initial_pr)
pr['streamingData'] = None
yielded_pr = True
yield pr
last_error = None
while clients:
client = clients.pop()
player_ytcfg = master_ytcfg if client == 'web' else {}
if 'configs' not in self._configuration_arg('player_skip'):
player_ytcfg = self._extract_player_ytcfg(client, video_id) or player_ytcfg
try:
pr = initial_pr if client == 'web' and initial_pr else self._extract_player_response(
client, video_id, player_ytcfg or master_ytcfg, player_ytcfg, identity_token, player_url, initial_pr)
except ExtractorError as e:
if last_error:
self.report_warning(last_error)
last_error = e
continue
if pr:
yielded_pr = True
yield pr
# creator clients can bypass AGE_VERIFICATION_REQUIRED if logged in
if client.endswith('_agegate') and self._is_unplayable(pr) and self._generate_sapisidhash_header():
append_client(client.replace('_agegate', '_creator'))
elif self._is_agegated(pr):
append_client(f'{client}_agegate')
if last_error:
if not yielded_pr:
raise last_error
self.report_warning(last_error)
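The age-gate escalation in the loop above can be modelled as a small stack machine: clients are popped in order, and an age-gated response pushes a follow-up `<client>_agegate` variant (or `<client>_creator` when authenticated). This is purely illustrative; the known-client set here is a made-up subset:

```python
def escalate(initial_clients, is_agegated, logged_in=False):
    # Hypothetical known-client set, standing in for INNERTUBE_CLIENTS
    known = {'web', 'android', 'web_agegate', 'android_agegate',
             'web_creator', 'android_creator'}
    original, clients = list(initial_clients), list(initial_clients)[::-1]
    tried = []

    def append_client(name):
        if name in known and name not in original:
            clients.append(name)

    while clients:
        client = clients.pop()
        tried.append(client)
        # creator clients can bypass AGE_VERIFICATION_REQUIRED if logged in
        if client.endswith('_agegate') and logged_in:
            append_client(client.replace('_agegate', '_creator'))
        elif is_agegated(client):
            append_client(f'{client}_agegate')
    return tried
```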
def _extract_formats(self, streaming_data, video_id, player_url, is_live):
itags, stream_ids = [], []
itag_qualities = {}
itag_qualities, res_qualities = {}, {}
q = qualities([
# "tiny" is the smallest video-only format. But some audio-only formats
# was also labeled "tiny". It is not clear if such formats still exist
'tiny', 'audio_quality_low', 'audio_quality_medium', 'audio_quality_high', # Audio only formats
# Normally tiny is the smallest video-only formats. But
# audio-only formats with unknown quality may get tagged as tiny
'tiny',
'audio_quality_ultralow', 'audio_quality_low', 'audio_quality_medium', 'audio_quality_high', # Audio only formats
'small', 'medium', 'large', 'hd720', 'hd1080', 'hd1440', 'hd2160', 'hd2880', 'highres'
])
streaming_formats = traverse_obj(streaming_data, (..., ('formats', 'adaptiveFormats'), ...), default=[])
@@ -2497,10 +2570,18 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
continue
quality = fmt.get('quality')
height = int_or_none(fmt.get('height'))
if quality == 'tiny' or not quality:
quality = fmt.get('audioQuality', '').lower() or quality
if itag and quality:
itag_qualities[itag] = quality
# The 3gp format (17) in android client has a quality of "small",
# but is actually worse than other formats
if itag == '17':
quality = 'tiny'
if quality:
if itag:
itag_qualities[itag] = quality
if height:
res_qualities[height] = quality
# FORMAT_STREAM_TYPE_OTF(otf=1) requires downloading the init fragment
# (adding `&sq=0` to the URL) and parsing emsg box to determine the
# number of fragments that would subsequently be requested with (`&sq=N`)
@@ -2531,13 +2612,14 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'filesize': int_or_none(fmt.get('contentLength')),
'format_id': itag,
'format_note': ', '.join(filter(None, (
audio_track.get('displayName'), fmt.get('qualityLabel') or quality))),
audio_track.get('displayName'),
fmt.get('qualityLabel') or quality.replace('audio_quality_', '')))),
'fps': int_or_none(fmt.get('fps')),
'height': int_or_none(fmt.get('height')),
'height': height,
'quality': q(quality),
'tbr': tbr,
'url': fmt_url,
'width': fmt.get('width'),
'width': int_or_none(fmt.get('width')),
'language': audio_track.get('id', '').split('.')[0],
}
mime_mobj = re.match(
@@ -2545,11 +2627,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if mime_mobj:
dct['ext'] = mimetype2ext(mime_mobj.group(1))
dct.update(parse_codecs(mime_mobj.group(2)))
# The 3gp format in android client has a quality of "small",
# but is actually worse than all other formats
if dct['ext'] == '3gp':
dct['quality'] = q('tiny')
dct['preference'] = -10
no_audio = dct.get('acodec') == 'none'
no_video = dct.get('vcodec') == 'none'
if no_audio:
@@ -2566,14 +2643,21 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
yield dct
skip_manifests = self._configuration_arg('skip')
get_dash = not is_live and 'dash' not in skip_manifests and self.get_param('youtube_include_dash_manifest', True)
get_dash = (
(not is_live or self._configuration_arg('include_live_dash'))
and 'dash' not in skip_manifests and self.get_param('youtube_include_dash_manifest', True))
get_hls = 'hls' not in skip_manifests and self.get_param('youtube_include_hls_manifest', True)
def guess_quality(f):
for val, qdict in ((f.get('format_id'), itag_qualities), (f.get('height'), res_qualities)):
if val in qdict:
return q(qdict[val])
return -1
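The fallback above (itag quality first, then resolution, then unknown) can be exercised with a simplified model of yt-dlp's `qualities()` helper. The itag/height tables below are example entries only:

```python
# Simplified model of yt-dlp's qualities() helper: later entries rank higher.
def qualities(order):
    return lambda q: order.index(q) if q in order else -1

ORDER = ['tiny', 'audio_quality_low', 'small', 'medium', 'large', 'hd720', 'hd1080']
q = qualities(ORDER)

itag_qualities = {'18': 'medium'}  # example entries only
res_qualities = {720: 'hd720'}

def guess_quality(f):
    # Prefer the itag-derived quality; fall back to resolution, then unknown (-1)
    for val, qdict in ((f.get('format_id'), itag_qualities),
                       (f.get('height'), res_qualities)):
        if val in qdict:
            return q(qdict[val])
    return -1
```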
for sd in streaming_data:
hls_manifest_url = get_hls and sd.get('hlsManifestUrl')
if hls_manifest_url:
for f in self._extract_m3u8_formats(
hls_manifest_url, video_id, 'mp4', fatal=False):
for f in self._extract_m3u8_formats(hls_manifest_url, video_id, 'mp4', fatal=False):
itag = self._search_regex(
r'/itag/(\d+)', f['url'], 'itag', default=None)
if itag in itags:
@@ -2581,19 +2665,18 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if itag:
f['format_id'] = itag
itags.append(itag)
f['quality'] = guess_quality(f)
yield f
dash_manifest_url = get_dash and sd.get('dashManifestUrl')
if dash_manifest_url:
for f in self._extract_mpd_formats(
dash_manifest_url, video_id, fatal=False):
for f in self._extract_mpd_formats(dash_manifest_url, video_id, fatal=False):
itag = f['format_id']
if itag in itags:
continue
if itag:
itags.append(itag)
if itag in itag_qualities:
f['quality'] = q(itag_qualities[itag])
f['quality'] = guess_quality(f)
filesize = int_or_none(self._search_regex(
r'/clen/(\d+)', f.get('fragment_base_url')
or f['url'], 'file size', default=None))
@@ -2718,13 +2801,14 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
self.raise_no_formats(reason, expected=True)
for f in formats:
# TODO: detect if throttled
if '&n=' in f['url']: # possibly throttled
if '&c=WEB&' in f['url'] and '&ratebypass=yes&' not in f['url']: # throttled
f['source_preference'] = -10
# note = f.get('format_note')
# f['format_note'] = f'{note} (throttled)' if note else '(throttled)'
# TODO: this method is not reliable
f['format_note'] = format_field(f, 'format_note', '%s ') + '(maybe throttled)'
self._sort_formats(formats)
# Source is given priority since formats that throttle are given lower source_preference
# When throttling issue is fully fixed, remove this
self._sort_formats(formats, ('quality', 'height', 'fps', 'source'))
keywords = get_first(video_details, 'keywords', expected_type=list) or []
if not keywords and webpage:
@@ -3391,7 +3475,7 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
}, {
'url': 'https://www.youtube.com/channel/UCoMdktPbSTixAyNGwb-UYkQ/live',
'info_dict': {
'id': 'FMtPN8yp5LU', # This will keep changing
'id': '3yImotZU3tw', # This will keep changing
'ext': 'mp4',
'title': compat_str,
'uploader': 'Sky News',


@@ -23,7 +23,7 @@ from .cookies import SUPPORTED_BROWSERS
from .version import __version__
from .downloader.external import list_external_downloaders
from .postprocessor.ffmpeg import (
from .postprocessor import (
FFmpegExtractAudioPP,
FFmpegSubtitlesConvertorPP,
FFmpegThumbnailsConvertorPP,
@@ -190,15 +190,15 @@ def parseOpts(overrideArguments=None):
general.add_option(
'--dump-user-agent',
action='store_true', dest='dump_user_agent', default=False,
help='Display the current browser identification')
help='Display the current user-agent and exit')
general.add_option(
'--list-extractors',
action='store_true', dest='list_extractors', default=False,
help='List all supported extractors')
help='List all supported extractors and exit')
general.add_option(
'--extractor-descriptions',
action='store_true', dest='list_extractor_descriptions', default=False,
help='Output descriptions of all supported extractors')
help='Output descriptions of all supported extractors and exit')
general.add_option(
'--force-generic-extractor',
action='store_true', dest='force_generic_extractor', default=False,
@@ -223,12 +223,6 @@ def parseOpts(overrideArguments=None):
'--flat-playlist',
action='store_const', dest='extract_flat', const='in_playlist', default=False,
help='Do not extract the videos of a playlist, only list them')
general.add_option(
'--flat-videos',
action='store_true', dest='extract_flat',
# help='Do not resolve the video urls')
# doesn't work
help=optparse.SUPPRESS_HELP)
general.add_option(
'--no-flat-playlist',
action='store_false', dest='extract_flat',
@@ -375,22 +369,17 @@ def parseOpts(overrideArguments=None):
'--match-filter',
metavar='FILTER', dest='match_filter', default=None,
help=(
'Generic video filter. '
'Specify any key (see "OUTPUT TEMPLATE" for a list of available keys) to '
'match if the key is present, '
'!key to check if the key is not present, '
'key>NUMBER (like "view_count > 12", also works with '
'>=, <, <=, !=, =) to compare against a number, '
'key = \'LITERAL\' (like "uploader = \'Mike Smith\'", also works with !=) '
'to match against a string literal '
'and & to require multiple matches. '
'Values which are not known are excluded unless you '
'put a question mark (?) after the operator. '
'For example, to only match videos that have been liked more than '
'100 times and disliked less than 50 times (or the dislike '
'functionality is not available at the given service), but who '
'also have a description, use --match-filter '
'"like_count > 100 & dislike_count <? 50 & description"'))
'Generic video filter. Any field (see "OUTPUT TEMPLATE") can be compared with a '
'number or a string using the operators defined in "Filtering formats". '
'You can also simply specify a field to match if the field is present '
'and "!field" to check if the field is not present. In addition, '
'Python style regular expression matching can be done using "~=", '
'and multiple filters can be checked with "&". '
'Use a "\\" to escape "&" or quotes if needed. Eg: --match-filter '
r'"!is_live & like_count>?100 & description~=\'(?i)\bcats \& dogs\b\'" '
'matches only videos that are not live, have a like count of more than 100 '
'(or the like field is not available), and also have a description '
'that contains the phrase "cats & dogs" (ignoring case)'))
selection.add_option(
'--no-match-filter',
metavar='FILTER', dest='match_filter', action='store_const', const=None,
@@ -537,7 +526,7 @@ def parseOpts(overrideArguments=None):
video_format.add_option(
'-F', '--list-formats',
action='store_true', dest='listformats',
help='List all available formats of requested videos')
help='List available formats of each video. Simulate unless --no-simulate is used')
video_format.add_option(
'--list-formats-as-table',
action='store_true', dest='listformats_table', default=True,
@@ -588,7 +577,7 @@ def parseOpts(overrideArguments=None):
subtitles.add_option(
'--list-subs',
action='store_true', dest='listsubtitles', default=False,
help='List all available subtitles for the video')
help='List available subtitles of each video. Simulate unless --no-simulate is used')
subtitles.add_option(
'--sub-format',
action='store', dest='subtitlesformat', metavar='FORMAT', default='best',
@@ -706,9 +695,8 @@ def parseOpts(overrideArguments=None):
callback_kwargs={
'allowed_keys': 'http|ftp|m3u8|dash|rtsp|rtmp|mms',
'default_key': 'default',
'process': lambda x: x.strip()
},
help=(
'process': str.strip
}, help=(
'Name or path of the external downloader to use (optionally) prefixed by '
'the protocols (http, ftp, m3u8, dash, rtsp, rtmp, mms) to use it for. '
'Currently supports native, %s (Recommended: aria2c). '
@@ -724,8 +712,7 @@ def parseOpts(overrideArguments=None):
'allowed_keys': '|'.join(list_external_downloaders()),
'default_key': 'default',
'process': compat_shlex_split
},
help=(
}, help=(
'Give these arguments to the external downloader. '
'Specify the downloader name and the arguments separated by a colon ":". '
'You can use this option multiple times to give different arguments to different downloaders '
@@ -788,21 +775,25 @@ def parseOpts(overrideArguments=None):
verbosity.add_option(
'-q', '--quiet',
action='store_true', dest='quiet', default=False,
help='Activate quiet mode')
help='Activate quiet mode. If used with --verbose, print the log to stderr')
verbosity.add_option(
'--no-warnings',
dest='no_warnings', action='store_true', default=False,
help='Ignore warnings')
verbosity.add_option(
'-s', '--simulate',
action='store_true', dest='simulate', default=False,
action='store_true', dest='simulate', default=None,
help='Do not download the video and do not write anything to disk')
verbosity.add_option(
'--no-simulate',
action='store_false', dest='simulate',
help='Download the video even if printing/listing options are used')
verbosity.add_option(
'--ignore-no-formats-error',
action='store_true', dest='ignore_no_formats_error', default=False,
help=(
'Ignore "No video formats" error. Useful for extracting metadata '
'even if the video is not actually available for download (experimental)'))
'even if the videos are not actually available for download (experimental)'))
verbosity.add_option(
'--no-ignore-no-formats-error',
action='store_false', dest='ignore_no_formats_error',
@@ -812,12 +803,11 @@ def parseOpts(overrideArguments=None):
action='store_true', dest='skip_download', default=False,
help='Do not download the video but write all related files (Alias: --no-download)')
verbosity.add_option(
'-O', '--print', metavar='TEMPLATE',
action='callback', dest='forceprint', type='str', default=[],
callback=_list_from_options_callback, callback_kwargs={'delim': None},
'-O', '--print',
metavar='TEMPLATE', action='append', dest='forceprint',
help=(
'Simulate, quiet but print the given fields. Either a field name '
'or similar formatting as the output template can be used'))
'Quiet, but print the given fields for each video. Simulate unless --no-simulate is used. '
'Either a field name or same syntax as the output template can be used'))
verbosity.add_option(
'-g', '--get-url',
action='store_true', dest='geturl', default=False,
@@ -853,17 +843,17 @@ def parseOpts(overrideArguments=None):
verbosity.add_option(
'-j', '--dump-json',
action='store_true', dest='dumpjson', default=False,
help='Simulate, quiet but print JSON information. See "OUTPUT TEMPLATE" for a description of available keys')
help='Quiet, but print JSON information for each video. Simulate unless --no-simulate is used. See "OUTPUT TEMPLATE" for a description of available keys')
verbosity.add_option(
'-J', '--dump-single-json',
action='store_true', dest='dump_single_json', default=False,
help=(
'Simulate, quiet but print JSON information for each command-line argument. '
'If the URL refers to a playlist, dump the whole playlist information in a single line'))
'Quiet, but print JSON information for each url or infojson passed. Simulate unless --no-simulate is used. '
'If the URL refers to a playlist, the whole playlist information is dumped in a single line'))
verbosity.add_option(
'--print-json',
action='store_true', dest='print_json', default=False,
help='Be quiet and print the video information as JSON (video is still being downloaded)')
help=optparse.SUPPRESS_HELP)
verbosity.add_option(
'--force-write-archive', '--force-write-download-archive', '--force-download-archive',
action='store_true', dest='force_write_download_archive', default=False,
@@ -924,14 +914,16 @@ def parseOpts(overrideArguments=None):
action='store_true', dest='useid', help=optparse.SUPPRESS_HELP)
filesystem.add_option(
'-P', '--paths',
metavar='TYPES:PATH', dest='paths', default={}, type='str',
metavar='[TYPES:]PATH', dest='paths', default={}, type='str',
action='callback', callback=_dict_from_options_callback,
callback_kwargs={'allowed_keys': 'home|temp|%s' % '|'.join(OUTTMPL_TYPES.keys())},
help=(
callback_kwargs={
'allowed_keys': 'home|temp|%s' % '|'.join(OUTTMPL_TYPES.keys()),
'default_key': 'home'
}, help=(
'The paths where the files should be downloaded. '
'Specify the type of file and the path separated by a colon ":". '
'All the same types as --output are supported. '
'Additionally, you can also provide "home" and "temp" paths. '
'Additionally, you can also provide "home" (default) and "temp" paths. '
'All intermediary files are first downloaded to the temp path and '
'then the final files are moved over to the home path after download is finished. '
'This option is ignored if --output is an absolute path'))
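The `[TYPES:]PATH` syntax with a default key, as used by `--paths` above, can be sketched in isolation. This is a hedged approximation of what `_dict_from_options_callback` does per value; the real callback also validates keys and merges repeated options:

```python
import re

def parse_keyed_value(value, allowed_keys='home|temp', default_key='home'):
    # "[KEY:]VALUE" -> (KEY or default_key, VALUE)
    mobj = re.fullmatch(rf'(?:({allowed_keys}):)?(.*)', value, flags=re.DOTALL)
    return (mobj.group(1) or default_key), mobj.group(2)
```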
@@ -942,8 +934,7 @@ def parseOpts(overrideArguments=None):
callback_kwargs={
'allowed_keys': '|'.join(OUTTMPL_TYPES.keys()),
'default_key': 'default'
},
help='Output filename template; see "OUTPUT TEMPLATE" for details')
}, help='Output filename template; see "OUTPUT TEMPLATE" for details')
filesystem.add_option(
'--output-na-placeholder',
dest='outtmpl_na_placeholder', metavar='TEXT', default='NA',
@@ -1062,7 +1053,7 @@ def parseOpts(overrideArguments=None):
help='Do not write playlist metadata when using --write-info-json, --write-description etc.')
filesystem.add_option(
'--clean-infojson',
action='store_true', dest='clean_infojson', default=True,
action='store_true', dest='clean_infojson', default=None,
help=(
'Remove some private fields such as filenames from the infojson. '
'Note that it could still contain some personal information (default)'))
@@ -1133,7 +1124,7 @@ def parseOpts(overrideArguments=None):
thumbnail.add_option(
'--list-thumbnails',
action='store_true', dest='list_thumbnails', default=False,
help='Simulate and list all available thumbnail formats')
help='List available thumbnails of each video. Simulate unless --no-simulate is used')
link = optparse.OptionGroup(parser, 'Internet Shortcut Options')
link.add_option(
@@ -1189,8 +1180,7 @@ def parseOpts(overrideArguments=None):
'allowed_keys': r'\w+(?:\+\w+)?', 'default_key': 'default-compat',
'process': compat_shlex_split,
'multiple_keys': False
},
help=(
}, help=(
'Give these arguments to the postprocessors. '
'Specify the postprocessor/executable name and the arguments separated by a colon ":" '
'to give the argument to the specified postprocessor/executable. Supported PP are: '
@@ -1250,10 +1240,14 @@ def parseOpts(overrideArguments=None):
help=optparse.SUPPRESS_HELP)
postproc.add_option(
'--parse-metadata',
metavar='FROM:TO', dest='metafromfield', action='append',
metavar='FROM:TO', dest='parse_metadata', action='append',
help=(
'Parse additional metadata like title/artist from other fields; '
'see "MODIFYING METADATA" for details'))
postproc.add_option(
'--replace-in-metadata',
dest='parse_metadata', metavar='FIELDS REGEX REPLACE', action='append', nargs=3,
help='Replace text in a metadata field using the given regex. This option can be used multiple times')
postproc.add_option(
'--xattrs',
action='store_true', dest='xattrs', default=False,
@@ -1280,17 +1274,29 @@ def parseOpts(overrideArguments=None):
dest='ffmpeg_location',
help='Location of the ffmpeg binary; either the path to the binary or its containing directory')
postproc.add_option(
'--exec',
metavar='CMD', dest='exec_cmd',
'--exec', metavar='CMD',
action='append', dest='exec_cmd',
help=(
'Execute a command on the file after downloading and post-processing. '
'Similar syntax to the output template can be used to pass any field as arguments to the command. '
'Same syntax as the output template can be used to pass any field as arguments to the command. '
'An additional field "filepath" that contains the final path of the downloaded file is also available. '
'If no fields are passed, "%(filepath)s" is appended to the end of the command'))
'If no fields are passed, %(filepath)q is appended to the end of the command. '
'This option can be used multiple times'))
postproc.add_option(
'--exec-before-download',
metavar='CMD', dest='exec_before_dl_cmd',
help='Execute a command before the actual download. The syntax is the same as --exec')
'--no-exec',
action='store_const', dest='exec_cmd', const=[],
help='Remove any previously defined --exec')
postproc.add_option(
'--exec-before-download', metavar='CMD',
action='append', dest='exec_before_dl_cmd',
help=(
'Execute a command before the actual download. '
'The syntax is the same as --exec but "filepath" is not available. '
'This option can be used multiple times'))
postproc.add_option(
'--no-exec-before-download',
action='store_const', dest='exec_before_dl_cmd', const=[],
help='Remove any previously defined --exec-before-download')
postproc.add_option(
'--convert-subs', '--convert-sub', '--convert-subtitles',
metavar='FORMAT', dest='convertsubtitles', default=None,
@@ -1374,7 +1380,7 @@ def parseOpts(overrideArguments=None):
'--no-hls-split-discontinuity',
dest='hls_split_discontinuity', action='store_false',
help='Do not split HLS playlists to different formats at discontinuities such as ad breaks (default)')
_extractor_arg_parser = lambda key, vals='': (key.strip().lower(), [val.strip() for val in vals.split(',')])
_extractor_arg_parser = lambda key, vals='': (key.strip().lower().replace('-', '_'), [val.strip() for val in vals.split(',')])
extractor.add_option(
'--extractor-args',
metavar='KEY:ARGS', dest='extractor_args', default={}, type='str',
@@ -1383,8 +1389,7 @@ def parseOpts(overrideArguments=None):
'multiple_keys': False,
'process': lambda val: dict(
_extractor_arg_parser(*arg.split('=', 1)) for arg in val.split(';'))
},
help=(
}, help=(
'Pass these arguments to the extractor. See "EXTRACTOR ARGUMENTS" for details. '
'You can use this option multiple times to give arguments for different extractors'))
extractor.add_option(
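The `_extractor_arg_parser` lambda above, combined with the `process` callback, turns a string like `"a-key=1,2;b=x"` into `{'a_key': ['1', '2'], 'b': ['x']}`. A standalone sketch:

```python
def _extractor_arg_parser(key, vals=''):
    # Keys are lower-cased and '-' is mapped to '_'; values are comma-split
    return key.strip().lower().replace('-', '_'), [v.strip() for v in vals.split(',')]

def parse_extractor_args(val):
    # Semicolon-separated "key=values" pairs, as in the 'process' callback above
    return dict(_extractor_arg_parser(*arg.split('=', 1)) for arg in val.split(';'))
```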


@@ -19,9 +19,12 @@ from .ffmpeg import (
FFmpegVideoRemuxerPP,
)
from .xattrpp import XAttrMetadataPP
from .execafterdownload import ExecAfterDownloadPP
from .metadatafromfield import MetadataFromFieldPP
from .metadatafromfield import MetadataFromTitlePP
from .exec import ExecPP, ExecAfterDownloadPP
from .metadataparser import (
MetadataFromFieldPP,
MetadataFromTitlePP,
MetadataParserPP,
)
from .movefilesafterdownload import MoveFilesAfterDownloadPP
from .sponskrub import SponSkrubPP
@@ -33,6 +36,7 @@ def get_postprocessor(key):
__all__ = [
'FFmpegPostProcessor',
'EmbedThumbnailPP',
'ExecPP',
'ExecAfterDownloadPP',
'FFmpegEmbedSubtitlePP',
'FFmpegExtractAudioPP',
@@ -48,6 +52,7 @@ __all__ = [
'FFmpegThumbnailsConvertorPP',
'FFmpegVideoConvertorPP',
'FFmpegVideoRemuxerPP',
'MetadataParserPP',
'MetadataFromFieldPP',
'MetadataFromTitlePP',
'MoveFilesAfterDownloadPP',


@@ -7,23 +7,20 @@ from ..compat import compat_shlex_quote
from ..utils import (
encodeArgument,
PostProcessingError,
variadic,
)
class ExecAfterDownloadPP(PostProcessor):
class ExecPP(PostProcessor):
def __init__(self, downloader, exec_cmd):
super(ExecAfterDownloadPP, self).__init__(downloader)
self.exec_cmd = exec_cmd
@classmethod
def pp_key(cls):
return 'Exec'
PostProcessor.__init__(self, downloader)
self.exec_cmd = variadic(exec_cmd)
def parse_cmd(self, cmd, info):
tmpl, tmpl_dict = self._downloader.prepare_outtmpl(cmd, info)
if tmpl_dict: # if there are no replacements, tmpl_dict = {}
return tmpl % tmpl_dict
return self._downloader.escape_outtmpl(tmpl) % tmpl_dict
# If no replacements are found, replace {} for backward compatibility
if '{}' not in cmd:
@@ -32,9 +29,14 @@ class ExecAfterDownloadPP(PostProcessor):
info.get('filepath') or info['_filename']))
def run(self, info):
cmd = self.parse_cmd(self.exec_cmd, info)
self.to_screen('Executing command: %s' % cmd)
retCode = subprocess.call(encodeArgument(cmd), shell=True)
if retCode != 0:
raise PostProcessingError('Command returned error code %d' % retCode)
for tmpl in self.exec_cmd:
cmd = self.parse_cmd(tmpl, info)
self.to_screen('Executing command: %s' % cmd)
retCode = subprocess.call(encodeArgument(cmd), shell=True)
if retCode != 0:
raise PostProcessingError('Command returned error code %d' % retCode)
return [], info
class ExecAfterDownloadPP(ExecPP): # for backward compatibility
pass
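The `{}` backward-compatibility step in `ExecPP.parse_cmd` above can be illustrated on its own: when the command template has no output-template fields, `{}` (appended if absent) is replaced with the quoted file path. A minimal sketch, not the real implementation:

```python
import shlex

def expand_cmd(cmd, filepath):
    # Append a placeholder when the command names no file at all
    if '{}' not in cmd:
        cmd += ' {}'
    return cmd.replace('{}', shlex.quote(filepath))
```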


@@ -109,21 +109,19 @@ class FFmpegPostProcessor(PostProcessor):
'Continuing without ffmpeg.' % (location))
self._versions = {}
return
elif not os.path.isdir(location):
elif os.path.isdir(location):
dirname, basename = location, None
else:
basename = os.path.splitext(os.path.basename(location))[0]
if basename not in programs:
self.report_warning(
'Cannot identify executable %s, its basename should be one of %s. '
'Continuing without ffmpeg.' %
(location, ', '.join(programs)))
self._versions = {}
return None
location = os.path.dirname(os.path.abspath(location))
basename = next((p for p in programs if basename.startswith(p)), 'ffmpeg')
dirname = os.path.dirname(os.path.abspath(location))
if basename in ('ffmpeg', 'ffprobe'):
prefer_ffmpeg = True
self._paths = dict(
(p, os.path.join(location, p)) for p in programs)
(p, os.path.join(dirname, p)) for p in programs)
if basename:
self._paths[basename] = location
self._versions = dict(
(p, get_ffmpeg_version(self._paths[p])) for p in programs)
if self._versions is None:


@@ -1,74 +0,0 @@
from __future__ import unicode_literals
import re
from .common import PostProcessor
from ..compat import compat_str
class MetadataFromFieldPP(PostProcessor):
regex = r'(?P<in>.*?)(?<!\\):(?P<out>.+)$'
def __init__(self, downloader, formats):
PostProcessor.__init__(self, downloader)
assert isinstance(formats, (list, tuple))
self._data = []
for f in formats:
assert isinstance(f, compat_str)
match = re.match(self.regex, f)
assert match is not None
inp = match.group('in').replace('\\:', ':')
self._data.append({
'in': inp,
'out': match.group('out'),
'tmpl': self.field_to_template(inp),
'regex': self.format_to_regex(match.group('out')),
})
@staticmethod
def field_to_template(tmpl):
if re.match(r'[a-zA-Z_]+$', tmpl):
return '%%(%s)s' % tmpl
return tmpl
@staticmethod
def format_to_regex(fmt):
r"""
Converts a string like
'%(title)s - %(artist)s'
to a regex like
'(?P<title>.+)\ \-\ (?P<artist>.+)'
"""
if not re.search(r'%\(\w+\)s', fmt):
return fmt
lastpos = 0
regex = ''
# replace %(..)s with regex group and escape other string parts
for match in re.finditer(r'%\((\w+)\)s', fmt):
regex += re.escape(fmt[lastpos:match.start()])
regex += r'(?P<%s>.+)' % match.group(1)
lastpos = match.end()
if lastpos < len(fmt):
regex += re.escape(fmt[lastpos:])
return regex
def run(self, info):
for dictn in self._data:
tmpl, tmpl_dict = self._downloader.prepare_outtmpl(dictn['tmpl'], info)
data_to_parse = tmpl % tmpl_dict
self.write_debug('Searching for r"%s" in %s' % (dictn['regex'], dictn['tmpl']))
match = re.search(dictn['regex'], data_to_parse)
if match is None:
self.report_warning('Could not interpret video %s as "%s"' % (dictn['in'], dictn['out']))
continue
for attribute, value in match.groupdict().items():
info[attribute] = value
self.to_screen('parsed %s from "%s": %s' % (attribute, dictn['tmpl'], value if value is not None else 'NA'))
return [], info
class MetadataFromTitlePP(MetadataFromFieldPP): # for backward compatibility
def __init__(self, downloader, titleformat):
super(MetadataFromTitlePP, self).__init__(downloader, ['%%(title)s:%s' % titleformat])
self._titleformat = titleformat
self._titleregex = self._data[0]['regex']

@@ -0,0 +1,117 @@
import re
from enum import Enum
from .common import PostProcessor
class MetadataParserPP(PostProcessor):
class Actions(Enum):
INTERPRET = 'interpretter'
REPLACE = 'replacer'
def __init__(self, downloader, actions):
PostProcessor.__init__(self, downloader)
self._actions = []
for f in actions:
action = f[0]
assert isinstance(action, self.Actions)
self._actions.append(getattr(self, action._value_)(*f[1:]))
@classmethod
def validate_action(cls, action, *data):
''' Each action can be:
(Actions.INTERPRET, from, to) OR
(Actions.REPLACE, field, search, replace)
'''
if not isinstance(action, cls.Actions):
raise ValueError(f'{action!r} is not a valid action')
getattr(cls, action._value_)(cls, *data)
@staticmethod
def field_to_template(tmpl):
if re.match(r'[a-zA-Z_]+$', tmpl):
return f'%({tmpl})s'
return tmpl
@staticmethod
def format_to_regex(fmt):
r"""
Converts a string like
'%(title)s - %(artist)s'
to a regex like
'(?P<title>.+)\ \-\ (?P<artist>.+)'
"""
if not re.search(r'%\(\w+\)s', fmt):
return fmt
lastpos = 0
regex = ''
# replace %(..)s with regex group and escape other string parts
for match in re.finditer(r'%\((\w+)\)s', fmt):
regex += re.escape(fmt[lastpos:match.start()])
regex += rf'(?P<{match.group(1)}>.+)'
lastpos = match.end()
if lastpos < len(fmt):
regex += re.escape(fmt[lastpos:])
return regex
def run(self, info):
for f in self._actions:
f(info)
return [], info
def interpretter(self, inp, out):
def f(info):
outtmpl, tmpl_dict = self._downloader.prepare_outtmpl(template, info)
data_to_parse = self._downloader.escape_outtmpl(outtmpl) % tmpl_dict
self.write_debug(f'Searching for r{out_re.pattern!r} in {template!r}')
match = out_re.search(data_to_parse)
if match is None:
self.report_warning(f'Could not interpret {inp!r} as {out!r}')
return
for attribute, value in match.groupdict().items():
info[attribute] = value
self.to_screen('Parsed %s from %r: %r' % (attribute, template, value if value is not None else 'NA'))
template = self.field_to_template(inp)
out_re = re.compile(self.format_to_regex(out))
return f
def replacer(self, field, search, replace):
def f(info):
val = info.get(field)
if val is None:
self.report_warning(f'Video does not have a {field}')
return
elif not isinstance(val, str):
self.report_warning(f'Cannot replace in field {field} since it is a {type(val).__name__}')
return
self.write_debug(f'Replacing all r{search!r} in {field} with {replace!r}')
info[field], n = search_re.subn(replace, val)
if n:
self.to_screen(f'Changed {field} to: {info[field]}')
else:
self.to_screen(f'Did not find r{search!r} in {field}')
search_re = re.compile(search)
return f
class MetadataFromFieldPP(MetadataParserPP):
@classmethod
def to_action(cls, f):
match = re.match(r'(?P<in>.*?)(?<!\\):(?P<out>.+)$', f)
if match is None:
raise ValueError(f'it should be FROM:TO, not {f!r}')
return (
cls.Actions.INTERPRET,
match.group('in').replace('\\:', ':'),
match.group('out'))
def __init__(self, downloader, formats):
MetadataParserPP.__init__(self, downloader, [self.to_action(f) for f in formats])
class MetadataFromTitlePP(MetadataParserPP): # for backward compatibility
def __init__(self, downloader, titleformat):
MetadataParserPP.__init__(self, downloader, [(self.Actions.INTERPRET, 'title', titleformat)])
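The `FROM:TO` splitting done by `to_action` can be sketched on its own; this uses the same regex as above, splitting on the first unescaped colon and then unescaping `\:` in the FROM part:

```python
import re

def split_from_to(f):
    # Same pattern as MetadataFromFieldPP.to_action: a lazy FROM part,
    # a colon not preceded by a backslash, then the TO part.
    match = re.match(r'(?P<in>.*?)(?<!\\):(?P<out>.+)$', f)
    if match is None:
        raise ValueError(f'it should be FROM:TO, not {f!r}')
    return match.group('in').replace('\\:', ':'), match.group('out')
```

Escaped colons stay part of the FROM field, so e.g. `a\:b:c` splits into `a:b` and `c`.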

@@ -1836,7 +1836,7 @@ def write_json_file(obj, fn):
try:
with tf:
json.dump(obj, tf, default=repr)
json.dump(obj, tf)
if sys.platform == 'win32':
# Need to remove existing file on Windows, else os.rename raises
# WindowsError or FileExistsError.
@@ -3993,28 +3993,27 @@ class LazyList(collections.abc.Sequence):
@staticmethod
def __reverse_index(x):
return -(x + 1)
return None if x is None else -(x + 1)
def __getitem__(self, idx):
if isinstance(idx, slice):
step = idx.step or 1
start = idx.start if idx.start is not None else 0 if step > 0 else -1
stop = idx.stop if idx.stop is not None else -1 if step > 0 else 0
if self.__reversed:
(start, stop), step = map(self.__reverse_index, (start, stop)), -step
idx = slice(start, stop, step)
idx = slice(self.__reverse_index(idx.start), self.__reverse_index(idx.stop), -(idx.step or 1))
start, stop, step = idx.start, idx.stop, idx.step or 1
elif isinstance(idx, int):
if self.__reversed:
idx = self.__reverse_index(idx)
start = stop = idx
start, stop, step = idx, idx, 0
else:
raise TypeError('indices must be integers or slices')
if start < 0 or stop < 0:
if ((start or 0) < 0 or (stop or 0) < 0
or (start is None and step < 0)
or (stop is None and step > 0)):
# We need to consume the entire iterable to be able to slice from the end
# Obviously, never use this with infinite iterables
return self.__exhaust()[idx]
n = max(start, stop) - len(self.__cache) + 1
n = max(start or 0, stop or 0) - len(self.__cache) + 1
if n > 0:
self.__cache.extend(itertools.islice(self.__iterable, n))
return self.__cache[idx]
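The core caching idea in `LazyList.__getitem__` is to consume just enough of the iterable to serve the requested index. A heavily simplified stand-in (`MiniLazy` is hypothetical; it omits slices, reversal, and the negative-index exhaustion handled above):

```python
import itertools

class MiniLazy:
    # Minimal sketch: grow the cache on demand with islice.
    # Negative indices would require exhausting the whole iterable,
    # as the comment in LazyList notes.
    def __init__(self, iterable):
        self._it = iter(iterable)
        self._cache = []

    def __getitem__(self, idx):
        n = idx - len(self._cache) + 1
        if n > 0:
            self._cache.extend(itertools.islice(self._it, n))
        return self._cache[idx]

gen = (i * i for i in range(10))
lazy = MiniLazy(gen)
```

Earlier indices are then served from the cache without touching the generator again.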
@@ -4042,15 +4041,31 @@ class LazyList(collections.abc.Sequence):
return repr(self.exhaust())
class PagedList(object):
class PagedList:
def __len__(self):
# This is only useful for tests
return len(self.getslice())
def getslice(self, start, end):
def __init__(self, pagefunc, pagesize, use_cache=True):
self._pagefunc = pagefunc
self._pagesize = pagesize
self._use_cache = use_cache
self._cache = {}
def getpage(self, pagenum):
page_results = self._cache.get(pagenum) or list(self._pagefunc(pagenum))
if self._use_cache:
self._cache[pagenum] = page_results
return page_results
def getslice(self, start=0, end=None):
return list(self._getslice(start, end))
def _getslice(self, start, end):
raise NotImplementedError('This method must be implemented by subclasses')
def __getitem__(self, idx):
# NOTE: cache must be enabled if this is used
if not isinstance(idx, int) or idx < 0:
raise TypeError('indices must be non-negative integers')
entries = self.getslice(idx, idx + 1)
@@ -4058,42 +4073,26 @@ class PagedList(object):
class OnDemandPagedList(PagedList):
def __init__(self, pagefunc, pagesize, use_cache=True):
self._pagefunc = pagefunc
self._pagesize = pagesize
self._use_cache = use_cache
if use_cache:
self._cache = {}
def getslice(self, start=0, end=None):
res = []
def _getslice(self, start, end):
for pagenum in itertools.count(start // self._pagesize):
firstid = pagenum * self._pagesize
nextfirstid = pagenum * self._pagesize + self._pagesize
if start >= nextfirstid:
continue
page_results = None
if self._use_cache:
page_results = self._cache.get(pagenum)
if page_results is None:
page_results = list(self._pagefunc(pagenum))
if self._use_cache:
self._cache[pagenum] = page_results
startv = (
start % self._pagesize
if firstid <= start < nextfirstid
else 0)
endv = (
((end - 1) % self._pagesize) + 1
if (end is not None and firstid <= end <= nextfirstid)
else None)
page_results = self.getpage(pagenum)
if startv != 0 or endv is not None:
page_results = page_results[startv:endv]
res.extend(page_results)
yield from page_results
# A little optimization - if current page is not "full", ie. does
# not contain page_size videos then we can assume that this page
@@ -4106,36 +4105,31 @@ class OnDemandPagedList(PagedList):
# break out early as well
if end == nextfirstid:
break
return res
class InAdvancePagedList(PagedList):
def __init__(self, pagefunc, pagecount, pagesize):
self._pagefunc = pagefunc
self._pagecount = pagecount
self._pagesize = pagesize
PagedList.__init__(self, pagefunc, pagesize, True)
def getslice(self, start=0, end=None):
res = []
def _getslice(self, start, end):
start_page = start // self._pagesize
end_page = (
self._pagecount if end is None else (end // self._pagesize + 1))
skip_elems = start - start_page * self._pagesize
only_more = None if end is None else end - start
for pagenum in range(start_page, end_page):
page = list(self._pagefunc(pagenum))
page_results = self.getpage(pagenum)
if skip_elems:
page = page[skip_elems:]
page_results = page_results[skip_elems:]
skip_elems = None
if only_more is not None:
if len(page) < only_more:
only_more -= len(page)
if len(page_results) < only_more:
only_more -= len(page_results)
else:
page = page[:only_more]
res.extend(page)
yield from page_results[:only_more]
break
res.extend(page)
return res
yield from page_results
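This refactor moves the page cache into the shared `getpage`, which is what fixes `InAdvancePagedList.__getitem__` re-fetching the page for every video. A minimal sketch of that cache (the `MiniPaged` class and `fetch_page` stub are illustrative, not the real classes):

```python
calls = []

def fetch_page(pagenum):
    # Stand-in page function: each page holds 3 items and records
    # every invocation so we can observe cache hits.
    calls.append(pagenum)
    return list(range(pagenum * 3, pagenum * 3 + 3))

class MiniPaged:
    # Minimal sketch of the shared getpage() cache added above.
    def __init__(self, pagefunc, use_cache=True):
        self._pagefunc = pagefunc
        self._use_cache = use_cache
        self._cache = {}

    def getpage(self, pagenum):
        page = self._cache.get(pagenum) or list(self._pagefunc(pagenum))
        if self._use_cache:
            self._cache[pagenum] = page
        return page

pl = MiniPaged(fetch_page)
first = pl.getpage(0)
again = pl.getpage(0)  # served from cache; fetch_page is not called again
```

With the cache enabled, `fetch_page` runs once per page no matter how many items are requested from it.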
def uppercase_escape(s):
@@ -4438,8 +4432,8 @@ OUTTMPL_TYPES = {
# As of [1] format syntax is:
# %[mapping_key][conversion_flags][minimum_width][.precision][length_modifier]type
# 1. https://docs.python.org/2/library/stdtypes.html#string-formatting
STR_FORMAT_RE = r'''(?x)
(?<!%)
STR_FORMAT_RE_TMPL = r'''(?x)
(?<!%)(?P<prefix>(?:%%)*)
%
(?P<has_key>\((?P<key>{0})\))? # mapping key
(?P<format>
@@ -4447,11 +4441,14 @@ STR_FORMAT_RE = r'''(?x)
(?:\d+)? # minimum field width (optional)
(?:\.\d+)? # precision (optional)
[hlL]? # length modifier (optional)
[diouxXeEfFgGcrs] # conversion type
{1} # conversion type
)
'''
STR_FORMAT_TYPES = 'diouxXeEfFgGcrs'
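The new `prefix` group is what lets `%%` act as an escaped literal percent rather than starting a field. A concrete instance of the template can be built by filling `{0}` with a key pattern and `{1}` with a conversion-type class (this simplified regex omits the conversion-flags alternative and is a sketch, not the full production pattern):

```python
import re

STR_FORMAT_TYPES = 'diouxXeEfFgGcrs'

# Concrete instance of STR_FORMAT_RE_TMPL, with key pattern r'\w+'
# and the default conversion types filled in.
fmt_re = re.compile(r'''(?x)
    (?<!%)(?P<prefix>(?:%%)*)
    %
    (?P<has_key>\((?P<key>{0})\))?  # mapping key
    (?P<format>
        (?:\d+)?    # minimum field width (optional)
        (?:\.\d+)?  # precision (optional)
        [hlL]?      # length modifier (optional)
        [{1}]       # conversion type
    )'''.format(r'\w+', STR_FORMAT_TYPES))

m1 = fmt_re.search('%(title)s')    # a real field
m2 = fmt_re.search('%%(title)s')   # escaped percent: no field matched
```

Doubled percents are soaked up by `prefix` (or rejected by the lookbehind), so only genuine `%(key)s` fields match.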
def limit_length(s, length):
""" Add ellipses to overly long strings """
if s is None:
@@ -4542,7 +4539,7 @@ def parse_codecs(codecs_str):
if not codecs_str:
return {}
split_codecs = list(filter(None, map(
lambda str: str.strip(), codecs_str.strip().strip(',').split(','))))
str.strip, codecs_str.strip().strip(',').split(','))))
vcodec, acodec = None, None
for full_codec in split_codecs:
codec = full_codec.split('.')[0]
@@ -4661,27 +4658,39 @@ def render_table(header_row, data, delim=False, extraGap=0, hideEmpty=False):
def _match_one(filter_part, dct):
COMPARISON_OPERATORS = {
'<': operator.lt,
'<=': operator.le,
'>': operator.gt,
'>=': operator.ge,
'=': operator.eq,
'!=': operator.ne,
# TODO: Generalize code with YoutubeDL._build_format_filter
STRING_OPERATORS = {
'*=': operator.contains,
'^=': lambda attr, value: attr.startswith(value),
'$=': lambda attr, value: attr.endswith(value),
'~=': lambda attr, value: re.search(value, attr),
}
COMPARISON_OPERATORS = {
**STRING_OPERATORS,
'<=': operator.le, # "<=" must be defined above "<"
'<': operator.lt,
'>=': operator.ge,
'>': operator.gt,
'=': operator.eq,
}
operator_rex = re.compile(r'''(?x)\s*
(?P<key>[a-z_]+)
\s*(?P<op>%s)(?P<none_inclusive>\s*\?)?\s*
\s*(?P<negation>!\s*)?(?P<op>%s)(?P<none_inclusive>\s*\?)?\s*
(?:
(?P<intval>[0-9.]+(?:[kKmMgGtTpPeEzZyY]i?[Bb]?)?)|
(?P<quote>["\'])(?P<quotedstrval>(?:\\.|(?!(?P=quote)|\\).)+?)(?P=quote)|
(?P<strval>(?![0-9.])[a-z0-9A-Z]*)
(?P<quote>["\'])(?P<quotedstrval>.+?)(?P=quote)|
(?P<strval>.+?)
)
\s*$
''' % '|'.join(map(re.escape, COMPARISON_OPERATORS.keys())))
m = operator_rex.search(filter_part)
if m:
op = COMPARISON_OPERATORS[m.group('op')]
unnegated_op = COMPARISON_OPERATORS[m.group('op')]
if m.group('negation'):
op = lambda attr, value: not unnegated_op(attr, value)
else:
op = unnegated_op
actual_value = dct.get(m.group('key'))
if (m.group('quotedstrval') is not None
or m.group('strval') is not None
@@ -4691,14 +4700,13 @@ def _match_one(filter_part, dct):
# https://github.com/ytdl-org/youtube-dl/issues/11082).
or actual_value is not None and m.group('intval') is not None
and isinstance(actual_value, compat_str)):
if m.group('op') not in ('=', '!='):
raise ValueError(
'Operator %s does not support string values!' % m.group('op'))
comparison_value = m.group('quotedstrval') or m.group('strval') or m.group('intval')
quote = m.group('quote')
if quote is not None:
comparison_value = comparison_value.replace(r'\%s' % quote, quote)
else:
if m.group('op') in STRING_OPERATORS:
raise ValueError('Operator %s only supports string values!' % m.group('op'))
try:
comparison_value = int(m.group('intval'))
except ValueError:
@@ -4734,7 +4742,8 @@ def match_str(filter_str, dct):
""" Filter a dictionary with a simple string syntax. Returns True (=passes filter) or false """
return all(
_match_one(filter_part, dct) for filter_part in filter_str.split('&'))
_match_one(filter_part.replace(r'\&', '&'), dct)
for filter_part in re.split(r'(?<!\\)&', filter_str))
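The change lets `&` itself be escaped inside a filter string. A small sketch of the split-then-unescape used by `match_str` (the helper name is hypothetical):

```python
import re

def split_filters(filter_str):
    # Split on unescaped '&' only, then unescape '\&' inside each part,
    # mirroring the match_str change above.
    return [part.replace(r'\&', '&')
            for part in re.split(r'(?<!\\)&', filter_str)]

parts = split_filters(r'title~=foo\&bar&duration>60')
```

The backslash-escaped `&` stays inside the first clause instead of splitting it in two.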
def match_filter_func(filter_str):
@@ -6147,8 +6156,11 @@ def to_high_limit_path(path):
return path
def format_field(obj, field, template='%s', ignore=(None, ''), default='', func=None):
val = obj.get(field, default)
def format_field(obj, field=None, template='%s', ignore=(None, ''), default='', func=None):
if field is None:
val = obj if obj is not None else default
else:
val = obj.get(field, default)
if func and val not in ignore:
val = func(val)
return template % val if val not in ignore else default
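With `field=None`, `format_field` can now format a bare value directly instead of looking it up in a dict. A self-contained copy of the updated helper with some illustrative calls:

```python
def format_field(obj, field=None, template='%s', ignore=(None, ''), default='', func=None):
    # Same shape as the updated helper: with field=None, obj itself
    # is treated as the value to format.
    if field is None:
        val = obj if obj is not None else default
    else:
        val = obj.get(field, default)
    if func and val not in ignore:
        val = func(val)
    return template % val if val not in ignore else default

a = format_field({'width': 1920}, 'width', '%spx')
b = format_field(42, template='%d items')
c = format_field(None)  # value in `ignore`, so the default is returned
```

Values in `ignore` short-circuit to `default`, so missing metadata renders as the fallback instead of a broken template.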
@@ -6244,11 +6256,13 @@ def traverse_obj(
# TODO: Write tests
'''
if not casesense:
_lower = lambda k: k.lower() if isinstance(k, str) else k
_lower = lambda k: (k.lower() if isinstance(k, str) else k)
path_list = (map(_lower, variadic(path)) for path in path_list)
def _traverse_obj(obj, path, _current_depth=0):
nonlocal depth
if obj is None:
return None
path = tuple(variadic(path))
for i, key in enumerate(path):
if isinstance(key, (list, tuple)):
@@ -6261,7 +6275,7 @@ def traverse_obj(
_current_depth += 1
depth = max(depth, _current_depth)
return [_traverse_obj(inner_obj, path[i + 1:], _current_depth) for inner_obj in obj]
elif isinstance(obj, dict):
elif isinstance(obj, dict) and not (is_user_input and key == ':'):
obj = (obj.get(key) if casesense or (key in obj)
else next((v for k, v in obj.items() if _lower(k) == key), None))
else:
@@ -6269,7 +6283,7 @@ def traverse_obj(
key = (int_or_none(key) if ':' not in key
else slice(*map(int_or_none, key.split(':'))))
if key == slice(None):
return _traverse_obj(obj, (..., *path[i + 1:]))
return _traverse_obj(obj, (..., *path[i + 1:]), _current_depth)
if not isinstance(key, (int, slice)):
return None
if not isinstance(obj, (list, tuple, LazyList)):
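When `is_user_input` is set, `traverse_obj` converts string keys like `'1:4'` into slices. That conversion can be sketched standalone (with a minimal stand-in for the `int_or_none` helper):

```python
def int_or_none(v):
    # Minimal stand-in for the yt-dlp helper of the same name.
    try:
        return int(v)
    except (TypeError, ValueError):
        return None

def user_key_to_index(key):
    # Mirrors the branch above: 'N' becomes an int index,
    # 'a:b' / 'a:b:c' becomes a slice, and ':' becomes slice(None).
    return (int_or_none(key) if ':' not in key
            else slice(*map(int_or_none, key.split(':'))))

items = ['a', 'b', 'c', 'd', 'e']
picked = items[user_key_to_index('1:4')]
```

A bare `':'` yields `slice(None)`, which the code above then expands into a traversal of every element.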

@@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2021.07.21'
__version__ = '2021.08.02'

@@ -331,6 +331,26 @@ class CueBlock(Block):
'settings': self.settings,
}
def __eq__(self, other):
return self.as_json == other.as_json
@classmethod
def from_json(cls, json):
return cls(
id=json['id'],
start=json['start'],
end=json['end'],
text=json['text'],
settings=json['settings']
)
def hinges(self, other):
if self.text != other.text:
return False
if self.settings != other.settings:
return False
return self.start <= self.end == other.start <= other.end
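The commit message describes merging daisy-chained cues: identical text and settings where one cue ends exactly when the next starts. A hedged sketch of how `hinges` can drive such a merge (`Cue` and `merge_chain` are minimal stand-ins, not the actual webvtt classes):

```python
class Cue:
    # Minimal stand-in for CueBlock, keeping only the fields
    # that hinges() inspects.
    def __init__(self, start, end, text, settings=None):
        self.start, self.end = start, end
        self.text, self.settings = text, settings

    def hinges(self, other):
        # Same predicate as above: identical text/settings, and this
        # cue ends exactly where the next one starts.
        if self.text != other.text:
            return False
        if self.settings != other.settings:
            return False
        return self.start <= self.end == other.start <= other.end

def merge_chain(cues):
    merged = []
    for cue in cues:
        if merged and merged[-1].hinges(cue):
            merged[-1].end = cue.end  # extend the cue instead of duplicating it
        else:
            merged.append(cue)
    return merged

chain = [Cue(0, 2, 'hi'), Cue(2, 4, 'hi'), Cue(4, 6, 'bye')]
result = merge_chain(chain)
```

The two `'hi'` cues collapse into one spanning 0–4, while the `'bye'` cue is left alone.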
def parse_fragment(frag_content):
"""