Compare commits

49 Commits

Author SHA1 Message Date
Sergey M․
e2fc6df169 release 2018.01.18 2018-01-18 23:41:44 +07:00
Sergey M․
68da3d033c [ChangeLog] Actualize 2018-01-18 23:39:15 +07:00
Varun
67408fe0e9 [soundcloud] Update client id (closes #15306) 2018-01-18 22:30:43 +07:00
Sergey M․
cad9caf76b [kamcord] Remove extractor (closes #15322) 2018-01-18 22:26:43 +07:00
Sergey M․
4471affc34 [spiegel] Add support for nexx videos (closes #15285) 2018-01-17 22:03:56 +07:00
Sergey M․
1370dba59f [twitch] Fix authentication and error capture (closes #14090, closes #15264) 2018-01-16 22:34:16 +07:00
Sergey M․
1d1d60f6dd [vk] Detect more errors due to copyright complaints (#15259) 2018-01-16 00:51:50 +07:00
Reto Kromer
a86922c470 [README.md] Clarify macOS name 2018-01-14 00:58:38 +07:00
Sergey M․
e11ccd76c6 release 2018.01.14 2018-01-14 00:13:56 +07:00
Sergey M․
dd896a6a07 [ChangeLog] Actualize 2018-01-14 00:10:04 +07:00
Sergey M․
391dd6f094 [youtube] Fix live streams extraction (closes #15202) 2018-01-14 00:03:22 +07:00
Sergey M․
0ce39bc542 [wdr] Fix test 2018-01-13 23:33:52 +07:00
Sergey M․
1915662d4f [wdr] Bypass geo restriction 2018-01-13 23:30:56 +07:00
Sergey M․
54e8f62e01 [wdr] Rework extractors (closes #14598) 2018-01-13 23:30:25 +07:00
Sebastian Leske
2d8bb80c60 [wdr:elefant] Add extractor 2018-01-13 23:29:36 +07:00
Sergey M․
df16e645f6 [gamestar] Fix issues (closes #15179) 2018-01-13 19:38:58 +07:00
Hendrik v. Raven
d4aedca3bd [gamestar] Add support for gamepro.de (closes #3384) 2018-01-13 19:36:59 +07:00
Sergey M․
47e2a9bc53 [viafree] Skip rtmp formats (closes #15232) 2018-01-13 18:47:47 +07:00
Chih-Hsuan Yen
e565a6386e Credit @scil for ximalaya extractor (#14687)
[ci skip]
2018-01-12 15:36:01 +08:00
Sergey M․
609850acfb [pandoratv] Add support for mobile URLs (closes #12441) 2018-01-11 23:10:18 +07:00
Sergey M․
64287560e4 [pandoratv] Add support for new URL format (closes #15131) 2018-01-11 23:06:56 +07:00
Chih-Hsuan Yen
37941fe204 [ChangeLog] Update after #14687
[ci skip]
2018-01-11 20:36:06 +08:00
scil
a90641fe87 [ximalaya_extractor] Add new extractor ximalaya (#14687)
* [ximalaya_extractor] Add new extractor

* format change according by flake8

* changes accoring to review by @yan12125 at github pull #14687

* change %d to %s in a temp str

* seond changes accoring to review by @yan12125 at github pull #1468

* improve TESTS about contains

* changes accoring to third review by @yan12125 at github pull #1468

* forth changes accoring to forth review by @yan12125 at github pull #1468
2018-01-11 20:35:09 +08:00
Sergey M․
1b79daffd9 [digg] Improve extraction 2018-01-10 22:19:51 +07:00
Sergey M․
e654829b4c [digg] Add extractor (closes #15214) 2018-01-10 21:24:22 +07:00
Sergey M․
2b4e1ace4a [limelight] Tolerate empty pc formats (closes #15150, closes #15151, closes #15207) 2018-01-10 05:39:57 +07:00
Sergey M․
310ea4661d [ndr:embed:base] Make separate formats extraction non fatal (closes #15203) 2018-01-09 22:04:50 +07:00
Chih-Hsuan Yen
5b23845125 Credit @sprhawk for the Weibo extractor (#15079) 2018-01-09 19:35:39 +08:00
Yen Chi Hsuan
0f71de0761 [ChangeLog] Update after #15079 2018-01-09 18:13:49 +08:00
Yen Chi Hsuan
4df1098c3f Merge branch 'sprhawk-weibo' 2018-01-09 18:13:11 +08:00
Yen Chi Hsuan
5eca00a2e3 [weibo] Misc improvements 2018-01-09 18:12:55 +08:00
Yen Chi Hsuan
1dd38dc0f4 Merge branch 'weibo' of https://github.com/sprhawk/youtube-dl into sprhawk-weibo 2018-01-09 17:31:52 +08:00
Sergey M․
8005dc68cb [ok] Add support for live streams 2018-01-08 21:53:03 +07:00
Remita Amine
a39e15c516 [canalplus] fix extraction(closes #15072) 2018-01-07 22:15:44 +01:00
Chih-Hsuan Yen
7643916a37 [ChangeLog] update after #15188
[ci skip]
2018-01-08 01:32:13 +08:00
Luca Steeb
3a513f29ad fix bilibili extraction (closes #15171) 2018-01-08 01:30:04 +08:00
sprhawk
6648fd8ad6 changed to use .get to get field from json object 2018-01-01 18:33:14 +08:00
sprhawk
48058d82dc replace unused _download_webpage_handle with _download_webpage 2017-12-30 01:14:21 +08:00
sprhawk
6a41a12d29 replace split with strip_jsonp 2017-12-30 01:11:30 +08:00
sprhawk
5c97ec5ff5 replace urlencode.encode with urlencode_postdata 2017-12-30 01:08:56 +08:00
sprhawk
c33de004e1 Merge branch 'master' of github.com:rg3/youtube-dl into weibo 2017-12-26 22:27:26 +08:00
sprhawk
42a1012c77 fix according to "https://github.com/rg3/youtube-dl/pull/15079#discussion_r158688607" 2017-12-26 22:26:01 +08:00
sprhawk
2593651224 fix compat_urllib_request for python2.7 2017-12-26 16:46:01 +08:00
sprhawk
951043724f re-format code to pass flake8 2017-12-26 16:38:51 +08:00
sprhawk
d2be5bb5af change to use compat urllib 2017-12-26 16:28:47 +08:00
sprhawk
447a5a710d added weibo mobile site support 2017-12-26 16:24:56 +08:00
sprhawk
0c69958844 add other properties; remove print verbose 2017-12-11 16:02:14 +08:00
sprhawk
3281af3464 a working version 2017-12-11 15:56:54 +08:00
sprhawk
29ac31afaf simply get the correct webpage, but not parsed to extract information 2017-12-11 12:26:19 +08:00
26 changed files with 771 additions and 325 deletions

View File

@@ -6,8 +6,8 @@
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2018.01.07*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2018.01.07**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2018.01.18*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2018.01.18**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2018.01.07
[debug] youtube-dl version 2018.01.18
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@@ -231,3 +231,5 @@ John Dong
Tatsuyuki Ishi
Daniel Weber
Kay Bouché
Yang Hongbo
Lei Wang

View File

@@ -1,3 +1,34 @@
version 2018.01.18
Extractors
* [soundcloud] Update client id (#15306)
- [kamcord] Remove extractor (#15322)
+ [spiegel] Add support for nexx videos (#15285)
* [twitch] Fix authentication and error capture (#14090, #15264)
* [vk] Detect more errors due to copyright complaints (#15259)
version 2018.01.14
Extractors
* [youtube] Fix live streams extraction (#15202)
* [wdr] Bypass geo restriction
* [wdr] Rework extractors (#14598)
+ [wdr] Add support for wdrmaus.de/elefantenseite (#14598)
+ [gamestar] Add support for gamepro.de (#3384)
* [viafree] Skip rtmp formats (#15232)
+ [pandoratv] Add support for mobile URLs (#12441)
+ [pandoratv] Add support for new URL format (#15131)
+ [ximalaya] Add support for ximalaya.com (#14687)
+ [digg] Add support for digg.com (#15214)
* [limelight] Tolerate empty pc formats (#15150, #15151, #15207)
* [ndr:embed:base] Make separate formats extraction non fatal (#15203)
+ [weibo] Add extractor (#15079)
+ [ok] Add support for live streams
* [canalplus] Fix extraction (#15072)
* [bilibili] Fix extraction (#15188)
version 2018.01.07
Core

View File

@@ -46,7 +46,7 @@ Or with [MacPorts](https://www.macports.org/):
Alternatively, refer to the [developer instructions](#developer-instructions) for how to check out and work with the git repository. For further options, including PGP signatures, see the [youtube-dl Download Page](https://rg3.github.io/youtube-dl/download.html).
# DESCRIPTION
**youtube-dl** is a command-line program to download videos from YouTube.com and a few more sites. It requires the Python interpreter, version 2.6, 2.7, or 3.2+, and it is not platform specific. It should work on your Unix box, on Windows or on Mac OS X. It is released to the public domain, which means you can modify it, redistribute it or use it however you like.
**youtube-dl** is a command-line program to download videos from YouTube.com and a few more sites. It requires the Python interpreter, version 2.6, 2.7, or 3.2+, and it is not platform specific. It should work on your Unix box, on Windows or on macOS. It is released to the public domain, which means you can modify it, redistribute it or use it however you like.
youtube-dl [OPTIONS] URL [URL...]
@@ -863,7 +863,7 @@ Use the `--cookies` option, for example `--cookies /path/to/cookies/file.txt`.
In order to extract cookies from browser use any conforming browser extension for exporting cookies. For example, [cookies.txt](https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg) (for Chrome) or [Export Cookies](https://addons.mozilla.org/en-US/firefox/addon/export-cookies/) (for Firefox).
Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows and `LF` (`\n`) for Unix and Unix-like systems (Linux, Mac OS, etc.). `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format.
Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows and `LF` (`\n`) for Unix and Unix-like systems (Linux, macOS, etc.). `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format.
Passing cookies to youtube-dl is a good way to workaround login when a particular extractor does not implement it explicitly. Another use case is working around [CAPTCHA](https://en.wikipedia.org/wiki/CAPTCHA) some websites require you to solve in particular cases in order to get access (e.g. YouTube, CloudFlare).
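
The newline and header requirements described in the cookies paragraph can be checked with a few lines of Python. This is only a minimal sketch: `normalize_cookie_newlines` and `has_valid_header` are hypothetical helpers, not part of youtube-dl, and the sketch assumes the cookies file is small enough to read into memory as bytes.

```python
import os

# The two first-line headers the README accepts.
VALID_HEADERS = (b'# HTTP Cookie File', b'# Netscape HTTP Cookie File')


def normalize_cookie_newlines(data, newline=os.linesep.encode('ascii')):
    """Return cookie-file bytes with every line ending rewritten to `newline`."""
    # Collapse CRLF and lone CR to LF first, then expand to the target ending.
    unified = data.replace(b'\r\n', b'\n').replace(b'\r', b'\n')
    return unified.replace(b'\n', newline)


def has_valid_header(data):
    """Check that the first line is one of the two accepted headers."""
    first_line = data.splitlines()[0].strip() if data else b''
    return first_line in VALID_HEADERS
```

By default the target line ending is taken from `os.linesep`, so running the helper on the machine that will run youtube-dl should produce the ending that machine expects.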

View File

@@ -128,7 +128,7 @@
- **CamdemyFolder**
- **CamWithHer**
- **canalc2.tv**
- **Canalplus**: canalplus.fr, piwiplus.fr and d8.tv
- **Canalplus**: mycanal.fr and piwiplus.fr
- **Canvas**
- **CanvasEen**: canvas.be and een.be
- **CarambaTV**
@@ -210,6 +210,7 @@
- **defense.gouv.fr**
- **democracynow**
- **DHM**: Filmarchiv - Deutsches Historisches Museum
- **Digg**
- **DigitallySpeaking**
- **Digiteka**
- **Discovery**
@@ -382,7 +383,6 @@
- **JWPlatform**
- **Kakao**
- **Kaltura**
- **Kamcord**
- **KanalPlay**: Kanal 5/9/11 Play
- **Kankan**
- **Karaoketv**
@@ -773,7 +773,6 @@
- **Sport5**
- **SportBoxEmbed**
- **SportDeutschland**
- **Sportschau**
- **Sprout**
- **sr:mediathek**: Saarländischer Rundfunk
- **SRGSSR**
@@ -1002,10 +1001,14 @@
- **WatchIndianPorn**: Watch Indian Porn
- **WDR**
- **wdr:mobile**
- **WDRElefant**
- **WDRPage**
- **Webcaster**
- **WebcasterFeed**
- **WebOfStories**
- **WebOfStoriesPlaylist**
- **Weibo**
- **WeiboMobile**
- **WeiqiTV**: WQTV
- **wholecloud**: WholeCloud
- **Wimp**
@@ -1025,6 +1028,8 @@
- **xiami:artist**: 虾米音乐 - 歌手
- **xiami:collection**: 虾米音乐 - 精选集
- **xiami:song**: 虾米音乐
- **ximalaya**: 喜马拉雅FM
- **ximalaya:album**: 喜马拉雅FM 专辑
- **XMinus**
- **XNXX**
- **Xstream**

View File

@@ -102,6 +102,7 @@ class BiliBiliIE(InfoExtractor):
video_id, anime_id, compat_urlparse.urljoin(url, '//bangumi.bilibili.com/anime/%s' % anime_id)))
headers = {
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
'Referer': url
}
headers.update(self.geo_verification_headers())
@@ -116,10 +117,15 @@ class BiliBiliIE(InfoExtractor):
payload = 'appkey=%s&cid=%s&otype=json&quality=2&type=mp4' % (self._APP_KEY, cid)
sign = hashlib.md5((payload + self._BILIBILI_KEY).encode('utf-8')).hexdigest()
headers = {
'Referer': url
}
headers.update(self.geo_verification_headers())
video_info = self._download_json(
'http://interface.bilibili.com/playurl?%s&sign=%s' % (payload, sign),
video_id, note='Downloading video info page',
headers=self.geo_verification_headers())
headers=headers)
if 'durl' not in video_info:
self._report_error(video_info)
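
The signing step and the header merge in this hunk can be tried outside the extractor. The sketch below is an illustration under stated assumptions: `sign_playurl_query` and `build_headers` are hypothetical names, the key values passed in are placeholders rather than the real `_APP_KEY` and `_BILIBILI_KEY`, and `extra_headers` stands in for whatever `geo_verification_headers()` returns.

```python
import hashlib


def sign_playurl_query(app_key, secret_key, cid):
    """Build the signed query string in the same shape as the hunk above."""
    payload = 'appkey=%s&cid=%s&otype=json&quality=2&type=mp4' % (app_key, cid)
    # The signature is the MD5 hex digest of the payload with the secret appended.
    sign = hashlib.md5((payload + secret_key).encode('utf-8')).hexdigest()
    return '%s&sign=%s' % (payload, sign)


def build_headers(page_url, extra_headers=None):
    """Start from a Referer header and merge any extra headers on top."""
    headers = {'Referer': page_url}
    headers.update(extra_headers or {})
    return headers
```

The visible change in the hunk is that the `Referer` header now accompanies the extra headers on the `playurl` request instead of the extra headers being sent alone.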

View File

@@ -4,59 +4,36 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urllib_parse_urlparse
from ..utils import (
dict_get,
# ExtractorError,
# HEADRequest,
int_or_none,
qualities,
remove_end,
unified_strdate,
)
class CanalplusIE(InfoExtractor):
IE_DESC = 'canalplus.fr, piwiplus.fr and d8.tv'
_VALID_URL = r'''(?x)
https?://
(?:
(?:
(?:(?:www|m)\.)?canalplus\.fr|
(?:www\.)?piwiplus\.fr|
(?:www\.)?d8\.tv|
(?:www\.)?c8\.fr|
(?:www\.)?d17\.tv|
(?:(?:football|www)\.)?cstar\.fr|
(?:www\.)?itele\.fr
)/(?:(?:[^/]+/)*(?P<display_id>[^/?#&]+))?(?:\?.*\bvid=(?P<vid>\d+))?|
player\.canalplus\.fr/#/(?P<id>\d+)
)
'''
IE_DESC = 'mycanal.fr and piwiplus.fr'
_VALID_URL = r'https?://(?:www\.)?(?P<site>mycanal|piwiplus)\.fr/(?:[^/]+/)*(?P<display_id>[^?/]+)(?:\.html\?.*\bvid=|/p/)(?P<id>\d+)'
_VIDEO_INFO_TEMPLATE = 'http://service.canal-plus.com/video/rest/getVideosLiees/%s/%s?format=json'
_SITE_ID_MAP = {
'canalplus': 'cplus',
'mycanal': 'cplus',
'piwiplus': 'teletoon',
'd8': 'd8',
'c8': 'd8',
'd17': 'd17',
'cstar': 'd17',
'itele': 'itele',
}
# Only works for direct mp4 URLs
_GEO_COUNTRIES = ['FR']
_TESTS = [{
'url': 'http://www.canalplus.fr/c-emissions/pid1830-c-zapping.html?vid=1192814',
'url': 'https://www.mycanal.fr/d17-emissions/lolywood/p/1397061',
'info_dict': {
'id': '1405510',
'display_id': 'pid1830-c-zapping',
'id': '1397061',
'display_id': 'lolywood',
'ext': 'mp4',
'title': 'Zapping - 02/07/2016',
'description': 'Le meilleur de toutes les chaînes, tous les jours',
'upload_date': '20160702',
'title': 'Euro 2016 : Je préfère te prévenir - Lolywood - Episode 34',
'description': 'md5:7d97039d455cb29cdba0d652a0efaa5e',
'upload_date': '20160602',
},
}, {
# geo restricted, bypassed
@@ -70,64 +47,12 @@ class CanalplusIE(InfoExtractor):
'upload_date': '20140724',
},
'expected_warnings': ['HTTP Error 403: Forbidden'],
}, {
# geo restricted, bypassed
'url': 'http://www.c8.fr/c8-divertissement/ms-touche-pas-a-mon-poste/pid6318-videos-integrales.html?vid=1443684',
'md5': 'bb6f9f343296ab7ebd88c97b660ecf8d',
'info_dict': {
'id': '1443684',
'display_id': 'pid6318-videos-integrales',
'ext': 'mp4',
'title': 'Guess my iep ! - TPMP - 07/04/2017',
'description': 'md5:6f005933f6e06760a9236d9b3b5f17fa',
'upload_date': '20170407',
},
'expected_warnings': ['HTTP Error 403: Forbidden'],
}, {
'url': 'http://www.itele.fr/chroniques/invite-michael-darmon/rachida-dati-nicolas-sarkozy-est-le-plus-en-phase-avec-les-inquietudes-des-francais-171510',
'info_dict': {
'id': '1420176',
'display_id': 'rachida-dati-nicolas-sarkozy-est-le-plus-en-phase-avec-les-inquietudes-des-francais-171510',
'ext': 'mp4',
'title': 'L\'invité de Michaël Darmon du 14/10/2016 - ',
'description': 'Chaque matin du lundi au vendredi, Michaël Darmon reçoit un invité politique à 8h25.',
'upload_date': '20161014',
},
}, {
'url': 'http://football.cstar.fr/cstar-minisite-foot/pid7566-feminines-videos.html?vid=1416769',
'info_dict': {
'id': '1416769',
'display_id': 'pid7566-feminines-videos',
'ext': 'mp4',
'title': 'France - Albanie : les temps forts de la soirée - 20/09/2016',
'description': 'md5:c3f30f2aaac294c1c969b3294de6904e',
'upload_date': '20160921',
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://m.canalplus.fr/?vid=1398231',
'only_matching': True,
}, {
'url': 'http://www.d17.tv/emissions/pid8303-lolywood.html?vid=1397061',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
site, display_id, video_id = re.match(self._VALID_URL, url).groups()
site_id = self._SITE_ID_MAP[compat_urllib_parse_urlparse(url).netloc.rsplit('.', 2)[-2]]
# Beware, some subclasses do not define an id group
display_id = remove_end(dict_get(mobj.groupdict(), ('display_id', 'id', 'vid')), '.html')
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
[r'<canal:player[^>]+?videoId=(["\'])(?P<id>\d+)',
r'id=["\']canal_video_player(?P<id>\d+)',
r'data-video=["\'](?P<id>\d+)'],
webpage, 'video id', default=mobj.group('vid'), group='id')
site_id = self._SITE_ID_MAP[site]
info_url = self._VIDEO_INFO_TEMPLATE % (site_id, video_id)
video_data = self._download_json(info_url, video_id, 'Downloading video JSON')
@@ -161,7 +86,7 @@ class CanalplusIE(InfoExtractor):
format_url + '?hdcore=2.11.3', video_id, f4m_id=format_id, fatal=False))
else:
formats.append({
# the secret extracted ya function in http://player.canalplus.fr/common/js/canalPlayer.js
# the secret extracted from ya function in http://player.canalplus.fr/common/js/canalPlayer.js
'url': format_url + '?secret=pqzerjlsmdkjfoiuerhsdlfknaes',
'format_id': format_id,
'preference': preference(format_id),
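
The reworked URL handling can be exercised on its own. The following is a rough sketch, assuming the new `_VALID_URL`, `_VIDEO_INFO_TEMPLATE` and the two remaining `_SITE_ID_MAP` entries read as they appear in this diff; `build_info_url` is a hypothetical helper, not a method of the extractor.

```python
import re

# Copied from the new side of the diff above.
VALID_URL = r'https?://(?:www\.)?(?P<site>mycanal|piwiplus)\.fr/(?:[^/]+/)*(?P<display_id>[^?/]+)(?:\.html\?.*\bvid=|/p/)(?P<id>\d+)'
VIDEO_INFO_TEMPLATE = 'http://service.canal-plus.com/video/rest/getVideosLiees/%s/%s?format=json'
SITE_ID_MAP = {'mycanal': 'cplus', 'piwiplus': 'teletoon'}


def build_info_url(url):
    """Return (display_id, info_url) for a supported URL, else raise ValueError."""
    mobj = re.match(VALID_URL, url)
    if mobj is None:
        raise ValueError('Unsupported URL: %s' % url)
    # Groups come back in pattern order: site, display_id, id.
    site, display_id, video_id = mobj.groups()
    return display_id, VIDEO_INFO_TEMPLATE % (SITE_ID_MAP[site], video_id)
```

Because the site name and the video id both come from the URL, the sketch needs no webpage download before building the JSON URL, which matches the removal of the `_download_webpage` call in this hunk.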

View File

@@ -0,0 +1,56 @@
from __future__ import unicode_literals

from .common import InfoExtractor
from ..utils import js_to_json


class DiggIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?digg\.com/video/(?P<id>[^/?#&]+)'
    _TESTS = [{
        # JWPlatform via provider
        'url': 'http://digg.com/video/sci-fi-short-jonah-daniel-kaluuya-get-out',
        'info_dict': {
            'id': 'LcqvmS0b',
            'ext': 'mp4',
            'title': "'Get Out' Star Daniel Kaluuya Goes On 'Moby Dick'-Like Journey In Sci-Fi Short 'Jonah'",
            'description': 'md5:541bb847648b6ee3d6514bc84b82efda',
            'upload_date': '20180109',
            'timestamp': 1515530551,
        },
        'params': {
            'skip_download': True,
        },
    }, {
        # Youtube via provider
        'url': 'http://digg.com/video/dog-boat-seal-play',
        'only_matching': True,
    }, {
        # vimeo as regular embed
        'url': 'http://digg.com/video/dream-girl-short-film',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        display_id = self._match_id(url)

        webpage = self._download_webpage(url, display_id)

        info = self._parse_json(
            self._search_regex(
                r'(?s)video_info\s*=\s*({.+?});\n', webpage, 'video info',
                default='{}'), display_id, transform_source=js_to_json,
            fatal=False)

        video_id = info.get('video_id')

        if video_id:
            provider = info.get('provider_name')
            if provider == 'youtube':
                return self.url_result(
                    video_id, ie='Youtube', video_id=video_id)
            elif provider == 'jwplayer':
                return self.url_result(
                    'jwplatform:%s' % video_id, ie='JWPlatform',
                    video_id=video_id)

        return self.url_result(url, 'Generic')

View File

@@ -259,6 +259,7 @@ from .deezer import DeezerPlaylistIE
from .democracynow import DemocracynowIE
from .dfb import DFBIE
from .dhm import DHMIE
from .digg import DiggIE
from .dotsub import DotsubIE
from .douyutv import (
DouyuShowIE,
@@ -489,7 +490,6 @@ from .jwplatform import JWPlatformIE
from .jpopsukitv import JpopsukiIE
from .kakao import KakaoIE
from .kaltura import KalturaIE
from .kamcord import KamcordIE
from .kanalplay import KanalPlayIE
from .kankan import KankanIE
from .karaoketv import KaraoketvIE
@@ -990,7 +990,6 @@ from .stitcher import StitcherIE
from .sport5 import Sport5IE
from .sportbox import SportBoxEmbedIE
from .sportdeutschland import SportDeutschlandIE
from .sportschau import SportschauIE
from .sprout import SproutIE
from .srgssr import (
SRGSSRIE,
@@ -1288,6 +1287,8 @@ from .watchbox import WatchBoxIE
from .watchindianporn import WatchIndianPornIE
from .wdr import (
WDRIE,
WDRPageIE,
WDRElefantIE,
WDRMobileIE,
)
from .webcaster import (
@@ -1298,6 +1299,10 @@ from .webofstories import (
WebOfStoriesIE,
WebOfStoriesPlaylistIE,
)
from .weibo import (
WeiboIE,
WeiboMobileIE
)
from .weiqitv import WeiqiTVIE
from .wimp import WimpIE
from .wistia import WistiaIE
@@ -1323,6 +1328,10 @@ from .xiami import (
XiamiArtistIE,
XiamiCollectionIE
)
from .ximalaya import (
XimalayaIE,
XimalayaAlbumIE
)
from .xminus import XMinusIE
from .xnxx import XNXXIE
from .xstream import XstreamIE

View File

@@ -1,6 +1,8 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
@@ -9,27 +11,34 @@ from ..utils import (
class GameStarIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?gamestar\.de/videos/.*,(?P<id>[0-9]+)\.html'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?game(?P<site>pro|star)\.de/videos/.*,(?P<id>[0-9]+)\.html'
_TESTS = [{
'url': 'http://www.gamestar.de/videos/trailer,3/hobbit-3-die-schlacht-der-fuenf-heere,76110.html',
'md5': '96974ecbb7fd8d0d20fca5a00810cea7',
'md5': 'ee782f1f8050448c95c5cacd63bc851c',
'info_dict': {
'id': '76110',
'ext': 'mp4',
'title': 'Hobbit 3: Die Schlacht der Fünf Heere - Teaser-Trailer zum dritten Teil',
'description': 'Der Teaser-Trailer zu Hobbit 3: Die Schlacht der Fünf Heere zeigt einige Szenen aus dem dritten Teil der Saga und kündigt den...',
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1406542020,
'timestamp': 1406542380,
'upload_date': '20140728',
'duration': 17
'duration': 17,
}
}
}, {
'url': 'http://www.gamepro.de/videos/top-10-indie-spiele-fuer-nintendo-switch-video-tolle-nindies-games-zum-download,95316.html',
'only_matching': True,
}, {
'url': 'http://www.gamestar.de/videos/top-10-indie-spiele-fuer-nintendo-switch-video-tolle-nindies-games-zum-download,95316.html',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
mobj = re.match(self._VALID_URL, url)
site = mobj.group('site')
video_id = mobj.group('id')
url = 'http://gamestar.de/_misc/videos/portal/getVideoUrl.cfm?premium=0&videoId=' + video_id
webpage = self._download_webpage(url, video_id)
# TODO: there are multiple ld+json objects in the webpage,
# while _search_json_ld finds only the first one
@@ -37,16 +46,17 @@ class GameStarIE(InfoExtractor):
r'(?s)<script[^>]+type=(["\'])application/ld\+json\1[^>]*>(?P<json_ld>[^<]+VideoObject[^<]+)</script>',
webpage, 'JSON-LD', group='json_ld'), video_id)
info_dict = self._json_ld(json_ld, video_id)
info_dict['title'] = remove_end(info_dict['title'], ' - GameStar')
info_dict['title'] = remove_end(
info_dict['title'], ' - Game%s' % site.title())
view_count = json_ld.get('interactionCount')
view_count = int_or_none(json_ld.get('interactionCount'))
comment_count = int_or_none(self._html_search_regex(
r'([0-9]+) Kommentare</span>', webpage, 'comment_count',
fatal=False))
r'<span>Kommentare</span>\s*<span[^>]+class=["\']count[^>]+>\s*\(\s*([0-9]+)',
webpage, 'comment count', fatal=False))
info_dict.update({
'id': video_id,
'url': url,
'url': 'http://gamestar.de/_misc/videos/portal/getVideoUrl.cfm?premium=0&videoId=' + video_id,
'ext': 'mp4',
'view_count': view_count,
'comment_count': comment_count
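
The site-dependent title suffix in this hunk relies on `str.title()` turning the captured `pro` or `star` into `Pro` or `Star`. A small sketch of that step follows; `strip_suffix` is a hypothetical stand-in for a `remove_end`-style helper, `clean_title` is likewise hypothetical, and the sample titles in the usage note are invented for illustration.

```python
import re

# Copied from the new side of the diff above.
VALID_URL = r'https?://(?:www\.)?game(?P<site>pro|star)\.de/videos/.*,(?P<id>[0-9]+)\.html'


def strip_suffix(s, end):
    """Drop `end` from `s` only when `s` actually ends with it."""
    return s[:-len(end)] if s.endswith(end) else s


def clean_title(url, title):
    """Remove the ' - GamePro' or ' - GameStar' suffix that matches the URL's site."""
    site = re.match(VALID_URL, url).group('site')
    return strip_suffix(title, ' - Game%s' % site.title())
```

With a gamepro.de URL, an invented title such as `Top 10 - GamePro` would be shortened to `Top 10`, while the same title fetched through a gamestar.de URL would be left unchanged.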

View File

@@ -1,71 +0,0 @@
from __future__ import unicode_literals

from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
    int_or_none,
    qualities,
)


class KamcordIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?kamcord\.com/v/(?P<id>[^/?#&]+)'
    _TEST = {
        'url': 'https://www.kamcord.com/v/hNYRduDgWb4',
        'md5': 'c3180e8a9cfac2e86e1b88cb8751b54c',
        'info_dict': {
            'id': 'hNYRduDgWb4',
            'ext': 'mp4',
            'title': 'Drinking Madness',
            'uploader': 'jacksfilms',
            'uploader_id': '3044562',
            'view_count': int,
            'like_count': int,
            'comment_count': int,
        },
    }

    def _real_extract(self, url):
        video_id = self._match_id(url)

        webpage = self._download_webpage(url, video_id)

        video = self._parse_json(
            self._search_regex(
                r'window\.__props\s*=\s*({.+?});?(?:\n|\s*</script)',
                webpage, 'video'),
            video_id)['video']

        title = video['title']

        formats = self._extract_m3u8_formats(
            video['play']['hls'], video_id, 'mp4', entry_protocol='m3u8_native')
        self._sort_formats(formats)

        uploader = video.get('user', {}).get('username')
        uploader_id = video.get('user', {}).get('id')

        view_count = int_or_none(video.get('viewCount'))
        like_count = int_or_none(video.get('heartCount'))
        comment_count = int_or_none(video.get('messageCount'))

        preference_key = qualities(('small', 'medium', 'large'))

        thumbnails = [{
            'url': thumbnail_url,
            'id': thumbnail_id,
            'preference': preference_key(thumbnail_id),
        } for thumbnail_id, thumbnail_url in (video.get('thumbnail') or {}).items()
            if isinstance(thumbnail_id, compat_str) and isinstance(thumbnail_url, compat_str)]

        return {
            'id': video_id,
            'title': title,
            'uploader': uploader,
            'uploader_id': uploader_id,
            'view_count': view_count,
            'like_count': like_count,
            'comment_count': comment_count,
            'thumbnails': thumbnails,
            'formats': formats,
        }

View File

@@ -10,6 +10,7 @@ from ..utils import (
float_or_none,
int_or_none,
smuggle_url,
try_get,
unsmuggle_url,
ExtractorError,
)
@@ -220,6 +221,12 @@ class LimelightBaseIE(InfoExtractor):
'subtitles': subtitles,
}
def _extract_info_helper(self, pc, mobile, i, metadata):
return self._extract_info(
try_get(pc, lambda x: x['playlistItems'][i]['streams'], list) or [],
try_get(mobile, lambda x: x['mediaList'][i]['mobileUrls'], list) or [],
metadata)
class LimelightMediaIE(LimelightBaseIE):
IE_NAME = 'limelight'
@@ -282,10 +289,7 @@ class LimelightMediaIE(LimelightBaseIE):
'getMobilePlaylistByMediaId', 'properties',
smuggled_data.get('source_url'))
return self._extract_info(
pc['playlistItems'][0].get('streams', []),
mobile['mediaList'][0].get('mobileUrls', []) if mobile else [],
metadata)
return self._extract_info_helper(pc, mobile, 0, metadata)
class LimelightChannelIE(LimelightBaseIE):
@@ -326,10 +330,7 @@ class LimelightChannelIE(LimelightBaseIE):
'media', smuggled_data.get('source_url'))
entries = [
self._extract_info(
pc['playlistItems'][i].get('streams', []),
mobile['mediaList'][i].get('mobileUrls', []) if mobile else [],
medias['media_list'][i])
self._extract_info_helper(pc, mobile, i, medias['media_list'][i])
for i in range(len(medias['media_list']))]
return self.playlist_result(entries, channel_id, pc['title'])
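
The tolerance for missing `pc` or `mobile` data comes from wrapping each nested lookup so that any failure yields an empty list. The sketch below shows the idea with a simplified stand-in: `safe_get` is a hypothetical re-implementation written for this example, not the `try_get` imported from `..utils`, and `pick_streams` is likewise hypothetical.

```python
def safe_get(src, getter, expected_type=None):
    """Apply `getter` to `src`; return None on lookup failure or type mismatch."""
    try:
        value = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    if expected_type is None or isinstance(value, expected_type):
        return value
    return None


def pick_streams(pc, mobile, i):
    """Return (streams, mobile_urls) for item `i`, falling back to empty lists."""
    streams = safe_get(pc, lambda x: x['playlistItems'][i]['streams'], list) or []
    mobile_urls = safe_get(mobile, lambda x: x['mediaList'][i]['mobileUrls'], list) or []
    return streams, mobile_urls
```

Compared with the removed `pc['playlistItems'][i].get('streams', [])`, this shape also covers a missing `playlistItems` key, an out-of-range index, and a `None` response in one place.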

View File

@@ -190,10 +190,12 @@ class NDREmbedBaseIE(InfoExtractor):
ext = determine_ext(src, None)
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(
src + '?hdcore=3.7.0&plugin=aasp-3.7.0.39.44', video_id, f4m_id='hds'))
src + '?hdcore=3.7.0&plugin=aasp-3.7.0.39.44', video_id,
f4m_id='hds', fatal=False))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
src, video_id, 'mp4', m3u8_id='hls', entry_protocol='m3u8_native'))
src, video_id, 'mp4', m3u8_id='hls',
entry_protocol='m3u8_native', fatal=False))
else:
quality = f.get('quality')
ff = {

View File

@@ -19,11 +19,11 @@ from ..utils import (
class OdnoklassnikiIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:www|m|mobile)\.)?(?:odnoklassniki|ok)\.ru/(?:video(?:embed)?|web-api/video/moviePlayer)/(?P<id>[\d-]+)'
_VALID_URL = r'https?://(?:(?:www|m|mobile)\.)?(?:odnoklassniki|ok)\.ru/(?:video(?:embed)?|web-api/video/moviePlayer|live)/(?P<id>[\d-]+)'
_TESTS = [{
# metadata in JSON
'url': 'http://ok.ru/video/20079905452',
'md5': '6ba728d85d60aa2e6dd37c9e70fdc6bc',
'md5': '0b62089b479e06681abaaca9d204f152',
'info_dict': {
'id': '20079905452',
'ext': 'mp4',
@@ -35,7 +35,6 @@ class OdnoklassnikiIE(InfoExtractor):
'like_count': int,
'age_limit': 0,
},
'skip': 'Video has been blocked',
}, {
# metadataUrl
'url': 'http://ok.ru/video/63567059965189-0?fromTime=5',
@@ -99,6 +98,9 @@ class OdnoklassnikiIE(InfoExtractor):
}, {
'url': 'http://mobile.ok.ru/video/20079905452',
'only_matching': True,
}, {
'url': 'https://www.ok.ru/live/484531969818',
'only_matching': True,
}]
def _real_extract(self, url):
@@ -184,6 +186,10 @@ class OdnoklassnikiIE(InfoExtractor):
})
return info
assert title
if provider == 'LIVE_TV_APP':
info['title'] = self._live_title(title)
quality = qualities(('4', '0', '1', '2', '3', '5'))
formats = [{
@@ -210,6 +216,20 @@ class OdnoklassnikiIE(InfoExtractor):
if fmt_type:
fmt['quality'] = quality(fmt_type)
# Live formats
m3u8_url = metadata.get('hlsMasterPlaylistUrl')
if m3u8_url:
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', entry_protocol='m3u8',
m3u8_id='hls', fatal=False))
rtmp_url = metadata.get('rtmpUrl')
if rtmp_url:
formats.append({
'url': rtmp_url,
'format_id': 'rtmp',
'ext': 'flv',
})
self._sort_formats(formats)
info['formats'] = formats

View File

@@ -1,6 +1,8 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_str,
@@ -18,7 +20,14 @@ from ..utils import (
class PandoraTVIE(InfoExtractor):
IE_NAME = 'pandora.tv'
IE_DESC = '판도라TV'
_VALID_URL = r'https?://(?:.+?\.)?channel\.pandora\.tv/channel/video\.ptv\?'
_VALID_URL = r'''(?x)
https?://
(?:
(?:www\.)?pandora\.tv/view/(?P<user_id>[^/]+)/(?P<id>\d+)| # new format
(?:.+?\.)?channel\.pandora\.tv/channel/video\.ptv\?| # old format
m\.pandora\.tv/?\? # mobile
)
'''
_TESTS = [{
'url': 'http://jp.channel.pandora.tv/channel/video.ptv?c1=&prgid=53294230&ch_userid=mikakim&ref=main&lot=cate_01_2',
'info_dict': {
@@ -53,14 +62,25 @@ class PandoraTVIE(InfoExtractor):
# Test metadata only
'skip_download': True,
},
}, {
'url': 'http://www.pandora.tv/view/mikakim/53294230#36797454_new',
'only_matching': True,
}, {
'url': 'http://m.pandora.tv/?c=view&ch_userid=mikakim&prgid=54600346',
'only_matching': True,
}]
def _real_extract(self, url):
qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
video_id = qs.get('prgid', [None])[0]
user_id = qs.get('ch_userid', [None])[0]
if any(not f for f in (video_id, user_id,)):
raise ExtractorError('Invalid URL', expected=True)
mobj = re.match(self._VALID_URL, url)
user_id = mobj.group('user_id')
video_id = mobj.group('id')
if not user_id or not video_id:
qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
video_id = qs.get('prgid', [None])[0]
user_id = qs.get('ch_userid', [None])[0]
if any(not f for f in (video_id, user_id,)):
raise ExtractorError('Invalid URL', expected=True)
data = self._download_json(
'http://m.pandora.tv/?c=view&m=viewJsonApi&ch_userid=%s&prgid=%s'
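
The new matching logic tries the path-based URL format first and falls back to query-string parameters for the old and mobile formats. The sketch below reproduces that flow with the standard library only; it assumes Python 3's `urllib.parse` in place of `compat_urlparse`, raises `ValueError` instead of `ExtractorError`, and `extract_ids` is a hypothetical helper name.

```python
import re
from urllib.parse import parse_qs, urlparse

# Same alternation as the new _VALID_URL: in verbose mode, whitespace and
# '#' comments inside the pattern are ignored.
VALID_URL = r'''(?x)
    https?://
    (?:
        (?:www\.)?pandora\.tv/view/(?P<user_id>[^/]+)/(?P<id>\d+)|  # new format
        (?:.+?\.)?channel\.pandora\.tv/channel/video\.ptv\?|        # old format
        m\.pandora\.tv/?\?                                          # mobile
    )
'''


def extract_ids(url):
    """Return (user_id, video_id) from any of the three URL formats."""
    mobj = re.match(VALID_URL, url)
    if mobj is None:
        raise ValueError('Invalid URL')
    user_id, video_id = mobj.group('user_id'), mobj.group('id')
    if not user_id or not video_id:
        # Old and mobile formats carry the ids in the query string instead.
        qs = parse_qs(urlparse(url).query)
        video_id = qs.get('prgid', [None])[0]
        user_id = qs.get('ch_userid', [None])[0]
    if not user_id or not video_id:
        raise ValueError('Invalid URL')
    return user_id, video_id
```

Only the first alternative defines the named groups, so for the other two formats both groups are `None` and the query-string fallback runs.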

View File

@@ -157,7 +157,7 @@ class SoundcloudIE(InfoExtractor):
},
]
_CLIENT_ID = 'c6CU49JDMapyrQo06UxU9xouB9ZVzqCn'
_CLIENT_ID = 'DQskPX1pntALRzMp4HSxya3Mc0AO66Ro'
_IPHONE_CLIENT_ID = '376f225bf427445fc4bfb6b99b72e0bf'
@staticmethod

View File

@@ -4,7 +4,10 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .nexx import NexxEmbedIE
from .nexx import (
NexxIE,
NexxEmbedIE,
)
from .spiegeltv import SpiegeltvIE
from ..compat import compat_urlparse
from ..utils import (
@@ -51,6 +54,10 @@ class SpiegelIE(InfoExtractor):
}, {
'url': 'http://www.spiegel.de/video/astronaut-alexander-gerst-von-der-iss-station-beantwortet-fragen-video-1519126-iframe.html',
'only_matching': True,
}, {
# nexx video
'url': 'http://www.spiegel.de/video/spiegel-tv-magazin-ueber-guellekrise-in-schleswig-holstein-video-99012776.html',
'only_matching': True,
}]
def _real_extract(self, url):
@@ -61,6 +68,14 @@ class SpiegelIE(InfoExtractor):
if SpiegeltvIE.suitable(handle.geturl()):
return self.url_result(handle.geturl(), 'Spiegeltv')
nexx_id = self._search_regex(
r'nexxOmniaId\s*:\s*(\d+)', webpage, 'nexx id', default=None)
if nexx_id:
domain_id = NexxIE._extract_domain_id(webpage) or '748'
return self.url_result(
'nexx:%s:%s' % (domain_id, nexx_id), ie=NexxIE.ie_key(),
video_id=nexx_id)
video_data = extract_attributes(self._search_regex(r'(<div[^>]+id="spVideoElements"[^>]+>)', webpage, 'video element', default=''))
title = video_data.get('data-video-title') or get_element_by_attribute('class', 'module-title', webpage)

View File

@@ -1,38 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .wdr import WDRBaseIE
from ..utils import get_element_by_attribute
class SportschauIE(WDRBaseIE):
IE_NAME = 'Sportschau'
_VALID_URL = r'https?://(?:www\.)?sportschau\.de/(?:[^/]+/)+video-?(?P<id>[^/#?]+)\.html'
_TEST = {
'url': 'http://www.sportschau.de/uefaeuro2016/videos/video-dfb-team-geht-gut-gelaunt-ins-spiel-gegen-polen-100.html',
'info_dict': {
'id': 'mdb-1140188',
'display_id': 'dfb-team-geht-gut-gelaunt-ins-spiel-gegen-polen-100',
'ext': 'mp4',
'title': 'DFB-Team geht gut gelaunt ins Spiel gegen Polen',
'description': 'Vor dem zweiten Gruppenspiel gegen Polen herrscht gute Stimmung im deutschen Team. Insbesondere Bastian Schweinsteiger strotzt vor Optimismus nach seinem Tor gegen die Ukraine.',
'upload_date': '20160615',
},
'skip': 'Geo-restricted to Germany',
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = get_element_by_attribute('class', 'headline', webpage)
description = self._html_search_meta('description', webpage, 'description')
info = self._extract_wdr_video(webpage, video_id)
info.update({
'title': title,
'description': description,
})
return info

@@ -273,6 +273,8 @@ class TVPlayIE(InfoExtractor):
'ext': ext,
}
if video_url.startswith('rtmp'):
if smuggled_data.get('skip_rtmp'):
continue
m = re.search(
r'^(?P<url>rtmp://[^/]+/(?P<app>[^/]+))/(?P<playpath>.+)$', video_url)
if not m:
@@ -434,6 +436,10 @@ class ViafreeIE(InfoExtractor):
return self.url_result(
smuggle_url(
'mtg:%s' % video_id,
{'geo_countries': [
compat_urlparse.urlparse(url).netloc.rsplit('.', 1)[-1]]}),
{
'geo_countries': [
compat_urlparse.urlparse(url).netloc.rsplit('.', 1)[-1]],
# rtmp host mtgfs.fplive.net for viafree is unresolvable
'skip_rtmp': True,
}),
ie=TVPlayIE.ie_key(), video_id=video_id)
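
As an illustration of the data that the ViafreeIE hunk smuggles to TVPlayIE, the sketch below builds the same dictionary with Python 3's `urllib.parse`. The function name `viafree_smuggled_data` and the sample URL are assumptions made for the example.

```python
from urllib.parse import urlparse


def viafree_smuggled_data(url):
    """Derive the geo country from the host's top-level domain."""
    # 'www.viafree.se' -> 'se'
    country = urlparse(url).netloc.rsplit('.', 1)[-1]
    return {
        'geo_countries': [country],
        # Ask the receiving extractor to drop rtmp formats.
        'skip_rtmp': True,
    }
```

On the receiving side, the TVPlayIE hunk checks `smuggled_data.get('skip_rtmp')` and skips any format whose URL starts with `rtmp`.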

@@ -85,10 +85,15 @@ class TwitchBaseIE(InfoExtractor):
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
response = self._parse_json(
e.cause.read().decode('utf-8'), None)
fail(response['message'])
fail(response.get('message') or response['errors'][0])
raise
redirect_url = urljoin(post_url, response['redirect'])
if 'Authenticated successfully' in response.get('message', ''):
return None, None
redirect_url = urljoin(
post_url,
response.get('redirect') or response['redirect_path'])
return self._download_webpage_handle(
redirect_url, None, 'Downloading login redirect page',
headers=headers)
@@ -106,6 +111,10 @@ class TwitchBaseIE(InfoExtractor):
'password': password,
})
# Successful login
if not redirect_page:
return
if re.search(r'(?i)<form[^>]+id="two-factor-submit"', redirect_page) is not None:
# TODO: Add mechanism to request an SMS or phone call
tfa_token = self._get_tfa_info('two-factor authentication token')
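
The Twitch login hunk now tolerates two response shapes for both errors and redirects. A minimal sketch of that fallback logic, detached from the extractor, might look like this; the helper names and the sample URLs are hypothetical, and only the dictionary keys come from the diff.

```python
from urllib.parse import urljoin


def error_message(response):
    """Prefer 'message'; otherwise use the first entry of 'errors'."""
    return response.get('message') or response['errors'][0]


def redirect_target(post_url, response):
    """Return the absolute redirect URL, or None if already logged in."""
    if 'Authenticated successfully' in response.get('message', ''):
        return None
    return urljoin(
        post_url,
        response.get('redirect') or response['redirect_path'])
```

A `None` result corresponds to the new early return in the second hunk, where a missing `redirect_page` means the login already succeeded.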

@@ -318,9 +318,14 @@ class VKIE(VKBaseIE):
'You are trying to log in from an unusual location. You should confirm ownership at vk.com to log in with this IP.',
expected=True)
ERROR_COPYRIGHT = 'Video %s has been removed from public access due to rightholder complaint.'
ERRORS = {
r'>Видеозапись .*? была изъята из публичного доступа в связи с обращением правообладателя.<':
'Video %s has been removed from public access due to rightholder complaint.',
ERROR_COPYRIGHT,
r'>The video .*? was removed from public access by request of the copyright holder.<':
ERROR_COPYRIGHT,
r'<!>Please log in or <':
'Video %s is only available for registered users, '

@@ -4,49 +4,50 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_str,
compat_urlparse,
)
from ..utils import (
determine_ext,
ExtractorError,
js_to_json,
strip_jsonp,
try_get,
unified_strdate,
update_url_query,
urlhandle_detect_ext,
)
class WDRBaseIE(InfoExtractor):
def _extract_wdr_video(self, webpage, display_id):
# for wdr.de the data-extension is in a tag with the class "mediaLink"
# for wdr.de radio players, in a tag with the class "wdrrPlayerPlayBtn"
# for wdrmaus, in a tag with the class "videoButton" (previously a link
# to the page in a multiline "videoLink"-tag)
json_metadata = self._html_search_regex(
r'''(?sx)class=
(?:
(["\'])(?:mediaLink|wdrrPlayerPlayBtn|videoButton)\b.*?\1[^>]+|
(["\'])videoLink\b.*?\2[\s]*>\n[^\n]*
)data-extension=(["\'])(?P<data>(?:(?!\3).)+)\3
''',
webpage, 'media link', default=None, group='data')
class WDRIE(InfoExtractor):
_VALID_URL = r'https?://deviceids-medp\.wdr\.de/ondemand/\d+/(?P<id>\d+)\.js'
_GEO_COUNTRIES = ['DE']
_TEST = {
'url': 'http://deviceids-medp.wdr.de/ondemand/155/1557833.js',
'info_dict': {
'id': 'mdb-1557833',
'ext': 'mp4',
'title': 'Biathlon-Staffel verpasst Podest bei Olympia-Generalprobe',
'upload_date': '20180112',
},
}
if not json_metadata:
return
media_link_obj = self._parse_json(json_metadata, display_id,
transform_source=js_to_json)
jsonp_url = media_link_obj['mediaObj']['url']
def _real_extract(self, url):
video_id = self._match_id(url)
metadata = self._download_json(
jsonp_url, display_id, transform_source=strip_jsonp)
url, video_id, transform_source=strip_jsonp)
metadata_tracker_data = metadata['trackerData']
metadata_media_resource = metadata['mediaResource']
is_live = metadata.get('mediaType') == 'live'
tracker_data = metadata['trackerData']
media_resource = metadata['mediaResource']
formats = []
# check if the metadata contains a direct URL to a file
for kind, media_resource in metadata_media_resource.items():
for kind, media_resource in media_resource.items():
if kind not in ('dflt', 'alt'):
continue
@@ -57,13 +58,13 @@ class WDRBaseIE(InfoExtractor):
ext = determine_ext(medium_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
medium_url, display_id, 'mp4', 'm3u8_native',
medium_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls'))
elif ext == 'f4m':
manifest_url = update_url_query(
medium_url, {'hdcore': '3.2.0', 'plugin': 'aasp-3.2.0.77.18'})
formats.extend(self._extract_f4m_formats(
manifest_url, display_id, f4m_id='hds', fatal=False))
manifest_url, video_id, f4m_id='hds', fatal=False))
elif ext == 'smil':
formats.extend(self._extract_smil_formats(
medium_url, 'stream', fatal=False))
@@ -73,7 +74,7 @@ class WDRBaseIE(InfoExtractor):
}
if ext == 'unknown_video':
urlh = self._request_webpage(
medium_url, display_id, note='Determining extension')
medium_url, video_id, note='Determining extension')
ext = urlhandle_detect_ext(urlh)
a_format['ext'] = ext
formats.append(a_format)
@@ -81,30 +82,30 @@ class WDRBaseIE(InfoExtractor):
self._sort_formats(formats)
subtitles = {}
caption_url = metadata_media_resource.get('captionURL')
caption_url = media_resource.get('captionURL')
if caption_url:
subtitles['de'] = [{
'url': caption_url,
'ext': 'ttml',
}]
title = metadata_tracker_data['trackerClipTitle']
title = tracker_data['trackerClipTitle']
return {
'id': metadata_tracker_data.get('trackerClipId', display_id),
'display_id': display_id,
'title': title,
'alt_title': metadata_tracker_data.get('trackerClipSubcategory'),
'id': tracker_data.get('trackerClipId', video_id),
'title': self._live_title(title) if is_live else title,
'alt_title': tracker_data.get('trackerClipSubcategory'),
'formats': formats,
'subtitles': subtitles,
'upload_date': unified_strdate(metadata_tracker_data.get('trackerClipAirTime')),
'upload_date': unified_strdate(tracker_data.get('trackerClipAirTime')),
'is_live': is_live,
}
class WDRIE(WDRBaseIE):
class WDRPageIE(InfoExtractor):
_CURRENT_MAUS_URL = r'https?://(?:www\.)wdrmaus.de/(?:[^/]+/){1,2}[^/?#]+\.php5'
_PAGE_REGEX = r'/(?:mediathek/)?[^/]+/(?P<type>[^/]+)/(?P<display_id>.+)\.html'
_VALID_URL = r'(?P<page_url>https?://(?:www\d\.)?wdr\d?\.de)' + _PAGE_REGEX + '|' + _CURRENT_MAUS_URL
_PAGE_REGEX = r'/(?:mediathek/)?(?:[^/]+/)*(?P<display_id>[^/]+)\.html'
_VALID_URL = r'https?://(?:www\d?\.)?(?:wdr\d?|sportschau)\.de' + _PAGE_REGEX + '|' + _CURRENT_MAUS_URL
_TESTS = [
{
@@ -124,6 +125,7 @@ class WDRIE(WDRBaseIE):
'ext': 'ttml',
}]},
},
'skip': 'HTTP Error 404: Not Found',
},
{
'url': 'http://www1.wdr.de/mediathek/audio/wdr3/wdr3-gespraech-am-samstag/audio-schriftstellerin-juli-zeh-100.html',
@@ -139,19 +141,17 @@ class WDRIE(WDRBaseIE):
'is_live': False,
'subtitles': {}
},
'skip': 'HTTP Error 404: Not Found',
},
{
'url': 'http://www1.wdr.de/mediathek/video/live/index.html',
'info_dict': {
'id': 'mdb-103364',
'id': 'mdb-1406149',
'ext': 'mp4',
'display_id': 'index',
'title': r're:^WDR Fernsehen im Livestream [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'title': r're:^WDR Fernsehen im Livestream \(nur in Deutschland erreichbar\) [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'alt_title': 'WDR Fernsehen Live',
'upload_date': None,
'description': 'md5:ae2ff888510623bf8d4b115f95a9b7c9',
'upload_date': '20150101',
'is_live': True,
'subtitles': {}
},
'params': {
'skip_download': True, # m3u8 download
@@ -159,19 +159,18 @@ class WDRIE(WDRBaseIE):
},
{
'url': 'http://www1.wdr.de/mediathek/video/sendungen/aktuelle-stunde/aktuelle-stunde-120.html',
'playlist_mincount': 8,
'playlist_mincount': 7,
'info_dict': {
'id': 'aktuelle-stunde/aktuelle-stunde-120',
'id': 'aktuelle-stunde-120',
},
},
{
'url': 'http://www.wdrmaus.de/aktuelle-sendung/index.php5',
'info_dict': {
'id': 'mdb-1323501',
'id': 'mdb-1552552',
'ext': 'mp4',
'upload_date': 're:^[0-9]{8}$',
'title': 're:^Die Sendung mit der Maus vom [0-9.]{10}$',
'description': 'Die Seite mit der Maus -',
},
'skip': 'The id changes from week to week because of the new episode'
},
@@ -183,7 +182,6 @@ class WDRIE(WDRBaseIE):
'ext': 'mp4',
'upload_date': '20130919',
'title': 'Sachgeschichte - Achterbahn ',
'description': 'Die Seite mit der Maus -',
},
},
{
@@ -191,52 +189,114 @@ class WDRIE(WDRBaseIE):
# Live stream, MD5 unstable
'info_dict': {
'id': 'mdb-869971',
'ext': 'flv',
'title': 'COSMO Livestream',
'description': 'md5:2309992a6716c347891c045be50992e4',
'ext': 'mp4',
'title': r're:^COSMO Livestream [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'upload_date': '20160101',
},
'params': {
'skip_download': True, # m3u8 download
}
},
{
'url': 'http://www.sportschau.de/handballem2018/handball-nationalmannschaft-em-stolperstein-vorrunde-100.html',
'info_dict': {
'id': 'mdb-1556012',
'ext': 'mp4',
'title': 'DHB-Vizepräsident Bob Hanning - "Die Weltspitze ist extrem breit"',
'upload_date': '20180111',
},
'params': {
'skip_download': True,
},
},
{
'url': 'http://www.sportschau.de/handballem2018/audio-vorschau---die-handball-em-startet-mit-grossem-favoritenfeld-100.html',
'only_matching': True,
}
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
url_type = mobj.group('type')
page_url = mobj.group('page_url')
display_id = mobj.group('display_id')
webpage = self._download_webpage(url, display_id)
info_dict = self._extract_wdr_video(webpage, display_id)
entries = []
if not info_dict:
# Article with several videos
# for wdr.de the data-extension is in a tag with the class "mediaLink"
# for wdr.de radio players, in a tag with the class "wdrrPlayerPlayBtn"
# for wdrmaus, in a tag with the class "videoButton" (previously a link
# to the page in a multiline "videoLink"-tag)
for mobj in re.finditer(
r'''(?sx)class=
(?:
(["\'])(?:mediaLink|wdrrPlayerPlayBtn|videoButton)\b.*?\1[^>]+|
(["\'])videoLink\b.*?\2[\s]*>\n[^\n]*
)data-extension=(["\'])(?P<data>(?:(?!\3).)+)\3
''', webpage):
media_link_obj = self._parse_json(
mobj.group('data'), display_id, transform_source=js_to_json,
fatal=False)
if not media_link_obj:
continue
jsonp_url = try_get(
media_link_obj, lambda x: x['mediaObj']['url'], compat_str)
if jsonp_url:
entries.append(self.url_result(jsonp_url, ie=WDRIE.ie_key()))
# Playlist (e.g. https://www1.wdr.de/mediathek/video/sendungen/aktuelle-stunde/aktuelle-stunde-120.html)
if not entries:
entries = [
self.url_result(page_url + href[0], 'WDR')
for href in re.findall(
r'<a href="(%s)"[^>]+data-extension=' % self._PAGE_REGEX,
webpage)
self.url_result(
compat_urlparse.urljoin(url, mobj.group('href')),
ie=WDRPageIE.ie_key())
for mobj in re.finditer(
r'<a[^>]+\bhref=(["\'])(?P<href>(?:(?!\1).)+)\1[^>]+\bdata-extension=',
webpage) if re.match(self._PAGE_REGEX, mobj.group('href'))
]
if entries: # Playlist page
return self.playlist_result(entries, playlist_id=display_id)
return self.playlist_result(entries, playlist_id=display_id)
raise ExtractorError('No downloadable streams found', expected=True)
is_live = url_type == 'live'
class WDRElefantIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)wdrmaus\.de/elefantenseite/#(?P<id>.+)'
_TEST = {
'url': 'http://www.wdrmaus.de/elefantenseite/#folge_ostern_2015',
'info_dict': {
'title': 'Folge Oster-Spezial 2015',
'id': 'mdb-1088195',
'ext': 'mp4',
'age_limit': None,
'upload_date': '20150406'
},
'params': {
'skip_download': True,
},
}
if is_live:
info_dict.update({
'title': self._live_title(info_dict['title']),
'upload_date': None,
})
elif 'upload_date' not in info_dict:
info_dict['upload_date'] = unified_strdate(self._html_search_meta('DC.Date', webpage, 'upload date'))
def _real_extract(self, url):
display_id = self._match_id(url)
info_dict.update({
'description': self._html_search_meta('Description', webpage),
'is_live': is_live,
})
return info_dict
# Table of Contents seems to always be at this address, so fetch it directly.
# The website fetches configurationJS.php5, which links to tableOfContentsJS.php5.
table_of_contents = self._download_json(
'https://www.wdrmaus.de/elefantenseite/data/tableOfContentsJS.php5',
display_id)
if display_id not in table_of_contents:
raise ExtractorError(
'No entry in site\'s table of contents for this URL. '
'Is the fragment part of the URL (after the #) correct?',
expected=True)
xml_metadata_path = table_of_contents[display_id]['xmlPath']
xml_metadata = self._download_xml(
'https://www.wdrmaus.de/elefantenseite/' + xml_metadata_path,
display_id)
zmdb_url_element = xml_metadata.find('./movie/zmdb_url')
if zmdb_url_element is None:
raise ExtractorError(
'%s is not a video' % display_id, expected=True)
return self.url_result(zmdb_url_element.text, ie=WDRIE.ie_key())
class WDRMobileIE(InfoExtractor):
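
The reworked playlist fallback in WDRPageIE collects every `<a>` tag that carries a `data-extension` attribute and keeps only links whose `href` looks like another WDR page. The sketch below reuses the two regexes from the diff outside the extractor; `playlist_links` and the sample HTML in the usage note are assumptions for illustration.

```python
import re
from urllib.parse import urljoin

PAGE_REGEX = r'/(?:mediathek/)?(?:[^/]+/)*(?P<display_id>[^/]+)\.html'
LINK_REGEX = (
    r'<a[^>]+\bhref=(["\'])(?P<href>(?:(?!\1).)+)\1'
    r'[^>]+\bdata-extension=')


def playlist_links(page_url, webpage):
    """Return absolute URLs of linked pages that carry data-extension."""
    return [
        urljoin(page_url, m.group('href'))
        for m in re.finditer(LINK_REGEX, webpage)
        # Keep only site-relative links shaped like a WDR page.
        if re.match(PAGE_REGEX, m.group('href'))
    ]
```

Given a page containing `<a class="x" href="/mediathek/video/a-100.html" data-extension="{}">`, the helper resolves the link against `page_url`; anchors without `data-extension` are ignored. Resolving with `urljoin` replaces the old `page_url + href[0]` concatenation, so the `page_url` capture group is no longer needed.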

@@ -0,0 +1,140 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
import json
import random
import re
from ..compat import (
compat_parse_qs,
compat_str,
)
from ..utils import (
js_to_json,
strip_jsonp,
urlencode_postdata,
)
class WeiboIE(InfoExtractor):
_VALID_URL = r'https?://weibo\.com/[0-9]+/(?P<id>[a-zA-Z0-9]+)'
_TEST = {
'url': 'https://weibo.com/6275294458/Fp6RGfbff?type=comment',
'info_dict': {
'id': 'Fp6RGfbff',
'ext': 'mp4',
'title': 'You should have servants to massage you,... 来自Hosico_猫 - 微博',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
# to get Referer url for genvisitor
webpage, urlh = self._download_webpage_handle(url, video_id)
visitor_url = urlh.geturl()
if 'passport.weibo.com' in visitor_url:
# first visit
visitor_data = self._download_json(
'https://passport.weibo.com/visitor/genvisitor', video_id,
note='Generating first-visit data',
transform_source=strip_jsonp,
headers={'Referer': visitor_url},
data=urlencode_postdata({
'cb': 'gen_callback',
'fp': json.dumps({
'os': '2',
'browser': 'Gecko57,0,0,0',
'fonts': 'undefined',
'screenInfo': '1440*900*24',
'plugins': '',
}),
}))
tid = visitor_data['data']['tid']
cnfd = '%03d' % visitor_data['data']['confidence']
self._download_webpage(
'https://passport.weibo.com/visitor/visitor', video_id,
note='Running first-visit callback',
query={
'a': 'incarnate',
't': tid,
'w': 2,
'c': cnfd,
'cb': 'cross_domain',
'from': 'weibo',
'_rand': random.random(),
})
webpage = self._download_webpage(
url, video_id, note='Revisiting webpage')
title = self._html_search_regex(
r'<title>(.+?)</title>', webpage, 'title')
video_formats = compat_parse_qs(self._search_regex(
r'video-sources=\\\"(.+?)\"', webpage, 'video_sources'))
formats = []
supported_resolutions = (480, 720)
for res in supported_resolutions:
vid_urls = video_formats.get(compat_str(res))
if not vid_urls or not isinstance(vid_urls, list):
continue
vid_url = vid_urls[0]
formats.append({
'url': vid_url,
'height': res,
})
self._sort_formats(formats)
uploader = self._og_search_property(
'nick-name', webpage, 'uploader', default=None)
return {
'id': video_id,
'title': title,
'uploader': uploader,
'formats': formats
}
class WeiboMobileIE(InfoExtractor):
_VALID_URL = r'https?://m\.weibo\.cn/status/(?P<id>[0-9]+)(\?.+)?'
_TEST = {
'url': 'https://m.weibo.cn/status/4189191225395228?wm=3333_2001&sourcetype=weixin&featurecode=newtitle&from=singlemessage&isappinstalled=0',
'info_dict': {
'id': '4189191225395228',
'ext': 'mp4',
'title': '午睡当然是要甜甜蜜蜜的啦',
'uploader': '柴犬柴犬'
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
# download the status page, which embeds the $render_data JSON
webpage = self._download_webpage(url, video_id, note='visit the page')
weibo_info = self._parse_json(self._search_regex(
r'var\s+\$render_data\s*=\s*\[({.*})\]\[0\]\s*\|\|\s*{};',
webpage, 'js_code', flags=re.DOTALL),
video_id, transform_source=js_to_json)
status_data = weibo_info.get('status', {})
page_info = status_data.get('page_info')
title = status_data['status_title']
uploader = status_data.get('user', {}).get('screen_name')
return {
'id': video_id,
'title': title,
'uploader': uploader,
'url': page_info['media_info']['stream_url']
}
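
WeiboIE treats the `video-sources` attribute as a query string keyed by resolution. A minimal sketch of that step, assuming Python 3's `urllib.parse.parse_qs` in place of `compat_parse_qs`, is shown below; the helper name and the example.com URLs in the usage note are made up for the example.

```python
from urllib.parse import parse_qs


def formats_from_sources(video_sources, resolutions=(480, 720)):
    """Turn a 'video-sources' query string into a list of format dicts."""
    parsed = parse_qs(video_sources)
    formats = []
    for res in resolutions:
        # Keys are strings such as '480'; values are lists of URLs.
        urls = parsed.get(str(res))
        if not urls:
            continue
        formats.append({'url': urls[0], 'height': res})
    return formats
```

For an input such as `'480=http%3A%2F%2Fexample.com%2Flow.mp4&720=http%3A%2F%2Fexample.com%2Fhigh.mp4'` the helper returns one entry per resolution with the percent-encoding already decoded. Keys outside `resolutions` are ignored, as in the extractor.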

@@ -0,0 +1,233 @@
# coding: utf-8
from __future__ import unicode_literals
import itertools
import re
from .common import InfoExtractor
class XimalayaBaseIE(InfoExtractor):
_GEO_COUNTRIES = ['CN']
class XimalayaIE(XimalayaBaseIE):
IE_NAME = 'ximalaya'
IE_DESC = '喜马拉雅FM'
_VALID_URL = r'https?://(?:www\.|m\.)?ximalaya\.com/(?P<uid>[0-9]+)/sound/(?P<id>[0-9]+)'
_USER_URL_FORMAT = '%s://www.ximalaya.com/zhubo/%i/'
_TESTS = [
{
'url': 'http://www.ximalaya.com/61425525/sound/47740352/',
'info_dict': {
'id': '47740352',
'ext': 'm4a',
'uploader': '小彬彬爱听书',
'uploader_id': 61425525,
'uploader_url': 'http://www.ximalaya.com/zhubo/61425525/',
'title': '261.唐诗三百首.卷八.送孟浩然之广陵.李白',
'description': "contains:《送孟浩然之广陵》\n作者:李白\n故人西辞黄鹤楼,烟花三月下扬州。\n孤帆远影碧空尽,惟见长江天际流。",
'thumbnails': [
{
'name': 'cover_url',
'url': r're:^https?://.*\.jpg$',
},
{
'name': 'cover_url_142',
'url': r're:^https?://.*\.jpg$',
'width': 180,
'height': 180
}
],
'categories': ['renwen', '人文'],
'duration': 93,
'view_count': int,
'like_count': int,
}
},
{
'url': 'http://m.ximalaya.com/61425525/sound/47740352/',
'info_dict': {
'id': '47740352',
'ext': 'm4a',
'uploader': '小彬彬爱听书',
'uploader_id': 61425525,
'uploader_url': 'http://www.ximalaya.com/zhubo/61425525/',
'title': '261.唐诗三百首.卷八.送孟浩然之广陵.李白',
'description': "contains:《送孟浩然之广陵》\n作者:李白\n故人西辞黄鹤楼,烟花三月下扬州。\n孤帆远影碧空尽,惟见长江天际流。",
'thumbnails': [
{
'name': 'cover_url',
'url': r're:^https?://.*\.jpg$',
},
{
'name': 'cover_url_142',
'url': r're:^https?://.*\.jpg$',
'width': 180,
'height': 180
}
],
'categories': ['renwen', '人文'],
'duration': 93,
'view_count': int,
'like_count': int,
}
},
{
'url': 'https://www.ximalaya.com/11045267/sound/15705996/',
'info_dict': {
'id': '15705996',
'ext': 'm4a',
'uploader': '李延隆老师',
'uploader_id': 11045267,
'uploader_url': 'https://www.ximalaya.com/zhubo/11045267/',
'title': 'Lesson 1 Excuse me!',
'description': "contains:Listen to the tape then answer\xa0this question. Whose handbag is it?\n"
"听录音,然后回答问题,这是谁的手袋?",
'thumbnails': [
{
'name': 'cover_url',
'url': r're:^https?://.*\.jpg$',
},
{
'name': 'cover_url_142',
'url': r're:^https?://.*\.jpg$',
'width': 180,
'height': 180
}
],
'categories': ['train', '外语'],
'duration': 40,
'view_count': int,
'like_count': int,
}
},
]
def _real_extract(self, url):
is_m = 'm.ximalaya' in url
scheme = 'https' if url.startswith('https') else 'http'
audio_id = self._match_id(url)
webpage = self._download_webpage(url, audio_id,
note='Download sound page for %s' % audio_id,
errnote='Unable to get sound page')
audio_info_file = '%s://m.ximalaya.com/tracks/%s.json' % (scheme, audio_id)
audio_info = self._download_json(audio_info_file, audio_id,
'Downloading info json %s' % audio_info_file,
'Unable to download info file')
formats = []
for bps, k in (('24k', 'play_path_32'), ('64k', 'play_path_64')):
if audio_info.get(k):
formats.append({
'format_id': bps,
'url': audio_info[k],
})
thumbnails = []
for k in audio_info.keys():
# cover picture keys look like: 'cover_url', 'cover_url_142'
if k.startswith('cover_url'):
thumbnail = {'name': k, 'url': audio_info[k]}
if k == 'cover_url_142':
thumbnail['width'] = 180
thumbnail['height'] = 180
thumbnails.append(thumbnail)
audio_uploader_id = audio_info.get('uid')
if is_m:
audio_description = self._html_search_regex(r'(?s)<section\s+class=["\']content[^>]+>(.+?)</section>',
webpage, 'audio_description', fatal=False)
else:
audio_description = self._html_search_regex(r'(?s)<div\s+class=["\']rich_intro[^>]*>(.+?</article>)',
webpage, 'audio_description', fatal=False)
if not audio_description:
audio_description_file = '%s://www.ximalaya.com/sounds/%s/rich_intro' % (scheme, audio_id)
audio_description = self._download_webpage(audio_description_file, audio_id,
note='Downloading description file %s' % audio_description_file,
errnote='Unable to download descrip file',
fatal=False)
audio_description = audio_description.strip() if audio_description else None
return {
'id': audio_id,
'uploader': audio_info.get('nickname'),
'uploader_id': audio_uploader_id,
'uploader_url': self._USER_URL_FORMAT % (scheme, audio_uploader_id) if audio_uploader_id else None,
'title': audio_info['title'],
'thumbnails': thumbnails,
'description': audio_description,
'categories': list(filter(None, (audio_info.get('category_name'), audio_info.get('category_title')))),
'duration': audio_info.get('duration'),
'view_count': audio_info.get('play_count'),
'like_count': audio_info.get('favorites_count'),
'formats': formats,
}
class XimalayaAlbumIE(XimalayaBaseIE):
IE_NAME = 'ximalaya:album'
IE_DESC = '喜马拉雅FM 专辑'
_VALID_URL = r'https?://(?:www\.|m\.)?ximalaya\.com/(?P<uid>[0-9]+)/album/(?P<id>[0-9]+)'
_TEMPLATE_URL = '%s://www.ximalaya.com/%s/album/%s/'
_BASE_URL_TEMPL = '%s://www.ximalaya.com%s'
_LIST_VIDEO_RE = r'<a[^>]+?href="(?P<url>/%s/sound/(?P<id>\d+)/?)"[^>]+?title="(?P<title>[^>]+)">'
_TESTS = [{
'url': 'http://www.ximalaya.com/61425525/album/5534601/',
'info_dict': {
'title': '唐诗三百首(含赏析)',
'id': '5534601',
},
'playlist_count': 312,
}, {
'url': 'http://m.ximalaya.com/61425525/album/5534601',
'info_dict': {
'title': '唐诗三百首(含赏析)',
'id': '5534601',
},
'playlist_count': 312,
},
]
def _real_extract(self, url):
self.scheme = scheme = 'https' if url.startswith('https') else 'http'
mobj = re.match(self._VALID_URL, url)
uid, playlist_id = mobj.group('uid'), mobj.group('id')
webpage = self._download_webpage(self._TEMPLATE_URL % (scheme, uid, playlist_id), playlist_id,
note='Download album page for %s' % playlist_id,
errnote='Unable to get album info')
title = self._html_search_regex(r'detailContent_title[^>]*><h1(?:[^>]+)?>([^<]+)</h1>',
webpage, 'title', fatal=False)
return self.playlist_result(self._entries(webpage, playlist_id, uid), playlist_id, title)
def _entries(self, page, playlist_id, uid):
html = page
for page_num in itertools.count(1):
for entry in self._process_page(html, uid):
yield entry
next_url = self._search_regex(r'<a\s+href=(["\'])(?P<more>[\S]+)\1[^>]+rel=(["\'])next\3',
html, 'list_next_url', default=None, group='more')
if not next_url:
break
next_full_url = self._BASE_URL_TEMPL % (self.scheme, next_url)
html = self._download_webpage(next_full_url, playlist_id)
def _process_page(self, html, uid):
find_from = html.index('album_soundlist')
for mobj in re.finditer(self._LIST_VIDEO_RE % uid, html[find_from:]):
yield self.url_result(self._BASE_URL_TEMPL % (self.scheme, mobj.group('url')),
XimalayaIE.ie_key(),
mobj.group('id'),
mobj.group('title'))
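
The album pagination in `_entries` follows `rel="next"` links until none remain. The sketch below isolates that loop using the same regex; `iter_pages` is a hypothetical helper, and `fetch` stands in for `_download_webpage`, so any callable that maps a relative URL to HTML will do.

```python
import re

NEXT_RE = r'<a\s+href=(["\'])(?P<more>[\S]+)\1[^>]+rel=(["\'])next\3'


def iter_pages(first_html, fetch):
    """Yield the first page, then every page reachable via rel="next"."""
    html = first_html
    while True:
        yield html
        m = re.search(NEXT_RE, html)
        if not m:
            break
        # 'more' holds the site-relative URL of the next page.
        html = fetch(m.group('more'))
```

A dictionary's `__getitem__` works as a fake `fetch` for experiments, for example `iter_pages(first_page, pages.__getitem__)`. The extractor additionally prefixes the relative URL with `_BASE_URL_TEMPL` before downloading.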

@@ -1810,7 +1810,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'url': video_info['conn'][0],
'player_url': player_url,
}]
elif len(video_info.get('url_encoded_fmt_stream_map', [''])[0]) >= 1 or len(video_info.get('adaptive_fmts', [''])[0]) >= 1:
elif not is_live and (len(video_info.get('url_encoded_fmt_stream_map', [''])[0]) >= 1 or len(video_info.get('adaptive_fmts', [''])[0]) >= 1):
encoded_url_map = video_info.get('url_encoded_fmt_stream_map', [''])[0] + ',' + video_info.get('adaptive_fmts', [''])[0]
if 'rtmpe%3Dyes' in encoded_url_map:
raise ExtractorError('rtmpe downloads are not supported, see https://github.com/rg3/youtube-dl/issues/343 for more information.', expected=True)

@@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2018.01.07'
__version__ = '2018.01.18'