Compare commits

..

123 Commits

Author SHA1 Message Date
Philipp Hagemeister
4dd09c9add release 2015.07.07 2015-07-07 10:36:07 +02:00
Yen Chi Hsuan
267dc07e6b [gfycat] Catch errors 2015-07-07 14:22:13 +08:00
Yen Chi Hsuan
d7b4d5dd50 [gfycat] Extract id correctly (fixes #6165) 2015-07-07 14:16:56 +08:00
Sergey M․
7f220b2fac [vk] Catch ownership confirmation request 2015-07-07 00:04:19 +06:00
Sergey M․
275c0423aa [vk] Fix extraction (Closes #6153) 2015-07-07 00:02:34 +06:00
Yen Chi Hsuan
d3ee4bbc5a Merge branch 'ping-qqmusic-format-fix' 2015-07-06 17:55:45 +08:00
Yen Chi Hsuan
85a064861f [qqmusic] Use regex for thumbnails in test cases 2015-07-06 17:54:41 +08:00
Yen Chi Hsuan
d0b436bff2 Merge branch 'qqmusic-format-fix' of https://github.com/ping/youtube-dl into ping-qqmusic-format-fix 2015-07-06 17:24:44 +08:00
Yen Chi Hsuan
92b2f18072 Merge branch 'ping-qqmusic-album-fix' 2015-07-06 17:09:56 +08:00
Yen Chi Hsuan
dfc4eca21f [qqmusic:album] Playlist names are optional 2015-07-06 17:09:17 +08:00
Yen Chi Hsuan
fc7ae675e2 [qqmusic:album] Strip description 2015-07-06 17:08:32 +08:00
Yen Chi Hsuan
804ad79985 Merge branch 'qqmusic-album-fix' of https://github.com/ping/youtube-dl into ping-qqmusic-album-fix 2015-07-06 17:01:59 +08:00
Yen Chi Hsuan
da839880e9 Merge branch 'ping-qqmusic-playlist' 2015-07-06 16:20:46 +08:00
Yen Chi Hsuan
e9d33454b5 [qqmusic:playlist] Playlist names are optional 2015-07-06 16:19:49 +08:00
Yen Chi Hsuan
d80891efc4 Merge branch 'qqmusic-playlist' of https://github.com/ping/youtube-dl into ping-qqmusic-playlist 2015-07-06 16:08:30 +08:00
Yen Chi Hsuan
59a83d3e5b [spiegeltv] Skip invalid m3u8 manifests (closes #6157) 2015-07-06 08:41:14 +08:00
Yen Chi Hsuan
13af92fdc4 [common] Add 'fatal' to _extract_m3u8_formats 2015-07-06 08:39:38 +08:00
Sergey M․
0c20ee7d4b [rtlnl] Clarify current adaptive -> flash workaround rationale 2015-07-06 04:16:56 +06:00
Sergey M․
89d42c2c75 [rtlnl] Clarify test 2015-07-06 02:58:02 +06:00
Sergey M․
04611765a4 Merge branch 'corone17-patch-1' 2015-07-05 19:07:51 +06:00
Sergey M․
9dfc4fa1a1 [rtlnl] Add test with encrypted m3u8 streams for reference 2015-07-05 19:07:07 +06:00
Sergey M․
43232d5c14 [rtlnl] Improve 2015-07-05 19:01:07 +06:00
Sergey M․
f7c272d4fa Merge branch 'patch-1' of https://github.com/corone17/youtube-dl into corone17-patch-1 2015-07-05 18:07:39 +06:00
Sergey M․
ede21449c8 [crunchyroll] Fix extraction (Closes #5855, closes #5881) 2015-07-05 06:29:36 +06:00
Sergey M․
d7c9a3e976 Credit @remitamine for snagfilms (#6096) 2015-07-04 17:22:11 +06:00
Philipp Hagemeister
35eb649e9d release 2015.07.04 2015-07-04 09:24:00 +02:00
Sergey M․
e56a4c9e9b [thisamericanlife] Improve and simplify 2015-07-04 05:42:53 +06:00
Eric Wong
95506e37af [thisamericanlife] Remove unnecessary comment 2015-07-04 05:12:28 +06:00
Eric Wong
e41840c522 [thisamericanlife] get info from <meta> tags 2015-07-04 05:12:20 +06:00
Eric Wong
2a46a27e6c [thisamericanlife] Add a new extractor 2015-07-04 05:12:10 +06:00
Sergey M․
0bcdc27653 [dailymotion:cloud] Extend _VALID_URL (Closes #6145) 2015-07-03 22:47:52 +06:00
Sergey M․
ddf0f74de7 [howcast] Fix extraction and modernize 2015-07-03 22:32:56 +06:00
Yen Chi Hsuan
91b21b2334 [infoq] Fix extraction (closes #6141) 2015-07-03 11:54:36 +08:00
Sergey M․
66e568de3b [extractor/generic] Improve kaltura embeds support (Closes #6137) 2015-07-02 21:39:46 +06:00
Sergey M․
f5ca97e393 [npo] Clarify token decryption algorithm source 2015-07-02 20:20:09 +06:00
Yen Chi Hsuan
8d06a62485 [npo] Decrypting token (closes #6136) 2015-07-02 16:47:55 +08:00
Yen Chi Hsuan
93f9420993 [pbs] Add coding declaration
Python 2.x does not work without it.
2015-07-02 13:13:27 +08:00
Yen Chi Hsuan
5b61070c70 [pbs] skip_download for m3u8 test cases 2015-07-02 13:08:48 +08:00
Yen Chi Hsuan
dbe1a93526 [pbs] Fix player URL (closes #6139) 2015-07-02 13:05:43 +08:00
Sergey M․
86511ea417 [drtuber] Fix extraction 2015-07-01 21:47:56 +06:00
Sergey M.
33f1f81b8b Merge pull request #6132 from alarig/master
Add support of HTTPS for ina.fr
2015-06-30 20:53:49 +06:00
Sergey M․
9d0b581fea [youtube] Prefer meta for upload date and modernize 2015-06-30 20:52:26 +06:00
alarig
c05724cb18 Add support of HTTPS for ina.fr 2015-06-30 16:47:14 +02:00
Sergey M․
f0714c9f86 [youtube] Speed up upload date regex (#6125) 2015-06-30 01:02:48 +06:00
Sergey M․
cf386750c9 [hentaistigma] Modernize 2015-06-29 22:21:09 +06:00
Sergey M.
54f428f645 Merge pull request #6120 from nawl/master
[hentaistigma] Fix video extractor
2015-06-29 21:14:49 +05:00
Sergey M.
dc2bd20e55 Merge pull request #6098 from dstftw/use-codecs-from-dash-manifest
[youtube] Pick up codecs info from DASH manifest when not set explicitly
2015-06-29 20:58:52 +05:00
Sergey M.
c608ee491f Merge pull request #6097 from dstftw/union-itags-from-multiple-dashmpd
[youtube] Extract formats from multiple DASH manifests (Closes #6093)
2015-06-29 20:58:34 +05:00
nawl
738b926322 [hentaistigma] Fix video extractor 2015-06-28 17:24:00 -06:00
corone17
bea41c7f3f Update rtlnl.py
Better to extract 'http://manifest.us.rtl.nl' from the json, I'd say. And I think it's better to use the default json-url to make it more futureproof.
Succesfully tested with tarball.
2015-06-29 00:59:18 +02:00
Sergey M.
1bbe660dfa Merge pull request #6117 from Kagee/patch-1
NRK now supports / requires HTTPS
2015-06-29 03:15:53 +05:00
Anders Einar Hilden
c4bd188da4 NRK now supports / requires HTTPS
Add s? to regexp to support new urls. Update testcases to use HTTPS.
2015-06-29 00:11:31 +02:00
Sergey M․
5414623791 [extractor/common] Remove superfluous line 2015-06-29 00:49:19 +06:00
Sergey M․
c93d53f5e3 [youtube] Fix likes/dislike extraction 2015-06-29 00:48:06 +06:00
Sergey M․
507683780e Credit @gebn for moviefap 2015-06-28 23:08:05 +06:00
Sergey M․
e8b9ee5e08 Merge branch 'gebn-moviefap' 2015-06-28 23:05:49 +06:00
Sergey M․
d16154d163 [tnaflix] Generalize tnaflix extractors 2015-06-28 23:05:09 +06:00
Sergey M․
c342041fba [extractor/common] Use NO_DEFAULT from utils 2015-06-28 22:56:45 +06:00
Sergey M․
bf42a9906d [utils] Add default value for xpath_text 2015-06-28 22:56:07 +06:00
Sergey M․
9603e8a7d9 [YoutubeDL] Handle None width and height similarly to formats 2015-06-28 22:55:28 +06:00
Sergey M․
c7c040b825 Merge branch 'moviefap' of https://github.com/gebn/youtube-dl into gebn-moviefap 2015-06-28 18:00:49 +06:00
Yen Chi Hsuan
ac0474f89d [twitch:vod] Update _TEST
The original test case is gone
2015-06-28 13:33:09 +08:00
Yen Chi Hsuan
bb512e57dc [twitch:vod] Fix 'Source' format in m3u8 (closes #6115) 2015-06-28 13:33:09 +08:00
George Brighton
db652ea186 [moviefap] Fix flake8 warnings introduced in 1a5fd4e 2015-06-27 23:04:55 +01:00
George Brighton
5a9cc19972 [moviefap] Move flv videos to formats in the metadata 2015-06-27 23:03:06 +01:00
George Brighton
1a5fd4eebc [moviefap] Wrap long lines 2015-06-27 22:32:56 +01:00
George Brighton
8a1b49ff19 [moviefap] Explicitly sort formats to handle possible site changes 2015-06-27 22:28:17 +01:00
George Brighton
b971abe897 [moviefap] Replace call to str() with compat.compat_str() 2015-06-27 21:04:53 +01:00
George Brighton
43b925ce74 [moviefap] Replace calls to find() with util.xpath_text(). 2015-06-27 20:52:12 +01:00
George Brighton
62b742ece3 [moviefap] Remove redundant comments 2015-06-27 20:51:11 +01:00
George Brighton
d16ef949ca [moviefap] Allow non-critical fields to change without breaking extraction 2015-06-27 20:36:46 +01:00
Sergey M․
23e7cba87f [twitter:card] Add extractor (#5239) 2015-06-28 01:22:25 +06:00
George Brighton
a8e6f30d8e [moviefap] Swap and justify tests 2015-06-27 20:16:53 +01:00
George Brighton
9c49410898 [moviefap] Add categories to tests 2015-06-27 20:16:53 +01:00
George Brighton
802d74aa6b [moviefap] Swap test for an alternative non-copyrighted video 2015-06-27 20:16:53 +01:00
George Brighton
71f9e49e67 [moviefap] Fix dictionary comprehension syntax incompatible with Python 2.6 2015-06-27 20:16:53 +01:00
George Brighton
82ea1051b5 [moviefap] Add new extractor 2015-06-27 20:16:53 +01:00
Sergey M․
6c4d20cd6f [downloader/external] Fix externals downloaders specified with extension on Windows 2015-06-28 00:08:52 +06:00
Sergey M․
04c27802c0 [smotri] Add tests for password protected videos 2015-06-27 23:31:27 +06:00
Sergey M․
c3b7202f4f [smotri] Remove non relevant test 2015-06-27 23:03:26 +06:00
Sergey M․
81103ef35d [smotri] Fix password protected video extraction 2015-06-27 23:00:27 +06:00
Sergey M.
0eb5c1c62a Merge pull request #6081 from yan12125/skip_problematic_sites
[planetaplay/quickvid/vube] Skip inaccessible sites
2015-06-27 18:49:29 +05:00
Sergey M․
a9de951744 [snagfilms] More tests 2015-06-27 18:57:01 +06:00
Sergey M․
a42a1bb09d [snagfilms] Capture not available error 2015-06-27 18:54:08 +06:00
Sergey M․
9fbfc9bd4d [snagfilms:embed] Capture geolocation restriction error 2015-06-27 18:50:26 +06:00
Sergey M․
242a998bdc [snagfilms] Add support for shows 2015-06-27 18:40:01 +06:00
Sergey M․
9d1bf70234 Merge branch 'remitamine-snagfilms' 2015-06-27 18:29:16 +06:00
Sergey M․
b8c1cc1a51 [extractor/generic] Add test for snagfilms embeds 2015-06-27 18:28:10 +06:00
Sergey M․
eedd20ef96 [extractor/generic] Add support for snagfilms embeds 2015-06-27 18:26:14 +06:00
Sergey M․
7c197ad96d [snagfilms] Add routine for generic embeds extractions 2015-06-27 18:25:50 +06:00
Sergey M․
654fd03c73 [snagfilms] Improve and simplify 2015-06-27 18:20:42 +06:00
Jaime Marquínez Ferrándiz
cee16e0fa3 [newstube] style: fix alignment 2015-06-27 14:20:33 +02:00
Jaime Marquínez Ferrándiz
73c471e9ef [newstube] Fix GUID extraction (fixes #6109) 2015-06-27 14:18:01 +02:00
Sergey M․
533b99fbf9 Merge branch 'snagfilms' of https://github.com/remitamine/youtube-dl into remitamine-snagfilms 2015-06-27 16:52:51 +06:00
remitamine
f39eb98bab download all pages before start extracting info 2015-06-27 10:55:25 +01:00
Sergey M․
da77d856a1 [youtube] Add test for #6093 2015-06-27 14:55:46 +06:00
Sergey M․
b2575b38e7 [options] Clarify --youtube-skip-dash-manifest 2015-06-27 14:38:41 +06:00
Sergey M․
0a3cf9ad3d [youtube] Skip get_video_info requests when --youtube-skip-dash-manifest is specified 2015-06-27 14:31:18 +06:00
Sergey M․
00334d0de0 [options] Add missing whitespace and split lines 2015-06-27 14:26:51 +06:00
Sergey M․
226b886ca8 [vk] Fix authentication (Closes #6105) 2015-06-27 14:04:55 +06:00
Sergey M․
bc93bdb5bb [youtube] Fix reference before assignment for video_info 2015-06-27 13:19:46 +06:00
Yen Chi Hsuan
af214c3a79 [youtube] More useful messages for georestricted videos (#5716) 2015-06-27 13:15:57 +08:00
Yen Chi Hsuan
4eb10f6621 [utils] Add ISO3166Utils 2015-06-27 13:13:57 +08:00
remitamine
7d7d469025 add support for embed links 2015-06-27 00:13:14 +01:00
remitamine
fd40bdc0be remove unnecessary symbolic name for group 2015-06-26 21:56:15 +01:00
remitamine
7e0480ae0e convert tabs to 4 spaces identation 2015-06-26 21:50:27 +01:00
Sergey M․
d80265ccd6 [youtube] Simplify non-DASH formats exclusion 2015-06-27 02:48:50 +06:00
Sergey M․
1b5a1ae257 [youtube] Pick up codecs info from DASH manifest when not set explicitly 2015-06-27 00:41:26 +06:00
Sergey M․
d8d24a922a [youtube] Extract formats from multiple DASH manifests (Closes #6093)
DASH manifest pointed by dashmpd from the video webpage and one pointed by get_video_info may
be different (namely different itag set) - some itags are missing from DASH manifest pointed by
webpage's dashmpd, some - from DASH manifest pointed by get_video_info's dashmpd).
The general idea is to take a union of itags of both DASH manifests (for example video with such
'manifest behavior' see https://github.com/rg3/youtube-dl/issues/6093).
2015-06-27 00:36:23 +06:00
remitamine
03339b7b5b [snagfilms] Add new extractor 2015-06-26 18:25:43 +01:00
Sergey M․
2988835af5 [lynda] Fix non-ASCII logins/passwords on python 2 2015-06-26 19:48:23 +06:00
Sergey M․
62cca96b72 [lynda] Fix confirm login request (#6088) 2015-06-26 19:46:42 +06:00
Sergey M․
b4dea075a3 [lynda] Fix login request (Closes #6088) 2015-06-26 19:36:04 +06:00
Sergey M․
533f67d3fa [infoq] Relax _VALID_URL (Closes #6071) 2015-06-25 19:54:44 +06:00
Jaime Marquínez Ferrándiz
906e2f0eac [downloader/external] Add downloader for httpie (closes #6079) 2015-06-25 15:48:04 +02:00
Yen Chi Hsuan
b8091db6b9 [planetaplay/quickvid/vube] Skip inaccessible sites 2015-06-25 16:40:29 +08:00
Yen Chi Hsuan
381c067755 [thesixtyone] Modernize 2015-06-25 16:19:04 +08:00
Yen Chi Hsuan
2182ab5187 [thesixtyone] Fix audio_server
Some of the songs are moved to Amazon AWS
2015-06-25 16:15:13 +08:00
ping
4d58b24c15 [qqmusic] Use _check_formats instead 2015-06-18 23:09:04 +08:00
ping
0392ac98d2 [qqmusic] Fix code formatting 2015-06-18 21:13:03 +08:00
ping
5e3915cbe3 [qqmusic] Fix song extraction when certain formats are unavailable 2015-06-18 21:06:25 +08:00
ping
29b809de68 [qqmusic] Fix album extraction 2015-06-18 15:52:04 +08:00
ping
0d0d5d3717 [qqmusic] Add support for playlists 2015-06-18 13:59:37 +08:00
40 changed files with 1287 additions and 275 deletions

View File

@@ -128,3 +128,5 @@ Ping O.
Mister Hat
Peter Ding
jackyzy823
George Brighton
Remita Amine

View File

@@ -108,7 +108,7 @@ which means you can modify it, redistribute it or use it however you like.
--playlist-reverse Download playlist videos in reverse order
--xattr-set-filesize Set file xattribute ytdl.filesize with expected filesize (experimental)
--hls-prefer-native Use the native HLS downloader instead of ffmpeg (experimental)
--external-downloader COMMAND Use the specified external downloader. Currently supports aria2c,curl,wget
--external-downloader COMMAND Use the specified external downloader. Currently supports aria2c,curl,httpie,wget
--external-downloader-args ARGS Give these arguments to the external downloader
## Filesystem Options:
@@ -190,8 +190,8 @@ which means you can modify it, redistribute it or use it however you like.
--all-formats Download all available video formats
--prefer-free-formats Prefer free video formats unless a specific one is requested
-F, --list-formats List all available formats
--youtube-skip-dash-manifest Do not download the DASH manifest on YouTube videos
--merge-output-format FORMAT If a merge is required (e.g. bestvideo+bestaudio), output to given container format. One of mkv, mp4, ogg, webm, flv.Ignored if no
--youtube-skip-dash-manifest Do not download the DASH manifests and related data on YouTube videos
--merge-output-format FORMAT If a merge is required (e.g. bestvideo+bestaudio), output to given container format. One of mkv, mp4, ogg, webm, flv. Ignored if no
merge is required
## Subtitle Options:

View File

@@ -283,6 +283,7 @@
- **Motherless**
- **Motorsport**: motorsport.com
- **MovieClips**
- **MovieFap**
- **Moviezine**
- **movshare**: MovShare
- **MPORA**
@@ -383,6 +384,7 @@
- **Pyvideo**
- **qqmusic**
- **qqmusic:album**
- **qqmusic:playlist**
- **qqmusic:singer**
- **qqmusic:toplist**
- **QuickVid**
@@ -440,6 +442,8 @@
- **smotri:broadcast**: Smotri.com broadcasts
- **smotri:community**: Smotri.com community videos
- **smotri:user**: Smotri.com user videos
- **SnagFilms**
- **SnagFilmsEmbed**
- **Snotr**
- **Sohu**
- **soompi**
@@ -502,6 +506,7 @@
- **TheOnion**
- **ThePlatform**
- **TheSixtyOne**
- **ThisAmericanLife**
- **ThisAV**
- **THVideo**
- **THVideoPlaylist**
@@ -542,6 +547,7 @@
- **twitch:stream**
- **twitch:video**
- **twitch:vod**
- **TwitterCard**
- **Ubu**
- **udemy**
- **udemy:course**

View File

@@ -1008,7 +1008,7 @@ class YoutubeDL(object):
t.get('preference'), t.get('width'), t.get('height'),
t.get('id'), t.get('url')))
for i, t in enumerate(thumbnails):
if 'width' in t and 'height' in t:
if t.get('width') and t.get('height'):
t['resolution'] = '%dx%d' % (t['width'], t['height'])
if t.get('id') is None:
t['id'] = '%d' % i

View File

@@ -109,6 +109,14 @@ class Aria2cFD(ExternalFD):
cmd += ['--', info_dict['url']]
return cmd
class HttpieFD(ExternalFD):
def _make_cmd(self, tmpfilename, info_dict):
cmd = ['http', '--download', '--output', tmpfilename, info_dict['url']]
for key, val in info_dict['http_headers'].items():
cmd += ['%s:%s' % (key, val)]
return cmd
_BY_NAME = dict(
(klass.get_basename(), klass)
for name, klass in globals().items()
@@ -123,5 +131,6 @@ def list_external_downloaders():
def get_external_downloader(external_downloader):
""" Given the name of the executable, see whether we support the given
downloader . """
bn = os.path.basename(external_downloader)
# Drop .exe extension on Windows
bn = os.path.splitext(os.path.basename(external_downloader))[0]
return _BY_NAME[bn]

View File

@@ -144,7 +144,6 @@ from .ellentv import (
)
from .elpais import ElPaisIE
from .embedly import EmbedlyIE
from .empflix import EMPFlixIE
from .engadget import EngadgetIE
from .eporner import EpornerIE
from .eroprofile import EroProfileIE
@@ -433,6 +432,7 @@ from .qqmusic import (
QQMusicSingerIE,
QQMusicAlbumIE,
QQMusicToplistIE,
QQMusicPlaylistIE,
)
from .quickvid import QuickVidIE
from .r7 import R7IE
@@ -493,6 +493,10 @@ from .smotri import (
SmotriUserIE,
SmotriBroadcastIE,
)
from .snagfilms import (
SnagFilmsIE,
SnagFilmsEmbedIE,
)
from .snotr import SnotrIE
from .sohu import SohuIE
from .soompi import (
@@ -566,6 +570,7 @@ from .tf1 import TF1IE
from .theonion import TheOnionIE
from .theplatform import ThePlatformIE
from .thesixtyone import TheSixtyOneIE
from .thisamericanlife import ThisAmericanLifeIE
from .thisav import ThisAVIE
from .tinypic import TinyPicIE
from .tlc import TlcIE, TlcDeIE
@@ -573,7 +578,11 @@ from .tmz import (
TMZIE,
TMZArticleIE,
)
from .tnaflix import TNAFlixIE
from .tnaflix import (
TNAFlixIE,
EMPFlixIE,
MovieFapIE,
)
from .thvideo import (
THVideoIE,
THVideoPlaylistIE
@@ -617,6 +626,7 @@ from .twitch import (
TwitchBookmarksIE,
TwitchStreamIE,
)
from .twitter import TwitterCardIE
from .ubu import UbuIE
from .udemy import (
UdemyIE,

View File

@@ -22,6 +22,7 @@ from ..compat import (
compat_str,
)
from ..utils import (
NO_DEFAULT,
age_restricted,
bug_reports_message,
clean_html,
@@ -33,7 +34,6 @@ from ..utils import (
sanitize_filename,
unescapeHTML,
)
_NO_DEFAULT = object()
class InfoExtractor(object):
@@ -523,7 +523,7 @@ class InfoExtractor(object):
video_info['description'] = playlist_description
return video_info
def _search_regex(self, pattern, string, name, default=_NO_DEFAULT, fatal=True, flags=0, group=None):
def _search_regex(self, pattern, string, name, default=NO_DEFAULT, fatal=True, flags=0, group=None):
"""
Perform a regex search on the given string, using a single or a list of
patterns returning the first matching group.
@@ -549,7 +549,7 @@ class InfoExtractor(object):
return next(g for g in mobj.groups() if g is not None)
else:
return mobj.group(group)
elif default is not _NO_DEFAULT:
elif default is not NO_DEFAULT:
return default
elif fatal:
raise RegexNotFoundError('Unable to extract %s' % _name)
@@ -557,7 +557,7 @@ class InfoExtractor(object):
self._downloader.report_warning('unable to extract %s' % _name + bug_reports_message())
return None
def _html_search_regex(self, pattern, string, name, default=_NO_DEFAULT, fatal=True, flags=0, group=None):
def _html_search_regex(self, pattern, string, name, default=NO_DEFAULT, fatal=True, flags=0, group=None):
"""
Like _search_regex, but strips HTML tags and unescapes entities.
"""
@@ -846,7 +846,8 @@ class InfoExtractor(object):
def _extract_m3u8_formats(self, m3u8_url, video_id, ext=None,
entry_protocol='m3u8', preference=None,
m3u8_id=None, note=None, errnote=None):
m3u8_id=None, note=None, errnote=None,
fatal=True):
formats = [{
'format_id': '-'.join(filter(None, [m3u8_id, 'meta'])),
@@ -866,7 +867,10 @@ class InfoExtractor(object):
m3u8_doc = self._download_webpage(
m3u8_url, video_id,
note=note or 'Downloading m3u8 information',
errnote=errnote or 'Failed to download m3u8 information')
errnote=errnote or 'Failed to download m3u8 information',
fatal=fatal)
if m3u8_doc is False:
return m3u8_doc
last_info = None
last_media = None
kv_rex = re.compile(

View File

@@ -27,7 +27,7 @@ from ..aes import (
class CrunchyrollIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.(?:com|fr)/(?:[^/]*/[^/?&]*?|media/\?id=)(?P<video_id>[0-9]+))(?:[/?&]|$)'
_VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.(?:com|fr)/(?:media(?:-|/\?id=)|[^/]*/[^/?&]*?)(?P<video_id>[0-9]+))(?:[/?&]|$)'
_NETRC_MACHINE = 'crunchyroll'
_TESTS = [{
'url': 'http://www.crunchyroll.com/wanna-be-the-strongest-in-the-world/episode-1-an-idol-wrestler-is-born-645513',
@@ -45,6 +45,22 @@ class CrunchyrollIE(InfoExtractor):
# rtmp
'skip_download': True,
},
}, {
'url': 'http://www.crunchyroll.com/media-589804/culture-japan-1',
'info_dict': {
'id': '589804',
'ext': 'flv',
'title': 'Culture Japan Episode 1 Rebuilding Japan after the 3.11',
'description': 'md5:fe2743efedb49d279552926d0bd0cd9e',
'thumbnail': 're:^https?://.*\.jpg$',
'uploader': 'Danny Choo Network',
'upload_date': '20120213',
},
'params': {
# rtmp
'skip_download': True,
},
}, {
'url': 'http://www.crunchyroll.fr/girl-friend-beta/episode-11-goodbye-la-mode-661697',
'only_matching': True,
@@ -251,16 +267,17 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
for fmt in re.findall(r'showmedia\.([0-9]{3,4})p', webpage):
stream_quality, stream_format = self._FORMAT_IDS[fmt]
video_format = fmt + 'p'
streamdata_req = compat_urllib_request.Request('http://www.crunchyroll.com/xml/')
# urlencode doesn't work!
streamdata_req.data = 'req=RpcApiVideoEncode%5FGetStreamInfo&video%5Fencode%5Fquality=' + stream_quality + '&media%5Fid=' + stream_id + '&video%5Fformat=' + stream_format
streamdata_req = compat_urllib_request.Request(
'http://www.crunchyroll.com/xml/?req=RpcApiVideoPlayer_GetStandardConfig&media_id=%s&video_format=%s&video_quality=%s'
% (stream_id, stream_format, stream_quality),
compat_urllib_parse.urlencode({'current_page': url}).encode('utf-8'))
streamdata_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
streamdata_req.add_header('Content-Length', str(len(streamdata_req.data)))
streamdata = self._download_xml(
streamdata_req, video_id,
note='Downloading media info for %s' % video_format)
video_url = streamdata.find('./host').text
video_play_path = streamdata.find('./file').text
stream_info = streamdata.find('./{default}preload/stream_info')
video_url = stream_info.find('./host').text
video_play_path = stream_info.find('./file').text
formats.append({
'url': video_url,
'play_path': video_play_path,

View File

@@ -254,22 +254,30 @@ class DailymotionUserIE(DailymotionPlaylistIE):
class DailymotionCloudIE(DailymotionBaseInfoExtractor):
_VALID_URL = r'http://api\.dmcloud\.net/embed/[^/]+/(?P<id>[^/?]+)'
_VALID_URL_PREFIX = r'http://api\.dmcloud\.net/(?:player/)?embed/'
_VALID_URL = r'%s[^/]+/(?P<id>[^/?]+)' % _VALID_URL_PREFIX
_VALID_EMBED_URL = r'%s[^/]+/[^\'"]+' % _VALID_URL_PREFIX
_TEST = {
_TESTS = [{
# From http://www.francetvinfo.fr/economie/entreprises/les-entreprises-familiales-le-secret-de-la-reussite_933271.html
# Tested at FranceTvInfo_2
'url': 'http://api.dmcloud.net/embed/4e7343f894a6f677b10006b4/556e03339473995ee145930c?auth=1464865870-0-jyhsm84b-ead4c701fb750cf9367bf4447167a3db&autoplay=1',
'only_matching': True,
}
}, {
# http://www.francetvinfo.fr/societe/larguez-les-amarres-le-cobaturage-se-developpe_980101.html
'url': 'http://api.dmcloud.net/player/embed/4e7343f894a6f677b10006b4/559545469473996d31429f06?auth=1467430263-0-90tglw2l-a3a4b64ed41efe48d7fccad85b8b8fda&autoplay=1',
'only_matching': True,
}]
@classmethod
def _extract_dmcloud_url(self, webpage):
mobj = re.search(r'<iframe[^>]+src=[\'"](http://api\.dmcloud\.net/embed/[^/]+/[^\'"]+)[\'"]', webpage)
mobj = re.search(r'<iframe[^>]+src=[\'"](%s)[\'"]' % self._VALID_EMBED_URL, webpage)
if mobj:
return mobj.group(1)
mobj = re.search(r'<input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=[\'"](http://api\.dmcloud\.net/embed/[^/]+/[^\'"]+)[\'"]', webpage)
mobj = re.search(
r'<input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=[\'"](%s)[\'"]' % self._VALID_EMBED_URL,
webpage)
if mobj:
return mobj.group(1)

View File

@@ -36,25 +36,24 @@ class DrTuberIE(InfoExtractor):
r'<source src="([^"]+)"', webpage, 'video URL')
title = self._html_search_regex(
[r'class="hd_title" style="[^"]+">([^<]+)</h1>', r'<title>([^<]+) - \d+'],
[r'<p[^>]+class="title_substrate">([^<]+)</p>', r'<title>([^<]+) - \d+'],
webpage, 'title')
thumbnail = self._html_search_regex(
r'poster="([^"]+)"',
webpage, 'thumbnail', fatal=False)
like_count = str_to_int(self._html_search_regex(
r'<span id="rate_likes">\s*<img[^>]+>\s*<span>([\d,\.]+)</span>',
webpage, 'like count', fatal=False))
dislike_count = str_to_int(self._html_search_regex(
r'<span id="rate_dislikes">\s*<img[^>]+>\s*<span>([\d,\.]+)</span>',
webpage, 'like count', fatal=False))
comment_count = str_to_int(self._html_search_regex(
r'<span class="comments_count">([\d,\.]+)</span>',
webpage, 'comment count', fatal=False))
def extract_count(id_, name):
return str_to_int(self._html_search_regex(
r'<span[^>]+(?:class|id)="%s"[^>]*>([\d,\.]+)</span>' % id_,
webpage, '%s count' % name, fatal=False))
like_count = extract_count('rate_likes', 'like')
dislike_count = extract_count('rate_dislikes', 'dislike')
comment_count = extract_count('comments_count', 'comment')
cats_str = self._search_regex(
r'<span>Categories:</span><div>(.+?)</div>', webpage, 'categories', fatal=False)
r'<div[^>]+class="categories_list">(.+?)</div>', webpage, 'categories', fatal=False)
categories = [] if not cats_str else re.findall(r'<a title="([^"]+)"', cats_str)
return {

View File

@@ -1,31 +0,0 @@
from __future__ import unicode_literals
from .tnaflix import TNAFlixIE
class EMPFlixIE(TNAFlixIE):
_VALID_URL = r'https?://(?:www\.)?empflix\.com/videos/(?P<display_id>.+?)-(?P<id>[0-9]+)\.html'
_TITLE_REGEX = r'name="title" value="(?P<title>[^"]*)"'
_DESCRIPTION_REGEX = r'name="description" value="([^"]*)"'
_CONFIG_REGEX = r'flashvars\.config\s*=\s*escape\("([^"]+)"'
_TESTS = [
{
'url': 'http://www.empflix.com/videos/Amateur-Finger-Fuck-33051.html',
'md5': 'b1bc15b6412d33902d6e5952035fcabc',
'info_dict': {
'id': '33051',
'display_id': 'Amateur-Finger-Fuck',
'ext': 'mp4',
'title': 'Amateur Finger Fuck',
'description': 'Amateur solo finger fucking.',
'thumbnail': 're:https?://.*\.jpg$',
'age_limit': 18,
}
},
{
'url': 'http://www.empflix.com/videos/[AROMA][ARMD-718]-Aoi-Yoshino-Sawa-25826.html',
'only_matching': True,
}
]

View File

@@ -47,6 +47,7 @@ from .xhamster import XHamsterEmbedIE
from .vimeo import VimeoIE
from .dailymotion import DailymotionCloudIE
from .onionstudios import OnionStudiosIE
from .snagfilms import SnagFilmsEmbedIE
class GenericIE(InfoExtractor):
@@ -668,6 +669,18 @@ class GenericIE(InfoExtractor):
'title': 'John Carlson Postgame 2/25/15',
},
},
# Kaltura embed (different embed code)
{
'url': 'http://www.premierchristianradio.com/Shows/Saturday/Unbelievable/Conference-Videos/Os-Guinness-Is-It-Fools-Talk-Unbelievable-Conference-2014',
'info_dict': {
'id': '1_a52wc67y',
'ext': 'flv',
'upload_date': '20150127',
'uploader_id': 'PremierMedia',
'timestamp': int,
'title': 'Os Guinness // Is It Fools Talk? // Unbelievable? Conference 2014',
},
},
# Eagle.Platform embed (generic URL)
{
'url': 'http://lenta.ru/news/2015/03/06/navalny/',
@@ -849,6 +862,15 @@ class GenericIE(InfoExtractor):
'uploader_id': 'clickhole',
}
},
# SnagFilms embed
{
'url': 'http://whilewewatch.blogspot.ru/2012/06/whilewewatch-whilewewatch-gripping.html',
'info_dict': {
'id': '74849a00-85a9-11e1-9660-123139220831',
'ext': 'mp4',
'title': '#whilewewatch',
}
},
# AdobeTVVideo embed
{
'url': 'https://helpx.adobe.com/acrobat/how-to/new-experience-acrobat-dc.html?set=acrobat--get-started--essential-beginners',
@@ -1482,8 +1504,8 @@ class GenericIE(InfoExtractor):
return self.url_result(mobj.group('url'), 'Zapiks')
# Look for Kaltura embeds
mobj = re.search(
r"(?s)kWidget\.(?:thumb)?[Ee]mbed\(\{.*?'wid'\s*:\s*'_?(?P<partner_id>[^']+)',.*?'entry_id'\s*:\s*'(?P<id>[^']+)',", webpage)
mobj = (re.search(r"(?s)kWidget\.(?:thumb)?[Ee]mbed\(\{.*?'wid'\s*:\s*'_?(?P<partner_id>[^']+)',.*?'entry_id'\s*:\s*'(?P<id>[^']+)',", webpage) or
re.search(r'(?s)(["\'])(?:https?:)?//cdnapisec\.kaltura\.com/.*?(?:p|partner_id)/(?P<partner_id>\d+).*?\1.*?entry_id\s*:\s*(["\'])(?P<id>[^\2]+?)\2', webpage))
if mobj is not None:
return self.url_result('kaltura:%(partner_id)s:%(id)s' % mobj.groupdict(), 'Kaltura')
@@ -1550,6 +1572,11 @@ class GenericIE(InfoExtractor):
if onionstudios_url:
return self.url_result(onionstudios_url)
# Look for SnagFilms embeds
snagfilms_url = SnagFilmsEmbedIE._extract_url(webpage)
if snagfilms_url:
return self.url_result(snagfilms_url)
# Look for AdobeTVVideo embeds
mobj = re.search(
r'<iframe[^>]+src=[\'"]((?:https?:)?//video\.tv\.adobe\.com/v/\d+[^"]+)[\'"]',

View File

@@ -6,12 +6,13 @@ from ..utils import (
int_or_none,
float_or_none,
qualities,
ExtractorError,
)
class GfycatIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?gfycat\.com/(?P<id>[^/?#]+)'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?gfycat\.com/(?:ifr/)?(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'http://gfycat.com/DeadlyDecisiveGermanpinscher',
'info_dict': {
'id': 'DeadlyDecisiveGermanpinscher',
@@ -27,14 +28,33 @@ class GfycatIE(InfoExtractor):
'categories': list,
'age_limit': 0,
}
}
}, {
'url': 'http://gfycat.com/ifr/JauntyTimelyAmazontreeboa',
'info_dict': {
'id': 'JauntyTimelyAmazontreeboa',
'ext': 'mp4',
'title': 'JauntyTimelyAmazontreeboa',
'timestamp': 1411720126,
'upload_date': '20140926',
'uploader': 'anonymous',
'duration': 3.52,
'view_count': int,
'like_count': int,
'dislike_count': int,
'categories': list,
'age_limit': 0,
}
}]
def _real_extract(self, url):
video_id = self._match_id(url)
gfy = self._download_json(
'http://gfycat.com/cajax/get/%s' % video_id,
video_id, 'Downloading video info')['gfyItem']
video_id, 'Downloading video info')
if 'error' in gfy:
raise ExtractorError('Gfycat said: ' + gfy['error'], expected=True)
gfy = gfy['gfyItem']
title = gfy.get('title') or gfy['gfyName']
description = gfy.get('description')

View File

@@ -1,7 +1,5 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
@@ -19,20 +17,19 @@ class HentaiStigmaIE(InfoExtractor):
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(
r'<h2 class="posttitle"><a[^>]*>([^<]+)</a>',
r'<h2[^>]+class="posttitle"[^>]*><a[^>]*>([^<]+)</a>',
webpage, 'title')
wrap_url = self._html_search_regex(
r'<iframe src="([^"]+mp4)"', webpage, 'wrapper url')
r'<iframe[^>]+src="([^"]+mp4)"', webpage, 'wrapper url')
wrap_webpage = self._download_webpage(wrap_url, video_id)
video_url = self._html_search_regex(
r'clip:\s*{\s*url: "([^"]*)"', wrap_webpage, 'video url')
r'file\s*:\s*"([^"]+)"', wrap_webpage, 'video url')
return {
'id': video_id,

View File

@@ -1,8 +1,7 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import parse_iso8601
class HowcastIE(InfoExtractor):
@@ -13,29 +12,31 @@ class HowcastIE(InfoExtractor):
'info_dict': {
'id': '390161',
'ext': 'mp4',
'description': 'The square knot, also known as the reef knot, is one of the oldest, most basic knots to tie, and can be used in many different ways. Here\'s the proper way to tie a square knot.',
'title': 'How to Tie a Square Knot Properly',
}
'description': 'md5:dbe792e5f6f1489027027bf2eba188a3',
'timestamp': 1276081287,
'upload_date': '20100609',
},
'params': {
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = self._match_id(url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
self.report_extraction(video_id)
video_url = self._search_regex(r'\'?file\'?: "(http://mobile-media\.howcast\.com/[0-9]+\.mp4)',
webpage, 'video URL')
video_description = self._html_search_regex(r'<meta content=(?:"([^"]+)"|\'([^\']+)\') name=\'description\'',
webpage, 'description', fatal=False)
embed_code = self._search_regex(
r'<iframe[^>]+src="[^"]+\bembed_code=([^\b]+)\b',
webpage, 'ooyala embed code')
return {
'_type': 'url_transparent',
'ie_key': 'Ooyala',
'url': 'ooyala:%s' % embed_code,
'id': video_id,
'url': video_url,
'title': self._og_search_title(webpage),
'description': video_description,
'thumbnail': self._og_search_thumbnail(webpage),
'timestamp': parse_iso8601(self._html_search_meta(
'article:published_time', webpage, 'timestamp')),
}

View File

@@ -7,7 +7,7 @@ from .common import InfoExtractor
class InaIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?ina\.fr/video/(?P<id>I?[A-Z0-9]+)'
_VALID_URL = r'https?://(?:www\.)?ina\.fr/video/(?P<id>I?[A-Z0-9]+)'
_TEST = {
'url': 'http://www.ina.fr/video/I12055569/francois-hollande-je-crois-que-c-est-clair-video.html',
'md5': 'a667021bf2b41f8dc6049479d9bb38a3',

View File

@@ -5,13 +5,14 @@ import base64
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_urlparse,
)
class InfoQIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?infoq\.com/[^/]+/(?P<id>[^/]+)$'
_VALID_URL = r'https?://(?:www\.)?infoq\.com/(?:[^/]+/)+(?P<id>[^/]+)'
_TEST = {
_TESTS = [{
'url': 'http://www.infoq.com/presentations/A-Few-of-My-Favorite-Python-Things',
'md5': 'b5ca0e0a8c1fed93b0e65e48e462f9a2',
'info_dict': {
@@ -20,7 +21,10 @@ class InfoQIE(InfoExtractor):
'description': 'Mike Pirnat presents some tips and tricks, standard libraries and third party packages that make programming in Python a richer experience.',
'title': 'A Few of My Favorite [Python] Things',
},
}
}, {
'url': 'http://www.infoq.com/fr/presentations/changez-avis-sur-javascript',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
@@ -42,7 +46,7 @@ class InfoQIE(InfoExtractor):
video_id, extension = video_filename.split('.')
http_base = self._search_regex(
r'EXPRESSINSTALL_SWF\s*=\s*"(https?://[^/"]+/)', webpage,
r'EXPRESSINSTALL_SWF\s*=\s*[^"]*"((?:https?:)?//[^/"]+/)', webpage,
'HTTP base URL')
formats = [{
@@ -52,7 +56,7 @@ class InfoQIE(InfoExtractor):
'play_path': playpath,
}, {
'format_id': 'http',
'url': http_base + real_id,
'url': compat_urlparse.urljoin(url, http_base) + real_id,
}]
self._sort_formats(formats)

View File

@@ -30,13 +30,13 @@ class LyndaBaseIE(InfoExtractor):
return
login_form = {
'username': username,
'password': password,
'username': username.encode('utf-8'),
'password': password.encode('utf-8'),
'remember': 'false',
'stayPut': 'false'
}
request = compat_urllib_request.Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(login_form))
self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8'))
login_page = self._download_webpage(
request, None, 'Logging in as %s' % username)
@@ -65,7 +65,7 @@ class LyndaBaseIE(InfoExtractor):
'stayPut': 'false',
}
request = compat_urllib_request.Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(confirm_form))
self._LOGIN_URL, compat_urllib_parse.urlencode(confirm_form).encode('utf-8'))
login_page = self._download_webpage(
request, None,
'Confirming log in and log out from another device')

View File

@@ -31,7 +31,7 @@ class NewstubeIE(InfoExtractor):
page = self._download_webpage(url, video_id, 'Downloading page')
video_guid = self._html_search_regex(
r'<meta property="og:video" content="https?://(?:www\.)?newstube\.ru/freshplayer\.swf\?guid=(?P<guid>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})',
r'<meta property="og:video:url" content="https?://(?:www\.)?newstube\.ru/freshplayer\.swf\?guid=(?P<guid>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})',
page, 'video GUID')
player = self._download_xml(

View File

@@ -16,8 +16,24 @@ class NPOBaseIE(InfoExtractor):
token_page = self._download_webpage(
'http://ida.omroep.nl/npoplayer/i.js',
video_id, note='Downloading token')
return self._search_regex(
token = self._search_regex(
r'npoplayer\.token = "(.+?)"', token_page, 'token')
# Decryption algorithm extracted from http://npoplayer.omroep.nl/csjs/npoplayer-min.js
token_l = list(token)
first = second = None
for i in range(5, len(token_l) - 4):
if token_l[i].isdigit():
if first is None:
first = i
elif second is None:
second = i
if first is None or second is None:
first = 12
second = 13
token_l[first], token_l[second] = token_l[second], token_l[first]
return ''.join(token_l)
class NPOIE(NPOBaseIE):
@@ -92,7 +108,7 @@ class NPOIE(NPOBaseIE):
def _get_info(self, video_id):
metadata = self._download_json(
'http://e.omroep.nl/metadata/aflevering/%s' % video_id,
'http://e.omroep.nl/metadata/%s' % video_id,
video_id,
# We have to remove the javascript callback
transform_source=strip_jsonp,

View File

@@ -13,7 +13,7 @@ from ..utils import (
class NRKIE(InfoExtractor):
_VALID_URL = r'(?:nrk:|http://(?:www\.)?nrk\.no/video/PS\*)(?P<id>\d+)'
_VALID_URL = r'(?:nrk:|https?://(?:www\.)?nrk\.no/video/PS\*)(?P<id>\d+)'
_TESTS = [
{
@@ -76,7 +76,7 @@ class NRKIE(InfoExtractor):
class NRKPlaylistIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?nrk\.no/(?!video)(?:[^/]+/)+(?P<id>[^/]+)'
_VALID_URL = r'https?://(?:www\.)?nrk\.no/(?!video)(?:[^/]+/)+(?P<id>[^/]+)'
_TESTS = [{
'url': 'http://www.nrk.no/troms/gjenopplev-den-historiske-solformorkelsen-1.12270763',
@@ -116,11 +116,11 @@ class NRKPlaylistIE(InfoExtractor):
class NRKTVIE(InfoExtractor):
_VALID_URL = r'(?P<baseurl>http://tv\.nrk(?:super)?\.no/)(?:serie/[^/]+|program)/(?P<id>[a-zA-Z]{4}\d{8})(?:/\d{2}-\d{2}-\d{4})?(?:#del=(?P<part_id>\d+))?'
_VALID_URL = r'(?P<baseurl>https?://tv\.nrk(?:super)?\.no/)(?:serie/[^/]+|program)/(?P<id>[a-zA-Z]{4}\d{8})(?:/\d{2}-\d{2}-\d{4})?(?:#del=(?P<part_id>\d+))?'
_TESTS = [
{
'url': 'http://tv.nrk.no/serie/20-spoersmaal-tv/MUHH48000314/23-05-2014',
'url': 'https://tv.nrk.no/serie/20-spoersmaal-tv/MUHH48000314/23-05-2014',
'md5': 'adf2c5454fa2bf032f47a9f8fb351342',
'info_dict': {
'id': 'MUHH48000314',
@@ -132,7 +132,7 @@ class NRKTVIE(InfoExtractor):
},
},
{
'url': 'http://tv.nrk.no/program/mdfp15000514',
'url': 'https://tv.nrk.no/program/mdfp15000514',
'md5': '383650ece2b25ecec996ad7b5bb2a384',
'info_dict': {
'id': 'mdfp15000514',
@@ -145,7 +145,7 @@ class NRKTVIE(InfoExtractor):
},
{
# single playlist video
'url': 'http://tv.nrk.no/serie/tour-de-ski/MSPO40010515/06-01-2015#del=2',
'url': 'https://tv.nrk.no/serie/tour-de-ski/MSPO40010515/06-01-2015#del=2',
'md5': 'adbd1dbd813edaf532b0a253780719c2',
'info_dict': {
'id': 'MSPO40010515-part2',
@@ -157,7 +157,7 @@ class NRKTVIE(InfoExtractor):
'skip': 'Only works from Norway',
},
{
'url': 'http://tv.nrk.no/serie/tour-de-ski/MSPO40010515/06-01-2015',
'url': 'https://tv.nrk.no/serie/tour-de-ski/MSPO40010515/06-01-2015',
'playlist': [
{
'md5': '9480285eff92d64f06e02a5367970a7a',

View File

@@ -1,3 +1,4 @@
# coding: utf-8
from __future__ import unicode_literals
import re
@@ -35,6 +36,9 @@ class PBSIE(InfoExtractor):
'description': 'md5:ba0c207295339c8d6eced00b7c363c6a',
'duration': 3190,
},
'params': {
'skip_download': True, # requires ffmpeg
},
},
{
'url': 'http://www.pbs.org/wgbh/pages/frontline/losing-iraq/',
@@ -46,6 +50,9 @@ class PBSIE(InfoExtractor):
'description': 'md5:f5bfbefadf421e8bb8647602011caf8e',
'duration': 5050,
},
'params': {
'skip_download': True, # requires ffmpeg
}
},
{
'url': 'http://www.pbs.org/newshour/bb/education-jan-june12-cyberschools_02-23/',
@@ -68,7 +75,10 @@ class PBSIE(InfoExtractor):
'title': 'Dudamel Conducts Verdi Requiem at the Hollywood Bowl - Full',
'duration': 6559,
'thumbnail': 're:^https?://.*\.jpg$',
}
},
'params': {
'skip_download': True, # requires ffmpeg
},
},
{
'url': 'http://www.pbs.org/wgbh/nova/earth/killer-typhoon.html',
@@ -82,7 +92,10 @@ class PBSIE(InfoExtractor):
'duration': 3172,
'thumbnail': 're:^https?://.*\.jpg$',
'upload_date': '20140122',
}
},
'params': {
'skip_download': True, # requires ffmpeg
},
},
{
'url': 'http://www.pbs.org/wgbh/pages/frontline/united-states-of-secrets/',
@@ -90,6 +103,21 @@ class PBSIE(InfoExtractor):
'id': 'united-states-of-secrets',
},
'playlist_count': 2,
},
{
'url': 'http://www.pbs.org/wgbh/americanexperience/films/death/player/',
'info_dict': {
'id': '2280706814',
'display_id': 'player',
'ext': 'mp4',
'title': 'Death and the Civil War',
'description': 'American Experience, TVs most-watched history series, brings to life the compelling stories from our past that inform our understanding of the world today.',
'duration': 6705,
'thumbnail': 're:^https?://.*\.jpg$',
},
'params': {
'skip_download': True, # requires ffmpeg
},
}
]
@@ -123,7 +151,7 @@ class PBSIE(InfoExtractor):
return media_id, presumptive_id, upload_date
url = self._search_regex(
r'<iframe\s+(?:class|id)=["\']partnerPlayer["\'].*?\s+src=["\'](.*?)["\']>',
r'<iframe\s+[^>]*\s+src=["\']([^\'"]+partnerplayer[^\'"]+)["\']',
webpage, 'player URL')
mobj = re.match(self._VALID_URL, url)

View File

@@ -18,7 +18,8 @@ class PlanetaPlayIE(InfoExtractor):
'id': '3586',
'ext': 'flv',
'title': 'md5:e829428ee28b1deed00de90de49d1da1',
}
},
'skip': 'Not accessible from Travis CI server',
}
_SONG_FORMATS = {

View File

@@ -9,6 +9,7 @@ from .common import InfoExtractor
from ..utils import (
strip_jsonp,
unescapeHTML,
clean_html,
)
from ..compat import compat_urllib_request
@@ -26,6 +27,20 @@ class QQMusicIE(InfoExtractor):
'upload_date': '20141227',
'creator': '林俊杰',
'description': 'md5:d327722d0361576fde558f1ac68a7065',
'thumbnail': 're:^https?://.*\.jpg$',
}
}, {
'note': 'There is no mp3-320 version of this song.',
'url': 'http://y.qq.com/#type=song&mid=004MsGEo3DdNxV',
'md5': 'fa3926f0c585cda0af8fa4f796482e3e',
'info_dict': {
'id': '004MsGEo3DdNxV',
'ext': 'mp3',
'title': '如果',
'upload_date': '20050626',
'creator': '李季美',
'description': 'md5:46857d5ed62bc4ba84607a805dccf437',
'thumbnail': 're:^https?://.*\.jpg$',
}
}]
@@ -68,6 +83,14 @@ class QQMusicIE(InfoExtractor):
if lrc_content:
lrc_content = lrc_content.replace('\\n', '\n')
thumbnail_url = None
albummid = self._search_regex(
[r'albummid:\'([0-9a-zA-Z]+)\'', r'"albummid":"([0-9a-zA-Z]+)"'],
detail_info_page, 'album mid', default=None)
if albummid:
thumbnail_url = "http://i.gtimg.cn/music/photo/mid_album_500/%s/%s/%s.jpg" \
% (albummid[-2:-1], albummid[-1], albummid)
guid = self.m_r_get_ruin()
vkey = self._download_json(
@@ -85,6 +108,7 @@ class QQMusicIE(InfoExtractor):
'preference': details['preference'],
'abr': details.get('abr'),
})
self._check_formats(formats, mid)
self._sort_formats(formats)
return {
@@ -94,6 +118,7 @@ class QQMusicIE(InfoExtractor):
'upload_date': publish_time,
'creator': singer,
'description': lrc_content,
'thumbnail': thumbnail_url,
}
@@ -163,31 +188,40 @@ class QQMusicAlbumIE(QQPlaylistBaseIE):
IE_NAME = 'qqmusic:album'
_VALID_URL = r'http://y.qq.com/#type=album&mid=(?P<id>[0-9A-Za-z]+)'
_TEST = {
'url': 'http://y.qq.com/#type=album&mid=000gXCTb2AhRR1&play=0',
_TESTS = [{
'url': 'http://y.qq.com/#type=album&mid=000gXCTb2AhRR1',
'info_dict': {
'id': '000gXCTb2AhRR1',
'title': '我们都是这样长大的',
'description': 'md5:d216c55a2d4b3537fe4415b8767d74d6',
'description': 'md5:179c5dce203a5931970d306aa9607ea6',
},
'playlist_count': 4,
}
}, {
'url': 'http://y.qq.com/#type=album&mid=002Y5a3b3AlCu3',
'info_dict': {
'id': '002Y5a3b3AlCu3',
'title': '그리고...',
'description': 'md5:a48823755615508a95080e81b51ba729',
},
'playlist_count': 8,
}]
def _real_extract(self, url):
mid = self._match_id(url)
album_page = self._download_webpage(
self.qq_static_url('album', mid), mid, 'Download album page')
album = self._download_json(
'http://i.y.qq.com/v8/fcg-bin/fcg_v8_album_info_cp.fcg?albummid=%s&format=json' % mid,
mid, 'Download album page')['data']
entries = self.get_entries_from_page(album_page)
album_name = self._html_search_regex(
r"albumname\s*:\s*'([^']+)',", album_page, 'album name',
default=None)
album_detail = self._html_search_regex(
r'<div class="album_detail close_detail">\s*<p>((?:[^<>]+(?:<br />)?)+)</p>',
album_page, 'album details', default=None)
entries = [
self.url_result(
'http://y.qq.com/#type=song&mid=' + song['songmid'], 'QQMusic', song['songmid']
) for song in album['list']
]
album_name = album.get('name')
album_detail = album.get('desc')
if album_detail is not None:
album_detail = album_detail.strip()
return self.playlist_result(entries, mid, album_name, album_detail)
@@ -243,3 +277,36 @@ class QQMusicToplistIE(QQPlaylistBaseIE):
list_name = topinfo.get('ListName')
list_description = topinfo.get('info')
return self.playlist_result(entries, list_id, list_name, list_description)
class QQMusicPlaylistIE(QQPlaylistBaseIE):
IE_NAME = 'qqmusic:playlist'
_VALID_URL = r'http://y\.qq\.com/#type=taoge&id=(?P<id>[0-9]+)'
_TEST = {
'url': 'http://y.qq.com/#type=taoge&id=3462654915',
'info_dict': {
'id': '3462654915',
'title': '韩国5月新歌精选下旬',
'description': 'md5:d2c9d758a96b9888cf4fe82f603121d4',
},
'playlist_count': 40,
}
def _real_extract(self, url):
list_id = self._match_id(url)
list_json = self._download_json(
'http://i.y.qq.com/qzone-music/fcg-bin/fcg_ucc_getcdinfo_byids_cp.fcg?type=1&json=1&utf8=1&onlysong=0&disstid=%s'
% list_id, list_id, 'Download list page',
transform_source=strip_jsonp)['cdlist'][0]
entries = [
self.url_result(
'http://y.qq.com/#type=song&mid=' + song['songmid'], 'QQMusic', song['songmid']
) for song in list_json['songlist']
]
list_name = list_json.get('dissname')
list_description = clean_html(unescapeHTML(list_json.get('desc')))
return self.playlist_result(entries, list_id, list_name, list_description)

View File

@@ -24,6 +24,7 @@ class QuickVidIE(InfoExtractor):
'thumbnail': 're:^https?://.*\.(?:png|jpg|gif)$',
'view_count': int,
},
'skip': 'Not accessible from Travis CI server',
}
def _real_extract(self, url):

View File

@@ -43,6 +43,10 @@ class RtlNlIE(InfoExtractor):
'upload_date': '20150215',
'description': 'Er zijn nieuwe beelden vrijgegeven die vlak na de aanslag in Kopenhagen zijn gemaakt. Op de video is goed te zien hoe omstanders zich bekommeren om één van de slachtoffers, terwijl de eerste agenten ter plaatse komen.',
}
}, {
# encrypted m3u8 streams, georestricted
'url': 'http://www.rtlxl.nl/#!/afl-2-257632/52a74543-c504-4cde-8aa8-ec66fe8d68a7',
'only_matching': True,
}, {
'url': 'http://www.rtl.nl/system/videoplayer/derden/embed.html#!/uuid=bb0353b0-d6a4-1dad-90e9-18fe75b8d1f0',
'only_matching': True,
@@ -51,7 +55,7 @@ class RtlNlIE(InfoExtractor):
def _real_extract(self, url):
uuid = self._match_id(url)
info = self._download_json(
'http://www.rtl.nl/system/s4m/vfd/version=2/uuid=%s/fmt=flash/' % uuid,
'http://www.rtl.nl/system/s4m/vfd/version=2/uuid=%s/fmt=adaptive/' % uuid,
uuid)
material = info['material'][0]
@@ -59,9 +63,14 @@ class RtlNlIE(InfoExtractor):
subtitle = material['title'] or info['episodes'][0]['name']
description = material.get('synopsis') or info['episodes'][0]['synopsis']
meta = info.get('meta', {})
# Use unencrypted m3u8 streams (See https://github.com/rg3/youtube-dl/issues/4118)
videopath = material['videopath'].replace('.f4m', '.m3u8')
m3u8_url = 'http://manifest.us.rtl.nl' + videopath
# NB: nowadays, recent ffmpeg and avconv can handle these encrypted streams, so
# this adaptive -> flash workaround is not required in general, but it also
# allows bypassing georestriction therefore is retained for now.
videopath = material['videopath'].replace('/adaptive/', '/flash/')
m3u8_url = meta.get('videohost', 'http://manifest.us.rtl.nl') + videopath
formats = self._extract_m3u8_formats(m3u8_url, uuid, ext='mp4')
@@ -82,7 +91,7 @@ class RtlNlIE(InfoExtractor):
self._sort_formats(formats)
thumbnails = []
meta = info.get('meta', {})
for p in ('poster_base_url', '"thumb_base_url"'):
if not meta.get(p):
continue

View File

@@ -53,7 +53,7 @@ class SmotriIE(InfoExtractor):
'thumbnail': 'http://frame4.loadup.ru/03/ed/57591.2.3.jpg',
},
},
# video-password
# video-password, not approved by moderator
{
'url': 'http://smotri.com/video/view/?id=v1390466a13c',
'md5': 'f6331cef33cad65a0815ee482a54440b',
@@ -71,7 +71,24 @@ class SmotriIE(InfoExtractor):
},
'skip': 'Video is not approved by moderator',
},
# age limit + video-password
# video-password
{
'url': 'http://smotri.com/video/view/?id=v6984858774#',
'md5': 'f11e01d13ac676370fc3b95b9bda11b0',
'info_dict': {
'id': 'v6984858774',
'ext': 'mp4',
'title': 'Дача Солженицина ПАРОЛЬ 223322',
'uploader': 'psavari1',
'uploader_id': 'psavari1',
'upload_date': '20081103',
'thumbnail': 're:^https?://.*\.jpg$',
},
'params': {
'videopassword': '223322',
},
},
# age limit + video-password, not approved by moderator
{
'url': 'http://smotri.com/video/view/?id=v15408898bcf',
'md5': '91e909c9f0521adf5ee86fbe073aad70',
@@ -90,19 +107,22 @@ class SmotriIE(InfoExtractor):
},
'skip': 'Video is not approved by moderator',
},
# not approved by moderator, but available
# age limit + video-password
{
'url': 'http://smotri.com/video/view/?id=v28888533b73',
'md5': 'f44bc7adac90af518ef1ecf04893bb34',
'url': 'http://smotri.com/video/view/?id=v7780025814',
'md5': 'b4599b068422559374a59300c5337d72',
'info_dict': {
'id': 'v28888533b73',
'id': 'v7780025814',
'ext': 'mp4',
'title': 'Russian Spies Killed By ISIL Child Soldier',
'uploader': 'Mopeder',
'uploader_id': 'mopeder',
'duration': 71,
'thumbnail': 'http://frame9.loadup.ru/d7/32/2888853.2.3.jpg',
'upload_date': '20150114',
'title': 'Sexy Beach (пароль 123)',
'uploader': 'вАся',
'uploader_id': 'asya_prosto',
'upload_date': '20081218',
'thumbnail': 're:^https?://.*\.jpg$',
'age_limit': 18,
},
'params': {
'videopassword': '123'
},
},
# swf player
@@ -152,6 +172,10 @@ class SmotriIE(InfoExtractor):
'getvideoinfo': '1',
}
video_password = self._downloader.params.get('videopassword', None)
if video_password:
video_form['pass'] = hashlib.md5(video_password.encode('utf-8')).hexdigest()
request = compat_urllib_request.Request(
'http://smotri.com/video/view/url/bot/', compat_urllib_parse.urlencode(video_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
@@ -161,13 +185,18 @@ class SmotriIE(InfoExtractor):
video_url = video.get('_vidURL') or video.get('_vidURL_mp4')
if not video_url:
if video.get('_moderate_no') or not video.get('moderated'):
if video.get('_moderate_no'):
raise ExtractorError(
'Video %s has not been approved by moderator' % video_id, expected=True)
if video.get('error'):
raise ExtractorError('Video %s does not exist' % video_id, expected=True)
if video.get('_pass_protected') == 1:
msg = ('Invalid video password' if video_password
else 'This video is protected by a password, use the --video-password option')
raise ExtractorError(msg, expected=True)
title = video['title']
thumbnail = video['_imgURL']
upload_date = unified_strdate(video['added'])

View File

@@ -0,0 +1,171 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
clean_html,
determine_ext,
int_or_none,
js_to_json,
parse_duration,
)
class SnagFilmsEmbedIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:www|embed)\.)?snagfilms\.com/embed/player\?.*\bfilmId=(?P<id>[\da-f-]{36})'
_TESTS = [{
'url': 'http://embed.snagfilms.com/embed/player?filmId=74849a00-85a9-11e1-9660-123139220831&w=500',
'md5': '2924e9215c6eff7a55ed35b72276bd93',
'info_dict': {
'id': '74849a00-85a9-11e1-9660-123139220831',
'ext': 'mp4',
'title': '#whilewewatch',
}
}, {
'url': 'http://www.snagfilms.com/embed/player?filmId=0000014c-de2f-d5d6-abcf-ffef58af0017',
'only_matching': True,
}]
@staticmethod
def _extract_url(webpage):
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:embed\.)?snagfilms\.com/embed/player.+?)\1',
webpage)
if mobj:
return mobj.group('url')
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
if '>This film is not playable in your area.<' in webpage:
raise ExtractorError(
'Film %s is not playable in your area.' % video_id, expected=True)
formats = []
for source in self._parse_json(js_to_json(self._search_regex(
r'(?s)sources:\s*(\[.+?\]),', webpage, 'json')), video_id):
file_ = source.get('file')
if not file_:
continue
type_ = source.get('type')
format_id = source.get('label')
ext = determine_ext(file_)
if any(_ == 'm3u8' for _ in (type_, ext)):
formats.extend(self._extract_m3u8_formats(
file_, video_id, 'mp4', m3u8_id='hls'))
else:
bitrate = int_or_none(self._search_regex(
r'(\d+)kbps', file_, 'bitrate', default=None))
height = int_or_none(self._search_regex(
r'^(\d+)[pP]$', format_id, 'height', default=None))
formats.append({
'url': file_,
'format_id': format_id,
'tbr': bitrate,
'height': height,
})
self._sort_formats(formats)
title = self._search_regex(
[r"title\s*:\s*'([^']+)'", r'<title>([^<]+)</title>'],
webpage, 'title')
return {
'id': video_id,
'title': title,
'formats': formats,
}
class SnagFilmsIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?snagfilms\.com/(?:films/title|show)/(?P<id>[^?#]+)'
_TESTS = [{
'url': 'http://www.snagfilms.com/films/title/lost_for_life',
'md5': '19844f897b35af219773fd63bdec2942',
'info_dict': {
'id': '0000014c-de2f-d5d6-abcf-ffef58af0017',
'display_id': 'lost_for_life',
'ext': 'mp4',
'title': 'Lost for Life',
'description': 'md5:fbdacc8bb6b455e464aaf98bc02e1c82',
'thumbnail': 're:^https?://.*\.jpg',
'duration': 4489,
'categories': ['Documentary', 'Crime', 'Award Winning', 'Festivals']
}
}, {
'url': 'http://www.snagfilms.com/show/the_world_cut_project/india',
'md5': 'e6292e5b837642bbda82d7f8bf3fbdfd',
'info_dict': {
'id': '00000145-d75c-d96e-a9c7-ff5c67b20000',
'display_id': 'the_world_cut_project/india',
'ext': 'mp4',
'title': 'India',
'description': 'md5:5c168c5a8f4719c146aad2e0dfac6f5f',
'thumbnail': 're:^https?://.*\.jpg',
'duration': 979,
'categories': ['Documentary', 'Sports', 'Politics']
}
}, {
# Film is not playable in your area.
'url': 'http://www.snagfilms.com/films/title/inside_mecca',
'only_matching': True,
}, {
# Film is not available.
'url': 'http://www.snagfilms.com/show/augie_alone/flirting',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
if ">Sorry, the Film you're looking for is not available.<" in webpage:
raise ExtractorError(
'Film %s is not available.' % display_id, expected=True)
film_id = self._search_regex(r'filmId=([\da-f-]{36})"', webpage, 'film id')
snag = self._parse_json(
self._search_regex(
'Snag\.page\.data\s*=\s*(\[.+?\]);', webpage, 'snag'),
display_id)
for item in snag:
if item.get('data', {}).get('film', {}).get('id') == film_id:
data = item['data']['film']
title = data['title']
description = clean_html(data.get('synopsis'))
thumbnail = data.get('image')
duration = int_or_none(data.get('duration') or data.get('runtime'))
categories = [
category['title'] for category in data.get('categories', [])
if category.get('title')]
break
else:
title = self._search_regex(
r'itemprop="title">([^<]+)<', webpage, 'title')
description = self._html_search_regex(
r'(?s)<div itemprop="description" class="film-synopsis-inner ">(.+?)</div>',
webpage, 'description', default=None) or self._og_search_description(webpage)
thumbnail = self._og_search_thumbnail(webpage)
duration = parse_duration(self._search_regex(
r'<span itemprop="duration" class="film-duration strong">([^<]+)<',
webpage, 'duration', fatal=False))
categories = re.findall(r'<a href="/movies/[^"]+">([^<]+)</a>', webpage)
return {
'_type': 'url_transparent',
'url': 'http://embed.snagfilms.com/embed/player?filmId=%s' % film_id,
'id': film_id,
'display_id': display_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'categories': categories,
}

View File

@@ -77,11 +77,13 @@ class SpiegeltvIE(InfoExtractor):
'rtmp_live': True,
})
elif determine_ext(endpoint) == 'm3u8':
formats.extend(self._extract_m3u8_formats(
m3u8_formats = self._extract_m3u8_formats(
endpoint.replace('[video]', play_path),
video_id, 'm4v',
preference=1, # Prefer hls since it allows to workaround georestriction
m3u8_id='hls'))
m3u8_id='hls', fatal=False)
if m3u8_formats is not False:
formats.extend(m3u8_formats)
else:
formats.append({
'url': endpoint,

View File

@@ -1,9 +1,6 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import unified_strdate
@@ -17,7 +14,7 @@ class TheSixtyOneIE(InfoExtractor):
song
)/(?P<id>[A-Za-z0-9]+)/?$'''
_SONG_URL_TEMPLATE = 'http://thesixtyone.com/s/{0:}'
_SONG_FILE_URL_TEMPLATE = 'http://{audio_server:}.thesixtyone.com/thesixtyone_production/audio/{0:}_stream'
_SONG_FILE_URL_TEMPLATE = 'http://{audio_server:}/thesixtyone_production/audio/{0:}_stream'
_THUMBNAIL_URL_TEMPLATE = '{photo_base_url:}_desktop'
_TESTS = [
{
@@ -70,14 +67,19 @@ class TheSixtyOneIE(InfoExtractor):
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
song_id = mobj.group('id')
song_id = self._match_id(url)
webpage = self._download_webpage(
self._SONG_URL_TEMPLATE.format(song_id), song_id)
song_data = json.loads(self._search_regex(
r'"%s":\s(\{.*?\})' % song_id, webpage, 'song_data'))
song_data = self._parse_json(self._search_regex(
r'"%s":\s(\{.*?\})' % song_id, webpage, 'song_data'), song_id)
if self._search_regex(r'(t61\.s3_audio_load\s*=\s*1\.0;)', webpage, 's3_audio_load marker', default=None):
song_data['audio_server'] = 's3.amazonaws.com'
else:
song_data['audio_server'] = song_data['audio_server'] + '.thesixtyone.com'
keys = [self._DECODE_MAP.get(s, s) for s in song_data['key']]
url = self._SONG_FILE_URL_TEMPLATE.format(
"".join(reversed(keys)), **song_data)

View File

@@ -0,0 +1,40 @@
from __future__ import unicode_literals
from .common import InfoExtractor
class ThisAmericanLifeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?thisamericanlife\.org/(?:radio-archives/episode/|play_full\.php\?play=)(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.thisamericanlife.org/radio-archives/episode/487/harper-high-school-part-one',
'md5': '8f7d2da8926298fdfca2ee37764c11ce',
'info_dict': {
'id': '487',
'ext': 'm4a',
'title': '487: Harper High School, Part One',
'description': 'md5:ee40bdf3fb96174a9027f76dbecea655',
'thumbnail': 're:^https?://.*\.jpg$',
},
}, {
'url': 'http://www.thisamericanlife.org/play_full.php?play=487',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
'http://www.thisamericanlife.org/radio-archives/episode/%s' % video_id, video_id)
return {
'id': video_id,
'url': 'http://stream.thisamericanlife.org/{0}/stream/{0}_64k.m3u8'.format(video_id),
'protocol': 'm3u8_native',
'ext': 'm4a',
'acodec': 'aac',
'vcodec': 'none',
'abr': 64,
'title': self._html_search_meta(r'twitter:title', webpage, 'title', fatal=True),
'description': self._html_search_meta(r'description', webpage, 'description'),
'thumbnail': self._og_search_thumbnail(webpage),
}

View File

@@ -3,39 +3,70 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
parse_duration,
fix_xml_ampersands,
float_or_none,
int_or_none,
parse_duration,
str_to_int,
xpath_text,
)
class TNAFlixIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?tnaflix\.com/[^/]+/(?P<display_id>[^/]+)/video(?P<id>\d+)'
_TITLE_REGEX = r'<title>(.+?) - TNAFlix Porn Videos</title>'
_DESCRIPTION_REGEX = r'<h3 itemprop="description">([^<]+)</h3>'
_CONFIG_REGEX = r'flashvars\.config\s*=\s*escape\("([^"]+)"'
_TESTS = [
{
'url': 'http://www.tnaflix.com/porn-stars/Carmella-Decesare-striptease/video553878',
'md5': 'ecf3498417d09216374fc5907f9c6ec0',
'info_dict': {
'id': '553878',
'display_id': 'Carmella-Decesare-striptease',
'ext': 'mp4',
'title': 'Carmella Decesare - striptease',
'description': '',
'thumbnail': 're:https?://.*\.jpg$',
'duration': 91,
'age_limit': 18,
}
},
{
'url': 'https://www.tnaflix.com/amateur-porn/bunzHD-Ms.Donk/video358632',
'only_matching': True,
}
class TNAFlixNetworkBaseIE(InfoExtractor):
# May be overridden in descendants if necessary
_CONFIG_REGEX = [
r'flashvars\.config\s*=\s*escape\("([^"]+)"',
r'<input[^>]+name="config\d?" value="([^"]+)"',
]
_TITLE_REGEX = r'<input[^>]+name="title" value="([^"]+)"'
_DESCRIPTION_REGEX = r'<input[^>]+name="description" value="([^"]+)"'
_UPLOADER_REGEX = r'<input[^>]+name="username" value="([^"]+)"'
_VIEW_COUNT_REGEX = None
_COMMENT_COUNT_REGEX = None
_AVERAGE_RATING_REGEX = None
_CATEGORIES_REGEX = r'<li[^>]*>\s*<span[^>]+class="infoTitle"[^>]*>Categories:</span>\s*<span[^>]+class="listView"[^>]*>(.+?)</span>\s*</li>'
def _extract_thumbnails(self, flix_xml):
def get_child(elem, names):
for name in names:
child = elem.find(name)
if child is not None:
return child
timeline = get_child(flix_xml, ['timeline', 'rolloverBarImage'])
if timeline is None:
return
pattern_el = get_child(timeline, ['imagePattern', 'pattern'])
if pattern_el is None or not pattern_el.text:
return
first_el = get_child(timeline, ['imageFirst', 'first'])
last_el = get_child(timeline, ['imageLast', 'last'])
if first_el is None or last_el is None:
return
first_text = first_el.text
last_text = last_el.text
if not first_text.isdigit() or not last_text.isdigit():
return
first = int(first_text)
last = int(last_text)
if first > last:
return
width = int_or_none(xpath_text(timeline, './imageWidth', 'thumbnail width'))
height = int_or_none(xpath_text(timeline, './imageHeight', 'thumbnail height'))
return [{
'url': self._proto_relative_url(pattern_el.text.replace('#', compat_str(i)), 'http:'),
'width': width,
'height': height,
} for i in range(first, last + 1)]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@@ -44,39 +75,64 @@ class TNAFlixIE(InfoExtractor):
webpage = self._download_webpage(url, display_id)
cfg_url = self._proto_relative_url(self._html_search_regex(
self._CONFIG_REGEX, webpage, 'flashvars.config'), 'http:')
cfg_xml = self._download_xml(
cfg_url, display_id, 'Downloading metadata',
transform_source=fix_xml_ampersands)
formats = []
def extract_video_url(vl):
return re.sub('speed=\d+', 'speed=', vl.text)
video_link = cfg_xml.find('./videoLink')
if video_link is not None:
formats.append({
'url': extract_video_url(video_link),
'ext': xpath_text(cfg_xml, './videoConfig/type', 'type', default='flv'),
})
for item in cfg_xml.findall('./quality/item'):
video_link = item.find('./videoLink')
if video_link is None:
continue
res = item.find('res')
format_id = None if res is None else res.text
height = int_or_none(self._search_regex(
r'^(\d+)[pP]', format_id, 'height', default=None))
formats.append({
'url': self._proto_relative_url(extract_video_url(video_link), 'http:'),
'format_id': format_id,
'height': height,
})
self._sort_formats(formats)
thumbnail = self._proto_relative_url(
xpath_text(cfg_xml, './startThumb', 'thumbnail'), 'http:')
thumbnails = self._extract_thumbnails(cfg_xml)
title = self._html_search_regex(
self._TITLE_REGEX, webpage, 'title') if self._TITLE_REGEX else self._og_search_title(webpage)
description = self._html_search_regex(
self._DESCRIPTION_REGEX, webpage, 'description', fatal=False, default='')
age_limit = self._rta_search(webpage)
duration = parse_duration(self._html_search_meta(
'duration', webpage, 'duration', default=None))
cfg_url = self._proto_relative_url(self._html_search_regex(
self._CONFIG_REGEX, webpage, 'flashvars.config'), 'http:')
def extract_field(pattern, name):
return self._html_search_regex(pattern, webpage, name, default=None) if pattern else None
cfg_xml = self._download_xml(
cfg_url, display_id, note='Downloading metadata',
transform_source=fix_xml_ampersands)
description = extract_field(self._DESCRIPTION_REGEX, 'description')
uploader = extract_field(self._UPLOADER_REGEX, 'uploader')
view_count = str_to_int(extract_field(self._VIEW_COUNT_REGEX, 'view count'))
comment_count = str_to_int(extract_field(self._COMMENT_COUNT_REGEX, 'comment count'))
average_rating = float_or_none(extract_field(self._AVERAGE_RATING_REGEX, 'average rating'))
thumbnail = self._proto_relative_url(
cfg_xml.find('./startThumb').text, 'http:')
formats = []
for item in cfg_xml.findall('./quality/item'):
video_url = re.sub('speed=\d+', 'speed=', item.find('videoLink').text)
format_id = item.find('res').text
fmt = {
'url': self._proto_relative_url(video_url, 'http:'),
'format_id': format_id,
}
m = re.search(r'^(\d+)', format_id)
if m:
fmt['height'] = int(m.group(1))
formats.append(fmt)
self._sort_formats(formats)
categories_str = extract_field(self._CATEGORIES_REGEX, 'categories')
categories = categories_str.split(', ') if categories_str is not None else []
return {
'id': video_id,
@@ -84,7 +140,130 @@ class TNAFlixIE(InfoExtractor):
'title': title,
'description': description,
'thumbnail': thumbnail,
'thumbnails': thumbnails,
'duration': duration,
'age_limit': age_limit,
'uploader': uploader,
'view_count': view_count,
'comment_count': comment_count,
'average_rating': average_rating,
'categories': categories,
'formats': formats,
}
class TNAFlixIE(TNAFlixNetworkBaseIE):
_VALID_URL = r'https?://(?:www\.)?tnaflix\.com/[^/]+/(?P<display_id>[^/]+)/video(?P<id>\d+)'
_TITLE_REGEX = r'<title>(.+?) - TNAFlix Porn Videos</title>'
_DESCRIPTION_REGEX = r'<h3 itemprop="description">([^<]+)</h3>'
_UPLOADER_REGEX = r'(?s)<span[^>]+class="infoTitle"[^>]*>Uploaded By:</span>(.+?)<div'
_TESTS = [{
# anonymous uploader, no categories
'url': 'http://www.tnaflix.com/porn-stars/Carmella-Decesare-striptease/video553878',
'md5': 'ecf3498417d09216374fc5907f9c6ec0',
'info_dict': {
'id': '553878',
'display_id': 'Carmella-Decesare-striptease',
'ext': 'mp4',
'title': 'Carmella Decesare - striptease',
'thumbnail': 're:https?://.*\.jpg$',
'duration': 91,
'age_limit': 18,
'uploader': 'Anonymous',
'categories': [],
}
}, {
# non-anonymous uploader, categories
'url': 'https://www.tnaflix.com/teen-porn/Educational-xxx-video/video6538',
'md5': '0f5d4d490dbfd117b8607054248a07c0',
'info_dict': {
'id': '6538',
'display_id': 'Educational-xxx-video',
'ext': 'mp4',
'title': 'Educational xxx video',
'description': 'md5:b4fab8f88a8621c8fabd361a173fe5b8',
'thumbnail': 're:https?://.*\.jpg$',
'duration': 164,
'age_limit': 18,
'uploader': 'bobwhite39',
'categories': ['Amateur Porn', 'Squirting Videos', 'Teen Girls 18+'],
}
}, {
'url': 'https://www.tnaflix.com/amateur-porn/bunzHD-Ms.Donk/video358632',
'only_matching': True,
}]
class EMPFlixIE(TNAFlixNetworkBaseIE):
_VALID_URL = r'https?://(?:www\.)?empflix\.com/videos/(?P<display_id>.+?)-(?P<id>[0-9]+)\.html'
_UPLOADER_REGEX = r'<span[^>]+class="infoTitle"[^>]*>Uploaded By:</span>(.+?)</li>'
_TESTS = [{
'url': 'http://www.empflix.com/videos/Amateur-Finger-Fuck-33051.html',
'md5': 'b1bc15b6412d33902d6e5952035fcabc',
'info_dict': {
'id': '33051',
'display_id': 'Amateur-Finger-Fuck',
'ext': 'mp4',
'title': 'Amateur Finger Fuck',
'description': 'Amateur solo finger fucking.',
'thumbnail': 're:https?://.*\.jpg$',
'duration': 83,
'age_limit': 18,
'uploader': 'cwbike',
'categories': ['Amateur', 'Anal', 'Fisting', 'Home made', 'Solo'],
}
}, {
'url': 'http://www.empflix.com/videos/[AROMA][ARMD-718]-Aoi-Yoshino-Sawa-25826.html',
'only_matching': True,
}]
class MovieFapIE(TNAFlixNetworkBaseIE):
_VALID_URL = r'https?://(?:www\.)?moviefap\.com/videos/(?P<id>[0-9a-f]+)/(?P<display_id>[^/]+)\.html'
_VIEW_COUNT_REGEX = r'<br>Views\s*<strong>([\d,.]+)</strong>'
_COMMENT_COUNT_REGEX = r'<span[^>]+id="comCount"[^>]*>([\d,.]+)</span>'
_AVERAGE_RATING_REGEX = r'Current Rating\s*<br>\s*<strong>([\d.]+)</strong>'
_CATEGORIES_REGEX = r'(?s)<div[^>]+id="vid_info"[^>]*>\s*<div[^>]*>.+?</div>(.*?)<br>'
_TESTS = [{
# normal, multi-format video
'url': 'http://www.moviefap.com/videos/be9867c9416c19f54a4a/experienced-milf-amazing-handjob.html',
'md5': '26624b4e2523051b550067d547615906',
'info_dict': {
'id': 'be9867c9416c19f54a4a',
'display_id': 'experienced-milf-amazing-handjob',
'ext': 'mp4',
'title': 'Experienced MILF Amazing Handjob',
'description': 'Experienced MILF giving an Amazing Handjob',
'thumbnail': 're:https?://.*\.jpg$',
'age_limit': 18,
'uploader': 'darvinfred06',
'view_count': int,
'comment_count': int,
'average_rating': float,
'categories': ['Amateur', 'Masturbation', 'Mature', 'Flashing'],
}
}, {
# quirky single-format case where the extension is given as fid, but the video is really an flv
'url': 'http://www.moviefap.com/videos/e5da0d3edce5404418f5/jeune-couple-russe.html',
'md5': 'fa56683e291fc80635907168a743c9ad',
'info_dict': {
'id': 'e5da0d3edce5404418f5',
'display_id': 'jeune-couple-russe',
'ext': 'flv',
'title': 'Jeune Couple Russe',
'description': 'Amateur',
'thumbnail': 're:https?://.*\.jpg$',
'age_limit': 18,
'uploader': 'whiskeyjar',
'view_count': int,
'comment_count': int,
'average_rating': float,
'categories': ['Amateur', 'Teen'],
}
}]

View File

@@ -189,17 +189,17 @@ class TwitchVodIE(TwitchItemBaseIE):
_ITEM_SHORTCUT = 'v'
_TEST = {
'url': 'http://www.twitch.tv/ksptv/v/3622000',
'url': 'http://www.twitch.tv/riotgames/v/6528877',
'info_dict': {
'id': 'v3622000',
'id': 'v6528877',
'ext': 'mp4',
'title': '''KSPTV: Squadcast: "Everyone's on vacation so here's Dahud" Edition!''',
'title': 'LCK Summer Split - Week 6 Day 1',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 6951,
'timestamp': 1419028564,
'upload_date': '20141219',
'uploader': 'KSPTV',
'uploader_id': 'ksptv',
'duration': 17208,
'timestamp': 1435131709,
'upload_date': '20150624',
'uploader': 'Riot Games',
'uploader_id': 'riotgames',
'view_count': int,
},
'params': {
@@ -215,7 +215,7 @@ class TwitchVodIE(TwitchItemBaseIE):
'%s/api/vods/%s/access_token' % (self._API_BASE, item_id), item_id,
'Downloading %s access token' % self._ITEM_TYPE)
formats = self._extract_m3u8_formats(
'%s/vod/%s?nauth=%s&nauthsig=%s'
'%s/vod/%s?nauth=%s&nauthsig=%s&allow_source=true'
% (self._USHER_BASE, item_id, access_token['token'], access_token['sig']),
item_id, 'mp4')
self._prefer_source(formats)

View File

@@ -0,0 +1,72 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urllib_request
from ..utils import (
float_or_none,
unescapeHTML,
)
class TwitterCardIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?twitter\.com/i/cards/tfw/v1/(?P<id>\d+)'
_TEST = {
'url': 'https://twitter.com/i/cards/tfw/v1/560070183650213889',
'md5': 'a74f50b310c83170319ba16de6955192',
'info_dict': {
'id': '560070183650213889',
'ext': 'mp4',
'title': 'TwitterCard',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 30.033,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
# Different formats served for different User-Agents
USER_AGENTS = [
'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/20.0 (Chrome)', # mp4
'Mozilla/5.0 (Windows NT 5.2; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0', # webm
]
config = None
formats = []
for user_agent in USER_AGENTS:
request = compat_urllib_request.Request(url)
request.add_header('User-Agent', user_agent)
webpage = self._download_webpage(request, video_id)
config = self._parse_json(
unescapeHTML(self._search_regex(
r'data-player-config="([^"]+)"', webpage, 'data player config')),
video_id)
video_url = config['playlist'][0]['source']
f = {
'url': video_url,
}
m = re.search(r'/(?P<width>\d+)x(?P<height>\d+)/', video_url)
if m:
f.update({
'width': int(m.group('width')),
'height': int(m.group('height')),
})
formats.append(f)
self._sort_formats(formats)
thumbnail = config.get('posterImageUrl')
duration = float_or_none(config.get('duration'))
return {
'id': video_id,
'title': 'TwitterCard',
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
}

View File

@@ -121,20 +121,27 @@ class VKIE(InfoExtractor):
if username is None:
return
login_form = {
'act': 'login',
'role': 'al_frame',
'expire': '1',
login_page = self._download_webpage(
'https://vk.com', None, 'Downloading login page')
login_form = dict(re.findall(
r'<input\s+type="hidden"\s+name="([^"]+)"\s+(?:id="[^"]+"\s+)?value="([^"]*)"',
login_page))
login_form.update({
'email': username.encode('cp1251'),
'pass': password.encode('cp1251'),
}
})
request = compat_urllib_request.Request('https://login.vk.com/?act=login',
compat_urllib_parse.urlencode(login_form).encode('utf-8'))
login_page = self._download_webpage(request, None, note='Logging in as %s' % username)
request = compat_urllib_request.Request(
'https://login.vk.com/?act=login',
compat_urllib_parse.urlencode(login_form).encode('utf-8'))
login_page = self._download_webpage(
request, None, note='Logging in as %s' % username)
if re.search(r'onLoginFailed', login_page):
raise ExtractorError('Unable to login, incorrect username and/or password', expected=True)
raise ExtractorError(
'Unable to login, incorrect username and/or password', expected=True)
def _real_initialize(self):
self._login()
@@ -146,9 +153,14 @@ class VKIE(InfoExtractor):
if not video_id:
video_id = '%s_%s' % (mobj.group('oid'), mobj.group('id'))
info_url = 'http://vk.com/al_video.php?act=show&al=1&module=video&video=%s' % video_id
info_url = 'https://vk.com/al_video.php?act=show&al=1&module=video&video=%s' % video_id
info_page = self._download_webpage(info_url, video_id)
if re.search(r'<!>/login\.php\?.*\bact=security_check', info_page):
raise ExtractorError(
'You are trying to log in from an unusual location. You should confirm ownership at vk.com to log in with this IP.',
expected=True)
ERRORS = {
r'>Видеозапись .*? была изъята из публичного доступа в связи с обращением правообладателя.<':
'Video %s has been removed from public access due to rightholder complaint.',

View File

@@ -36,6 +36,7 @@ class VubeIE(InfoExtractor):
'comment_count': int,
'categories': ['amazing', 'hd', 'best drummer ever', 'william wei', 'bucket drumming', 'street drummer', 'epic street drumming'],
},
'skip': 'Not accessible from Travis CI server',
}, {
'url': 'http://vube.com/Chiara+Grispo+Video+Channel/YL2qNPkqon',
'md5': 'db7aba89d4603dadd627e9d1973946fe',

View File

@@ -29,9 +29,11 @@ from ..utils import (
get_element_by_id,
int_or_none,
orderedSet,
str_to_int,
unescapeHTML,
unified_strdate,
uppercase_escape,
ISO3166Utils,
)
@@ -518,6 +520,20 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'skip_download': 'requires avconv',
}
},
# Extraction from multiple DASH manifests (https://github.com/rg3/youtube-dl/pull/6097)
{
'url': 'https://www.youtube.com/watch?v=FIl7x6_3R5Y',
'info_dict': {
'id': 'FIl7x6_3R5Y',
'ext': 'mp4',
'title': 'md5:7b81415841e02ecd4313668cde88737a',
'description': 'md5:116377fd2963b81ec4ce64b542173306',
'upload_date': '20150625',
'uploader_id': 'dorappi2000',
'uploader': 'dorappi2000',
'formats': 'mincount:33',
},
}
]
def __init__(self, *args, **kwargs):
@@ -824,6 +840,12 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
except StopIteration:
full_info = self._formats.get(format_id, {}).copy()
full_info.update(f)
codecs = r.attrib.get('codecs')
if codecs:
if full_info.get('acodec') == 'none' and 'vcodec' not in full_info:
full_info['vcodec'] = codecs
elif full_info.get('vcodec') == 'none' and 'acodec' not in full_info:
full_info['acodec'] = codecs
formats.append(full_info)
else:
existing_format.update(f)
@@ -853,6 +875,13 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
else:
player_url = None
dash_mpds = []
def add_dash_mpd(video_info):
dash_mpd = video_info.get('dashmpd')
if dash_mpd and dash_mpd[0] not in dash_mpds:
dash_mpds.append(dash_mpd[0])
# Get video info
embed_webpage = None
if re.search(r'player-age-gate-content">', video_webpage) is not None:
@@ -873,24 +902,29 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
note='Refetching age-gated info webpage',
errnote='unable to download video info webpage')
video_info = compat_parse_qs(video_info_webpage)
add_dash_mpd(video_info)
else:
age_gate = False
try:
# Try looking directly into the video webpage
mobj = re.search(r';ytplayer\.config\s*=\s*({.*?});', video_webpage)
if not mobj:
raise ValueError('Could not find ytplayer.config') # caught below
video_info = None
# Try looking directly into the video webpage
mobj = re.search(r';ytplayer\.config\s*=\s*({.*?});', video_webpage)
if mobj:
json_code = uppercase_escape(mobj.group(1))
ytplayer_config = json.loads(json_code)
args = ytplayer_config['args']
# Convert to the same format returned by compat_parse_qs
video_info = dict((k, [v]) for k, v in args.items())
if not args.get('url_encoded_fmt_stream_map'):
raise ValueError('No stream_map present') # caught below
except ValueError:
# We fallback to the get_video_info pages (used by the embed page)
if args.get('url_encoded_fmt_stream_map'):
# Convert to the same format returned by compat_parse_qs
video_info = dict((k, [v]) for k, v in args.items())
add_dash_mpd(video_info)
if not video_info or self._downloader.params.get('youtube_include_dash_manifest', True):
# We also try looking in get_video_info since it may contain different dashmpd
# URL that points to a DASH manifest with possibly different itag set (some itags
# are missing from DASH manifest pointed by webpage's dashmpd, some - from DASH
# manifest pointed by get_video_info's dashmpd).
# The general idea is to take a union of itags of both DASH manifests (for example
# video with such 'manifest behavior' see https://github.com/rg3/youtube-dl/issues/6093)
self.report_video_info_webpage_download(video_id)
for el_type in ['&el=embedded', '&el=detailpage', '&el=vevo', '']:
for el_type in ['&el=info', '&el=embedded', '&el=detailpage', '&el=vevo', '']:
video_info_url = (
'%s://www.youtube.com/get_video_info?&video_id=%s%s&ps=default&eurl=&gl=US&hl=en'
% (proto, video_id, el_type))
@@ -898,11 +932,20 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
video_info_url,
video_id, note=False,
errnote='unable to download video info webpage')
video_info = compat_parse_qs(video_info_webpage)
if 'token' in video_info:
get_video_info = compat_parse_qs(video_info_webpage)
add_dash_mpd(get_video_info)
if not video_info:
video_info = get_video_info
if 'token' in get_video_info:
break
if 'token' not in video_info:
if 'reason' in video_info:
if 'The uploader has not made this video available in your country.' in video_info['reason']:
regions_allowed = self._html_search_meta('regionsAllowed', video_webpage, default=None)
if regions_allowed is not None:
raise ExtractorError('YouTube said: This video is available in %s only' % (
', '.join(map(ISO3166Utils.short2full, regions_allowed.split(',')))),
expected=True)
raise ExtractorError(
'YouTube said: %s' % video_info['reason'][0],
expected=True, video_id=video_id)
@@ -956,15 +999,16 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
video_thumbnail = compat_urllib_parse.unquote_plus(video_info['thumbnail_url'][0])
# upload date
upload_date = None
mobj = re.search(r'(?s)id="eow-date.*?>(.*?)</span>', video_webpage)
if mobj is None:
mobj = re.search(
r'(?s)id="watch-uploader-info".*?>.*?(?:Published|Uploaded|Streamed live) on (.*?)</strong>',
video_webpage)
if mobj is not None:
upload_date = ' '.join(re.sub(r'[/,-]', r' ', mobj.group(1)).split())
upload_date = unified_strdate(upload_date)
upload_date = self._html_search_meta(
'datePublished', video_webpage, 'upload date', default=None)
if not upload_date:
upload_date = self._search_regex(
[r'(?s)id="eow-date.*?>(.*?)</span>',
r'id="watch-uploader-info".*?>.*?(?:Published|Uploaded|Streamed live|Started) on (.+?)</strong>'],
video_webpage, 'upload date', default=None)
if upload_date:
upload_date = ' '.join(re.sub(r'[/,-]', r' ', mobj.group(1)).split())
upload_date = unified_strdate(upload_date)
m_cat_container = self._search_regex(
r'(?s)<h4[^>]*>\s*Category\s*</h4>\s*<ul[^>]*>(.*?)</ul>',
@@ -998,12 +1042,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
video_description = ''
def _extract_count(count_name):
count = self._search_regex(
r'id="watch-%s"[^>]*>.*?([\d,]+)\s*</span>' % re.escape(count_name),
video_webpage, count_name, default=None)
if count is not None:
return int(count.replace(',', ''))
return None
return str_to_int(self._search_regex(
r'-%s-button[^>]+><span[^>]+class="yt-uix-button-content"[^>]*>([\d,]+)</span>'
% re.escape(count_name),
video_webpage, count_name, default=None))
like_count = _extract_count('like')
dislike_count = _extract_count('dislike')
@@ -1118,24 +1161,25 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
# Look for the DASH manifest
if self._downloader.params.get('youtube_include_dash_manifest', True):
dash_mpd = video_info.get('dashmpd')
if dash_mpd:
dash_manifest_url = dash_mpd[0]
for dash_manifest_url in dash_mpds:
dash_formats = {}
try:
dash_formats = self._parse_dash_manifest(
video_id, dash_manifest_url, player_url, age_gate)
for df in self._parse_dash_manifest(
video_id, dash_manifest_url, player_url, age_gate):
# Do not overwrite DASH format found in some previous DASH manifest
if df['format_id'] not in dash_formats:
dash_formats[df['format_id']] = df
except (ExtractorError, KeyError) as e:
self.report_warning(
'Skipping DASH manifest: %r' % e, video_id)
else:
if dash_formats:
# Remove the formats we found through non-DASH, they
# contain less info and it can be wrong, because we use
# fixed values (for example the resolution). See
# https://github.com/rg3/youtube-dl/issues/5774 for an
# example.
dash_keys = set(df['format_id'] for df in dash_formats)
formats = [f for f in formats if f['format_id'] not in dash_keys]
formats.extend(dash_formats)
formats = [f for f in formats if f['format_id'] not in dash_formats.keys()]
formats.extend(dash_formats.values())
# Check for malformed aspect ratio
stretched_m = re.search(

View File

@@ -346,12 +346,13 @@ def parseOpts(overrideArguments=None):
video_format.add_option(
'--youtube-skip-dash-manifest',
action='store_false', dest='youtube_include_dash_manifest',
help='Do not download the DASH manifest on YouTube videos')
help='Do not download the DASH manifests and related data on YouTube videos')
video_format.add_option(
'--merge-output-format',
action='store', dest='merge_output_format', metavar='FORMAT', default=None,
help=(
'If a merge is required (e.g. bestvideo+bestaudio), output to given container format. One of mkv, mp4, ogg, webm, flv.'
'If a merge is required (e.g. bestvideo+bestaudio), '
'output to given container format. One of mkv, mp4, ogg, webm, flv. '
'Ignored if no merge is required'))
subtitles = optparse.OptionGroup(parser, 'Subtitle Options')

View File

@@ -62,6 +62,8 @@ std_headers = {
}
NO_DEFAULT = object()
ENGLISH_MONTH_NAMES = [
'January', 'February', 'March', 'April', 'May', 'June',
'July', 'August', 'September', 'October', 'November', 'December']
@@ -171,13 +173,15 @@ def xpath_with_ns(path, ns_map):
return '/'.join(replaced)
def xpath_text(node, xpath, name=None, fatal=False):
def xpath_text(node, xpath, name=None, fatal=False, default=NO_DEFAULT):
if sys.version_info < (2, 7): # Crazy 2.6
xpath = xpath.encode('ascii')
n = node.find(xpath)
if n is None or n.text is None:
if fatal:
if default is not NO_DEFAULT:
return default
elif fatal:
name = xpath if name is None else name
raise ExtractorError('Could not find XML element %s' % name)
else:
@@ -2084,6 +2088,266 @@ class ISO639Utils(object):
return short_name
class ISO3166Utils(object):
# From http://data.okfn.org/data/core/country-list
_country_map = {
'AF': 'Afghanistan',
'AX': 'Åland Islands',
'AL': 'Albania',
'DZ': 'Algeria',
'AS': 'American Samoa',
'AD': 'Andorra',
'AO': 'Angola',
'AI': 'Anguilla',
'AQ': 'Antarctica',
'AG': 'Antigua and Barbuda',
'AR': 'Argentina',
'AM': 'Armenia',
'AW': 'Aruba',
'AU': 'Australia',
'AT': 'Austria',
'AZ': 'Azerbaijan',
'BS': 'Bahamas',
'BH': 'Bahrain',
'BD': 'Bangladesh',
'BB': 'Barbados',
'BY': 'Belarus',
'BE': 'Belgium',
'BZ': 'Belize',
'BJ': 'Benin',
'BM': 'Bermuda',
'BT': 'Bhutan',
'BO': 'Bolivia, Plurinational State of',
'BQ': 'Bonaire, Sint Eustatius and Saba',
'BA': 'Bosnia and Herzegovina',
'BW': 'Botswana',
'BV': 'Bouvet Island',
'BR': 'Brazil',
'IO': 'British Indian Ocean Territory',
'BN': 'Brunei Darussalam',
'BG': 'Bulgaria',
'BF': 'Burkina Faso',
'BI': 'Burundi',
'KH': 'Cambodia',
'CM': 'Cameroon',
'CA': 'Canada',
'CV': 'Cape Verde',
'KY': 'Cayman Islands',
'CF': 'Central African Republic',
'TD': 'Chad',
'CL': 'Chile',
'CN': 'China',
'CX': 'Christmas Island',
'CC': 'Cocos (Keeling) Islands',
'CO': 'Colombia',
'KM': 'Comoros',
'CG': 'Congo',
'CD': 'Congo, the Democratic Republic of the',
'CK': 'Cook Islands',
'CR': 'Costa Rica',
'CI': 'Côte d\'Ivoire',
'HR': 'Croatia',
'CU': 'Cuba',
'CW': 'Curaçao',
'CY': 'Cyprus',
'CZ': 'Czech Republic',
'DK': 'Denmark',
'DJ': 'Djibouti',
'DM': 'Dominica',
'DO': 'Dominican Republic',
'EC': 'Ecuador',
'EG': 'Egypt',
'SV': 'El Salvador',
'GQ': 'Equatorial Guinea',
'ER': 'Eritrea',
'EE': 'Estonia',
'ET': 'Ethiopia',
'FK': 'Falkland Islands (Malvinas)',
'FO': 'Faroe Islands',
'FJ': 'Fiji',
'FI': 'Finland',
'FR': 'France',
'GF': 'French Guiana',
'PF': 'French Polynesia',
'TF': 'French Southern Territories',
'GA': 'Gabon',
'GM': 'Gambia',
'GE': 'Georgia',
'DE': 'Germany',
'GH': 'Ghana',
'GI': 'Gibraltar',
'GR': 'Greece',
'GL': 'Greenland',
'GD': 'Grenada',
'GP': 'Guadeloupe',
'GU': 'Guam',
'GT': 'Guatemala',
'GG': 'Guernsey',
'GN': 'Guinea',
'GW': 'Guinea-Bissau',
'GY': 'Guyana',
'HT': 'Haiti',
'HM': 'Heard Island and McDonald Islands',
'VA': 'Holy See (Vatican City State)',
'HN': 'Honduras',
'HK': 'Hong Kong',
'HU': 'Hungary',
'IS': 'Iceland',
'IN': 'India',
'ID': 'Indonesia',
'IR': 'Iran, Islamic Republic of',
'IQ': 'Iraq',
'IE': 'Ireland',
'IM': 'Isle of Man',
'IL': 'Israel',
'IT': 'Italy',
'JM': 'Jamaica',
'JP': 'Japan',
'JE': 'Jersey',
'JO': 'Jordan',
'KZ': 'Kazakhstan',
'KE': 'Kenya',
'KI': 'Kiribati',
'KP': 'Korea, Democratic People\'s Republic of',
'KR': 'Korea, Republic of',
'KW': 'Kuwait',
'KG': 'Kyrgyzstan',
'LA': 'Lao People\'s Democratic Republic',
'LV': 'Latvia',
'LB': 'Lebanon',
'LS': 'Lesotho',
'LR': 'Liberia',
'LY': 'Libya',
'LI': 'Liechtenstein',
'LT': 'Lithuania',
'LU': 'Luxembourg',
'MO': 'Macao',
'MK': 'Macedonia, the Former Yugoslav Republic of',
'MG': 'Madagascar',
'MW': 'Malawi',
'MY': 'Malaysia',
'MV': 'Maldives',
'ML': 'Mali',
'MT': 'Malta',
'MH': 'Marshall Islands',
'MQ': 'Martinique',
'MR': 'Mauritania',
'MU': 'Mauritius',
'YT': 'Mayotte',
'MX': 'Mexico',
'FM': 'Micronesia, Federated States of',
'MD': 'Moldova, Republic of',
'MC': 'Monaco',
'MN': 'Mongolia',
'ME': 'Montenegro',
'MS': 'Montserrat',
'MA': 'Morocco',
'MZ': 'Mozambique',
'MM': 'Myanmar',
'NA': 'Namibia',
'NR': 'Nauru',
'NP': 'Nepal',
'NL': 'Netherlands',
'NC': 'New Caledonia',
'NZ': 'New Zealand',
'NI': 'Nicaragua',
'NE': 'Niger',
'NG': 'Nigeria',
'NU': 'Niue',
'NF': 'Norfolk Island',
'MP': 'Northern Mariana Islands',
'NO': 'Norway',
'OM': 'Oman',
'PK': 'Pakistan',
'PW': 'Palau',
'PS': 'Palestine, State of',
'PA': 'Panama',
'PG': 'Papua New Guinea',
'PY': 'Paraguay',
'PE': 'Peru',
'PH': 'Philippines',
'PN': 'Pitcairn',
'PL': 'Poland',
'PT': 'Portugal',
'PR': 'Puerto Rico',
'QA': 'Qatar',
'RE': 'Réunion',
'RO': 'Romania',
'RU': 'Russian Federation',
'RW': 'Rwanda',
'BL': 'Saint Barthélemy',
'SH': 'Saint Helena, Ascension and Tristan da Cunha',
'KN': 'Saint Kitts and Nevis',
'LC': 'Saint Lucia',
'MF': 'Saint Martin (French part)',
'PM': 'Saint Pierre and Miquelon',
'VC': 'Saint Vincent and the Grenadines',
'WS': 'Samoa',
'SM': 'San Marino',
'ST': 'Sao Tome and Principe',
'SA': 'Saudi Arabia',
'SN': 'Senegal',
'RS': 'Serbia',
'SC': 'Seychelles',
'SL': 'Sierra Leone',
'SG': 'Singapore',
'SX': 'Sint Maarten (Dutch part)',
'SK': 'Slovakia',
'SI': 'Slovenia',
'SB': 'Solomon Islands',
'SO': 'Somalia',
'ZA': 'South Africa',
'GS': 'South Georgia and the South Sandwich Islands',
'SS': 'South Sudan',
'ES': 'Spain',
'LK': 'Sri Lanka',
'SD': 'Sudan',
'SR': 'Suriname',
'SJ': 'Svalbard and Jan Mayen',
'SZ': 'Swaziland',
'SE': 'Sweden',
'CH': 'Switzerland',
'SY': 'Syrian Arab Republic',
'TW': 'Taiwan, Province of China',
'TJ': 'Tajikistan',
'TZ': 'Tanzania, United Republic of',
'TH': 'Thailand',
'TL': 'Timor-Leste',
'TG': 'Togo',
'TK': 'Tokelau',
'TO': 'Tonga',
'TT': 'Trinidad and Tobago',
'TN': 'Tunisia',
'TR': 'Turkey',
'TM': 'Turkmenistan',
'TC': 'Turks and Caicos Islands',
'TV': 'Tuvalu',
'UG': 'Uganda',
'UA': 'Ukraine',
'AE': 'United Arab Emirates',
'GB': 'United Kingdom',
'US': 'United States',
'UM': 'United States Minor Outlying Islands',
'UY': 'Uruguay',
'UZ': 'Uzbekistan',
'VU': 'Vanuatu',
'VE': 'Venezuela, Bolivarian Republic of',
'VN': 'Viet Nam',
'VG': 'Virgin Islands, British',
'VI': 'Virgin Islands, U.S.',
'WF': 'Wallis and Futuna',
'EH': 'Western Sahara',
'YE': 'Yemen',
'ZM': 'Zambia',
'ZW': 'Zimbabwe',
}
@classmethod
def short2full(cls, code):
"""Convert an ISO 3166-2 country code to the corresponding full name"""
return cls._country_map.get(code.upper())
class PerRequestProxyHandler(compat_urllib_request.ProxyHandler):
def __init__(self, proxies=None):
# Set default handlers

View File

@@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2015.06.25'
__version__ = '2015.07.07'