francescomecca.eu/output/categories/archiveorg.xml
Francesco Mecca 2fc0ad5c9f new cv
2020-01-29 11:08:46 +01:00

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Caught in the Net (Posts about archive.org)</title><link>francescomecca.eu</link><description></description><atom:link href="francescomecca.eu/categories/archiveorg.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><copyright>Contents © 2020 &lt;a href="mailto:francescomecca.eu"&gt;Francesco Mecca&lt;/a&gt; </copyright><lastBuildDate>Wed, 29 Jan 2020 10:04:36 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>Script for bulk downloading from Archive.org</title><link>francescomecca.eu/blog/2015/6/30/script-per-il-bulk-download-da-archive-org/</link><dc:creator>Francesco Mecca</dc:creator><description>&lt;div&gt;&lt;p&gt;Over the past few days I have had to download several collections from &lt;a href="https://en.wikipedia.org/wiki/Internet_Archive"&gt;archive.org&lt;/a&gt;, a multimedia digital library whose mission is universal access to all knowledge.&lt;/p&gt;
&lt;p&gt;I mainly use it to download large numbers of live concert recordings, which in my opinion are recorded impeccably.&lt;/p&gt;
&lt;p&gt;The site offers a guide to bulk downloading with wget and the site's own tools, but it is rather long-winded and complicated if all you want is a quick download.&lt;/p&gt;
&lt;p&gt;This is the script I use, adapted from &lt;a href="https://github.com/ghalfacree/bash-scripts/blob/master/archivedownload.sh"&gt;this&lt;/a&gt; script: it is written in bash and works on any distribution that has wget, tail and sed installed.&lt;/p&gt;
&lt;pre class="wp-code-highlight prettyprint linenums:1"&gt;#!/bin/bash
# To accept only one file type, set its extension here (no leading dot, no spaces around "=")
#filetype=flac
# ...and append the following option to the final wget command (line 24)
#-A ".$filetype"
# Extensions of the files to reject, separated by commas
fileremove=.null
if [ "$1" = "" ]; then
echo USAGE: archivedownload.sh collectionname
echo See Archive.org entry page for the collection name.
echo Collection name must be entered exactly as shown: lower case, with hyphens.
exit
fi
echo Downloading list of entries for collection name $1…
wget -nd -q "http://archive.org/advancedsearch.php?q=collection%3A$1&amp;amp;fl%5B%5D=identifier&amp;amp;sort%5B%5D=identifier+asc&amp;amp;sort%5B%5D=&amp;amp;sort%5B%5D=&amp;amp;rows=9999&amp;amp;page=1&amp;amp;callback=callback&amp;amp;save=yes&amp;amp;output=csv" -O identifiers.txt
echo Processing entry list for wget parsing…
tail -n +2 identifiers.txt | sed 's/"//g' &amp;gt; processedidentifiers.txt
if [ "$(cat processedidentifiers.txt | wc -l)" -eq 0 ]; then
echo No identifiers found for collection $1. Check name and try again.
rm processedidentifiers.txt identifiers.txt
exit
fi
echo Beginning wget download of `cat processedidentifiers.txt | wc -l` identifiers…
wget -r -H -nc -np -nH -nd -e robots=off -R "$fileremove" -i processedidentifiers.txt -B http://archive.org/download/
rm identifiers.txt processedidentifiers.txt
echo Complete.
&lt;/pre&gt;
&lt;p&gt;Francesco Mecca &lt;/p&gt;&lt;/div&gt;</description><category>archive.org</category><category>bulk download archive.org</category><category>PesceWanda</category><category>script</category><guid>francescomecca.eu/blog/2015/6/30/script-per-il-bulk-download-da-archive-org/</guid><pubDate>Tue, 30 Jun 2015 13:39:00 GMT</pubDate></item></channel></rss>