
Improved version

Perl

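The script can now go all the way to producing a PDF of the books I have read. The Perl script itself only emits TeX source, so convert that to PDF however you like. You need to create a file named asin.txt in the same directory beforehand.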

perl photo.pl > book.tex && platex book.tex && dvips -o book.ps book.dvi && ps2pdf book.ps book.pdf
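The conversion steps could also be driven from Perl so that a failing step stops the rest. The following is only a sketch: the helper names `build_steps` and `run_steps` and the file name `build.pl` are made up for illustration, and it assumes `platex`, `dvips`, and `ps2pdf` are on the PATH and that `book.tex` has already been generated.

```perl
use strict;
use warnings;

# Return the conversion steps for a given base name, one command per entry.
# Each step is an argument list, so no shell quoting is involved.
sub build_steps {
    my ($base) = @_;
    return (
        [ 'platex', "$base.tex" ],
        [ 'dvips', '-o', "$base.ps", "$base.dvi" ],
        [ 'ps2pdf', "$base.ps", "$base.pdf" ],
    );
}

# Run each step in order and stop at the first one that fails.
sub run_steps {
    my @steps = @_;
    for my $step (@steps) {
        print STDERR "running: @$step\n";
        system(@$step) == 0 or return 0;
    }
    return 1;
}

# Only run the toolchain when a base name is given, e.g. `perl build.pl book`.
if (@ARGV) {
    run_steps(build_steps($ARGV[0])) or die "build failed\n";
}
```

Passing each command as a list keeps file names away from the shell, and returning early means a TeX error does not leave a stale PDF looking like a fresh one.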

Source

use strict;
use warnings;
use WWW::Mechanize;

my $title_pattern = '<title>(.*?)</title>';
my $price_pattern = '<span class="price bold">\s(.*?)</span>';
my $jpg_pattern = 'http://ec1.images-amazon.com/images/I/.*?\.jpg';
my $url;

my $w=WWW::Mechanize->new;

print_header();

# asin.txt holds one ASIN per line
open(my $fh, '<', 'asin.txt') or die "Cannot open asin.txt: $!";
my @asins = <$fh>;
close($fh);
foreach my $asin(@asins){
    chomp($asin);
    $url = "http://d.hatena.ne.jp/asin/".$asin;
    $w->get($url);
    my $content = $w->content();
    print '\begin{itemize}',"\n";
    print '\item ';
    get_title($content,$title_pattern);
    print '\item ';
    get_price($content,$price_pattern);
    print '\end{itemize}',"\n";
    get_photo($w,$content,$jpg_pattern,$asin);
    print_photo($asin);

}

print_footer();


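# Print the page title, or an empty line if the pattern does not match
sub get_title{
    my ($content,$title_pattern) = @_;
    my $title = '';
    if($content =~ /$title_pattern/){
        $title = $1;
    }
    print $title,"\n";
}

# Print the price, or an empty line if the page shows no price
sub get_price{
    my ($content,$price_pattern) = @_;
    my $price = '';
    if($content =~ /$price_pattern/){
        $price = $1;
    }
    print $price,"\n";
}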

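# Download the cover image as $asin.jpg and convert it to $asin.eps
sub get_photo{
    my ($w,$content,$jpg_pattern,$asin) = @_;
    my $filename;
    if($content =~ /$jpg_pattern/){
        $filename = $&;    # the whole match is the image URL
    }
    # Fail with a clear message when the page has no matching image URL
    die "No image URL found for $asin\n" unless defined $filename;
    my $response = $w->get($filename);
    open(my $ofh, '>', "$asin.jpg") or die "Cannot write $asin.jpg: $!";
    binmode $ofh;
    print $ofh $response->content;
    close($ofh);
    # Convert the JPEG to EPS (requires ImageMagick's convert)
    system("convert $asin.jpg eps2:$asin.eps");
}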

sub print_header{
    print '\documentclass[twocolumn]{jsarticle}',"\n";
    print '\usepackage{wrapfig}',"\n";
    print '\usepackage[dvips]{graphicx}',"\n";    # driver must match the dvips step used to build the PDF
    print '\begin{document}',"\n";
}

sub print_footer{
    print '\end{document}',"\n";
}

sub print_photo{
    my $photo=shift;
    print '\includegraphics[scale=1]',"{$photo.eps}\n";
}
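
Titles scraped from a web page can contain characters such as `&`, `%`, or `#` that LaTeX treats specially, and printing them unescaped into book.tex would break the platex run. A minimal sketch of an escaping helper follows; `tex_escape` is a hypothetical name and is not part of the script above, and the sample string is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: map each LaTeX special character to its escaped form.
my %tex_escape = (
    '\\' => '\\textbackslash{}',
    '&'  => '\\&',
    '%'  => '\\%',
    '$'  => '\\$',
    '#'  => '\\#',
    '_'  => '\\_',
    '{'  => '\\{',
    '}'  => '\\}',
    '~'  => '\\textasciitilde{}',
    '^'  => '\\textasciicircum{}',
);

# Escape in a single pass so that already-inserted backslashes and braces
# are not escaped a second time.
sub tex_escape {
    my ($text) = @_;
    $text =~ s/([\\&%\$#_{}~^])/$tex_escape{$1}/g;
    return $text;
}

# Made-up title used only to exercise the helper.
print tex_escape('C# & Perl_5: 100% {fun}'), "\n";
```

In the script this would be applied to `$title` and `$price` just before they are printed. HTML entities such as `&amp;` in the page source are a separate matter and are not handled here.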

PDF

The resulting PDF looks like this.

Bugs

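  • If Amazon hosts the image file somewhere other than the expected location, the image cannot be fetched
  • For items whose page shows no price, the price cannot be fetched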
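
Both bugs come down to a pattern that fails to match. One possible way to soften them is to fall back to a default value instead of relying on the match. The sketch below is not the script's actual behavior: `extract_or` is a hypothetical helper, the page fragment is made up, and the image pattern is wrapped in parentheses here so that the URL lands in `$1`.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first capture group of $pattern in $content,
# or $fallback when the pattern does not match.
sub extract_or {
    my ($content, $pattern, $fallback) = @_;
    return $content =~ /$pattern/s ? $1 : $fallback;
}

my $price_pattern = '<span class="price bold">\s(.*?)</span>';
my $jpg_pattern   = '(http://ec1\.images-amazon\.com/images/I/.*?\.jpg)';

# Made-up page fragment that has neither a price nor an image URL.
my $content = '<title>Some Book</title>';

my $price = extract_or($content, $price_pattern, 'price not listed');
my $jpg   = extract_or($content, $jpg_pattern, undef);

print "$price\n";
print defined $jpg ? "image: $jpg\n" : "no image found; skipping download\n";
```

With a fallback in place, the caller can decide whether to skip the `\includegraphics` line for that book or to print a placeholder, rather than stopping the whole run.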