r - Trying to get the price of products with RCurl
I'm scraping the prices of products from a website. In Python I used urllib2 without problems, but when I tried using RCurl in R I couldn't download the source code.
I have to combine the site's path with the product code, download the source code, and grab the price. The path of a product is: http://www.americanas.com.br/produto/code_of_product.
Actually, I can't download the source code of the product page with RCurl. When I try, for example, getURL('http://www.americanas.com.br/produto/111467594'), it returns "".
I also tried getURL('.../produtos/111467594'), which does download the source, but that way I'm unable to get the price. :(
Does anyone know how to get the price of the products?
Thanks.
PS: Sorry for my bad English. :)
Welcome to Stack Overflow.
It's hard for me to say why it doesn't work. Could you include verbose=TRUE in getURL? Also, notice that there are different prices on the webpage you linked. Do you want all of them or just the first? How about the "Por" price:
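    library("stringr")
    # Read the product page line by line
    productwebpage <- readLines("http://www.americanas.com.br/produto/111467594")
    # Keep the line(s) containing the sale ("Por") price
    pricerow <- productwebpage[grep("p class=\"sale price\"", productwebpage)]
    # Pull out every run of digits, commas and dots from the first matching line
    price <- str_extract_all(pricerow, "\\(?[0-9,.]+\\)?")[[1]]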
You can substitute grep("p class=\"sale price\"", productwebpage)
with either grep("<p><span class=\"regular price\">", productwebpage)
(to get the "De" price, i.e. the old price) or grep("<span class=\"p-v interest\">", productwebpage)
(which gives the "sem juros" price, i.e. the interest-free monthly payment). A final example, with the number of months first and the payment after, would be:
    > price
    [1] "12"    "83,25"
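To check the extraction logic without hitting the site, here is a small offline sketch. The HTML lines are made up for illustration (the real markup may differ), and `extract_numbers` and `to_number` are hypothetical helper names, not part of any package. It uses base R `regmatches` instead of stringr so it runs without extra packages, and it shows one possible way to turn Brazilian-formatted strings such as "83,25" into numbers.

```r
# Hypothetical page fragment; the real markup on the site may differ.
page <- c(
  "<p><span class=\"regular price\">De: R$ 1.199,00</span></p>",
  "<p class=\"sale price\">Por: R$ 999,00</p>",
  "<span class=\"p-v interest\">12x de R$ 83,25 sem juros</span>"
)

# Return all numeric-looking tokens from the first line containing `marker`.
extract_numbers <- function(lines, marker) {
  row <- lines[grep(marker, lines, fixed = TRUE)]
  if (length(row) == 0) return(character(0))  # marker not found
  regmatches(row[1], gregexpr("[0-9][0-9.,]*", row[1]))[[1]]
}

# Convert "1.199,00" style strings (dot = thousands, comma = decimal) to numeric.
to_number <- function(x) {
  as.numeric(sub(",", ".", gsub(".", "", x, fixed = TRUE), fixed = TRUE))
}

installments <- extract_numbers(page, "class=\"p-v interest\"")
installments             # "12" "83,25"
to_number(installments)  # 12.00 83.25
to_number(extract_numbers(page, "class=\"regular price\""))  # 1199
```

Returning an empty character vector when the marker is missing avoids a subscript error on pages that lack one of the price blocks.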
The readLines approach above should work for other products too (I just tried 5 and it seemed to work for all of them).
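If you would rather stay with RCurl, a minimal sketch of the verbose call suggested earlier follows. It assumes, without confirmation from the thread, that the empty string comes from an HTTP redirect that getURL does not follow by default; `fetch_page` is a hypothetical wrapper name and the user-agent string is an arbitrary placeholder. `verbose`, `followlocation` and `useragent` are standard curl options that getURL accepts.

```r
# Hypothetical helper: download a page with RCurl, following redirects.
# Requires the RCurl package to be installed when the function is called.
fetch_page <- function(url) {
  RCurl::getURL(
    url,
    verbose = TRUE,            # print the request/response exchange to the console
    followlocation = TRUE,     # follow redirects (assumed cause of the "" result)
    useragent = "Mozilla/5.0"  # placeholder; some sites reject requests without one
  )
}

# Example call (needs network access):
# html <- fetch_page("http://www.americanas.com.br/produto/111467594")
# productwebpage <- strsplit(html, "\n", fixed = TRUE)[[1]]
```

The verbose output should show the status code the server returns, which tells you whether a redirect is really the cause.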
r web-scraping rcurl