Thursday, 15 January 2015

R - Trying to get the price of products with RCurl




I'm scraping the prices of products from a website. In Python I used urllib2 without problems, but when I tried using RCurl in R I couldn't download the source code.

I need to fetch the page source for a given product code and grab the price from it. The path of a product is: http://www.americanas.com.br/produto/code_of_product.

Actually, I can't download the source code of the product page with RCurl. When I try, for example, getURL('http://www.americanas.com.br/produto/111467594'), it returns "".

I tried using getURL('.../produtos/111467594'), which does download the source, but that way I'm unable to get the price. :(

Does anyone know how I can get the price of the products?

Thanks.

PS: Sorry for my bad English. :)

Welcome to StackOverflow.

It's hard for me to tell why it doesn't work. Did you include verbose=TRUE in getURL? Also, notice that there are different prices on the webpage you linked. Do you want all of them, or only the first? Here is how to get the "por price":

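    library("stringr")
    # Read the product page as a character vector, one element per line
    productwebpage <- readLines("http://www.americanas.com.br/produto/111467594")
    # Keep only the line(s) that contain the sale price markup
    pricerow <- productwebpage[grep("p class=\"sale price\"", productwebpage)]
    # Pull the numeric tokens out of the first matching line
    price <- str_extract_all(pricerow, "\\(?[0-9,.]+\\)?")[[1]]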

You can substitute grep("p class=\"sale price\"",productwebpage) with either grep("<p><span class=\"regular price\">",productwebpage) (to get the "de price" / old price) or grep("<span class=\"p-v interest\">",productwebpage) (which gives the "sem juros" price / per-month payment). In the last example, the number of months comes first and the payment after it, so the result would be:

    > price
    [1] "12" "83,25"

This should work for other products too (I just tried 5, and it seemed to work for all of them).
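The approach above can be tried without hitting the network. The following is only a minimal sketch in base R: the helper names `extract_price_tokens` and `parse_brl` are hypothetical, and `sample_html` is invented markup that merely imitates the class names mentioned in this answer, so the real page may look different. It also uses a slightly stricter pattern than the one above, so that a lone comma or period is not picked up as a token.

```r
# Hypothetical helper: return the numeric tokens found on the first line
# of `html_lines` that contains `marker` (matched as a literal string).
extract_price_tokens <- function(html_lines, marker) {
  rows <- grep(marker, html_lines, fixed = TRUE, value = TRUE)
  if (length(rows) == 0) {
    return(character(0))
  }
  # Digits, optionally followed by groups like ",25" or ".299"
  m <- gregexpr("[0-9]+(?:[.,][0-9]+)*", rows[1], perl = TRUE)
  regmatches(rows[1], m)[[1]]
}

# Hypothetical helper: convert Brazilian-style numbers ("1.299,90")
# to numeric by dropping thousands dots and using "." as the decimal mark.
parse_brl <- function(x) {
  no_thousands <- gsub(".", "", x, fixed = TRUE)
  as.numeric(sub(",", ".", no_thousands, fixed = TRUE))
}

# Invented sample markup (an assumption, not copied from the real site)
sample_html <- c(
  "<div class=\"product\">",
  "<p class=\"sale price\">Por: R$ 999,00</p>",
  "<span class=\"p-v interest\">12x de R$ 83,25 sem juros</span>"
)

sale <- extract_price_tokens(sample_html, "p class=\"sale price\"")
installments <- extract_price_tokens(sample_html, "<span class=\"p-v interest\">")
installment_values <- parse_brl(installments)
```

To use it on a live page, you could pass the result of readLines(url) as `html_lines`, assuming the page still serves these class names in its static HTML.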

r web-scraping rcurl
