Another reason why I love Python
I didn’t get a chance to go see all of the amazing galleries in Brooklyn participating in #gobrooklynart this weekend. Browsing their list of artists turned up a lot of great pictures of each artist’s work. Naturally, I wrote a script to parse all of their artist pages and grab over 5000 pictures of awesomeness. I’m going to put the links I grabbed into a simple infinite scroll so I can digest the whole thing over the week.
Here’s the quick and dirty script I wrote in about 20 minutes:
# Get all the artists and all the links from Go! Brooklyn Art
# and put them in a text file for future use.
# Python 3 / bs4 port (the original ran on Python 2 with BeautifulSoup 3).
import json

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://www.gobrooklynart.org"

with open("gobrooklynart.txt", "a") as storage:
    for page_number in range(1, 115):  # result pages 1 through 114
        url = BASE_URL + "/explore/artists?page=" + str(page_number) + "&neighborhoods=&media=Painting%2CPhotography%2CPerformance%2CVideo%2FFilm%2FSound%2CSculpture%2CPrint+Making%2FBook+Arts%2CIllustration%2CMixed+Media%2CTextile+Arts%2CDrawing%2CInstallation%2CDesign%2CFashion%2CCraft&accessibility=&order_by=&keyword="
        site = requests.get(url)
        html = BeautifulSoup(site.content, "html.parser")
        # Collect each artist's profile link once; the set drops duplicate hrefs
        links = html.find(id="search_results").find_all("a", href=True)
        artist_hrefs = {a["href"] for a in links}
        for href in artist_hrefs:
            artist_page = requests.get(BASE_URL + href).content
            artist_html = BeautifulSoup(artist_page, "html.parser")
            data = {}
            data["name"] = artist_html.find("h2", {"class": "display-name"}).get_text(strip=True)
            images = artist_html.find(id="profile-photos").find_all("img")
            # Swap the thumbnail URLs for the larger "standard" versions
            data["pictures"] = [i["src"].replace("thumb", "standard") for i in images]
            try:
                data["homepage"] = artist_html.find(id="studio-website").find_all("a")[0]["href"]
            except (AttributeError, IndexError, KeyError) as e:
                # No studio website listed for this artist
                print(e)
                data["homepage"] = None
            print("Got artist %s" % data["name"])
            # Objects are written comma-separated, with no surrounding [ ]
            storage.write(json.dumps(data) + ", ")
        print("On to the next page.")
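That long query string is just a URL-encoded set of search filters. Here is a minimal sketch of building the same page URL with the standard library’s `urllib.parse.urlencode`, so the media filter stays readable. `MEDIA_FILTERS` and `artist_list_url` are hypothetical names, and the parameters are decoded from the hand-written URL in the script, assuming the site still accepts them in that form.

```python
from urllib.parse import urlencode

BASE = "https://www.gobrooklynart.org/explore/artists"

# Media filters decoded from the script's hand-encoded query string
MEDIA_FILTERS = [
    "Painting", "Photography", "Performance", "Video/Film/Sound",
    "Sculpture", "Print Making/Book Arts", "Illustration", "Mixed Media",
    "Textile Arts", "Drawing", "Installation", "Design", "Fashion", "Craft",
]


def artist_list_url(page_number):
    """Build the search-results URL for one page (hypothetical helper)."""
    params = [
        ("page", page_number),
        ("neighborhoods", ""),
        ("media", ",".join(MEDIA_FILTERS)),
        ("accessibility", ""),
        ("order_by", ""),
        ("keyword", ""),
    ]
    # urlencode quotes with quote_plus: " " -> "+", "," -> "%2C", "/" -> "%2F"
    return BASE + "?" + urlencode(params)


print(artist_list_url(1))
```

Because `urlencode` escapes commas, slashes, and spaces the same way as the original string, the output for page 1 should match the script’s URL character for character.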
Now I have a text file with a simple list of artists and pictures of their work. I actually forgot to add the surrounding “[” and “]”, but whatever, you get the gist of it. Hooray!
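Since each record is written as `json.dumps(data) + ", "`, the file is a run of comma-separated JSON objects with a trailing comma and no brackets. A minimal sketch of loading it anyway, assuming the file was produced only by the script above (`load_artists` is a hypothetical helper), strips the dangling comma and wraps the rest in brackets before parsing:

```python
import json


def load_artists(path="gobrooklynart.txt"):
    """Parse the comma-separated JSON objects the scraper appended to `path`."""
    with open(path) as f:
        raw = f.read().strip()
    # Every record ends with ", ", so drop the final comma, then add the
    # missing [ ] so the whole thing becomes one JSON array.
    raw = raw.rstrip(",")
    return json.loads("[" + raw + "]")
```

Flattening the `pictures` lists from the returned records then gives one list of image URLs to feed into the infinite scroll.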
alexkehayias posted this
