Alex Kehayias's Blog


Another reason why I love Python

I didn’t get a chance to see all of the amazing galleries in Brooklyn participating in #gobrooklynart this weekend. Browsing the site’s list of artists turned up a lot of great pictures of their work. Naturally, I wrote a script to parse all of the artist pages and grab links to over 5000 pictures of awesomeness. I’m going to put the links I grabbed in a simple infinite scroll page so I can digest the whole thing over this week.

Here’s the quick and dirty script I wrote in about 20 minutes:

# Get all the artists and all the links from Go! Brooklyn Art
# and put them in a text file for future use

from bs4 import BeautifulSoup
import requests
import json

# Listing pages 1 through 114
pages = range(1, 115)

# Append mode: re-running the script adds to the existing file
storage = open("gobrooklynart.txt", 'a')
for page_number in pages:
    url = "https://www.gobrooklynart.org/explore/artists?page=" + str(page_number) + "&neighborhoods=&media=Painting%2CPhotography%2CPerformance%2CVideo%2FFilm%2FSound%2CSculpture%2CPrint+Making%2FBook+Arts%2CIllustration%2CMixed+Media%2CTextile+Arts%2CDrawing%2CInstallation%2CDesign%2CFashion%2CCraft&accessibility=&order_by=&keyword="
    site = requests.get(url)
    html = BeautifulSoup(site.content, "html.parser")
    # De-duplicate the links found in the search results
    artists = list(set(html.find(id="search_results").find_all('a')))
    for artist in artists:
        artist_url = "https://www.gobrooklynart.org" + artist['href']
        artist_page = requests.get(artist_url).content
        artist_html = BeautifulSoup(artist_page, "html.parser")
        data = {}
        data['name'] = artist_html.find('h2', {'class':'display-name'}).text
        images = artist_html.find(id='profile-photos').find_all('img')
        # Swap the thumbnail URLs for the standard-size versions
        data['pictures'] = [i['src'].replace("thumb", "standard") for i in images]
        try:
            data['homepage'] = artist_html.find(id='studio-website').find_all('a')[0]['href']
        except Exception as e:
            # Not every artist lists a website
            print(e)
            data['homepage'] = None
        print("Got artist %s" % data['name'])
        storage.write(json.dumps(data) + ", ")
    print("On to the next page.")
storage.close()
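
The long hand-encoded query string in the script can also be generated instead of pasted. Here is a minimal sketch, assuming Python 3; the names BASE_URL, MEDIA and build_page_url are made up for illustration, and the parameter names and values are simply the ones visible in the URL above:

```python
from urllib.parse import urlencode

# Hypothetical names; the values are copied from the URL in the script
BASE_URL = "https://www.gobrooklynart.org/explore/artists"
MEDIA = [
    "Painting", "Photography", "Performance", "Video/Film/Sound",
    "Sculpture", "Print Making/Book Arts", "Illustration", "Mixed Media",
    "Textile Arts", "Drawing", "Installation", "Design", "Fashion", "Craft",
]

def build_page_url(page_number):
    """Build the listing URL for one page of artist search results."""
    # A list of pairs keeps the parameters in the same order as the original URL
    params = [
        ("page", page_number),
        ("neighborhoods", ""),
        ("media", ",".join(MEDIA)),
        ("accessibility", ""),
        ("order_by", ""),
        ("keyword", ""),
    ]
    # urlencode escapes "," as %2C, "/" as %2F and spaces as "+"
    return BASE_URL + "?" + urlencode(params)
```

Calling build_page_url(page_number) inside the loop would then stand in for the string concatenation, and dropping a medium from the search becomes a one-word change to the list.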

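Now I have a text file with a simple list of artists and links to pictures of their work. I actually forgot to wrap the output in “[” and “]” (and every entry, including the last, is followed by “, ”), so the file isn’t valid JSON as written, but whatever, you get the gist of it. Hooray!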
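
Because the brackets are missing and every entry ends with “, ”, the file can't be passed straight to json.loads. Here is a minimal sketch of reading it back anyway, assuming Python 3 and that the file contains only what the script above wrote; parse_artists and load_artists are hypothetical helper names:

```python
import json

def parse_artists(raw):
    """Turn the raw dump ('{...}, {...}, ') into a list of dicts."""
    # Drop surrounding whitespace and the trailing comma, then add the brackets
    trimmed = raw.strip().rstrip(",")
    return json.loads("[" + trimmed + "]")

def load_artists(path="gobrooklynart.txt"):
    """Read the text file written by the scraper and parse it."""
    with open(path) as f:
        return parse_artists(f.read())
```

Each item in the returned list should have the same 'name', 'pictures' and 'homepage' keys the script wrote, which is about all an infinite scroll page would need.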

    • #python
  • 8 months ago
