Digital Cut-Ups and Data Structures in Python

Feb 28, 2019 01:30 · 385 words · 2 minute read rwet

Link to Jupyter Notebook.

The prompt for this week’s assignment was to create a program/notebook that reads in text from two or more sources, stores this text in data structures with whatever creative manipulations we choose, and programmatically arranges these structures into something resembling a poem.

The two sources I chose were several sections of the Wikipedia entry for “Emu” and “Composition as Explanation,” an essay of sorts by Gertrude Stein.

I read in the Wikipedia sections using the wikipedia module. For the Stein, I fetched the page with the requests library and parsed it with BeautifulSoup’s HTML parser.

The Wikipedia text underwent a decent amount of transformation. I used regex to split three sections (History, Behavior and ecology, and Diet) along various delimiters. Each section then went through its own set of transformations: removing all sentences containing digits, removing all words shorter than 10 characters, and filtering out stopwords and duplicates. Along the way, I used list comprehensions to fit everything into lists.
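The notebook's own code isn't reproduced in this post, so what follows is only a rough sketch of the kind of splitting and filtering described above. The sample text, the delimiter pattern, the tiny stopword set, and all function names are assumptions made for illustration, not what the notebook actually uses.

```python
import re

# Hypothetical stand-in for one Wikipedia section; the real text
# came from the wikipedia module.
SAMPLE = (
    "Emus were first reported in 1696. The birds are opportunistic, "
    "nomadic foragers; they eat insects and the seeds of many plants. "
    "Emus swallow small stones to help grind food."
)

# A tiny illustrative stopword set (assumption), not a real stopword list.
STOPWORDS = {"the", "and", "of", "to", "are", "they", "were", "in", "many"}


def split_section(text, pattern=r"[.;,]\s*"):
    """Split text on a regex of delimiters, dropping empty pieces."""
    return [piece.strip() for piece in re.split(pattern, text) if piece.strip()]


def drop_pieces_with_digits(pieces):
    """Keep only the pieces that contain no digits."""
    return [p for p in pieces if not re.search(r"\d", p)]


def long_words(pieces, min_length=10):
    """Collect every word of at least min_length characters."""
    return [
        word
        for piece in pieces
        for word in re.findall(r"[A-Za-z]+", piece)
        if len(word) >= min_length
    ]


def content_words(pieces, stopwords=STOPWORDS):
    """Lowercase the words, drop stopwords, and remove duplicates in order."""
    words = [
        word
        for piece in pieces
        for word in re.findall(r"[a-z]+", piece.lower())
        if word not in stopwords
    ]
    # dict.fromkeys keeps the first occurrence of each word, in order.
    return list(dict.fromkeys(words))


pieces = split_section(SAMPLE)
print(drop_pieces_with_digits(pieces))
print(long_words(pieces))
print(content_words(pieces))
```

Each helper returns a plain list, so the results can be handed straight to whatever picks lines for the poem.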

I ran the Stein essay through a generator function, which helped me create a list of randomly sized sentence fragments (between 4 and 14 words). Not surprisingly, the output of this generator function was pretty excellent on its own.
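A generator along these lines could look like the sketch below. It is not the notebook's code: the function name, the `rng` parameter, and the placeholder text are assumptions, and the choice to let the final fragment run shorter than 4 words when the text runs out is mine.

```python
import random


def fragments(text, min_words=4, max_words=14, rng=None):
    """Yield consecutive chunks of text, each a random number of words long.

    The last chunk may be shorter than min_words if the text runs out.
    """
    rng = rng or random
    words = text.split()
    start = 0
    while start < len(words):
        size = rng.randint(min_words, max_words)  # inclusive on both ends
        yield " ".join(words[start:start + size])
        start += size


# Placeholder text (assumption); the notebook used the Stein essay here.
placeholder = " ".join(f"word{i}" for i in range(40))
stein_fragments = list(fragments(placeholder))
for fragment in stein_fragments:
    print(fragment)
```

Because the function is a generator, the fragments are produced lazily; wrapping the call in `list()` gives the list that later steps can sample from.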

After cleaning some things up, I printed out random selections from each of the above-mentioned lists, looping these print statements six times.
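As a sketch of that final step, assuming one random pick per list on each pass (the post doesn't say how many items were drawn each time), the assembly could be written like this. The function name and the sample lists are hypothetical.

```python
import random


def assemble_poem(sources, passes=6, rng=None):
    """Build a poem by taking one random item from each list, `passes` times."""
    rng = rng or random
    lines = []
    for _ in range(passes):
        for items in sources:
            lines.append(rng.choice(items))
    return "\n".join(lines)


# Hypothetical stand-ins for the lists built from the two sources.
emu_words = ["opportunistic", "nomadic foragers", "small stones"]
essay_fragments = ["a placeholder fragment of text", "another placeholder fragment"]

print(assemble_poem([emu_words, essay_fragments]))
```

Returning a string instead of printing inside the loop makes it easy to reroll the poem or save a version worth keeping.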

Plenty of room for improvement, but all in all, I’m a lot more pleased with this output than I was with that of my last attempt.

Here’s an example:


[Image: poem]