Intro
Start by importing pandas into the Python environment.
Click on the next 'cell' and hit Shift+Enter to execute the code within.
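The import cell is just the conventional one-liner:

```python
# pandas comes pre-installed on Colab; bring it in under its usual alias
import pandas as pd
```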
Now import the Excel document using the library you just imported and sneak a peek at its contents.
You will need to drag and drop the file into the virtual directory (left-hand side -> folder icon). This drag-and-drop feature is only available once any code cell has been run.
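The load step might look like this sketch. `xport3.xlsx` is the filename that shows up in the directory listing further down; swap in your own upload's name if it differs.

```python
import pandas as pd

# Read the uploaded spreadsheet into a DataFrame and peek at the first rows.
# 'xport3.xlsx' is this notebook's upload; substitute your own filename.
def load_and_peek(path='xport3.xlsx'):
    df = pd.read_excel(path)
    print(df.head())
    return df

# In the notebook: df = load_and_peek()
```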
| | Refuge | Region | Comment |
|---|---|---|---|
| 0 | Bald Knob National Wildlife Refuge | IR1 | Condition of the road system from when it was ... |
| 1 | Bald Knob National Wildlife Refuge | IR1 | One complaint – can't ride a 4-wheeler in to f... |
| 2 | Bald Knob National Wildlife Refuge | IR1 | Not very many handicap trails. |
| 3 | Bald Knob National Wildlife Refuge | IR1 | Would like to see more area cleared on the roa... |
| 4 | Bald Knob National Wildlife Refuge | IR1 | Many of the information signs, especially the ... |
Here's a whole bunch of things to import all at once.
If comments aren't inlined, they may be explained later or be irrelevant.
How many records and columns were in that dataset, again?
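`.shape` answers that in one line. A one-row stand-in frame is used here so the snippet runs on its own; the real dataset is much bigger.

```python
import pandas as pd

# Stand-in frame with the dataset's three columns (the real file has 2286 rows)
df = pd.DataFrame({'Refuge': ['Bald Knob National Wildlife Refuge'],
                   'Region': ['IR1'],
                   'Comment': ['Not very many handicap trails.']})

df.shape  # (rows, columns)
```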
(2286, 3)

Oh. Right...
I guess we only really need the Refuge/Comment pairs.
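Selecting a column subset is just indexing with a list of column names (sample rows below are a stand-in for the loaded frame):

```python
import pandas as pd

df = pd.DataFrame({'Refuge': ['Bald Knob National Wildlife Refuge'] * 2,
                   'Region': ['IR1'] * 2,
                   'Comment': ['Not very many handicap trails.',
                               'Would like to see more area cleared']})

# Keep just the two columns we care about, Comment first to match the output
df = df[['Comment', 'Refuge']]
```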
| | Comment | Refuge |
|---|---|---|
| 0 | Condition of the road system from when it was ... | Bald Knob National Wildlife Refuge |
| 1 | One complaint – can't ride a 4-wheeler in to f... | Bald Knob National Wildlife Refuge |
| 2 | Not very many handicap trails. | Bald Knob National Wildlife Refuge |
| 3 | Would like to see more area cleared on the roa... | Bald Knob National Wildlife Refuge |
| 4 | Many of the information signs, especially the ... | Bald Knob National Wildlife Refuge |
Nice! What Refuges are there?
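`unique()` lists the distinct refuge names (the second refuge name below is purely illustrative, not from the dataset):

```python
import pandas as pd

df = pd.DataFrame({'Comment': ['a', 'b', 'c'],
                   'Refuge': ['Bald Knob National Wildlife Refuge',
                              'Bald Knob National Wildlife Refuge',
                              'Some Other National Wildlife Refuge']})

# Distinct refuge names; value_counts() would also show comments per refuge
df['Refuge'].unique()
```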
Word Analysis
Let's take those comments and look into them a bit. Start by importing what's needed for this section.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data] Unzipping corpora/stopwords.zip.

This next block will clean up the data; descriptions are given above most lines of code.
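The cleaning step boils down to lowercasing, stripping punctuation, and dropping stopwords. The notebook uses NLTK's full English stopword list; a small inline list stands in for it here so the sketch is self-contained.

```python
import re

# A few English stopwords as a stand-in for NLTK's full list
STOPWORDS = {'the', 'a', 'an', 'to', 'of', 'in', 'and', 'is', 'it',
             'on', 'for', 'was', 'i', 'we', 'not', 'very', 'many'}

def clean(comment):
    # Lowercase, keep only alphabetic runs, then drop stopwords
    words = re.findall(r'[a-z]+', comment.lower())
    return [w for w in words if w not in STOPWORDS]

clean("Not very many handicap trails.")  # -> ['handicap', 'trails']
```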
Now that we have our clean text, let's see those word counts!
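Counting is a job for `collections.Counter`; `most_common()` already returns the pairs sorted, ready to drop into a DataFrame (the token list below is a tiny stand-in for the cleaned comments):

```python
from collections import Counter
import pandas as pd

tokens = ['refuge', 'road', 'refuge', 'parking', 'road', 'refuge']

# Tally the tokens and build a frame shaped like the output below
counts = Counter(tokens)
word_counts = pd.DataFrame(counts.most_common(), columns=['words', 'count'])
print(word_counts)
```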
| | words | count |
|---|---|---|
| 0 | refuge | 684 |
| 1 | road | 551 |
| 2 | parking | 399 |
| 3 | would | 383 |
| 4 | roads | 354 |
Awesome. Let's save it to a CSV.
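`to_csv` handles the export; `NaomiKeywordCount.csv` is the filename that appears in the directory listing later on (the two sample rows stand in for the full table):

```python
import pandas as pd

word_counts = pd.DataFrame({'words': ['refuge', 'road'], 'count': [684, 551]})

# index=False keeps the 0..n row labels out of the file
word_counts.to_csv('NaomiKeywordCount.csv', index=False)
```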
And plot these word counts horizontally as well.
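A horizontal bar chart is `barh` plus an inverted y-axis so the biggest bar sits on top (output filename is an assumption; in Colab you could just let the figure display inline):

```python
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # headless-safe; unnecessary inside Colab
import matplotlib.pyplot as plt

word_counts = pd.DataFrame({'words': ['refuge', 'road', 'parking'],
                            'count': [684, 551, 399]})

fig, ax = plt.subplots()
ax.barh(word_counts['words'], word_counts['count'])
ax.invert_yaxis()          # largest count on top
ax.set_xlabel('count')
fig.savefig('word_counts.png', bbox_inches='tight')
```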
Simple Word Cloud
We found this function online. Missing attribution.
It will draw a wordcloud for you if you give it data and color specifics.
This next function will display our wordcloud
Let's make two.
Start by downloading the Fish and Wildlife logo.
This function will list content in your current directory. You should see the jpg here. Take note.
8170-clarified-interpretation-could-change-u-s-fish-and-wildlife-policy.jpg  NaomiKeywordCount.csv  sample_data/  xport3.xlsx

Now 'open' that image into a variable, then convert the picture into an array of arrays of numbers, where each sub-array represents one RGB pixel of the picture and holds its R, G, and B values.
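With Pillow and NumPy that conversion is two calls. A tiny in-memory image stands in for the downloaded logo so the sketch runs anywhere:

```python
import numpy as np
from PIL import Image

# In the notebook you'd open the downloaded logo instead:
# img = Image.open('8170-clarified-interpretation-could-change-u-s-fish-and-wildlife-policy.jpg')
img = Image.new('RGB', (4, 2), color=(255, 0, 0))

# Each pixel becomes an [R, G, B] triple in a height x width x 3 array
mask = np.array(img)
print(mask.shape)  # (2, 4, 3): rows, columns, RGB channels
```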
You'll now need to retrieve the 'Poppins' font face from Google Fonts (if that's the font face you want to use). Be sure to upload it the same way you did the Excel sheet.
If you uploaded the raw file, you can unzip it using these two terminal commands.
Then store the font face in a variable as well
Very pretty. Now let's save 'em.
Bigram!
And this bit gets the most common bigrams
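For word-level bigrams, zipping the token list against itself shifted by one does the trick. Note that if you zip a raw string instead of a token list, you get letter pairs like ('t', 'h'), which is exactly what the table below shows.

```python
from collections import Counter

tokens = "the refuge road near the refuge road".split()

# Pair each token with its successor, then count the pairs
bigrams = Counter(zip(tokens, tokens[1:]))
print(bigrams.most_common(2))
```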
Let's take a peek.
| | bigram | count |
|---|---|---|
| 0 | (t, h) | 5052 |
| 1 | (h, e) | 4899 |
| 2 | (r, e) | 4216 |
| 3 | (i, n) | 4033 |
| 4 | (e, r) | 3537 |
| ... | ... | ... |
| 1208 | ((, C) | 1 |
| 1209 | (U, -) | 1 |
| 1210 | (9, 9) | 1 |
| 1211 | (i, u) | 1 |
| 1212 | (', n) | 1 |

1213 rows × 2 columns
Network Graph
We can save those bigrams as a CSV, but it'd also be very nice to see how these bigrams relate to each other. We didn't actually get to finish this, so it'd be best to skip over this entire section.
NLP Sentiment Analysis
And this is where we can run sentiment analysis
This line will install the library
This one will import it and create a utility variable that we can use later.
This is the function that will create the predictions and return any label with a greater than 50% probability of being applicable to the comment.
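The model call itself isn't shown in this export, so here's a sketch of just the thresholding logic, with a toy `score_fn` standing in for the real classifier. The label list is the one printed in the output below.

```python
LABELS = ['Road', 'Trail', 'Parking', 'Boat', 'Access', 'Sign',
          'Safety', 'Maintain']

def predict_labels(comment, score_fn, threshold=0.5):
    """Return every label whose probability for this comment exceeds the
    threshold. score_fn(comment, label) -> float is a placeholder for the
    real model call."""
    return [label for label in LABELS
            if score_fn(comment, label) > threshold]

# Toy scorer: pretends a comment scores high on any label it mentions
demo = lambda text, label: 0.9 if label.lower() in text.lower() else 0.1
predict_labels('The road to the parking lot is rough', demo)
```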
We call the function here and store the results into a csv for each Refuge.
['Road', 'Trail', 'Parking', 'Boat', 'Access', 'Sign', 'Safety', 'Maintain']
0
1
2

And that's the end! Everything below is scrap notes and tests.
Split pos neg, refuge, theme
themes: We have a list but we need to flesh it out. We can do this by creating a wordcloud of the comments.
Pos Neg
Create General/Pos/Neg WordCloud
Group by Theme
Word analysis (Unique Counts/ NGrams Counts)
Group by Refuge - Find Theme