Intro
Start by importing pandas into the Python environment.
Click on the next 'cell' and hit Shift+Enter to execute the code within.
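The import cell is just the conventional one-liner:

```python
# pandas comes pre-installed on Colab; bring it in under its usual alias
import pandas as pd
```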
Now import the Excel document using the library you just imported and sneak a peek at its contents.
You will need to drag and drop the file into the virtual directory (left-hand side -> folder icon). This drag-and-drop feature is only available once any code cell has been run.
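The load step might look like this sketch. `xport3.xlsx` is the filename that shows up in the directory listing further down; swap in your own upload's name if it differs.

```python
import pandas as pd

# Read the uploaded spreadsheet into a DataFrame and peek at the first rows.
# 'xport3.xlsx' is this notebook's upload; substitute your own filename.
def load_and_peek(path='xport3.xlsx'):
    df = pd.read_excel(path)
    print(df.head())
    return df

# In the notebook: df = load_and_peek()
```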
| | Refuge | Region | Comment |
|---|---|---|---|
| 0 | Bald Knob National Wildlife Refuge | IR1 | Condition of the road system from when it was ... |
| 1 | Bald Knob National Wildlife Refuge | IR1 | One complaint – can't ride a 4-wheeler in to f... |
| 2 | Bald Knob National Wildlife Refuge | IR1 | Not very many handicap trails. |
| 3 | Bald Knob National Wildlife Refuge | IR1 | Would like to see more area cleared on the roa... |
| 4 | Bald Knob National Wildlife Refuge | IR1 | Many of the information signs, especially the ... |
Here's a whole bunch of things to import all at once.
If comments aren't inlined, they may be explained later or be irrelevant.
How many records and columns were in that dataset, again?
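`.shape` answers that in one line. A one-row stand-in frame is used here so the snippet runs on its own; the real dataset is much bigger.

```python
import pandas as pd

# Stand-in frame with the dataset's three columns (the real file has 2286 rows)
df = pd.DataFrame({'Refuge': ['Bald Knob National Wildlife Refuge'],
                   'Region': ['IR1'],
                   'Comment': ['Not very many handicap trails.']})

df.shape  # (rows, columns)
```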
(2286, 3)

Oh. Right...
I guess we only really need the Refuge/Comment pairs.
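Selecting a column subset is just indexing with a list of column names (sample rows below are a stand-in for the loaded frame):

```python
import pandas as pd

df = pd.DataFrame({'Refuge': ['Bald Knob National Wildlife Refuge'] * 2,
                   'Region': ['IR1'] * 2,
                   'Comment': ['Not very many handicap trails.',
                               'Would like to see more area cleared']})

# Keep just the two columns we care about, Comment first to match the output
df = df[['Comment', 'Refuge']]
```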
| | Comment | Refuge |
|---|---|---|
| 0 | Condition of the road system from when it was ... | Bald Knob National Wildlife Refuge |
| 1 | One complaint – can't ride a 4-wheeler in to f... | Bald Knob National Wildlife Refuge |
| 2 | Not very many handicap trails. | Bald Knob National Wildlife Refuge |
| 3 | Would like to see more area cleared on the roa... | Bald Knob National Wildlife Refuge |
| 4 | Many of the information signs, especially the ... | Bald Knob National Wildlife Refuge |
Nice! What Refuges are there?
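`unique()` lists the distinct refuge names (the second refuge name below is purely illustrative, not from the dataset):

```python
import pandas as pd

df = pd.DataFrame({'Comment': ['a', 'b', 'c'],
                   'Refuge': ['Bald Knob National Wildlife Refuge',
                              'Bald Knob National Wildlife Refuge',
                              'Some Other National Wildlife Refuge']})

# Distinct refuge names; value_counts() would also show comments per refuge
df['Refuge'].unique()
```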
Word Analysis
Let's take those comments and look into them a bit. Start by importing what's needed for this section.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data] Unzipping corpora/stopwords.zip.

This next block will clean up the data; descriptions are given above most lines of code.
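The cleaning step boils down to lowercasing, stripping punctuation, and dropping stopwords. The notebook uses NLTK's full English stopword list; a small inline list stands in for it here so the sketch is self-contained.

```python
import re

# A few English stopwords as a stand-in for NLTK's full list
STOPWORDS = {'the', 'a', 'an', 'to', 'of', 'in', 'and', 'is', 'it',
             'on', 'for', 'was', 'i', 'we', 'not', 'very', 'many'}

def clean(comment):
    # Lowercase, keep only alphabetic runs, then drop stopwords
    words = re.findall(r'[a-z]+', comment.lower())
    return [w for w in words if w not in STOPWORDS]

clean("Not very many handicap trails.")  # -> ['handicap', 'trails']
```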
Now that we have our clean text, let's see those word counts!
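Counting is a job for `collections.Counter`; `most_common()` already returns the pairs sorted, ready to drop into a DataFrame (the token list below is a tiny stand-in for the cleaned comments):

```python
from collections import Counter
import pandas as pd

tokens = ['refuge', 'road', 'refuge', 'parking', 'road', 'refuge']

# Tally the tokens and build a frame shaped like the output below
counts = Counter(tokens)
word_counts = pd.DataFrame(counts.most_common(), columns=['words', 'count'])
print(word_counts)
```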
| | words | count |
|---|---|---|
| 0 | refuge | 684 |
| 1 | road | 551 |
| 2 | parking | 399 |
| 3 | would | 383 |
| 4 | roads | 354 |
Awesome. Let's save it to a CSV.
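`to_csv` handles the export; `NaomiKeywordCount.csv` is the filename that appears in the directory listing later on (the two sample rows stand in for the full table):

```python
import pandas as pd

word_counts = pd.DataFrame({'words': ['refuge', 'road'], 'count': [684, 551]})

# index=False keeps the 0..n row labels out of the file
word_counts.to_csv('NaomiKeywordCount.csv', index=False)
```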
And plot these word counts horizontally as well.
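A horizontal bar chart is `barh` plus an inverted y-axis so the biggest bar sits on top (output filename is an assumption; in Colab you could just let the figure display inline):

```python
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # headless-safe; unnecessary inside Colab
import matplotlib.pyplot as plt

word_counts = pd.DataFrame({'words': ['refuge', 'road', 'parking'],
                            'count': [684, 551, 399]})

fig, ax = plt.subplots()
ax.barh(word_counts['words'], word_counts['count'])
ax.invert_yaxis()          # largest count on top
ax.set_xlabel('count')
fig.savefig('word_counts.png', bbox_inches='tight')
```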
Simple Word Cloud
We found this function online. Missing attribution.
It will draw a wordcloud for you if you give it data and color specifics.
This next function will display our wordcloud
Let's make two.
Start by downloading the Fish and Wildlife logo.
This function will list content in your current directory. You should see the jpg here. Take note.
8170-clarified-interpretation-could-change-u-s-fish-and-wildlife-policy.jpg  NaomiKeywordCount.csv  sample_data/  xport3.xlsx

Now 'open' that image into a variable, then convert the picture into an array of arrays of numbers, where each sub-array represents one RGB pixel of the picture and holds its R, G, and B values.
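With Pillow and NumPy that conversion is two calls. A tiny in-memory image stands in for the downloaded logo so the sketch runs anywhere:

```python
import numpy as np
from PIL import Image

# In the notebook you'd open the downloaded logo instead:
# img = Image.open('8170-clarified-interpretation-could-change-u-s-fish-and-wildlife-policy.jpg')
img = Image.new('RGB', (4, 2), color=(255, 0, 0))

# Each pixel becomes an [R, G, B] triple in a height x width x 3 array
mask = np.array(img)
print(mask.shape)  # (2, 4, 3): rows, columns, RGB channels
```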
You'll now need to retrieve the 'Poppins' font face from Google Fonts (if that's the font face you want to use). Be sure to upload it the same way you did the Excel sheet.
If you uploaded the raw file, you can unzip it using these two terminal commands.
Then store the font face in a variable as well
Very pretty. Now let's save 'em.
Bigram!
And this bit gets the most common bigrams
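For word-level bigrams, zipping the token list against itself shifted by one does the trick. Note that if you zip a raw string instead of a token list, you get letter pairs like ('t', 'h'), which is exactly what the table below shows.

```python
from collections import Counter

tokens = "the refuge road near the refuge road".split()

# Pair each token with its successor, then count the pairs
bigrams = Counter(zip(tokens, tokens[1:]))
print(bigrams.most_common(2))
```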
Let's take a peek.
| | bigram | count |
|---|---|---|
| 0 | (t, h) | 5052 |
| 1 | (h, e) | 4899 |
| 2 | (r, e) | 4216 |
| 3 | (i, n) | 4033 |
| 4 | (e, r) | 3537 |
| ... | ... | ... |
| 1208 | ((, C) | 1 |
| 1209 | (U, -) | 1 |
| 1210 | (9, 9) | 1 |
| 1211 | (i, u) | 1 |
| 1212 | (', n) | 1 |

1213 rows × 2 columns
Network Graph
We can save those bigrams as a CSV, but it'd also be very nice to see how these bigrams relate to each other. We didn't actually get to finish this, so it'd be best to skip over this entire section.
NLP Sentiment Analysis
And this is where we can run sentiment analysis
This line will install the library
This one will import it and create a utility variable that we can use later.
This is the function that will create the predictions and return any label with a greater than 50% probability of being applicable to the comment.
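The model call itself isn't shown in this export, so here's a sketch of just the thresholding logic, with a toy `score_fn` standing in for the real classifier. The label list is the one printed in the output below.

```python
LABELS = ['Road', 'Trail', 'Parking', 'Boat', 'Access', 'Sign',
          'Safety', 'Maintain']

def predict_labels(comment, score_fn, threshold=0.5):
    """Return every label whose probability for this comment exceeds the
    threshold. score_fn(comment, label) -> float is a placeholder for the
    real model call."""
    return [label for label in LABELS
            if score_fn(comment, label) > threshold]

# Toy scorer: pretends a comment scores high on any label it mentions
demo = lambda text, label: 0.9 if label.lower() in text.lower() else 0.1
predict_labels('The road to the parking lot is rough', demo)
```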
We call the function here and store the results into a csv for each Refuge.
['Road', 'Trail', 'Parking', 'Boat', 'Access', 'Sign', 'Safety', 'Maintain']
0
1
2

And that's the end! Everything below is scrap notes and tests.
Split pos neg, refuge, theme
themes: We have a list but we need to flesh it out. We can do this by creating a wordcloud of the comments.
Pos Neg
Create General/Pos/Neg WordCloud
Group by Theme
Word analysis (Unique Counts/ NGrams Counts)
Group by Refuge - Find Theme