Use Employee Information.csv to do the following:

  1. Print out the first 5 rows to understand what this file contains.

  2. Sort the rows by the Last Name , alphabetically. What is the first “Last Name” recorded?

  3. Calculate the ratio of women to men, to 2 decimal places. ratio = (no. of women) / (no. of men)

  4. Calculate the fraction of women to men. Fraction = (no. of women) / (no. of men)

  5. Calculate the mean Salary of female participants, and male participants separately. Which gender makes more?

  6. Calculate the number of employees from each country, and put this information into a dictionary: {“country1”: number}. What are the top three countries of origin with the greatest number of employees?

  7. Using string interpolation, print out the following sentences using the information you found above.

FINAL PROJECT 1 Our survey revealed that _ make _ dollars more than . Most employees originated from _, __, and __.

Source: Singapore University of Technology and Design (SUTD)

In [1]:
import pandas as pd #Dataset lib
import numpy as np
In [2]:
dataFrame = pd.read_csv("Day 3\Resources\Employee Information.csv")
dataFrame #Original Data
Out[2]:
First Name Last Name Email Gender Nationality Salary Age
0 Cinderella Bosch cbosch0@blinklist.com Female China 14028 26
1 Maud Kix mkix1@e-recht24.de Female Mongolia 13796 28
2 Prudence Ghidini pghidini2@so-net.ne.jp Female Uzbekistan 18025 37
3 Catlaina Dumpleton cdumpleton3@ox.ac.uk Female Chile 14646 34
4 Rhoda Corwood rcorwood4@sakura.ne.jp Female Comoros 7437 31
5 Henry Montgomery hmontgomery5@php.net Male Argentina 6534 34
6 Carrie Gane cgane6@earthlink.net Female Nepal 8057 39
7 Fawnia Larsen flarsen7@netscape.com Female Malaysia 4881 34
8 Alis Howel ahowel8@typepad.com Female Colombia 19369 42
9 Marianne Woolfitt mwoolfitt9@earthlink.net Female Philippines 10313 36
10 Brenna Marchand bmarchanda@usnews.com Female Japan 10555 54
11 Alf Savaage asavaageb@mediafire.com Male Luxembourg 16786 36
12 Maurise Guillart mguillartc@archive.org Male Indonesia 19706 56
13 Demott Gumm dgummd@economist.com Male Czech Republic 10235 49
14 Dael Bahia dbahiae@blog.com Female Brazil 3921 50
15 Camila Lilleyman clilleymanf@accuweather.com Female Philippines 6692 34
16 Lanny Hartzog lhartzogg@marriott.com Male China 6252 40
17 Ag Tingle atingleh@dell.com Female China 8308 45
18 Roberto Bolte rboltei@yellowbook.com Male Argentina 16249 47
19 Horst Castellucci hcastelluccij@ycombinator.com Male United States 3300 25
20 Denis Phillimore dphillimorek@theglobeandmail.com Male Brazil 2835 60
21 Mariellen Bispham mbisphaml@webnode.com Female Sweden 14105 42
22 Dory Kittle dkittlem@paypal.com Female Uruguay 9077 40
23 Jessi Lieb jliebn@marriott.com Female South Korea 6695 53
24 Jeniece Gaitung jgaitungo@huffingtonpost.com Female Thailand 6320 31
25 Abbie Weathey aweatheyp@eventbrite.com Female Indonesia 13646 34
26 Sibel Dana sdanaq@intel.com Female Indonesia 5574 48
27 Hank Erswell herswellr@mlb.com Male Russia 4183 30
28 Gwendolen Arendsen garendsens@barnesandnoble.com Female Indonesia 11472 58
29 Bette-ann Riply briplyt@mtv.com Female China 10976 50
... ... ... ... ... ... ... ...
421 Lauralee Ancketill lancketillbp@liveinternet.ru Female Japan 13860 31
422 Chantal Gilfether cgilfetherbq@bandcamp.com Female Colombia 11567 33
423 Mendel Brookson mbrooksonbr@ovh.net Male Central African Republic 14157 27
424 Ransom Furst rfurstbs@yahoo.co.jp Male Poland 8519 25
425 Bellanca Mowen bmowenbt@shinystat.com Female Indonesia 17264 60
426 Morgen Chalkly mchalklybu@bigcartel.com Male Peru 13483 35
427 Mariann Mussared mmussaredbv@com.com Female China 15251 57
428 Sampson Curteis scurteisbw@japanpost.jp Male Mauritius 12849 60
429 Arielle Lauret alauretbx@meetup.com Female Philippines 4109 37
430 Deloria Coaster dcoasterby@rakuten.co.jp Female Canada 7433 56
431 Algernon Sutehall asutehallbz@behance.net Male Poland 2904 29
432 Chiquia Tallboy ctallboyc0@privacy.gov.au Female China 4327 58
433 Joshua Cardoe jcardoec1@gizmodo.com Male Poland 2963 55
434 Nat Montfort nmontfortc2@springer.com Male Philippines 18238 56
435 Val Scolts vscoltsc3@loc.gov Female Russia 15118 48
436 Simon Delwater sdelwaterc4@nbcnews.com Male Portugal 16110 49
437 Romona Evershed revershedc5@bbb.org Female Philippines 16060 37
438 Christy Gyves cgyvesc6@blogs.com Female Russia 17466 52
439 Mitchell Godbolt mgodboltc7@domainmarket.com Male Czech Republic 16614 46
440 Viviene O'Sesnane vosesnanec8@time.com Female China 16449 38
441 Esther Champkin echampkinc9@hostgator.com Female Philippines 18387 42
442 Felix McPeck fmcpeckca@howstuffworks.com Male Indonesia 15565 35
443 Carmina Dally cdallycb@deliciousdays.com Female France 10795 51
444 Maurice Hedon mhedoncc@blog.com Male Monaco 9637 55
445 Erskine Elvin eelvincd@odnoklassniki.ru Male United States 12872 48
446 Collen Inman cinmance@odnoklassniki.ru Female China 12012 29
447 Roman Ralston rralstoncf@home.pl Male Philippines 13501 34
448 Jan Satterthwaite jsatterthwaitecg@seattletimes.com Male Brazil 17983 32
449 Christyna Widdocks cwiddocksch@dedecms.com Female South Africa 8207 36
450 Osbourne Heyworth oheyworthci@example.com Male Mozambique 13593 54

451 rows × 7 columns

(1) Print out the first 5 rows to understand what this file contains.

In [3]:
dataFrame_1st5Rows = dataFrame[1:6]
dataFrame_1st5Rows
Out[3]:
First Name Last Name Email Gender Nationality Salary Age
1 Maud Kix mkix1@e-recht24.de Female Mongolia 13796 28
2 Prudence Ghidini pghidini2@so-net.ne.jp Female Uzbekistan 18025 37
3 Catlaina Dumpleton cdumpleton3@ox.ac.uk Female Chile 14646 34
4 Rhoda Corwood rcorwood4@sakura.ne.jp Female Comoros 7437 31
5 Henry Montgomery hmontgomery5@php.net Male Argentina 6534 34

(2) Sort the rows by the Last Name , alphabetically. What is the first “Last Name” recorded?

In [4]:
df_sortLastName = dataFrame.sort_values(by=['Last Name']).reset_index()
df_sortLastName = df_sortLastName.drop(columns=['index'])
df_sortLastName #Sorted by the 'Last Name' and re-indexed
Out[4]:
First Name Last Name Email Gender Nationality Salary Age
0 Tarra Accomb taccomb1w@businessweek.com Female Philippines 11388 59
1 Avivah Ackenson aackenson7a@behance.net Female Yemen 8521 57
2 Zonnya Ackerman zackerman26@cornell.edu Female China 19528 26
3 Gaven Agneau gagneau99@amazon.de Male Indonesia 10538 46
4 Creight Aliberti caliberti83@about.com Male Philippines 5278 26
5 Tamiko Altoft taltoft5l@geocities.com Female Poland 16254 45
6 Lauralee Ancketill lancketillbp@liveinternet.ru Female Japan 13860 31
7 Avictor Anderbrugge aanderbrugge93@cocolog-nifty.com Male American Samoa 16240 40
8 Nels Andrag nandrag94@economist.com Male France 12499 60
9 Krissie Angrick kangrick1h@ucoz.ru Female China 16179 49
10 Alanah Antognoni aantognoni8b@sohu.com Female China 13483 39
11 Lotty Antoniazzi lantoniazzibm@adobe.com Female France 6597 43
12 Heath Aphale haphale8r@pinterest.com Female Portugal 16561 48
13 Gwendolen Arendsen garendsens@barnesandnoble.com Female Indonesia 11472 58
14 Phillida Arnault parnault5c@jigsy.com Female Russia 12156 50
15 King Asch kasch32@sogou.com Male Thailand 19626 53
16 Chaim Asee casee7g@nasa.gov Male Tanzania 4135 60
17 Gerda Aspole gaspole1n@shareasale.com Female Croatia 6784 57
18 Elonore Asprey easpreyaw@baidu.com Female Sweden 12727 59
19 Fair Astley fastleyb9@w3.org Male Spain 8272 40
20 Ancell Attryde aattrydeb1@boston.com Male Afghanistan 17256 60
21 Dael Bahia dbahiae@blog.com Female Brazil 3921 50
22 Carmina Ballaam cballaam4e@reddit.com Female Luxembourg 16538 33
23 Donall Balshaw dbalshaw4g@apache.org Male Costa Rica 18909 56
24 Bucky Bangiard bbangiard2n@1688.com Male Russia 17576 51
25 Peggy Bartosek pbartosekai@51.la Female Comoros 7794 55
26 Denys Bavidge dbavidge5e@china.com.cn Male China 8090 59
27 Janeczka Bednell jbednell4c@ucsd.edu Female Ukraine 9059 42
28 Gayelord Belin gbelin2r@cnn.com Male North Korea 12566 36
29 Ynez Bellenie ybellenie1j@hubpages.com Female China 11905 41
... ... ... ... ... ... ... ...
421 Tan Varrow tvarrow8x@ovh.net Male Germany 9272 41
422 Lanny Venney lvenney68@list-manage.com Male Russia 3100 50
423 Dex Vernay dvernayb6@uol.com.br Male Sweden 8238 34
424 Dicky Vicioso dvicioso96@elpais.com Male Tunisia 3897 47
425 Darcie Vondra dvondra56@fema.gov Female Czech Republic 14112 49
426 Salli Vowels svowels9x@freewebs.com Female United States 19264 52
427 Justinian Wanell jwanell21@deviantart.com Male Ukraine 13273 52
428 Susanna Warton swartonal@rakuten.co.jp Female Philippines 7080 38
429 Roosevelt Waterland rwaterlandu@drupal.org Male Costa Rica 10468 40
430 Brear Watsham bwatsham4m@biblegateway.com Female China 12141 52
431 Peter Weaben pweaben6l@t-online.de Male Mauritius 19646 57
432 Abbie Weathey aweatheyp@eventbrite.com Female Indonesia 13646 34
433 Carlita Weight cweight6y@google.com.br Female Indonesia 10226 34
434 Donnamarie Welbelove dwelbelove2s@marketwatch.com Female Armenia 18333 37
435 Sampson Whitby swhitby58@vistaprint.com Male China 15255 48
436 Christyna Widdocks cwiddocksch@dedecms.com Female South Africa 8207 36
437 Elianora Wilby ewilby2p@google.ru Female China 18498 44
438 Alli Wilmott awilmott7n@blog.com Female Colombia 18079 54
439 Papagena Winslett pwinslett36@woothemes.com Female South Korea 14587 42
440 Daryn Winterson dwinterson87@dyndns.org Female Azerbaijan 9604 32
441 Korella Wittey kwittey40@disqus.com Female Venezuela 11072 43
442 Marianne Woolfitt mwoolfitt9@earthlink.net Female Philippines 10313 36
443 Rex Woolmington rwoolmington9a@lulu.com Male Portugal 7966 45
444 Falkner Worthy fworthy6f@ow.ly Male Philippines 4790 44
445 Karoly Wrotchford kwrotchford6j@tripod.com Female China 2994 41
446 Abbey Wyman awyman9j@ezinearticles.com Male Philippines 9990 28
447 Electra Youle eyoule8l@slideshare.net Female Greece 6463 43
448 Marga Zelner mzelnerat@merriam-webster.com Female Philippines 19110 54
449 Denny Zeplin dzeplin6d@adobe.com Male Panama 17678 49
450 Adolpho todor atodor59@symantec.com Male Czech Republic 15729 58

451 rows × 7 columns

(3) Calculate the ratio of women to men, to 2 decimal places. ratio = (no. of women) / (no. of men)

In [5]:
sumG = dataFrame.groupby('Gender').size()
sumG
Out[5]:
Gender
Female    238
Male      213
dtype: int64
In [6]:
F = sumG.iloc[0]
M = sumG.iloc[1]
R = round(F/M,2)
print("Ratio of Women to Men:", R) #Answer to Question 3
Ratio of Women to Men: 1.12

(4) Calculate the fraction of women to men. Fraction = (no. of women) / (no. of men)

In [7]:
from fractions import Fraction
f_WM = Fraction(R).limit_denominator(5)
print("Fraction of Women to Men:",f_WM ) # Ans to Q4
Fraction of Women to Men: 6/5

(5) Calculate the mean Salary of female participants, and male participants separately. Which gender makes more?

In [8]:
# Average Salaries Female and Male participants tabled
AvgSalary = dataFrame.groupby('Gender').mean()
AvgSalary = AvgSalary.drop(columns=['Age']).astype(int)
AvgSalary
Out[8]:
Salary
Gender
Female 11798
Male 11539
In [9]:
# Re-set the variables for last question
F_name = AvgSalary.iloc[0].name
M_name = AvgSalary.iloc[1].name
In [10]:
AvgSalary_F = AvgSalary.iloc[0]['Salary']
AvgSalary_M = AvgSalary.iloc[1]['Salary']
print("The mean salary of female participants is", AvgSalary_F, "while the male participants earned an average salary of", AvgSalary_M, "."  )
def WhoEarnsMore():
    if AvgSalary_F > AvgSalary_M:
        print("Female Participants earn more at", AvgSalary_F,"." )
    else:
        print("Male Participants earn more at", AvgSalary_M,"." )
WhoEarnsMore()
#Ans to Q5
The mean salary of female participants is 11798 while the male participants earned an average salary of 11539 .
Female Participants earn more at 11798 .

(6) Calculate the number of employees from each country, and put this information into a dictionary: {“country1”: number}. What are the top three countries of origin with the greatest number of employees?

In [11]:
# One way to do calculate the number of employees from each country
AggEmployee = dataFrame.groupby('Nationality').size()
AggEmployee #Number of Employees from each Country
Out[11]:
Nationality
Afghanistan                  3
American Samoa               1
Argentina                    4
Armenia                      1
Azerbaijan                   1
Bangladesh                   3
Belarus                      1
Bosnia and Herzegovina       3
Brazil                      16
Bulgaria                     1
Cameroon                     2
Canada                       4
Central African Republic     1
Chad                         1
Chile                        1
China                       76
Colombia                    10
Comoros                      3
Costa Rica                   2
Croatia                      1
Cuba                         4
Czech Republic               8
Dominican Republic           1
Ecuador                      3
Egypt                        1
Finland                      2
France                       7
Gabon                        1
Germany                      3
Greece                       4
                            ..
Palestinian Territory        1
Panama                       1
Peru                         4
Philippines                 28
Poland                      22
Portugal                    16
Russia                      31
Serbia                       1
Seychelles                   1
Slovenia                     2
South Africa                 3
South Korea                  4
Spain                        2
Sweden                      12
Syria                        1
Tajikistan                   1
Tanzania                     3
Thailand                     5
Togo                         1
Tunisia                      1
Ukraine                     11
United Arab Emirates         1
United Kingdom               1
United States               10
Uruguay                      1
Uzbekistan                   2
Venezuela                    5
Vietnam                      5
Yemen                        4
Zimbabwe                     1
Length: 93, dtype: int64
In [12]:
# Another way to do calculate the number of employees from each country
AvgEmployee_2 = dataFrame.groupby('Nationality').count().drop(columns=["Last Name","Email","Gender","Salary","Age" ])
AvgEmployee_2 = AvgEmployee_2.rename(columns={'First Name':'No. of Emoloyee'}) 
AvgEmployee_2 #Part Ans to Q6: Calculate the number of employees from each country
Out[12]:
No. of Emoloyee
Nationality
Afghanistan 3
American Samoa 1
Argentina 4
Armenia 1
Azerbaijan 1
Bangladesh 3
Belarus 1
Bosnia and Herzegovina 3
Brazil 16
Bulgaria 1
Cameroon 2
Canada 4
Central African Republic 1
Chad 1
Chile 1
China 76
Colombia 10
Comoros 3
Costa Rica 2
Croatia 1
Cuba 4
Czech Republic 8
Dominican Republic 1
Ecuador 3
Egypt 1
Finland 2
France 7
Gabon 1
Germany 3
Greece 4
... ...
Palestinian Territory 1
Panama 1
Peru 4
Philippines 28
Poland 22
Portugal 16
Russia 31
Serbia 1
Seychelles 1
Slovenia 2
South Africa 3
South Korea 4
Spain 2
Sweden 12
Syria 1
Tajikistan 1
Tanzania 3
Thailand 5
Togo 1
Tunisia 1
Ukraine 11
United Arab Emirates 1
United Kingdom 1
United States 10
Uruguay 1
Uzbekistan 2
Venezuela 5
Vietnam 5
Yemen 4
Zimbabwe 1

93 rows × 1 columns

In [13]:
# Convert Dataframe to Dictionary
AvgEmployee_3 = AvgEmployee_2
AvgEmployee_3.to_dict() #Part Ans to Q6: ...put this information into a dictionary: {“country1”: number}
Out[13]:
{'No. of Emoloyee': {'Afghanistan': 3,
  'American Samoa': 1,
  'Argentina': 4,
  'Armenia': 1,
  'Azerbaijan': 1,
  'Bangladesh': 3,
  'Belarus': 1,
  'Bosnia and Herzegovina': 3,
  'Brazil': 16,
  'Bulgaria': 1,
  'Cameroon': 2,
  'Canada': 4,
  'Central African Republic': 1,
  'Chad': 1,
  'Chile': 1,
  'China': 76,
  'Colombia': 10,
  'Comoros': 3,
  'Costa Rica': 2,
  'Croatia': 1,
  'Cuba': 4,
  'Czech Republic': 8,
  'Dominican Republic': 1,
  'Ecuador': 3,
  'Egypt': 1,
  'Finland': 2,
  'France': 7,
  'Gabon': 1,
  'Germany': 3,
  'Greece': 4,
  'Guatemala': 1,
  'Haiti': 1,
  'Honduras': 3,
  'Hungary': 2,
  'Indonesia': 46,
  'Iran': 2,
  'Ireland': 1,
  'Italy': 1,
  'Ivory Coast': 1,
  'Japan': 6,
  'Jordan': 1,
  'Kazakhstan': 1,
  'Kyrgyzstan': 1,
  'Latvia': 1,
  'Liberia': 1,
  'Luxembourg': 2,
  'Malaysia': 4,
  'Mali': 1,
  'Mauritius': 2,
  'Mexico': 2,
  'Monaco': 1,
  'Mongolia': 3,
  'Morocco': 3,
  'Mozambique': 1,
  'Myanmar': 2,
  'Nepal': 2,
  'Netherlands': 1,
  'New Caledonia': 1,
  'Nicaragua': 1,
  'Nigeria': 1,
  'North Korea': 1,
  'Norway': 2,
  'Pakistan': 2,
  'Palestinian Territory': 1,
  'Panama': 1,
  'Peru': 4,
  'Philippines': 28,
  'Poland': 22,
  'Portugal': 16,
  'Russia': 31,
  'Serbia': 1,
  'Seychelles': 1,
  'Slovenia': 2,
  'South Africa': 3,
  'South Korea': 4,
  'Spain': 2,
  'Sweden': 12,
  'Syria': 1,
  'Tajikistan': 1,
  'Tanzania': 3,
  'Thailand': 5,
  'Togo': 1,
  'Tunisia': 1,
  'Ukraine': 11,
  'United Arab Emirates': 1,
  'United Kingdom': 1,
  'United States': 10,
  'Uruguay': 1,
  'Uzbekistan': 2,
  'Venezuela': 5,
  'Vietnam': 5,
  'Yemen': 4,
  'Zimbabwe': 1}}
In [14]:
#Ranking
CountriesEmployeesRank = AvgEmployee_2.sort_values(by=['No. of Emoloyee'], ascending = False)
CountriesEmployeesRank = CountriesEmployeesRank.head(3)
CountriesEmployeesRank # Ans: Top 3 Countries ranked by the number of employees
Out[14]:
No. of Emoloyee
Nationality
China 76
Indonesia 46
Russia 31
In [15]:
Top1 = CountriesEmployeesRank.iloc[0].name
Top2 = CountriesEmployeesRank.iloc[1].name
Top3 = CountriesEmployeesRank.iloc[2].name
print("Top 3 Countries in ascending order:",Top1,",",Top2,",",Top3 ) #For presentation
Top 3 Countries in ascending order: China , Indonesia , Russia

(7) Using string interpolation, print out the following sentences using the information you found above.

FINAL PROJECT 1 Our survey revealed that make __ dollars more than . Most employees originated from _, , and .

In [16]:
def SurveyResult():
    if F > M:
        print("Our survey revealed that",F_name,"participants make $",str(AvgSalary_F-AvgSalary_M),"dollars more than",M_name,"participants.")
    else:
        print("Our survey revealed that",M_name,"participants make $",str(AvgSalary_M-AvgSalary_F),"dollars more than",F_name,"participants.")
    
    print("Most employees originated from", Top1,",",Top2,"and",Top3)   

print(SurveyResult())
Our survey revealed that Female participants make $ 259 dollars more than Male participants.
Most employees originated from China , Indonesia and Russia
None