# Part 1: OpenIntro Data
source("http://www.openintro.org/stat/data/present.R")
source("http://www.openintro.org/stat/data/arbuthnot.R")
# 1. What years are included in this data set? What are the dimensions of the data frame and what are the variable or column names?
# shows the years included
present$year
## [1] 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954
## [16] 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969
## [31] 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984
## [46] 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999
## [61] 2000 2001 2002
# The years range from 1940 to 2002.
# gives the dimensions (number of rows, number of columns)
dim(present)
## [1] 63 3
# shows the names of the columns
names(present)
## [1] "year" "boys" "girls"
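# A minimal sketch (not part of the original lab) that collects the three Q1
# answers in one call. summarize_births() is a hypothetical helper and toy_df is
# made-up example data; calling summarize_births(present) should reproduce the
# year range, dimensions, and column names printed above.
summarize_births <- function(df) {
  list(
    years = range(df$year),  # first and last year in the data
    dims  = dim(df),         # c(number of rows, number of columns)
    vars  = names(df)        # column names
  )
}

# Hypothetical toy data, for illustration only
toy_df <- data.frame(year = 2000:2002, boys = c(10, 12, 11), girls = c(9, 11, 10))
toy_summary <- summarize_births(toy_df)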
# 2. How do these counts compare to Arbuthnot's? Are they on a similar scale?
range(arbuthnot$boys+arbuthnot$girls)
## [1] 5612 16145
range(present$boys+present$girls)
## [1] 2360399 4268326
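# Arbuthnot's data set has more observations than present, and the counts are not on a similar scale:
# the present-day yearly totals are several hundred times larger (roughly 260 to 420 times, comparing the ends of the two ranges).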
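# One way to put a single number on "not on a similar scale" is to divide the
# average yearly total of one data set by the other's. This is a sketch:
# scale_ratio() is a hypothetical helper and the toy vectors are made up. With the
# lab data one would call
# scale_ratio(arbuthnot$boys + arbuthnot$girls, present$boys + present$girls).
scale_ratio <- function(old_totals, new_totals) {
  # ratio of mean yearly totals: how many times larger the newer counts are
  mean(new_totals) / mean(old_totals)
}

toy_ratio <- scale_ratio(c(100, 300), c(10000, 30000))
# log10() expresses the ratio in orders of magnitude (2 means 100 times larger)
toy_magnitude <- log10(toy_ratio)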
# 3. Make a plot that displays the boy-to-girl ratio for every year in the data set. What do you see? Does Arbuthnot's observation about boys being born in greater proportion than girls hold up in the U.S.? Include the plot in your response.
plot(arbuthnot$year, arbuthnot$boys / arbuthnot$girls, type = "l",
     main = "Ratio of Boys to Girls for every year ARBUTHNOT",
     xlab = "Year",
     ylab = "Ratio of Boys to Girls born")

plot(present$year, present$boys / present$girls, type = "l",
     main = "Ratio of Boys to Girls for every year PRESENT",
     xlab = "Year",
     ylab = "Ratio of Boys to Girls born")

# The boy-to-girl ratio stays above 1 in both data sets, so Arbuthnot's observation that boys are born in greater proportion than girls also holds in the U.S.
# The difference is that the present-day ratio shows a gradual downward trend, while Arbuthnot's ratio fluctuates around a roughly constant level.
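# A sketch for checking the "above 1 in every year" claim numerically instead of
# by eye. boys_exceed_girls() is a hypothetical helper and toy_births is made-up
# data; on the lab data one would call boys_exceed_girls(present) and
# boys_exceed_girls(arbuthnot).
boys_exceed_girls <- function(df) {
  ratio <- df$boys / df$girls
  list(
    every_year = all(ratio > 1),                 # TRUE if the ratio is above 1 in every year
    min_ratio  = min(ratio),                     # smallest boy-to-girl ratio observed
    prop_boys  = df$boys / (df$boys + df$girls)  # proportion of births that are boys, per year
  )
}

toy_births <- data.frame(year = 2000:2002, boys = c(105, 104, 103), girls = c(100, 100, 100))
toy_check <- boys_exceed_girls(toy_births)

# Plotting prop_boys against year shows the same pattern on a 0 to 1 scale, where
# "boys born in greater proportion" means staying above 0.5.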
# 4. In what year did we see the most total number of births in the U.S.? You can refer to the help files or the R reference card http://cran.r-project.org/doc/contrib/Short-refcard.pdf to find helpful commands.
# finds the row index of the year with the greatest number of boys
which.max(present$boys)
## [1] 22
# same but for girls
which.max(present$girls)
## [1] 22
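# Both maxima occur in row 22, which is the year 1940 + 21 = 1961. Since that year has both the most boys and the most girls,
# it also has the most total births, so the year with the most total births in the U.S. is 1961.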
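# Taking the maxima of boys and girls separately only answers the question because
# both happen to peak in the same row here; in general the year with the most total
# births should be found from the total itself. A minimal sketch:
# year_of_max_total() is a hypothetical helper and toy_years is made-up data chosen
# so that the two approaches disagree. On the lab data, year_of_max_total(present)
# should return 1961.
year_of_max_total <- function(df) {
  total <- df$boys + df$girls
  df$year[which.max(total)]  # which.max() returns the first index of the maximum
}

toy_years <- data.frame(year = 2000:2002, boys = c(10, 12, 11), girls = c(10, 9, 11))
toy_boys_peak  <- toy_years$year[which.max(toy_years$boys)]  # 2001
toy_total_peak <- year_of_max_total(toy_years)                # 2002 (totals are 20, 21, 22)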
# Part 2: Milestone Data
# 1. Load your dataset
POKEMON <- read.csv(file="C:\\Users\\C21Zhivko.Kolevski.m\\Desktop\\lab1\\Pokemon.csv")
# 2. Use the dim command to show how many rows and columns your dataset contains.
dim(POKEMON)
## [1] 800 13
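# nrow() and ncol() return the two entries of dim() individually, which answers
# questions 3 and 5 directly (for POKEMON they should give 800 and 13). A small
# sketch on a hypothetical toy data frame with made-up columns:
toy_table <- data.frame(x = c(1, 2), y = c("a", "b"))
toy_rows <- nrow(toy_table)  # number of observations (rows)
toy_cols <- ncol(toy_table)  # number of variables (columns)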
# 3. How many observations does your dataset contain?
# 800 observations
# 4. What does each observation in your dataset represent?
# Each observation represents a different Pokemon.
# 5. How many variables does your dataset contain?
# 13 variables
# 6. What does each variable in your dataset represent?
# Each variable records a different stat or attribute of every Pokemon.
# Documentation: I had trouble using the read.csv function; C3C Leonard told me to write each backslash in the file path as a double backslash (\\) so that R can find the file.
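# Why the double backslash is needed, as a sketch: inside an R string literal a
# backslash starts an escape sequence, so "\\" stands for one backslash character.
# R on Windows also accepts forward slashes, and file.path() joins its pieces with
# "/". The paths below are hypothetical examples, not the lab's actual path.
p_escaped <- "C:\\Users\\me\\lab1\\Pokemon.csv"
p_forward <- "C:/Users/me/lab1/Pokemon.csv"
p_joined  <- file.path("C:", "Users", "me", "lab1", "Pokemon.csv")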