- Presentation of the topic and context of the study.
- Approach used. 2)1) 2)2)
- Limits of the models and of their predictive power.
- Difficulties.
- What could be improved in the future.
2024-04-03
Our approach consists of two steps:
import re

import pandas as pd
import urllib3
from bs4 import BeautifulSoup

# Example of a URL that could be pasted.
urlpage_4 = 'https://www.skysports.com/premier-league-table/2023'
# The function get_page extracts and returns the HTML elements corresponding to a
# specified tag and a specific class from a given web page.
def get_page(urlpage_4, element, html_class):
    req_5 = urllib3.PoolManager()
    res_5 = req_5.request('GET', urlpage_4)
    row_html_5 = BeautifulSoup(res_5.data, 'html.parser')
    PL19 = row_html_5.find_all(element, class_=html_class)
    return PL19
# Keep the matching table rows as one string, then list the team names found in it.
PL19 = str(get_page(urlpage_4, 'tr', 'row-body'))
list_team_20 = re.findall('<span class="team-name">(.*?)</span>', PL19)
# Build the statistics dictionary for one team from the scraped table string.
def extract_team_20_stats(PL19, team):
    # team is expected to be spelled exactly as in list_team_20.
    position = list_team_20.index(team) + 1
    # Slice the HTML from the team name to the end of its table row.
    start = PL19.find(team)
    end = PL19.index("</tr>", start)
    team_data_20 = PL19[start:end]
    match_played = 38  # each Premier League team plays 38 matches per season
    # Integer-valued cells of the row, in page order.
    data = [int(s) for s in re.findall(r'<td.*?>(\d+)</td>', team_data_20)]
    points = data[0]
    wins = data[1]
    drawns = data[2]
    loses = data[3]
    goals_for = data[4]
    goals_against = data[5]
    team_stats20 = {
        'match_played': match_played,
        'position': position,
        'points': points,
        'wins': wins,
        'loses': loses,
        'drawns': drawns,
        'goals_for': goals_for,
        'goals_against': goals_against,
    }
    return team_stats20
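To see what the slicing and the regular expression in extract_team_20_stats do, the sketch below runs the same two steps on a small hand-written table row. The HTML fragment, the team name "Example United", the helper name parse_row and all numbers are invented for illustration; they are not taken from the Sky Sports page, whose real markup may differ.

```python
import re

# Hypothetical table row: markup and numbers are invented for illustration only.
sample_html = (
    '<tr class="row-body">'
    '<td>1</td>'
    '<td><span class="team-name">Example United</span></td>'
    '<td class="num">80</td><td>25</td><td>5</td><td>8</td>'
    '<td>70</td><td>30</td>'
    '</tr>'
)

def parse_row(html, team):
    # Slice from the team name to the end of its row, as in the function above.
    start = html.find(team)
    end = html.index("</tr>", start)
    row = html[start:end]
    # Keep only the cells whose content is a plain non-negative integer.
    return [int(s) for s in re.findall(r'<td.*?>(\d+)</td>', row)]

numbers = parse_row(sample_html, "Example United")
print(numbers)
```

Because the slice starts at the team name, the cell that precedes it in the row is left out, so which statistic ends up at which list index depends entirely on the column order of the page being scraped.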
# Build one single-row DataFrame per team, keyed by team name.
team_stats_20 = {}
for team in list_team_20:
    team_stats = extract_team_20_stats(PL19, team)
    team_stats_df = pd.DataFrame(team_stats, index=[0])
    team_stats_df['team'] = team
    team_stats_df['year'] = 2020
    team_stats_20[team] = team_stats_df
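The loop above leaves one single-row DataFrame per team in the team_stats_20 dictionary. One possible way to stack such frames into a single table is pandas.concat. The sketch below is only an illustration under stated assumptions: the two teams and their numbers are invented stand-ins rather than scraped data, and the names frames and season_table are hypothetical.

```python
import pandas as pd

# Invented stand-ins for the scraped per-team frames (not real results).
frames = {
    "Team A": pd.DataFrame(
        {"position": 1, "points": 80, "team": "Team A", "year": 2020}, index=[0]
    ),
    "Team B": pd.DataFrame(
        {"position": 2, "points": 75, "team": "Team B", "year": 2020}, index=[0]
    ),
}

# Stack the one-row frames; ignore_index renumbers the rows from 0.
season_table = pd.concat(list(frames.values()), ignore_index=True)
print(season_table)
```

A single table of this kind has one row per team and season, which is usually easier to filter, sort or merge with other seasons than a dictionary of separate frames.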