What European Soccer Team Attributes Lead To More Wins?¶

Table of Contents¶

Introduction
Data Wrangling
Exploratory Data Analysis
Conclusions

Introduction¶

In this project, I will use the Kaggle "European Soccer Database" which can be found at https://www.kaggle.com/hugomathien/soccer. The "European Soccer Database" consists on 7 tables: Country, League, Match, Player, Player_Attributes, Team and Team_Attributes.

The Match table contains data over 8 soccer seasons and stores: the date a match was played, the teams that played the match, the number of goals each team made, the players in the match and other match statistics.

The Player table contains players name and vital statistics such as weight, height and birthdate.

The Player_Attributes table contains the players statistics such as their overall rating, potential, prefered foot etc.

The Team_Attributes table records the team's attributes such as build up play, chance creation and defence scores at a given time.

The Team table stores team id and team names that are used for referencing.

The Country table stores the country id and country name used for referencing.

The League tables stores league id, country id and league name used for referencing.

I will be analyzing a dataset derived from the "European Soccer Database". The dataset ecompasses data from the Match, Team, Team_Attributes, Country and League tables. The Dataset was derived usng the SQL query below:

CREATE TABLE team_match_data AS 
WITH team_data AS (
SELECT * 
FROM Team_Attributes ta
JOIN Team t
ON ta.team_api_id = t.team_api_id),


    teamdate_data AS (
SELECT *,
lead(team_api_id) OVER (PARTITION BY team_api_id ORDER BY "date") AS lead_team_api_id,
lead("date",1,"2016-05-25 00:00:00") OVER (PARTITION BY team_api_id ORDER BY "date") AS end_date
FROM team_data)


SELECT m.match_api_id, 
m."date" match_dt,
m.country_id,
c.name country_name,
m.league_id,
l.name league_name,
m.season,
m.home_team_api_id,
m.away_team_api_id,
td.team_api_id,
td.team_long_name,
td.team_short_name,
m.home_team_goal,
m.away_team_goal,
td."date" t_attr_start_dt,
td.end_date t_attr_end_dt,
td.buildUpPlaySpeed bu_play_speed,
td.buildUpPlaySpeedClass bu_play_speed_class,
td.buildUpPlayDribbling bu_play_dribbling,
td.buildUpPlayDribblingClass bu_play_dribbling_class,
td.buildUpPlayPassing bu_play_passing,
td.buildUpPlayPassingClass bu_play_passing_class,
td.buildUpPlayPositioningClass bu_play_positioning_class,
td.chanceCreationPassing cc_passing,
td.chanceCreationPassingClass cc_passing_class,
td.chanceCreationCrossing cc_crossing,
td.chanceCreationCrossingClass cc_crossing_class,
td.chanceCreationShooting cc_shooting,
td.chanceCreationShootingClass cc_shooting_class,
td.chanceCreationPositioningClass cc_positioning_class,
td.defencePressure defence_pressure,
td.defencePressureClass defence_pressure_class,
td.defenceAggression defence_aggression,
td.defenceAggressionClass defence_agression_class,
td.defenceTeamWidth defence_team_width,
td.defenceTeamWidthClass defence_team_width_class,
td.defenceDefenderLineClass defence_line_class
FROM Match m
JOIN teamdate_data td
ON td.team_api_id = m.home_team_api_id
OR td.team_api_id = m.away_team_api_id
JOIN League l
ON l.id = m.league_id
JOIN Country c
ON c.id = m.country_id
WHERE m."date"  BETWEEN td."date" AND td.end_date
ORDER BY td.team_api_id, td.end_date DESC;

Questions¶

I will be focusing on team goals and how their attributes contributed to these goals.

Which Teams won the most matches?
What team attributes contribute to the most wins?

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

Data Wrangling¶

General Properties¶

df = pd.read_csv('team_match_data.csv')
df.head()

Imported the team_match_data csv file into a data frame and viewed the first 5 rows

df.shape

(39882, 37)

There are 39,882 rows and 37 columns in the data frame

df.describe()

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 39882 entries, 0 to 39881
Data columns (total 37 columns):
match_api_id                 39882 non-null int64
match_dt                     39882 non-null object
country_id                   39882 non-null int64
country_name                 39882 non-null object
league_id                    39882 non-null int64
league_name                  39882 non-null object
season                       39882 non-null object
home_team_api_id             39882 non-null int64
away_team_api_id             39882 non-null int64
team_api_id                  39882 non-null int64
team_long_name               39882 non-null object
team_short_name              39882 non-null object
home_team_goal               39882 non-null int64
away_team_goal               39882 non-null int64
t_attr_start_dt              39882 non-null object
t_attr_end_dt                39882 non-null object
bu_play_speed                39882 non-null int64
bu_play_speed_class          39882 non-null object
bu_play_dribbling            11749 non-null float64
bu_play_dribbling_class      39882 non-null object
bu_play_passing              39882 non-null int64
bu_play_passing_class        39882 non-null object
bu_play_positioning_class    39882 non-null object
cc_passing                   39882 non-null int64
cc_passing_class             39882 non-null object
cc_crossing                  39882 non-null int64
cc_crossing_class            39882 non-null object
cc_shooting                  39882 non-null int64
cc_shooting_class            39882 non-null object
cc_positioning_class         39882 non-null object
defence_pressure             39882 non-null int64
defence_pressure_class       39882 non-null object
defence_aggression           39882 non-null int64
defence_agression_class      39882 non-null object
defence_team_width           39882 non-null int64
defence_team_width_class     39882 non-null object
defence_line_class           39882 non-null object
dtypes: float64(1), int64(16), object(20)
memory usage: 11.3+ MB

The bu_play_dribbling column has null data. There are 27,133 records with null data

Data Cleaning¶

The data contains 39,882 rows and 37 columns.

The data consists of match data, team data, and team attribute data at the time the match took place. For each team attribute respresented with a numerical value, there is a corresponding classifaction.

The match_api_id, country_id, country_name, league_id, league_name and season columns can be dropped since they are not relevant to determine how a team's attributes contribute to their wins.

The data will need to be cleaned, to fill the null values for the bu_play_dribbling column. This will be done by determing with class the null value corresponds to and filled with the mean of that class.

# Drop columns that are not needed
df.drop(['match_api_id','country_id','league_id','country_name','league_name', 'season'], axis=1, inplace = True)

df.head()

df.shape

(39882, 31)

After dropping the columns, there are now 39,882 rows and 31 columns in the data frame

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 39882 entries, 0 to 39881
Data columns (total 31 columns):
match_dt                     39882 non-null object
home_team_api_id             39882 non-null int64
away_team_api_id             39882 non-null int64
team_api_id                  39882 non-null int64
team_long_name               39882 non-null object
team_short_name              39882 non-null object
home_team_goal               39882 non-null int64
away_team_goal               39882 non-null int64
t_attr_start_dt              39882 non-null object
t_attr_end_dt                39882 non-null object
bu_play_speed                39882 non-null int64
bu_play_speed_class          39882 non-null object
bu_play_dribbling            11749 non-null float64
bu_play_dribbling_class      39882 non-null object
bu_play_passing              39882 non-null int64
bu_play_passing_class        39882 non-null object
bu_play_positioning_class    39882 non-null object
cc_passing                   39882 non-null int64
cc_passing_class             39882 non-null object
cc_crossing                  39882 non-null int64
cc_crossing_class            39882 non-null object
cc_shooting                  39882 non-null int64
cc_shooting_class            39882 non-null object
cc_positioning_class         39882 non-null object
defence_pressure             39882 non-null int64
defence_pressure_class       39882 non-null object
defence_aggression           39882 non-null int64
defence_agression_class      39882 non-null object
defence_team_width           39882 non-null int64
defence_team_width_class     39882 non-null object
defence_line_class           39882 non-null object
dtypes: float64(1), int64(13), object(17)
memory usage: 9.4+ MB

The bu_play_dribbling column has null data. There are 27,133 records with null data

# Find the bu_play_dribbling clases
df.bu_play_dribbling_class.unique()

array(['Normal', 'Little', 'Lots'], dtype=object)

There are 3 classes for bu_play_dribbling: Normal, Little and Lots

df[df.bu_play_dribbling.isnull()]['bu_play_dribbling_class'].unique()

array(['Little'], dtype=object)

All of the null values from the bu_play_dribbling column belong to the bu_play_dribbling_class Little.

We need to fill those null values with the mean of the Little bu_play_dribbling columns

Function returns a data frame for a specified Build Up Play Dribbling Class

def filter_dribblng_class(df, dribbling_class):
    return df.query('bu_play_dribbling_class == "%s"' % dribbling_class)

Function prints the minimum and maximum of a specified column

def col_minmax_range(col):
    print("({},{})".format(col.min(), col.max()))

df_normal_dribbling_class = filter_dribblng_class(df, "Normal")

Created a data frame for the Normal bu_play_dribbling_class

col_minmax_range(df_normal_dribbling_class.bu_play_dribbling)

(34.0,66.0)

The range of values for the Normal bu_play_dribbling_class is (34.0, 66.0)

df_lots_dribbling_class = filter_dribblng_class(df, "Lots")

Created a data frame for the Lots bu_play_dribbling_class

col_minmax_range(df_lots_dribbling_class.bu_play_dribbling)

(67.0,77.0)

The range of values for the Lots bu_play_dribbling_class is (67.0, 77.0)

df_little_dribbling_class = filter_dribblng_class(df, "Little")

Created a data frame for the Little bu_play_dribbling_class

col_minmax_range(df_little_dribbling_class.bu_play_dribbling)

(24.0,33.0)

The range of values for the Little bu_play_dribbling_class is (24.0, 33.0)

# Find the mean of the Little Build Up Dribbling class df
df_little_dribbling_class['bu_play_dribbling'].mean()

30.44859813084112

The mean of the bu_play_dribbling_class for the Little class is 30.44859813084112

# Fill nan values with the mean of the litle dribbling class since they were all inthe little dribblng class
df['bu_play_dribbling'].fillna(df_little_dribbling_class['bu_play_dribbling'].mean(), inplace=True)

The null values for bu_play_dribbling were filled with the mean of the Little class

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 39882 entries, 0 to 39881
Data columns (total 31 columns):
match_dt                     39882 non-null object
home_team_api_id             39882 non-null int64
away_team_api_id             39882 non-null int64
team_api_id                  39882 non-null int64
team_long_name               39882 non-null object
team_short_name              39882 non-null object
home_team_goal               39882 non-null int64
away_team_goal               39882 non-null int64
t_attr_start_dt              39882 non-null object
t_attr_end_dt                39882 non-null object
bu_play_speed                39882 non-null int64
bu_play_speed_class          39882 non-null object
bu_play_dribbling            39882 non-null float64
bu_play_dribbling_class      39882 non-null object
bu_play_passing              39882 non-null int64
bu_play_passing_class        39882 non-null object
bu_play_positioning_class    39882 non-null object
cc_passing                   39882 non-null int64
cc_passing_class             39882 non-null object
cc_crossing                  39882 non-null int64
cc_crossing_class            39882 non-null object
cc_shooting                  39882 non-null int64
cc_shooting_class            39882 non-null object
cc_positioning_class         39882 non-null object
defence_pressure             39882 non-null int64
defence_pressure_class       39882 non-null object
defence_aggression           39882 non-null int64
defence_agression_class      39882 non-null object
defence_team_width           39882 non-null int64
defence_team_width_class     39882 non-null object
defence_line_class           39882 non-null object
dtypes: float64(1), int64(13), object(17)
memory usage: 9.4+ MB

The bu_play_dribbling column has all null values filled

df.duplicated().sum()

0

There are no duplicates in the data frame

df.shape

(39882, 31)

There are still 39,882 rows and 31 columns in the data frame

df.describe()

Exploratory Data Analysis¶

Question 1 : Which Teams won the most matches?¶

# Use this, and more code cells, to explore your data. Don't forget to add
#   Markdown cells to document your observations and findings.

# Get matches that did not result in a Tied Match
df_match_no_draw = df.query('home_team_goal != away_team_goal')

df_match_no_draw.shape

(29758, 31)

There are 29,758 records that consist of matches that resulted in no draw

df_match_no_draw.describe()

#Create a column to identify the winning team
df_match_no_draw['win_team_id'] = np.where(df_match_no_draw['home_team_goal'] > df_match_no_draw['away_team_goal'], df_match_no_draw['home_team_api_id'], df_match_no_draw['away_team_api_id'])

/root/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

df_match_no_draw.head()

# Filter the teams attributes for the team that won
df_win_team = df_match_no_draw.query('team_api_id == win_team_id')

df_win_team.shape

(15010, 32)

There are 15,010 matches that resulted in a win

df_win_team.head()

# Count the number of wins for each team and sort in decending order
df_num_wins=df_win_team.groupby(['team_long_name'])['win_team_id'].count().sort_values(ascending=False)

# Top Ten Soccer Teams
top_10 = df_num_wins.head(10)

top_10

team_long_name
FC Barcelona           189
Real Madrid CF         185
Celtic                 180
FC Bayern Munich       159
Juventus               156
SL Benfica             153
FC Porto               149
Manchester City        148
Paris Saint-Germain    147
Manchester United      146
Name: win_team_id, dtype: int64

The Top 10 European Soccer teams with the most wins are:

FC Barcelona, Real Madrid CF, Celtic, FC Bayern Munich, Juventus, SL Benfica, FC Porto, Manchester City, Paris, Saint-Germain, Manchester United

df_num_wins.idxmax()

'FC Barcelona'

# Plot a bar chart showing the Top 10 Soccer Teams
top_10.plot(kind='bar');
plt.title('Top 10 Winning Soccer Teams')
plt.xlabel('Team Name')
plt.ylabel('Total Wins');

Question 2 : What team attributes contribute to the most wins?¶

# Count the number of wins for an attribute period
df_win_team.groupby(['win_team_id','team_long_name','t_attr_start_dt','t_attr_end_dt'])['win_team_id'].count()

win_team_id  team_long_name           t_attr_start_dt      t_attr_end_dt      
1601         Ruch Chorzów             2010-02-22 00:00:00  2011-02-22 00:00:00    10
                                      2011-02-22 00:00:00  2012-02-22 00:00:00    15
                                      2012-02-22 00:00:00  2013-09-20 00:00:00    16
                                      2013-09-20 00:00:00  2014-09-19 00:00:00    14
                                      2014-09-19 00:00:00  2015-09-10 00:00:00    10
                                      2015-09-10 00:00:00  2016-05-25 00:00:00     8
1773         Oud-Heverlee Leuven      2012-02-22 00:00:00  2013-09-20 00:00:00     9
                                      2014-09-19 00:00:00  2016-05-25 00:00:00     7
1957         Jagiellonia Białystok    2010-02-22 00:00:00  2011-02-22 00:00:00    14
                                      2011-02-22 00:00:00  2012-02-22 00:00:00    11
                                      2012-02-22 00:00:00  2013-09-20 00:00:00    17
                                      2013-09-20 00:00:00  2014-09-19 00:00:00    10
                                      2014-09-19 00:00:00  2015-09-10 00:00:00    13
                                      2015-09-10 00:00:00  2016-05-25 00:00:00     7
2033         S.C. Olhanense           2010-02-22 00:00:00  2011-02-22 00:00:00     8
                                      2011-02-22 00:00:00  2012-02-22 00:00:00     6
                                      2012-02-22 00:00:00  2013-09-20 00:00:00    10
                                      2013-09-20 00:00:00  2014-09-19 00:00:00     5
2182         Lech Poznań              2010-02-22 00:00:00  2011-02-22 00:00:00    15
                                      2011-02-22 00:00:00  2012-02-22 00:00:00    16
                                      2012-02-22 00:00:00  2013-09-20 00:00:00    28
                                      2013-09-20 00:00:00  2014-09-19 00:00:00    15
                                      2014-09-19 00:00:00  2015-09-10 00:00:00    13
                                      2015-09-10 00:00:00  2016-05-25 00:00:00    12
2183         P. Warszawa              2010-02-22 00:00:00  2011-02-22 00:00:00    10
                                      2011-02-22 00:00:00  2012-02-22 00:00:00    17
                                      2012-02-22 00:00:00  2013-09-20 00:00:00    14
2186         Cracovia                 2010-02-22 00:00:00  2011-02-22 00:00:00     6
                                      2011-02-22 00:00:00  2012-02-22 00:00:00     9
                                      2012-02-22 00:00:00  2014-09-19 00:00:00    14
                                                                                  ..
10265        Wisła Kraków             2011-02-22 00:00:00  2012-02-22 00:00:00    17
                                      2012-02-22 00:00:00  2013-09-20 00:00:00    17
                                      2013-09-20 00:00:00  2014-09-19 00:00:00    14
                                      2014-09-19 00:00:00  2015-09-10 00:00:00     8
                                      2015-09-10 00:00:00  2016-05-25 00:00:00     6
10267        Valencia CF              2010-02-22 00:00:00  2011-02-22 00:00:00    23
                                      2011-02-22 00:00:00  2012-02-22 00:00:00    18
                                      2012-02-22 00:00:00  2013-09-20 00:00:00    26
                                      2013-09-20 00:00:00  2014-09-19 00:00:00    14
                                      2014-09-19 00:00:00  2015-09-10 00:00:00    20
                                      2015-09-10 00:00:00  2016-05-25 00:00:00    11
10268        Elche CF                 2013-09-20 00:00:00  2014-09-19 00:00:00    10
                                      2014-09-19 00:00:00  2015-09-10 00:00:00    10
10269        VfB Stuttgart            2010-02-22 00:00:00  2011-02-22 00:00:00    12
                                      2011-02-22 00:00:00  2012-02-22 00:00:00    14
                                      2012-02-22 00:00:00  2013-09-20 00:00:00    22
                                      2013-09-20 00:00:00  2014-09-19 00:00:00     6
                                      2014-09-19 00:00:00  2015-09-10 00:00:00     9
                                      2015-09-10 00:00:00  2016-05-25 00:00:00     9
10278        Hércules Club de Fútbol  2010-02-22 00:00:00  2011-02-22 00:00:00     7
                                      2011-02-22 00:00:00  2012-02-22 00:00:00     2
10281        Real Valladolid          2010-02-22 00:00:00  2011-02-22 00:00:00     4
                                      2012-02-22 00:00:00  2013-09-20 00:00:00    12
                                      2013-09-20 00:00:00  2014-09-19 00:00:00     6
108893       AC Arles-Avignon         2010-02-22 00:00:00  2011-02-22 00:00:00     1
                                      2011-02-22 00:00:00  2012-02-22 00:00:00     2
158085       FC Arouca                2014-09-19 00:00:00  2015-09-10 00:00:00     8
                                      2015-09-10 00:00:00  2016-05-25 00:00:00    11
208931       Carpi                    2015-09-10 00:00:00  2016-05-25 00:00:00     9
274581       Royal Excel Mouscron     2015-09-10 00:00:00  2016-05-25 00:00:00     7
Name: win_team_id, Length: 1146, dtype: int64

This shows the number of wins a team had with during a given team attribute start and end time

df_win_team['wins_by_attribute_period'] = df_win_team.groupby(['win_team_id','t_attr_start_dt','t_attr_end_dt'])['win_team_id'].transform("size")

/root/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.

df_win_team.head()

df_win_team.drop(['match_dt','home_team_api_id','away_team_api_id','team_api_id','home_team_goal','away_team_goal'],axis=1, inplace=True)

/root/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py:3697: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  errors=errors)

Remove the columns that are no longer necessary: match_dt, home_team_api_id, away_team_api_id, team_api_id, home_team_goal and away_team_goal

df_win_team.duplicated().sum()

13864

There are 13,864 duplicate records for winning matches.

df_win_team.drop_duplicates(inplace=True)

/root/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.

Drop the duplicates from the data frane

df_win_team.duplicated().sum()

0

There are no more duplicates

df_win_team.head()

df_win_team.hist(figsize=(12,10));

Scatterplot of wins against various features¶

df_win_team.plot(x="bu_play_dribbling", y="wins_by_attribute_period", kind="scatter");
plt.title("Relationship Between Team Match Wins and Build Up Play Dribbling")
plt.xlabel("Build Up Play Dribbling")
plt.ylabel("Team Wins")

Text(0, 0.5, 'Team Wins')

There is no correlation between the number of wins and build up play dribbling

df_win_team.plot(x="bu_play_passing", y="wins_by_attribute_period", kind="scatter");
plt.title("Relationship Between Team Match Wins and Build Up Play Passing")
plt.xlabel("Build Up Play Passing")
plt.ylabel("Team Wins")

Text(0, 0.5, 'Team Wins')

There is no correlation between wins and build up play passing

df_win_team.plot(x="cc_passing", y="wins_by_attribute_period", kind="scatter");
plt.title("Relationship Between Team Match Wins and Chance Creation Passing")
plt.xlabel("Chance Creation Passing")
plt.ylabel("Team Wins")

Text(0, 0.5, 'Team Wins')

There is no correlation between the number of wins and chance creation passing

df_win_team.plot(x="cc_crossing", y="wins_by_attribute_period", kind="scatter");
plt.title("Relationship Between Team Match Wins and Chance Creation Crossing")
plt.xlabel("Chance Creation Crossing")
plt.ylabel("Team Wins")

Text(0, 0.5, 'Team Wins')

There is no correlation between the number of wins and chance creation crossing

df_win_team.plot(x="cc_shooting", y="wins_by_attribute_period", kind="scatter");
plt.title("Relationship Between Team Match Wins and Chance Creation Crossing")
plt.xlabel("Chance Creation Shooting")
plt.ylabel("Team Wins")

Text(0, 0.5, 'Team Wins')

There is no correlation between the number of wins and chance creation shooting

df_win_team.plot(x="defence_pressure", y="wins_by_attribute_period", kind="scatter");
plt.title("Relationship Between Team Match Wins and Defence Pressure")
plt.xlabel("Defence Pressure")
plt.ylabel("Team Wins")

Text(0, 0.5, 'Team Wins')

There is no correlation between the number of wins and defence pressure

df_win_team.plot(x="defence_aggression", y="wins_by_attribute_period", kind="scatter");
plt.title("Relationship Between Team Match Wins and Defence Aggression")
plt.xlabel("Defence Aggression")
plt.ylabel("Team Wins")

Text(0, 0.5, 'Team Wins')

There is no correlation between the number of wins and defence agression

df_win_team.plot(x="defence_team_width", y="wins_by_attribute_period", kind="scatter");
plt.title("Relationship Between Team Match Wins and Defence Team Width")
plt.xlabel("Defence Team Width")
plt.ylabel("Team Wins")

Text(0, 0.5, 'Team Wins')

There is no correlation between the number of wins and defence team width

Conclusions¶

Results¶

The Top 10 European Soccer teams with the most wins are: FC Barcelona, Real Madrid CF, Celtic, FC Bayern Munich, Juventus, SL Benfica, FC Porto, Manchester City, Paris, Saint-Germain, Manchester United.
There is no correlation between the number of wins a team has and any of their team attributes.

Limitations¶

More than half of the build up dribbling team attribute had null data that was filled with the mean for its category. This may have made an impact on the correlation to number of team wins.

from subprocess import call
call(['python', '-m', 'nbconvert', 'Investigate_Soccer_Database.ipynb'])

0

	match_api_id	country_id	league_id	home_team_api_id	away_team_api_id	team_api_id	home_team_goal	away_team_goal	bu_play_speed	bu_play_dribbling	bu_play_passing	cc_passing	cc_crossing	cc_shooting	defence_pressure	defence_aggression	defence_team_width
count	3.988200e+04	39882.000000	39882.000000	39882.000000	39882.000000	39882.000000	39882.000000	39882.000000	39882.000000	11749.000000	39882.000000	39882.000000	39882.000000	39882.000000	39882.000000	39882.000000	39882.000000
mean	1.360436e+06	11763.067148	11763.067148	9956.776992	9954.216087	9665.847425	1.554787	1.173888	52.412818	49.088433	47.927210	52.411664	54.034627	54.270473	46.466978	49.474625	52.337596
std	4.222426e+05	7520.607243	7520.607243	13767.842735	13738.480474	11287.448154	1.306828	1.147342	11.930429	9.800439	11.081696	10.400321	11.184020	10.440281	10.215524	9.871917	9.435304
min	6.541530e+05	1.000000	1.000000	1601.000000	1601.000000	1601.000000	0.000000	0.000000	20.000000	24.000000	20.000000	21.000000	20.000000	22.000000	23.000000	24.000000	30.000000
25%	1.025090e+06	4769.000000	4769.000000	8472.000000	8472.000000	8481.000000	1.000000	0.000000	45.000000	42.000000	39.000000	47.000000	48.000000	49.000000	39.000000	44.000000	48.000000
50%	1.260174e+06	10257.000000	10257.000000	8697.000000	8697.000000	8721.000000	1.000000	1.000000	52.000000	49.000000	49.000000	52.000000	53.000000	54.000000	46.000000	48.000000	52.000000
75%	1.724266e+06	17642.000000	17642.000000	9925.000000	9925.000000	9925.000000	2.000000	2.000000	62.000000	55.000000	55.000000	60.000000	63.000000	62.000000	53.000000	56.000000	59.000000
max	2.216672e+06	24558.000000	24558.000000	274581.000000	274581.000000	274581.000000	10.000000	9.000000	80.000000	77.000000	80.000000	80.000000	80.000000	80.000000	72.000000	72.000000	73.000000

	home_team_api_id	away_team_api_id	team_api_id	home_team_goal	away_team_goal	bu_play_speed	bu_play_dribbling	bu_play_passing	cc_passing	cc_crossing	cc_shooting	defence_pressure	defence_aggression	defence_team_width
count	39882.000000	39882.000000	39882.000000	39882.000000	39882.000000	39882.000000	39882.000000	39882.000000	39882.000000	39882.000000	39882.000000	39882.000000	39882.000000	39882.000000
mean	9956.776992	9954.216087	9665.847425	1.554787	1.173888	52.412818	35.939783	47.927210	52.411664	54.034627	54.270473	46.466978	49.474625	52.337596
std	13767.842735	13738.480474	11287.448154	1.306828	1.147342	11.930429	10.024826	11.081696	10.400321	11.184020	10.440281	10.215524	9.871917	9.435304
min	1601.000000	1601.000000	1601.000000	0.000000	0.000000	20.000000	24.000000	20.000000	21.000000	20.000000	22.000000	23.000000	24.000000	30.000000
25%	8472.000000	8472.000000	8481.000000	1.000000	0.000000	45.000000	30.448598	39.000000	47.000000	48.000000	49.000000	39.000000	44.000000	48.000000
50%	8697.000000	8697.000000	8721.000000	1.000000	1.000000	52.000000	30.448598	49.000000	52.000000	53.000000	54.000000	46.000000	48.000000	52.000000
75%	9925.000000	9925.000000	9925.000000	2.000000	2.000000	62.000000	38.000000	55.000000	60.000000	63.000000	62.000000	53.000000	56.000000	59.000000
max	274581.000000	274581.000000	274581.000000	10.000000	9.000000	80.000000	77.000000	80.000000	80.000000	80.000000	80.000000	72.000000	72.000000	73.000000

	home_team_api_id	away_team_api_id	team_api_id	home_team_goal	away_team_goal	bu_play_speed	bu_play_dribbling	bu_play_passing	cc_passing	cc_crossing	cc_shooting	defence_pressure	defence_aggression	defence_team_width
count	29758.000000	29758.000000	29758.000000	29758.000000	29758.000000	29758.000000	29758.000000	29758.000000	29758.000000	29758.000000	29758.000000	29758.000000	29758.00000	29758.000000
mean	9961.051415	9899.828685	9626.486491	1.743733	1.233248	52.347402	35.914585	47.896129	52.440117	54.045433	54.308354	46.573190	49.51361	52.355400
std	13757.721340	13420.665962	10930.258097	1.381442	1.229052	11.931015	9.982912	11.071162	10.413259	11.176362	10.476461	10.228084	9.87437	9.414251
min	1601.000000	1601.000000	1601.000000	0.000000	0.000000	20.000000	24.000000	20.000000	21.000000	20.000000	22.000000	23.000000	24.00000	30.000000
25%	8481.000000	8472.000000	8483.000000	1.000000	0.000000	45.000000	30.448598	39.000000	47.000000	48.000000	49.000000	39.000000	44.00000	48.000000
50%	8697.000000	8697.000000	8721.000000	2.000000	1.000000	52.000000	30.448598	49.000000	52.000000	53.000000	54.000000	46.000000	48.00000	52.000000
75%	9925.000000	9925.000000	9925.000000	3.000000	2.000000	62.000000	38.000000	55.000000	60.000000	63.000000	63.000000	53.000000	56.00000	59.000000
max	274581.000000	274581.000000	274581.000000	10.000000	9.000000	80.000000	77.000000	80.000000	80.000000	80.000000	80.000000	72.000000	72.00000	73.000000

	match_api_id	match_dt	country_id	country_name	league_id	league_name	season	home_team_api_id	away_team_api_id	team_api_id	...	cc_shooting	cc_shooting_class	cc_positioning_class	defence_pressure	defence_pressure_class	defence_aggression	defence_agression_class	defence_team_width	defence_team_width_class	defence_line_class
0	1994692	2015-09-26 00:00:00	15722	Poland	15722	Poland Ekstraklasa	2015/2016	2186	1601	1601	...	50	Normal	Organised	43	Medium	44	Press	49	Normal	Cover
1	1994701	2015-10-02 00:00:00	15722	Poland	15722	Poland Ekstraklasa	2015/2016	1601	8023	1601	...	50	Normal	Organised	43	Medium	44	Press	49	Normal	Cover
2	1994710	2015-10-17 00:00:00	15722	Poland	15722	Poland Ekstraklasa	2015/2016	2182	1601	1601	...	50	Normal	Organised	43	Medium	44	Press	49	Normal	Cover
3	1994716	2015-10-23 00:00:00	15722	Poland	15722	Poland Ekstraklasa	2015/2016	1601	8025	1601	...	50	Normal	Organised	43	Medium	44	Press	49	Normal	Cover
4	1994723	2015-10-30 00:00:00	15722	Poland	15722	Poland Ekstraklasa	2015/2016	10265	1601	1601	...	50	Normal	Organised	43	Medium	44	Press	49	Normal	Cover

	team_long_name	team_short_name	t_attr_start_dt	t_attr_end_dt	bu_play_speed	bu_play_speed_class	bu_play_dribbling	bu_play_dribbling_class	bu_play_passing	bu_play_passing_class	...	cc_positioning_class	defence_pressure	defence_pressure_class	defence_aggression	defence_agression_class	defence_team_width	defence_team_width_class	defence_line_class	win_team_id	wins_by_attribute_period
3	Ruch Chorzów	CHO	2015-09-10 00:00:00	2016-05-25 00:00:00	47	Balanced	48.000000	Normal	38	Mixed	...	Organised	43	Medium	44	Press	49	Normal	Cover	1601	8
28	Ruch Chorzów	CHO	2014-09-19 00:00:00	2015-09-10 00:00:00	53	Balanced	48.000000	Normal	38	Mixed	...	Organised	43	Medium	44	Press	49	Normal	Cover	1601	10
53	Ruch Chorzów	CHO	2013-09-20 00:00:00	2014-09-19 00:00:00	53	Balanced	30.448598	Little	55	Mixed	...	Organised	43	Medium	44	Press	49	Normal	Cover	1601	14
83	Ruch Chorzów	CHO	2012-02-22 00:00:00	2013-09-20 00:00:00	53	Balanced	30.448598	Little	55	Mixed	...	Organised	43	Medium	44	Press	49	Normal	Cover	1601	16
134	Ruch Chorzów	CHO	2011-02-22 00:00:00	2012-02-22 00:00:00	48	Balanced	30.448598	Little	51	Mixed	...	Organised	46	Medium	48	Press	50	Normal	Cover	1601	15