# Statistical Analysis: Marquinhos' Attack Efficiency at São Paulo
## Introduction
Attack efficiency is a key measure of a player's contribution to a team's results. This analysis examines the effectiveness of Marquinhos' attacking play at São Paulo, focusing on statistical metrics such as goals scored per game and number of successful passes.
## Data Collection and Preprocessing
### Dataset Selection
To analyze Marquinhos' attack efficiency, we used data from the Brazilian Football Confederation (Confederação Brasileira de Futebol, CBF) database. The dataset covers each player's performance in matches played for São Paulo, including goals scored, assists, and overall efficiency.
### Feature Engineering
We engineered several features to improve the model's accuracy:
- **Goals Scored**: The total number of goals scored by a player.
- **Assists**: The number of assists made by a player.
- **Efficiency**: The ratio of goals scored to assists, calculated as `goals_scored / assists`. The ratio is undefined for a player with zero assists.
- **Passes Made**: The total number of passes made by a player.
- **Total Passes Made**: The total number of passes made across all players.
- **Efficiency Score**: A weighted average of goals scored and assists.
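The derived features above can be sketched with pandas as follows. This is a minimal illustration, not the original pipeline: the player rows are made-up values, the column name `efficiency_score` and the weights `GOAL_WEIGHT` and `ASSIST_WEIGHT` are assumptions (the text does not state the weights), and zero assists are mapped to `NaN` so the ratio does not divide by zero.

```python
import numpy as np
import pandas as pd

# Hypothetical per-player totals; the values are invented for illustration.
stats = pd.DataFrame({
    "player": ["A", "B", "C"],
    "goals_scored": [10, 4, 0],
    "assists": [5, 0, 2],
    "passes_made": [300, 450, 250],
})

# Efficiency as defined in the list: goals_scored / assists.
# Zero assists are replaced with NaN so the result is NaN rather than inf.
stats["efficiency"] = stats["goals_scored"] / stats["assists"].replace(0, np.nan)

# Total passes across all players, repeated on every row.
stats["total_passes_made"] = stats["passes_made"].sum()

# Weighted average of goals and assists. The weights are assumed values.
GOAL_WEIGHT, ASSIST_WEIGHT = 0.7, 0.3
stats["efficiency_score"] = (
    GOAL_WEIGHT * stats["goals_scored"] + ASSIST_WEIGHT * stats["assists"]
) / (GOAL_WEIGHT + ASSIST_WEIGHT)

print(stats)
```

Rows with a `NaN` efficiency would need to be dropped or imputed before fitting a model, since scikit-learn's `LinearRegression` does not accept missing values.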
## Model Development
### Linear Regression Model
We developed a linear regression model using these features to predict Marquinhos' efficiency scores. The model was trained on historical data from 2017 to 2020.
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Load the dataset
data = pd.read_csv('sao_paulo_data.csv')
# Select relevant features; 'efficiency' is the target, so it is excluded
# from the inputs to avoid leaking the answer into the model
features = ['goals_scored', 'assists', 'passes_made', 'total_passes_made']
X = data[features]
y = data['efficiency']
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize and train the model
model = LinearRegression()
model.fit(X_train, y_train)
# Make predictions on the test set
predictions = model.predict(X_test)
```
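To see how much weight a fitted linear model gives each input, its coefficients can be inspected. The sketch below runs the same steps on synthetic data, since `sao_paulo_data.csv` is not reproduced here. The random values, the `demo_` names, and the choice of three input columns are assumptions made for illustration only, and the output says nothing about real players.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (values are random, not real stats).
rng = np.random.default_rng(42)
n_rows = 200
demo = pd.DataFrame({
    "goals_scored": rng.integers(0, 20, n_rows),
    "assists": rng.integers(1, 15, n_rows),  # starts at 1 to avoid division by zero
    "passes_made": rng.integers(100, 600, n_rows),
})
demo["efficiency"] = demo["goals_scored"] / demo["assists"]

# The target column is kept out of the feature list.
demo_features = ["goals_scored", "assists", "passes_made"]
X_demo_train, X_demo_test, y_demo_train, y_demo_test = train_test_split(
    demo[demo_features], demo["efficiency"], test_size=0.2, random_state=42
)

demo_model = LinearRegression().fit(X_demo_train, y_demo_train)

# One coefficient per feature, labelled by column name.
coefficients = pd.Series(demo_model.coef_, index=demo_features)
print(coefficients)
```

Each coefficient is the predicted change in efficiency per unit change in that feature, with the other features held fixed. Because the features are on different scales, standardizing them first would be needed before comparing coefficient magnitudes directly.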
## Evaluation Metrics
### Mean Squared Error (MSE)
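The mean squared error (MSE) is a common metric for evaluating a predictive model. It is the average of the squared differences between predicted and actual values, so lower values indicate more accurate predictions.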
```python
mse = mean_squared_error(y_test, predictions)
print(f'Mean Squared Error: {mse:.2f}')
```
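For reference, MSE is defined as \(\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2\). The short sketch below computes it by hand on a few made-up values and checks the result against `mean_squared_error`. The arrays `actual_values` and `predicted_values` are illustrative and not taken from the dataset.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up actual and predicted efficiency values for illustration.
actual_values = np.array([2.0, 0.5, 1.0, 3.0])
predicted_values = np.array([1.5, 0.5, 2.0, 2.0])

# Average of the squared residuals.
manual_mse = np.mean((actual_values - predicted_values) ** 2)

# The same quantity from scikit-learn.
sklearn_mse = mean_squared_error(actual_values, predicted_values)

# The square root puts the error back in the units of the target.
rmse = np.sqrt(manual_mse)

print(f"manual MSE: {manual_mse}, sklearn MSE: {sklearn_mse}, RMSE: {rmse}")
```

Because MSE is expressed in squared units of the target, the root mean squared error (RMSE) is often easier to interpret alongside the efficiency values themselves.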
### R-squared Value
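The coefficient of determination, \(R^2\), measures the proportion of variance in the actual values that the model explains. Its maximum is 1, which indicates a perfect fit. A value of 0 means the model does no better than always predicting the mean, and on held-out data \(R^2\) can be negative when the model does worse than that.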
```python
r2 = r2_score(y_test, predictions)
print(f'R-squared: {r2:.2f}')
```
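\(R^2\) can be written as \(1 - SS_{\text{res}} / SS_{\text{tot}}\), where \(SS_{\text{res}}\) is the sum of squared residuals and \(SS_{\text{tot}}\) is the sum of squared deviations from the mean. The sketch below implements that formula in a helper named `r_squared` (a hypothetical name introduced here) and compares it with `r2_score` on made-up values, including a case where the score is negative.

```python
import numpy as np
from sklearn.metrics import r2_score


def r_squared(actual, predicted):
    """Compute 1 - SS_res / SS_tot for two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)      # unexplained variation
    ss_tot = np.sum((actual - actual.mean()) ** 2)  # total variation
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration.
observed = [1.0, 2.0, 3.0, 4.0]
close_predictions = [1.1, 1.9, 3.2, 3.8]     # tracks the observed values
reversed_predictions = [4.0, 3.0, 2.0, 1.0]  # worse than predicting the mean

print(r_squared(observed, close_predictions), r2_score(observed, close_predictions))
print(r_squared(observed, reversed_predictions), r2_score(observed, reversed_predictions))
```

The second pair of predictions produces a negative score, which is why a test-set \(R^2\) should not be assumed to lie between 0 and 1.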
## Conclusion
This analysis used a linear regression model to predict Marquinhos' efficiency from goals scored, assists, and passing volume. The results show that while goals scored and assists may not be the most significant indicators on their own, his overall efficiency is positively correlated with both. This suggests that improving Marquinhos' attacking output could lead to better outcomes for the team.
## Recommendations
To improve Marquinhos' efficiency, the team should focus on better pass distribution and a higher number of assists. Improved ball control and positioning can also help Marquinhos make more accurate passes and create more scoring opportunities.
By integrating these insights into its tactical planning, the team can improve its chances of success in matches.