===== Matches to Coplays =====
==== Problem description ====
Consider the dataframe
match_id player_id
0 0 a
1 0 b
2 0 c
3 1 a
4 1 b
5 2 c
which shows the players played in a given match. For example, it shows that [a, b, c] played in match 0, [a, b] played in match 1, and c alone played in match 2.
We want to get a coplay count
player_id1 player_id2 size
0 a a 2
1 a b 2
2 a c 1
3 b a 2
4 b b 2
5 b c 1
6 c a 1
7 c b 1
8 c c 2
which shows
* the number of games each player played with another player when player_id1 $ \neq $ player_id2 (ex:- a and b played in 2 games, a and c played in 1 game)
* the number of games a player played if player_id1 $ == $ player_id2
==== Solution ====
Inner merge the initial ''df'' with itself on ''match_id''. Then group by on ''player_1'' and ''player_2''. Aggregate using size() to get the weighted-edges dataframe.
$ ipython
Python 3.10.9 | packaged by conda-forge | (main, Jan 11 2023, 15:15:40) [MSC v.1916 64 bit (AMD64)]
IPython 8.8.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]:
import pandas as pd
a, b, c = 'a', 'b', 'c'
df = pd.DataFrame(
{
'match_id': [0, 0, 0, 1, 1, 2],
'player_id': [a, b, c, a, b, c],
})
print(df)
match_id player_id
0 0 a
1 0 b
2 0 c
3 1 a
4 1 b
5 2 c
Do an inner merge on itself
In [2]:
df.merge(df, on='match_id', how='inner')
Out[2]:
match_id player_id_x player_id_y
0 0 a a
1 0 a b
2 0 a c
3 0 b a
4 0 b b
5 0 b c
6 0 c a
7 0 c b
8 0 c c
9 1 a a
10 1 a b
11 1 b a
12 1 b b
13 2 c c
We want the columns to be player_id1, player_id2 instead of player_id_x, player_id_y
In [3]:
df.merge(df, on='match_id', how='inner', suffixes=('1', '2'))
Out[3]:
match_id player_id1 player_id2
0 0 a a
1 0 a b
2 0 a c
3 0 b a
4 0 b b
5 0 b c
6 0 c a
7 0 c b
8 0 c c
9 1 a a
10 1 a b
11 1 b a
12 1 b b
13 2 c c
Groupby on player_id1, player_id2 and get the size of each group
In [9]:
df.merge(df, on='match_id', how='inner', suffixes=('1', '2'))\
.groupby(['player_id1', 'player_id2'])\
.size()
Out[9]:
player_id1 player_id2
a a 2
b 2
c 1
b a 2
b 2
c 1
c a 1
b 1
c 2
dtype: int64
We want player_id1 and player_id2 as columns instead of as index.
In [11]:
df.merge(df, on='match_id', how='inner', suffixes=('1', '2'))\
.groupby(['player_id1', 'player_id2'], as_index=False)\
.size()
Out[11]:
player_id1 player_id2 size
0 a a 2
1 a b 2
2 a c 1
3 b a 2
4 b b 2
5 b c 1
6 c a 1
7 c b 1
8 c c 2
See also:
* https://stackoverflow.com/questions/75537816/transform-a-dataframe-for-network-analysis-using-pandas
* It is worth going through this page in its entirety.
* I got the answer from here. I just added some intermediate steps to understand what is going on behind the scenes.
* It shows some alternative solutions which are worth exploring.
* It shows how to get the adjacency matrix
* It also shows how to visualize the result with some cool graphs produced by the networkx package.