Table of Contents

Matches to Coplays

Problem description

Consider the dataframe

   match_id player_id
0         0         a
1         0         b
2         0         c
3         1         a
4         1         b
5         2         c

which shows the players played in a given match. For example, it shows that [a, b, c] played in match 0, [a, b] played in match 1, and c alone played in match 2.

We want to get a coplay count

  player_id1 player_id2  size
0          a          a     2
1          a          b     2
2          a          c     1
3          b          a     2
4          b          b     2
5          b          c     1
6          c          a     1
7          c          b     1
8          c          c     2

which shows

Solution

Inner merge the initial df with itself on match_id. Then group by on player_1 and player_2. Aggregate using size() to get the weighted-edges dataframe.

$ ipython
Python 3.10.9 | packaged by conda-forge | (main, Jan 11 2023, 15:15:40) [MSC v.1916 64 bit (AMD64)]
IPython 8.8.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]:
import pandas as pd

a, b, c = 'a', 'b', 'c'

df = pd.DataFrame(
{
    'match_id':  [0, 0, 0, 1, 1, 2],
    'player_id': [a, b, c, a, b, c],
})
print(df)
   match_id player_id
0         0         a
1         0         b
2         0         c
3         1         a
4         1         b
5         2         c

Do an inner merge on itself

In [2]:
df.merge(df, on='match_id', how='inner')
Out[2]:
    match_id player_id_x player_id_y
0          0           a           a
1          0           a           b
2          0           a           c
3          0           b           a
4          0           b           b
5          0           b           c
6          0           c           a
7          0           c           b
8          0           c           c
9          1           a           a
10         1           a           b
11         1           b           a
12         1           b           b
13         2           c           c

We want the columns to be player_id1, player_id2 instead of player_id_x, player_id_y

In [3]:
df.merge(df, on='match_id', how='inner', suffixes=('1', '2'))
Out[3]:
    match_id player_id1 player_id2
0          0          a          a
1          0          a          b
2          0          a          c
3          0          b          a
4          0          b          b
5          0          b          c
6          0          c          a
7          0          c          b
8          0          c          c
9          1          a          a
10         1          a          b
11         1          b          a
12         1          b          b
13         2          c          c

Groupby on player_id1, player_id2 and get the size of each group

In [9]:
df.merge(df, on='match_id', how='inner', suffixes=('1', '2'))\
.groupby(['player_id1', 'player_id2'])\
.size()
Out[9]:
player_id1  player_id2
a           a             2
            b             2
            c             1
b           a             2
            b             2
            c             1
c           a             1
            b             1
            c             2
dtype: int64

We want player_id1 and player_id2 as columns instead of as index.

In [11]:
df.merge(df, on='match_id', how='inner', suffixes=('1', '2'))\
.groupby(['player_id1', 'player_id2'], as_index=False)\
.size()
Out[11]:
  player_id1 player_id2  size
0          a          a     2
1          a          b     2
2          a          c     1
3          b          a     2
4          b          b     2
5          b          c     1
6          c          a     1
7          c          b     1
8          c          c     2

See also: