Drawing Semantic Networks with Python and the NetworkX Library


import networkx as nx
G = nx.Graph()
G.add_node("kids",repeat=2)
G.add_node("netflix",repeat=4)
G.add_node("strategy",repeat=1)
G.add_node("app",repeat=1)
G.add_node("chinese",repeat=2)
G.add_node("movie",repeat=1)
G.add_node("language",repeat=2)
G.add_node("alibaba",repeat=1)
G.add_edges_from([
                  ('kids','netflix'),
                  ('kids','suat'),
                  ('kids','app'),
                  ('netflix','app'),
                  ('netflix','movie'),
                  ('netflix','strategy'),
                  ('strategy','movie'),
                  ('chinese','language'),
                  ('netflix','language'),
                  ('strategy','chinese'),
                  ('netflix','alibaba')
                 ])

# color the nodes according to their frequency (the 'repeat' attribute)
# https://stackoverflow.com/questions/27030473/how-to-set-colors-for-nodes-in-networkx-python
color_map = []
for node in G:
    # https://stackoverflow.com/questions/13698352/storing-and-accessing-node-attributes-python-networkx
    # G.nodes replaces G.node, which was removed in NetworkX 2.4;
    # 'suat' is created by an edge and has no 'repeat' attribute, so default to 0
    node_repeat = G.nodes[node].get('repeat', 0)
    if node_repeat > 2:
        color_map.append('red')
    else:
        color_map.append('orange')

import matplotlib.pyplot as plt
pos = nx.circular_layout(G)
nx.draw(G, pos=pos, node_color=color_map, with_labels=True)
plt.show()
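
A node that only appears in an edge list (like 'suat' above) is created without a 'repeat' attribute. As a minimal sketch, assuming NetworkX 2.x or later, the color list can also be built in one pass with a default value for such nodes. The default of 0 and the small three-node graph are assumptions made only for this illustration:

```python
import networkx as nx

G = nx.Graph()
G.add_node("netflix", repeat=4)
G.add_node("kids", repeat=2)
G.add_edge("kids", "suat")  # 'suat' is created implicitly, without 'repeat'

# G.nodes(data="repeat", default=0) yields (node, value) pairs in node order,
# substituting 0 (an assumed default) when the attribute is missing
color_map = ["red" if repeat > 2 else "orange"
             for _, repeat in G.nodes(data="repeat", default=0)]
```

The resulting list has one entry per node, in the same order that nx.draw iterates over the graph, so it can be passed as node_color directly.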

Data Wrangling Cases: Case #1: Emre’s Language Score Problem


user         score_type  score
jimmy        toefl       80
catherine    ielts       33
fitzwilliam  toefl       90
emre         ielts       100
emre         toefl       99

Consider a data set like the one above. Each user can have a TOEFL or an IELTS score, and some users, like ‘Emre’, have both. Suppose we want to convert this data set into the form below:

user         ielts  toefl
jimmy        None   80
catherine    33     None
fitzwilliam  None   90
emre         100    None
emre         None   99

What did we do? We turned the two values of score_type into columns and moved each score into the matching new column. During this conversion, if a person has not taken a given type of exam we put ‘None’, and if a person has taken more than one exam each score goes in its own row.

How can we perform this conversion? Manually? That works for a few rows. But what do we do when there are 40,000 rows? The GG should answer this question? 🙂

Use Python:

import pandas as pd

# the file is pipe-separated with the columns user, score_type, score
df = pd.read_csv("/home/satan/Masaüstü/data.csv",sep="|")

# the distinct exam types can also be collected with df['score_type'].unique().tolist()
score_factors = ['toefl','ielts']

# add one empty column per exam type
for factor in score_factors:
    df[factor] = None

# distribute the values to the added columns
j = 3 # the added columns start at position 3
score_type_column_nr = 1 # position of the score_type column
score_column_nr = 2 # position of the score column
for factor in score_factors:
    # iterate over row positions, because iloc is position-based
    for index in range(len(df)):
        stype = df.iloc[index,score_type_column_nr]
        if factor in stype:
            df.iloc[index,j] = df.iloc[index,score_column_nr]
    j = j+1
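
The same distribution can also be written without a row-by-row loop. The following is only a sketch: it assumes the columns are named user, score_type and score, and it builds the sample rows inline instead of reading the CSV file. The name result is hypothetical.

```python
import pandas as pd

# sample rows from the table above, built inline (assumed column names)
df = pd.DataFrame({
    "user": ["jimmy", "catherine", "fitzwilliam", "emre", "emre"],
    "score_type": ["toefl", "ielts", "toefl", "ielts", "toefl"],
    "score": [80, 33, 90, 100, 99],
})

result = df[["user"]].copy()
for factor in ["ielts", "toefl"]:
    matches = df["score_type"] == factor   # boolean mask: rows of this exam type
    result[factor] = None                  # new column, empty everywhere
    result.loc[matches, factor] = df.loc[matches, "score"]

print(result)
```

Each exam type is handled with one boolean mask, so the work per column is done by pandas rather than by a Python loop over 40,000 rows.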

That’s all. Can you solve this problem with your veteran Excel? I think not 🙂 If you can, please explain how in the comments below. Don’t forget to add how much time you spent solving the problem, including googling.

As for me, I didn’t open the browser and wrote it within 15 minutes! Huh?

Note: This scenario and its answer are original and were first published on Dr. Suat ATAN’s blog.