import networkx as nx
import matplotlib.pyplot as plt

G = nx.Graph()
G.add_node("kids", repeat=2)
G.add_node("netflix", repeat=4)
G.add_node("strategy", repeat=1)
G.add_node("app", repeat=1)
G.add_node("chinese", repeat=2)
G.add_node("movie", repeat=1)
G.add_node("language", repeat=2)
G.add_node("alibaba", repeat=1)
G.add_edges_from([
    ('kids', 'netflix'),
    ('kids', 'suat'),
    ('kids', 'app'),
    ('netflix', 'app'),
    ('netflix', 'movie'),
    ('netflix', 'strategy'),
    ('strategy', 'movie'),
    ('chinese', 'language'),
    ('netflix', 'language'),
    ('strategy', 'chinese'),
    ('netflix', 'alibaba')
])

# Colorize the nodes according to their frequency
# https://stackoverflow.com/questions/27030473/how-to-set-colors-for-nodes-in-networkx-python
color_map = []
for node in G:
    # https://stackoverflow.com/questions/13698352/storing-and-accessing-node-attributes-python-networkx
    # 'suat' is created implicitly by an edge and has no 'repeat' attribute, so default to 0
    node_repeat = G.nodes[node].get('repeat', 0)
    if node_repeat > 2:
        color_map.append('red')
    else:
        color_map.append('orange')

pos = nx.circular_layout(G)
nx.draw(G, node_color=color_map, with_labels=True, pos=pos)
plt.show()
Consider a data set like the one above. Any user can have a TOEFL or an IELTS score, and some of them, like ‘Emre’, have both. Suppose we want to convert this data set into the form below:
What did we do? We turned the two factor levels of the score_type column into columns of their own and moved the corresponding values into the newly generated score columns. During this conversion, if a person has no exam of a given type we put ‘None’, and if a person has more than one exam, each score goes in a separate row.
How can we perform this conversion? Manually? Perfect for a few rows! But what do we do when there are 40,000 rows? Should the GG answer this question? 🙂
Use the Python language:
import pandas as pd

df = pd.read_csv("/home/satan/Masaüstü/data.csv", sep="|")

# Alternatively, derive the distinct factors of the score_type column
# with df['colname'].unique().tolist()
score_factors = ['toefl', 'ielts']
for factor in score_factors:
    df[factor] = None

# Distribute the values to the added columns
score_type_column_nr = 1  # position of the score_type column
score_column_nr = 2       # position of the column holding the score value
first_new_column_nr = 3   # the added columns start at the 3rd position

for j, factor in enumerate(score_factors, start=first_new_column_nr):
    for index in range(len(df)):
        stype = df.iloc[index, score_type_column_nr]
        # A missing score_type is read as NaN (a float), so check the type first
        if isinstance(stype, str) and factor in stype:
            df.iloc[index, j] = df.iloc[index, score_column_nr]
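For 40,000 rows, the same idea can probably be expressed without the row-by-row loop. The following is only a sketch under assumptions: the sample rows and the column names name and score are hypothetical stand-ins for the real data.csv, and unmatched cells come out as NaN rather than None.

```python
import pandas as pd

# Hypothetical sample rows; the real data.csv is not reproduced here.
df = pd.DataFrame({
    "name": ["Emre", "Emre", "Ayse", "Can"],
    "score_type": ["toefl", "ielts", "ielts", None],
    "score": [95, 7.0, 6.5, None],
})

# Derive the factor list from the data instead of hard-coding it.
score_factors = df["score_type"].dropna().unique().tolist()

for factor in score_factors:
    # Same substring test as the loop version; rows without a score_type never match.
    matches = df["score_type"].str.contains(factor, na=False, regex=False)
    # Keep the score where the row matches, leave NaN everywhere else.
    df[factor] = df["score"].where(matches)

print(df)
```

Each factor is handled in one pass over the whole column, so the running time should depend mainly on the number of factors rather than on the number of rows.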
That’s all. Can you solve this problem with your veteran Excel? I think not 🙂 If you can, please explain how in the comments below. Don’t forget to add how much time you spent solving the problem, including googling.
As for me, I didn’t open the browser and wrote it within 15 minutes! Huh?
Note: This scenario and its answer are original and were first published on Dr. Suat ATAN’s blog.