[Python] About the Faker Python Package

I first came across Faker, the Python package, about a year ago, when we urgently needed a dummy dataset for a Customer Profile Analysis. Since we wanted a large data sample of about 1M rows, building it in Excel was not realistic. (I think you still can do it, but gathering fake names and addresses while bearing with a constantly frozen screen made me give up.)

Back then I was new to Python (and I am still rusty now), and this was probably the first package I used other than pandas, NumPy, and matplotlib. It really is a treasure chest, especially when you need to create a large amount of dummy data for building dashboards or doing some analysis.

I want to briefly share the functions that helped me the most. For the functions I don’t cover in this post, see the Faker website or its GitHub repository.

1. Generate some fake personal information

a. Name, Job, Address

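from faker import Faker

# Create one generator and reuse it for every record
fake = Faker()
for _ in range(100):
    # Print a name, a job title, and a multi-line address for each fake person
    print(fake.name())
    print(fake.job())
    print(fake.address())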

Michael Small
Sports therapist
69847 Andre Center Apt. 376
New Christinaburgh, OH 33083
Denise Levine
Conservation officer, historic buildings
05094 Munoz Groves Apt. 651
New Robertfort, NV 90754
Jesse Gilbert
Broadcast engineer
2218 John Island Suite 777
Mooremouth, MD 52009
Jordan Miller
Audiological scientist
398 Brown Fort
North Andrea, MO 69783
Gregory Bentley
Manufacturing engineer
Unit 7555 Box 1437
DPO AP 34981

fake.name() returns a first name and last name combination. If you only need the first name or the last name, you can use:

fake.first_name() or fake.last_name()

Also, fake.first_name_female() and fake.first_name_male() generate a first name for the gender you specify.

fake.address() generates a complete address with street number, city, state, and zip code. It can also be broken down into just the parts you need with methods such as fake.street_address(), fake.city(), and fake.postcode().

2. Generate a complete personal profile

If you need to generate a personal profile, this might save you a lot of time.

There are two types of profiles you can create:

a. Complete Profile

This is a comprehensive fake profile that includes name, website, username (this helped me a lot when I was generating transactional data for a banking system), blood type, address, residence, company, birthday, gender, job, SSN, location (great for building a geographical customer distribution chart in Power BI or Tableau), and email.

Note: To demonstrate this better, I put the generated data into a DataFrame and set the display options to show all columns and rows.

from faker import Faker
import pandas as pd

fake = Faker()

# Each profile is a dict, so its keys become the DataFrame columns
profiles = [fake.profile() for _ in range(5)]
df = pd.DataFrame(profiles)

# Show all columns & rows for demo purposes
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

print(df)


                                             website         username  \
0  [https://wilkinson.net/, https://www.davis.com...            kgill   
1                         [http://www.williams.com/]  snyderchristine   
2  [https://www.cooley-gonzales.net/, http://www....           qparks   
3     [http://www.perry.com/, https://www.dunn.com/]      ronaldwhite   
4  [http://www.miller-krueger.com/, http://miller...  jenniferschultz   

                name blood_group  \
0  Christopher Jones          A+   
1        Amy Collins         AB-   
2      Willie Howard          B-   
3       Mark English         AB+   
4       Robin Bailey          B-   

                                           residence  \
0        14325 Tucker Dale\nLake Katherine, TN 19701   
1  57092 Morales Mountains Suite 061\nGallagherbo...   
2                   PSC 5460, Box 6713\nAPO AE 35463   
3  305 Bethany Key Apt. 046\nEast Wendyfurt, PA 8...   
4  725 Crawford Flats Apt. 566\nWest Nicole, OH 7...   

                         company  \
0        Garner, Lamb and Krause   
1              Mckinney and Sons   
2  Carlson, Mcfarland and Nguyen   
3                 Pearson-Walton   
4                   Ingram Group   

                                             address   birthdate sex  \
0    0246 Larry Via Suite 171\nAnthonytown, AR 00691  1938-09-07   M   
1            879 Campbell Glen\nClairebury, NC 51015  1964-08-07   F   
2  471 Ball Club Apt. 514\nNew Nicholasside, NM 9...  2010-11-11   M   
3  3817 Olson Way Suite 925\nSouth Michael, FL 22932  1934-09-27   M   
4         717 Eric Skyway\nEast Justinstad, DE 58337  1941-07-16   F   

                         job          ssn          current_location  \
0                 Counsellor  717-15-7333    (60.839217, 76.433347)   
1                 Cabin crew  308-31-6788  (-48.5682775, 36.649682)   
2           Ambulance person  428-24-5476   (-82.916594, 57.271374)   
3  Conference centre manager  415-91-3935   (36.159786, 158.210498)   
4           Engineer, mining  790-12-2326  (-64.4659925, 15.749643)   

                         mail  
0    johnsoncynthia@yahoo.com  
1  harrisjonathan@hotmail.com  
2         james54@hotmail.com  
3       tonifritz@hotmail.com  
4       ggonzalez@hotmail.com  

b. Simple Profile

This is a simpler profile that only includes username, name, birthday, gender, address, and email. It works the same way as the complete profile shown above; you just need to change fake.profile() to fake.simple_profile().

3. Fake Location / Coordinate

This is mostly used to generate fake locations and coordinates for building geographical dashboards or doing some analysis in ArcGIS.

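from faker import Faker

fake = Faker()
for _ in range(5):
    print(fake.latlng())        # random (latitude, longitude) pair of Decimals
    print(fake.local_latlng())  # a real place: (lat, lng, place name, country code, time zone)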

(Decimal('-70.975188'), Decimal('98.941459'))
('38.06084', '-97.92977', 'Hutchinson', 'US', 'America/Chicago')
(Decimal('-58.7434245'), Decimal('-78.066040'))
('44.73941', '-93.12577', 'Rosemount', 'US', 'America/Chicago')
(Decimal('-24.594395'), Decimal('92.413023'))
('34.06635', '-84.67837', 'Acworth', 'US', 'America/New_York')
(Decimal('59.935252'), Decimal('173.438339'))
('30.16688', '-96.39774', 'Brenham', 'US', 'America/Chicago')
(Decimal('32.8298765'), Decimal('-170.584877'))
('33.92946', '-116.97725', 'Beaumont', 'US', 'America/Los_Angeles')

fake.location_on_land() is self-explanatory: it randomly generates (“selects” is more accurate) a location on land and returns its coordinates together with the place name, country code, and time zone.

from faker import Faker

fake = Faker()
# Returns (latitude, longitude, place name, country code, time zone)
print(fake.location_on_land())

('55.54028', '89.20083', 'Sharypovo', 'RU', 'Asia/Krasnoyarsk')

Pretty cool, isn’t it?

4. Bank Information

This is another set of functions I used a lot. The available data types are still quite limited right now, but it is good to be able to generate some fake bank information when you need it.

The bank data you can generate includes bank account numbers, routing numbers, and SWIFT codes, which fit my need of creating a wire transaction reporting dashboard perfectly. You can check my dashboard to see how it works!

5. Others

There are many other functions I didn’t cover here, such as phone numbers, companies, colors, lorem text, or even a barcode! If you are interested or need to create some dummy data, this is definitely a cool tool you are going to love.
