{"id":285,"date":"2021-07-24T00:11:55","date_gmt":"2021-07-24T00:11:55","guid":{"rendered":"https:\/\/leonsworkshop.com\/?p=285"},"modified":"2021-08-20T03:15:04","modified_gmt":"2021-08-20T03:15:04","slug":"python-about-faker-python-package","status":"publish","type":"post","link":"https:\/\/leonsworkshop.com\/?p=285","title":{"rendered":"[Python] About Faker python package"},"content":{"rendered":"\n<p>I first came across Faker, the Python package, about a year ago, when we urgently needed a dummy dataset for a Customer Profile Analysis. Since we wanted a large data sample of about 1M rows, doing it in Excel was not realistic. (I think you still could, but gathering fake names and addresses while bearing with a constantly frozen screen made me give up.)<\/p>\n\n\n\n<p>At that time I was new to Python (and I am still rusty now). Faker was probably the first package I used other than pandas, NumPy, and Matplotlib, and it turned out to be a real treasure box, especially when you need a large amount of dummy data for building dashboards or doing analysis.<\/p>\n\n\n\n<p>I want to briefly share the functions that helped me the most. For the many functions not covered in this post, check their <a href=\"https:\/\/faker.readthedocs.io\/en\/master\/index.html\">website<\/a> or <a href=\"https:\/\/github.com\/joke2k\/faker\">GitHub  <img loading=\"lazy\" decoding=\"async\" width=\"17\" height=\"17\" class=\"wp-image-251\" style=\"width: 17px;\" src=\"https:\/\/leonsworkshop.com\/wp-content\/uploads\/2021\/01\/github-fill.png\" alt=\"\" srcset=\"https:\/\/leonsworkshop.com\/wp-content\/uploads\/2021\/01\/github-fill.png 240w, https:\/\/leonsworkshop.com\/wp-content\/uploads\/2021\/01\/github-fill-150x150.png 150w\" sizes=\"auto, (max-width: 17px) 100vw, 17px\" \/><\/a>.<\/p>\n\n\n\n<p>1. Generate fake personal information<\/p>\n\n\n\n<p>   a. 
Name, Job, Address<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from faker import Faker\n\nfake = Faker()\n\n# Print a fake name, job, and address for 5 people\nfor _ in range(5):\n    print(fake.name())\n    print(fake.job())\n    print(fake.address())\n\nMichael Small\nSports therapist\n69847 Andre Center Apt. 376\nNew Christinaburgh, OH 33083\nDenise Levine\nConservation officer, historic buildings\n05094 Munoz Groves Apt. 651\nNew Robertfort, NV 90754\nJesse Gilbert\nBroadcast engineer\n2218 John Island Suite 777\nMooremouth, MD 52009\nJordan Miller\nAudiological scientist\n398 Brown Fort\nNorth Andrea, MO 69783\nGregory Bentley\nManufacturing engineer\nUnit 7555 Box 1437\nDPO AP 34981<\/code><\/pre>\n\n\n\n<p><em>fake.name()<\/em> returns a first name and last name combination. If you only need a first name or a last name, use:<\/p>\n\n\n\n<p><em>fake.first_name()<\/em> or <em>fake.last_name()<\/em><\/p>\n\n\n\n<p>Similarly, <em>fake.first_name_female()<\/em> and <em>fake.first_name_male()<\/em> generate first names for the gender you specify.<\/p>\n\n\n\n<p><em>fake.address()<\/em> generates a complete address with street address, city, state, and ZIP code. Each part can also be generated on its own, for example with <em>fake.street_address()<\/em>, <em>fake.city()<\/em>, <em>fake.state()<\/em>, or <em>fake.zipcode()<\/em>.<\/p>\n\n\n\n<p>2. Generate complete personal profiles<\/p>\n\n\n\n<p>If you need to generate personal profiles, this can save a lot of time.<\/p>\n\n\n\n<p>There are two types of profiles you can create:<\/p>\n\n\n\n<p>   a. 
Complete Profile<\/p>\n\n\n\n<p>This is a comprehensive fake profile that includes name, website, username (this helped me a lot when I generated banking system transactional data), blood type, residence, company, address, birthdate, sex, job, SSN, current location (great for creating a geographical customer distribution chart in Power BI or Tableau), and email.<\/p>\n\n\n\n<p><em>Note: For a clearer demo, I put the generated data into a pandas DataFrame and set the display options to show all columns and rows.<\/em><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from faker import Faker\nimport pandas as pd\n\nfake = Faker()\n\n# Generate 5 complete profiles; each profile is a dict\nprofiles = &#91;fake.profile() for _ in range(5)]\n\n# The dict keys become the DataFrame columns\ndf = pd.DataFrame(profiles)\n\n# Show all columns &amp; rows for demo purposes\npd.set_option('display.max_columns', None)\npd.set_option('display.max_rows', None)\n\nprint(df)\n\n\n                                             website         username  \\\n0  &#91;https:\/\/wilkinson.net\/, https:\/\/www.davis.com...            kgill   \n1                         &#91;http:\/\/www.williams.com\/]  snyderchristine   \n2  &#91;https:\/\/www.cooley-gonzales.net\/, http:\/\/www....           qparks   \n3     &#91;http:\/\/www.perry.com\/, https:\/\/www.dunn.com\/]      ronaldwhite   \n4  &#91;http:\/\/www.miller-krueger.com\/, http:\/\/miller...  
jenniferschultz   \n\n                name blood_group  \\\n0  Christopher Jones          A+   \n1        Amy Collins         AB-   \n2      Willie Howard          B-   \n3       Mark English         AB+   \n4       Robin Bailey          B-   \n\n                                           residence  \\\n0        14325 Tucker Dale\\nLake Katherine, TN 19701   \n1  57092 Morales Mountains Suite 061\\nGallagherbo...   \n2                   PSC 5460, Box 6713\\nAPO AE 35463   \n3  305 Bethany Key Apt. 046\\nEast Wendyfurt, PA 8...   \n4  725 Crawford Flats Apt. 566\\nWest Nicole, OH 7...   \n\n                         company  \\\n0        Garner, Lamb and Krause   \n1              Mckinney and Sons   \n2  Carlson, Mcfarland and Nguyen   \n3                 Pearson-Walton   \n4                   Ingram Group   \n\n                                             address   birthdate sex  \\\n0    0246 Larry Via Suite 171\\nAnthonytown, AR 00691  1938-09-07   M   \n1            879 Campbell Glen\\nClairebury, NC 51015  1964-08-07   F   \n2  471 Ball Club Apt. 514\\nNew Nicholasside, NM 9...  2010-11-11   M   \n3  3817 Olson Way Suite 925\\nSouth Michael, FL 22932  1934-09-27   M   \n4         717 Eric Skyway\\nEast Justinstad, DE 58337  1941-07-16   F   \n\n                         job          ssn          current_location  \\\n0                 Counsellor  717-15-7333    (60.839217, 76.433347)   \n1                 Cabin crew  308-31-6788  (-48.5682775, 36.649682)   \n2           Ambulance person  428-24-5476   (-82.916594, 57.271374)   \n3  Conference centre manager  415-91-3935   (36.159786, 158.210498)   \n4           Engineer, mining  790-12-2326  (-64.4659925, 15.749643)   \n\n                         mail  \n0    johnsoncynthia@yahoo.com  \n1  harrisjonathan@hotmail.com  \n2         james54@hotmail.com  \n3       tonifritz@hotmail.com  \n4       ggonzalez@hotmail.com  <\/code><\/pre>\n\n\n\n<p>    b. 
Simple Profile<\/p>\n\n\n\n<p>This is a simpler profile that only includes username, name, sex, address, email, and birthdate. The code is the same as for the complete profile above; just change <em>fake.profile()<\/em> to <em>fake.simple_profile()<\/em>.<\/p>\n\n\n\n<p>3. Fake Locations \/ Coordinates<\/p>\n\n\n\n<p>These functions generate fake locations and coordinates, which come in handy for building geographical dashboards or doing analysis in ArcGIS.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from faker import Faker\n\nfake = Faker()\nfor _ in range(5):\n    # Random (latitude, longitude) pair anywhere on the globe\n    print(fake.latlng())\n    # A real place: (latitude, longitude, place name, country code, timezone)\n    print(fake.local_latlng())\n\n(Decimal('-70.975188'), Decimal('98.941459'))\n('38.06084', '-97.92977', 'Hutchinson', 'US', 'America\/Chicago')\n(Decimal('-58.7434245'), Decimal('-78.066040'))\n('44.73941', '-93.12577', 'Rosemount', 'US', 'America\/Chicago')\n(Decimal('-24.594395'), Decimal('92.413023'))\n('34.06635', '-84.67837', 'Acworth', 'US', 'America\/New_York')\n(Decimal('59.935252'), Decimal('173.438339'))\n('30.16688', '-96.39774', 'Brenham', 'US', 'America\/Chicago')\n(Decimal('32.8298765'), Decimal('-170.584877'))\n('33.92946', '-116.97725', 'Beaumont', 'US', 'America\/Los_Angeles')<\/code><\/pre>\n\n\n\n<p><em>fake.latlng()<\/em> returns a random coordinate pair that can land anywhere, including in the ocean, while <em>fake.local_latlng()<\/em> picks a real place (in the US by default) and also returns its name, country code, and timezone.<\/p>\n\n\n\n<p><em>fake.location_on_land()<\/em> is self-explanatory: it randomly generates (&#8220;selects&#8221; is more accurate) a location on land and returns its coordinates along with the place name, country code, and timezone.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from faker import Faker\n\nfake = Faker()\nprint(fake.location_on_land())\n\n('55.54028', '89.20083', 'Sharypovo', 'RU', 'Asia\/Krasnoyarsk')<\/code><\/pre>\n\n\n\n<iframe src=\"https:\/\/www.google.com\/maps\/embed?pb=!1m18!1m12!1m3!1d2257.545818488071!2d89.19864131622134!3d55.54027998049697!2m3!1f0!2f0!3f0!3m2!1i1024!2i768!4f13.1!3m3!1m2!1s0x0%3A0x0!2zNTXCsDMyJzI1LjAiTiA4OcKwMTInMDMuMCJF!5e0!3m2!1sen!2sus!4v1627438109430!5m2!1sen!2sus\" width=\"400\" 
height=\"300\" style=\"border:0;\" allowfullscreen=\"\" loading=\"lazy\"><\/iframe>\n\n\n\n<p>Pretty cool, isn&#8217;t it?<\/p>\n\n\n\n<p>4. Bank Information<\/p>\n\n\n\n<p>This is another provider I use a lot. The available data types are still fairly limited, but it is good to be able to generate fake bank information whenever you need it.<\/p>\n\n\n\n<p>The bank data you can generate includes bank account numbers (<em>fake.bban()<\/em> or <em>fake.iban()<\/em>), routing numbers (<em>fake.aba()<\/em>), and SWIFT codes (<em>fake.swift()<\/em>), which perfectly fit my need for a wire transaction reporting dashboard. You can check my dashboard to see how it works!<\/p>\n\n\n\n<p>5. Others<\/p>\n\n\n\n<p>There are many other functions I didn&#8217;t cover here, such as phone numbers, company names, colors, lorem ipsum text, and even barcodes! If you need to create dummy data, this is definitely a tool you are going to love.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I first came across Faker, the Python package, about a year ago, when we urgently needed a dummy dataset for a Customer Profile Analysis. Since we wanted a large data sample of about 1M rows, doing it in Excel was not realistic. 
( I think you still can [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":446,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[11,12],"tags":[15,13],"class_list":["post-285","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-analytics","category-python-data-analytics","tag-dataset","tag-python"],"jetpack_featured_media_url":"https:\/\/leonsworkshop.com\/wp-content\/uploads\/2021\/08\/Slide3.png","_links":{"self":[{"href":"https:\/\/leonsworkshop.com\/index.php?rest_route=\/wp\/v2\/posts\/285","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/leonsworkshop.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/leonsworkshop.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/leonsworkshop.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/leonsworkshop.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=285"}],"version-history":[{"count":2,"href":"https:\/\/leonsworkshop.com\/index.php?rest_route=\/wp\/v2\/posts\/285\/revisions"}],"predecessor-version":[{"id":287,"href":"https:\/\/leonsworkshop.com\/index.php?rest_route=\/wp\/v2\/posts\/285\/revisions\/287"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/leonsworkshop.com\/index.php?rest_route=\/wp\/v2\/media\/446"}],"wp:attachment":[{"href":"https:\/\/leonsworkshop.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=285"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/leonsworkshop.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=285"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/leonsworkshop.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post
=285"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}