Growing up I had the same dentist from childhood to adulthood. The office was run by Dr Chung (in Vietnamese I called him Bác Sĩ Chung, which translates directly to Dr Chung), with his sister managing the office.
The office was in Garden Grove, in between the Korean and Vietnamese districts. Walking in I would always smell the incense from an herbal shop next door.
The office looked like it was from the 1970s. They had this really old but comfortable couch and constantly played oldies music from the local radio station.
I distinctly recall being afraid as a kid going in, and somehow the office manager convinced me if I did a good job with a cleaning I could someday get the dentist’s chair. With my warped sense of rationalizing things, it all made sense and I calmed down.
When I was in early high school Dr Chung said, “you should think about getting braces and fixing your underbite.” I really had no issues with my teeth so far, but I entertained his proposal. I went to an Orthodontist consult.
The Orthodontist I saw was in the heart of Little Saigon – the Vietnamese area of Westminster. I waited in the reception area for a bit before the Orthodontist brought me into his office.
He asked me to bite down and said pretty quickly – “class 3 malocclusion jaw surgery – recommend jaw surgery.” He explained to me that the process would be to remove my 4 wisdom teeth, have braces for 2 years, have jaw surgery, and then have braces again for potentially another year. He didn’t explain many of the pros and cons and ushered me away to talk to the assistant for more details.
In another room, the assistant put on some DVD of the process of dealing with class 3 malocclusions. It meant that I had an underbite, and what they needed to do is remove my wisdom teeth to make space, and then crack my jaw and move it back. The recovery would involve sewing my lips (?) and going on a liquid diet for a while.
The assistant also said that some people liked having this jaw surgery because of improvements to their facial profile. She also mentioned that after the surgery, some patients aren’t even recognized by people who know them.
The assistant ended with saying, “You know, Vietnamese are a superstitious bunch, so some say that doing jaw surgery will change your destiny!”
Okay, count me in for not believing in superstition, but really that is the absolute worst thing you could say to a teenager after getting a quick 5 minute consult, a gory video on the treatment of an underbite, and somebody saying it will change your destiny. At that point, I decided not to go along with the surgery and went along my merry way.
A couple years after the consult, I called the dentist’s office to book an appointment, and I was told the dentist had a heart attack! He evidently had been eating a pretty unhealthy diet (I know correlation isn’t causation, but he did eat McDonalds every day for lunch). Fortunately he bounced back and started working again.
A couple years after the heart attack, he had another heart attack, and this time it was fatal. When he passed away, my family went to his funeral and saw his grieving sister and his daughter, whom I had talked with on and off over the years of going to the office. Oddly enough, the dentist’s daughter did a quick internship at one of my old startups back in the day.
After grieving the loss of my dentist, there were the practical issues of finding a new dentist. Pausing for a moment, I remembered, my optometrist’s brother (whose parents live next to my parents) was a dentist.
Dr Tan Huynh was also in the heart of Little Saigon, but when I drove into his office, they had computers that could do x-rays, and an efficient staff to make cleanings and appointments way easier. I had realized at that point I had been going to Dr Chung’s office with technology from the stone ages.
With the first consult, the dentist asked me to bite down and asked if I considered braces and jaw surgery to fix my underbite. This time being older, I peppered him with questions on pros and cons. He mentioned my teeth were functionally fine at the moment, but in the future I might not be able to chew as my teeth wore down. Asking what age I might not be able to eat, he threw out what seemed to be the random number of 60.
Remembering the experience at my last Orthodontist, I wasn’t convinced the pros outweighed the cons (eg – cons meaning my destiny would change).
When I moved up to Vancouver, I was faced yet again with finding a new dentist. Jason recommended I visit an office nearby, where Dr M was the first to see me.
He did the whole consult and analysis, but this time they took pictures and some fancy 360 xray scan. He brought up again my underbite, and we again talked through the pros and cons. I asked whether I should try to fix it and he said a lot of people have underbites and just manage it. Apparently when eating I push food through my back teeth immediately.
During the pandemic when I got my first cleaning I saw Dr F, a younger dentist who was one of the co-owners of the office. She saw my bite and asked if I wanted to fix my underbite, and after the 4th mention in my life it got me thinking a little bit more seriously about it. This time she said Invisalign might be able to fix it.
I came back to another appointment after my cleaning to get an Invisalign consult. They did some scans and because of the pandemic they wanted to limit in person meetings, so the follow-up was a zoom call.
Dr F proceeded to say that she initially thought she could take out my middle bottom tooth to fix my underbite. However, she concluded Invisalign wouldn’t work and that I should see an Orthodontist.
This time I was a little more open to it because I was no longer traveling as a consultant during the pandemic, and wearing a mask would make it pretty easy to hide the fact I had braces.
Weeks later I saw the orthodontist Doctor D and they did the initial analysis. He basically said I have two options. First, remove 2 wisdom teeth, braces for 2 years, jaw surgery, then braces for 2 years. Second, remove 6 teeth, braces for 2 years and you are done.
I peppered him with questions on the pros and cons health wise, and he said functionally both would lead to the same outcome. He said the jaw surgery would change my profile, but would come with more risks since it was a surgery. I decided to go with option 2. I also wondered why when I was a teenager I wasn’t presented with a non jaw surgery option, but I’m guessing it was because the technology for modeling these outcomes wasn’t available.
Dentistry is an interesting field because most dentists and orthodontists can’t tell you definitely what will happen with your teeth in the future. It all seems to be what risk/reward you are comfortable with. As part of the assessment I had to pay $500. If I chose to move forward with braces they would credit my account, but if not, I would lose it. I think sunk cost fallacy nabbed me this time as this pushed me over the edge to do a final commitment of the decision.
Before putting on braces, I had to get 6 teeth extracted. To ease the pain, I got 3 extracted by my regular dentist, and 3 extracted by an extraction specialist. Let’s just say, the extraction specialist finished the entire job in about 30 minutes while my regular dentist took about 1.5 hours. My regular dentist felt so guilty about taking so long that she gave me her cell phone number and told me to call her if I had any post extraction complications.
The process of wearing braces involved seeing the orthodontist about every 6 weeks for an adjustment, and compliance to get the results you want. In addition to braces, you have a wire running across and little hooks where you can attach rubber bands to. Throughout the process compliance meant always wearing and rotating the rubber bands as needed as well as avoiding eating really hard food (like nuts), to avoid breaking your bracket. Slipping up on compliance inevitably leads to a longer total process.
When I saw my Orthodontist, I noticed I was the oldest person in the office as it was mostly kids and teenagers. Often I would overhear my Orthodontist sternly warn the kids that they weren’t being compliant by either not brushing their teeth well or not wearing their rubber bands. I would then hear parents berate their children in one sentence, and in the next sentence beg them to be compliant. It usually ended with the parents trying to guilt trip their children by saying seemingly unhelpful things like, “don’t you want good teeth like your brother.”
Getting braces as an adult is a bit different as I was on a mission to be compliant and to finish it as soon as possible because I paid for every penny of it. Psychologically, something different clicks in your head when it is your money on the line.
The initial side effects I had were teeth sensitivity. There were times hard food was difficult to eat (like sandwiches, cucumbers, steak, etc), so I bought these tiny tots scissors originally intended for parents to use when cutting food for their babies. The scissors were an obnoxious bright blue color, but I liked it because it was compact and had a case.
One time I had a business meeting with a customer at a restaurant and when the food came I took out the scissors. The person next to me paused and asked why I had bright blue scissors. I explained to him the whole dental situation, and then the whole table caught wind of the conversation and asked me about the scissors. It was a bit awkward in the beginning, but then the whole table spent the next hour talking about their dental issues. Also through this experience I learned bringing scissors is generally helpful at restaurants if you are sharing food.
2.5 years later (6 months behind schedule mind you), I had an appointment to remove my braces. The doctor told me, “there was a lot of movement of your teeth, we probably need to install a permanent wire retainer behind your bottom front teeth”. At the same time I was told I needed to wear a retainer full time for 6 months, and then at night time for the rest of my life.
I was a little shocked as I never really put two and two together that after the braces I would have to wear a retainer at night in my mouth for the rest of my life. I wonder if ortho offices gave a really honest assessment of the entire process (brackets breaking, wires poking, teeth sensitivity, retainers for the rest of your life), if fewer people would opt in.
Am I happy with the result? Well my underbite is fixed now, but really the whole intended health outcome of being able to chew when I’m 60 might require another blog post in 20ish years.
Chip Huyen, who came out of Stanford and is active in the AI space, recently wrote an article on what she learned by looking at the 900 most popular open source AI tools.
In data engineering, one of our primary usages of AI is really just prompt engineering.
Use Case 1: Data Migration
Before LLMs, when we did data migrations, we would use Amazon Schema Conversion Tool (SCT) first to help convert source schemas to a new target schema. Let us say we are going from SQL Server to Postgres, which is a major language change.
From there, the hard part begins where you need to manually convert the SQL Server SQL business logic code to Postgres. Some converters do exist out there, and I assume they work on a basis of mapping a language grammar from one to another (fun fact – I almost pursued a PhD in compiler optimization, but bailed from the program).
Now what we can do is use LLMs to convert a huge set of code from a source language to a target using prompt engineering. Despite a lot of the new open source models out there, ChatGPT 4 still seems to be outperforming the competitors for the time being in doing this type of code conversion.
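To make the idea a bit more concrete, here is a minimal sketch of what a conversion prompt could look like. The helper name `build_conversion_prompt` and the exact rule wording are my own assumptions for illustration, not something from a specific tool, and the sketch deliberately stops at building the prompt text rather than calling any particular LLM API.

```python
def build_conversion_prompt(source_dialect: str, target_dialect: str, source_code: str) -> str:
    """Assemble a tightly scoped code-conversion prompt (hypothetical helper)."""
    lines = [
        f"You are converting {source_dialect} code to {target_dialect}.",
        "Rules:",
        "1. Preserve the business logic exactly.",
        f"2. Return only {target_dialect} code, with no commentary.",
        "3. If a construct has no direct equivalent, flag it with a TODO comment.",
        "",
        f"--- {source_dialect} source ---",
        source_code.strip(),  # drop stray leading/trailing whitespace from the pasted code
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative input only; any SQL Server snippet would do.
    print(build_conversion_prompt(
        "SQL Server",
        "PostgreSQL",
        "SELECT TOP 10 name, ISNULL(city, 'n/a') FROM dbo.customers;",
    ))
```

The resulting string is what you would paste or send to the LLM of your choice. Keeping the rules explicit and numbered is one way to give the "tight prompt conditions" these conversions tend to need.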
The crazy thing is that with LLMs, we can convert from really any source language to any target language. If you try it out, Java to C# and SQL to Spark SQL both work somewhat reasonably well. In terms of predictions for our field, I see a couple things progressing:
Phase 1 Now:
Productivity gains in code conversions using LLMs
Productivity gains in coding itself, with tools like Amazon Code Whisperer, Amazon Q, or the LLM of your choice for faster coding
Productivity gains in learning a new language with LLMs
Debugging stack traces by having LLMs analyze it
Phase 2: Near Future
Tweaks of LLMs to make them more deterministic for prompt engineering. We already have the ability to control creativity with the ‘temperature’ parameter, but we generally have to give really tight prompt conditions to get some of the code conversions to work. In some of our experiments with SQL to SparkSQL, doing things like passing in the DDLs has pushed the LLMs to generate more accurate output.
An interesting paper about chain of thought prompting (prompting with a series of intermediate reasoning steps) might help us move towards this. The arXiv paper is here – https://arxiv.org/abs/2201.11903
In latent.space’s latest newsletter, they cited a paper showing that adding “Let’s think step by step” improved zero shot reasoning from 17% to 79%. If you happen to DM me and say that in an introduction I will raise an eyebrow. latent.space citation link
Being able to use LLMs to create data quality tests based on schemas or create unit tests based off existing ETL code.
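As a rough sketch of how the Phase 2 ideas fit together, the snippet below puts the table DDLs in front of the task, optionally appends the "Let's think step by step" trigger, and pins the temperature to 0. The function name `build_grounded_request` and the `prompt` / `temperature` dictionary keys are assumptions of mine for illustration; map them onto whatever request shape your LLM provider actually expects.

```python
from typing import Dict, List, Union


def build_grounded_request(
    task: str,
    ddl_statements: List[str],
    code: str,
    think_step_by_step: bool = False,
) -> Dict[str, Union[str, float]]:
    """Build a provider-agnostic request payload (hypothetical shape)."""
    parts = ["Table definitions (DDL):"]
    # Passing the DDLs gives the model real column names and types to work from.
    parts.extend(ddl.strip() for ddl in ddl_statements)
    parts.append(f"Task: {task}")
    parts.append("Code:")
    parts.append(code.strip())
    if think_step_by_step:
        # Zero-shot chain-of-thought trigger mentioned above.
        parts.append("Let's think step by step.")
    # Temperature 0 asks for the least "creative" output the model offers.
    return {"prompt": "\n\n".join(parts), "temperature": 0.0}


if __name__ == "__main__":
    request = build_grounded_request(
        task="Convert this query to Spark SQL.",
        ddl_statements=["CREATE TABLE orders (id INT, total DECIMAL(10,2));"],
        code="SELECT TOP 5 id, total FROM orders ORDER BY total DESC;",
        think_step_by_step=True,
    )
    print(request["prompt"])
```

The same payload shape would work for asking an LLM to draft data quality tests from a schema: swap the task text and leave the code section as the existing ETL code.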
Phase 3: Future
The far scary future is where we tell LLMs how to create our data engineering systems. Imagine telling it to ingest data from S3 into an Open Table Format (OTF) and to write business code on top of this. I kind of don’t see this for at least 10ish years though.
Open Table Format Wars – Continued
The OTF wars continue to rage with no end in sight. As a refresher, there are 3 players:
Apache Hudi – which came out of the Uber project
Apache Iceberg – which came out of the Netflix project
Databricks Deltalake.
As a reminder, OTFs provide features such as time travel, incremental ETL, deletion capability, and schema evolution-ish capability depending on which one you use.
Perhaps one of the biggest subtle changes which has recently happened is that the OneTable project is now Apache XTable.
Apache XTable is a framework to seamlessly do cross-table work between any of the OTFs. I still think this is ahead of its time because I haven’t seen any project that needs to combine multiple OTFs in an organization. My prediction though is in 5-10 years this format will become a standard to allow vendor interoperability, but it will take a while.
Apache Hudi Updates
Newsletter – https://hudinewsletter.substack.com/ – because we all can’t get enough Substack in our lives, Hudi now has a newsletter you can check for updates
Lake Formation still is a bit weird to me, as one part of it is blueprints, which we really don’t use, and the other part deals with access control. It rolled out some new changes with OTF integration and ACLs.
It is still kind of a mess, and there still really aren’t any clear winners. There are also multiple options where you can choose to go the open source route or with a hosted provider like One House or Tabular.
The false promises of AWS announcements – S3 Express Zones
Around Re:invent, there are always a huge set of announcements, and one stood out, S3 Express Zones. This feature would allow retrieval of data in S3 in the single digit milliseconds with the tradeoffs of storage being in one zone (so no HA). You can imagine if this actually works, datalakes can hypothetically start competing with databases as we wouldn’t need to worry about the SLA time penalty you usually get with S3.
Looking at the restrictions there are some pretty significant drawbacks.
As you can see here Hudi isn’t supported (not sure why Iceberg tables aren’t there), and Deltalake has partial support. The other consideration is this is in one zone, so you have to make sure there is a replicated bucket in a standard zone.
I kind of feel that Amazon seems to test the waters by launching not fully formed products, to get feedback from us. Unfortunately that makes us the guinea pigs.
TLDR – This service works for Glue jobs, but for OTFs, it is dead in the water for the time being.
Amazon Q
I remember being in an AWS roundtable representing data consulting companies at Re:invent and a complaint from other firms was that Amazon had too many confusing products. As we are all guinea pigs in their early services, Amazon Q is no exception.
| Product | Use Case | Features |
| --- | --- | --- |
| Amazon Q for Business | Chatbot for internal enterprise data that is managed by Amazon. No dev work required. | Chatbot |
| Amazon Q for Developers | Best for doing basic coding and coding with AWS specific services. Broader coding is probably better with a foundational model. | Code completion (Code Whisperer); Chat (Amazon Q) |
TLDR
Amazon Q for business is a managed product where you click and add data sources and a chatbot is used
Amazon Q for developers contains Code completion (Code Whisperer) AND a chat in Visual Studio IDE with, yes, Amazon Q again as the chat. Confused yet?
Quicksight Q
I’d like to confuse you one more time with the history of Quicksight Q. Pre ChatGPT and LLM craze, Quicksight Q went Generally Available (GA) in 2021, powered by Machine Learning.
After ChatGPT came out, Quicksight Q went back into Preview with LLM integration, but they kept the same name.
One of the things to really keep in mind as you do your solutions architecture is whether a service is in preview or GA. Things in preview typically only support a couple regions and don’t have production support. If you are interested in a service in preview (like Amazon Q), it is advisable to wait a bit.
A Framework for Processing Uber Uber Large Sets of Data – Theseus
I show this diagram very often, and as a refresher, a lot of the work we do in data engineering is in yellow and in red, and often involves OTFs.
Voltron Data, who created a GPU Query Engine called Theseus, put out these benchmarks comparing their framework Theseus vs Spark
Image Credit: Voltron’s Blog
Their guidance is also quite interesting:
For less than 2TBs: We believe DuckDB and Arrow backed projects, DataFusion, and Polars make a lot of sense. This is probably the majority of datasets in the world and can be run most efficiently leveraging these state-of-the-art query systems.
For up to 30TBs: Well-known data warehouses like Snowflake, Google BigQuery, Databricks, and distributed processing frameworks like Spark and Trino work wonders at this scale.
For anything over 30TBs: This is where Theseus makes sense. Our minimum threshold to move forward requires 10TB queries (not datasets), but we prefer to operate when queries exceed 100TBs. This is an incredibly rare class of problem, but if you are feeling it, you know how quickly costs balloon, SLAs are missed, and tenuously the data pipeline is held together.
I mostly work in the AWS space, but it is interesting to peek at what innovations are going on outside of that space.
The author of Apache Arrow also made this observation
</= 1TB — DuckDB, Snowflake, DataFusion, Athena, Trino, Presto, etc.
You might ask, what my guidance might be for the Amazon space?
< 100 gigabytes – your run of the mill RDS or Aurora
100 gigabytes to 30 TB – Redshift or OTF
>30 TB – We haven’t really played in this space but things like Apache Iceberg are probably better candidates
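For fun, the rule of thumb above can be written down as a tiny lookup. This is only a sketch of my own guidance, not an official sizing guide; the function name `suggest_aws_store` is made up, and treating 1 TB as 1000 GB is an assumption on my part.

```python
GB_PER_TB = 1000  # assumption: decimal units (1 TB = 1000 GB)


def suggest_aws_store(size_gb: float) -> str:
    """Map a dataset size in gigabytes to the rough AWS guidance above."""
    if size_gb < 0:
        raise ValueError("size_gb must be non-negative")
    if size_gb < 100:
        return "RDS or Aurora"
    if size_gb <= 30 * GB_PER_TB:
        return "Redshift or an OTF"
    # Above 30 TB is territory we have not really played in.
    return "OTF such as Apache Iceberg"


if __name__ == "__main__":
    for size in (20, 500, 45_000):
        print(f"{size} GB -> {suggest_aws_store(size)}")
```

The boundaries are deliberately blunt. In practice query patterns and concurrency matter at least as much as raw size.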
TLDR – you probably will never use Theseus, so this is just a fun article.
American Privacy Rights Act (APRA)
There was a bit of surprising news coming out of the US Congress that there is now draft legislation for national data privacy rights for Americans. In the United States, data privacy has consisted of an odd patchwork of legislation state to state (like CCPA in California or the Colorado Privacy Act). The US really is quite behind in legislation as the rest of the world has some type of privacy legislation.
Deletion Requests: Companies are required to delete personal data upon an individual’s request and must notify any third parties who have received this data to do the same.
Third-Party Notifications: Companies must inform third parties of any deletion requests, ensuring that these third parties also delete the relevant data.
Verification of Requests: Companies need to verify the identity of individuals who request data deletion or correction to ensure the legitimacy of these requests.
Exceptions to Deletion: There are specific conditions under which a company may refuse a deletion request, such as legal restrictions, implications for data security, or if it would affect the rights of others.
Technological and Cost Constraints: If it is technologically impossible or prohibitively expensive to comply with a deletion request, companies may decline the request but must provide a detailed explanation to the individual.
Frequency and Cost of Requests: Companies can allow individuals to exercise their deletion rights free of charge up to three times per year; additional requests may incur a reasonable fee.
Timely Response: Companies must respond to deletion requests within specified time frames, generally within 15 to 30 days, depending on whether they qualify as large data holders or not.
Who is this applicable for?
Large Data Holders: The Act defines a “large data holder” as a covered entity that, in the most recent calendar year, had annual gross revenue of not less than $250 million and, depending on the context, meets certain thresholds related to the volume of covered data processed. These thresholds include handling the covered data of more than 5 million individuals, 15 million portable connected devices identifying individuals, or 35 million connected devices that can be linked to individuals. Additionally, for handling sensitive covered data, the thresholds are more than 200,000 individuals, 300,000 portable connected devices, or 700,000 connected devices.
Small Business Exemptions: The Act specifies exemptions for small businesses. A small business is defined based on its average annual gross revenues over the past three years not exceeding $40 million and not collecting, processing, retaining, or transferring the covered data of more than 200,000 individuals annually for purposes other than payment collection. Furthermore, all covered data for such purposes must be deleted or de-identified within 90 days unless retention is necessary for fraud investigations or consistent with a return or warranty policy. A small business also must not transfer covered data to a third party in exchange for revenue or other considerations.
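To get a feel for how these thresholds combine, here is a rough sketch that encodes only the numbers summarized above. All the names (`DataProfile`, `is_large_data_holder`, `is_small_business`) are hypothetical, the 90-day deletion condition and the payment-collection carve-out are left out, and I am assuming the covered and sensitive thresholds are alternatives to each other. The draft text itself is the authority, and the final legislation may differ.

```python
from dataclasses import dataclass


@dataclass
class DataProfile:
    """Hypothetical inputs; field names are illustrative, not taken from the Act."""
    annual_gross_revenue_usd: float
    avg_gross_revenue_3yr_usd: float
    covered_individuals: int
    covered_portable_devices: int = 0
    covered_connected_devices: int = 0
    sensitive_individuals: int = 0
    sensitive_portable_devices: int = 0
    sensitive_connected_devices: int = 0
    transfers_data_for_revenue: bool = False


def is_large_data_holder(p: DataProfile) -> bool:
    """Revenue of at least $250M plus at least one volume threshold exceeded."""
    if p.annual_gross_revenue_usd < 250_000_000:
        return False
    covered = (
        p.covered_individuals > 5_000_000
        or p.covered_portable_devices > 15_000_000
        or p.covered_connected_devices > 35_000_000
    )
    sensitive = (
        p.sensitive_individuals > 200_000
        or p.sensitive_portable_devices > 300_000
        or p.sensitive_connected_devices > 700_000
    )
    return covered or sensitive


def is_small_business(p: DataProfile) -> bool:
    """Simplified exemption check: revenue cap, volume cap, no data sold."""
    return (
        p.avg_gross_revenue_3yr_usd <= 40_000_000
        and p.covered_individuals <= 200_000
        and not p.transfers_data_for_revenue
    )


if __name__ == "__main__":
    shop = DataProfile(12_000_000, 10_000_000, covered_individuals=50_000)
    print(is_large_data_holder(shop), is_small_business(shop))
```

A company can land in neither bucket, for example one with $100 million in revenue and a million customers, in which case the general rules would apply without the large data holder obligations.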
A while back I worked on a data engineering project which was exposed to the European GDPR. It was interesting because we had meetings with in-house counsel lawyers to discuss what kind of data policies they had in place. One of the facets of GDPR which is similar here is the ‘right to remove data.’
We entered some gray areas when talking with the lawyers, as the debate was over which data would have to be removed. Removing data from a database or data lake is clear if it contains customer data, but what if it is deeply nestled in Amazon Glacier?
I don’t really have any great answers, but if this legislation actually does pan out, it makes a strong case for large companies to use OTFs for their data lakes, otherwise it would be extremely difficult to delete the data.
TLDR – if you are a solution architect, do ask your clients what kind of data policy exposure they have. If this legislation does pass, please pay attention when you start projects based in the USA to whether it is applicable to them based on the final legislation.
Fortunately uhh I don’t think anyone in our team is named Devon, but this video has been making the rounds on the Internet as the first ‘AI software engineer’
Grandma and the Vietnam War
When I was young, friends would visit, and there was one photo on the shelf that caught their attention in my room. It was a photo of an elderly Caucasian lady and their first question to me was, “How come you didn’t take the stock photo out of the frame?” I replied that she was my grandma, and they became even more confused because they thought I was 100% Vietnamese, so why would my grandma be white?
In 1975, my mom was working in the Saigon Adventist Hospital in Vietnam, and around April 20, conditions were deteriorating quickly in the capital with rumors that the communists would take over soon. She witnessed firsthand horrors of the war working in the emergency room, with one memory of treating an 8-year old where a grenade had exploded near his head. Due to the severity of the injuries, the child passed away and she grieved heavily with her family.
Similar to the fall of Afghanistan in 2021, people became desperate to get out of the country, especially if they were associated with the Americans. Charter flights were leaving around the clock organized by the US State Department to evacuate as many people out of Vietnam as possible.
There was one charter flight where one lady was a no-show and my mom took her place. At that moment, she left everything behind, her family, her possessions, and was only left with a US $20 dollar bill given to her.
On the other side of the Pacific Ocean, my foster grandma, Beryl Bason heard calls from the Loma Linda Adventist church about sponsoring Vietnamese refugees to help get them on their feet. My grandma ended up hosting my mom, and two of her nursing school classmates in San Diego for a couple of months where they all went back to nursing to become certified nurses to work in the US.
My mom met my dad after immigrating to the US, and they settled in Orange County, California, which would end up having one of the biggest populations of Vietnamese people outside of Vietnam.
Return to the homeland part 1
It is a bit strange, but according to my parents, my first language was actually Vietnamese. They were afraid I would be confused learning two languages, so they switched to speaking to me in English when I was young. Since then new research has shown kids can learn multiple languages without issue. Because I never learned Vietnamese formally, my proficiency was stunted, unlike my Spanish which I consider myself semi-fluent in due to four great years of education in high school.
There was a running joke that since I really didn’t look Vietnamese, my friends bought me a 23andme genetic test to settle the issue once and for all. Funnily enough, the first result of the test showed 1% speculative European, but the results eventually tightened up to confirm that my origins are indeed 100% Vietnamese.
Most kids of immigrants make some type of ‘return to the homeland’ type journey when they are young, and for me it was when I was 13. A priority for my parents was to meet my grandparents while they were still alive and to meet my extended family.
I have to admit, that was a rough trip. My mom’s hometown of Tam Kỳ wasn’t really well developed and I remember when I had to go to the bathroom, it was in an outhouse not unlike a camping trip. Hardly anyone spoke much English so I struggled talking with my cousins and just about everyone else.
For some reason, my parents also had wacky expectations that after I went to Vietnam I would become fluent in Vietnamese. Let’s just say that didn’t happen; becoming fluent in something requires understanding the foundational basics of grammar, language, and some schooling which I didn’t get.
As an adult, Vietnam was never really on my radar to visit. I befriended some Europeans at a previous job, and they told me they spent months in the country visiting every nook and cranny. Because of my last trip, my memories of Vietnam were primarily associated with seeing family, extended family, and more family so I didn’t get to see the country on my own terms (although I would have been too young anyways to make my own decisions).
Return to the homeland part 2 – 27 years later
In 2014, I took a trip to Mexico with my partner and parents since they had a time share in Cabo San Lucas. It was the first international-ish trip with my parents and I was a bit nervous as I had never traveled with them as an adult.
I was pleasantly surprised that we all had a great time with each other, and the trip went really well. There was even a time where I said, let’s go snorkeling in Cabo Pulmo (about 2 hours away) and they were okay with the drive. On the way the road seemed to end, and I took the right fork when I should have taken the left fork and got the car stuck in a sandy ditch. There was nobody around, and I didn’t have any cell phone data (at that time North America plans didn’t exist yet). I don’t know how exactly, but my partner and my dad pushed the car while I hit the accelerator and we got the car out of the ditch. I’m happy my parents didn’t disown me after such a scary incident. Happy to say, that snorkeling was probably some of the best I’ve ever seen, with a travel adventure to back it up.
Since then I’ve been intentional about traveling with my parents as much as possible as I know they are getting older, and there will be a time they won’t be able to travel anymore due to their age or health. Unfortunately this type of thinking proved true: my dad has since passed away, and I am grateful that I could travel with him quite a bit.
We had scheduled a trip to visit Vietnam all together in March 2020, and it would have been 27 years since I had last visited. However, around February 2020 we were getting news that schools in Vietnam were getting shut down due to Covid. At the time there weren’t hard shut downs, but out of an abundance of caution we cancelled the trip.
In early 2023, my partner proposed we go to Vietnam all together around Thanksgiving time. My parents initially were going to go, but then decided to stay behind and hang out with the grandkids. Initially I was relying on my parents to do a lot of the planning, but now that we were on our own we began doing research on what to do. My parents not going ended up being fortunate, as my dad got seriously sick around the same time, so it was good he was in North America.
With the Lonely Planet book, I began researching things to do and oddly enough realized that I actually didn’t know much about Vietnam in terms of the cities, regions, or even things to do. After much research (and talking to my cousins and friends), we decided to stick mainly to the north because it was drier. The central area was in its rainy season, so we didn’t spend too much time there.
Vietnamese – Forked
We started our trip in the city of Hanoi, which is the capital in the north. Probably the most important expression everyone learns when visiting a foreign country is, “where is the bathroom?”. My proficiency in Vietnamese is kind of like a bunch of lego blocks in my head with limited abilities of building certain structures. Everything kind of comes out in bits and pieces, but at the least I know how to say:
Cầu tiêu ở đâu? (Where is the bathroom – but literal translation is ‘where is the toilet’)
Of course the waiter in the restaurant gives me a really confused look, and says, you mean
Nhà vệ sinh ở đâu? (Where is the bathroom – but literal translation is ‘where is the hygienic house’)
Now I give the puzzled look as I don’t understand. I had never heard a bathroom called “Nhà vệ sinh” in my life growing up in Southern California.
One of the things which is important to realize is that when the refugees came from Vietnam to North America and other parts of the world in 1975, two Vietnamese communities now existed in different locations around the world: the main branch in Vietnam, and another branch in North America.
Talking to my mom and some relatives about this, my guess is that the word for “bathroom” in Vietnam pre 1975 was probably “Cầu tiêu”, and that at some time post 1975 it changed to “Nhà vệ sinh”. Even the English language has changed radically in 40 years. When I talk with the newest generation in high school, there are a bunch of words that I have no idea what they mean.
Soleil Ho talks about how Vietnamese food in North America is basically food from Vietnam from the 1970s. Again, it makes sense because the food traditions were from the initial wave of refugees.
I talked to a good friend living in Vietnam about this, and she mentioned how Vietnamese people living in North America now have a Vietnamese – North American accent. The Vietnamese spoken in North America comes off as a lighter tone and people in the north consider this tone as someone who has studied formally.
This kind of explains a weird situation I had in a grocery market. I was asking what was in the center of this candy, and the lady remarked in Vietnamese, “Your Vietnamese is so good, how many years did you study for?”. Inside I was dying because I didn’t have the heart to tell her that I was an overseas Vietnamese (Việt kiều Mỹ). I’m sure if she knew that she probably would have instead asked, “why is your Vietnamese so bad?”
The other surprising fork which I hadn’t really considered is how different the northern dialects are from the southern dialects. I distinctly recall several friends learning Vietnamese through Duolingo, and their parents asking why they were learning the northern accent. In the eyes of the north, their dialect is often viewed as the gold standard of speaking. Below are examples of English Word – Northern Vietnamese Dialect – Southern Vietnamese Dialect as explained by one of my friends in Vietnam.
| English Word | Northern Vietnamese Dialect | Southern Vietnamese Dialect |
| --- | --- | --- |
| Cup | cốc | ly |
| Fruit | hoa quả | trái cây |
| 10 thousand | 10 nghìn | 10 ngàn |
| Pineapple | dứa | thơm |
| Passion fruit | chanh leo | chanh dây |
| Bowl | bát | chén |
You don’t have to be proficient in Vietnamese to see these are totally different words. I am happy to report that when I later went to my parents’ hometown, Đà Nẵng (central Vietnam), I did have an easier time understanding and speaking to people.
Reconciliation
Maybe it was me not giving Vietnam enough credit, but the museum scene in Hanoi was unexpectedly stellar. There is the Hoa Lo Prison Museum (famous as the place where John McCain was held as a POW), the Ethnology Museum (about minority populations), the Women’s Museum, the Ho Chi Minh Museum, and the list actually does go on and on quite a bit.
One day we were kind of tired so we went to a museum right next to the hotel, aptly called the Hanoi Museum. On the second floor was an exhibit on the American War. That’s right, in Vietnam, it is not called the Vietnam War, it is called the American War. Many of the panels talked about Americans as the aggressors and the breaking of the Paris Peace Accords. However, the last panel discussed reconciliation between Vietnam and America at length.
It is amazing to me that 40 years later, formerly bitter enemies are now actually allies. In the midst of some of the grand global conflicts occurring now, it gives me hope, and I truly believe anything is possible in terms of peace.
Hà Giang Loop Tour
From family friends we heard about a trek that was not exactly off the beaten path, but not the first thing tourists do. There is a 3 day loop tour for which most people rent motorcycles and ride up the Ma Pi Leng Pass near the border with China. Given we didn’t want to be riding motorcycles on mountain roads for 3 days, we booked a 3 day car trip.
We booked with the Yesd travel agency (huge plug for them here, I do recommend this agency), where the tour guides consist of ethnic minorities of the region. Talking to the guide, there are over 54 ethnic minorities there. I was kind of shocked, and humbled that I really knew so little of Vietnam. The drive consisted of driving through spectacular greenery on the mountain roads and we took many stops for photos.
Our accommodations were at houses belonging to the ethnic tribes in the region. We first stayed with the Tay people, and I was pretty surprised that our accommodation, despite being on the second floor of a traditional wood structure had a shower, heat pump, and pretty fast wifi.
For dinner, we would eat with the families, and it was nice in a way that it wasn’t performative. The families didn’t talk about their lives or their minority status. It was just a regular dinner you would eat with a family. With the eco-tourism booming there, it’s just easier to act like you normally do when you have visitors and not need to put on a show.
On the second day we did a tour through a Hmong village. It was rural in its location, but not in the stereotypical sense because everyone there had fancy cell phones. The Hmong people there still adhered to their traditions of marrying early and to their agricultural routines.
We sat down for tea with one of the Hmong shop owners when I noticed he was wearing a French beret. I kind of asked why he was wearing a beret, and he explained that from French colonial rule, this clothing item was introduced. Ever since that time, the beret in that region became traditional wear.
Thinking about this further, it really has helped me alter my thinking of the word ‘authentic’. There is this Vietnamese Instant Pot Facebook group which constantly argues about recipes not being ‘authentic’. Some claim it is only authentic when it is from the source home country. Others argue a recipe is authentic when you cater towards the spirit of it.
Rachael Ray stirred up a bit of controversy a little while back with her pho video. 2 years ago people were raging against her, “how dare she change the recipe, that’s not authentic!”.
Coming back to the Hmong shop owner, I wonder: who is the arbiter of when something is authentic and traditional? After hearing his story, I think it’s difficult to nail down authenticity to a static period of time. The reality is that traditions change through time, and perhaps there is no such thing as authenticity.
Ha Long Bay
After Ha Giang, we went to Ha Long Bay, a large bay of water where cruise ships sail around for about 1-2 days. We took a shuttle and from Hanoi the trip was only 2 hours. Expecting the trip to be a straight shot, I was surprised we stopped half way for a ‘restroom stop’. The restroom stop was next to a pretty fancy gift shop.
At the check-in area of Ha Long Bay, there were tons of people representing a diverse set of people all over the world similar to an airport. The check-in area was a bit chaotic and you board a small boat to join the big boat.
On our particular cruise boat, we somehow joined everyone from Portugal as there were probably only two other couples who weren’t from Europe. We befriended one couple who was from New York while watching a cooking demonstration. The guy was half Vietnamese and half Indian, which was pretty fascinating to me as I have never met anyone of that mix. He was willing to entertain some of my questions on which side he felt most comfortable with. I imagine he probably gets asked this question quite a bit being mixed.
After dinner I stood towards the front of the boat looking at the view and the cruise manager was out there also. He looked pretty young, probably early 20s, so I greeted him with “Chào em” (hello little brother). In Vietnamese, you address people relative to the age of your parents or relative to your own age. Snafus happen all the time when someone guesses an age wrong and gets corrected on how they should address the other person.
I don’t know if that was a good or bad thing, but he took that as a cue that I was Vietnamese and began speaking to me in Vietnamese. I explained to him that my Vietnamese wasn’t that great and that it was my first time visiting Vietnam since I was 13. During the conversation he said something really surprising: “Welcome home, even though you forgot a lot of your Vietnamese, it will come back to you.”
We were pretty mixed about the experience especially coming after an awesome cultural and nature experience from Ha Giang. Ha Long Bay oddly enough was one of our least favorite parts of our trip more so for the feeling of having a Disneyland type experience of tons of people and long lines. Ha Giang in the next couple years will have a faster road built from Hanoi cutting the drive time to about 2 – 2.5 hours so expect tourism in that region to increase soon.
Da Nang
The last part of our trip was to the central area of Da Nang. My uncle (my dad’s brother) still lives in Da Nang, and my cousins flew from Saigon to meet us up there. On the first night being there, they took us out to eat at a famous Banh Xeo place.
Bánh Xèo is a crispy turmeric rice crepe filled with chicken, pork, or shrimp. The name of the dish is a fun play on words, roughly translating to sizzling item. Bánh is kind of a weird word as it could refer to a lot of things depending on the word it is paired with. My cousins know I don’t eat pork and shrimp, so I was surprised when they mentioned we could order Beef Banh Xeo.
I inquired further why there is a beef banh xeo, as I never heard of that before. Apparently because of the Korean influences into the city due to tourism and business, the people of Da Nang began changing their food to cater to the Koreans. Because of that, a couple new dishes emerged like Banh Xeo Bo and Mi Quang Bo. The latter dish is traditionally made of chicken, turmeric, and shrimp, but now they made a beef version. What previously was an authentic dish of only chicken, pork, and shrimp now added beef as an option.
Next to Da Nang is Hoi An, the famous town known for its lanterns at night and on the river. On the way there, my cousins wanted me to visit a street named after my grandfather, Quach Xan. I asked what he was known for, and one of my cousins replied that he was a great leader for the country. Just kind of putting things together, I know that my dad and his brother had a classic story of joining different sides of the war, and that would mean my grandfather was a communist.
I had kind of mixed feelings standing there on the street taking pictures by the street sign with an ode to my family name. As a great leader to Vietnam, I wonder what he did? What is interesting about Vietnam is that half of the population was born after the war so it is something that isn’t talked about much.
In Hanoi, we did a walking tour with a student from Hanoi University. We asked what his generation thought about politics, and he said that mostly everybody was apolitical and more concerned about their economic futures than the political state of the country.
When we think of wars, history tends to paint everything in black and white strokes. There were the good guys and bad guys. I inquired about my grandfather a little bit with my mom and the war in general and she said really back then, both sides were rather corrupt, and her opinion was neither side really had the people’s interest at heart.
During my time in Vietnam I had a sense that even with my cousins they didn’t want to talk about the war so it is a topic I didn’t approach.
My cousin gave me a quick history of Da Nang, explaining how the city managed to develop quickly and how, in general, Vietnam has managed to raise the living standards of most people around the country. Da Nang used to be a poorer city, but it has now attracted a lot of foreign business and tourists.
After the visit to my grandfather’s street we arrived in Hoi An. Half the city had been flooded (which apparently is pretty typical), so we walked around the areas we could. Everyone, people and guidebooks alike, recommends visiting Hoi An, but I found the city a bit touristy, almost akin to Venice in Italy. It was overly crowded by day and early night, but as the night went on, there was a charm to the city as the crowds dissipated.
Áo Dài
On the last day of the trip there, I wanted to buy an áo dài (literally translates to long dress). It is a traditional Vietnamese dress that has gone through its own evolution. Originally it was only available in blue and red, but as time has progressed different fabrics, styles, and colors have emerged to make it more modern.
My cousin took me to a mini mall to shop around, and after great negotiation, I bought one that was beautifully hand decorated with a bit more of a modern sensibility. I asked my cousins how often they wear it, and they said not often at all. People used to wear it regularly growing up, but now it really is worn on very rare occasions. They continued to tell me that people now don’t even own any, they just rent it when they need it for pictures on occasions such as weddings.
Growing up I never wore an ao dai, and I kind of wonder why I now feel the need to reconnect with this item. I wore one for the first time two years ago at international day at church, when I asked to borrow my dad’s ao dai. It was a very traditional blue, and I think it was made of silk, with a hat. It was a bit janky and needed a bunch of pins to hold it in place. I still have it at home next to the new one I bought in Vietnam, and I have a bit of sadness when I look at the blue ao dai, since my dad is no longer around and I’m not really sure what to do with it. My intention was to give it back to him last year, but that never came to fruition.
I wore the new áo dài during Lunar New Year at church last year. Having grown up in an all-Vietnamese church, I’ve adjusted surprisingly quickly to being one of only a few Vietnamese people in my church in Vancouver. This transition makes me wonder: am I simply flexible with my identity, or is it still core to my heart?
Some Closing Thoughts
Overall Vietnam was not what I initially expected. English for the most part was readily accessible, and everyone was super kind. Food scene wise, we mostly stuck to what was recommended in the Vietnam Lonely Planet book (which was a good foundation for exploring) and friends’ recommendations. We found the high-end food scene in Hanoi spectacular.
Book the water puppet show as far in advance as possible. It is a free reservation so there is no harm.
Customs
Be aware that travelers from North America need a visa in advance. You can do this through an e-portal, although the site looked like it was made during the GeoCities era. Taking the photo with one of the passport photo apps on our phone worked for us. If you plan to do multiple visits, there is a better multiple-entry visa you can get in your home country, otherwise a single entry works.
Transit
In terms of getting around, the Grab app is essential. It is pretty much the Uber of Vietnam, but we had a lot of problems using our foreign North American cards, so be sure to bring multiple cards to get the app to work.
Sim Cards
This is a pretty big problem in the touristy cities. Sim Cards at the airport can be about 4-5x more expensive so you have to go a little bit outside of the main tourist areas to get regular rates. Be comforted though that they rip everyone off equally, locals and foreigners.
Tours
The most memorable trip we did was with Yesd, and we booked the 4 days, 3 nights with a car through Ha Giang.
As in all SE Asian countries, you get a better exchange rate if you bring brand new crisp bills. Any bills which have rips or tears get deducted at money exchangers.
Fish Sauce
The Vincom markets have a totally different selection of fish sauce from what you get in North America. Here the fanciest one we have is Red Boat 40, but they have several similar brands there which are about half the price. However, bringing fish sauce home is quite the risky endeavor.
Hope
I was happy to hear that throughout the country there was a sense of hope. What I mean is that most people seem optimistic about future economic opportunities, and living standards have gradually improved throughout the years.
During the early pandemic, a topic came up I never thought I would need to worry about. Who would cut my hair? My Asian hair is a bit funky where it is similar to Wolverine in X-men. After a couple weeks the sides get really pointy and uncomfortable so I would go to the barber probably every 3-4 weeks.
Since all the barber shops were closed for a couple months I managed to get really cheap scissors and a Wahl Peanut corded clipper, both of which were incredibly hard to come by at the time. On YouTube, I must have watched this video at least 10 times for some guidance.
The video describes how to cut your own hair and was probably one of the most methodical ones out there.
Coming from an engineering background, I like to plan and design as much as possible before doing an implementation. So based off the video I created a diagram on what to do.
In my whole life I never really had a need to cut my own hair as there was always a barber around somewhere. My barber at the time was also incredibly kind, as we FaceTimed so she could give me some tips. She taught me some obvious but important things, such as: when the clipper is going over your hair, if you don’t hear anything, nothing is being cut. It seems like common sense, but if you have never cut your own hair, these are important little tips.
When comparing what I actually did with my plan, I totally deviated from it (the whole numbering system and the up-down didn’t work), and the result wasn’t bad, but definitely not great. In May 2020, as I walked around, I noticed most men’s haircuts were rather uneven, probably from everyone cutting their own hair or asking someone inexperienced to do it.
I was pretty overjoyed when I could go to my barber in June 2020. However, after that I continued to cut my own hair in between professional cuts, so I would now only go a couple times a year.
Gradually, and after watching many, many more YouTube videos, I got better and started going to my barber less. After every haircut, I took some notes on what needed to be improved (mostly fading), and I also upgraded my equipment, which made it easier to cut my hair.
I bought a ridiculous looking umbrella haircut cape. I know I look like a dog in a cone of shame, but it was helpful to catch most of the hair.
From Sallys, I bought Wahl Magic Cordless Clippers. This was a huge upgrade from my Wahl Peanut as it was bigger and cordless. I read reviews on Amazon, and for some reason a lot of people would get refurbished packages, so I didn’t buy it from them.
In 2020 I went to my barber 4 times, then in 2021, 2022, 2023, and 2024 I went to my barber 8 more times.
Below is the cost savings for the past 4 years
52 weeks / 3.5 ≈ 14.8 visits (I would get haircuts about every 3.5 weeks)
14.8 visits * $50 haircut = $740 savings / year
$740 * 4 years = $2,960 – $600 (haircuts I paid for) – $300 (cost of the clippers and scissors) = $2,060 savings for the past 4 years
Projected savings the next 41 years
Let’s say I cut my own hair for the next 41 years it would come out to a savings of
$740 * 41 = $30,340
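For anyone who wants to plug in their own numbers, the arithmetic above can be sketched in a few lines of Python. The variable names are made up for illustration, and the 14.8 visits per year is the rounded figure used in this post (52 / 3.5 is closer to 14.86).

```python
# Inputs taken from the post; names are illustrative only.
VISITS_PER_YEAR = 14.8      # rounded figure used above (52 / 3.5 is about 14.86)
COST_PER_HAIRCUT = 50       # dollars per barber visit
PAID_HAIRCUTS = 600         # dollars still spent at the barber
EQUIPMENT = 300             # dollars spent on clippers and scissors

# Savings per year if every haircut is done at home.
annual_savings = round(VISITS_PER_YEAR * COST_PER_HAIRCUT)

# Past 4 years, minus the haircuts that were paid for and the equipment.
past_savings = annual_savings * 4 - PAID_HAIRCUTS - EQUIPMENT

# Projection for the next 41 years at the same price and frequency.
projected_savings = annual_savings * 41

print(f"Per year: ${annual_savings:,}")
print(f"Past 4 years: ${past_savings:,}")
print(f"Next 41 years: ${projected_savings:,}")
```

The projection keeps the haircut price fixed at $50, so it ignores inflation and any future equipment replacements.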
This experience taught me to evaluate what I should do for myself versus what I should pay for. Anything you choose to do for yourself does have some upfront investment, but can potentially bring long-term financial savings down the line.
Around US Thanksgiving, my mom let me know that my dad had been diagnosed with stage 4 colon cancer with a life expectancy of 1-2 years. In mid December my dad was hospitalized and one of the doctors suggested all family members should come urgently. When I arrived, the days consisted of going back and forth to the hospital and I commandeered a corner of the cafeteria as my remote office.
I think we all have different ways of dealing with stress, and my routine was prayer, hitting the gym, and doing yoga at night like clockwork. The first days were overwhelming with uncertainty, but this routine helped me stay focused on the support tasks that needed to be done for the day.
Towards the end of the hospital stay, the doctors stated that there wasn’t much they could do, revised his life expectancy to weeks to months, and suggested the best course of action was hospice care.
My mom and I had a meeting with the hospice staff, who explained what hospice care entails. Compared to regular medical care which is to save a life, hospice care aims to provide comfort care to prepare for a patient’s end of life within a couple months. Since my dad had been diagnosed with a terminal condition of less than 6 months, he was eligible for the care via insurance.
As I’m sure no one is surprised, my dad hated being in the hospital. The hardest thing to see was him exclaiming in English and Vietnamese that, “Ba muốn đi về (I want to go home)” There were some serious medical complications preventing him from coming home, but fortunately, one of the doctors managed to do a treatment plan that enabled him to improve just enough to leave the hospital. One of his last prayers and wishes was to be at home. Thankfully, throughout the ordeal he didn’t have any pain in the hospital.
When he was discharged, we got him set-up at home successfully with a patient bed, and oxygen machine provided by hospice care. However, it was then we realized the magnitude of care needed. Now we would need to take care of my dad 24/7 as the cancer had robbed my dad of his independence. By a fortunate turn of events, God in his good graces lined up a caretaker who was a contact at my mom’s old home church to help watch him at night. Without that caregiver, everyone would have been exhausted to the point of feeling like zombies.
All kinds of questions began to arise, requiring us to adapt quickly. How would he communicate? How would we monitor him? One of the most low tech, but successful things we got was one of those bells that you ding when your order is ready at the diner. Another was a baby monitor where we could see him when we weren’t in the room.
The first couple days were okay, where the new sounds of the house consisted of the whir of an oxygen machine to support his lungs, and an occasional ring from my dad requesting some type of service. It was kind of cute in the beginning, like a customer asking for some food or water. Things kind of seemed normal, where he would read the news on the iPad and even have short conversations with us.
Meanwhile my family were having discussions about finances and the financial implications of having a night caretaker if this lasted weeks or months, and what are the financial thresholds a family can bear.
I think as a society we don’t talk enough about end of life and what is a good way to die? When a parent isn’t able to take care of themselves, what do we do? How much do we pay? Who is going to take care of the person? What kind of hardships would be spread amongst family? Do you want to be there to witness last moments? It seems cruel to equate finances in context of one’s life, but it is an important topic to broach.
When hospice care is at home, there is an unfair burden placed on the caregiver as they are expected to help manage medication for comfort vs lucidity of a person. Each day felt like an impossibility of choices. Administering medication for comfort often results in sedation, while withholding it can lead to suffering.
I give my mom a lot of credit for having numerous conversations with my dad about advance directives and his end-of-life wishes, ensuring that the family had clear expectations about the path to be taken.
As the days progressed, one of the nurses noticed his breathing and said he was struggling. We had a frank conversation about what it looks like when a person is about to die. She warned us that a common pattern is for people to have a day with a burst of energy where they seem completely normal, then crash quickly.
In the first week of January he passed away in the evening, peacefully and comfortably.
Sometimes, we reflect on this situation and ask where God was in these moments, why he wasn’t healed, and why a life expectancy of 1-2 years shortened to just weeks.
My approach to prayer is to ask for a specific outcome, such as healing, but if it doesn’t occur, I trust in God’s grand plan regarding life and death.
Throughout this ordeal, there have been many small blessings. First off, his wish and desire to go home were fulfilled, and the last medical treatment plan enabled him to improve enough to leave the hospital.
The second blessing was having a caregiver to cover the nights, starting from the first night after his discharge, allowing my mom to get some sleep. We were panicking when he got home because I knew my mom was not in a condition to stay up all night.
I’ve heard that losing a parent is one of the hardest experiences a person can go through. I’m still processing the loss, but surprisingly, I don’t feel a sense of guilt. By this, I mean that while he was healthy, we, including my mom and partner, spent a lot of time traveling together and had a good relationship. However, it doesn’t mean that there isn’t pain in my heart as I wish there were 10-15 more years to enjoy with him.
In 2014, my partner and I went to Mexico with my parents, and this was the first international trip I took with them as an adult. We had tons of adventures where, uh, I literally got the car stranded in the middle of nowhere driving to a snorkeling spot in Mexico, and I was surprised at my dad’s grace because he was super chill about it. There was another trip to Mexico where, uhh, the car overheated when we went to a mountain town (buyer beware: if you ever travel with me, expect some shenanigans). And most recently, since my parents always wanted to go to Europe, we went to Italy only this past June. The trip was successful in my eyes because a) they didn’t get lost and b) they didn’t get robbed. My dad was usually quiet, but as we took some private tours through Rome, I was surprised at how inquisitive he was about our surroundings and about Roman culture and life there.
Since 2014 I have been intentional about traveling with my parents as much as possible as I know there would be some point of time, they would not be physically able to travel due to mobility issues. But nowhere in my wildest dreams did I expect this adventuring to be cut short by a fast-moving cancer.
The loss of a parent is strange, and grief comes in waves. It’s not like the world stops, but there are certain intense memories when I reflect that cause tears to fall. It is a balance of a completely normal day, then a realization you no longer can say “my parents.”
The days after a loved one passes involve processing the immediate grief and supporting family members, but also the huge logistical task of planning a funeral. Oddly enough, it is no different than planning a party.
I do have to give credit to my mom, as she had an inkling of the seriousness of my dad’s condition, so she already bought a funeral package near her house. This significantly alleviated our stress, as the costs were covered and the arrangements mostly preselected.
We went to the funeral home the day after my dad’s passing, encountering a surreal experience akin to buying a car. There was the base package that was already taken care of, but if you want, you can pay more to upgrade to a fancier casket, or pay more for a fancier box for the ashes. Fortunately, the staff had the sensitivity to inquire about upgraded packages, but not to push anything.
The funeral staff also had an odd warning for us that we might get calls from scammers posing as a funeral home to verify information. A couple days later, on my dad’s cell phone there were actually a couple of voice mails from fake funeral homes asking to verify information. A part of me wanted to call them back just to see where the scam would go, but I dropped the issue. Googling around, this is actually quite a big issue.
It makes me wonder, when someone dies, how exactly did these people get my dad’s cell phone number? I wonder if there is a black market for death records somewhere on the dark web. It is quite sad that people would prey on people at their most vulnerable.
The reality is when someone passes away, financial considerations come into play. One option presented was to keep the ashes at the funeral home, which would cost an additional $4,000. My mom thought about this briefly, but since my dad had expressed a wish for his ashes to be returned to Vietnam, we realized that amount of money could essentially cover a trip there.
I was responsible for creating the memorial photo slideshow during the service and wanted to give a slideshow of dad throughout the decades. Fortunately, both my mom’s and my Google Photos were active, allowing me to gather photos via image auto-tagging. I then wrote a Python script to rename the files by date for chronological organization and added date-time stamps on the bottom right of each image.
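The renaming step could look something like the sketch below. This is not the original script: the function names are hypothetical, and it assumes the file's modification time is a good enough stand-in for when the photo was taken (a real version would more likely read the photo's own metadata). Stamping the date onto the image is left out, since that needs an imaging library.

```python
import os
from datetime import datetime
from pathlib import Path


def dated_name(taken: datetime, original: Path) -> str:
    """Build a sortable name such as 2009-12-25_183000_IMG_0042.jpg."""
    return f"{taken:%Y-%m-%d_%H%M%S}_{original.name}"


def rename_by_date(folder: Path) -> list[Path]:
    """Rename every file in `folder` so alphabetical order matches date order.

    Assumption: the modification time stands in for the date the photo was taken.
    """
    renamed = []
    # Materialize the listing first so renames don't disturb the iteration.
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        taken = datetime.fromtimestamp(path.stat().st_mtime)
        target = path.with_name(dated_name(taken, path))
        path.rename(target)
        renamed.append(target)
    return sorted(renamed)
```

Putting the date first in the file name means any slideshow tool that sorts by name will show the photos in chronological order. Try it on a copy of the folder first, since the rename happens in place.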
At home, my dad, and as I later learned his brother in Vietnam, were both significant packrats. One theory suggests that growing up in conditions of scarcity may lead to hoarding as a protective mechanism.
Over several days, I sifted through my dad’s stuff, finding a collection of old cables, cell phones, old laptops, and random trinkets, until I discovered an old MiniDV camcorder and about 30 tapes. After locating the correct power adapter, I played the tapes back and found that they spanned 2007-2010. During that time, my dad had simply set up the camcorder and recorded special occasions with the camera and tripod just sitting there.
While creating the slideshow, I felt a pang of sadness at having many photos, but few videos of him. However, this discovery filled that gap with raw footage of him interacting and talking with family – precisely the memories I longed for.
In today’s society there is a strong craving for the perfect ‘Instagram’ photo, a trend that I have fallen for too. However, I’ve come to realize the most important media are the ones that capture the raw authenticity of a person without filters or edits. The videos of dad just walking around and doing mundane stuff really have brought me the most joy. Maybe it is because I am afraid that as the days and years go by I might forget what he was like, his speech, and his mannerisms.
The next puzzle was how to digitize such an ancient format, as the camcorder’s only connection was FireWire. After a bit of googling, I bought a PCIe FireWire card for an old Windows desktop at home, and managed to digitize all the videos after a lot of fiddling. I captured them first in .avi, then converted them to H.265, which is a newer video codec.
There was a 1.5 hour video that my dad recorded at Christmas 2009. The video was just of us eating and opening gifts. Maybe because of smartphones, the whole idea of setting up a tripod and recording for hours during an event isn’t too popular, but maybe this is a tradition worth reviving.
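For the conversion step, one common route is the ffmpeg command-line tool with its libx265 encoder. The sketch below only shows how such a command could be put together from Python. It assumes ffmpeg is installed, and the function names, the quality setting (CRF 28), and the .mp4 output are illustrative choices, not necessarily what was used for these tapes.

```python
import subprocess
from pathlib import Path


def build_h265_command(source: Path, crf: int = 28) -> list[str]:
    """Return an ffmpeg command that re-encodes `source` as H.265 in an .mp4 file."""
    target = source.with_suffix(".mp4")
    return [
        "ffmpeg",
        "-i", str(source),   # input capture, e.g. a .avi file from the tape
        "-c:v", "libx265",   # encode video with the x265 (H.265/HEVC) encoder
        "-crf", str(crf),    # constant quality; lower values mean bigger, sharper files
        "-c:a", "aac",       # re-encode the audio track as AAC
        str(target),
    ]


def convert(source: Path) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg reports a failure."""
    subprocess.run(build_h265_command(source), check=True)
```

Keeping the original .avi captures around until the converted files have been checked is a cheap safety net, since the re-encode is lossy.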
Dealing with the grief has been tricky as we don’t have many playbooks in life to learn about this. However, there are two things that have stood out to me which were helpful.
When I saw a friend after the passing of my father he asked me, “do you want a normal day, or do you want to talk about it”. I never really thought about it, but as the person dealing with grief, you do want to control the narratives of how your day goes. Some days I want to talk about it, some days I don’t.
A friend sent me a text message and said, “as much as you are there supporting family, don’t forget to take time to grieve for yourself.”
Another unexpected blessing, and something to consider with elderly parents, is their online accounts and access to their e-mail. Fortunately I had set myself up as the two-factor authentication back-up, so I could log in to my dad’s e-mail to get access to important documents. Having all his phone PIN codes so I could log in was also helpful, as some apps were sending SMS messages to log in.
When I encounter people I don’t often see, I briefly mention the major news, talking about it for a minute or two, before shifting the topic. I feel it’s important for them to be aware of this change in my life, but at the same time, I’m conscious of not letting it dominate our entire conversation.
When he was diagnosed with prostate cancer the first time around and beat it, he was talking to me about this verse and how he enjoyed it.
Ecclesiastes 3
A season for everything
3 There’s a season for everything and a time for every matter under the heavens: 2 a time for giving birth and a time for dying, a time for planting and a time for uprooting what was planted, 3 a time for killing and a time for healing, a time for tearing down and a time for building up, 4 a time for crying and a time for laughing, a time for mourning and a time for dancing, 5 a time for throwing stones and a time for gathering stones, a time for embracing and a time for avoiding embraces, 6 a time for searching and a time for losing, a time for keeping and a time for throwing away, 7 a time for tearing and a time for repairing, a time for keeping silent and a time for speaking, 8 a time for loving and a time for hating, a time for war and a time for peace.
It’s kind of an interesting choice because this is not really one of those traditional Bible verses used for comfort. But this choice shows his character because at the end of his life he openly and bravely accepted his mortality. He told us, don’t worry about me, I’m ready to go.
As tough as this was to hear, this was his last gift to us: accepting God’s will and being at peace, and thereby bestowing that peace on us when he passed away.
The current state of data engineering offers a plethora of options in the market, which can make selecting the right tool challenging. We are approaching a period where the traditional boundaries between databases, datalakes, and data warehouses are overlapping. As always, it is important to think about the business case first, then do a technology selection afterwards.
This diagram is simple, but merits some discussion.
Most companies in the small and medium data fields can get away with simpler architectures, with a standard database powering their business applications. It is only when you get into big data and extremely large data that you want to start looking at more advanced platforms.
The Open Source Table Format Wars Revisited
A growing agreement is forming around the terminology used for Open Table Formats (OTF), also known as Open Source Table Formats (OSTF). These formats are particularly beneficial in scenarios involving big data or extremely large datasets, similar to those managed by companies like Uber and Netflix. Currently, there are three major contenders in the OTF space.
Every datalake eventually suffers from a small file problem. What this means is that if you have too many files in a given S3 partition (aka file path), performance degrades substantially. To alleviate this, compaction jobs are run to merge small files into bigger files. In managed paid platforms, this is done automatically for you, but in the open source platforms, developers are on the hook to do this themselves.
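To make the idea of a compaction job concrete, here is a minimal sketch of the planning step only: deciding which small files to merge together. The function name, the file names, and the target size are all made up for illustration, and real table formats decide this with their own metadata, so treat it as a toy model rather than how any particular platform does it.

```python
from typing import Dict, List


def plan_compaction(file_sizes: Dict[str, int], target_bytes: int) -> List[List[str]]:
    """Group small files into batches whose combined size stays within target_bytes.

    file_sizes maps a (hypothetical) file name to its size in bytes.
    Files already at or above the target are left alone.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0

    def flush() -> None:
        # Rewriting a single file gains nothing, so only keep batches of 2+.
        if len(current) > 1:
            batches.append(list(current))

    for name in sorted(file_sizes):  # sorted for a deterministic plan
        size = file_sizes[name]
        if size >= target_bytes:
            continue
        if current and current_size + size > target_bytes:
            flush()
            current, current_size = [], 0
        current.append(name)
        current_size += size
    flush()
    return batches


if __name__ == "__main__":
    sizes = {"a.parquet": 10, "b.parquet": 20, "c.parquet": 30,
             "d.parquet": 200, "e.parquet": 50}
    print(plan_compaction(sizes, target_bytes=64))
```

Each returned batch would then be read and rewritten as one larger file, which is the part the managed platforms hide from you.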
I was surprised to read that if you now use Apache Iceberg tables, developers no longer have to deal with compaction jobs. Now to the second announcement:
Amazon Redshift announces general availability of support for Apache Iceberg
Microsoft, the Hudi team, and the Databricks team got together to create a new standard that serves as an abstraction layer on top of an OTF. This is odd to me, because not too many organizations have these data stacks concurrently deployed.
However, over the next couple of years, as abstraction layers get created on top of OTFs, this will be something to watch.
Amazon S3 Express One Zone Storage Class
Probably one of the most important but buried pieces of news from re:Invent was the announcement of Amazon S3 Express One Zone.
With this, we can now have single-digit millisecond access to data in S3, which leads to a weird question of datalakes encroaching onto database territory if they can now meet higher SLAs. However, there are some caveats: region availability is limited, and the data lives in one zone, so think about your disaster recovery requirements. This is one feature I would definitely watch.
Zero ETL Trends
Zero ETL is behind-the-scenes replication from Aurora, RDS, and DynamoDB into Redshift. If you have a use case where Slowly Changing Dimensions (SCD) Type 1 is acceptable, these are all worth taking a look at. From my understanding, when replication occurs, there is no connection penalty to your Redshift cluster.
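For readers who have not worked with Slowly Changing Dimensions, the difference between Type 1 and Type 2 can be sketched in a few lines. This is a plain-Python illustration with hypothetical column names (`valid_from`, `valid_to`) and made-up dates, not how Redshift or any replication feature implements it.

```python
from typing import Dict, List


def apply_scd_type1(table: Dict[str, dict], change: dict, key: str) -> None:
    """Type 1: overwrite the row for this key. Previous values are lost."""
    table[change[key]] = dict(change)


def apply_scd_type2(history: List[dict], change: dict, key: str, as_of: str) -> None:
    """Type 2: close the currently open version and append a new one."""
    for row in history:
        if row[key] == change[key] and row["valid_to"] is None:
            row["valid_to"] = as_of
    history.append({**change, "valid_from": as_of, "valid_to": None})


if __name__ == "__main__":
    current: Dict[str, dict] = {}
    versions: List[dict] = []
    updates = [("2024-01-01", {"id": "c1", "city": "Austin"}),
               ("2024-02-01", {"id": "c1", "city": "Dallas"})]
    for as_of, change in updates:
        apply_scd_type1(current, change, key="id")
        apply_scd_type2(versions, change, key="id", as_of=as_of)
    print(current)   # only the latest city survives
    print(versions)  # both versions, the first one closed out
```

If only the latest state matters to your reports, Type 1 is enough, which is the case where replication-style features fit best.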
It is exciting to see the OTF ecosystem evolve. Apache Hudi is still a great and mature option, with Apache Iceberg now being more integrated with the AWS ecosystem.
Zero ETL has the potential to save your organization a ton of time if your data sources are supported by it.
Something to consider is that major shifts in data engineering occur every couple of months, so keep an eye on new developments, as they can have profound impacts on enterprise data strategies and operations.
As we roll towards the end of the year, data engineering has, as expected, seen some changes, but now everyone wants to see how Generative AI intersects with everything. The fit is not completely natural, as Generative AI systems like ChatGPT are more NLP-type systems, but there are a few interesting cases to keep an eye on. Also, Apache Iceberg is one to watch now that there is more first-class Amazon integration.
Retrieval Augmented Generation (RAG) Pattern
One of the major use cases for data engineers to understand for Generative AI is the retrieval augmented generation (RAG) pattern.
There are quite a few articles on the web articulating this such as
What is important to realize is that Generative AI is only providing the lightweight wrapper interface to your system. The RAG paradigm was created to help address context limitations by vectorizing your document repository, using some type of nearest neighbors algorithm to find the relevant data, and passing it back to a foundation model. Perhaps LLMs with newer and larger context windows (like 100k context) may address these problems.
At the end of the day, data engineers will be tasked more with chunking and vectorizing back end systems, and debates will probably emerge in your organization over whether you want to roll out your own solution or just use a SaaS to do it quickly.
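The chunk, vectorize, and nearest-neighbor steps described above can be sketched with nothing but the standard library. Note the big assumption: `embed` here is just a word count, a toy stand-in for a real embedding model, and every function name is made up for this example. The retrieved chunk is what you would pass to the foundation model along with the question.

```python
import math
import re
from collections import Counter
from typing import List


def chunk(text: str, max_words: int) -> List[str]:
    """Split a document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. A real system would call a model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: List[str], k: int = 1) -> List[str]:
    """Brute-force nearest neighbors: rank every chunk against the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question: str, context: List[str]) -> str:
    """Assemble the text that would be sent to the foundation model."""
    return "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question


if __name__ == "__main__":
    doc = ("Redshift Serverless bills by RPU hours. "
           "Apache Iceberg supports time travel queries. "
           "Hudi handles incremental upserts well.")
    pieces = chunk(doc, max_words=6)
    question = "Which format supports time travel?"
    print(build_prompt(question, retrieve(question, pieces)))
```

The build-versus-buy debate is mostly about who owns the real versions of `chunk`, `embed`, and the vector index.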
Generative AI for Data Engineering?
One of the core problems with generative AI is eventually it will start hallucinating. I played around with asking ChatGPT to convert CSV to JSON, and it worked for about the first 5 prompts, but by the 6th prompt, it started to make up JSON fields which never existed.
One thing I envision in the future is the ability to use LLMs to stitch together the parts of data pipelines that concern data mapping and processing. But at the moment, that is not possible because of the hallucination problem.
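For comparison, the deterministic version of that CSV to JSON conversion is a few lines of standard-library Python, and a small check can flag fields that were never in the CSV header. The function names and sample data below are made up for illustration; this is a sketch of one way to guard LLM output, not a complete validator.

```python
import csv
import io
import json
from typing import List


def csv_to_json(csv_text: str) -> str:
    """Deterministically convert CSV text (with a header row) to a JSON array."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows)


def invented_fields(csv_text: str, json_text: str) -> List[str]:
    """Return JSON keys that do not appear in the CSV header."""
    header = set(next(csv.reader(io.StringIO(csv_text))))
    keys = set()
    for record in json.loads(json_text):
        keys.update(record)
    return sorted(keys - header)


if __name__ == "__main__":
    source = "id,name\n1,Ana\n2,Bo\n"
    print(csv_to_json(source))
    # Pretend this came back from a chatbot that made up an "email" field.
    suspicious = '[{"id": "1", "name": "Ana", "email": "ana@example.com"}]'
    print(invented_fields(source, suspicious))
```

A check like `invented_fields` does not stop hallucinations, but it at least turns a silent one into a failed pipeline step.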
There is some interesting research occurring where a team has combined a finite state machine (FSM) with LLMs to create deterministic JSON output. I know that might not seem like a big deal, but if we can get deterministic outcomes from data generation, it might be interesting to look at:
1. Engineers using LLMs to help create SQL or Spark code scaffolds
2. Creation of synthetic data – basically pass in a schema and ask an LLM to generate a data set for you to test
3. Conversion of one schema to another schema-ish. This kind of works, but buyer beware
Apache Iceberg
Last year our organization did a proof of concept with Apache Iceberg, but one of the core problems was that Athena and Glue didn’t have any native support, so it was difficult to do anything.
However, on July 19, 2023, AWS quietly released an integration between Apache Iceberg and Athena into production.
Since then, AWS has finally started to treat Iceberg as a first-class product in their documentation and resources.
Something to keep track of is that the team which founded Apache Iceberg also founded a company called tabular.io, which provides hosted compute for Apache Iceberg workloads. Their model is pretty interesting: you give Tabular access to your S3 buckets and they deal with ingestion, processing, and file compaction for you. They can even point to DMS CDC logs, create SCD Type 1 tables, and query SCD Type 2 through time travel with a couple of clicks, which is pretty fancy to me.
However if you choose to roll things out yourself, expect to handle engineering efforts similar to this
One of the core criticisms of traditional datalakes is the difficulty of performing updates or deletes against them. With that, we have 3 major players in the market for transactional datalakes.
Also don’t even consider AWS Governed Tables and focus on the top 3 if you have these use cases.
Redshift Serverless Updates
There has been another major silent update: Redshift Serverless now only requires 8 RPUs to provision a cluster. Before it was 32 RPUs, which was a ridiculously high number.
8 RPUs x 12 hours x 0.36 USD x 30.5 days in a month = 1,054.08 USD
Redshift Serverless cost (monthly): 1,054.08 USD
Ra3.xlplus – 1 node
792.78 USD
So as you can see, provisioned is still cheaper, but look into Serverless if:
· You know your cluster will be idle about 50% of the time
· You don’t want to deal with the management headaches
· You don’t need a public endpoint
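The monthly estimate above is easy to reproduce, and the same arithmetic gives a rough break-even point against the provisioned node. The prices are the ones quoted in this post (0.36 USD per RPU-hour and 792.78 USD for one ra3.xlplus node), not necessarily current AWS pricing, and the function names are mine.

```python
RPU_HOUR_USD = 0.36               # per RPU-hour, as used in the estimate above
DAYS_PER_MONTH = 30.5
PROVISIONED_MONTHLY_USD = 792.78  # 1 node ra3.xlplus figure quoted above


def serverless_monthly_cost(rpus: int, hours_per_day: float) -> float:
    """Monthly Redshift Serverless cost for a given base RPU count and daily usage."""
    return rpus * hours_per_day * RPU_HOUR_USD * DAYS_PER_MONTH


def break_even_hours_per_day(rpus: int) -> float:
    """Daily active hours at which Serverless costs the same as the provisioned node."""
    return PROVISIONED_MONTHLY_USD / (rpus * RPU_HOUR_USD * DAYS_PER_MONTH)


if __name__ == "__main__":
    print(round(serverless_monthly_cost(8, 12), 2))   # the 8 RPU case above
    print(round(serverless_monthly_cost(32, 12), 2))  # the old 32 RPU minimum
    print(round(break_even_hours_per_day(8), 2))
```

Under these assumptions, 8 RPUs only beats the single provisioned node if the cluster is actively processing for fewer than roughly 9 hours a day.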
DBT
Data Build Tool (dbt) has really been gaining a lot of popularity at the moment. It is kind of weird to see this pendulum swinging back and forth, as many years ago we had these super big SQL scripts running on data warehouses. That went out of fashion, but now here we are.
A super interesting thing that got released is a dbt-glue adapter.
This is pretty intriguing, as it means SCD Type 1 views could be replicated without the work of putting data through a datalake. However, it is still in preview, so I can’t recommend it until it goes into general release.
In November 2020, I read the book Apollo’s Arrow after hearing Dr Christakis on NPR’s Fresh Air. Somewhere midway through this book, this paragraph stood out to me:
“Either way, until 2022, Americans will live in an acutely changed world—they will be wearing masks, for example, and avoiding crowded places. I’ll call this the immediate pandemic period. For a few years after we either reach herd immunity or have a widely distributed vaccine, people will still be recovering from the overall clinical, psychological, social, and economic shock of the pandemic and the adjustments it required, perhaps through 2024. I’ll call this the intermediate pandemic period. Then, gradually, things will return to “normal”—albeit in a world with some persistent changes. Around 2024, the post-pandemic period will likely begin.“
Given we were only 7 months into the pandemic, I was intrigued by how specific this timeline was. At that time, there was uncertainty in the media landscape about where this was going. Public health agencies also didn’t make any bold predictions about this.
Fast forward about 3 years to the current day, and this prediction seems to have been accurate, maybe off by 6-12 months.
Hindsight of course is always 20/20, but would it be beneficial if we could identify experts who made accurate predictions? Or will uncertainty always rule the day?
When looking at data engineering for your projects, it is important to think about market segmentation. In particular, you might be able to think about it in four segments
Small Data
Medium Data
Big Data
Lots and Lots of Data
Small Data – This refers to scenarios where companies have data problems (organization, modeling, normalization, etc), but don’t necessarily generate a ton of data. When you don’t have a lot of data, different tool sets are in use ranging from low code tools to simpler storage mechanisms like SQL databases.
Low Code Tools
The market is saturated with low code tools, with an estimated 80-100 products available. Whether low code tools work for you depends on your use case. If your teams lack a strong engineering capacity, it makes sense to use a tool to help accomplish ETL tasks.
However, problems arise when customers need to do something outside the scope of the tool.
Medium Data – This refers to customers who have more data, making it sensible to leverage more powerful tools like Spark. There are several ways to solve the problem with data lakes, data warehouses, ETL, or reverse ETL.
Big Data – This is similar to medium data, but introduces the concepts of incremental ETL (aka transactional data lakes or lake houses). Customers in this space tend to have data in the hundreds of gigabytes to terabytes.
Transactional data lakes are essential because incremental ETL is challenging. For example, consider an Uber ride to the airport that costs $30. Later, you give a $5 tip, and now your trip costs $35. In a traditional database, you can run some ETL to update the record. However, Uber has tons of transactions worldwide, and they need a different way of dealing with the problem.
Introducing transactional data lakes requires more operational overhead, which should be taken into consideration.
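The ride-and-tip example boils down to an upsert: a later change record for the same trip has to replace the earlier one. Here is a minimal in-memory sketch of that record-level idea, with hypothetical field names (`trip_id`, `total_usd`, `updated_at`). Transactional data lake formats do this at far larger scale against files in object storage, which is where the operational overhead comes from.

```python
from typing import Dict, Iterable


def merge_incremental(base: Dict[str, dict], changes: Iterable[dict],
                      key: str = "trip_id") -> Dict[str, dict]:
    """Upsert change records into a copy of base, keeping the latest version per key."""
    merged = {k: dict(v) for k, v in base.items()}
    # Apply changes oldest first so a late-arriving batch still ends on the newest row.
    for change in sorted(changes, key=lambda c: c["updated_at"]):
        current = merged.get(change[key])
        if current is None or change["updated_at"] >= current["updated_at"]:
            merged[change[key]] = dict(change)
    return merged


if __name__ == "__main__":
    trips = {"t1": {"trip_id": "t1", "total_usd": 30, "updated_at": 1}}
    incoming = [
        {"trip_id": "t1", "total_usd": 35, "updated_at": 2},  # the $5 tip
        {"trip_id": "t2", "total_usd": 12, "updated_at": 2},  # a brand new trip
    ]
    print(merge_incremental(trips, incoming))
```

Only the changed and new trips are touched, which is the whole point of incremental ETL compared with reloading the full table.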
Lots and Lots of Data – Customers in this space generate terabytes or petabytes of data a day. For example, Walmart creates 10 PB of data (!) a day.
When customers are in this space, transactional data lakes with Apache Hudi, Apache Iceberg, and Databricks Deltalake are the main tools used.
Conclusion
The data space is large and crowded. At the small-data and lots-of-data ends, the market segments are clear. However, in the mid-market data space it will probably take some time for winners to emerge.
In the data engineering space we have seen quite a few low code and no code tools pass through our radar. Low code tools have their own nuances as you will get to operationalize quicker, but the minute you need to customize something outside of the toolbox, you may run into problems. That’s when we usually deploy our custom development using things like Glue, EMR, or even transactional datalakes depending on your requirements.
This list is split into open source, ELT (ETL with the load and transform steps swapped), streaming, popular tools, orchestration tools, and the rest of the tools. In the space, one thing I have been looking for is a first class open source product. I know that many of these products start as open source and end up releasing a managed version of the product. Personally of course I am all in for open source teams to make back their money somehow, but it would be ideal to have the platforms still contain an open source license.
One thing my team has been noticing is the traction dbt has been gaining in the market. It flips the paradigm a bit by doing ELT (Extract, Load, Transform, i.e. ETL with the last two steps reversed), where everything is loaded to your data warehouse first and then you start doing transformations on it.
Another project I have been watching with Zach Wilson’s recommendation is mage.ai. It is a pretty spiffy way of creating quick DAGs with executable Python notebooks. The platform is pretty active soliciting feedback on Slack and is one to watch for the future. Airbyte and Meltano are newer to me and I hope to take some time to play with those tools. This list is by no means the most exhaustive, but let me know if there is anything I have missed.
Opensource Tools
Product: Airbyte Description: Airbyte is an open-source data integration platform that allows users to replicate data from various sources and load it into different destinations. Its features include real-time data sync, robust data transformations, and automatic schema migrations. Link: https://airbyte.io/ Github Link: https://github.com/airbytehq/airbyte Cost: Free, with paid plans available Release Date: 2020 Number of Employees: 11-50
Product: mage.ai Description: mage.ai is an open-source data pipeline tool for quickly building DAGs out of executable Python notebooks. Link: https://mage.ai/ Github Link: https://github.com/mage-ai Cost: Open source Release Date: 2020 Number of Employees: 11-50
Product: Meltano Description: Meltano is an open-source data integration tool that allows users to build, run, and manage data pipelines using YAML configuration files. Its features include source and destination connectors, transformations, and orchestration. Link: https://meltano.com/ Github Link: https://github.com/meltano/meltano Cost: Free, with paid options available Release Date: 2020 Number of Employees: 11-50
Product: Apache Nifi Description: Apache Nifi is a web-based dataflow system that allows users to automate the flow of data between systems. Its features include a drag-and-drop user interface, data provenance, and support for various data sources and destinations. Link: https://nifi.apache.org/ Github Link: https://github.com/apache/nifi Cost: Free Release Date: 2014 Number of Employees: N/A
Product: Apache Beam Description: Apache Beam is an open-source, unified programming model for batch and streaming data processing. It provides a simple, portable API for defining and executing data processing pipelines, with support for various execution engines. Link: https://beam.apache.org/ Github Link: https://github.com/apache/beam Cost: Free Release Date: N/A Number of Employees: N/A
ELT
Product: dbt (data build tool) Description: dbt is an open-source data transformation and modeling tool that enables analysts and engineers to transform their data into actionable insights. It provides a simple, modular way to manage data transformation pipelines in SQL, with features such as version control, documentation generation, and testing. Link: https://www.getdbt.com/ Github Link: https://github.com/dbt-labs/dbt Cost: Free, with paid options available for enterprise features and support Release Date: 2016 Number of Employees: 51-200
Streaming
Product: Confluent Description: Confluent is a cloud-native event streaming platform based on Apache Kafka that enables organizations to process, analyze, and respond to data in real-time. It provides a unified platform for building event-driven applications, with features such as data integration, event processing, and management tools. Link: https://www.confluent.io/ Github Link: https://github.com/confluentinc Cost: Free, with paid options available for enterprise features and support Release Date: 2014 Number of Employees: 1001-5000
Popular Tools
Product: Fivetran Description: Fivetran is a cloud-based data integration platform that automates the process of data pipeline building and maintenance. It provides pre-built connectors for over 150 data sources and destinations, with features such as data synchronization, transformation, and monitoring. Link: https://fivetran.com/ Github Link: https://github.com/fivetran Cost: Subscription-based, with a free trial available Release Date: 2012 Number of Employees: 501-1000
Product: Alteryx Description: Alteryx is an end-to-end analytics platform that enables users to perform data blending, advanced analytics, and machine learning tasks. It provides a drag-and-drop interface for building and deploying analytics workflows, with features such as data profiling, data quality, and data governance. Link: https://www.alteryx.com/ Github Link: https://github.com/alteryx Cost: Subscription-based, with a free trial available Release Date: 1997 Number of Employees: 1001-5000
Product: Informatica Description: Informatica is a data management platform that enables users to integrate, manage, and govern data across various sources and destinations. It provides a unified platform for data integration, quality, and governance, with features such as data profiling, data masking, and data lineage. Link: https://www.informatica.com/ Github Link: https://github.com/informatica Cost: Subscription-based, with a free trial available Release Date: 1993 Number of Employees: 5001-10,000
Product: Matillion Description: Matillion is a cloud-native ETL platform that enables users to extract, transform, and load data into cloud data warehouses. It provides a visual interface for building and deploying ETL workflows, with features such as data transformation, data quality, and data orchestration. Link: https://www.matillion.com/ Github Link: https://github.com/matillion Cost: Subscription-based, with a free trial available Release Date: 2011 Number of Employees: 501-1000
Orchestration Tools
Product: Prefect Description: Prefect is a modern data workflow orchestration platform that enables users to automate their data pipelines with Python. It provides a simple, Pythonic interface for defining and executing workflows, with features such as distributed execution, versioning, and monitoring. Link: https://www.prefect.io/ Github Link: https://github.com/PrefectHQ/prefect Cost: Free, with paid options available for enterprise features and support Release Date: 2018 Number of Employees: 51-200
Product: Dagster Description: Dagster is a data orchestrator and data integration testing tool that enables users to build and deploy reliable data pipelines. It provides a Python-based API for defining and executing pipelines, with features such as type-checking, validation, and monitoring. Link: https://dagster.io/ Github Link: https://github.com/dagster-io/dagster Cost: Free, with paid options available for enterprise features and support Release Date: 2019 Number of Employees: 11-50
Product: Airflow Description: Airflow is an open-source platform for creating, scheduling, and monitoring data workflows. It provides a Python-based API for defining and executing workflows, with features such as task dependencies, retries, and alerts. Link: https://airflow.apache.org/ Github Link: https://github.com/apache/airflow Cost: Free Release Date: 2015 Number of Employees: N/A (maintained by the Apache Software Foundation)
Product: Azkaban Description: Azkaban is an open-source workflow manager that enables users to create and run workflows on Hadoop. It provides a web-based interface for creating and scheduling workflows, with features such as task dependencies, notifications, and retries. Link: https://azkaban.github.io/ Github Link: https://github.com/azkaban/azkaban Cost: Free Release Date: 2010 Number of Employees: N/A (maintained by the Azkaban Project)
Product: Luigi Description: Luigi is an open-source workflow management system that enables users to build complex pipelines of batch jobs. It provides a Python-based API for defining and executing workflows, with features such as task dependencies, retries, and notifications. Link: https://github.com/spotify/luigi Github Link: https://github.com/spotify/luigi Cost: Free Release Date: 2012 Number of Employees: N/A (maintained by Spotify)
Product: Oozie Description: Oozie is a workflow scheduler system for managing Hadoop jobs. It provides a web-based interface for defining and scheduling workflows, with features such as task dependencies, triggers, and notifications. Link: https://oozie.apache.org/ Github Link: https://github.com/apache/oozie Cost: Free Release Date: 2009 Number of Employees: N/A (maintained by the Apache Software Foundation)
Tools
3forge – https://3forge.com/ – 3forge delivers software tools for creating financial applications and data delivery platforms.
Ab Initio Software – https://www.abinitio.com/ – Ab Initio Software provides a data integration platform for building large-scale data processing applications.
Adeptia – https://adeptia.com/ – Adeptia offers a cloud-based, self-service integration solution that allows users to easily connect and automate data flows across multiple systems and applications.
Aera – https://www.aeratechnology.com/ – Aera provides an AI-powered platform for enterprises to accelerate their digital transformation by automating and optimizing business processes.
Aiven – https://aiven.io/ – Aiven offers managed cloud services for open-source technologies such as Kafka, Cassandra, and Elasticsearch.
Ascend.io – https://ascend.io/ – Ascend.io provides a unified data platform that allows users to build, scale, and automate data pipelines across various sources and destinations.
Astera Software – https://www.astera.com/ – Astera Software offers a suite of data integration and management tools for businesses of all sizes.
Black Tiger – https://blacktiger.io/ – Black Tiger provides an open-source data pipeline framework that simplifies the process of building and deploying data pipelines.
Bryte Systems – https://www.brytesystems.com/ – Bryte Systems offers an AI-powered data platform that helps organizations manage their data operations more efficiently.
CData Software – https://www.cdata.com/ – CData Software provides a suite of drivers and connectors for integrating with various data sources and APIs.
Census – https://www.getcensus.com/ – Census offers an automated data syncing platform that allows businesses to keep their customer data up-to-date across various systems and applications.
CloverDX – https://www.cloverdx.com/ – CloverDX provides a data integration platform for building and managing complex data transformations.
Data Virtuality – https://www.datavirtuality.com/ – Data Virtuality offers a data integration platform that allows users to connect and query data from various sources using SQL.
Datameer – https://www.datameer.com/ – Datameer provides a data preparation and exploration platform that enables users to analyze large datasets quickly and easily.
DBSync – https://www.mydbsync.com/ – DBSync provides a cloud-based data integration platform for connecting and synchronizing data across various systems and applications.
Denodo – https://www.denodo.com/ – Denodo provides a data virtualization platform that allows users to access and integrate data from various sources in real-time.
Devart – https://www.devart.com/ – Devart offers a suite of database tools and data connectivity solutions for various platforms and technologies.
DQLabs – https://dqlabs.ai/ – DQLabs provides a self-service data management platform that automates the process of discovering, curating, and governing data assets.
eQ Technologic – https://www.eqtechnologic.com/ – eQ Technologic offers a data integration platform that enables users to extract, transform, and load data from various sources.
Equalum – https://equalum.io/ – Equalum provides a real-time data ingestion and processing platform that enables organizations to make data-driven decisions faster.
Etleap – https://etleap.com/ – Etleap offers a cloud-based data integration platform that simplifies the process of building and managing data pipelines.
Etlworks – https://www.etlworks.com/ – Etlworks provides a data integration platform that allows users to create and manage complex data transformations.
Harbr – https://harbr.com/ – Harbr is a data exchange platform that connects and facilitates secure data collaboration between organizations.
HCL Technologies (Actian) – https://www.actian.com/ – Actian provides hybrid cloud data analytics software solutions that enable organizations to extract insights from big data and act on them in real time.
Hevo Data – https://hevodata.com/ – Hevo Data provides a cloud-based data integration platform that enables companies to move data from various sources to a data warehouse or other destination in real time.
Hitachi Vantara – https://www.hitachivantara.com/ – Hitachi Vantara provides data management, analytics, and storage solutions for businesses across various industries.
HULFT – https://www.hulft.com/ – HULFT provides data integration and management solutions that enable businesses to streamline data transfer and reduce data integration costs.
ibi – https://www.ibi.com/ – ibi provides data and analytics software solutions that help organizations make data-driven decisions.
Impetus Technologies – https://www.impetus.com/ – Impetus Technologies provides data engineering and analytics solutions that enable businesses to extract insights from big data.
Infoworks – https://www.infoworks.io/ – Infoworks provides a cloud-native data engineering platform that automates the process of data ingestion, transformation, and orchestration.
insightsoftware – https://insightsoftware.com/ – insightsoftware provides financial reporting and enterprise performance management software solutions that help organizations improve their financial and operational performance.
Integrate.io – https://www.integrate.io/ – Integrate.io provides a cloud-based data integration platform that enables businesses to integrate and manage data from various sources.
Intenda – https://intenda.net/ – Intenda provides a data integration and analytics platform that enables businesses to unlock insights from their data.
IRI – https://www.iri.com/ – IRI provides data management and integration software solutions that enable businesses to integrate and manage data from various sources.
Irion – https://www.irion-edm.com/ – Irion provides a data management and governance platform that enables businesses to automate data quality and compliance processes.
K2view – https://www.k2view.com/ – K2view provides a data fabric platform that enables businesses to connect and manage data across various sources and applications.
Komprise – https://www.komprise.com/ – Komprise provides an intelligent data management platform that enables businesses to manage and optimize data across various storage tiers.
Minitab – https://www.minitab.com/ – Minitab is a statistical software package designed for data analysis and quality improvement.
Nexla – https://www.nexla.com/ – Nexla offers a data operations platform that automates the process of ingesting, transforming, and delivering data to various systems and applications.
OpenText – https://www.opentext.com/ – OpenText is a Canadian company that provides enterprise information management software.
Palantir – https://www.palantir.com/ – Palantir is an American software company that specializes in data analysis.
Precisely – https://www.precisely.com/ – Precisely provides data integrity, data integration, and data quality software solutions.
Primeur – https://www.primeur.com/ – Primeur is an Italian software company that offers products and services for data integration, managed file transfer, and digital transformation.
Progress – https://www.progress.com/ – Progress is an American software company that provides products for application development, data integration, and business intelligence.
PurpleCube – https://www.purplecube.ca/ – PurpleCube is a Canadian consulting company that specializes in data integration, data warehousing, and business intelligence.
Push – https://www.push.tech/ – Push is a French software company that provides products and services for data processing and analysis.
Qlik – https://www.qlik.com/ – Qlik provides business intelligence software that helps organizations visualize and analyze their data.
RELX (Adaptris) – https://www.adaptris.com/ – Adaptris, now a RELX company, offers data integration software that helps organizations connect systems and applications.
Rivery – https://rivery.io/ – Rivery is a cloud-based data integration platform that allows businesses to consolidate, transform, and automate data.
Safe Software – https://www.safe.com/ – Safe Software provides spatial data integration and spatial data transformation software.
Semarchy – https://www.semarchy.com/ – Semarchy provides a master data management platform that helps organizations consolidate and manage their data.
Sesame Software – https://www.sesamesoftware.com/ – Sesame Software offers data management solutions that simplify data integration, data warehousing, and data analytics.
SnapLogic – https://www.snaplogic.com/ – SnapLogic provides a cloud-based integration platform that enables enterprises to connect cloud and on-premise applications and data.
Software AG – https://www.softwareag.com/ – Software AG offers a platform that enables enterprises to integrate and optimize their business processes and systems.
Stone Bond Technologies – https://www.stonebond.com/ – Stone Bond Technologies offers a platform that enables enterprises to integrate data from various sources and systems.
Stratio – https://www.stratio.com/ – Stratio offers a platform that enables enterprises to process and analyze large volumes of data in real time.
StreamSets – https://streamsets.com/ – StreamSets offers a data operations platform that enables enterprises to ingest, transform, and move data across systems and applications.
Striim – https://www.striim.com/ – Striim offers a real-time data integration and streaming analytics platform that enables enterprises to collect, process, and analyze their data.
Suadeo – https://www.suadeo.com/ – Suadeo provides a platform that enables enterprises to integrate and manage their data from various sources.
Syniti – https://www.syniti.com/ – Syniti offers a data management platform that enables enterprises to integrate, enrich, and govern their data.
Talend – https://www.talend.com/ – Talend provides a cloud-based data integration platform that enables enterprises to connect, cleanse, and transform their data.
Tengu – https://tengu.io/ – Tengu offers a data engineering platform that enables enterprises to automate the process of ingesting, processing, and delivering data.
ThoughtSpot – https://www.thoughtspot.com/ – ThoughtSpot offers a cloud-based platform that enables enterprises to analyze their data in real time.
TIBCO Software – https://www.tibco.com/ – TIBCO Software offers a platform that enables enterprises to integrate and optimize their business processes and systems.
Tiger Technology – https://www.tiger-technology.com/ – Tiger Technology offers a platform that enables enterprises to manage, move, and share their data across systems and applications.
Timbr.ai – https://timbr.ai/ – Timbr.ai provides a platform that enables enterprises to manage and process their data in real time.
Upsolver – https://www.upsolver.com/ – Upsolver offers a cloud-native data integration platform that enables enterprises to process and analyze their data in real time.
WANdisco – https://wandisco.com/ – WANdisco offers a platform that enables enterprises to replicate and migrate their data across hybrid and multi-cloud environments.
ZAP – https://www.zapbi.com/ – ZAP offers a data management platform that enables enterprises to integrate, visualize, and analyze their data.
Domo – https://www.domo.com/ – Domo is a cloud-native platform that gives data-driven teams real-time visibility into all the data and insights needed to drive business forward.
Boomi (formerly Dell Boomi) – https://boomi.com/ – Boomi specializes in cloud-based integration, API management, and master data management. Dell acquired the company in 2010 and sold it in 2021.
Stitch – https://www.stitchdata.com/ – Stitch is a cloud-first, open-source platform for rapidly moving data. It allows users to integrate with over 100 data sources and automate data movement to a cloud data warehouse.
Sparkflows – https://sparkflows.io/ – Sparkflows is a low-code, drag-and-drop platform that enables organizations to build, deploy, and manage Big Data applications on Apache Spark.
Liquibase – https://www.liquibase.com/ – Liquibase is an open-source database-independent library for tracking, managing, and applying database schema changes.
Shipyard – https://shipyardapp.com/ – Shipyard is a data orchestration platform for building, scheduling, and monitoring data workflows.
Flyway – https://flywaydb.org/ – Flyway is an open-source database migration tool that allows developers to evolve their database schema easily and reliably across different environments.